**Importing all the libraries used in the Problem**

In [None]:
!pip install faiss-cpu

In [None]:
!pip install tensorflow_text

In [None]:
!pip install tiktoken

In [None]:
!pip install beautifulsoup4

In [None]:
!pip install sentence_transformers

In [None]:
!pip install --upgrade langchain

In [None]:
!pip install anthropic

In [None]:
!pip install unstructured

In [55]:
import os
import getpass
os.environ['ANTHROPIC_API_KEY'] = getpass.getpass('Anthropic API Key:')

Anthropic API Key:··········


In [56]:
import requests
from bs4 import BeautifulSoup
import faiss
import anthropic
import sentence_transformers
import langchain
import numpy as np
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

Scraping Articles from URLs and conerting them to Pages

In [58]:
def extract_text_from(url):
    html = requests.get(url).text
    soup = BeautifulSoup(html, features="html.parser")
    text = soup.find("div", {"id": "mw-content-text"}).text.strip()
    lines = (line.strip() for line in text.splitlines())
    return ' '.join(lines)

In [59]:
url = "https://simple.wikipedia.org/wiki/List_of_national_capitals"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find("table", {"class": "wikitable"})

capitals = []

for row in table.find_all("tr")[1:]:
    cells = row.find_all("td")
    capital = cells[0].text.strip()
    link = cells[0].find("a")["href"]
    capitals.append((capital, link))

pages = []
for capital, link in capitals:
    url = f"https://simple.wikipedia.org{link}"
    pages.append({'text': extract_text_from(url), 'source': url})

Splitting the exracted pages into Chunks called Docs and Embedding them using Hugging Face Embeddings function.
The embedding is stored in a vector called db.

In [None]:
db = {}
hf = HuggingFaceEmbeddings()
text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap  = 50, separator="\n")
docs, metadatas = [], []
for page in pages:
    splits = text_splitter.split_text(page['text'])
    docs.extend(splits)
    metadatas.extend([{"source": page['source']}] * len(splits))
    print(f"Split {page['source']} into {len(splits)} chunks")

embeddings = hf
db = FAISS.from_texts(docs, embeddings, metadatas=metadatas)

TEST 1

In [62]:
query = "What is the average rainfall per year in Abu Dhabi?"
docs = db.similarity_search(query)
print(docs[0].page_content)

View of Abu Dhabi Satellite image of Abu Dhabi (March 2003)  This article is about the city. For the emirate, see Abu Dhabi (emirate). Abu Dhabi (Arabic: أبو ظبي, ʼAbū Ẓaby) is the capital city of the United Arab Emirates. It is in the emirate of Abu Dhabi. Abu Dhabi is one of the seven emirates which form the United Arab Emirates(UAE). The city is on a T-shaped island going into the Persian Gulf from the central western coast.  The city is 972 km2 in size.  The city had a population of 1.45 million people in 2022. Abu Dhabi is also the capital of UAE and is the largest emirates in UAE.[1]   History[change | change source] People started to live in the area and call it Abu Dhabi about 300 years ago.  In the 1970s, the Bani Yas tribe made Abu Dhabi their capital city.  Shakhbut bin-Dhiyab Al Nahyan became the leader of the city in 1818.  People found oil in 1958 in Abu Dhabi.  They started to sell the oil in the 1960s.  In 1971 December the 2nd, Abu Dhabi joined the United Arab Emirates

Using the retriever funciton to fetch relevant document form db. 
NOTE: Instead of retriever, similarity_search() can also be used.
Using LLM model form Anthopic to answer the given prompt. 

In [64]:
def answer_query(question, db):
    retriever = db.as_retriever()
    summaries = retriever.get_relevant_documents(question)
    prompt = f"{anthropic.HUMAN_PROMPT} Answer the given question, {question}, using {summaries} only {anthropic.AI_PROMPT}"
    c = anthropic.Client(os.environ["ANTHROPIC_API_KEY"])
    response = c.completion(
        prompt = prompt, 
        model = "claude-v1", 
        max_tokens_to_sample = 100,
        top_k = 1,
        temperature = 0.5,
        top_p = 1.0,
    )
    return response

TEST 2

In [65]:
question = "What is the capital of France?"
retriever = db.as_retriever()
summaries = retriever.get_relevant_documents(question)
print(summaries[0].page_content)
print(len(summaries))

For the character in mythology, see Paris (mythology). ParisCommune and departmentClockwise from top: skyline of Paris on the Seine with the Eiffel Tower, Notre-Dame de Paris, the Louvre and its large pyramid, and the Arc de Triomphe FlagCoat of armsMotto(s): Fluctuat nec mergitur "Tossed by the waves but never sunk"ParisLocation within FranceShow map of FranceParisLocation within EuropeShow map of EuropeParisParis (Earth)Show map of EarthCoordinates: 48°51′24″N 2°21′03″E﻿ / ﻿48.8567°N 2.3508°E﻿ / 48.8567; 2.3508Coordinates: 48°51′24″N 2°21′03″E﻿ / ﻿48.8567°N 2.3508°E﻿ / 48.8567; 2.3508CountryFranceRegionÎle-de-FranceDepartmentParisSubdivisions20 arrondissementsGovernment • MayorAnne Hidalgo (PS)Area • Commune and department105.4 km2 (40.7 sq mi)Population (January 1, 2019 (est))[1] • Commune and department2,140,526 • Density20,000/km2 (53,000/sq mi) • Metro[2]12,532,901Demonym(s)ParisianTime zoneUTC+1 (CET) • Summer (DST)UTC+2 (CEST)INSEE/postal code75001–75020, 75116Websitewww.paris.

Main Function to Run the query multiple times.

In [66]:
def main():
    while True:
        question = input("Enter your question (or 'q' to quit): ")
        if question.lower() == "q":
            break

        result = answer_query(question, db)
        print(result["completion"])

if __name__ == "__main__":
    main()

Enter your question (or 'q' to quit): What is the capital of Holland?
 Amsterdam
Enter your question (or 'q' to quit): What is the average rainfall in Abu Dhabi?
 The average annual rainfall in Abu Dhabi is 51 cm or 20 inches.
Enter your question (or 'q' to quit): What is the elevation of Baku?
 The elevation of Baku is -28 meters or -92 feet.
Enter your question (or 'q' to quit): q
