### DESCRIPTION:
    This example shows how to create embeddings from URLs and save them to a FAISS index in the ./dbs/urls/ folder,
    then query that FAISS index and get an answer using OpenAI GPT-3.5.
### REQUIREMENTS:
    Create a .env file with your OpenAI API key and save it in the root directory of this project.

  For more information about FAISS, see:
      https://github.com/facebookresearch/faiss
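The `utils` helpers used below are not shown in this notebook, so how they read the .env file is unknown. As a rough sketch of what the key-loading step amounts to, the stdlib-only function below parses simple KEY=VALUE lines. The function name `parse_env_text` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption about what the .env file contains. In practice, `load_dotenv()` from python-dotenv (imported in the first cell) covers this and handles more edge cases.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines (as found in a .env file) into a dict.

    Hypothetical helper for illustration only.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Assumed .env content; replace the placeholder with a real key in your own file.
example = 'OPENAI_API_KEY="your-key-here"\n'
print(sorted(parse_env_text(example)))  # ['OPENAI_API_KEY']
```

A quick check like this can confirm the key is present before running the cells that call the OpenAI API.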

In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain

import utils
from dotenv import load_dotenv

In [None]:
# IMPORTANT:
# You only need to run this cell once to create the embeddings; after that, use the saved faiss_index for retrieval.
# If the index already exists in the dbs folder (./dbs/urls/faiss_index), you can skip this cell.

# Create embeddings from the URLs and save them to a FAISS index.
utils.init_OpenAI()
embeddings = utils.init_embeddings()

# You can add as many URLs as you want; this example uses only one:
# the full text of Moby Dick from Project Gutenberg.
urls = ["https://www.gutenberg.org/files/2701/2701-0.txt"]

loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    separators=["\n\n", "\n", ".", ";", ",", " ", ""],
)
texts = text_splitter.split_documents(documents)
db = FAISS.from_documents(documents=texts, embedding=embeddings)
db.save_local("./dbs/urls/faiss_index")
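To make the `chunk_size=1000, chunk_overlap=100` settings concrete, here is a deliberately simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class tries to break on the listed separators first). The function name `sliding_chunks` is hypothetical and only illustrates how overlap makes consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 2500 characters of dummy text stand in for the loaded document.
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])  # [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])  # True: neighbours share 100 characters
```

The shared 100 characters reduce the chance that a sentence relevant to a query is cut in half at a chunk boundary.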

In [2]:

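# Load the LLM and embeddings, then reopen the FAISS index saved earlier.
llm = utils.init_llm()
embeddings = utils.init_embeddings()
vectorStore = FAISS.load_local("./dbs/urls/faiss_index", embeddings)
# Retrieve the 2 most similar chunks for each query.
retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k": 2})
# "stuff" places all retrieved chunks into a single prompt for the LLM.
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=False)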
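`search_type="similarity"` with `k=2` asks the vector store for the two chunks whose embeddings are closest to the query embedding. The toy sketch below shows that idea in plain Python. The 3-dimensional vectors are made up, the `top_k` helper is hypothetical, and cosine similarity is used for readability; FAISS itself uses an optimized index and may rank by a different metric such as L2 distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts most similar to query_vec, best first."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings, for illustration only.
indexed_chunks = [
    ("chunk about Ahab", [0.9, 0.1, 0.0]),
    ("chunk about whaling gear", [0.1, 0.9, 0.1]),
    ("chunk about the white whale", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, indexed_chunks, k=2))
# ['chunk about Ahab', 'chunk about the white whale']
```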

In [3]:
qa({"query": "Why does the coffin prepared for Queequeg become Ishmael's life buoy once the Pequod sinks?"})

{'query': "Why does the coffin prepared for Queequeg become Ishmael's life buoy once the Pequod sinks?",
 'result': "There is no direct connection between Queequeg's coffin and Ishmael's life buoy. The life-buoy that Ishmael clings to is described as a long slender cask that had been hanging from the stern of the ship, but it had filled with water and sank with the sailor who was trying to grab it. Queequeg's coffin, on the other hand, was prepared for him by his own request and contained his personal belongings and supplies for his afterlife."}
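With `chain_type="stuff"`, the retrieved chunks are placed into a single prompt together with the question, so the model can only draw on those `k` chunks when it answers. A rough sketch of that assembly step follows. The template wording and the `build_stuff_prompt` name are assumptions for illustration, not LangChain's actual prompt.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate all retrieved chunks into one context block, then append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunk texts stand in for what the retriever would return.
prompt = build_stuff_prompt(
    "Why does Ahab pursue Moby Dick so single-mindedly?",
    ["first retrieved chunk", "second retrieved chunk"],
)
print(prompt)
```

Because only the stuffed chunks reach the model, an answer can miss material found elsewhere in the book. Raising `k` or setting `return_source_documents=True` (both parameters already appear in the retrieval cell) is one way to widen or inspect what the model sees.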

In [4]:
qa({"query": "Why does Ahab pursue Moby Dick so single-mindedly?"})

{'query': 'Why does Ahab pursue Moby Dick so single-mindedly?',
 'result': 'Ahab is consumed with the hot fire of his purpose, and in all his thoughts and actions ever had in view the ultimate capture of Moby Dick. It is suggested that his vindictiveness towards the White Whale might have possibly extended itself in some degree to all sperm whales, and that the more monsters he slew by so much the more he multiplied the chances that each subsequently encountered whale would prove to be the hated one he hunted.'}

In [5]:
qa({"query": "Why does the novel's narrator begin his story with 'Call me Ishmael'?"})

{'query': "Why does the novel's narrator begin his story with 'Call me Ishmael'?",
 'result': 'The novel\'s narrator begins his story with "Call me Ishmael" to introduce himself to the reader by using a pseudonym.'}