### DESCRIPTION:
    This example shows how to create embeddings from "Moby-Dick; or, The Whale", a novel by Herman Melville supplied as a txt file, and save them to a FAISS index in the ./dbs/books/ folder.
    It then queries the index and generates answers using the OpenAI GPT-3.5 chat model.
### REQUIREMENTS:
    Create a .env file containing your OpenAI API key and save it in the root directory of this project.

  For more information about FAISS indexes, see:
      https://github.com/facebookresearch/faiss

In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain

import utils
from dotenv import load_dotenv

In [None]:
# IMPORTANT!!
# You only need to run this cell once to create the embeddings; after that, load the saved faiss_index for retrieval.
# You can also use the already created faiss_index in the dbs folder: ./dbs/books/faiss_index

# Create embeddings from the txt book files in the ./data/books/ folder and save them to a FAISS index.
utils.init_OpenAI()
embeddings = utils.init_embeddings()

dataPath = "./data/books/"

# you can add as many txt files as you want to the files list
files = ["moby dick - the whale.txt"]
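texts = []
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, separators=["\n\n", "\n", ".", ";", ",", " ", ""])
for file_name in files:
    # load each book and split it into overlapping chunks of at most 1000 characters
    loader = TextLoader(dataPath + file_name, encoding="utf-8")
    documents = loader.load()
    texts.extend(text_splitter.split_documents(documents))

# build one index over the chunks from all files;
# building and saving it inside the loop would overwrite the index on every iteration
db = FAISS.from_documents(documents=texts, embedding=embeddings)
db.save_local("./dbs/books/faiss_index")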
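
The `chunk_size=1000` and `chunk_overlap=100` settings above control how much text goes into each embedded chunk and how much neighbouring chunks share. The sketch below illustrates that idea with a plain fixed-width sliding window. It is a simplified stand-in, not the separator-aware algorithm that `RecursiveCharacterTextSplitter` uses, and the function name `split_with_overlap` and the sample text are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-width windows that share chunk_overlap characters.

    Simplified illustration only: the real splitter also tries to break
    on separators such as paragraph and sentence boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample text: 200 repetitions of a 17-character sentence (3400 characters).
sample = "Call me Ishmael. " * 200
chunks = split_with_overlap(sample, chunk_size=1000, chunk_overlap=100)

print(len(chunks))                            # number of chunks produced
print(chunks[0][-100:] == chunks[1][:100])    # neighbouring chunks share 100 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.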

In [2]:

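# load the chat model, the embedding model, and the FAISS index saved earlier
llm = utils.init_llm()
embeddings = utils.init_embeddings()
vectorStore = FAISS.load_local("./dbs/books/faiss_index", embeddings)
# retrieve the 2 most similar chunks per query and "stuff" them into a single prompt for the LLM
retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k":2})
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=False)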
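
With `search_kwargs={"k":2}`, only the two chunks ranked closest to the query reach the LLM, so a relevant passage that ranks third never appears in the prompt. The sketch below illustrates top-k selection on made-up data. Everything in it is an assumption for illustration: the three-dimensional vectors, the chunk labels, and the use of cosine similarity (the actual FAISS index may rank by a different distance metric, and real embeddings have far more dimensions).

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: List[Tuple[str, List[float]]], k: int) -> List[str]:
    """Return the labels of the k entries most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical toy index: (chunk label, made-up embedding vector).
toy_index = [
    ("chunk about the life buoy cask", [0.9, 0.1, 0.0]),
    ("chunk about Queequeg's coffin", [0.7, 0.6, 0.1]),
    ("chunk about the Pequod sinking", [0.8, 0.2, 0.3]),
    ("chunk about whale anatomy", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.1]  # made-up embedding of a question

print(top_k(query_vec, toy_index, k=2))  # the coffin chunk ranks third and is left out
print(top_k(query_vec, toy_index, k=3))  # raising k brings it into the context
```

In the notebook itself, raising `k` or setting `return_source_documents=True` are the corresponding knobs for checking which chunks the chain actually saw.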

In [8]:
qa({"query": "Why does the coffin prepared for Queequeg become Ishmael's life buoy once the Pequod sinks?"})

{'query': "Why does the coffin prepared for Queequeg become Ishmael's life buoy once the Pequod sinks?",
 'result': "There is no indication in the provided context that the coffin prepared for Queequeg becomes Ishmael's life buoy once the Pequod sinks. The life buoy mentioned is described as a long slender cask that was dropped from the stern, and it eventually sinks to the bottom of the sea."}

In [9]:
qa({"query": "Why does Ahab pursue Moby Dick so single-mindedly?"})

{'query': 'Why does Ahab pursue Moby Dick so single-mindedly?',
 'result': "Ahab pursues Moby Dick so single-mindedly because he seeks revenge against the whale for taking his leg in a previous encounter. Additionally, it is suggested that Ahab's vindictiveness may have extended to all sperm whales, and that he identified with Moby Dick as the embodiment of all his woes and frustrations."}

In [10]:
qa({"query": "Why does the novel's narrator begin his story with 'Call me Ishmael'?"})

{'query': "Why does the novel's narrator begin his story with 'Call me Ishmael'?",
 'result': "The novel's narrator begins his story with 'Call me Ishmael' because he wants to introduce himself to the readers and create a personal connection with them. He wants to establish his identity and let the readers know that they will be hearing his story from his perspective."}