### DESCRIPTION:
    This example shows how to create embeddings from employee CV files in the ./data/CV/ folder and save them to a FAISS index.
    It then queries these documents and gets answers from OpenAI GPT-3.5 through a chat interface.
### REQUIREMENTS:
    Create a .env file containing your OpenAI API key and save it in the root directory of this project.

    For more information about the FAISS index, see:
      https://github.com/facebookresearch/faiss
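
How `utils` reads the key is not shown in this notebook. As a minimal sketch, assuming the .env file holds simple `KEY=VALUE` lines and that the variable is named `OPENAI_API_KEY` (both are assumptions, as is the helper name `load_env_file`), the file could be parsed like this:

```python
import os
import tempfile


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo with a throwaway file so no real key is read or written.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write('# placeholder value, not a real key\n')
        handle.write('OPENAI_API_KEY="sk-placeholder"\n')
    settings = load_env_file(env_path)

print(sorted(settings))
```

Keep the real .env file out of version control so the key is not shared by accident.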

In [1]:
from langchain.text_splitter import TokenTextSplitter
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS
from langchain.document_loaders import DirectoryLoader
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

import utils

In [2]:
# IMPORTANT: run this cell only once to create the embeddings; afterwards, reuse the saved index for retrieval.
# You can also skip this cell and use the prebuilt index in ./dbs/CV/faiss_index.

# Create embeddings from the employee CV files in ./data/CV/ and save them to a FAISS index.
utils.init_OpenAI()
embeddings = utils.init_embeddings()
loader = DirectoryLoader('./data/CV/', glob="*.txt", loader_cls=TextLoader)
documents = loader.load()
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(documents=docs, embedding=embeddings)
db.save_local("./dbs/CV/faiss_index")
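
`TokenTextSplitter` counts tokens with a tokenizer. The sketch below uses whitespace-separated words as a stand-in (an assumption made to keep it dependency-free) to show how `chunk_size` and `chunk_overlap` interact; `split_tokens` is a hypothetical helper, not part of LangChain. With `chunk_overlap=0`, as in the cell above, consecutive chunks share no tokens.

```python
def split_tokens(tokens, chunk_size, chunk_overlap=0):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


words = "a b c d e f g h i j".split()
no_overlap = split_tokens(words, chunk_size=4, chunk_overlap=0)
with_overlap = split_tokens(words, chunk_size=4, chunk_overlap=2)

for chunk in no_overlap:
    print(" ".join(chunk))
```

A non-zero overlap repeats the tail of each chunk at the head of the next, which can help when a relevant passage straddles a chunk boundary, at the cost of storing more embeddings.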

In [3]:
# initialize the llm and the vectorstore retriever
llm = utils.init_llm()
embeddings = utils.init_embeddings()

vectorStore = FAISS.load_local("./dbs/CV/faiss_index", embeddings)
vs_retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k":2})
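
With `search_type="similarity"` and `k=2`, the retriever returns the two stored chunks whose embeddings are closest to the query embedding. A brute-force sketch of that idea in plain Python, using squared Euclidean distance and made-up three-dimensional vectors (real embeddings have far more dimensions, and the chunk labels here are hypothetical):

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, vectors, k):
    """Return the labels of the k vectors closest to the query."""
    scored = sorted(vectors.items(), key=lambda item: squared_l2(query, item[1]))
    return [label for label, _ in scored[:k]]


# Toy stand-in for the index: label -> embedding.
toy_index = {
    "cv_a_chunk_0": [0.9, 0.1, 0.0],
    "cv_b_chunk_0": [0.8, 0.2, 0.1],
    "cv_c_chunk_0": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.0, 0.0]
nearest = top_k(query_vector, toy_index, k=2)
print(nearest)
```

A small `k` keeps the prompt short, but an answer can only draw on the chunks that were retrieved, so relevant CVs outside the top two are not seen by the model.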

In [4]:
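def ask_question(qa, chat_history, query):
    # Run the chain with the conversation so far
    result = qa({"question": query, "chat_history": chat_history})
    # Append this turn so that earlier turns are kept for follow-up questions
    chat_history = chat_history + [(query, result["answer"])]
    print(result["answer"])
    return chat_history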
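
A helper of this shape can be exercised without calling the API by passing a stub in place of the chain. The sketch below uses a hypothetical `fake_qa` callable and a variant named `ask_and_record` (both names are illustrative) that appends every turn, so the history grows across questions:

```python
def fake_qa(inputs):
    """Hypothetical stand-in for the chain: reports how many prior turns it saw."""
    turns = len(inputs["chat_history"])
    return {"answer": f"answer to {inputs['question']!r} after {turns} prior turn(s)"}


def ask_and_record(qa, chat_history, query):
    """Ask one question and return the history extended by this turn."""
    result = qa({"question": query, "chat_history": chat_history})
    # Build a new list so the caller's history is extended, not replaced.
    return chat_history + [(query, result["answer"])]


history = []
history = ask_and_record(fake_qa, history, "first question")
history = ask_and_record(fake_qa, history, "second question")

for question, answer in history:
    print(question, "->", answer)
```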

In [5]:
QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
    Chat History:
    {chat_history}
    Follow Up Input: {question}
    Standalone question:"""
)
qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                retriever=vs_retriever,
                                condense_question_prompt=QUESTION_PROMPT,
                                return_source_documents=True,
                                verbose=False)
chat_history = []
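
Before retrieval, the chain fills this template with the conversation so far and the new question, then asks the LLM for a standalone question. The sketch below renders an equivalent template with plain `str.format`; the "Human:" / "Assistant:" serialization of past turns is an assumption about how the history is formatted, not something shown in this notebook.

```python
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)


def format_history(turns):
    """Serialize (question, answer) pairs into a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


turns = [
    ("Are there any solution architects?", "Yes, there are two solution architects."),
]
rendered = CONDENSE_TEMPLATE.format(
    chat_history=format_history(turns),
    question="is any of them proficient in cloud technologies?",
)
print(rendered)
```

Rewriting the follow-up this way lets the retriever search with a question that names its subject, rather than with a pronoun such as "them".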

In [6]:
chat_history = ask_question(qa, chat_history, "Are there any solution architects?")

Yes, there are two solution architects. Christopher L. Hall and Jordan Zhu are both solution architects.


In [7]:
chat_history = ask_question(qa, chat_history, "is any of them proficient in cloud technologies?")

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Requests to the Creates a completion for the chat message Operation under Azure OpenAI API version 2023-03-15-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 1 second. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit..


Both Christopher L. Hall and Jordan Zhu are proficient in cloud technologies. Christopher L. Hall is an AWS-certified big data solution architect with experience in designing and deploying cloud data protection architecture for Fortune 500 companies using Amazon Web Services. Jordan Zhu has experience as a Solutions Architect at Stripe and a Software Engineer at Amazon, both of which are companies that heavily utilize cloud technologies.


In [8]:
chat_history = ask_question(qa, chat_history, "do they know big data technologies?")

There is no information provided about Jordan Zhu's experience or qualifications in big data technologies. Christopher L. Hall is an AWS-certified big data solution architect with 4+ years of experience driving information management strategy. His expertise is in cloud data protection and he has experience with AWS cloud infrastructure. However, there is no information provided about his proficiency in other big data technologies such as Hadoop, Spark, and Hive.
