### DESCRIPTION:
   Ask questions about your file repository: use Azure OpenAI to scan candidates' resumes, create embeddings, and save them to an in-memory vector store (FAISS, an open-source vector store).
### REQUIREMENTS:
    Create a .env file containing your OpenAI API key and save it in the root directory of this project.

  For more information about the Faiss index, see:
      https://github.com/facebookresearch/faiss
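
The notebook reads the API key through `utils.init_OpenAI()`, whose source is not shown here. As a rough illustration of the .env requirement above, the sketch below loads `KEY=VALUE` pairs from such a file using only the standard library. The helper name `load_env_file` is hypothetical, and it is not necessarily how `utils` does it.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical helper: the notebook's own utils module is not shown,
    so this only illustrates the idea.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Do not overwrite variables already set in the environment.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Calling `load_env_file()` from the project root before initializing the client would make the key available through `os.environ`.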

In [1]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.prompts import PromptTemplate
from langchain.text_splitter import TokenTextSplitter
from langchain.vectorstores import FAISS

import utils

In [2]:
# IMPORTANT: run this cell only once to create the embeddings; afterwards, reuse the saved faiss_index for retrieval.
# A pre-built index is also available in the dbs folder: ./dbs/CV/faiss_index

# Create embeddings from the employees' CV files in the ./data/CV/ folder and save them to a FAISS index.
utils.init_OpenAI()
embeddings = utils.init_embeddings()
loader = DirectoryLoader('./data/CV/', glob="*.txt", loader_cls=TextLoader)
documents = loader.load()
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(documents=docs, embedding=embeddings)
db.save_local("./dbs/CV/faiss_index")
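
The cell above splits each CV into chunks with `chunk_size=1000` and `chunk_overlap=0` before embedding. To show what those two parameters control, here is a minimal sketch of fixed-size windowing. It assumes whitespace-separated words as a stand-in for real tokens, so it is not the tokenizer `TokenTextSplitter` uses, and the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=0):
    """Split text into windows of at most chunk_size pseudo-tokens.

    Assumption: whitespace-separated words stand in for real tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    tokens = text.split()
    # Each new window starts this many tokens after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


sample = "one two three four five six seven"
print(split_into_chunks(sample, chunk_size=3, chunk_overlap=0))
print(split_into_chunks(sample, chunk_size=3, chunk_overlap=1))
```

With no overlap the windows are disjoint; with an overlap of 1 each window repeats the last word of the previous one, which can help when a relevant sentence straddles a chunk boundary.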

In [2]:
# initialize the llm and the vectorstore retriever
llm = utils.init_llm()
embeddings = utils.init_embeddings()

vectorStore = FAISS.load_local("./dbs/CV/faiss_index", embeddings)
retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k":1})
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
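
The retriever above is configured with `"k": 1`, so only the single closest chunk is handed to the LLM for each question. The sketch below illustrates top-k retrieval on toy 2-D vectors. It assumes squared L2 distance as the similarity measure, and the file names and vectors are invented for illustration; it does not use FAISS itself.

```python
def top_k(query_vec, chunk_vecs, k=1):
    """Return the k chunk ids closest to query_vec by squared L2 distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Sort chunk ids from nearest to farthest, then keep the first k.
    ranked = sorted(chunk_vecs, key=lambda cid: sq_dist(query_vec, chunk_vecs[cid]))
    return ranked[:k]


# Toy 2-D "embeddings" (hypothetical); real embeddings have far more dimensions.
toy_chunks = {
    "cv_a.txt": [0.9, 0.1],
    "cv_b.txt": [0.7, 0.3],
    "cv_c.txt": [0.1, 0.9],
}
query = [1.0, 0.0]

print(top_k(query, toy_chunks, k=1))
print(top_k(query, toy_chunks, k=2))
```

With the toy data, `k=1` returns only `cv_a.txt`, while `k=2` also returns `cv_b.txt`. A larger `k` in `search_kwargs` would likewise give the chain more than one chunk to draw on, at the cost of a longer prompt.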

In [6]:
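def ask_question(qa, question):
    """Wrap the question in a short instruction, run the QA chain, and print the answer."""
    # The template itself appends "?" after the query.
    template = """
        respond as succinctly as possible. {query}?
    """

    prompt = PromptTemplate(
        input_variables=["query"],
        template=template,
    )
    # The chain returns a dict; the generated answer is stored under "result".
    result = qa(prompt.format(query=question))
    print("Answer:", result["result"])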
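
The chain is built with `return_source_documents=True`, but `ask_question` prints only the answer. The sketch below shows one way the retrieved sources could be listed as well. It assumes the result dict carries a `"source_documents"` list whose items expose `page_content` and `metadata["source"]`; the helper name `format_sources` and the stand-in data are hypothetical.

```python
from types import SimpleNamespace


def format_sources(result, preview_chars=80):
    """Return one line per source document found in a QA result dict."""
    lines = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "<unknown>")
        # Collapse whitespace so the preview fits on a single line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{source}: {preview}")
    return lines


# Stand-in for a real chain result, so the sketch runs without an LLM.
fake_result = {
    "result": "Yes.",
    "source_documents": [
        SimpleNamespace(
            page_content="Solution architect\nwith cloud experience",
            metadata={"source": "data/CV/example.txt"},
        )
    ],
}
for line in format_sources(fake_result):
    print(line)
```

Printing these lines next to each answer makes it easier to check which CV a response was based on.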

In [7]:
ask_question(qa,"Are there any solution architects?")

Answer: Yes, Christopher L. Hall is a solution architect.


In [8]:
ask_question(qa, "is any of them proficient in cloud technologies?")

Answer: There is no information provided on whether or not Alexis Mattos-Vabre is proficient in cloud technologies.


In [9]:
ask_question(qa, "do they know big data technologies?")

Answer: Yes, Mei Chen has 5 years of experience in big data technologies and is proficient in programming languages such as Python, Java, and Scala.
