# In this lab, you will understand how to use vector stores and RAG

**STORING DOCUMENTS IN CHROMA**

In [None]:
%pip install langchain langchain-huggingface langchain-core langchain-openai langchain-chroma langchain-community langchain-text-splitters faiss-cpu pypdf gradio langsmith python-dotenv beautifulsoup4

If you are using a Jupyter notebook, follow the instructions below. Otherwise, skip this step and go to the next one.

**Open the .env file in this folder and observe that OPENAI_API_KEY is configured there. Replace its value with your own key or the key provided by the instructor.**

The code in the cell below loads the .env file and sets the environment variables.

**Write the code in the cell below and execute it.**

In [1]:

from dotenv import load_dotenv

load_dotenv()

True

**Creating Documents manually**

**Document** represents a unit of text and its associated metadata.

It has two attributes:

**page_content**: a string representing the content

**metadata**: a dict containing arbitrary metadata

In [12]:
from langchain_core.documents import Document



documents = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "thriller",
            "rating": 9.9,
        },
    ),
]


# **Creating a vector store (Chroma) and persisting documents using OpenAIEmbeddings**

**Chroma is an AI-native open-source vector database**

Chroma runs in various modes:

**in-memory**

**in-memory with persistence** - in a script or notebook, with save/load to disk

**in a Docker container** - as a server running on your local machine or in the cloud

The code below creates a vector store from the given documents and persists the data in the given directory.

Execute the code below and observe that a directory named chromadb4 is created.

In [13]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embeddings = OpenAIEmbeddings()

# embeddings= HuggingFaceEmbeddings()

vectorstore = Chroma.from_documents(
    documents, embedding=embeddings , persist_directory="chromadb4"
)

**Let us understand how to reopen a persisted vector store using the persist_directory**

In [14]:
vectorstore = Chroma(
    persist_directory="chromadb4", embedding_function=embeddings
)

**Using Vector Store as a retriever**

LangChain VectorStore objects do not subclass Runnable. LangChain Retrievers, however, are Runnables, so we can use them in LCEL chains.

Vectorstores implement an **as_retriever** method that will generate a Retriever, specifically a **VectorStoreRetriever**
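To see why being a Runnable matters, here is a toy sketch of `|` composition. The `Step` class is hypothetical and is not LangChain's Runnable; it only mimics the idea that each stage's output feeds the next stage. The keyword-matching retriever is also a stand-in for real vector search.

```python
class Step:
    """Hypothetical miniature of a Runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


movies = [
    "Toys come alive and have a blast doing so",
    "Three men walk into the Zone",
]

# Stand-in retriever: keyword match instead of vector similarity
retriever = Step(lambda query: [m for m in movies if query.lower() in m.lower()])
# Stand-in formatter: joins the retrieved texts into one string
format_docs = Step(lambda docs: "\n".join(docs))

chain = retriever | format_docs
print(chain.invoke("toys"))
```

Because a real retriever exposes the same `invoke` interface as prompts and models, it can sit anywhere in such a pipeline.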

The code below shows how to get a retriever from a vector store object.
See the documentation at (https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.chroma.Chroma.html#langchain_community.vectorstores.chroma.Chroma.as_retriever)

In [15]:
retriever = vectorstore.as_retriever(
    search_type="similarity",  # mmr or similarity_score_threshold
    search_kwargs={"k": 1},
)

retriever.invoke("tell me about movie directed by Satoshi Kon")

#result4 = retriever.batch(["Scientist","toys"])
#print(result4)

[Document(id='1591b4fe-6d73-419f-b1e1-08c3e0ff6ae3', metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea')]
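The retriever above returns the single closest document because `search_type="similarity"` and `k` is 1. As a rough illustration of what "closest" means, the sketch below ranks texts by cosine similarity. It uses a toy bag-of-words vector as a stand-in for a real embedding model, so the function names (`embed`, `cosine`, `top_k`) are hypothetical and this is not how Chroma is implemented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, texts: list, k: int = 1) -> list:
    """Return the k texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


texts = [
    "Toys come alive and have a blast doing so",
    "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
]
print(top_k("toys that come alive", texts, k=1))
```

A real embedding model captures meaning rather than exact word overlap, which is why the query about Satoshi Kon can match on content and not only on shared words.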

## Now let us see how to load a PDF into a vector store and persist it. This time we will use FAISS

We will use PyPDFLoader to load documents from a PDF.

There are many document loaders available. See (https://api.python.langchain.com/en/latest/community_api_reference.html#module-langchain_community.document_loaders)

<font color="red">**Before executing the code below, make sure the travel-policy.pdf file given to you is in this folder.**</font>




In [16]:

from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter


#loader = PyPDFLoader(file_path="drive/MyDrive/Colab Notebooks/ds-book.pdf")
loader = PyPDFLoader(file_path="travel-policy.pdf")
documents = loader.load()


In [17]:
documents[1]

Document(metadata={'producer': 'Skia/PDF m121 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Corporate Travel Policy', 'source': 'travel-policy.pdf', 'total_pages': 4, 'page': 1, 'page_label': '2'}, page_content='○ Taxis:Reimbursableonlywithpre-approvalforspecificcircumstances.●● Rentalcars:○ Economyorcompactclassisrequired.○ Mileageallowanceof$0.50permile.○ Gasandtollsarereimbursablebasedonreceipts.●\nAccommodation:\n● Company-approvedhotels:Selectfromapre-selectedlistofhotelswithinthefollowingbudgetguidelines:○ Majorcities:$150-200pernight○ Smallercities/towns:$100-150pernight●● Sharedaccommodations:Encouragesharedaccommodationsformulti-persontripstomaximizecostsavings.● Mealperdiems:Followestablishedperdiemratesformealsinsteadofitemizedexpenseclaims:○ Breakfast:$15○ Lunch:$20○ Dinner:$30●\nOtherExpenses:\n● Businessmeals:○ Onlyreimbursablewhenaccompaniedbyaclientorcolleague.○ Maximum$50perpersonpermeal.○ Alcoholicbeveragesnotreimbursable.●')

**After all documents are loaded, we have to split them into chunks.**

**Observe the code below and understand how we split the documents into chunks.**

In [18]:
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50, separator="\n")
docs = text_splitter.split_documents(documents=documents)

Created a chunk of size 667, which is longer than the specified 500
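The warning above appears because the splitter only cuts at the separator. If one piece between two separators is longer than `chunk_size`, it is kept whole. The following is a simplified sketch of that merge logic, written for illustration only; `split_text` is a hypothetical helper and not LangChain's implementation.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int, separator: str = "\n") -> list:
    """Split on `separator`, then merge pieces into chunks of at most `chunk_size`."""
    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = []  # pieces of the chunk being built

    def length(parts: list) -> int:
        return len(separator.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Keep trailing pieces as overlap, dropping from the front until the
            # carry-over fits the overlap budget and leaves room for the new piece.
            while current and (
                length(current) > chunk_overlap
                or length(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\nbbbb\ncccc\n" + "d" * 20
chunks = split_text(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

In this sample the last piece has 20 characters and no separator inside it, so it ends up as a chunk larger than `chunk_size`, just like the 667-character chunk reported above.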


**Observe the code below and understand how we use FAISS to create a vector store and save the documents.**

**Execute the code below in a cell.**

In [21]:
from langchain_community.vectorstores import FAISS
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embeddings = OpenAIEmbeddings()

# embeddings= HuggingFaceEmbeddings()

vectorstore = FAISS.from_documents(docs, embeddings)
vectorstore.save_local("faiss_store1")

**Creating a retrieval chain using the vector store as a retriever**

Study the code below and understand how we create a retrieval chain that uses the vector store as its retriever.

Execute the code below to create the chain.

In [22]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents.stuff import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

vectorstore = FAISS.load_local(
    "faiss_store1", embeddings=embeddings, allow_dangerous_deserialization=True
)

retriever = vectorstore.as_retriever(
    search_type="similarity",  # mmr or similarity_score_threshold
    search_kwargs={"k": 3},
)
message = """
        Answer this question using the provided context only.
        If the information is not available in the context, just reply with "i dont know"
        {input}
        Context:
        {context}
        """
prompt = ChatPromptTemplate.from_messages(
    [("human", message)],
)
llm = ChatOpenAI()
question_answer_chain = create_stuff_documents_chain(llm, prompt)
#print(question_answer_chain)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
print(rag_chain)

# rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
# rag_chain

bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000024479B9F150>, search_kwargs={'k': 3}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\n        Answer this question using the provided context only.\n        If the information is not available in the context, just reply with 
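The "stuff" strategy used by the chain above simply places the text of every retrieved document into the `{context}` slot of the prompt. The sketch below illustrates that idea in plain Python. `Doc`, `TEMPLATE`, and `stuff_prompt` are hypothetical names, and joining documents with a blank line is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = (
    "Answer this question using the provided context only.\n"
    "{input}\n"
    "Context:\n"
    "{context}"
)


def stuff_prompt(question: str, docs: list, separator: str = "\n\n") -> str:
    """Concatenate all document texts and fill them into the prompt template."""
    context = separator.join(d.page_content for d in docs)
    return TEMPLATE.format(input=question, context=context)


docs = [Doc("Breakfast: $15"), Doc("Lunch: $20")]
prompt_text = stuff_prompt("What is the lunch per diem?", docs)
print(prompt_text)
```

Stuffing works well while the retrieved chunks fit in the model's context window, which is one reason the retriever is limited to a small `k`.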

**<font color="red">Read about create_map_reduce_documents_chain, create_refine_documents_chain, and create_map_rerank_documents_chain. You can ask ChatGPT for a summary of their use cases and usage.</font>**

**Invoking Retrieval chain**

Execute the below code to invoke the chain

In [23]:
response = rag_chain.invoke({"input": "tell me about all the reimbursement policies"})
print(response)
print(response['answer'])
for  doc in response["context"]:
    print(doc.page_content)

{'input': 'tell me about all the reimbursement policies', 'context': [Document(id='0ea570be-9746-4bc3-a26e-8e7c81b15556', metadata={'producer': 'Skia/PDF m121 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Corporate Travel Policy', 'source': 'travel-policy.pdf', 'total_pages': 4, 'page': 1, 'page_label': '2'}, page_content='○ Taxis:Reimbursableonlywithpre-approvalforspecificcircumstances.●● Rentalcars:○ Economyorcompactclassisrequired.○ Mileageallowanceof$0.50permile.○ Gasandtollsarereimbursablebasedonreceipts.●\nAccommodation:'), Document(id='a4097bce-2ec2-485d-a0dc-23ac0d665b38', metadata={'producer': 'Skia/PDF m121 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Corporate Travel Policy', 'source': 'travel-policy.pdf', 'total_pages': 4, 'page': 2, 'page_label': '3'}, page_content='ReportingandReimbursement:\n● Submitexpensereportspromptly:Submitdetailedexpensereportswithin5businessdaysafterthetripthroughthedesignatedcompanysystem.● M

# BUILDING AN APP SIMILAR TO CHATPDF

We want to build an app similar to ChatPDF.
We need a function that takes a file and stores all of its documents in a vector store.

Study the function below and execute it.

In [31]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

def load_pdf_into_vectorstore(file) -> str:
    """Load the uploaded PDF, split it into chunks, and index the chunks in Chroma."""
    try:
        print("======Loading file==================")
        # file.name is the path of the uploaded file on disk
        file_path = file.name
        loader = PyPDFLoader(file_path=file_path)
        documents = loader.load()
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30, separator="\n")
        docs = text_splitter.split_documents(documents=documents)
        embeddings = OpenAIEmbeddings()

        # Alternative: vectorstore = FAISS.from_documents(docs, embeddings)

        # Index the split chunks (docs), not the unsplit pages (documents)
        vectorstore = Chroma.from_documents(
            docs, embedding=embeddings, persist_directory="chromadb11"
        )
        # With FAISS you would also call: vectorstore.save_local("pdf_store")

        print("======File Loaded================== ")

        return 'Document uploaded and index created successfully. You can chat now.'
    except Exception as e:
        print(e)
        # The declared return type is str, so return the error text
        return str(e)

**Now let us build an app using Gradio that lets the user upload a PDF and chat with it**

In the code below, we define a function that is invoked whenever the user sends a message to the chatbot.

Study the code in the cell below and execute it.

In [32]:
import gradio as gr
from langchain.chains.combine_documents.stuff import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


def getresponse(query, history: list) -> tuple:
    # Reopen the Chroma store persisted by load_pdf_into_vectorstore
    vectorstore = Chroma(
        persist_directory="chromadb11", embedding_function=OpenAIEmbeddings()
    )

    # Alternative with FAISS:
    # vectorstore = FAISS.load_local("pdf_store", embeddings=OpenAIEmbeddings(), allow_dangerous_deserialization=True)

    message = """
    Answer this question using the provided context. If the information is not available in the context,
    just respond saying "I don't know".
    {input}
    Context:
    {context}
    """

    prompt = ChatPromptTemplate.from_messages([("human", message)])
    llm = ChatOpenAI()

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    rag_chain = create_retrieval_chain(vectorstore.as_retriever(), question_answer_chain)

    # A hand-built LCEL alternative (its prompt would need {question} instead of {input}):
    # rag_chain = (
    #    {"context": vectorstore.as_retriever(), "question": RunnablePassthrough()}
    #    | prompt
    #    | llm
    # )

    response1 = rag_chain.invoke({"input": query})
    print(response1)
    # Append the (question, answer) pair; returning "" clears the input textbox
    history.append((query, response1["answer"]))
    return "", history
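The handler's contract can be checked without Gradio or an API key: it receives the message and the current history, and returns an empty string (for the textbox) plus the updated history (for the chatbot). The sketch below uses a hypothetical `fake_chain` in place of the real RAG chain, and `make_handler` is an illustrative helper, not part of the lab code.

```python
def make_handler(chain):
    """Build a chat handler around any callable that maps {'input': ...} to a dict with 'answer'."""
    def handler(query: str, history: list) -> tuple:
        result = chain({"input": query})
        history.append((query, result["answer"]))
        # First value clears the textbox, second value refreshes the chat window
        return "", history
    return handler


def fake_chain(payload: dict) -> dict:
    """Hypothetical stand-in for rag_chain.invoke: always answers 'I don't know'."""
    return {"input": payload["input"], "context": [], "answer": "I don't know"}


handler = make_handler(fake_chain)
textbox_value, chat_history = handler("What is the document about?", [])
print(textbox_value, chat_history)
```

Testing the handler this way makes it easier to separate UI wiring problems from retrieval or model problems.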


**Study and execute the code below.**

**Once you open the given URL, upload a file, then send queries and chat.**

In [33]:
with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            file = gr.components.File(
                label='Upload your pdf file',
                file_count='single',
                file_types=['.pdf'])
            #with gr.Row():
            upload = gr.components.Button(
                    value='Upload', variant='primary')

        label = gr.components.Textbox()
    chatbot = gr.Chatbot(label='Talk to the Document')


    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])
    vectorStore =None

    upload.click(load_pdf_into_vectorstore,[file],[label])

    msg.submit(getresponse, [msg,chatbot], [msg, chatbot])

if __name__ == '__main__':
    demo.launch(debug=True)



* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


{'input': 'What is the document about?', 'context': [Document(id='d8da7a4e-f897-49f3-9cb0-d69c9211bbfe', metadata={'creationdate': '2025-02-01T15:54:45+00:00', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36', 'moddate': '2025-02-01T15:54:45+00:00', 'page': 0, 'page_label': '1', 'producer': 'Skia/PDF m132', 'source': 'C:\\Users\\Dell\\AppData\\Local\\Temp\\gradio\\deb778c6fac087ced8772ada6cc0d10e00676c1ca1cfbb66aa49144d5c93e9d4\\Admit Card-AEEE.pdf', 'title': 'Admit Card', 'total_pages': 1}, page_content='AMRITA ENTRANCE EXAMINATION ENGINEERING 2025 : Phase I\nAdmit Card\n3302017194\nRegistration No. : 3302017194\nPassword : ALE5NK\nCandidate Name : Burlagadda Sai Yeshwanth\nGender : Male\nDate of Birth : 29 Aug 2007\nCandidate Should Produce the Admit card (Original)\nduly signed by the candidate and invigilator at the time of admission.\nI have read the instruction given under and agree to abide by them.\nExa

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


{'input': 'List one instruction', 'context': [Document(id='d8da7a4e-f897-49f3-9cb0-d69c9211bbfe', metadata={'creationdate': '2025-02-01T15:54:45+00:00', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36', 'moddate': '2025-02-01T15:54:45+00:00', 'page': 0, 'page_label': '1', 'producer': 'Skia/PDF m132', 'source': 'C:\\Users\\Dell\\AppData\\Local\\Temp\\gradio\\deb778c6fac087ced8772ada6cc0d10e00676c1ca1cfbb66aa49144d5c93e9d4\\Admit Card-AEEE.pdf', 'title': 'Admit Card', 'total_pages': 1}, page_content='AMRITA ENTRANCE EXAMINATION ENGINEERING 2025 : Phase I\nAdmit Card\n3302017194\nRegistration No. : 3302017194\nPassword : ALE5NK\nCandidate Name : Burlagadda Sai Yeshwanth\nGender : Male\nDate of Birth : 29 Aug 2007\nCandidate Should Produce the Admit card (Original)\nduly signed by the candidate and invigilator at the time of admission.\nI have read the instruction given under and agree to abide by them.\nExam\nDate

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


{'input': 'give the details of the candidate?', 'context': [Document(id='d8da7a4e-f897-49f3-9cb0-d69c9211bbfe', metadata={'creationdate': '2025-02-01T15:54:45+00:00', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36', 'moddate': '2025-02-01T15:54:45+00:00', 'page': 0, 'page_label': '1', 'producer': 'Skia/PDF m132', 'source': 'C:\\Users\\Dell\\AppData\\Local\\Temp\\gradio\\deb778c6fac087ced8772ada6cc0d10e00676c1ca1cfbb66aa49144d5c93e9d4\\Admit Card-AEEE.pdf', 'title': 'Admit Card', 'total_pages': 1}, page_content='AMRITA ENTRANCE EXAMINATION ENGINEERING 2025 : Phase I\nAdmit Card\n3302017194\nRegistration No. : 3302017194\nPassword : ALE5NK\nCandidate Name : Burlagadda Sai Yeshwanth\nGender : Male\nDate of Birth : 29 Aug 2007\nCandidate Should Produce the Admit card (Original)\nduly signed by the candidate and invigilator at the time of admission.\nI have read the instruction given under and agree to abide by the

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


{'input': 'who is the prime minister of india?', 'context': [Document(id='d8da7a4e-f897-49f3-9cb0-d69c9211bbfe', metadata={'creationdate': '2025-02-01T15:54:45+00:00', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36', 'moddate': '2025-02-01T15:54:45+00:00', 'page': 0, 'page_label': '1', 'producer': 'Skia/PDF m132', 'source': 'C:\\Users\\Dell\\AppData\\Local\\Temp\\gradio\\deb778c6fac087ced8772ada6cc0d10e00676c1ca1cfbb66aa49144d5c93e9d4\\Admit Card-AEEE.pdf', 'title': 'Admit Card', 'total_pages': 1}, page_content='AMRITA ENTRANCE EXAMINATION ENGINEERING 2025 : Phase I\nAdmit Card\n3302017194\nRegistration No. : 3302017194\nPassword : ALE5NK\nCandidate Name : Burlagadda Sai Yeshwanth\nGender : Male\nDate of Birth : 29 Aug 2007\nCandidate Should Produce the Admit card (Original)\nduly signed by the candidate and invigilator at the time of admission.\nI have read the instruction given under and agree to abide by th

**Let us try to understand how to retrieve data from a given web url**

In [35]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load blog post
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# VectorDB
embedding = HuggingFaceEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
vectordb

<langchain_chroma.vectorstores.Chroma at 0x2441da3f7d0>

**Now let us try to understand MultiQueryRetriever**

The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents.
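The "unique union" step described above can be sketched in plain Python. Here `generate_queries` is a hypothetical stand-in for the LLM that rewrites the question, and `retrieve` is a hypothetical lookup table in place of a vector store, so this only illustrates the merging logic.

```python
def generate_queries(question: str) -> list:
    """Hypothetical stand-in for the LLM that produces query variants."""
    return [question, f"methods for {question}", f"examples of {question}"]


def retrieve(query: str, index: dict) -> list:
    """Hypothetical retriever: looks the query up in a fixed table of results."""
    return index.get(query, [])


def multi_query_retrieve(question: str, index: dict) -> list:
    """Run every query variant and keep each document once, in first-seen order."""
    seen = set()
    unique = []
    for query in generate_queries(question):
        for doc in retrieve(query, index):
            if doc not in seen:
                seen.add(doc)
                unique.append(doc)
    return unique


index = {
    "task decomposition": ["chain of thought", "tree of thoughts"],
    "methods for task decomposition": ["tree of thoughts", "LLM+P"],
    "examples of task decomposition": ["chain of thought"],
}
result = multi_query_retrieve("task decomposition", index)
print(result)
```

The variants surface "LLM+P", which the original query alone would have missed, while duplicates across variants are returned only once.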

In [37]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

question = "What are the approaches to Task Decomposition?"
llm = ChatOpenAI(temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=llm
)
retriever_from_llm.invoke(question)

[Document(id='d7d52fd5-1728-4c3c-8579-ff45708a758b', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nShort-term memory: I would consider all the in-context 

In [None]:
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [43]:
unique_docs = retriever_from_llm.invoke(question)
unique_docs

[Document(id='d7d52fd5-1728-4c3c-8579-ff45708a758b', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nShort-term memory: I would consider all the in-context 

**(Optional) See how to build conversational RAG at https://python.langchain.com/docs/how_to/qa_chat_history_how_to/**

## Conversational-RAG

In [44]:
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

In [45]:
from dotenv import load_dotenv
load_dotenv()

True

In [46]:
llm = ChatOpenAI()

In [47]:
loader = PyPDFLoader("travel-policy.pdf")
documents = loader.load()
documents

[Document(metadata={'producer': 'Skia/PDF m121 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Corporate Travel Policy', 'source': 'travel-policy.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='TravelExpenseGuidelines\nBooking:\n● Usepreferredvendors:Bookflights,hotels,andcarrentalsthroughcompany-approvedvendorstoensurecompetitiveratesandcompliancewithtravelpolicies.(e.g.,Expedia,Travelocity, Concur)● Bookinadvance:Planandbooktripsatleasttwoweeksinadvancefordomestictravelandfourweeksforinternationaltraveltosecurebetterratesandavoidlast-minutecosts.● Seekpre-approval:Obtainapprovalforallbusinesstripsbeforebooking,especiallyforinternationaltravelorexceedingbudgetlimits.Thiscanbedonethroughadepartmentheadoradedicatedtravelcoordinator.● Maintaindocumentation:Keepalloriginalreceiptsandinvoicesfortravelexpensesforreimbursementpurposes.Scananduploadthemtothedesignatedexpensereportingsystem.\nTransportation:\n● Airfare:○ Domesticflights:Economyclas

In [48]:
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50, separator="\n")
splits = text_splitter.split_documents(documents=documents)

Created a chunk of size 667, which is longer than the specified 500


In [49]:
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

In [74]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.chains.combine_documents.stuff import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_core.prompts import MessagesPlaceholder

message = """
Answer this question using the provided context only. If you don't know the answer, just say 'I don't know'.
{input}
Context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder("chat_history"),
    ("human",message)
])
llm = ChatOpenAI(model='gpt-4o-mini')

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

chat_history = []
input = """Give me the guidelines to be followed in travel expenses. Give them as bullet points"""
response = rag_chain.invoke({"input": input, "chat_history": chat_history})

response["answer"]

'- Submit expense reports promptly within 5 business days after the trip through the designated company system.\n- Maintain original receipts for all claimed expenses.\n- Be aware of non-reimbursable expenses, such as:\n  - Personal purchases\n  - Late fees\n  - Upgrades not pre-approved\n  - Mini-bar charges\n  - Room service fees\n- Encourage environmentally responsible travel choices, such as using public transportation, opting for eco-friendly hotels, and minimizing paper receipts.\n- Use preferred vendors to book flights, hotels, and car rentals to ensure competitive rates and compliance with travel policies (e.g., Expedia, Travelocity, Concur).\n- Book trips at least two weeks in advance for domestic travel and four weeks for international travel to secure better rates and avoid last-minute costs.\n- Obtain approval for all business trips before booking, especially for international travel or exceeding budget limits, through a department head or dedicated travel coordinator.\n- K

In [57]:
from langchain_core.messages import AIMessage,HumanMessage
chat_history = []
chat_history.extend(
    [
        HumanMessage(content=input),
        AIMessage(content=response["answer"])
    ]
)
input = """Give me the guidelines to be followed in travel expenses. Give them as bullet points"""
response = rag_chain.invoke({"input": input, "chat_history": chat_history})

print(response["answer"])

- Submit expense reports promptly: Submit detailed expense reports within 5 business days after the trip through the designated company system.
- Maintain original receipts: Provide original receipts for all claimed expenses.
- Be aware of non-reimbursable expenses: These include personal purchases, late fees, upgrades not pre-approved, mini-bar charges, and room service fees.
- Encourage environmentally responsible travel choices: Use public transportation, opt for eco-friendly hotels, and minimize paper receipts.
- Use preferred vendors: Book flights, hotels, and car rentals through company-approved vendors to ensure competitive rates and compliance with travel policies.
- Book in advance: Plan and book trips at least two weeks in advance for domestic travel and four weeks for international travel to secure better rates and avoid last-minute costs.
- Seek pre-approval: Obtain approval for all business trips before booking, especially for international travel or exceeding budget limit

In [58]:
from langchain_core.prompts import MessagesPlaceholder
from langchain.chains import create_history_aware_retriever

contextualize_system_prompt = """Given the chat history above and the latest user question below,
which might reference context in the chat history, formulate a standalone question that can be
understood without the chat history. DO NOT answer the question; just reformulate it if needed,
and otherwise return it as is.
Below is the latest question:
{input}
"""

contextualize_prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder("chat_history"),
    ("human", contextualize_system_prompt),
])

history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_prompt)

In [61]:
result = history_aware_retriever.invoke({"input":"Can you explain the 1st point","chat_history":chat_history})
result

[Document(id='ef436e93-a8fe-4e78-bef1-ad28f7399283', metadata={'producer': 'Skia/PDF m121 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Corporate Travel Policy', 'source': 'travel-policy.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='TravelExpenseGuidelines\nBooking:'),
 Document(id='2f92daa6-4ae5-4662-8902-ae626db14889', metadata={'producer': 'Skia/PDF m121 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Corporate Travel Policy', 'source': 'travel-policy.pdf', 'total_pages': 4, 'page': 2, 'page_label': '3'}, page_content='ReportingandReimbursement:\n● Submitexpensereportspromptly:Submitdetailedexpensereportswithin5businessdaysafterthetripthroughthedesignatedcompanysystem.● Maintainoriginalreceipts:Provideoriginalreceiptsforallclaimedexpenses.● Non-reimbursableexpenses:Beawareofexpensesnotcoveredbythetravelpolicy, suchas:○ Personalpurchases○ Latefees○ Upgradesnotpre-approved○ Mini-barcharges○ Roomservicefees●\nAd
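The control flow of a history-aware retriever can be sketched as follows: with an empty history the input goes straight to the retriever, otherwise the question is first rewritten into a standalone query. `fake_rewrite` and `fake_retrieve` are hypothetical stubs standing in for the LLM and the vector store, so this is an illustration of the flow rather than LangChain's code.

```python
from typing import Callable


def history_aware_retrieve(
    user_input: str,
    chat_history: list,
    rewrite: Callable,
    retrieve: Callable,
) -> list:
    """Rewrite the question only when there is history to resolve references against."""
    if not chat_history:
        return retrieve(user_input)
    standalone_question = rewrite(user_input, chat_history)
    return retrieve(standalone_question)


def fake_rewrite(user_input: str, chat_history: list) -> str:
    """Hypothetical LLM stub: appends the last answer so the query stands alone."""
    last_answer = chat_history[-1][1]
    return f"{user_input} (referring to: {last_answer})"


def fake_retrieve(query: str) -> list:
    """Hypothetical retriever stub: echoes the query it was asked."""
    return [f"docs for: {query}"]


history = [("Give me the guidelines", "Submit expense reports promptly")]
print(history_aware_retrieve("Explain the 1st point", [], fake_rewrite, fake_retrieve))
print(history_aware_retrieve("Explain the 1st point", history, fake_rewrite, fake_retrieve))
```

Without the rewrite step, a query such as "Explain the 1st point" carries no searchable content, so the retriever would return loosely related chunks.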

In [69]:
system_prompt = (
    """You are an assistant for question-answering tasks.
    Answer the question using the provided context only. If you don't know the answer, just say "I don't know".

    Context: {context}"""
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system",system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human","{input}")
    ]
)

question_answer_chain = create_stuff_documents_chain(llm,qa_prompt)
contextual_rag_chain = create_retrieval_chain(history_aware_retriever,question_answer_chain)

In [70]:
from langchain_core.messages import AIMessage,HumanMessage

chat_history = []
question = """Give me the travel policy norms. I need those in bullet points"""
print(question)
ai_message_1 = contextual_rag_chain.invoke({"input":question,"chat_history":chat_history})
print(ai_message_1["answer"])

Give me the travel policy norms. I need those in bullet points
- Use preferred vendors for booking flights, hotels, and car rentals to ensure competitive rates and compliance with travel policies (e.g., Expedia, Travelocity, Concur).
- Book trips at least two weeks in advance for domestic travel and four weeks for international travel to secure better rates and avoid last-minute costs.
- Obtain pre-approval for all business trips before booking, especially for international travel or exceeding budget limits, through a department head or a dedicated travel coordinator.
- Keep all original receipts and invoices for travel expenses for reimbursement purposes. Scan and upload them to the designated expense reporting system.
- Encourage environmentally responsible travel choices, such as using public transportation, opting for eco-friendly hotels, and minimizing paper receipts.
- Domestic flights: Economy class is required. Tickets exceeding $500 require pre-approval from a department head.

In [71]:
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_message_1["answer"])
    ]
)

second_question = """Explain in elaborate about the 4th point"""
print(second_question)
ai_message_2 = contextual_rag_chain.invoke({"input":second_question,"chat_history":chat_history})
print(ai_message_2["answer"])

chat_history.extend(
    [HumanMessage(content=second_question), AIMessage(content=ai_message_2["answer"])]
)

Explain in elaborate about the 4th point
The 4th point emphasizes the importance of maintaining documentation for travel expenses in order to facilitate reimbursement. Here are the key aspects of this guideline:

1. **Original Receipts**: Employees are required to keep all original receipts related to their travel expenses. This is crucial because receipts serve as proof of the expenses incurred during the trip. Original receipts help validate claims made in expense reports and are necessary for verifying that the expenses conform to the company's travel policies.

2. **Invoices**: In addition to receipts, any invoices that may pertain to travel-related expenses should also be retained. This might include invoices for accommodations, conference fees, or other business-related costs incurred while traveling.

3. **Reimbursement Purposes**: The documentation of expenses is specifically aimed at ensuring that employees can be reimbursed for the costs they incurred during business travel. 

In [72]:
chat_history

[HumanMessage(content='Give me the travel policy norms. I need those in bullet points', additional_kwargs={}, response_metadata={}),
 AIMessage(content='- Use preferred vendors for booking flights, hotels, and car rentals to ensure competitive rates and compliance with travel policies (e.g., Expedia, Travelocity, Concur).\n- Book trips at least two weeks in advance for domestic travel and four weeks for international travel to secure better rates and avoid last-minute costs.\n- Obtain pre-approval for all business trips before booking, especially for international travel or exceeding budget limits, through a department head or a dedicated travel coordinator.\n- Keep all original receipts and invoices for travel expenses for reimbursement purposes. Scan and upload them to the designated expense reporting system.\n- Encourage environmentally responsible travel choices, such as using public transportation, opting for eco-friendly hotels, and minimizing paper receipts.\n- Domestic flights: