# Document RAG

### What is RAG?

RAG stands for retrieval-augmented generation. It is a process in which the user's question is first used to retrieve relevant information, and then both the question and the retrieved information are passed to an LLM as context. This lets AI models draw on external information when responding. It combines the language ability of large language models with the retrieval of relevant data, so that their outputs can be more accurate, informative, and up to date.

The diagram below is a high-level representation of how a RAG system works.

![](rag_arch_diagram.png)
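
The retrieve-then-generate flow described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `retrieve`, `answer_with_rag` and `echo_llm` are hypothetical names, retrieval is done by counting shared words instead of comparing embeddings, and the "LLM" is a stand-in that returns the prompt it receives so the assembled context can be inspected.

```python
from typing import Callable, List


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer_with_rag(
    question: str, documents: List[str], llm: Callable[[str], str], k: int = 1
) -> str:
    """Retrieve context first, then pass the context and the question to the LLM."""
    context = "\n\n".join(retrieve(question, documents, k))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model (an assumption): returns the prompt unchanged."""
    return prompt


documents = [
    "The annual fee is waived in the first year.",
    "Reward points are earned on every purchase.",
]
print(answer_with_rag("How do I earn reward points?", documents, echo_llm))
```

In the rest of this notebook, the word-overlap retriever is replaced by an embedding model plus a vector store, and the stand-in is replaced by a real chat model.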

### Why RAG?

LLMs are highly effective at processing and generating general information but may lack specialized domain expertise. Additionally, their knowledge is typically limited to information available up to their training cutoff date, so they cannot address events or updates that occurred afterwards. However, when they are given relevant, context-specific data, these models gain the domain knowledge they need to respond effectively to specialized queries. Think of it as an AI assistant that can quickly look up facts or consult expert sources before giving you an answer. This makes the AI more reliable and helpful, especially when dealing with specific topics or when the latest information is needed.

### How is it implemented? 

This notebook walks you through exactly that.

In [1]:
!pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

import os
from langchain import hub
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
# from langchain_nomic import NomicEmbeddings
from langchain_openai import OpenAIEmbeddings
# from langchain_anthropic import ChatAnthropic
# from langchain_google_vertexai import ChatVertexAI
# from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
# from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain

load_dotenv()


True

## Initialize the required models for LLM, embedding and vector store

- LLM for the querying
- Embedding model for embedding the document chunks and the queries
- Vector store to store the embeddings

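The cell below uses OpenAI for both the LLM and the embeddings, with Chroma as the vector store. To try one of the other options, uncomment it below along with its matching import in the cell above.

The LLM can be an OpenAI, Anthropic or Google Vertex AI model.

The embeddings can come from OpenAI or Nomic (open source).

The vector store can be in-memory, Meta's FAISS or Chroma DB.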

In [3]:
# llm = ChatAnthropic(model="claude-3-5-haiku-20241022")
# llm = ChatVertexAI(model="gemini-1.5-flash")
llm = ChatOpenAI(model="gpt-4o-mini")

# embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# vector_store = InMemoryVectorStore(embeddings)
# vector_store = FAISS(embedding_function=embeddings)
vector_store = Chroma(embedding_function=embeddings)

## Load the documents into the vector store

### Steps:
- Load the PDF document using the PDF Loader
- Split the pages into chunks of up to 1000 characters with a 200-character overlap, so that text falling on a chunk boundary is not lost
- Store the split documents in the Vector Store
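
To make the effect of the overlap concrete, here is a minimal sketch of fixed-size chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper that only shows how a 200-character overlap makes consecutive chunks share their boundary text.

```python
from typing import List


def split_with_overlap(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> List[str]:
    """Cut text into fixed-size windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample_text = "".join(str(i % 10) for i in range(2500))
sample_chunks = split_with_overlap(sample_text)
print([len(chunk) for chunk in sample_chunks])
# The tail of one chunk is the head of the next one.
print(sample_chunks[0][-200:] == sample_chunks[1][:200])
```

A sentence that straddles a chunk boundary therefore appears whole in at least one chunk, as long as it is shorter than the overlap.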

In [4]:

document_loader = PyPDFLoader("./assets/idfc_first_select_cc.pdf")
# load() returns a list with one Document per PDF page
pages = document_loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,
)
document_splits = text_splitter.split_documents(pages)

stored_document_ids = vector_store.add_documents(documents=document_splits)

In [5]:
question = "What is the rate of rewards this card give me?"

## Sending the question to the LLM directly

Let's first send the user question to the LLM on its own and see what it responds with.

As the output below shows, the LLM cannot give a useful answer because it has no context about which card the user is asking about.

In [6]:
llm_response = llm.invoke(question)
print(llm_response.content)

To provide accurate information about the rewards rate for a specific card, I would need to know the name of the card or the issuing bank. Different credit cards offer various rewards programs, including cash back, points, or miles, and the rates can vary. Please provide the name of the card or any details you have!


## Similarity Search on the Vector Store

Similarity search compares the embedding vector of the question with the vectors of the text chunks in the vector store. The chunks whose vectors are most similar to the question are retrieved from the DB.

This is the retrieval step of the RAG pipeline: the relevant chunks are fetched from the vector store and then passed to the LLM along with the user question.
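
As a rough illustration of what "similar" means here, the sketch below ranks a few hand-written vectors by cosine similarity to a query vector. The three-dimensional vectors, the chunk labels and the helper names `cosine_similarity` and `top_k` are all made up for the example; real embeddings have hundreds or thousands of dimensions, and the metric a vector store actually uses depends on its configuration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: Sequence[float],
    chunk_vectors: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k chunk labels with the highest similarity to the query."""
    scored = [
        (label, cosine_similarity(query_vector, vector))
        for label, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for embedded text chunks.
toy_chunk_vectors = {
    "rewards": [0.9, 0.1, 0.0],
    "interest": [0.1, 0.9, 0.0],
    "fees": [0.0, 0.2, 0.9],
}
toy_query_vector = [1.0, 0.2, 0.0]

for label, score in top_k(toy_query_vector, toy_chunk_vectors, k=2):
    print(label, round(score, 3))
```

The `k` argument plays the same role as `k` in `similarity_search`: it controls how many of the best-scoring chunks are returned.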

In [7]:
similar_results = vector_store.similarity_search(question, k=1)
for result in similar_results:
    print(result)

page_content='FIRST SWYP Credit Card
Not Applicable Not Applicable
(Customer has to either pay Total 
Amount Due in full or convert 
eligible due amount into EMI)  
(Customer has to either pay Total 
Amount Due in full or convert 
eligible due amount into EMI)  
All other IDFC FIRST Bank
Credit Cards
Monthly Rate - 0.75% - 3.65%
Annual Rate - 9% - 43.8%
Monthly Rate - 3.99%
Annual Rate - 47.88%' metadata={'page': 2, 'source': './assets/idfc_first_select_cc.pdf', 'start_index': 1589}


## Retrieve the relevant information from the vector store and ask questions on the data stored

### Steps:
- Take the query and retrieve the relevant information from the vector store
- Pass the retrieved context and the question to the LLM to get the answer
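
The second step amounts to filling a prompt template with the retrieved text. The sketch below shows the idea with plain string formatting. `stuff_prompt` and `EXAMPLE_TEMPLATE` are hypothetical names, and the code only approximates what a "stuff"-style chain does: it joins all retrieved chunks and places them in the `{context}` slot.

```python
from typing import List

# Shortened, illustrative template with the same two placeholders
# that the notebook's prompt uses.
EXAMPLE_TEMPLATE = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Helpful Answer:"""


def stuff_prompt(
    question: str, chunks: List[str], template: str = EXAMPLE_TEMPLATE
) -> str:
    """Join the retrieved chunks and insert them, with the question, into the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


example_prompt = stuff_prompt(
    "What is the annual fee?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(example_prompt)
```

Because every retrieved chunk ends up in a single prompt, the number of chunks returned by the retriever directly affects both the prompt length and what the LLM can see.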

In [9]:
prompt_template = """You are a banker with a master's degree in finance and you are an expert at reading through the terms and conditions of a credit card and answering the user's questions about that credit card.
Use the following pieces of context to answer the question at the end. If you don't know the answer, reply with 'The document does not outline the information that you requested'. Use three sentences maximum. Keep the answer as concise as possible.

{context}

Question: {question}
Helpful Answer:"""

prompt = PromptTemplate.from_template(prompt_template)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)

result = qa_chain.invoke({"query": question})
print(result['result'])


The document does not outline the information that you requested.
