<a href="https://colab.research.google.com/github/darkwingpatil/Ml_hackethons/blob/main/RAG_Based_document_retrival.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **RAG : Retrieval Augmented Generation**

OBJECTIVES:

1. Load the documents
2. Split the documents into chunks
3. Embed the chunks and store them in a vector DB
4. Retrieve the chunks relevant to the query
 * Addressing Diversity
 * Addressing Specificity
5. Connect with an LLM to get a final grounded answer
6. Re-rank results with an open-source model

In [None]:
!pip install openai
!pip install langchain-core
!pip install langchain-openai
!pip install langchain-community
!pip install chromadb
!pip install pypdf


In [None]:
import openai
import os

In [None]:
# Read the API key from a file; strip the trailing newline that editors often add
with open('/content/ts_openapi_key.txt') as f:
    api_key = f.read().strip()
os.environ['OPENAI_API_KEY'] = api_key
openai.api_key = os.getenv('OPENAI_API_KEY')

In [None]:
from langchain_openai import ChatOpenAI

### **Loading the documents**

[PDF Loader](https://python.langchain.com/docs/how_to/document_loader_pdf/)

In [None]:
from langchain_community.document_loaders import PyPDFLoader
# Load PDF
loaders = [
    # Duplicate documents on purpose
    PyPDFLoader("/content/pca_d1.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [None]:
print(docs[0].page_content)

### **Splitting of document**

[Recursively split by character](https://python.langchain.com/docs/how_to/recursive_text_splitter/)

[Split by character](https://python.langchain.com/docs/how_to/character_text_splitter/)

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
# Split
#from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

In [None]:
splits = text_splitter.split_documents(docs)
print(len(splits))
print(len(splits[0].page_content) )
splits[0].page_content

In [None]:
splits
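
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `sliding_window_chunks` is a hypothetical helper that only illustrates how overlap makes consecutive chunks share some text.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

With `chunk_overlap=3`, the last three characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still partly visible in both chunks.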

### **Embeddings**

Let's take our splits and embed them.

In [None]:
from langchain_openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings(model='text-embedding-3-small')

### **Understanding similarity search with a toy example**

In [None]:
sentence1 = "i like dogs"
sentence2 = "i like cats"
sentence3 = "the weather is ugly, too hot outside"

In [None]:
embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [None]:
len(embedding1), len(embedding2), len(embedding3)

In [None]:
embedding1

[0.016543298959732056,
 -0.03333469107747078,
 3.722277324413881e-05,
 0.0063320123590528965,
 ...]

In [None]:
import numpy as np

def cosine_similarity(vector1, vector2):
    # Ensure that the vectors are numpy arrays
    vector1 = np.array(vector1)
    vector2 = np.array(vector2)

    # Calculate the dot product of the vectors
    dot_product = np.dot(vector1, vector2)

    # Calculate the magnitude (norm) of the vectors
    norm_vector1 = np.linalg.norm(vector1)
    norm_vector2 = np.linalg.norm(vector2)

    # Compute cosine similarity
    if norm_vector1 == 0 or norm_vector2 == 0:
        return 0  # Avoid division by zero
    return dot_product / (norm_vector1 * norm_vector2)

In [None]:
cosine_similarity(embedding1, embedding2), cosine_similarity(embedding1, embedding3), cosine_similarity(embedding2, embedding3)
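
For reference, the function above computes the standard cosine similarity between two vectors $u$ and $v$ of dimension $n$:

$$
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
= \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}}
$$

The value lies between -1 and 1, and a higher value means the two vectors point in a more similar direction. For the toy sentences, one would expect the dog/cat pair to score higher than either of the pairs involving the weather sentence.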

### **Vectorstores**

In [None]:
from langchain_community.vectorstores import Chroma # Lightweight, embedded vector store

In [None]:
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any


In [None]:
vectordb = Chroma.from_documents(
    documents=splits, # splits we created earlier
    embedding=embedding,
    persist_directory=persist_directory, # save the directory
)

In [None]:
print(vectordb._collection.count()) # should equal the number of splits

### **Similarity Search**

In [None]:
question = "how does ensemble method works?"

In [None]:
docs = vectordb.similarity_search(question, k=6) # k = number of documents to return
print(len(docs))
for doc in docs:
  print(doc.page_content,'\n','####################')

### **Edge cases where retrieval may fail**

1. Lack of diversity: semantic search returns the chunks most similar to the query but does not enforce diversity among them.

    - Notice that we're getting duplicate chunks (because of the duplicate `ens_d2.pdf` in the index). `docs[0]` and `docs[1]` are identical.

  **Addressing Diversity - MMR (Maximum Marginal Relevance)**

2. Lack of specificity: the question may be about one particular document, but the retrieved chunks may include content from other documents.

  **Addressing Specificity: Working with metadata - Manually**

  **Working with metadata using self-query retriever - Automatically**

In [None]:
question= 'how ensemble method works?'
docs = vectordb.similarity_search(question,k=3) # Without MMR
for i in range(len(docs)):
  print(docs[i].page_content,'\n','####################')

**Example 1. Addressing Diversity - MMR-Maximum Marginal Relevance**

In [None]:
docs_with_mmr=vectordb.max_marginal_relevance_search(question, k=3, fetch_k=6) # With MMR
for i in range(len(docs_with_mmr)):
  print(docs_with_mmr[i].page_content,'\n','####################')
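
To show why MMR drops duplicates, here is a minimal pure-Python sketch of the usual greedy formulation: each step picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`. This is an illustration under that assumption, not the code Chroma or LangChain runs; the 2-D vectors are made-up toy "embeddings", and `cosine` and `mmr` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: return indices of up to k docs, balancing relevance and novelty."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy 2-D "embeddings"; index 1 is an exact duplicate of index 0.
query_vec = [1.0, 1.0]
doc_vecs = [[1.0, 0.3], [1.0, 0.3], [0.2, 1.0], [-1.0, 0.0]]

# Plain similarity ranking keeps both duplicates at the top.
by_similarity = sorted(
    range(len(doc_vecs)), key=lambda i: -cosine(query_vec, doc_vecs[i])
)[:2]
mmr_indices = mmr(query_vec, doc_vecs, k=2)
print(by_similarity, mmr_indices)
```

Plain ranking returns the two identical vectors, while MMR keeps one of them and then prefers a different, still relevant vector. In the notebook call, `fetch_k` plays the role of the candidate pool size and `k` the number of results kept.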

**Example 2. Addressing Specificity: Working with metadata - Manually**

In [None]:
# Without metadata information
question = "what is the role of variance in pca?"
docs = vectordb.similarity_search(question,k=7)
for doc in docs:
    print(doc.metadata) # metadata records which source file and page each chunk came from

Notice in the output above that the second-to-last result comes from `ens_d2.pdf`, even though the question is about PCA.

In [None]:
# With metadata information
question = "what is the role of variance in pca?"
docs = vectordb.similarity_search(
    question,
    k=7,
    filter={"source":'/content/pca_d1.pdf'} # manually passing metadata, using metadata filter.
)

for doc in docs:
    print(doc.metadata)
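
Conceptually, a metadata filter restricts the candidate set to chunks whose metadata matches before the top `k` are returned. The sketch below illustrates that idea only; the scores and pages are made-up values, `filtered_top_k` is a hypothetical helper, and how Chroma applies the filter internally is not shown here.

```python
# Hypothetical retrieved chunks as (similarity score, metadata) pairs.
# The scores and page numbers are invented for illustration.
scored_chunks = [
    (0.82, {"source": "/content/pca_d1.pdf", "page": 1}),
    (0.79, {"source": "/content/ens_d2.pdf", "page": 3}),
    (0.75, {"source": "/content/pca_d1.pdf", "page": 0}),
    (0.60, {"source": "/content/ens_d2.pdf", "page": 5}),
]


def filtered_top_k(chunks, metadata_filter, k):
    """Keep chunks matching every key in metadata_filter, then take the top k by score."""
    matching = [
        (score, meta)
        for score, meta in chunks
        if all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    return sorted(matching, key=lambda pair: pair[0], reverse=True)[:k]


top = filtered_top_k(scored_chunks, {"source": "/content/pca_d1.pdf"}, k=2)
for score, meta in top:
    print(score, meta)
```

Only chunks from `pca_d1.pdf` survive, so a high-scoring chunk from the other document can no longer leak into the results.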

[**Addressing Specificity -Automatically: Working with metadata using self-query retriever**](https://python.langchain.com/docs/how_to/self_query/)

### **Additional tricks: Compression**

Another approach for improving the quality of retrieved docs is compression. Information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

[Contextual compression](https://python.langchain.com/docs/how_to/contextual_compression/) is meant to fix this.
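
As a rough intuition for what compression does, the sketch below keeps only the sentences of a retrieved chunk that share a content word with the query. This keyword overlap is a deliberately crude stand-in, chosen so the example runs without an API key; the linked LangChain compressors typically rely on an LLM or on embeddings instead. The helper names, the stopword list, and the sample text are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list, not a complete one.
STOPWORDS = {"what", "is", "the", "of", "in", "a", "an", "and", "to", "how", "does"}


def content_words(text):
    """Lowercased words of text with the stopwords above removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def compress(document, query):
    """Keep only the sentences that share at least one content word with the query."""
    query_words = content_words(query)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(s for s in sentences if content_words(s) & query_words)


# Toy chunk: the middle sentence is irrelevant to the question.
document = (
    "PCA ranks new axes by the variance of the data along them. "
    "The lecture was recorded on a Tuesday. "
    "Axes with more variance are treated as more important."
)
compressed = compress(document, "what is the role of variance in pca?")
print(compressed)
```

The irrelevant middle sentence is dropped, so fewer tokens reach the LLM while the sentences about variance are kept.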

### **Better Approach**

**[Vectorstore as a retriever](https://python.langchain.com/docs/how_to/vectorstore_retriever/)**

In [None]:
# Without MMR
question = "What is principal component analysis?"
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke(question)
docs

In [None]:
# With MMR
retriever = vectordb.as_retriever(search_type="mmr",search_kwargs={"k": 5, "fetch_k":10})
docs = retriever.invoke(question)
docs

### **Retrieval + Question Answering :  Connecting with LLMs**

In [None]:
retriever = vectordb.as_retriever(search_type="mmr",search_kwargs={"k": 7, "fetch_k":15})

In [None]:
llm_name = "gpt-4o-mini"
print(llm_name)

gpt-4o-mini


In [None]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [None]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)
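
When a chain runs, the prompt template fills `{context}` and `{question}` with whatever values it receives. The sketch below mimics that step with plain string formatting so it runs without LangChain. `Doc`, `format_docs`, and `build_prompt` are hypothetical stand-ins, the template is shortened, and joining the chunk texts first is one common convention rather than something this notebook does.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only the text field)."""
    page_content: str


TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def format_docs(docs):
    """Join chunk texts with blank lines so the prompt contains text, not object reprs."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question, retrieved_docs):
    """Fill the template with the joined context and the question."""
    return TEMPLATE.format(context=format_docs(retrieved_docs), question=question)


retrieved = [
    Doc("PCA finds orthogonal dimensions."),
    Doc("Dimensions are ranked by variance."),
]
prompt_text = build_prompt("What is principal component analysis?", retrieved)
print(prompt_text)
```

If the retriever output is passed to the template unchanged, the `{context}` slot receives the string form of the `Document` list, metadata included. Piping the retriever into a function like `format_docs` is one way to hand the LLM only the chunk text.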

### **Creating final RAG Chain**

In [None]:
from langchain_core.runnables import RunnableLambda

In [None]:
rag_chain = (
    {
        "context": RunnableLambda(lambda x: x["question"]) | retriever,
        "question": lambda x: x["question"],
    }
    | QA_PROMPT
    | llm
    | StrOutputParser()
)

In [None]:
# Question 1
response=rag_chain.invoke({"question" :"What is principal component analysis?"})
print(response)

* Checking what is retrieved from the retriever:

In [None]:
(RunnableLambda(lambda x:x["question"])| retriever).invoke({"question" :"What is principal component analysis?"})

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [None]:
# Question 2
response=rag_chain.invoke({"question" :"how ensemble method works?"})
print(response)

* Checking what is retrieved from the retriever:

In [None]:
(RunnableLambda(lambda x:x["question"])| retriever).invoke({"question" :"how ensemble method works?"})

In [None]:
# Question 3
response=rag_chain.invoke({"question" :"What is Linear Regression? "})
print(response)

* Checking what is retrieved from the retriever:

In [None]:
(RunnableLambda(lambda x:x["question"])| retriever).invoke({"question" :"What is Linear Regression? "})

[**Details of Chroma through LangChain**](https://python.langchain.com/docs/integrations/vectorstores/chroma/)

### **Download the vector DB**

In [None]:
# Zip the entire folder
!zip -r /content/docs.zip /content/docs

In [None]:
from google.colab import files
files.download("/content/docs.zip")

### **Upload the vector db from previous step and unzip**

In [None]:
!unzip /content/docs.zip  -d /

In [None]:
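# Use the same embedding model that built the index; a different model would
# embed queries into a different vector space and make similarity scores meaningless
embedding = OpenAIEmbeddings(model='text-embedding-3-small')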

vectordb = Chroma(persist_directory = 'docs/chroma/',
                  embedding_function = embedding
                  )

### **Re-ranking example with open-source model**

* [Retrieve & Re-Rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
* [MS MARCO Cross-Encoders](https://www.sbert.net/docs/pretrained-models/ce-msmarco.html) for Re-ranking
  * **Usage with SentenceTransformers**: pre-trained models can be used like this:

In [None]:
!pip install sentence-transformers



In [None]:
import torch

In [None]:
# Define a query and some candidate sentences
query = "I love programming in Python."

# Some toy data representing candidate sentences/documents
candidates = [
    "Python is a great programming language.",
    "I enjoy long walks on the beach.",
    "Machine learning can be used to build models.",
    "I like writing code in Python.",
    "Artificial intelligence is fascinating."
]

In [None]:
Paragraph1=candidates[0]
Paragraph2=candidates[1]
Paragraph3=candidates[2]

In [None]:
from sentence_transformers import CrossEncoder
model_name='cross-encoder/ms-marco-TinyBERT-L-2-v2'
model = CrossEncoder(model_name, max_length=512)
scores = model.predict([(query, Paragraph1), (query, Paragraph2), (query, Paragraph3)])
print(scores)
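
The cross-encoder returns one relevance score per (query, passage) pair; re-ranking is then just sorting the passages by that score. A minimal sketch of the sorting step follows. The scores are placeholders invented for illustration, not real model output, and `rerank` is a hypothetical helper.

```python
def rerank(candidates, scores):
    """Return (candidate, score) pairs sorted from most to least relevant."""
    if len(candidates) != len(scores):
        raise ValueError("need exactly one score per candidate")
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


passages = [
    "Python is a great programming language.",
    "I enjoy long walks on the beach.",
    "Machine learning can be used to build models.",
]
# Placeholder scores, NOT real model output; higher means more relevant.
placeholder_scores = [4.2, -9.1, -3.5]

ranked = rerank(passages, placeholder_scores)
for passage, score in ranked:
    print(f"{score:6.2f}  {passage}")
```

In the notebook, the same idea would apply to the three paragraphs and the `scores` array returned by `model.predict`. In a RAG pipeline the passages would be the chunks returned by the retriever, re-scored against the user question.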

* **Usage with Transformers**

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

features = tokenizer([query, query, query], [Paragraph1, Paragraph2, Paragraph3], padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)