<a href="https://colab.research.google.com/github/deepakk7195/IISC_CDS_DS/blob/Scalable_ML_GenAI/AST_09_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A Program by IISc and TalentSprint
### Assignment 9: Retrieval Augmented Generation (RAG) 🦜🔗

## Learning Objectives

At the end of the experiment, you will be able to:

* **Understand the role of the Hugging Face API in natural language processing tasks:** You will learn how to use the Hugging Face API to access pre-trained language models, embeddings, and other NLP-related functionalities.

* **Explore embeddings using the Hugging Face API:** You will learn how to use Hugging Face embeddings to represent text data in a vector space, enabling various NLP tasks such as similarity search and clustering.

* **Perform information retrieval with maximal marginal relevance (MMR):** In this notebook, you will learn how to implement an information retrieval system that uses maximal marginal relevance (MMR) to retrieve relevant documents for a given query.

* **Generate text responses using a language model (LLM):** This notebook shows how to use a Hugging Face language model (LLM) for text generation tasks, including setting parameters for maximum new tokens, top-k sampling, and temperature.

* **Create a chatbot using a language model:** You will be able to integrate a Hugging Face language model into a chatbot application, enabling conversational interactions with users.

* **Implement a retrieval-based question answering (QA) system:** You will learn how to build a retrieval-based QA system using a combination of a language model and an information retrieval component.

* **Understand and create RAG prompts:** This assignment will help you to construct prompts for **Retrieval Augmented Generation (RAG)** models, providing context and questions to generate informative and relevant responses.

* **Apply learned concepts to real-world NLP tasks:** From this exercise, you will gain hands-on experience by applying the learned concepts to practical NLP tasks such as text generation, question answering, and chatbot development using the Hugging Face API.

* **Use open source LLMs:** This notebook will show you the implementations through HuggingFaceHub with LangChain.

* **Implementing retrieval-based question answering:** In this experiment, you will learn how to create a retrieval-based question answering (QA) system using the **ConversationBufferMemory**.


### Setup Steps:

In [1]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2301931" #@param {type:"string"}

In [2]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "9665220904" #@param {type:"string"}

In [3]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M3_AST_09_RAG_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
    ipython.magic("sx wget https://cdn.exec.talentsprint.com/static/cds/content/pca_d1.pdf")
    ipython.magic("sx wget https://cdn.exec.talentsprint.com/static/cds/content/ens_d2.pdf")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://cds-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")

Setup completed successfully


### Steps for Creating Hugging Face access tokens:

* **Visit the Hugging Face Website:** Head to the Hugging Face website (https://huggingface.co/) to begin the account creation process.

* **Click on "Sign Up":** Locate the "Sign Up" button on the top right corner of the homepage and click on it.

* **Choose a Sign-Up Method:** Hugging Face offers multiple sign-up methods, including Google, GitHub, and email. Select your preferred method and follow the prompts to complete the registration.

* **Verify Your Email (if applicable):** If you choose to sign up via email, verify your email address by clicking on the confirmation link sent to your inbox.

* **Complete Your Profile:** Enhance your Hugging Face experience by completing your profile. Add a profile picture, a short bio, and any other details you'd like to share with the community.

* **Create Your Access Token:** Go to the link (https://huggingface.co/settings/tokens)

* **Click on the option 'Access Tokens' from the left pane.**

* **Then under the User Access Tokens, click on the button 'New token'.** The Hugging Face access token will be generated. Copy and paste the access token in your Google Colab Notebook.

### **Installing and importing packages**

In [4]:
!pip install huggingface-hub
!pip -q install langchain
!pip -q install langchain_community
!pip -q install pypdf
!pip -q install chromadb
!pip -q install tiktoken
!pip install langchainhub


**Install the above packages and then Restart the runtime**

In [5]:
import os
import numpy as np
from getpass import getpass
from langchain import hub
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceHubEmbeddings
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from google.colab import files

#### **Authentication for Hugging Face API**

In [6]:
hfapi_key = getpass("Enter your HuggingFace access token:")
os.environ["HF_TOKEN"] = hfapi_key
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hfapi_key

Enter your HuggingFace access token:··········


### **Loading the documents**
The PyPDFLoader class loads the text content of PDF documents into a format suitable for further analysis or processing. This class is a part of the langchain_community library.

* The PyPDFLoader class parses a PDF file and extracts its text content.
* Its load() method returns a list of Document objects, one per page, each holding the page text together with metadata such as the source path and page number.
* Multi-page PDFs are therefore handled page by page; complex formatting such as tables or images may not be extracted cleanly.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

In [7]:
# Load PDF
loaders = [
    # Duplicate documents on purpose
    PyPDFLoader("/content/pca_d1.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In the above code cell, we load the PDF documents using the PyPDFLoader class.

Three instances of PyPDFLoader are created for two different PDF files: pca_d1.pdf once and ens_d2.pdf twice.

**Note:** In the above code cell, **ens_d2.pdf** is intentionally included twice in the list of loaders to simulate duplicate documents. This mimics a data discrepancy that produces duplicate chunks in the retrieval results.
Later in this experiment, Maximal Marginal Relevance (MMR) is used to deal with these duplicates.
* MMR penalizes chunks that are too similar to the ones already selected, so the output is a more diverse set.
* The idea behind MMR is to find a diverse set of documents while still keeping the most relevant ones.

In [8]:
print(docs[0].page_content)

 
1  
 N  
1 Principal  Component  Analysis  
In real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  
data and find various patterns in it or use it to train some machine learning models.  One way to  
think  about  dimensions  is that  suppose  you have  an data  point  x , if we consider  this data  point  as 
a physical  object  then  dimensions  are merely  a basis  of view,  like where  is the data  located  when  
it is observed  from  horizontal  axis or vertical  axis.  
As the dimensions  of data  increases,  the difficulty  to visualize  it and perform  computations  on 
it also increases.  So, how  to reduce  the dimensions  of a data: - 
• Remove  the redundant  dimensions  
• Only keep the most important dimensions  
Let us first try to understand  some  terms: - 
Variance : It is a measure of the variability or it simply measures how spread the data set is.  
Mathematically,  it is the average  squared  deviation  from  the

In the above code cell, docs[0] is the first Document object in the list of documents (docs).

Each Document object corresponds to one page of a PDF file: its page_content attribute holds the extracted text, and its metadata records the source file path and the page number.

In [9]:
# Split
#from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

In the above approach, the Recursive Character Text Splitter is the recommended one for generic text. It is parameterized by a list of separator characters. It tries to split on them in order until the chunks are small enough.

* By default, the separator list is `["\n\n", "\n", " ", ""]`, i.e. paragraphs, then lines, then words, then individual characters.
* The text is first split on the first separator in the list.
* Any piece that is still longer than chunk_size is split again using the next separator, and so on recursively.
* The recursion stops as soon as a piece fits within chunk_size, so splitting down to single characters happens only as a last resort.
* Adjacent small pieces are merged back together up to chunk_size, which keeps paragraphs, lines, and words intact where possible.
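
The separator-by-separator strategy described above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name `recursive_split` is hypothetical, and chunk_overlap is left out to keep the recursion easy to follow.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Simplified sketch: try the first separator, merge small pieces,
    and recurse with the remaining separators on oversized pieces.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur; fall through to the next one.
        return recursive_split(text, chunk_size, rest)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
        if len(piece) > chunk_size:
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "para one line one\npara one line two\n\npara two is here"
chunks = recursive_split(text, chunk_size=20)
print(chunks)
# ['para one line one', 'para one line two', 'para two is here']
```

The first paragraph is too long for one chunk, so it is split again on single newlines, while the second paragraph already fits and is kept whole.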

**chunk_size:** This parameter specifies the maximum size of each chunk after splitting, measured in characters by default. We have set chunk_size = 500, so each chunk will contain at most 500 characters.

**chunk_overlap:** This parameter specifies the maximum number of characters shared between adjacent chunks. The overlap carries some context across chunk boundaries, which helps preserve continuity. Here, we have set chunk_overlap = 50, so up to the last 50 characters of one chunk may be repeated at the start of the next chunk. Because the splitter prefers to break at separators, the actual overlap can be smaller, or even zero.
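
To see how the two parameters interact, here is a minimal sketch that uses fixed-width character windows. It deliberately ignores separators, so it is not what RecursiveCharacterTextSplitter does internally; the helper name `sliding_chunks` is an assumption made for illustration.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbours share exactly chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


# 1200 characters of dummy text: "0123456789" repeated.
text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(text, chunk_size=500, chunk_overlap=50)

print([len(c) for c in chunks])            # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])   # True
```

With chunk_size = 500 and chunk_overlap = 50, each new window begins 450 characters after the previous one.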

In [10]:
splits = text_splitter.split_documents(docs)
print(len(splits))
splits

27


[Document(page_content='1  \n N  \n1 Principal  Component  Analysis  \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink  about  dimensions  is that  suppose  you have  an data  point  x , if we consider  this data  point  as \na physical  object  then  dimensions  are merely  a basis  of view,  like where  is the data  located  when', metadata={'source': '/content/pca_d1.pdf', 'page': 0}),
 Document(page_content='it is observed  from  horizontal  axis or vertical  axis.  \nAs the dimensions  of data  increases,  the difficulty  to visualize  it and perform  computations  on \nit also increases.  So, how  to reduce  the dimensions  of a data: - \n• Remove  the redundant  dimensions  \n• Only keep the most important dimensions  \nLet us first try to understand  some  terms: - \nVariance : It is a measure of the variability or 

### **Embeddings**

Let's take our splits and embed them.

The **HuggingFaceHubEmbeddings** class generates embeddings (vector representations) of text using pre-trained models available on the Hugging Face Model Hub.
* HuggingFaceHubEmbeddings sends the input text to a pre-trained embedding model hosted on the Hub. When no model is specified, as in the cell below, it uses **sentence-transformers/all-mpnet-base-v2**.
* These embeddings capture semantic information about the text, representing it in a continuous vector space.
* Because the model is already trained, **HuggingFaceHubEmbeddings** produces high-quality embeddings without any training data or local compute for training.
* After instantiation, the **HuggingFaceHubEmbeddings** instance generates embeddings through methods such as **embed_query()** for a single string and **embed_documents()** for a list of strings.
* These embeddings are then used as input for the downstream task, here **similarity search**.

In [11]:
embedding = HuggingFaceHubEmbeddings()

In [12]:
embedding

HuggingFaceHubEmbeddings(client=<InferenceClient(model='sentence-transformers/all-mpnet-base-v2', timeout=None)>, async_client=<InferenceClient(model='sentence-transformers/all-mpnet-base-v2', timeout=None)>, model='sentence-transformers/all-mpnet-base-v2', repo_id='sentence-transformers/all-mpnet-base-v2', task='feature-extraction', model_kwargs=None, huggingfacehub_api_token=None)

### **Understanding similarity search with a toy example**

1. **Vector Space Representation:**
* Each sentence is converted into a high-dimensional vector representation in a continuous vector space.
* This transformation is performed using embedding techniques, where words or tokens in the sentences are mapped to vectors in the vector space.
* The resulting vectors capture semantic information about the sentences, with similar sentences being mapped closer together in the vector space.

2. **Dot Product:**
* The dot product is a mathematical operation that measures how well two vectors align.
* For two vectors **$a$ and $b$**, the dot product **$a \cdot b$** is the sum of the element-wise products of the two vectors: $a \cdot b = \sum_i a_i b_i = \|a\| \, \|b\| \cos\theta$, where $\theta$ is the angle between them.
* When both vectors have unit length, the dot product equals the cosine of the angle between them, with higher values indicating greater similarity.

3. **Similarity Measurement:**

* In the provided code, the dot product is calculated between the embeddings of different sentences to measure their similarity.
* For example, **np.dot(embedding1, embedding2)** computes the similarity between **sentence1 and sentence2**.
* Similarly, **np.dot(embedding1, embedding3) and np.dot(embedding2, embedding3)** compute the similarities between **sentence1 and sentence3**, and between **sentence2 and sentence3**, respectively.

4. **Interpretation:**
* A higher dot product value indicates greater similarity between the corresponding sentences in the vector space.
* By comparing the dot product values between different pairs of sentences, we can assess their semantic similarity or relatedness.
* In the toy example provided below, we evaluate the similarities between **sentence1 & sentence2**, **sentence1 & sentence3**, and **sentence2 & sentence3**.
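
The relationship between the dot product and cosine similarity can be checked on small hand-made vectors. The sketch below uses plain Python and made-up 2-dimensional vectors instead of real 768-dimensional embeddings; the helper names are illustrative.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (norm(a) * norm(b))


def unit(a):
    """Rescale a vector to length 1."""
    length = norm(a)
    return [x / length for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]    # same direction as a, twice as long
c = [-4.0, 3.0]   # perpendicular to a

print(dot(a, b), cosine_similarity(a, b))  # 50.0 1.0
print(dot(a, c), cosine_similarity(a, c))  # 0.0 0.0

# After normalizing to unit length, the dot product matches cosine similarity.
print(math.isclose(dot(unit(a), unit(b)), cosine_similarity(a, b)))  # True
```

The raw dot product of `a` and `b` is 50 because it also reflects the vector lengths, whereas the cosine similarity is 1. The two measures agree only once the vectors are normalized.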



In [13]:
sentence1 = "i like dogs"
sentence2 = "i like cats"
sentence3 = "the weather is ugly, too hot outside"

In [14]:
embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [15]:
len(embedding1), len(embedding2), len(embedding3)

(768, 768, 768)

In [16]:
np.dot(embedding1, embedding2), np.dot(embedding1, embedding3),np.dot(embedding2, embedding3)

(0.7948764601906951, 0.1009330745675062, 0.10528326984050976)

### **Vectorstores**

In the following code cell, the Chroma.from_documents(...) method is executed, which processes the provided documents, generates embeddings for each document using the specified embedding model, and indexes them in the Chroma object.

In [17]:
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any

In [18]:
vectordb = Chroma.from_documents(
    documents=splits, # splits we created earlier
    embedding=embedding,
    persist_directory=persist_directory # directory where the index is saved on disk
)

In the above code cell, the Chroma object (vectordb) stores the document embeddings in an index built for fast similarity search.
* The resulting Chroma object, named **vectordb**, contains the indexed documents and their embeddings.
* It provides methods for querying the indexed documents based on similarity to a given query text or vector.
* These include nearest-neighbor search (similarity_search) and diversity-aware search (max_marginal_relevance_search), both used later in this notebook.
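
Conceptually, a vector store keeps each chunk next to its embedding and ranks the chunks by similarity at query time. The following toy class is a hypothetical brute-force stand-in written for illustration: it scans every stored vector, whereas Chroma uses its own index. The 3-dimensional vectors are made up rather than produced by a real embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorStore:
    """Hypothetical in-memory store: a list of (vector, text, metadata) rows."""

    def __init__(self):
        self._rows = []

    def add(self, vector, text, metadata=None):
        self._rows.append((vector, text, metadata or {}))

    def count(self):
        return len(self._rows)

    def similarity_search(self, query_vector, k=3):
        """Return the k stored texts whose vectors are most similar to the query."""
        ranked = sorted(self._rows, key=lambda row: cosine(query_vector, row[0]), reverse=True)
        return [(text, metadata) for _, text, metadata in ranked[:k]]


store = ToyVectorStore()
store.add([1.0, 0.0, 0.0], "PCA ranks axes by variance", {"source": "pca_d1.pdf"})
store.add([0.9, 0.1, 0.0], "PCA reduces dimensions", {"source": "pca_d1.pdf"})
store.add([0.0, 1.0, 0.0], "Bagging uses bootstrap samples", {"source": "ens_d2.pdf"})

results = store.similarity_search([1.0, 0.0, 0.0], k=2)
print(store.count())                  # 3
print([text for text, _ in results])  # ['PCA ranks axes by variance', 'PCA reduces dimensions']
```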

In [19]:
vectordb.persist() # Let's **save vectordb** so we can use it later!

  warn_deprecated(


In [20]:
print(vectordb._collection.count()) # same as the number of splits

27


### **Similarity Search**

In the following approach, the **vectordb.similarity_search(...)** method is called to perform the similarity search.

The method takes the query question (question) as input and returns a list of documents similar to the query question based on their embeddings stored in the vectordb object.

The parameter k=3 specifies that we want to retrieve the top 3 most similar documents as the search result.

In [21]:
question = "how does pca reduce the dimension?"

In [22]:
docs = vectordb.similarity_search(question,k=3) # k = number of documents to return
print(len(docs))
print(docs[0].page_content)
print(docs[1].page_content)
print(docs[2].page_content)

3
This defines  the goal  of PCA: - 
1. Find  linearly  independent  dimensions  which  can losslessly  represent  the data  points.  
2. Those  newly  found  dimensions  should  allow  us to predict/reconstruct  the original  dimensions.
1  
 N  
1 Principal  Component  Analysis  
In real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  
data and find various patterns in it or use it to train some machine learning models.  One way to  
think  about  dimensions  is that  suppose  you have  an data  point  x , if we consider  this data  point  as 
a physical  object  then  dimensions  are merely  a basis  of view,  like where  is the data  located  when
2  
  
So, what does  Principal  Component  Analysis  (PCA)  do?  
PCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  
orthogonal (and hence linearly independent) and ranked according to the variance of data along  
them. It means more important prin

In the above code cell, the variable 'question' contains the query question for which we want to find similar documents. In this scenario, the query question is **"how does pca reduce the dimension?"**

* As per the above result, the returned list (docs) contains the search results: the chunks whose embeddings are most similar to the query question.
* The length of the docs list is printed using print(len(docs)) to display the number of similar documents retrieved.
* Then, the content of the top 3 similar documents is printed using print(docs[0].page_content), print(docs[1].page_content), and print(docs[2].page_content), respectively.

### **Edge case where failure may happen**

1. Lack of diversity: Semantic search fetches the most similar documents, but does not enforce diversity.

    - In Example 1 below, we get duplicate chunks (because of the duplicate `ens_d2.pdf` in the index): `docs[0]` and `docs[1]` are identical.

  **Addressing Diversity: Maximal Marginal Relevance (MMR)**

2. Lack of specificity: The question may be about one particular document, but the results may contain information from other documents.

  **Addressing Specificity: Working with metadata - Manually**

  **Working with metadata using self-query retriever - Automatically**

  **Example 1. Addressing Diversity: Maximal Marginal Relevance (MMR)**

In [23]:
question= 'how ensemble method works?'
docs = vectordb.similarity_search(question,k=2) # Without MMR

In [24]:
docs[0]

Document(page_content='1   \nEnsemble  Methods  \nLet us consider  a real  world  situation  which  uses  Ensemble  Methods,  which  is, when  a user  wants  \nto buy a new product. Many users who have already purchased that product will have given either  \npositive  or negative  ratings.  If in the group,  many  users  have  given  positive  ratings,  then  the \ncombined  rating  will be positive.  Instead  of a single  rating,  the ratings  of the group  of users  is', metadata={'page': 0, 'source': '/content/ens_d2.pdf'})

In [25]:
docs[1]

Document(page_content='1   \nEnsemble  Methods  \nLet us consider  a real  world  situation  which  uses  Ensemble  Methods,  which  is, when  a user  wants  \nto buy a new product. Many users who have already purchased that product will have given either  \npositive  or negative  ratings.  If in the group,  many  users  have  given  positive  ratings,  then  the \ncombined  rating  will be positive.  Instead  of a single  rating,  the ratings  of the group  of users  is', metadata={'page': 0, 'source': '/content/ens_d2.pdf'})

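In the following code cell, **max_marginal_relevance_search** performs a similarity search with an added diversification step, **Maximal Marginal Relevance (MMR)**: it first fetches the fetch_k chunks most similar to the query and then selects k of them, each time preferring a chunk that is relevant to the query but dissimilar to the chunks already selected.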

In [26]:
docs_with_mmr=vectordb.max_marginal_relevance_search(question, k=3, fetch_k=6) # With MMR

In [27]:
docs_with_mmr[0]

Document(page_content='1   \nEnsemble  Methods  \nLet us consider  a real  world  situation  which  uses  Ensemble  Methods,  which  is, when  a user  wants  \nto buy a new product. Many users who have already purchased that product will have given either  \npositive  or negative  ratings.  If in the group,  many  users  have  given  positive  ratings,  then  the \ncombined  rating  will be positive.  Instead  of a single  rating,  the ratings  of the group  of users  is', metadata={'page': 0, 'source': '/content/ens_d2.pdf'})

1. In the above result, **docs_with_mmr[0]** is the first document selected by the MMR algorithm. Since nothing has been selected before it, it is simply the chunk most similar to the query.
2. It is the same chunk that plain similarity search returned as docs[0].
3. The MMR algorithm balances relevance and diversity, so each later result should be both relevant to the query and distinct from the documents already selected.

In [28]:
docs_with_mmr[1]

Document(page_content='considered.  The product  is bought  by the user  when  the combined  ratings  of the group  is positive.  \nThe user  gets  a fairer  idea  about  the product  when  all the ratings  are combined.  \nHere, the combination of ratings is done so that the decision making process of the user is made  \neasy.  \nEnsemble Methods refer to combining many different machine learning models in order to get a  \nmore  powerful  prediction.', metadata={'page': 0, 'source': '/content/ens_d2.pdf'})

1. In the above result, **docs_with_mmr[1]** is the second document selected by the MMR algorithm.
2. Plain similarity search returned the duplicate chunk at this position, whereas MMR returns a different chunk: it is still relevant to the query but adds new information compared to **docs_with_mmr[0]**.

In [29]:
docs_with_mmr[2]

Document(page_content='more  powerful  prediction.  \nThus,  ensemble  methods  increase  the accuracy  of the predictions.  \n \nWhy  use Ensemble  Methods?  \nEnsemble  Methods  are used  in order  to: \n• decrease  variance  (bagging)  \n• decrease  bias (boosting)  \n• improve  predictions  (stacking)  \n \nBagging  \nBagging  actually  refers  to Bootstrap  Aggregators.  \nBagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap -', metadata={'page': 0, 'source': '/content/ens_d2.pdf'})

1. Similarly, **docs_with_mmr[2]** is the third document selected by the MMR algorithm.
2. It is relevant to the query question while differing from **docs_with_mmr[0]** and **docs_with_mmr[1]**.
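
The selection procedure behind these results can be sketched in a few lines. This is a minimal illustration of the MMR idea under stated assumptions, not the code that Chroma or LangChain runs: the function name `mmr_select` and the 2-dimensional vectors are made up, and `lambda_mult` mirrors the parameter of the same name in LangChain (0.5 weights relevance and diversity equally).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k, fetch_k, lambda_mult=0.5):
    """Return indices of k documents chosen by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, v) for v in doc_vecs]
    # Step 1: keep only the fetch_k candidates most similar to the query.
    candidates = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)[:fetch_k]

    selected = []

    def mmr_score(i):
        # Redundancy = highest similarity to anything already selected.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    # Step 2: greedily pick the candidate with the best relevance/diversity trade-off.
    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
doc_vecs = [
    [0.8, 0.6],    # 0: relevant chunk
    [0.8, 0.6],    # 1: exact duplicate of chunk 0
    [0.8, -0.6],   # 2: equally relevant, different content
    [0.0, 1.0],    # 3: unrelated chunk
]

plain_top2 = sorted(range(len(doc_vecs)), key=lambda i: cosine(query, doc_vecs[i]), reverse=True)[:2]
mmr_top2 = mmr_select(query, doc_vecs, k=2, fetch_k=3)

print(plain_top2)  # [0, 1]
print(mmr_top2)    # [0, 2]
```

Plain similarity ranking returns the chunk and its duplicate, while MMR skips the duplicate because its redundancy penalty outweighs its relevance.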

 **Example 2. Addressing Specificity: Working with metadata - Manually**

**Specificity:**
* In information retrieval tasks such as similarity search, addressing specificity refers to obtaining search results that are relevant and specific to the user's query.
* When dealing with large document collections, it's common for similarity search algorithms to retrieve documents that are relevant but not necessarily specific to the user's query.
* By working with metadata associated with documents, we can provide additional context and information about the search results, helping users assess the relevance and specificity of the retrieved documents.
* Metadata contains information about the documents, such as titles, authors, or publication dates. In this notebook, each chunk's metadata holds the source file path and the page number.

-- In the following code cell, the code starts with a query question defined as question = **"what is the role of variance in pca?"**.

-- The **vectordb.similarity_search(...)** method is called to perform a similarity search based on the query question.

-- The parameter k=5 specifies that the top 5 most similar documents should be retrieved as search results.

In [30]:
# Without metadata information
question = "what is the role of variance in pca?"
docs = vectordb.similarity_search(question,k=5)
for doc in docs:
    print(doc.metadata) # metadata shows which file and page each chunk came from

{'page': 1, 'source': '/content/pca_d1.pdf'}
{'page': 2, 'source': '/content/pca_d1.pdf'}
{'page': 1, 'source': '/content/pca_d1.pdf'}
{'page': 2, 'source': '/content/pca_d1.pdf'}
{'page': 0, 'source': '/content/pca_d1.pdf'}


In [31]:
# With metadata information
question = "what is the role of variance in pca?"
docs = vectordb.similarity_search(
    question,
    k=5,
    filter={"source":'/content/pca_d1.pdf'} # metadata filter: only return chunks from this source file
)

for doc in docs:
    print(doc.metadata)

{'page': 1, 'source': '/content/pca_d1.pdf'}
{'page': 2, 'source': '/content/pca_d1.pdf'}
{'page': 1, 'source': '/content/pca_d1.pdf'}
{'page': 2, 'source': '/content/pca_d1.pdf'}
{'page': 0, 'source': '/content/pca_d1.pdf'}
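
In the two outputs above the filter makes no visible difference, because the five most similar chunks already come from pca_d1.pdf. The toy sketch below shows a case where it does matter. It is only an illustration: the helper `filtered_search`, the word-overlap scoring, and the sample rows are all made up, and Chroma applies its filter internally in its own way.

```python
def keyword_score(query):
    """Hypothetical relevance score: number of query words found in the text."""
    words = set(query.lower().split())
    return lambda row: len(words & set(row["text"].lower().split()))


def filtered_search(rows, score, k, where=None):
    """Keep rows whose metadata matches every key/value in `where`, then rank by score."""
    where = where or {}
    matching = [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in where.items())
    ]
    return sorted(matching, key=score, reverse=True)[:k]


rows = [
    {"text": "Variance measures how spread out the data is",
     "metadata": {"source": "/content/pca_d1.pdf", "page": 0}},
    {"text": "Bagging is used to decrease variance",
     "metadata": {"source": "/content/ens_d2.pdf", "page": 0}},
    {"text": "PCA ranks the new axes by variance",
     "metadata": {"source": "/content/pca_d1.pdf", "page": 1}},
]

score = keyword_score("variance in pca")
unfiltered = filtered_search(rows, score, k=3)
filtered = filtered_search(rows, score, k=3, where={"source": "/content/pca_d1.pdf"})

for row in unfiltered:
    print(row["metadata"])
# {'source': '/content/pca_d1.pdf', 'page': 1}
# {'source': '/content/pca_d1.pdf', 'page': 0}
# {'source': '/content/ens_d2.pdf', 'page': 0}

print(len(filtered))  # 2
```

Because the ensemble document also mentions variance, it appears in the unfiltered results. The metadata filter removes it before ranking.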


[**Addressing Specificity -Automatically: Working with metadata using self-query retriever**](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query)



### **Additional tricks: Compression**

Another approach for improving the quality of retrieved docs is compression. Information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

[Contextual compression](https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression) is meant to fix this.

To use the Contextual Compression Retriever, you'll need:

* a base retriever
* a Document Compressor

The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether.
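
The compression step can be illustrated with a deliberately simple stand-in. In practice the Document Compressor is typically an LLM that extracts the relevant passages; the sketch below replaces it with a hypothetical keyword rule (keep only sentences that share a word with the query), so it shows the shape of the operation rather than LangChain's behaviour.

```python
import re


def compress(documents, query, min_overlap=1):
    """Shorten each document to the sentences that share words with the query.

    Documents with no matching sentence are dropped altogether.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    compressed = []
    for doc in documents:
        # Split after sentence-ending punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
        kept = [
            s for s in sentences
            if len(query_words & set(re.findall(r"\w+", s.lower()))) >= min_overlap
        ]
        if kept:
            compressed.append(" ".join(kept))
    return compressed


retrieved = [
    "Bagging stands for bootstrap aggregating. The course started in January. "
    "Bagging decreases variance.",
    "Lunch is served at noon.",
]

result = compress(retrieved, "what does bagging do?")
print(result)
# ['Bagging stands for bootstrap aggregating. Bagging decreases variance.']
```

The first document is shortened and the second is dropped, which mirrors the two things a Document Compressor can do: reduce the contents of a document or remove it entirely.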

### **Retrieval + Question Answering :  Connecting with LLMs**

Connecting with LLMs typically involves initializing an instance of the chosen model and setting up parameters for its usage.

* The following cell only assigns the name of an LLM to a variable and prints it; it does not initialize a model.
* In this case, it prints "gpt-3.5-turbo", the name of the chosen LLM.

In [32]:
llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo


In the following code cell, **max_marginal_relevance_search** is a vector store method that performs a similarity search with an added diversification step based on the Maximal Marginal Relevance (MMR) algorithm.

It retrieves documents from the vectordb that are both relevant to the query and diverse from each other.

**question** is a variable containing the query for which we want to find similar documents.
In this case, the query is "**What is principal component analysis?**"

The parameter **k=2** specifies that the final result should contain 2 documents.

The parameter **fetch_k=3** specifies how many candidate documents are first fetched by plain similarity to the query and passed to the MMR algorithm.
In this case, the 3 most similar documents are fetched, and MMR then selects 2 of them, balancing relevance to the query against similarity to the documents already selected.
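To make the roles of `k` and `fetch_k` concrete, here is a minimal pure-Python sketch of MMR selection. It is an illustration under assumptions, not the vector store's actual code: `mmr_search` and `cosine` are hypothetical helpers, the 2-D vectors are made up, and `lambda_mult=0.5` is assumed as the relevance/diversity trade-off.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_search(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return the indices of k documents chosen by MMR from fetch_k candidates."""
    # Step 1: take the fetch_k candidates most similar to the query.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:fetch_k]

    # Step 2: greedily pick k of them, trading relevance against redundancy.
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Made-up embeddings: documents 0 and 1 are near-duplicates of each other.
query = [1.0, 0.0]
doc_vectors = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8], [0.0, 1.0]]

diverse = mmr_search(query, doc_vectors, k=2, fetch_k=3)
narrow = mmr_search(query, doc_vectors, k=2, fetch_k=2)
print(diverse, narrow)
```

With three candidates, the second pick skips the near-duplicate in favour of a less similar but more novel document; with `fetch_k=2` there is no alternative in the candidate pool, so the result matches plain similarity search.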

In [33]:
question = "What is principal component analysis?"
docs = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)
len(docs)

2

In [34]:
docs[0]

Document(page_content='2  \n  \nSo, what does  Principal  Component  Analysis  (PCA)  do?  \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread  out data)  \n \nHow  does  PCA  work?  \n• Calculate  the covariance matrix  X of data  points.', metadata={'page': 1, 'source': '/content/pca_d1.pdf'})

In [35]:
docs[1]

Document(page_content='1  \n N  \n1 Principal  Component  Analysis  \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink  about  dimensions  is that  suppose  you have  an data  point  x , if we consider  this data  point  as \na physical  object  then  dimensions  are merely  a basis  of view,  like where  is the data  located  when', metadata={'page': 0, 'source': '/content/pca_d1.pdf'})

#### **[RetrievalQA chain](https://docs.smith.langchain.com/cookbook/hub-examples/retrieval-qa-chain)**

Retrieval methods retrieve a set of candidate documents
$D = \{d_{1}, d_{2}, \ldots, d_{n}\}$ from a larger corpus based on their relevance to the query $q$.
This can be represented as:
$D = \mathrm{Retrieve}(q)$

QA models are then applied to the retrieved documents to generate candidate answers $A = \{a_{1}, a_{2}, \ldots, a_{n}\}$ to the user query $q$. This can be represented as:
$A = \mathrm{QA}(D, q)$

The final answer $a_{final}$ is selected from the candidate answers based on various criteria, such as relevance, confidence scores, or other heuristics. This can be represented as:
$a_{final} = \mathrm{Select}(A)$

#### **[Vector store-backed retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore)**

A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class to make it conform to the retriever interface. It uses the search methods implemented by a vector store, like similarity search and MMR, to query the texts in the vector store.

Once you construct a vector store, it's very easy to construct a retriever. Let's walk through the following example.

In the following code cell, we initialize a HuggingFaceEndpoint object named **chat_llm**, which connects to a Hugging Face model endpoint for text generation.

* **repo_id:** It specifies the repository ID of the Hugging Face model to connect to.
* **task:** It defines the task for which the model will be used, in this case, "text-generation".
* **max_new_tokens:** This sets the maximum number of new tokens that can be generated in each response.
* **top_k:** This parameter restricts sampling at each generation step to the k highest-probability tokens (here, 30).
* **temperature:** This parameter adjusts the diversity of the generated text by dividing the logits by the temperature before applying the softmax function; a low value such as 0.1 makes the output close to deterministic.
* **repetition_penalty:** It penalizes the likelihood of generating repeated tokens in the generated text.
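The effect of **top_k** and **temperature** can be illustrated with a small sketch. This is an assumption-laden toy, not the endpoint's code: `next_token_distribution` is a hypothetical helper and the logits are made up.

```python
import math

def next_token_distribution(logits, top_k, temperature):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    # Keep only the top_k highest-scoring tokens.
    kept = sorted(logits, key=logits.get, reverse=True)[:top_k]
    # Divide the logits by the temperature, then apply a softmax.
    scaled = {tok: logits[tok] / temperature for tok in kept}
    peak = max(scaled.values())  # subtracted for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Made-up logits for four candidate next tokens.
logits = {"variance": 2.0, "data": 1.0, "axis": 0.5, "banana": -1.0}

sharp = next_token_distribution(logits, top_k=3, temperature=0.1)
flat = next_token_distribution(logits, top_k=3, temperature=1.0)
print(sharp)
print(flat)
```

At temperature 0.1 almost all probability mass sits on the highest-scoring token, while at 1.0 the alternatives keep a meaningful share; the token outside the top 3 is never sampled in either case.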

In [36]:
chat_llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens = 512,
    top_k = 30,
    temperature = 0.1,
    repetition_penalty = 1.03,
)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful



In the following code cell, we instantiate **ChatHuggingFace**:
1. It is a wrapper for using Hugging Face LLMs as ChatModels.
2. It works with HuggingFaceTextGenInference, HuggingFaceEndpoint, and HuggingFaceHub LLMs.
3. Upon instantiating this class, the model_id is resolved from the URL provided to the LLM, and the appropriate tokenizer is loaded from the Hugging Face Hub.
4. Once instantiated, ChatHuggingFace uses the specified Hugging Face model endpoint (chat_llm) for chat-based interactions.
5. **Conversation Handling:** It accepts chat messages and returns the model's response as a chat message.
6. This is adapted from: [llama2_chat](https://python.langchain.com/docs/integrations/chat/llama2_chat)
7. It creates a new model by parsing and validating input data from keyword arguments.
8. **Text Generation:** Using the Hugging Face model endpoint, ChatHuggingFace generates text responses to incoming messages or prompts.
9. It raises ValidationError if the input data cannot be parsed to form a valid model.

In [37]:
llm = ChatHuggingFace(llm=chat_llm)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [38]:
question = "What is principal component analysis?"

qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectordb.as_retriever(), return_source_documents=True)

result = qa_chain.invoke({"query": question})

In [39]:
result["result"]

'Principal Component Analysis (PCA) is a statistical technique used to analyze complex data with multiple dimensions. It finds a new set of dimensions (also called principal components) that are orthogonal (linearly independent) and ranked based on the variance of the data along them. The goal of PCA is to find linearly independent dimensions that can losslessly represent the data points and allow us to predict or reconstruct the original dimensions. By transforming the original data points such that their covariance becomes a diagonal matrix, we can achieve this goal. Normalizing the data before doing PCA is recommended if the features have different scales to avoid misleading components. In summary, PCA helps to simplify complex data by reducing the number of dimensions and identifying the most important features.'

In [40]:
result["source_documents"]

[Document(page_content='2  \n  \nSo, what does  Principal  Component  Analysis  (PCA)  do?  \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread  out data)  \n \nHow  does  PCA  work?  \n• Calculate  the covariance matrix  X of data  points.', metadata={'page': 1, 'source': '/content/pca_d1.pdf'}),
 Document(page_content='1  \n N  \n1 Principal  Component  Analysis  \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink  about  dimensions  is that  suppose  you have  an data  point  x , if we consider  this data  point  as \na physical  object  then  dimensions  are merely  a basis  

### **Understanding RAG Prompt under the hood**

In [41]:
prompt = hub.pull("rlm/rag-prompt")
prompt

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

Note that this prompt instructs the model to use three sentences maximum and to keep the answer concise.

In the following activity, we build a prompt template using the **PromptTemplate** class from the **langchain.prompts** module.

**Functionality:**
* **Template Construction:** The template string contains a customizable format for constructing prompts. It includes placeholders such as {context} and {question} that will be filled in with actual context and question data.
* **PromptTemplate Initialization:** **PromptTemplate(input_variables=["context", "question"], template=template)** initializes an instance of the PromptTemplate class with the specified input variables and the template string.
    * **input_variables** specifies the variables that will be used to fill in the placeholders in the template.
    * **template** contains the template string defining the structure of the prompt.
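As a rough illustration of how the placeholders get filled, the sketch below uses Python's built-in `str.format` as a stand-in for the template class. The shortened template, the sample chunks, and the choice to join chunks with blank lines are assumptions made for the example.

```python
# A shortened template with the same two placeholders.
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""

# Made-up retrieved chunks; in the chain these come from the retriever.
retrieved_chunks = ["PCA ranks axes by variance.", "The axes are orthogonal."]

# Fill both placeholders to obtain the final prompt text sent to the LLM.
prompt_text = template.format(
    context="\n\n".join(retrieved_chunks),
    question="What is PCA?",
)
print(prompt_text)
```

Every name listed in **input_variables** must be supplied when the template is filled; here a missing `question` would raise a `KeyError`.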

In [42]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)

In [43]:
QA_CHAIN_PROMPT

PromptTemplate(input_variables=['context', 'question'], template='Use the following pieces of context to answer the question at the end.\nIf you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\nAlways say "thanks for asking!" at the end of the answer.\n{context}\nQuestion: {question}\nHelpful Answer:')

In the code cell below,
* **llm:** It specifies the language model used for question answering.
* **retriever:** It defines the document retrieval method, including search type and search parameters.

    -- We configure the vectordb object to act as a retriever that uses the Maximal Marginal Relevance (MMR) algorithm. It first fetches the 6 documents most similar to the query (fetch_k) and then selects 2 of them (k), balancing relevance and diversity.
* **chain_type_kwargs:** It provides additional settings for the retrieval-based question answering chain, such as the prompt template.
* **return_source_documents:** It determines whether the retrieved source documents are returned along with the answers.

In [44]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectordb.as_retriever(search_type="mmr",search_kwargs={"k": 2, "fetch_k":6} ), # "k":2, "fetch_k":3
                                       chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
                                       return_source_documents=True
                                       )

In [45]:
qa_chain

RetrievalQA(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template='Use the following pieces of context to answer the question at the end.\nIf you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\nAlways say "thanks for asking!" at the end of the answer.\n{context}\nQuestion: {question}\nHelpful Answer:'), llm=ChatHuggingFace(llm=HuggingFaceEndpoint(repo_id='HuggingFaceH4/zephyr-7b-beta', top_k=30, temperature=0.1, repetition_penalty=1.03, model='HuggingFaceH4/zephyr-7b-beta', client=<InferenceClient(model='HuggingFaceH4/zephyr-7b-beta', timeout=120)>, async_client=<InferenceClient(model='HuggingFaceH4/zephyr-7b-beta', timeout=120)>, task='text-generation'), tokenizer=LlamaTokenizerFast(name_or_path='HuggingFaceH4/zephyr-7b-beta', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='left', special_token

**Example 1**

In [46]:
question = "What is principal component analysis?"
result = qa_chain.invoke({"query": question})
result["source_documents"]

[Document(page_content='2  \n  \nSo, what does  Principal  Component  Analysis  (PCA)  do?  \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread  out data)  \n \nHow  does  PCA  work?  \n• Calculate  the covariance matrix  X of data  points.', metadata={'page': 1, 'source': '/content/pca_d1.pdf'}),
 Document(page_content='1  \n N  \n1 Principal  Component  Analysis  \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink  about  dimensions  is that  suppose  you have  an data  point  x , if we consider  this data  point  as \na physical  object  then  dimensions  are merely  a basis  

In [47]:
result["result"]

'Principal Component Analysis (PCA) is a statistical technique that transforms a large set of variables into a smaller set of variables called principal components. These principal components are orthogonal and ordered by their variance, with the first principal component explaining the most variability in the data. PCA helps to simplify complex multidimensional data by reducing the number of dimensions required to represent it, making it easier to visualize and analyze. By finding a new set of dimensions that are orthogonal and ranked by variance, PCA allows us to identify the most important factors that contribute to the variability in the data. In summary, PCA is a powerful tool for data analysis that can help to reduce dimensionality, simplify data interpretation, and improve model accuracy. Thanks for asking!'

**Example 2**

In [48]:
question = "What does it say about variance in context of both PCA and Ensemble?"
result = qa_chain.invoke({"query": question})
result["source_documents"]


[Document(page_content='observation and vice versa. Boosting in general decreases the bias error and builds strong predictive  \nmodels.  \n \nVariance  \nVariance quantifies how the predictions made on same observation are different from each other. A  \nhigh variance model will over -fit on your training population and perform badly on any observation  \nbeyond  training.  Thus,  we aim at low variance.', metadata={'page': 0, 'source': '/content/ens_d2.pdf'}),
 Document(page_content='2  \n  \nSo, what does  Principal  Component  Analysis  (PCA)  do?  \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread  out data)  \n \nHow  does  PCA  work?  \n• Calculate  the covariance matrix  X of data  points.', metadata={'page': 1, 'source': '/co

In [49]:
result["result"]

'In the context of PCA, variance refers to the spread or dispersion of the data points around their mean. PCA aims to find a new set of dimensions that capture the maximum variance in the data, as this will result in a lower dimensional representation that still preserves most of the variability in the original data. This is achieved by finding the principal component axes, which are orthogonal to each other and ordered by their variance.\n\nIn the context of ensemble learning, variance also refers to the variability or uncertainty in the predictions made by individual models in the ensemble. A high variance ensemble will have widely varying predictions for the same input, which can lead to overfitting and poor generalization performance. To mitigate this, ensemble methods often use techniques such as bagging (bootstrap aggregating) or boosting to reduce the variance of the predictions by combining the outputs of multiple models. By doing so, they can achieve lower overall error and be

### **RetrievalQA chain types : [Map reduce, Refine, Map rerank (Legacy)](https://python.langchain.com/docs/modules/chains/)**

- All the examples so far use the stuff method (the default, chain_type="stuff"), which makes only one call to the LLM.

-- With the **"stuff"** method, all retrieved documents are inserted ("stuffed") into a single prompt and passed to the Large Language Model (LLM), without strategies like **map reduce** or **reranking**.

-- The LLM generates the answer directly from the retrieved documents, without additional refinement or iterative processing. This works as long as the documents fit within the model's context window.

In the following approach, we will use **Map Reduce**. The RetrievalQA chain types represent different strategies for combining the retrieved documents when calling the LLM.
1. **Map Reduce:**
* Documents are first retrieved using a retrieval method such as MMR (Maximal Marginal Relevance).
* Each retrieved document is then passed to the LLM separately (the map step) to produce a partial answer; these calls are independent of each other, so they can run in parallel.
* Finally, the partial answers are combined in a further LLM call (the reduce step) to produce the final answer.
2. **Refine:**
* The Refine strategy processes the retrieved documents one at a time.
* The LLM produces an initial answer from the first document; each subsequent call receives the next document together with the current answer and is asked to refine it.
* The calls are sequential, since each depends on the previous answer, so the answer is improved step by step as more context is seen.
3. **Map Rerank (Legacy):**
* Each retrieved document is passed to the LLM separately, as in the map step above.
* For each document, the LLM is asked to produce an answer along with a score indicating how confident it is in that answer.
* The answers are ranked by score, and the highest-scoring answer is returned.
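The difference between these strategies shows up most clearly in how many LLM calls each makes. The sketch below counts calls with a fake LLM. Everything in it is hypothetical (`fake_llm`, `stuff`, `map_reduce`, `refine`, and the prompt wording), and it assumes the simplest case in which the reduce step needs a single call.

```python
calls = []

def fake_llm(prompt):
    """Hypothetical stand-in for an LLM: records the prompt, returns a numbered placeholder."""
    calls.append(prompt)
    return f"answer#{len(calls)}"

def stuff(docs, question):
    # One call: all documents are placed in a single prompt.
    return fake_llm("\n\n".join(docs) + "\nQuestion: " + question)

def map_reduce(docs, question):
    # Map: one call per document. Reduce: one more call over the partial answers.
    partial = [fake_llm(d + "\nQuestion: " + question) for d in docs]
    return fake_llm("\n".join(partial) + "\nQuestion: " + question)

def refine(docs, question):
    # Sequential: each call sees the next document plus the answer so far.
    answer = ""
    for d in docs:
        answer = fake_llm(f"Existing answer: {answer}\nContext: {d}\nQuestion: {question}")
    return answer

def count_calls(strategy, docs, question):
    """Run a strategy on fresh state and report how many LLM calls it made."""
    calls.clear()
    strategy(docs, question)
    return len(calls)

docs = ["chunk A", "chunk B", "chunk C", "chunk D"]
n_stuff = count_calls(stuff, docs, "What is PCA?")
n_map_reduce = count_calls(map_reduce, docs, "What is PCA?")
n_refine = count_calls(refine, docs, "What is PCA?")
print(n_stuff, n_map_reduce, n_refine)
```

For four retrieved chunks this gives one call for stuff, five for map reduce, and four for refine, which is the cost to weigh against fitting everything into one context window.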

In [50]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr",search_kwargs={"k": 4, "fetch_k":8}),
    chain_type="map_reduce"
)

In [51]:
question ="What principal component analysis?"
result = qa_chain_mr.invoke({"query": question})
result["result"]

"Principal Component Analysis (PCA) is a statistical technique used to analyze and reduce the dimensionality of a large dataset by transforming it into a new set of variables called principal components. These components are linear combinations of the original variables and are ordered by their variance, with the first principal component explaining the most variability in the data. PCA is commonly used in data analysis and machine learning applications to simplify complex datasets, visualize data, and extract useful features for further analysis or modeling.\n\nIf you're asking whether we're specifically discussing PCA in this context, the answer is yes. The text provided explains what PCA is, how it works, and how it can be used to prepare data for further analysis."

### **Make it like Chatbot : Adding Memory**

1. In the approach below, we use ConversationBufferMemory, which is the most straightforward conversational memory in LangChain. The past conversation between the human and the AI is stored and passed, in its raw form, to the prompt variable named by memory_key.
2. **ConversationBufferMemory** facilitates the storage and retrieval of conversation history, allowing systems to maintain context across multiple interactions.
3. It stores messages exchanged during conversations, enabling systems to access past interactions for context-aware responses.
4. **memory_key:** It specifies the key under which the conversation history is made available to the chain (here, "chat_history").
5. **return_messages:** It determines the format in which the history is returned. If set to **True**, the history is returned as a list of message objects; if set to **False**, it is returned as a single formatted string.
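The behaviour of **memory_key** and **return_messages** can be sketched with a small stand-in class. `BufferMemory` below is hypothetical and much simpler than the LangChain class: it stores plain `(role, text)` tuples rather than message objects, and its method signatures are assumptions for the example.

```python
class BufferMemory:
    """Hypothetical, simplified stand-in for a conversation buffer."""

    def __init__(self, memory_key="chat_history", return_messages=False):
        self.memory_key = memory_key
        self.return_messages = return_messages
        self.messages = []  # (role, text) pairs in arrival order

    def save_context(self, human, ai):
        """Append one human/AI exchange to the buffer."""
        self.messages.append(("Human", human))
        self.messages.append(("AI", ai))

    def load_memory_variables(self):
        """Return the history under memory_key, as a list or as one string."""
        if self.return_messages:
            history = list(self.messages)
        else:
            history = "\n".join(f"{role}: {text}" for role, text in self.messages)
        return {self.memory_key: history}

as_messages = BufferMemory(return_messages=True)
as_messages.save_context("tell me about PCA", "PCA reduces dimensionality.")

as_text = BufferMemory(return_messages=False)
as_text.save_context("tell me about PCA", "PCA reduces dimensionality.")

print(as_messages.load_memory_variables())
print(as_text.load_memory_variables())
```

Chat models generally work best with the list form, since each entry keeps its speaker role, which is one reason to set `return_messages=True` when the LLM is wrapped as a chat model.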

In [52]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

**ConversationalRetrievalChain** is a chain that takes a query and answers it using documents retrieved for that query.

It is one of the many possibilities to perform **Retrieval Augmented Generation (RAG)**.

It does not only answer your last query; it also uses the chat history to improve the quality of the RAG by taking into account past queries and answers when:
1. retrieving documents,
2. feeding the LLM with those documents and asking it to answer a question.

It works in 3 major steps:
* **Step-1 (query rephrasing):** The query used for retrieval is not always the one you gave the chain; it is constructed from your query and the conversation history, which allows the chain to "remember" what you are asking about. For each question except the first, the conversational retrieval chain rephrases the query to take the chat history into account.
* **Step-2 (relevant document retrieval):** In this step, the chain uses the provided 'retriever' to find documents relevant to the question. A retriever is basically an object with a function that takes a query and returns a list of documents.
In most implementations, RAG systems use a retriever constructed from a vectorstore (i.e., a vector database).
* **Step-3 (the generation):** Context + Question = Answer. In this step, the LLM is asked to answer the rephrased question using the text from the relevant documents found.
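The three steps can be traced with a toy pipeline. This is a sketch under heavy assumptions, not the chain's implementation: `rephrase`, `retrieve`, `generate`, and `ask` are hypothetical, rephrasing is done by simply prepending the previous question (a real chain asks the LLM to do it), retrieval uses word overlap, and generation just returns the prompt that would be sent to the LLM.

```python
def rephrase(question, chat_history):
    """Step 1 (hypothetical): make a follow-up question self-contained."""
    if not chat_history:
        return question  # the first question is used unchanged
    last_question, _ = chat_history[-1]
    return f"{last_question} {question}"

def retrieve(query, docs, k=1):
    """Step 2 (hypothetical): rank documents by words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(query, context):
    """Step 3 (hypothetical): build the prompt a real chain would send to the LLM."""
    return f"Context: {' '.join(context)}\nQuestion: {query}"

def ask(question, chat_history, docs):
    """Run the three steps and record the exchange in the history."""
    standalone = rephrase(question, chat_history)
    answer = generate(standalone, retrieve(standalone, docs))
    chat_history.append((question, answer))
    return standalone, answer

docs = ["boosting lowers bias in ensembles", "pca ranks axes by variance"]
history = []
first_query, first_answer = ask("what is pca", history, docs)
second_query, second_answer = ask("how does it work", history, docs)
print(second_query)
print(second_answer)
```

The follow-up "how does it work" shares no words with either document, so on its own it would retrieve an arbitrary one; after rephrasing with the history it retrieves the PCA document.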

In [53]:
# Run chain
qa= ConversationalRetrievalChain.from_llm(llm,
                                       retriever=vectordb.as_retriever(search_type="mmr",search_kwargs={"k": 4, "fetch_k":8} ), # "k":2, "fetch_k":3
                                       memory=memory
                                       )

In [54]:
question = "tell me something about PCA"
result = qa.invoke({"question": question})

In [55]:
result['answer']

'Principal Component Analysis (PCA) is a statistical technique used to analyze and reduce the dimensionality of a large dataset. It finds a new set of dimensions or basis views that are orthogonal (linearly independent) and ranked based on the variance of the data along them. The goal of PCA is to find linearly independent dimensions that can losslessly represent the data points and allow us to predict or reconstruct the original dimensions. PCA helps to remove correlated dimensions by making the covariance matrix have large numbers as the main diagonal elements and zero values as the off-diagonal elements. This ensures that the data is spread out along the dimensions with high variance and the dimensions are linearly independent. Normalizing the data before doing PCA is recommended if the features have different scales to avoid misleading components. Overall, PCA is a powerful tool for data analysis and dimensionality reduction in various fields such as finance, engineering, and scien

In [56]:
question = "please list point-wise,  how does pca works?"
result = qa.invoke({"question": question})

In [57]:
print(result['answer'])

Principal Component Analysis (PCA) is a statistical technique used to analyze and reduce the dimensionality of a large dataset. It works by finding a new set of orthogonal dimensions, called principal components, that explain the maximum variance in the data. These principal components are ordered by their variance, with the first component explaining the most variance and subsequent components explaining less.

Here's a step-by-step explanation of how PCA works:

1. Data Preprocessing: Before applying PCA, it's essential to preprocess the data. This involves normalizing the data to have zero mean and unit variance. This step ensures that all features have equal importance in the analysis and prevents the algorithm from being dominated by features with larger scales.

2. Covariance Matrix Calculation: The next step is to calculate the covariance matrix of the preprocessed data. The covariance matrix represents the pairwise relationships between the features.

3. Eigenvalue and Eigenvec

In [58]:
question = "what do we get from covariance matrix for doing PCA?"
result = qa.invoke({"question": question})
print(result['answer'])

The covariance matrix plays a crucial role in performing Principal Component Analysis (PCA). In PCA, we aim to find a new set of dimensions or a set of basis views that are orthogonal and ranked according to the variance of the data along them. The covariance matrix helps us calculate the variance and covariance of the data points along each dimension. By calculating the covariance matrix, we can identify the dimensions with high variance and covariance, which indicates that they contain most of the information in the data. We then replace these dimensions with linear combinations of the related n dimensions to reduce the dimensionality of the data while preserving most of the variance. This process helps us to identify the underlying structure and patterns in the data more clearly and efficiently.


### **Download the vector DB**

In [59]:
# Zip the entire folder
!zip -r /content/docs.zip /content/docs

  adding: content/docs/ (stored 0%)
  adding: content/docs/chroma/ (stored 0%)
  adding: content/docs/chroma/9992ae92-f624-4fee-bb5e-d6806febba95/ (stored 0%)
  adding: content/docs/chroma/9992ae92-f624-4fee-bb5e-d6806febba95/header.bin (deflated 61%)
  adding: content/docs/chroma/9992ae92-f624-4fee-bb5e-d6806febba95/length.bin (deflated 19%)
  adding: content/docs/chroma/9992ae92-f624-4fee-bb5e-d6806febba95/link_lists.bin (stored 0%)
  adding: content/docs/chroma/9992ae92-f624-4fee-bb5e-d6806febba95/data_level0.bin (deflated 100%)
  adding: content/docs/chroma/chroma.sqlite3 (deflated 69%)


In [60]:
files.download("/content/docs.zip")

### **Upload the vector db from previous step and unzip**

In [61]:
!unzip /content/docs.zip  -d /

Archive:  /content/docs.zip
replace /content/docs/chroma/9992ae92-f624-4fee-bb5e-d6806febba95/header.bin? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

### Please answer the questions below to complete the experiment:




In [62]:
#@title Contextual Compression Retriever has 2 major components which are { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "a base retriever and a Document Compressor" #@param ["", "embedding and vectordb", "map and reduce", "RecursiveCharacterTextSplitter and Document loader", "a base retriever and a Document Compressor", "All of the above"]

In [63]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Too Difficult for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]

In [64]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "I will need more homework to grasp these topics" #@param {type:"string"}

In [65]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]

In [66]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]

In [67]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]

In [68]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 5443
Date of submission:  21 May 2024
Time of submission:  15:48:11
View your submissions: https://cds-iisc.talentsprint.com/notebook_submissions
