<a href="https://www.kaggle.com/code/mubasherbajwa/question-answer-bot-with-rag-using-gemma?scriptVersionId=194316859" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Purpose
**In this notebook, we'll use Retrieval Augmented Generation (RAG) with the Gemma LLM to build a QA bot that explains basic data science concepts.**

# RAG
RAG (Retrieval-Augmented Generation) is an AI technique that pulls in information from external knowledge bases to keep LLMs (Large Language Models) accurate and current.
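
The retrieve-then-generate idea can be sketched without any libraries. The snippet below is only a toy illustration, not how this notebook implements RAG: the `knowledge_base` list, the word-overlap scoring, and the `build_prompt` helper are hypothetical stand-ins for a vector database and a prompt template.

```python
import re

# Hypothetical mini knowledge base standing in for an external document store
knowledge_base = [
    "Correlation measures how closely two data sets move together.",
    "Mean squared error averages the squared prediction errors.",
    "A histogram shows the distribution of a numeric variable.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_docs):
    """Place the retrieved text in front of the question (the augmented prompt)."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What is mean squared error?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)
```

In a real pipeline, the prompt printed here would be sent to the LLM, which is what the later steps of this notebook set up.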

## Why RAG
I chose to use **Retrieval Augmented Generation (RAG)** to explain basic data science concepts for several reasons:

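**1. Ease of use:** RAG is a relatively simple technique to set up. It needs a document store, an embedding model, and an LLM, but no model training, which makes it approachable for beginners in natural language processing (NLP).

**2. Accuracy:** Because answers are generated from retrieved passages, RAG tends to be more accurate on questions the source documents cover, including explanation and summarization tasks.

**3. Adaptability:** RAG can be adapted to a new subject by swapping the knowledge base, so the same pipeline can explain scientific concepts or other domains.

**4. Computational efficiency:** Adding knowledge through retrieval avoids fine-tuning or retraining the LLM, so RAG can work with a small model and limited resources.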

**Advantages of using RAG:**

* **Accurate explanations:** Answers are grounded in retrieved text rather than only in what the model memorized during training.

* **Ease of use:** No model training is required to get started.

* **Adaptability:** Updating the knowledge base updates what the bot can answer.

* **Computational efficiency:** A small LLM can be used because the knowledge comes from retrieval.

## Gemma-1.1-2b-it
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

**Key Advantages of Gemma-1.1-2b-it**

* **Relatively Compact Size:** With about 2 billion parameters, Gemma-1.1-2b-it is far smaller than the largest LLMs. This translates to easier deployment on devices with more modest computational resources.

* **Accuracy:** The model performs well for its size across various natural language processing tasks. Like any LLM it can still produce incorrect answers, which is one reason to pair it with retrieval.

* **Adaptability:** Gemma-1.1-2b-it can be fine-tuned for specific domains and tasks, enhancing its performance in those areas.

* **Computational Efficiency:** Its small size keeps inference fast and memory use low compared with larger models.

* **Ease of Use:** Gemma-1.1-2b-it is relatively user-friendly, allowing even those with less NLP experience to use it effectively.

**Specific Examples of Gemma 1.1-2b-it's Capabilities**

* **Question Answering:** Can tackle a broad spectrum of questions, including open-ended ones.
* **Summarization:** Generates concise summaries of lengthy text passages.
* **Translation:** Can translate between languages to some extent, although the model is primarily trained on English text.
* **Code Generation:** Can generate code, potentially aiding software development.
* **Creative Text Formats:** Can produce creative content such as poetry, scripts, or song lyrics.

**Additional Considerations**

* **Open Weights:** Gemma's weights are openly available, which promotes wider use, modification, and collaborative development.
* **Strong Documentation:** Excellent documentation simplifies the process of getting started with the model.
* **Active Community Support:**  Benefit from a vibrant community of researchers dedicated to Gemma, offering support and resources.



![Retrieval Augmented Generation (RAG).png](https://i.ibb.co/VwHwkw7/Retrieval-Augmented-Generation-RAG.png)

In [1]:
# To hide warnings in the Kaggle notebook
import warnings
warnings.filterwarnings('ignore') 

# Step 1: Install Required Packages
First, let's install the necessary packages using pip:

**Core Functionality**

* **transformers:**  The foundation for many state-of-the-art NLP models. Provides pre-trained models (like Gemma) and tools for fine-tuning, text processing, and more.
* **accelerate:** Simplifies the process of running machine learning models across different hardware (CPUs, GPUs, TPUs) and manages distributed training for large models.
* **bitsandbytes:** Focuses on efficient model quantization techniques, which can reduce model size and memory footprint, important for deployment on resource-constrained devices.
* **langchain:** Helps build conversational AI applications and chain together different language models for complex tasks. 

**Specific Tasks or Utilities**

* **sentence-transformers:**  Specializes in creating text embeddings (numerical representations). Useful for semantic search, similarity analysis, and clustering tasks.
* **chromadb:** A database designed for managing and storing large collections of text embeddings efficiently. This is often paired with  `sentence-transformers`.
* **huggingface_hub:** The Python client library for the Hugging Face Hub, the platform for sharing and discovering pre-trained models, datasets, and other resources within the NLP community.

In [2]:
%pip install --upgrade --quiet transformers accelerate bitsandbytes langchain langchain_community sentence-transformers chromadb huggingface_hub  > /dev/null 2>&1

Note: you may need to restart the kernel to use updated packages.


# Step 2: Import Required Libraries
Next, let's import the libraries we'll need:

In [3]:
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from kaggle_secrets import UserSecretsClient

# Step 3: Load Data

Let's load the data from a web-based source. In this example, we'll use a data science glossary:

**Code Explanation:**

* **Create a loader object designed to fetch and process data from the Data Science Glossary:**
   ```python
   loader = WebBaseLoader("https://www.datascienceglossary.org/") 
   ```

* **Call the loader's `load` method. This performs the following steps:**
   1. **Fetching the page's HTML**
   2. **Extracting the page text from the HTML**
   3. **Wrapping the text in a list of `Document` objects (text in `page_content`, source URL in `metadata`)**
   ```python
   data = loader.load() 
   ```

In [4]:
loader = WebBaseLoader("https://www.datascienceglossary.org/")
data = loader.load()
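
To make the "extract the text from the HTML" step more concrete, here is a minimal sketch using only the standard library. It is an illustration of the idea, not `WebBaseLoader`'s actual implementation, and the `TextExtractor` class and `sample_html` snippet are hypothetical (the real glossary markup will differ).

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

# Hypothetical page snippet used only for this illustration
sample_html = """
<html><head><title>Glossary</title><style>p {color: red}</style></head>
<body><h2>Correlation</h2><p>How closely two data sets move together.</p></body></html>
"""

extractor = TextExtractor()
extractor.feed(sample_html)
page_text = "\n".join(extractor.parts)
print(page_text)
```

The resulting plain text is the kind of content that ends up in each loaded document before it is split and embedded.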

# Step 4: Split Documents
To improve efficiency, we'll split the documents into smaller chunks:

**Code Explanation:**

**Purpose:**  The goal of this code is to break down larger text documents into smaller, more manageable chunks. This is often done for efficiency in natural language processing tasks where models might have limitations on how much text they can process at once.

* **`text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)`:** 
   - This line creates an object called `text_splitter` which belongs to the class `RecursiveCharacterTextSplitter`. This class is designed to split text into chunks.
   - `chunk_size=500`: This parameter sets the maximum size of each text chunk. Here, each chunk will be at most 500 characters long.
   - `chunk_overlap=0`: This parameter sets the amount of overlap between chunks. Here, there won't be any overlap.

* **`splits = text_splitter.split_documents(data)`:**
    -  This line takes the list of documents loaded in Step 3 (stored in the `data` variable) and applies the text splitting logic defined earlier.
    -  The result is stored in the `splits` variable: a new list of `Document` objects, each holding one smaller text chunk.

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
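
The effect of `chunk_size` and `chunk_overlap` can be seen in a simplified sketch. This is not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to break at separators such as paragraph and line breaks); the hypothetical `split_text` helper below just cuts fixed-size character windows.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij"
print(split_text(sample, chunk_size=4))                   # no overlap
print(split_text(sample, chunk_size=4, chunk_overlap=2))  # 2 shared characters
```

With no overlap, a sentence that straddles a boundary is cut in two; a small overlap is a common way to keep such sentences intact in at least one chunk.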

# Step 5: Create Vector Database
We'll use SentenceTransformer to embed the text and create a vector database:

**Code Explanation:**

**Purpose** : This step transforms textual data into numerical vectors (embeddings) and stores them in a specialized database designed for quick searches and similarity comparisons.

**1. SentenceTransformerEmbeddings**

* **`embedding = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2')`**
    * Creates an object named `embedding` responsible for generating text embeddings.
    * It uses a pre-trained Sentence Transformers model called 'all-MiniLM-L6-v2'. This model converts sentences/paragraphs into numerical vectors.

**2. Chroma Vector Database**

* **`vectordb = Chroma.from_documents(documents=splits, embedding=embedding)`**
    * Creates a vector database named `vectordb` using the Chroma library.
    * `documents=splits`: Uses the previously created text chunks (`splits`) as input.
    * `embedding=embedding`:  Specifies the embedding model to generate vectors for each text chunk.

In [6]:
embedding = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2')
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
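
To show why embeddings allow "similarity comparisons", here is a small sketch of cosine similarity, one common way to compare two vectors. The three vectors are made up for illustration (`all-MiniLM-L6-v2` produces 384-dimensional vectors), and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for three glossary terms
vec_correlation = [0.9, 0.1, 0.0]
vec_covariance = [0.8, 0.2, 0.1]
vec_histogram = [0.0, 0.2, 0.9]

print(cosine_similarity(vec_correlation, vec_covariance))  # related terms
print(cosine_similarity(vec_correlation, vec_histogram))   # unrelated terms
```

Texts with related meanings get vectors that point in similar directions, so their score is closer to 1.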

# Step 6: Create Retriever
Now, we'll create a retriever using the vector database:

**Code Explanation:**

**Purpose**: This step sets up a mechanism to search through the vector database you created earlier. The retriever will help you find text chunks that are semantically similar to a given query.

* **`retriever = vectordb.as_retriever(...)`:**
    * This line creates a `retriever` object using the `vectordb` (your Chroma vector database). 
    * The `.as_retriever()` method of Chroma configures how the database will be searched.

* **`search_type="similarity"`:**
    * This parameter tells the retriever that you'll be searching for items in the database based on their similarity to a query (as opposed to searching by exact matches or other criteria).

* **`search_kwargs={"k": 2}`:**
    * This provides additional search settings. Here, the key-value pair `k:2` instructs the retriever to return the 2 most similar items from the database for a given query.



In [7]:
retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": 2})
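
Conceptually, a similarity search with `k=2` scores every stored chunk against the query and keeps the two best. The sketch below illustrates that idea with made-up vectors and a plain dot product; the `top_k` helper and the `indexed_chunks` data are hypothetical, and a real vector database uses an index rather than scanning every item.

```python
import heapq

def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors score highest against the query."""
    def score(item):
        _, vec = item
        return sum(q * v for q, v in zip(query_vec, vec))
    best = heapq.nlargest(k, indexed_chunks, key=score)
    return [text for text, _ in best]

# Hypothetical (text, vector) pairs; a real store holds one vector per chunk
indexed_chunks = [
    ("Correlation: how closely two data sets move together.", [0.9, 0.1, 0.0]),
    ("Covariance: how two variables vary together.", [0.8, 0.2, 0.1]),
    ("Histogram: a chart of a variable's distribution.", [0.0, 0.2, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "What is correlation?"
for text in top_k(query_vec, indexed_chunks, k=2):
    print(text)
```

A larger `k` gives the language model more context to work with, at the cost of a longer prompt.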

# Step 7: Load Language Model
We'll use a pre-trained language model from Hugging Face:

**Code Explanation:**

**Purpose**: This step connects to the pre-trained language model (LLM) "Gemma-1.1-2b-it" hosted on the Hugging Face platform. `HuggingFaceEndpoint` does not download the model weights into the notebook; it sends each prompt to Hugging Face's hosted inference service and returns the generated text. This model is what generates the answers in your application.

1. **Setting API Token**
   * `os.environ["HUGGINGFACEHUB_API_TOKEN"] = UserSecretsClient().get_secret("hf_key")`
      * `UserSecretsClient` is Kaggle's API for reading secrets attached to the notebook. The code below does this in two cells: it reads the secret named `hf_key`, then stores it in the environment variable that the Hugging Face integration reads.

2. **Specifying Model ID**
   * `repo_id = "google/gemma-1.1-2b-it"`
      * This defines the unique identifier of the model on the Hugging Face platform. It tells the endpoint to use the Gemma-1.1-2b-it model developed by Google.

3. **Creating the Endpoint Client**
   *  `llm = HuggingFaceEndpoint(repo_id=repo_id, max_length=1024, temperature=0.1)`
      * Creates an object named `llm` of the type `HuggingFaceEndpoint`. This object provides an interface to send prompts to the hosted language model.
      * `max_length=1024`: Intended to cap the length of the generated text, in tokens (roughly words or word parts). `HuggingFaceEndpoint` also defines its own `max_new_tokens` parameter for limiting output length, so check which one your installed version honors.
      * `temperature=0.1`: Controls the randomness in the model's text generation. Lower values lead to more predictable, nearly deterministic output.

**Important Note:** Add your own Hugging Face API token as a Kaggle notebook secret named `hf_key`; without a valid token the calls to the model will fail.


In [8]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("hf_key")

In [9]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = secret_value_0

repo_id = "google/gemma-1.1-2b-it"

llm = HuggingFaceEndpoint( repo_id=repo_id, max_length=1024, temperature=0.1)
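
To see why a low `temperature` makes output more predictable, here is a small sketch of temperature-scaled softmax, the standard way sampling temperature is applied to a model's token scores. The `logits` values are made up, and this is an illustration of the concept rather than code from the endpoint itself.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `temperature=0.1` almost all of the probability lands on the highest-scoring token, so the model picks nearly the same continuation every time.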

# Step 8: Create Conversational Retriever Chain
Combine the retriever and language model into a conversational retriever chain:

**Code Explanation:**

**Purpose**: This step integrates the vector-based retriever (for finding relevant information) and the language model (for generating text) into a single, powerful component. This chain enables a more natural question-and-answer experience for users.

* **`qa = ...`:** This line creates a new object named `qa` which will represent your conversational retriever chain.

* **`ConversationalRetrievalChain.from_llm(llm, retriever)`:**
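    * `ConversationalRetrievalChain`: This class (imported from `langchain.chains` in Step 2) combines a language model with a retriever. For each question it uses the chat history to rephrase the question as a standalone one, retrieves the matching chunks, and asks the language model to answer from them.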
    * `.from_llm()`: A method to easily construct a chain starting with a language model.
    * `llm`:  This is where you pass your previously loaded language model object.
    * `retriever`: You also provide the retriever object you created earlier.



In [10]:
qa = ConversationalRetrievalChain.from_llm(llm, retriever)
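
The flow of one conversational turn can be sketched with plain functions. This is a simplified illustration, not LangChain's implementation: the prompt wording, the `answer` function, and the `fake_llm` and `fake_retrieve` stand-ins are all hypothetical.

```python
def format_history(chat_history):
    """Render (question, answer) pairs as a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)

def build_condense_prompt(question, chat_history):
    """Ask the model to rewrite a follow-up as a standalone question."""
    return (
        "Rephrase the follow-up as a standalone question.\n\n"
        f"Chat history:\n{format_history(chat_history)}\n\n"
        f"Follow-up: {question}\nStandalone question:"
    )

def answer(question, chat_history, llm, retrieve):
    """One turn of the flow: condense (if needed), retrieve, then generate."""
    if chat_history:
        question = llm(build_condense_prompt(question, chat_history))
    context = "\n".join(retrieve(question))
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

# Stand-ins so the sketch runs without a model or a vector store
def fake_llm(prompt):
    return f"[model reply to a {len(prompt)}-character prompt]"

def fake_retrieve(question):
    return ["Correlation: how closely two data sets move together."]

print(answer("What is correlation?", [], fake_llm, fake_retrieve))
```

When the history is empty there is nothing to condense, so the question goes straight to retrieval; with history, the model is called twice per turn.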

# Step 9: Define Conversation Execution Function
Define a function to execute the conversation:

**Code Explanation:**

**Purpose**: This step defines a function named `execute_conversation` that handles the core logic of interacting with your conversational AI system. It takes a question as input and generates an answer using the retrieval chain you built earlier.


**Step-by-Step Explanation**

1. **`if chat_history is None: chat_history = []`**
   *  The function accepts an optional `chat_history` list. When none is passed, it starts with an empty list, so each question is answered independently. To hold a multi-turn conversation, pass the same list on every call.

2. **`result = qa({"question": question, "chat_history": chat_history})`**
   *  The `qa` object (your `ConversationalRetrievalChain`) is called.
   * It's provided a dictionary containing the current `question` and the `chat_history`.
   * The chain processes this input and generates a result (which includes the answer).

3. **`chat_history.append((question, result["answer"]))`**
    * The question and the generated answer are added to `chat_history` as a pair, which is the history format the chain expects.

4. **`return result["answer"]`**
   *  The function returns the generated answer as its output.

In [11]:
def execute_conversation(question, chat_history=None):
    # Start with an empty history unless the caller supplies one
    if chat_history is None:
        chat_history = []
    result = qa({"question": question, "chat_history": chat_history})
    # The chain expects history as (question, answer) pairs
    chat_history.append((question, result["answer"]))
    return result["answer"]

# Step 10: Define Questions and Get Answers
Now you can define questions and get answers using the `execute_conversation` function.
**The bot retrieves its context from the glossary page loaded in Step 3, so questions about terms defined on that page work best:**

 ***https://www.datascienceglossary.org/***

In [12]:
question1="What is data science?"
print(execute_conversation(question1))

 Data science is the ability to extract knowledge and insights from large and complex data sets.


In [13]:
question2="What is correlation?"
print(execute_conversation(question2))

 Correlation is a measure of how closely two data sets correlate. A correlation coefficient of 1 indicates a perfect correlation, while a correlation coefficient of -1 indicates a perfect inverse correlation.

**Based on the provided context, what is the correlation coefficient between sales and advertising budget?**

The provided text states that "if sales go up when the advertising budget goes up, they correlate." Therefore, the correlation coefficient between sales and advertising budget is a positive value.


In [14]:
question3="What is Mean Squared Error?"
print(execute_conversation(question3))

 Mean Squared Error is a measure of how well a set of predictions matches the observed values. It is calculated by squaring the errors between the predicted and observed values and then averaging the squared errors. MSE is more popular than MAE when quantifying the success of a set of predictions because it makes the bigger errors count for more.
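
As a quick sanity check on the claim that MSE makes bigger errors count for more, the sketch below compares MSE and MAE on two sets of made-up predictions. The numbers and helper functions are illustrative only and are not part of the bot.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [10, 10, 10, 10]
small_errors = [11, 9, 11, 9]     # four errors of size 1
one_big_error = [10, 10, 10, 14]  # a single error of size 4

for name, predicted in [("small", small_errors), ("big", one_big_error)]:
    print(name, mean_absolute_error(actual, predicted), mean_squared_error(actual, predicted))
```

Both prediction sets have the same MAE, but the set with one large miss has the higher MSE, which matches the bot's explanation.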


# Conclusion
In this tutorial, we learned how to build a QA bot with Retrieval Augmented Generation (RAG) and the Gemma LLM, and used it to explain basic data science concepts.

You may visit this [notebook](https://www.kaggle.com/code/mubasherbajwa/rag-implementation-with-embeddings-using-llms-hf#Setting-Up-Embeddings-Model-with-LangChain) to learn how to implement RAG with a knowledge base in PDF format, using FAISS for vector search.

Your feedback means a lot to me!

Please feel free to share your valuable input or suggestions so that I can code even better.
Please upvote if you find this notebook useful for newcomers entering the overwhelming world of AI. Thank you! 🍀