**DEVELOPING A CHAT ASSISTANT USING RETRIEVAL AUGMENTED GENERATION (RAG)**

# **Project Overview:**

This Jupyter Notebook demonstrates the implementation of a Retrieval Augmented Generation (RAG) chat assistant using:

- LangChain for document processing and retrieval,

- Hugging Face Embeddings for text vectorization,

- Chroma Vector Database for document storage,

- Groq's LLM for generating responses,

- Gradio for creating an interactive web interface

Key Components:

- Document Loading,

- Text Splitting,

- Embedding Generation,

- Vector Database Creation,

- Conversational Retrieval Chain,

- Gradio Interface

# **Step 1: Library Installation**

Install the required libraries for our RAG chat assistant. Most packages are pinned to specific versions to ensure compatibility; `gradio` and `pydantic-settings` are left unpinned.

In [1]:
# Install relevant libraries with specific versions
!pip install chromadb==0.5.5 langchain-chroma==0.1.2 langchain==0.2.11 langchain-community==0.2.10 langchain-text-splitters==0.2.2 langchain-groq==0.1.6 transformers==4.43.2 sentence-transformers==3.0.1 unstructured==0.15.0 unstructured[pdf]==0.15.0 gradio pydantic-settings

Collecting chromadb==0.5.5
  Downloading chromadb-0.5.5-py3-none-any.whl.metadata (6.8 kB)
Collecting langchain-chroma==0.1.2
  Downloading langchain_chroma-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Collecting langchain==0.2.11
  Downloading langchain-0.2.11-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-community==0.2.10
  Downloading langchain_community-0.2.10-py3-none-any.whl.metadata (2.7 kB)
Collecting langchain-text-splitters==0.2.2
  Downloading langchain_text_splitters-0.2.2-py3-none-any.whl.metadata (2.1 kB)
Collecting langchain-groq==0.1.6
  Downloading langchain_groq-0.1.6-py3-none-any.whl.metadata (2.8 kB)
Collecting transformers==4.43.2
  Downloading transformers-4.43.2-py3-none-any.whl.metadata (43 kB)
Collecting sentence-transformers==3.0.1
  Downloading sentence_transformers-3.0.1-py3-none-any.whl.metadata (10 kB)
Collecting unstructured
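
Because the pinned versions matter for compatibility, it can help to confirm what actually got installed. The following is a minimal sketch using only the standard library; the `PINNED` mapping and the `find_mismatches` helper are illustrative names, not part of the original notebook, and the mapping lists only a few of the pinned packages.

```python
from importlib import metadata

# A few of the versions pinned in the install cell; extend the mapping as needed.
PINNED = {
    "chromadb": "0.5.5",
    "langchain": "0.2.11",
    "langchain-community": "0.2.10",
    "langchain-groq": "0.1.6",
}


def find_mismatches(pinned: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for every package that is missing or off-pin."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Print one line per problem; no output means every pin matches.
    for package, problem in find_mismatches(PINNED).items():
        print(f"{package}: {problem}")
```

An empty result means every listed package matches its pin.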

# **Step 2: Import Required Libraries**

Import the necessary Python and LangChain libraries for our RAG chat assistant. We'll use:

- Standard Python libraries for timing and text processing,

- Gradio for web interface,

- LangChain components for document loading, text splitting, embedding, and retrieval

In [2]:
# Import relevant libraries

import time
import textwrap
import gradio as gr
from typing import List, Tuple, Optional

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_groq import ChatGroq
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory

from config import settings
import warnings
warnings.filterwarnings('ignore')

# **Step 3: API Key Configuration**

Retrieve the Groq API key from the configuration settings. It's crucial to keep API keys secure and not hardcode them in the notebook.

**NOTE:**

If you do not have a Groq API key, visit [the Groq console](https://console.groq.com/login) to sign up and get one.

In [3]:
# Retrieve Groq API key from the `config` python script
# Assign the retrieved key to a variable
groq_api_key = settings.groq_api_key
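
The `config` module itself is not shown in this notebook. As a rough sketch of what such a module could look like, the example below reads the key from an environment variable using only the standard library. The variable name `GROQ_API_KEY`, the `Settings` dataclass, and the `load_settings` helper are all assumptions for illustration; the install cell includes `pydantic-settings`, so the real module may well be built on that instead.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container holding the Groq API key."""
    groq_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing loudly if the key is absent."""
    # GROQ_API_KEY is an assumed variable name, not taken from the notebook.
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the GROQ_API_KEY environment variable first.")
    return Settings(groq_api_key=key)
```

A `config.py` built this way would end with `settings = load_settings()`, so that `from config import settings` works as in the import cell. Keeping the key in the environment means it never appears in the notebook or its outputs.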

# **Step 4: Document Loading**

Load PDF documents from the specified file paths. This example uses multiple documents.
These PDFs are the knowledge source from which the chat assistant retrieves information.

**Note:**

- Adjust file paths according to your document locations.

- `PyPDFLoader` reads PDF files only and returns one `Document` per page; other file types need a different loader.

In [4]:
# Define file paths for documents to be loaded
file_path = [
    "/content/A_B Testing Explained.pdf",
    "/content/NigeriaDataProtectionRegulation11.pdf",
    "/content/Explore a strategy for sustained employee and organizational Performance.pdf",
    "/content/Chapter 4 Exploratory Data Analysis.pdf"
]

# Initialize a list to store loaded documents
documents = []

# Iterate through each file path,
# load the documents using PyPDFLoader
# append contents in `documents` list
for path in file_path:
    loader = PyPDFLoader(path)
    doc = loader.load()
    documents.append(doc)

In [5]:
# Preview the first document in the `documents` list
documents[0]

[Document(metadata={'source': '/content/A_B Testing Explained.pdf', 'page': 0}, page_content='A/BTESTING\nA/Btesting, also known as split testing, isasimpleyet powerfulmethod used to compare two versions of a product, webpage, orfeaturetodeterminewhichoneperformsbetter.It\'swidelyusedinmarketing, product development, and UX design to makedata-drivendecisions.\nHowA/BTestingWorks1. Formulate a Hypothesis: Start by identifying what youwant to improve. For example, you might hypothesize thatchanging the color of a "BuyNow"buttonwill increasetheconversionrate.2. Create Variants: Develop two versions of the item youwanttotest:a. A(Control):Thisistheoriginalversion.b. B (Treatment/Variant): This is the modified versionthatincludesthechangeyou\'retesting.3. Divide the Audience: Randomly split your audience intotwogroups:a. GroupA:Seesthecontrolversion.b. GroupB:Seesthevariantversion.4. Run the Test: Both groups interact with their respectiveversions, and data is collected on how they perform.
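
Note the shape of the result: because each `loader.load()` call returns a list of pages and that list is appended as a single item, `documents` is a list of lists (one inner list per PDF). The sketch below illustrates this with a hypothetical `Page` dataclass standing in for LangChain's `Document`, and shows one way to flatten the structure if per-file grouping is not needed.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class Page:
    """Hypothetical stand-in for a LangChain Document (one PDF page)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Same shape as the loading loop produces: one inner list per PDF file.
nested = [
    [Page("a1", {"source": "a.pdf", "page": 0}), Page("a2", {"source": "a.pdf", "page": 1})],
    [Page("b1", {"source": "b.pdf", "page": 0})],
]

# Flatten to a single list of pages.
all_pages = list(chain.from_iterable(nested))

print(len(nested))     # number of files
print(len(all_pages))  # total number of pages
```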

# **Step 5: Text Chunking**

Split documents into smaller, manageable text chunks to improve retrieval efficiency.

**Chunking Parameters:**

`chunk_size`: 1700 characters (adjust based on document complexity)

`chunk_overlap`: 200 characters to maintain context between chunks

In [6]:
# Initialize text splitter with specified chunk size and overlap
text_splitter = CharacterTextSplitter(
    chunk_size=1700,
    chunk_overlap=200
)

In [7]:
# Split all documents into text chunks
texts = []
for doc in documents:
    text_chunks = text_splitter.split_documents(doc)
    texts.extend(text_chunks)

In [8]:
# Preview the first text chunk
print(texts[0])

page_content='A/BTESTING
A/Btesting, also known as split testing, isasimpleyet powerfulmethod used to compare two versions of a product, webpage, orfeaturetodeterminewhichoneperformsbetter.It'swidelyusedinmarketing, product development, and UX design to makedata-drivendecisions.
HowA/BTestingWorks1. Formulate a Hypothesis: Start by identifying what youwant to improve. For example, you might hypothesize thatchanging the color of a "BuyNow"buttonwill increasetheconversionrate.2. Create Variants: Develop two versions of the item youwanttotest:a. A(Control):Thisistheoriginalversion.b. B (Treatment/Variant): This is the modified versionthatincludesthechangeyou'retesting.3. Divide the Audience: Randomly split your audience intotwogroups:a. GroupA:Seesthecontrolversion.b. GroupB:Seesthevariantversion.4. Run the Test: Both groups interact with their respectiveversions, and data is collected on how they perform. Thiscould be clicks, sign-ups, purchases, or any other metricrelevanttoyourgoal.5. 

**Code Explanation**

The code above breaks down large documents into smaller, more manageable chunks. This technique, known as text chunking, enhances the efficiency of information retrieval systems (like our chat assistant).

**Key Components:**

`CharacterTextSplitter`: This class splits text into chunks measured in characters. It first splits on a separator (by default a blank line, `"\n\n"`) and then merges the pieces into chunks. It accepts several parameters, but we use only two:

`chunk_size`: This parameter sets the target maximum number of characters in a single chunk, here 1700. A piece of text that contains no separator is not split further, so it can exceed this size. The value can be adjusted based on the complexity of the documents. For simpler texts, larger chunk sizes might be suitable, while more complex documents may benefit from smaller chunks.

`chunk_overlap`: This parameter specifies the maximum number of characters shared between consecutive chunks, here 200. The overlap helps maintain context across chunk boundaries, improving the quality of search results and analysis.
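
To make the two parameters concrete, here is a simplified sliding-window sketch in plain Python. It is not how `CharacterTextSplitter` works internally (the real class splits on a separator and then merges pieces); the `chunk_text` function is a hypothetical helper that only illustrates how size and overlap interact.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With these toy values, the last three characters of each chunk reappear at the start of the next one, which is the role the 200-character overlap plays in the notebook.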

**Text Chunking Process:**

- Iterate over Documents: The `for` loop iterates over each entry in the `documents` list. Each entry is the list of pages loaded from one PDF.

- Split Document: For each entry, `text_splitter.split_documents` divides the page text into chunks using the specified size and overlap.

- Append Chunks: The resulting chunks are added to the `texts` list with `extend`, creating a flattened list of all chunks from all documents.

**Previewing the First Chunk:**

The `print(texts[0])` line displays the content of the first chunk in the texts list. This provides a quick way to inspect the results of the chunking process.

**Why Chunking?**

- Improved Retrieval Efficiency: Smaller chunks can be indexed and searched more quickly than large documents.

- Enhanced Contextual Understanding: The overlap between chunks helps maintain context, leading to more accurate search results.

- Scalability: Chunking allows for efficient processing and storage of large document collections.

- Flexibility: Chunks can be used for various tasks, such as summarization, translation, or sentiment analysis.

# **Step 6: Embedding Generation and Vector Database Creation**

Convert text chunks into vector embeddings using Hugging Face embeddings.

**Process:**

1. Create embeddings,

2. Define persistent directory for vector database (optional)

3. Create Chroma vector database

In [9]:
# Initialize Hugging Face embeddings
embedding = HuggingFaceEmbeddings()

# Set persistent directory for vector database storage
persist_directory = "/content/chroma_db"

In [10]:
# Create Chroma vector database with embedded documents
vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    persist_directory=persist_directory
)

In [11]:
# Create retriever to fetch relevant document chunks
retriever = vectordb.as_retriever()

**Code Explanation**

The above code focuses on transforming text chunks into numerical representations (embeddings) and storing them in a vector database for efficient similarity search.

**Key Components:**

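`HuggingFaceEmbeddings`: This class wraps a pre-trained sentence-transformers model from Hugging Face that generates dense vector representations of text. Called with no arguments, as here, it loads the default model, `sentence-transformers/all-mpnet-base-v2`. Given text, it produces numerical vectors that capture semantic and syntactic information.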

`persist_directory`: The variable specifies the path to a directory where the vector database will be stored. This allows for persistent storage and retrieval of the database.

`Chroma.from_documents`: This function creates a Chroma vector database. It takes `texts` (the chunked documents), the embedding model, and the `persist_directory` as input.

- Vector Storage: Chroma stores the text chunks along with their corresponding embeddings.

- Similarity Search: The database is optimized for similarity search, enabling efficient retrieval of relevant chunks based on semantic similarity.

`retriever`: This object is created from the vector database with `as_retriever()`. It embeds an incoming query and returns the most similar document chunks (four by default).
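
Conceptually, similarity search ranks stored vectors by how close they are to the query vector. The sketch below shows this with cosine similarity over tiny hand-written vectors. The vectors, the `store` list, and the `top_k` helper are made up for illustration; Chroma itself uses an approximate index and its own configured distance metric, so this is only a picture of the idea.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings": three dimensions instead of hundreds.
store = [
    ("A/B testing compares two variants", [0.9, 0.1, 0.0]),
    ("Data protection regulation", [0.0, 0.2, 0.9]),
    ("Exploratory data analysis", [0.3, 0.9, 0.1]),
]
query = [1.0, 0.2, 0.0]
print(top_k(query, store, k=2))
```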

**Why Embeddings and Vector Databases?**

- Semantic Search: Embeddings allow for more nuanced search, going beyond exact keyword matching.

- Efficient Retrieval: Vector databases are highly optimized for similarity search, enabling fast retrieval of relevant documents.

- Contextual Understanding: Embeddings capture the semantic and syntactic context of text, leading to more accurate and relevant search results.

- Diverse Applications: This approach can be used for various tasks, including question answering, recommendation systems, and document summarization.

# **Step 7: Language Model Configuration**

Set up the Groq Language Model (LLM) with specific parameters:

- Model: Llama 3.3 70B Versatile,

- Temperature: 0.5 (balanced creativity and consistency)

**Note:**

- Temperature controls randomness of responses,

- Lower values make responses more focused and deterministic
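
The effect of temperature can be illustrated with the standard temperature-scaled softmax: scores are divided by the temperature before being turned into probabilities. The sketch below uses made-up scores and a hypothetical helper name; how Groq's service applies temperature internally is not covered in this notebook.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At lower temperatures the highest-scoring token takes a larger share of the probability, which is why responses become more focused and deterministic.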

In [16]:
# Initialize Groq Language Model
llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0.5,
    groq_api_key=groq_api_key
)

In [17]:
# Define custom prompt template
template = """
    You are a knowledgeable and helpful AI assistant.
    Use the following pieces of context to answer the question, but do not mention or refer to the context/documents in your response.
    Instead, respond as if the knowledge is part of your own understanding.

Context: {context}

Current conversation:
{chat_history}

Question: {question}

Helpful Answer:"""

# Create prompt from template
QA_PROMPT = PromptTemplate(
    template=template,
    input_variables=['context', 'chat_history', 'question']
)

# Initialize conversation memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    output_key="answer",
    return_messages=True
)

In [18]:
# Create conversational retrieval chain
conv_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    combine_docs_chain_kwargs={'prompt': QA_PROMPT},
    memory=memory,
)

**Code Explanation**

The code sets up a language model and a conversational retrieval chain to enable more sophisticated and context-aware interactions with the user.

**Key Components:**

1. **Groq Language Model (LLM):**

- **Model Selection:** The `llama-3.3-70b-versatile` model is chosen for its versatility and capability to handle a wide range of tasks.

- **Temperature Setting:** The temperature parameter is set to 0.5. This balances creativity and consistency in the model's responses. A lower temperature results in more focused and deterministic outputs.

2. **Defining a Custom Prompt Template:**

- **template:** The multiline string defines the prompt used to instruct the AI model. It opens with the instructions and then lays out the input in three placeholders:

    **Context:** A placeholder (`{context}`) for the retrieved document chunks.

    **Current conversation:** A placeholder (`{chat_history}`) for the history of the current conversation, allowing the AI to maintain context across multiple exchanges.

    **Question:** A placeholder (`{question}`) for the user's question.

Finally, it ends with the cue "Helpful Answer:" to signal where the AI's response should begin.

- **QA_PROMPT:** Creates an instance of the PromptTemplate class.

    **template:** This argument assigns the defined template string to the PromptTemplate object.

    **input_variables:** This argument specifies the names of the variables that will be dynamically filled in the template.

- **memory:** This line creates an instance of the `ConversationBufferMemory` class, which stores and manages the conversation history.

    **memory_key:** This argument specifies the name under which the history is passed to the chain. It matches the `{chat_history}` placeholder in the template.

    **output_key:** This argument specifies which of the chain's outputs is saved as the AI's response. It is needed here because the chain returns both `answer` and `source_documents`.

    **return_messages:** When set to `True`, the history is returned as a list of message objects rather than as a single formatted string.

3. **Conversational Retrieval Chain:**

- **LLM Integration:** The ChatGroq LLM is integrated into the conversational retrieval chain.

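- **Chain Type:** The `stuff` chain type is used: all retrieved chunks are inserted ("stuffed") into a single prompt and sent to the LLM in one call. This is the simplest option and works as long as the retrieved chunks fit in the model's context window.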

- **Retriever Integration:** The retriever object, created earlier, is integrated into the chain. This allows the LLM to access relevant information from the vector database during the conversation.

- **Source Document Return:** The `return_source_documents` parameter is set to True, allowing the chain to return the source documents that were used to generate the response. This can be helpful for fact-checking and transparency.

- **Using the template to customise the AI reply:** The `combine_docs_chain_kwargs` parameter is a dictionary whose `'prompt'` key holds the pre-defined prompt template (`QA_PROMPT`). This ensures that the language model uses `QA_PROMPT` when it combines the retrieved chunks into an answer.
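
To see what the "stuff" approach and the prompt template amount to, the sketch below fills a shortened version of the template with plain string formatting. The `build_stuff_prompt` helper and the way the history is rendered are assumptions for illustration; the real chain formats the history and documents through LangChain's own machinery.

```python
# Shortened version of the notebook's template, with the same three placeholders.
TEMPLATE = (
    "Context: {context}\n\n"
    "Current conversation:\n{chat_history}\n\n"
    "Question: {question}\n\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks: list[str], chat_history: list[tuple[str, str]], question: str) -> str:
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)  # "stuff" every chunk into a single prompt
    history_text = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return TEMPLATE.format(context=context, chat_history=history_text, question=question)


prompt = build_stuff_prompt(
    chunks=["A/B testing compares two versions.", "Group A sees the control."],
    chat_history=[("Hi", "Hello! How can I help?")],
    question="What is A/B testing?",
)
print(prompt)
```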

**How it Works:**

- User Query: A user poses a query.

- Retrieval: The retriever searches the vector database for relevant document chunks based on semantic similarity.

- Prompt Assembly: The retrieved chunks, the conversation history, and the query are inserted into the custom template.

- Response Generation: The LLM processes the filled-in prompt and formulates a response grounded in the retrieved information.

- Response and Source Documents: The generated response and the source documents used to create it are returned to the user.
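
The flow above can be mimicked with a toy chain built from stand-ins. In this sketch, `keyword_retriever` and `echo_llm` are hypothetical replacements for the vector retriever and the Groq model, and `ToyConversationalChain` is not a LangChain class; the point is only to show the order of operations and how each turn is added to memory.

```python
def keyword_retriever(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Hypothetical stand-in for the vector retriever: rank chunks by shared words."""
    words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: reports how much text it received."""
    return f"(stub answer based on {len(prompt)} prompt characters)"


class ToyConversationalChain:
    def __init__(self, chunks, retriever=keyword_retriever, llm=echo_llm):
        self.chunks = chunks
        self.retriever = retriever
        self.llm = llm
        self.memory = []  # list of (question, answer) pairs

    def invoke(self, inputs: dict) -> dict:
        question = inputs["question"]
        sources = self.retriever(question, self.chunks)  # retrieval
        history = "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.memory)
        prompt = f"Context: {' '.join(sources)}\n{history}\nQuestion: {question}"
        answer = self.llm(prompt)  # generation
        self.memory.append((question, answer))  # remember the turn
        return {"answer": answer, "source_documents": sources}


chain = ToyConversationalChain([
    "a/b testing compares variants",
    "data protection rules",
    "exploratory data analysis",
])
result = chain.invoke({"question": "what is a/b testing"})
print(result["answer"])
print(result["source_documents"])
```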

# **Step 8: Using Gradio, create an interactive web interface**

Create an interactive web interface for question processing using Gradio.

**Key components:**

- Define a class to process user questions with:

    - Error handling,

    - Chat history management

- Web Interface Features:

    - Text input for questions,

    - Chatbot conversation history,

    - Send and Clear Chat buttons

In [19]:
# Invoke the conversational chain to ask our question and get a response
question = "What is AB testing?"
response = conv_chain.invoke({"question": question, "chat_history": []})
print(f"Answer: {response['answer']}")
print(f"Source Document: {response['source_documents']}")

Answer: A/B testing is a straightforward and effective method used to compare two versions of a product, webpage, or feature to determine which one performs better. It involves creating two versions, a control and a variant, and randomly splitting an audience into two groups to see how they interact with each version. The goal is to collect data on how each version performs and then analyze the results to make informed decisions about which version is more effective. This approach helps remove guesswork and allows decisions to be based on actual data, leading to optimized user experiences and increased conversions.
Source Document: [Document(metadata={'page': 7, 'source': '/content/A_B Testing Explained.pdf'}, page_content='BenefitsofA/BTesting\n● Data-Driven Decisions: A/B testing removes guesswork,allowingdecisionstobebasedonactualdata.● Optimization: It helps optimize user experience andincreaseconversionsbyidentifyingwhatworksbest.● Cost-Effective: Implementing successful changesid

In [20]:
class ChatInterface:
    def __init__(self, conversation_chain):
        self.conv_chain = conversation_chain

    def process_message(
        self,
        message: str,
        history: Optional[List[Tuple[str, str]]] = None
    ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
        try:
            history = history or []

            # Get response from conversation chain
            response = self.conv_chain.invoke({
                "question": message,
                "chat_history": history
            })

            if isinstance(response, dict):
                response = response.get('answer', 'No response found')

            # Update history with new message pair
            new_history = history + [(message, response)]

            # Return both the display history and state history
            return new_history, new_history

        except Exception as e:
            error_msg = f"Error processing message: {str(e)}"
            print(f"Error occurred: {error_msg}")
            return history + [(message, error_msg)], history + [(message, error_msg)]

    def create_interface(self) -> gr.Blocks:
        with gr.Blocks(title="Personal AI Chat Assistant") as interface:
            gr.Markdown(
                """<div style="text-align: center; font-size: 2.5em;">
                <strong>
                Personal AI Chat Assistant
                </strong>
                </div>
                """
                )
            gr.Markdown("Hi! 👋🏿")
            gr.Markdown("Ask me anything about your documents, I'm here to help.")

            chatbot = gr.Chatbot(
                height=400,
                show_label=False,
                container=True,
                bubble_full_width=False
            )

            with gr.Row():
                msg = gr.Textbox(
                    placeholder="Type your question here...",
                    show_label=False,
                    container=False,
                    scale=5
                )
                submit = gr.Button(
                    "Send",
                    scale=1,
                    variant="primary"
                )

            clear = gr.Button("Clear Chat")
            state = gr.State([])

            # Modified event handlers to match return type
            submit_click = submit.click(
                self.process_message,
                inputs=[msg, state],
                outputs=[chatbot, state],
                api_name="submit"
            )

            msg_submit = msg.submit(
                self.process_message,
                inputs=[msg, state],
                outputs=[chatbot, state],
                api_name="submit_message"
            )

            clear.click(
                lambda: ([], []),
                outputs=[chatbot, state],
                api_name="clear"
            )

            # Clear input after sending
            submit_click.then(lambda: "", None, msg)
            msg_submit.then(lambda: "", None, msg)

        return interface

**Code Summary**

The code creates a chatbot interface that allows users to interact with a conversation chain. The user types their questions, and the system retrieves and displays responses through the conversation chain. The conversation history is maintained for context. Error handling is included to gracefully handle unexpected issues during processing.

**Code Explanation**

- **Class:** `ChatInterface`

This code defines a class named `ChatInterface` that acts as a user interface for a chatbot. It processes and responds to user queries.

- `__init__`: This method initializes the `ChatInterface` object. It takes a `conversation_chain` argument, which handles the conversation flow and generates responses.

- `process_message`: This method is the core of the class. It takes two arguments:

    - message: This is a string representing the user's question or input.

    - history (optional): This is a list of tuples where each tuple represents a message exchange in the conversation. The first element in the tuple is the user's message, and the second element is the system's response. If not provided, an empty list is used.

    The method first initializes an empty history list if none is provided.

    Then, it retrieves a response from the conversation_chain by invoking it with a dictionary containing the message and conversation history.

    If the response is a dictionary, it extracts the 'answer' key's value as the response. If the key is not found, it defaults to "No response found".

    A new conversation history is created by appending the current message and response to the existing history.

    The method returns two lists of tuples:

      - The first list is the complete conversation history, shown in the `gr.Chatbot` display.

      - The second list is the same history, stored in `gr.State` so that it is passed back in on the next call.

    In case of any exceptions during processing, the method catches the error, logs it with a message, and returns the conversation history with an error message appended as the system's response.

- `create_interface`: This method builds the graphical user interface for the chatbot using the Gradio (`gr`) library.

The method uses `gr.Blocks` to define a block container titled "Personal AI Chat Assistant".

Within the block, it displays a welcome message, followed by:

  - A `gr.Chatbot` element to display the conversation.

  - A row holding a text box for the user's question and a "Send" button to submit it.

  - A "Clear Chat" button to clear the chat history.

Event handlers are defined for the Send button's click event and the text box's submit event. Both call `process_message` with the user's input and the current conversation history, then update the chatbot element and the internal state with the result. A follow-up handler empties the text box after each message.

The Clear Chat button's click handler resets both the chatbot element and the state to empty lists.
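
The message-handling logic can be tried out without Gradio or a real LLM. The sketch below is a standalone function modelled on `process_message`, paired with a hypothetical `StubChain` that either answers or raises, so that both the normal path and the error path can be seen.

```python
from typing import List, Optional, Tuple

History = List[Tuple[str, str]]


class StubChain:
    """Hypothetical stand-in for the real conversation chain."""

    def __init__(self, fail: bool = False):
        self.fail = fail

    def invoke(self, inputs: dict) -> dict:
        if self.fail:
            raise RuntimeError("backend unavailable")
        return {"answer": f"You asked: {inputs['question']}"}


def process_message(chain, message: str, history: Optional[History] = None) -> Tuple[History, History]:
    """Send one message through the chain and return (display history, state history)."""
    history = history or []
    try:
        result = chain.invoke({"question": message, "chat_history": history})
        reply = result.get("answer", "No response found") if isinstance(result, dict) else str(result)
    except Exception as exc:
        # Show the failure as the assistant's reply instead of crashing the UI.
        reply = f"Error processing message: {exc}"
    new_history = history + [(message, reply)]
    return new_history, new_history


display, state = process_message(StubChain(), "What is A/B testing?")
print(display)
display, state = process_message(StubChain(fail=True), "And now?", state)
print(display[-1])
```

Returning the same list twice mirrors the notebook's design: one copy updates the visible chat, the other is kept as state for the next call.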

In [21]:
# Initialize and launch the interface
chat_interface = ChatInterface(conv_chain)
interface = chat_interface.create_interface()
interface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://d2d12d5c13d47deb5c.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




**Code Summary**

The code does the following:

1. **Creates the ChatInterface object:** This object encapsulates the logic for interacting with the conversation chain and building the GUI.

2. **Creates the GUI:** The `create_interface()` method generates the visual components of the chatbot, such as the chatbox, input fields, and buttons.

3. **Launches the GUI and enables sharing:** The `launch(share=True)` method starts the chatbot interface and, because `share=True`, also creates a temporary public URL through which others can access it.

Ultimately, this sequence of actions initializes and starts the chatbot application, making it ready for user interaction.