**DEVELOPING A CHAT ASSISTANT USING RETRIEVAL AUGMENTED GENERATION (RAG)**

# **Project Overview:**

This Jupyter Notebook demonstrates the implementation of a Retrieval Augmented Generation (RAG) chat assistant using:

- LangChain for document processing and retrieval,

- Hugging Face Embeddings for text vectorization,

- Chroma Vector Database for document storage,

- Groq's LLM for generating responses,

- Gradio for creating an interactive web interface

Key Components:

- Document Loading,

- Text Splitting,

- Embedding Generation,

- Vector Database Creation,

- Conversational Retrieval Chain,

- Gradio Interface

# **Step 1: Library Installation**

Install the required libraries for our RAG chat assistant. Most packages are pinned to specific versions to keep the stack compatible; `gradio` and `pydantic-settings` are left unpinned.

In [23]:
# Install relevant libraries with specific versions
!pip install chromadb==0.5.5 langchain-chroma==0.1.2 langchain==0.2.11 langchain-community==0.2.10 langchain-text-splitters==0.2.2 langchain-groq==0.1.6 transformers==4.43.2 sentence-transformers==3.0.1 unstructured==0.15.0 unstructured[pdf]==0.15.0 gradio pydantic-settings
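
Because the pins matter, it can help to confirm what actually got installed. The sketch below is one way to do that with the standard library's `importlib.metadata`. The `PINNED` dictionary and the `check_versions` helper are illustrative names, not part of the original notebook, and only a subset of the pins is listed.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command above (subset shown for brevity).
PINNED = {
    "chromadb": "0.5.5",
    "langchain": "0.2.11",
    "langchain-community": "0.2.10",
    "langchain-groq": "0.1.6",
}


def check_versions(pinned, lookup=version):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches


# Usage (hypothetical): report anything that differs from the pins.
# for name, found in check_versions(PINNED).items():
#     print(f"{name}: expected {PINNED[name]}, found {found}")
```

An empty result means every listed package matches its pin.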



# **Step 2: Import Required Libraries**

Import the necessary Python and LangChain libraries for our RAG chat assistant. We'll use:

- Standard Python libraries for timing and text processing,

- Gradio for web interface,

- LangChain components for document loading, text splitting, embedding, and retrieval

In [25]:
# Import relevant libraries

import time
import textwrap
import gradio as gr

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_groq import ChatGroq
from langchain.chains import ConversationalRetrievalChain

from config import settings
import warnings
warnings.filterwarnings('ignore')

# **Step 3: API Key Configuration**

Retrieve the Groq API key from the configuration settings. It's crucial to keep API keys secure and not hardcode them in the notebook.

**NOTE:**

If you do not have a Groq API key, please visit [the link](https://console.groq.com/login) to sign up and get one.

In [26]:
# Retrieve Groq API key from the `config` python script
# Assign the retrieved key to a variable
groq_api_key = settings.groq_api_key
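
The notebook does not show `config.py`, so its contents are an assumption. One minimal, standard-library way to provide a `settings` object with a `groq_api_key` attribute is to read the key from an environment variable. The variable name `GROQ_API_KEY` and the `load_settings` helper below are hypothetical; since `pydantic-settings` is installed in Step 1, the original file may well be built on that library instead.

```python
# config.py (hypothetical sketch; the original file is not shown)
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str


def load_settings(env=None):
    """Build Settings from environment variables, failing loudly if the key is missing."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return Settings(groq_api_key=key)


# In config.py, this line would make `from config import settings` work:
# settings = load_settings()
```

Failing at import time when the key is absent surfaces the problem before any request is sent to Groq.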

# **Step 4: Document Loading**

Load PDF documents from the specified file paths. This example uses four documents, which serve as the knowledge base the chat assistant retrieves information from.

**Note:**

- Adjust file paths according to your document locations.

- `PyPDFLoader` loads PDF files only and returns one `Document` per page; other file types need a different loader.

In [27]:
# Define file paths for documents to be loaded
file_path = [
    "/content/A_B Testing Explained.pdf",
    "/content/NigeriaDataProtectionRegulation11.pdf",
    "/content/Explore a strategy for sustained employee and organizational Performance.pdf",
    "/content/Chapter 4 Exploratory Data Analysis.pdf"
]

# Initialize a list to store loaded documents
documents = []

# Iterate through each file path,
# load the PDF using PyPDFLoader (one Document per page),
# and append that list of pages to the `documents` list
for path in file_path:
    loader = PyPDFLoader(path)
    doc = loader.load()
    documents.append(doc)

In [28]:
# Preview the first document in the `documents` list
documents[0]

[Document(metadata={'source': '/content/A_B Testing Explained.pdf', 'page': 0}, page_content='A/BTESTING\nA/Btesting, also known as split testing, isasimpleyet powerfulmethod used to compare two versions of a product, webpage, orfeaturetodeterminewhichoneperformsbetter.It\'swidelyusedinmarketing, product development, and UX design to makedata-drivendecisions.\nHowA/BTestingWorks1. Formulate a Hypothesis: Start by identifying what youwant to improve. For example, you might hypothesize thatchanging the color of a "BuyNow"buttonwill increasetheconversionrate.2. Create Variants: Develop two versions of the item youwanttotest:a. A(Control):Thisistheoriginalversion.b. B (Treatment/Variant): This is the modified versionthatincludesthechangeyou\'retesting.3. Divide the Audience: Randomly split your audience intotwogroups:a. GroupA:Seesthecontrolversion.b. GroupB:Seesthevariantversion.4. Run the Test: Both groups interact with their respectiveversions, and data is collected on how they perform.
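
A missing or mistyped path only surfaces as an error partway through the loading loop above. A small pre-check can separate usable paths from missing ones before any loader is created. This is a sketch using only the standard library; `split_existing` is an illustrative helper name, not part of the notebook.

```python
from pathlib import Path


def split_existing(paths):
    """Separate paths into (found, missing) lists, preserving order."""
    found, missing = [], []
    for p in paths:
        (found if Path(p).is_file() else missing).append(p)
    return found, missing


# Usage with the notebook's list (hypothetical):
# found, missing = split_existing(file_path)
# if missing:
#     print("Skipping missing files:", missing)
```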

# **Step 5: Text Chunking**

Split documents into smaller, manageable text chunks to improve retrieval efficiency.

**Chunking Parameters:**

`chunk_size`: 1700 characters (adjust based on document complexity)

`chunk_overlap`: 200 characters to maintain context between chunks

In [29]:
# Initialize text splitter with specified chunk size and overlap
text_splitter = CharacterTextSplitter(
    chunk_size=1700,
    chunk_overlap=200
)

In [30]:
# Split all documents into text chunks
texts = []
for doc in documents:
    text_chunks = text_splitter.split_documents(doc)
    texts.extend(text_chunks)

In [31]:
# Preview the first text chunk
print(texts[0])

page_content='A/BTESTING
A/Btesting, also known as split testing, isasimpleyet powerfulmethod used to compare two versions of a product, webpage, orfeaturetodeterminewhichoneperformsbetter.It'swidelyusedinmarketing, product development, and UX design to makedata-drivendecisions.
HowA/BTestingWorks1. Formulate a Hypothesis: Start by identifying what youwant to improve. For example, you might hypothesize thatchanging the color of a "BuyNow"buttonwill increasetheconversionrate.2. Create Variants: Develop two versions of the item youwanttotest:a. A(Control):Thisistheoriginalversion.b. B (Treatment/Variant): This is the modified versionthatincludesthechangeyou'retesting.3. Divide the Audience: Randomly split your audience intotwogroups:a. GroupA:Seesthecontrolversion.b. GroupB:Seesthevariantversion.4. Run the Test: Both groups interact with their respectiveversions, and data is collected on how they perform. Thiscould be clicks, sign-ups, purchases, or any other metricrelevanttoyourgoal.5. 

**Code Explanation**

The code above breaks down large documents into smaller, more manageable chunks. This technique, known as text chunking, enhances the efficiency of information retrieval systems (like our chat assistant).

**Key Components:**

`CharacterTextSplitter`: This class splits text on a separator (by default a blank line, `"\n\n"`) and merges the resulting pieces into chunks measured in characters. It accepts several parameters, but we set only two:

`chunk_size`: This parameter defines the target maximum number of characters in a single chunk. In this case, it's set to 1700 characters. Because the splitter only cuts at separators, a piece of text that contains no separator can exceed this size. The value can be adjusted based on the complexity of the documents. For simpler texts, larger chunk sizes might be suitable, while more complex documents may benefit from smaller chunks.

`chunk_overlap`: This parameter specifies the number of characters shared between consecutive chunks. In this case, 200 characters overlap. The overlap carries context across chunk boundaries, so information that falls near a boundary is less likely to be lost at retrieval time.
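
To make the two parameters concrete, here is a deliberately simplified sliding-window chunker. It is not how `CharacterTextSplitter` works internally (that class splits on separators first and then merges the pieces); the `sliding_window_chunks` function is purely illustrative and only shows how size and overlap interact.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the one-character overlap requested.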

**Text Chunking Process:**

- Iterate over Documents: The `for` loop iterates over each entry in the `documents` list (one list of pages per PDF).

- Split Document: For each entry, `text_splitter.split_documents` divides the pages into chunks of the specified size and overlap.

- Append Chunks: The resulting chunks are added to the `texts` list with `extend`, creating a flattened list of all chunks from all documents.

**Previewing the First Chunk:**

The `print(texts[0])` line displays the content of the first chunk in the texts list. This provides a quick way to inspect the results of the chunking process.

**Why Chunking?**

- Improved Retrieval Efficiency: Smaller chunks can be indexed and searched more quickly than large documents.

- Enhanced Contextual Understanding: The overlap between chunks helps maintain context, leading to more accurate search results.

- Scalability: Chunking allows for efficient processing and storage of large document collections.

- Flexibility: Chunks can be used for various tasks, such as summarization, translation, or sentiment analysis.

# **Step 6: Embedding Generation and Vector Database Creation**

Convert text chunks into vector embeddings using Hugging Face embeddings.

**Process:**

1. Create embeddings,

2. Define persistent directory for vector database (optional)

3. Create Chroma vector database

In [32]:
# Initialize Hugging Face embeddings
embedding = HuggingFaceEmbeddings()

# Set persistent directory for vector database storage
persist_directory = "/content/chroma_db"

In [33]:
# Create Chroma vector database with embedded documents
vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    persist_directory=persist_directory
)

In [34]:
# Create retriever to fetch relevant document chunks
retriever = vectordb.as_retriever()

**Code Explanation**

The above code focuses on transforming text chunks into numerical representations (embeddings) and storing them in a vector database for efficient similarity search.

**Key Components:**

`HuggingFaceEmbeddings`: This is a LangChain wrapper around a pre-trained sentence-transformers model hosted on Hugging Face. Called with no arguments, as here, it loads the default model (`sentence-transformers/all-mpnet-base-v2`). When provided with text, it produces dense numerical vectors that capture semantic and syntactic information.

`persist_directory`: This variable specifies the path to the directory where the vector database is stored on disk, so it can be reloaded later without re-embedding the documents.

`Chroma.from_documents`: This function creates a Chroma vector database. It takes `texts` (the chunked documents), the embedding model, and `persist_directory` as input.

- Vector Storage: Chroma stores the text chunks along with their corresponding embeddings.

- Similarity Search: The database is optimized for similarity search, enabling efficient retrieval of relevant chunks based on semantic similarity.

`retriever`: This object is created from the vector database with `as_retriever()`. It processes a query and returns the most relevant document chunks based on semantic similarity.
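
The idea behind similarity search can be sketched in a few lines. The example below ranks texts by cosine similarity over hand-written three-dimensional vectors; the vectors, the `toy_index` data, and the helper names are all made up for illustration. Real embeddings have hundreds of dimensions, and the distance metric Chroma actually uses depends on how the collection is configured. The default of `k=4` mirrors the number of chunks a LangChain retriever returns when no other value is set.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=4):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index of (text, embedding) pairs; values are invented.
toy_index = [
    ("A/B testing compares two variants", [0.9, 0.1, 0.0]),
    ("Data protection regulation overview", [0.0, 0.2, 0.9]),
    ("Exploratory data analysis basics", [0.3, 0.9, 0.1]),
]
query = [0.8, 0.2, 0.0]  # pretend embedding of "What is AB testing?"
print(top_k(query, toy_index, k=2))
```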

**Why Embeddings and Vector Databases?**

- Semantic Search: Embeddings allow for more nuanced search, going beyond exact keyword matching.

- Efficient Retrieval: Vector databases are highly optimized for similarity search, enabling fast retrieval of relevant documents.

- Contextual Understanding: Embeddings capture the semantic and syntactic context of text, leading to more accurate and relevant search results.

- Diverse Applications: This approach can be used for various tasks, including question answering, recommendation systems, and document summarization.

# **Step 7: Language Model Configuration**

Set up the Groq Language Model (LLM) with specific parameters:

- Model: Llama 3.1 70B Versatile,

- Temperature: 0.5 (balanced creativity and consistency)

**Note:**

- Temperature controls randomness of responses,

- Lower values make responses more focused and deterministic
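
To see why lower values make output more deterministic, consider how temperature is commonly applied when sampling the next token: the model's raw scores are divided by the temperature before being turned into probabilities. The sketch below uses invented scores and an illustrative helper name; it shows the general technique, not Groq's server-side implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, more of the probability mass concentrates on the highest-scoring token.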

In [35]:
# Initialize Groq Language Model
llm = ChatGroq(
    model="llama-3.1-70b-versatile",
    temperature=0.5,
    groq_api_key=groq_api_key
)

In [36]:
# Create conversational retrieval chain
conv_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

**Code Explanation**

The code sets up a language model and a conversational retrieval chain to enable more sophisticated and context-aware interactions with the user.

**Key Components:**

1. **Groq Language Model (LLM):**

- **Model Selection:** The llama-3.1-70b-versatile model is chosen for its versatility and capability to handle a wide range of tasks.

- **Temperature Setting:** The temperature parameter is set to 0.5. This balances creativity and consistency in the model's responses. A lower temperature results in more focused and deterministic outputs.

2. **Conversational Retrieval Chain:**

- **LLM Integration:** The ChatGroq LLM is integrated into the conversational retrieval chain.

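- **Chain Type:** The `stuff` chain type is used: all retrieved chunks are inserted ("stuffed") into a single prompt and sent to the LLM in one call. This is simple and works well as long as the retrieved chunks fit within the model's context window.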

- **Retriever Integration:** The retriever object, created earlier, is integrated into the chain. This allows the LLM to access relevant information from the vector database during the conversation.

- **Source Document Return:** The return_source_documents parameter is set to True, allowing the chain to return the source documents that were used to generate the response. This can be helpful for fact-checking and transparency.

**How it Works:**

- User Query: A user poses a query.

- Question Condensing: If there is chat history, the chain first asks the LLM to rewrite the query and the history into a standalone question.

- Retrieval: The retriever searches the vector database for relevant document chunks based on semantic similarity.

- LLM Processing: The LLM receives the question together with the retrieved chunks.

- Response Generation: The LLM uses the retrieved information to formulate a response.

- Response and Source Documents: The generated response and the source documents used to create it are returned to the user.
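
The "stuff" step in this flow can be pictured as plain string assembly. The sketch below joins the retrieved chunks into one context block and appends the question. The prompt wording and the `build_stuff_prompt` helper are hypothetical; LangChain's actual templates differ.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is AB testing?",
    ["A/B testing compares two versions.", "It supports data-driven decisions."],
)
print(prompt)
```

Because every chunk lands in the same prompt, the total length grows with the number and size of the retrieved chunks.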

# **Step 8: Question Processing Function**

Implement a robust function to process user questions with:

- Error handling,

- Response time tracking,

- Chat history management

In [37]:
# Invoke the conversational chain to ask our question and get a response
question = "What is AB testing?"
response = conv_chain.invoke({"question": question, "chat_history": []})
print(f"Answer: {response['answer']}")
print(f"Source Document: {response['source_documents']}")

Answer: A/B testing, also known as split testing, is a simple yet powerful method used to compare two versions of a product, webpage, or feature to determine which one performs better. It's widely used in marketing, product development, and UX design to make data-driven decisions.
Source Document: [Document(metadata={'page': 7, 'source': '/content/A_B Testing Explained.pdf'}, page_content='BenefitsofA/BTesting\n● Data-Driven Decisions: A/B testing removes guesswork,allowingdecisionstobebasedonactualdata.● Optimization: It helps optimize user experience andincreaseconversionsbyidentifyingwhatworksbest.● Cost-Effective: Implementing successful changesidentifiedthrough A/B testing can lead to significant improvementswithoutmajorinvestments.\nCommonMistakestoAvoid\n● TestingTooManyChangesat Once: Focusononechangeatatimetoclearlyunderstanditsimpact.● Stopping the Test Too Early: Ensure the test runs longenoughtogathersufficientdata.● Ignoring External Factors: Consider other variables, like

In [38]:
def process_question(user_question, history):
    """
    Process a user's question and retrieve an answer using a conversational retrieval chain.

    Args:
        user_question (str): User's input question,
        history (list): Previous conversation history

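    Returns:
        tuple: (history, history, full_response), i.e. the updated chat history
            twice (for the Chatbot display and the State) and the latest response text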
    """
    try:
        start_time = time.time()

        # Initialize empty history if None
        if history is None:
            history = []

        # Prepare chat history for the retrieval chain
        chat_history = [(h[0], h[1].split("\n\nResponse time:")[0]) for h in history]

        # Debug print
        print(f"Processing question: {user_question}")
        print(f"Chat history: {chat_history}")

        # Invoke conversational chain with both the question and chat_history
        response = conv_chain.invoke({"question": user_question, "chat_history": chat_history})

        # If response is a dict, extract the actual response text
        if isinstance(response, dict) and 'answer' in response:
            response = response['answer']

        # Measure the response time
        end_time = time.time()
        response_time = f"Response time: {end_time - start_time:.2f} seconds."

        # Combine the response and the response time
        full_response = f"{response}\n\n{response_time}"

        # Update the conversation history
        history.append((user_question, full_response))

        # Debug print
        print(f"Processed successfully. Response: {full_response}")

        return history, history, full_response

    except Exception as e:
        error_message = f"An error occurred: {str(e)}"
        print(error_message)
        return history, history, error_message
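
One detail in `process_question` is easy to miss: the response time is appended to each stored answer for display, then stripped again before the history is passed back to the chain, so the timing text never reaches the LLM. The standalone sketch below isolates that round trip; `strip_timing` and `MARKER` are illustrative names, not part of the notebook.

```python
MARKER = "\n\nResponse time:"


def strip_timing(history):
    """Return (question, answer) pairs with any trailing timing note removed."""
    return [(question, answer.split(MARKER)[0]) for question, answer in history]


history = [
    (
        "What is AB testing?",
        "A method for comparing two variants.\n\nResponse time: 1.23 seconds.",
    )
]
print(strip_timing(history))
```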



# **Step 9: Gradio Interface**

Create an interactive web interface for the chat assistant using Gradio.

Interface Features:

- Text input for questions,

- Chatbot conversation history,

- Latest answer display

In [39]:
# Setup the Gradio interface for the chat assistant
iface = gr.Interface(
    fn=process_question,
    inputs=[
        gr.Textbox(lines=2, placeholder="Type your question here..."),
        gr.State()
    ],
    outputs=[
        gr.Chatbot(),
        gr.State(),
        gr.Textbox(label="Latest Answer")
    ],
    title="Chat Assistant",
    description="Ask any question about the documents provided."
)

# Launch the interface
# Note: share=True enables public link generation
iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://5ac638786de20827a4.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)