##**Building a Retrieval-Augmented Generation (RAG) Chatbot**

Using Gemini, LangChain, and ChromaDB

This notebook will guide you through implementing the backend components of a RAG chatbot system.

##Setup and Prerequisites

First, let's install the necessary libraries.

In [1]:
# Install required packages
!pip install langchain langchain-google-genai langchain_community pypdf chromadb sentence-transformers -q
!pip install google-generativeai pdfplumber -q


Next, let's import all required libraries:

In [2]:
import os
import pdfplumber
import google.generativeai as genai
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

In [4]:
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get("GEMINI_API_KEY")
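The cell above relies on `google.colab.userdata`, which is only available inside Colab. Below is a minimal sketch of a fallback for other environments, assuming the key has been exported as an environment variable. The helper name `load_api_key` and the default variable names are illustrative assumptions, not part of any library.

```python
import os


def load_api_key(env=None, names=("GOOGLE_API_KEY", "GEMINI_API_KEY")):
    """Return the first non-empty API key found under the given names.

    Hypothetical helper: `env` defaults to os.environ, and passing a plain
    dict instead makes the function easy to test.
    """
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        f"No API key found; set one of {', '.join(names)} before running."
    )
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first model call.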

##**Section 1: Uploading PDF**
In this section, we'll implement the functionality to upload PDF files. For this notebook demonstration, we'll assume the PDF is in a local path.

In [5]:
def upload_pdf(pdf_path):
    """
    Function to handle PDF uploads.

    Args:
        pdf_path (str): Path to the PDF file

    Returns:
        str: PDF file path if successful
    """
    try:
        # In a real application with Streamlit, you would use:
        # uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")
        # But for this notebook, we'll just verify the file exists

        if os.path.exists(pdf_path):
            print(f"PDF file found at: {pdf_path}")
            return pdf_path
        else:
            print(f"Error: File not found at {pdf_path}")
            return None
    except Exception as e:
        print(f"Error uploading PDF: {e}")
        return None

In [6]:
attention_paper_path = "/content/attention_is_all_u_need.pdf"

In [7]:
upload_pdf(attention_paper_path)

PDF file found at: /content/attention_is_all_u_need.pdf


'/content/attention_is_all_u_need.pdf'
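`upload_pdf` only checks that the path exists. As a small optional hardening step, one could also check that the file starts with the `%PDF-` header that well-formed PDF files begin with. The helper name `looks_like_pdf` is an assumption made for this sketch.

```python
import os


def looks_like_pdf(path):
    """Return True if `path` is a file that starts with the PDF header bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        # Well-formed PDFs begin with "%PDF-" followed by a version number
        return f.read(5) == b"%PDF-"
```

This catches renamed or truncated files before they reach the parser, but it does not validate the rest of the document.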

##**Section 2: Parsing the PDF and Creating Text Files**
Now we'll extract the text content from the uploaded PDFs.

In [8]:
def parse_pdf(pdf_path):
    """
    Function to extract text from PDF files.

    Args:
        pdf_path (str): Path to the PDF file

    Returns:
        str: Extracted text from the PDF
    """
    try:
        text = ""

        # Using pdfplumber to extract text
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                # Guard against extract_text() returning None for a page with no text
                text += (page.extract_text() or "") + "\n"

        # Save the extracted text to a file (optional)
        text_file_path = pdf_path.replace('.pdf', '.txt')
        with open(text_file_path, 'w', encoding='utf-8') as f:
            f.write(text)

        print(f"PDF parsed successfully, extracted {len(text)} characters")
        return text
    except Exception as e:
        print(f"Error parsing PDF: {e}")
        return None

In [9]:
text_file = parse_pdf(attention_paper_path)

PDF parsed successfully, extracted 35526 characters
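Extracted PDF text often carries layout artifacts. The following is a minimal cleanup sketch using only the standard library. The function name `tidy_extracted_text` is hypothetical, and the rules are a simple heuristic: rejoining hyphenated line breaks will also merge genuine compounds that happen to break at a line end. It does not restore spaces that are missing within a line, which is an extraction-level issue.

```python
import re


def tidy_extracted_text(text):
    """Apply light, heuristic cleanup to text extracted from a PDF."""
    # Rejoin words hyphenated across a line break: "atten-\ntion" -> "attention"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```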


##**Section 3: Creating Document Chunks**
To effectively process and retrieve information, we need to break down our document into manageable chunks.

In [10]:
def create_document_chunks(text):
    """
    Function to split the document text into smaller chunks for processing.

    Args:
        text (str): The full text from the PDF

    Returns:
        list: List of text chunks
    """
    try:
        # Initialize the text splitter
        # We can tune these parameters based on our needs and model constraints
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,        # Size of each chunk in characters
            chunk_overlap=100,      # Overlap between chunks to maintain context
            length_function=len,
            separators=["\n\n", "\n", " ", ""]  # Hierarchy of separators to use when splitting
        )

        # Split the text into chunks
        chunks = text_splitter.split_text(text)

        print(f"Document split into {len(chunks)} chunks")
        print("chunks: ", chunks)
        return chunks
    except Exception as e:
        print(f"Error creating document chunks: {e}")
        return []

In [11]:
text_chunks = create_document_chunks(text_file)

Document split into 91 chunks
chunks:  ['Providedproperattributionisprovided,Googleherebygrantspermissionto\nreproducethetablesandfiguresinthispapersolelyforuseinjournalisticor\nscholarlyworks.\nAttention Is All You Need\nAshishVaswani∗ NoamShazeer∗ NikiParmar∗ JakobUszkoreit∗\nGoogleBrain GoogleBrain GoogleResearch GoogleResearch\navaswani@google.com noam@google.com nikip@google.com usz@google.com\nLlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain', 'LlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain\nllion@google.com aidan@cs.toronto.edu lukaszkaiser@google.com\nIlliaPolosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThedominantsequencetransductionmodelsarebasedoncomplexrecurrentor\nconvolutionalneuralnetworksthatincludeanencoderandadecoder. Thebest\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,', 'me

In [12]:
text_chunks

['Providedproperattributionisprovided,Googleherebygrantspermissionto\nreproducethetablesandfiguresinthispapersolelyforuseinjournalisticor\nscholarlyworks.\nAttention Is All You Need\nAshishVaswani∗ NoamShazeer∗ NikiParmar∗ JakobUszkoreit∗\nGoogleBrain GoogleBrain GoogleResearch GoogleResearch\navaswani@google.com noam@google.com nikip@google.com usz@google.com\nLlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain',
 'LlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain\nllion@google.com aidan@cs.toronto.edu lukaszkaiser@google.com\nIlliaPolosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThedominantsequencetransductionmodelsarebasedoncomplexrecurrentor\nconvolutionalneuralnetworksthatincludeanencoderandadecoder. Thebest\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,',
 'mechanism. We propose a new simple netw
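To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not what `RecursiveCharacterTextSplitter` does internally: that splitter prefers to break on the listed separators, so its chunks vary in length. The function name `sliding_window_chunks` is an assumption for illustration.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=2`, the string `"abcdefghij"` yields `["abcd", "cdef", "efgh", "ghij"]`: each chunk repeats the last two characters of the previous one, which is how overlap preserves context across chunk boundaries.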

##**Section 4: Embedding the Documents**
Now we'll create vector embeddings for each text chunk using Gemini's embedding model.

In [13]:
def embed_and_view(text_chunks):
    """
    Embed document chunks and display their numeric embeddings.

    Args:
        text_chunks (list): List of text chunks from the document
    """
    try:
        # Initialize the Gemini embeddings
        embedding_model = GoogleGenerativeAIEmbeddings(
            model="models/text-embedding-004"  # Specify the Gemini Embedding model
        )

        print("Embedding model initialized successfully")

        # Generate and display embeddings for all chunks
        for i, chunk in enumerate(text_chunks):
            embedding = embedding_model.embed_query(chunk)
            print(f"Chunk {i} Embedding:\n{embedding}\n")

    except Exception as e:
        print(f"Error embedding documents: {e}")

# Example usage
sample_chunks = ["This is the first chunk.", "This is the second chunk.", "And this is the third chunk."]
embed_and_view(sample_chunks)

Embedding model initialized successfully
Chunk 0 Embedding:
[0.005306204780936241, -0.019982466474175453, -0.05330009013414383, -0.037803467363119125, 0.0438869446516037, 0.012169086374342442, 0.011968716979026794, 0.030833037570118904, -0.015381194651126862, 0.02207416482269764, -0.01051324326545, 0.05356165021657944, 0.05694853141903877, 0.013736017979681492, 0.014268357306718826, -0.00033483054721727967, 0.026143047958612442, 0.002164868637919426, -0.10417395830154419, 0.03183707222342491, 0.0369376577436924, -0.026903631165623665, 0.035999879240989685, -0.041685134172439575, -0.014223109930753708, 0.002302129054442048, 0.00924470741301775, -0.036460429430007935, 0.037307705730199814, 0.0015566367655992508, 0.058599747717380524, 0.05178055167198181, -0.0052936505526304245, -0.04410144314169884, 0.014856294728815556, 0.018107743933796883, -0.0010075304890051484, 0.017477499321103096, 0.024988515302538872, -0.02734184078872204, -0.08513811230659485, 0.0653407871723175, -0.025275081396

In [14]:
def embed_documents(text_chunks):
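    """
    Initialize the embedding model used for the text chunks.

    No embeddings are computed here; the chunks are embedded when they are
    stored in the vector database.

    Args:
        text_chunks (list): List of text chunks from the document

    Returns:
        tuple: (embedding_model, text_chunks), or (None, None) on failure
    """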
    try:
        # Initialize the Gemini embeddings
        embedding_model = GoogleGenerativeAIEmbeddings(
            model="models/text-embedding-004"  # Specify the Gemini Embedding model
        )

        print("Embedding model initialized successfully")
        return embedding_model, text_chunks
    except Exception as e:
        print(f"Error embedding documents: {e}")
        return None, None

In [15]:
embedded_documents = embed_documents(text_chunks)

Embedding model initialized successfully


##**Section 5: Storing in Vector Database (ChromaDB)**
In this section, we'll store the embedded document chunks in a vector database for efficient semantic search.

In [16]:
def store_embeddings(embedding_model, text_chunks):
    """
    Function to store document embeddings in ChromaDB.

    Args:
        embedding_model: The embedding model to use
        text_chunks (list): List of text chunks to embed and store

    Returns:
        object: Vector store for retrieval
    """
    try:
        # Create a vector store from the documents
        vectorstore = Chroma.from_texts(
            texts=text_chunks,
            embedding=embedding_model,
            persist_directory="./chroma_db"  # Directory to persist the database
        )

        # No explicit persist() call: with persist_directory set, Chroma 0.4+
        # writes to disk automatically and persist() is deprecated

        print(f"Successfully stored {len(text_chunks)} document chunks in ChromaDB")
        return vectorstore
    except Exception as e:
        print(f"Error storing embeddings: {e}")
        return None

In [17]:
chroma_store = store_embeddings(embedded_documents[0],embedded_documents[1])

Successfully stored 91 document chunks in ChromaDB




##**Section 6: Embedding User Queries**
When a user submits a query, we need to embed it using the same embedding model to find semantically similar chunks.

In [18]:
def embed_query(query, embedding_model):
    """
    Function to embed the user's query.

    Args:
        query (str): User's question
        embedding_model: The embedding model to use

    Returns:
        list: Embedded query vector
    """
    try:
        # Generate embedding for the query
        query_embedding = embedding_model.embed_query(query)

        print("Query embedded successfully")
        return query_embedding
    except Exception as e:
        print(f"Error embedding query: {e}")
        return None

In [19]:
user_query = "Who are the authors of the Attention paper?"

In [20]:
embedded_query = embed_query(user_query, embedded_documents[0])
print(embedded_query)

Query embedded successfully
[0.026780592277646065, 0.015416848473250866, -0.050271518528461456, 0.002554000820964575, -0.0059105996042490005, 0.04259137436747551, 0.026414358988404274, 0.06467478722333908, 0.01493052113801241, 0.0037402184680104256, -0.017083164304494858, 0.013827289454638958, 0.050974294543266296, 0.012638152576982975, -0.004161462187767029, -0.03828955441713333, 0.054527826607227325, 0.016758719459176064, -0.04421818256378174, 0.03707011789083481, -0.0034565047826617956, -0.01614072546362877, -0.0002991380461025983, -0.004321862477809191, 0.00837301928550005, -0.0422188974916935, 0.013816770166158676, -0.04541477933526039, -0.016824742779135704, -0.04866284877061844, -0.0005496578523889184, 0.0390058234333992, 0.0016210851026698947, -0.0188022181391716, 0.01164932269603014, 0.06262259185314178, -0.007935280911624432, 0.028559977188706398, 0.05521153286099434, -0.07550094276666641, -0.035486288368701935, -0.02005232684314251, 0.0053126011043787, 0.06321743130683899, -
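The query and the chunks must be embedded with the same model because they are compared as vectors in the same space. A minimal sketch of that comparison follows, using cosine similarity and a brute-force ranking. Cosine similarity is one common choice; the metric and index that ChromaDB actually uses depend on its configuration. The function names are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k chunk vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A vector database performs the same ranking, but with an index that avoids scoring every stored vector on each query.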

##**Section 7: Retrieval Process**
Now we'll implement the retrieval component that finds the most relevant document chunks based on the user's query.

In [21]:
def retrieve_relevant_chunks(vectorstore, query, embedding_model, k=3):
    """
    Function to retrieve the most relevant document chunks for a query.

    Args:
        vectorstore: The ChromaDB vector store
        query (str): User's question
        embedding_model: The embedding model (unused here; the vector store embeds the query with the model it was built with)
        k (int): Number of chunks to retrieve

    Returns:
        list: List of relevant document chunks
    """
    try:
        # Create a retriever from the vector store
        retriever = vectorstore.as_retriever(
            search_type="similarity",  # Can also use "mmr" for Maximum Marginal Relevance
            search_kwargs={"k": k}     # Number of documents to retrieve
        )

        # Retrieve relevant chunks; invoke() replaces the deprecated get_relevant_documents()
        relevant_chunks = retriever.invoke(query)

        print(f"Retrieved {len(relevant_chunks)} relevant document chunks")
        return relevant_chunks
    except Exception as e:
        print(f"Error retrieving chunks: {e}")
        return []

In [22]:
relevant_chunks = retrieve_relevant_chunks(chroma_store, user_query, embedded_documents[0])

Retrieved 3 relevant document chunks




In [23]:
relevant_chunks

[Document(metadata={}, page_content='Providedproperattributionisprovided,Googleherebygrantspermissionto\nreproducethetablesandfiguresinthispapersolelyforuseinjournalisticor\nscholarlyworks.\nAttention Is All You Need\nAshishVaswani∗ NoamShazeer∗ NikiParmar∗ JakobUszkoreit∗\nGoogleBrain GoogleBrain GoogleResearch GoogleResearch\navaswani@google.com noam@google.com nikip@google.com usz@google.com\nLlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain'),
 Document(metadata={}, page_content='attentionandtheparameter-freepositionrepresentationandbecametheotherpersoninvolvedinnearlyevery\ndetail.Nikidesigned,implemented,tunedandevaluatedcountlessmodelvariantsinouroriginalcodebaseand\ntensor2tensor.Llionalsoexperimentedwithnovelmodelvariants,wasresponsibleforourinitialcodebase,and\nefficientinferenceandvisualizations.LukaszandAidanspentcountlesslongdaysdesigningvariouspartsofand\nimplementingtensor2tensor,replacingourearliercodebase,greatlyimprovingresultsa
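The retriever comment mentions `"mmr"` (Maximum Marginal Relevance) as an alternative search type. The sketch below illustrates the idea in plain Python: each step picks the candidate that best balances relevance to the query against similarity to the chunks already selected. This is an illustration of the technique, not LangChain's implementation. The function names are assumptions, and `lambda_mult` is named after the retriever option of the same name.

```python
import math


def _cos(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximum Marginal Relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = _cos(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0.0 for the first pick)
            redundancy = max(
                (_cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` this reduces to plain similarity ranking. Lower values penalize near-duplicate chunks more heavily, which can help when overlapping chunks would otherwise fill all `k` slots with the same passage.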

In [24]:
def get_context_from_chunks(relevant_chunks, splitter="\n\n---\n\n"):
    """
    Extract page_content from document chunks and join them with a splitter.

    Args:
        relevant_chunks (list): List of document chunks from retriever
        splitter (str): String to use as separator between chunk contents

    Returns:
        str: Combined context from all chunks
    """
    # Extract page_content from each chunk
    chunk_contents = []

    for i, chunk in enumerate(relevant_chunks):
        if hasattr(chunk, 'page_content'):
            # Add a chunk identifier to help with tracing which chunk provided what information
            chunk_text = f"[Chunk {i+1}]: {chunk.page_content}"
            chunk_contents.append(chunk_text)

    # Join all contents with the splitter
    combined_context = splitter.join(chunk_contents)

    return combined_context

In [25]:
context = get_context_from_chunks(relevant_chunks)

In [26]:
context

'[Chunk 1]: Providedproperattributionisprovided,Googleherebygrantspermissionto\nreproducethetablesandfiguresinthispapersolelyforuseinjournalisticor\nscholarlyworks.\nAttention Is All You Need\nAshishVaswani∗ NoamShazeer∗ NikiParmar∗ JakobUszkoreit∗\nGoogleBrain GoogleBrain GoogleResearch GoogleResearch\navaswani@google.com noam@google.com nikip@google.com usz@google.com\nLlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain\n\n---\n\n[Chunk 2]: attentionandtheparameter-freepositionrepresentationandbecametheotherpersoninvolvedinnearlyevery\ndetail.Nikidesigned,implemented,tunedandevaluatedcountlessmodelvariantsinouroriginalcodebaseand\ntensor2tensor.Llionalsoexperimentedwithnovelmodelvariants,wasresponsibleforourinitialcodebase,and\nefficientinferenceandvisualizations.LukaszandAidanspentcountlesslongdaysdesigningvariouspartsofand\nimplementingtensor2tensor,replacingourearliercodebase,greatlyimprovingresultsandmassivelyaccelerating\n\n---\n\n[Chunk 3]:

In [27]:
final_prompt = f"""You are a helpful assistant answering questions based on provided context.

The context is taken from academic papers, and might have formatting issues like spaces missing between words.
Please interpret the content intelligently, separating words properly when they appear joined together.

Use ONLY the following context to answer the question.
If the answer cannot be determined from the context, respond with "I cannot answer this based on the provided context."

Context:
{context}

Question: {user_query}

Answer:"""

In [28]:
final_prompt

'You are a helpful assistant answering questions based on provided context.\n\nThe context is taken from academic papers, and might have formatting issues like spaces missing between words.\nPlease interpret the content intelligently, separating words properly when they appear joined together.\n\nUse ONLY the following context to answer the question.\nIf the answer cannot be determined from the context, respond with "I cannot answer this based on the provided context."\n\nContext:\n[Chunk 1]: Providedproperattributionisprovided,Googleherebygrantspermissionto\nreproducethetablesandfiguresinthispapersolelyforuseinjournalisticor\nscholarlyworks.\nAttention Is All You Need\nAshishVaswani∗ NoamShazeer∗ NikiParmar∗ JakobUszkoreit∗\nGoogleBrain GoogleBrain GoogleResearch GoogleResearch\navaswani@google.com noam@google.com nikip@google.com usz@google.com\nLlionJones∗ AidanN.Gomez∗ † ŁukaszKaiser∗\nGoogleResearch UniversityofToronto GoogleBrain\n\n---\n\n[Chunk 2]: attentionandtheparameter-free
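Because the prompt is rebuilt for every question, it can be convenient to keep it as a reusable template. The following is a minimal sketch, assuming a shortened version of the prompt wording used in this notebook. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative.

```python
PROMPT_TEMPLATE = """You are a helpful assistant answering questions based on provided context.

Use ONLY the following context to answer the question.
If the answer cannot be determined from the context, respond with "I cannot answer this based on the provided context."

Context:
{context}

Question: {question}

Answer:"""


def build_prompt(context, question):
    """Fill the template with retrieved context and the user's question."""
    if not context.strip():
        raise ValueError("context is empty; nothing was retrieved")
    if not question.strip():
        raise ValueError("question is empty")
    # str.format only parses the template, so braces inside the context are safe
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Rejecting an empty context up front avoids sending the model a prompt it can only answer with the fallback sentence.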

In [29]:
def generate_response(prompt, model="gemini-2.0-flash-thinking-exp-01-21", temperature=0.3, top_p=0.95):
    """
    Function to generate a response using the Gemini model.

    Args:
        prompt (str): The prompt for the model
        model (str): Name of the Gemini model to call
        temperature (float): Sampling temperature; lower values give more focused answers
        top_p (float): Nucleus sampling threshold

    Returns:
        str: Model's response
    """
    # Pass the arguments through instead of hardcoding the sampling values
    llm = ChatGoogleGenerativeAI(
        model=model,
        temperature=temperature,
        top_p=top_p
    )

    response = llm.invoke(prompt)

    return response.content

In [30]:
generate_response(final_prompt)

'The authors of the Attention paper are:\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, and Łukasz Kaiser.'
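The individual steps can be tied together into a single question-answering function. In this sketch each stage is passed in as a callable, so the flow can be exercised without an API key. The name `answer_question` and its parameters are assumptions for illustration, and the commented wiring shows one way the functions from this notebook might be plugged in.

```python
FALLBACK_ANSWER = "I cannot answer this based on the provided context."


def answer_question(query, retrieve, build_context, build_prompt, generate):
    """Run retrieve -> build context -> build prompt -> generate for one query."""
    chunks = retrieve(query)
    if not chunks:
        # Nothing retrieved: skip the model call and return the fallback sentence
        return FALLBACK_ANSWER
    context = build_context(chunks)
    prompt = build_prompt(context, query)
    return generate(prompt)


# Possible wiring with this notebook's functions (not executed here):
# answer_question(
#     user_query,
#     retrieve=lambda q: retrieve_relevant_chunks(chroma_store, q, embedded_documents[0]),
#     build_context=get_context_from_chunks,
#     build_prompt=lambda ctx, q: f"Context:\n{ctx}\n\nQuestion: {q}\n\nAnswer:",
#     generate=generate_response,
# )
```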