# Advanced RAG with Gemini API

## Install Required Dependencies

We need LangChain for document processing, pypdf for reading PDFs, Chroma for vector storage, sentence-transformers for embeddings, and google-genai for the Gemini API client.

In [1]:
!pip install langchain-community chromadb


In [2]:
!pip install langchain pypdf

Collecting pypdf
  Downloading pypdf-6.0.0-py3-none-any.whl.metadata (7.1 kB)
Downloading pypdf-6.0.0-py3-none-any.whl (310 kB)
Installing collected packages: pypdf
Successfully installed pypdf-6.0.0


In [3]:
!pip install sentence-transformers



In [4]:
!pip install google-genai



## Set Up Gemini API

Configure your Gemini API key. You can get one for free from [Google AI Studio](https://aistudio.google.com/apikey).

In [5]:
import os
from getpass import getpass

GEMINI_API_KEY = getpass("Enter Gemini API Key:")
os.environ['GEMINI_API_KEY'] = GEMINI_API_KEY

Enter Gemini API Key:··········
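
If you rerun the notebook often, it can be convenient to prompt for the key only when it is not already set. The following is a minimal sketch of that idea; the helper name `load_api_key` and its `prompt_fn` parameter are hypothetical and not part of any library.

```python
import os
from getpass import getpass


def load_api_key(var_name="GEMINI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting only if it is absent.

    Hypothetical helper for illustration; prompt_fn is injectable so the
    function can be exercised without an interactive prompt.
    """
    key = os.environ.get(var_name)
    if not key:
        key = prompt_fn(f"Enter {var_name}: ")
        os.environ[var_name] = key  # cache for later cells in this session
    return key
```

Calling `load_api_key()` in place of the `getpass` cell would skip the prompt whenever `GEMINI_API_KEY` is already present in the session.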


## Import Libraries

Import all necessary components for our RAG pipeline.

In [6]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from google import genai
import warnings
warnings.filterwarnings('ignore')

## Load and Chunk Document

Upload your PDF to Google Colab (Files -> Upload to session storage).
NOTE: Files uploaded to Google Colab are not persistent, so you will need to upload the PDF again each time you restart the session.

Check that your PDF was uploaded.

In [11]:
!ls

Dietry.pdf  sample_data


In [12]:
# Replace "Dietry.pdf" with the path to your own PDF file
documents = PyPDFLoader("Dietry.pdf").load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=100
)
texts = text_splitter.split_documents(documents)

print(f"Document split into {len(texts)} chunks")

Document split into 619 chunks
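
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraph and line breaks). The function name `sliding_chunks` is made up for illustration.

```python
def sliding_chunks(text, chunk_size=512, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: the LangChain splitter also tries to
    break on natural separators, which this sketch ignores.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
demo_chunks = sliding_chunks(sample, chunk_size=12, chunk_overlap=4)
print(demo_chunks)
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.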


## Create Vector Embeddings

Use a local embedding model to convert text chunks into vector representations.

In [13]:
# Use local HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

print("Embedding model loaded")

Embedding model loaded


## Build Vector Database

Create a Chroma vector database to store and search through document chunks.

In [14]:
db = Chroma.from_documents(texts, embeddings)
print("Vector database created")

Vector database created


## Set Up Basic Retriever

Create a retriever that finds the most similar documents based on vector similarity.

In [15]:
retriever = db.as_retriever(search_kwargs={"k": 4})
print("Basic retriever configured")

Basic retriever configured
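
Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows this with cosine similarity on toy 2-D vectors. It is only an illustration: the vectors are made up, the function names are hypothetical, and Chroma may use a different distance metric and an approximate index rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings" (made up for illustration).
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
nearest = top_k([1.0, 0.1], toy_docs, k=2)
print(nearest)
```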


## Add Document Re-ranking

Use a cross-encoder to re-score the 4 chunks returned by the basic retriever against the query and keep the 3 most relevant.

In [16]:
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")

compressor = CrossEncoderReranker(model=model, top_n=3)
re_rank_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

print("Re-ranking retriever configured")

Re-ranking retriever configured
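
The two-stage idea (cheap retrieval first, a more careful scorer second) can be sketched without any model. In the example below, `toy_overlap_score` is a placeholder standing in for the cross-encoder, and both function names and the candidate strings are made up for illustration.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Re-order candidates by score_fn(query, text) and keep the best top_n."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:top_n]


def toy_overlap_score(query, text):
    """Stand-in scorer: fraction of query words found in the text.

    A real cross-encoder feeds the (query, text) pair through a model
    jointly; this word-overlap score is only a placeholder.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / max(len(query_words), 1)


candidates = [
    "store milk in the refrigerator",
    "wash hands before handling food",
    "record meals served each day",
    "keep cold food cold in storage",
]
best = rerank("how to store cold food", candidates, toy_overlap_score, top_n=3)
print(best)
```

Because the second-stage scorer sees the query and each candidate together, it can afford to be slower and more precise than the vector search, which is why it only runs on the small candidate set.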


## Initialize Gemini Client

Set up the Gemini API client for generating final responses.

In [17]:
client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])
print("Gemini client initialized")

Gemini client initialized


## Define RAG Pipeline Function

Create a function that combines retrieval, re-ranking, and generation.

In [18]:
def rag_query(query):
    # Retrieve and re-rank relevant documents
    relevant_docs = re_rank_retriever.invoke(query)

    # Combine document content
    context = "\n\n".join([doc.page_content for doc in relevant_docs])

    # Create prompt for Gemini
    prompt = f"""
You are an AI assistant that provides accurate answers based on the given context.
Please answer the question using only the information provided in the context.
If the answer is not in the context, say "I don't know based on the provided context."

CONTEXT:
{context}

QUESTION: {query}

ANSWER:
"""

    # Generate response using Gemini
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt
    )

    return response.text

print("RAG pipeline function defined")

RAG pipeline function defined
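
Since `rag_query` depends on a live retriever and an API call, it can help to separate prompt assembly into a pure function and pass the retriever and generator in as callables. The following is a sketch under that assumption; `build_prompt`, `run_pipeline`, and the stub functions are hypothetical names, and the template is a shortened variant of the prompt above.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say "I don't know based on the provided context."

CONTEXT:
{context}

QUESTION: {query}

ANSWER:
"""


def build_prompt(chunks, query):
    """Join retrieved text chunks and fill the prompt template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, query=query)


def run_pipeline(query, retrieve, generate):
    """Retrieve chunks, build the prompt, and hand it to the generator.

    retrieve and generate are plain callables, so stubs can replace the
    real retriever and the Gemini call during testing.
    """
    return generate(build_prompt(retrieve(query), query))


# Stubs standing in for the retriever and the model (illustrative only).
def fake_retrieve(query):
    return ["First retrieved chunk.", "Second retrieved chunk."]


def fake_generate(prompt):
    return f"[{len(prompt)} prompt characters received]"


demo_prompt = build_prompt(fake_retrieve("anything"), "What is stored?")
print(demo_prompt)
print(run_pipeline("What is stored?", fake_retrieve, fake_generate))
```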


## Test the RAG System

Ask questions about your document to test the complete pipeline.

In [19]:
# Test query
query = "What are the main topics discussed in the document?"

response = rag_query(query)
print(f"Question: {query}")
print(f"Answer: {response}")

Question: What are the main topics discussed in the document?
Answer: Based on the provided context, the main topics discussed in the document include:

*   Keep Food Fresh: Food Storage and Sanitation
    *   Food Storage
    *   Food Sanitation
    *   Sample Food Safety Checklist
    *   Grain Requirements for Child Nutrition Programs
    *   Choking Risks
*   GETTING ORGANIZED: PURCHASING AND RECEIVING FOOD (under PART ONE: PLANNING QUALITY MEALS)
    *   STAYING ON BUDGET
    *   PURCHASING FOOD
    *   PURCHASING LOCAL FOODS
    *   RECEIVING FOOD
    *   MENU PRODUCTION RECORDS
*   UNDERSTANDING MEAL PATTERN REQUIREMENTS (under PART ONE: PLANNING QUALITY MEALS)
    *   SFSP MEAL PATTERN REQUIREMENTS
    *   FOOD COMPONENTS
    *   SERVING ADDITIONAL FOODS
    *   CREDITING FOODS
    *   MEAL MODIFICATIONS
    *   DOCUMENTING MEALS


## Advanced Features

Optional enhancements you can add to improve the RAG system.

In [22]:
def enhanced_rag_query(query, temperature=0.1, max_tokens=500):
    # Retrieve and re-rank relevant documents
    relevant_docs = re_rank_retriever.invoke(query)

    # Report how many documents were retrieved
    print(f"Retrieved {len(relevant_docs)} relevant documents\n")

    context = "\n\n".join([
        f"Document {i+1}: {doc.page_content[:200]}..."
        for i, doc in enumerate(relevant_docs)
    ])

    prompt = f"""
Based on the provided context, answer the following question accurately and concisely.
If you cannot find the answer in the context, clearly state that.

Context:
{context}

Question: {query}

Answer:
"""

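    # Pass the sampling settings through; otherwise the temperature and
    # max_tokens arguments are ignored. A low max_tokens may truncate the answer.
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
        config={"temperature": temperature, "max_output_tokens": max_tokens}
    )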

    return response.text

print("Enhanced RAG function defined")

Enhanced RAG function defined
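
Note that the function above truncates each chunk to 200 characters inside the prompt context itself. If the goal is only to display what was retrieved, one option is a separate preview helper that leaves the full text available for the prompt. This is a minimal sketch; `preview_docs` and the `FakeDoc` stand-in are hypothetical and model only the `page_content` attribute.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document; only page_content is modeled."""
    page_content: str


def preview_docs(docs, width=200):
    """Return one labeled, truncated preview line per document."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        text = " ".join(doc.page_content.split())  # collapse whitespace and newlines
        suffix = "..." if len(text) > width else ""
        lines.append(f"Document {i}: {text[:width]}{suffix}")
    return lines


demo_docs = [FakeDoc("short text"), FakeDoc("x" * 300)]
previews = preview_docs(demo_docs, width=20)
print("\n".join(previews))
```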


## Interactive Query Interface

Create an interactive loop to ask multiple questions.

In [28]:
def interactive_rag():
    print("RAG System Ready! Type 'quit' to exit.")

    while True:
        user_query = input("\nEnter your question: ")

        if user_query.lower() == 'quit':
            break

        try:
            answer = rag_query(user_query)
            print(f"\nAnswer: {answer}")
        except Exception as e:
            print(f"Error: {e}")

    print("Session ended.")

# Uncomment to run interactive mode
# interactive_rag()
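
A variant of the loop with injectable input and output functions can be exercised without typing anything or calling the API. This is a sketch only: `chat_loop` and its parameters are hypothetical, and the echoing lambda stands in for `rag_query`.

```python
def chat_loop(answer_fn, read=input, write=print, quit_words=("quit", "exit")):
    """Ask questions until a quit word is entered; return the number answered."""
    answered = 0
    while True:
        user_query = read("\nEnter your question: ").strip()
        if user_query.lower() in quit_words:
            break
        if not user_query:
            continue  # ignore empty input instead of calling the model
        try:
            write(f"\nAnswer: {answer_fn(user_query)}")
            answered += 1
        except Exception as exc:  # keep the session alive on API errors
            write(f"Error: {exc}")
    write("Session ended.")
    return answered


# Scripted demo: canned inputs and an echoing stand-in for rag_query.
scripted = iter(["What is stored?", "", "QUIT"])
log = []
count = chat_loop(lambda q: q.upper(), read=lambda _: next(scripted), write=log.append)
print(count, log)
```

In the notebook, `chat_loop(rag_query)` would behave like the interactive loop above, with the defaults `input` and `print` handling the I/O.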

## Usage Example

Example of how to use the advanced RAG system.

In [29]:
# Example usage
sample_query = "Summarize the key points from the document"

print("=== Enhanced RAG Response ===")
enhanced_response = enhanced_rag_query(sample_query)
print(enhanced_response)


print("\n\n=== Interactive RAG Response ===")
interactive_rag()

=== Enhanced RAG Response ===
Retrieved 3 relevant documents

The request asks to summarize "the document," but the provided context contains three separate documents covering different topics. A single summary for "the document" cannot be provided as there isn't one unified document.

Here are the key points from each individual document:

*   **Document 1:** Emphasizes the importance of training employees to follow procedures (SOPs), which are available in Microsoft Word (.doc) and Adobe Acrobat (.pdf) formats.
*   **Document 2:** Provides instructions for entering unit cost (column 4) and calculating total cost (column 5) by multiplying the number of units by the unit cost.
*   **Document 3:** Highlights the necessity of keeping accurate and detailed records of meals prepared and served for any successful food service operation.


=== Interactive RAG Response ===
RAG System Ready! Type 'quit' to exit.

Enter your question: What are the main 3 points of this document?

Answer: The ma