#Retrieval-Augmented Generation (RAG) System Using ChromaDB and the Gemini API

#🔍 Overview
We’ll build a small RAG system that:

1. Loads and chunks documents
2. Generates embeddings using Gemini
3. Stores the embeddings in ChromaDB
4. Retrieves the chunks most relevant to a query
5. Uses a Gemini model (`gemini-1.5-flash-latest`) to generate an answer from the retrieved context
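Before touching any API, it can help to see the five stages as plain data flow. The sketch below is a toy, assumption-heavy stand-in: a hypothetical bag-of-words `embed` function replaces Gemini embeddings, a Python list replaces ChromaDB, and the prompt is printed instead of being sent to a model.

```python
import math
import re
from collections import Counter

# ASSUMPTION: toy bag-of-words "embedding" used only to show the data flow;
# Gemini embeddings are dense float vectors, not word counts.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Load and chunk (here: one sentence per chunk)
docs = [
    "Gemini is Google's next-generation large language model developed by DeepMind.",
    "It supports multimodal input including text, image, and audio.",
]

# 2-3. Embed each chunk and keep (text, vector) pairs in an in-memory "store"
store = [(chunk, embed(chunk)) for chunk in docs]

# 4. Retrieve the chunk most similar to the query
query = "Who developed Gemini?"
q_vec = embed(query)
best_chunk, _ = max(store, key=lambda item: cosine(q_vec, item[1]))

# 5. Build the prompt that would be sent to the LLM
prompt = f"Context:\n{best_chunk}\n\nQuestion:\n{query}"
print(prompt)
```

The real notebook swaps each toy piece for a service: Gemini for `embed`, ChromaDB for `store`, and a Gemini model call for the final `print`.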

#Step 1: Install Required Libraries

In [1]:
!pip install chromadb langchain langchain-text-splitters google-generativeai
!pip install -U langchain-google-genai


#Step 2: Initialize Gemini API

In [2]:
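import os
from getpass import getpass

import google.generativeai as genai

# Read the Gemini API key from the environment, prompting for it if it is not set.
# Never hard-code API keys in a notebook you plan to share.
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google Gemini API key: ")

# Configure the Gemini SDK
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])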


#Step 3: Prepare Sample Documents and Chunk Them

In [3]:
# CharacterTextSplitter lives in the langchain-text-splitters package
from langchain_text_splitters import CharacterTextSplitter

# Sample documents
docs = [
    "Gemini is Google's next-generation large language model developed by DeepMind.",
    "It supports multimodal input including text, image, and audio.",
    "Gemini is designed to compete with OpenAI's GPT-4 and is integrated into Google products."
]

# Split each text on ". " into chunks of roughly 100 characters, overlapping by up to 10
splitter = CharacterTextSplitter(separator=". ", chunk_size=100, chunk_overlap=10)
doc_chunks = splitter.create_documents(docs)

# Show the chunks
for i, doc in enumerate(doc_chunks):
    print(f"Chunk {i+1}: {doc.page_content}")


Chunk 1: Gemini is Google's next-generation large language model developed by DeepMind.
Chunk 2: It supports multimodal input including text, image, and audio.
Chunk 3: Gemini is designed to compete with OpenAI's GPT-4 and is integrated into Google products.
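Each sample sentence comes back as a single chunk because `create_documents` splits each text independently, none of them contains the separator `". "` (the final period is not followed by a space), and each is shorter than `chunk_size`, which by default is measured in characters. The greedy split-then-merge idea can be sketched as below; this is a simplified illustration under our own assumptions, not LangChain's actual implementation.

```python
from typing import List

def split_text(text: str, separator: str = ". ", chunk_size: int = 100, chunk_overlap: int = 10) -> List[str]:
    """Greedy separator-based chunking (simplified sketch, not LangChain's exact algorithm)."""
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # Close the current chunk and start a new one
            chunks.append(separator.join(current))
            # Carry over trailing pieces that fit within the overlap budget
            overlap: List[str] = []
            total = 0
            for prev in reversed(current):
                if total + len(prev) > chunk_overlap:
                    break
                overlap.insert(0, prev)
                total += len(prev) + len(separator)
            current = overlap + [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks

print(split_text("aaaa. bbbb. cccc", chunk_size=10, chunk_overlap=5))
# ['aaaa. bbbb', 'bbbb. cccc']
```

With `chunk_overlap=5`, the piece `bbbb` is repeated at the start of the second chunk, which is what overlap is for: context near a boundary appears in both neighbouring chunks.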


#Step 4: Generate Embeddings Using Gemini

In [8]:

from langchain_google_genai import GoogleGenerativeAIEmbeddings
# Initialize Gemini embedding model
embedding_model = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=os.environ["GOOGLE_API_KEY"]
)

# Generate embeddings for document chunks
texts = [doc.page_content for doc in doc_chunks]
embeddings = embedding_model.embed_documents(texts)
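`embed_documents` returns one list of floats per chunk. Similarity between two such vectors is commonly measured with cosine similarity; the helper below is a minimal pure-Python sketch of that formula (the name `cosine_similarity` is our own, not part of LangChain or ChromaDB).

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Once a query embedding exists, something like `cosine_similarity(embedding_model.embed_query(query), embeddings[0])` scores the first chunk against it.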


#Step 5: Store Embeddings in ChromaDB

In [9]:
import chromadb

# In-memory (ephemeral) client: data is lost when the kernel restarts.
# To keep data on disk, use chromadb.PersistentClient(path=...) instead.
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(name="gemini_rag_demo")

# Add all chunks in one batched call; IDs must be unique within the collection
collection.add(
    ids=[f"chunk-{i}" for i in range(len(texts))],
    embeddings=embeddings,
    documents=texts,
)
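Conceptually, a collection keeps IDs, vectors, and documents side by side and answers a query by returning the nearest stored vectors. The hypothetical `ToyVectorStore` below sketches that idea with exact brute-force search over squared Euclidean distance, which, as far as we know, is what Chroma's default `l2` space reports. Chroma itself uses an approximate HNSW index, so this is an illustration of the result shape, not of its internals.

```python
from typing import Dict, List, Sequence

class ToyVectorStore:
    """Hypothetical in-memory stand-in for a Chroma collection (exact brute-force search)."""

    def __init__(self) -> None:
        self.ids: List[str] = []
        self.embeddings: List[List[float]] = []
        self.documents: List[str] = []

    def add(self, ids: Sequence[str], embeddings: Sequence[Sequence[float]], documents: Sequence[str]) -> None:
        for id_, emb, doc in zip(ids, embeddings, documents):
            if id_ in self.ids:
                continue  # this toy store simply skips duplicate IDs
            self.ids.append(id_)
            self.embeddings.append(list(emb))
            self.documents.append(doc)

    def query(self, query_embeddings: Sequence[Sequence[float]], n_results: int = 3) -> Dict[str, List[list]]:
        # Results are nested per query, mirroring the shape of Chroma's query() output
        out: Dict[str, List[list]] = {"ids": [], "documents": [], "distances": []}
        for q in query_embeddings:
            # Squared Euclidean distance to every stored vector, smallest first
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(q, emb)), i)
                for i, emb in enumerate(self.embeddings)
            )[:n_results]
            out["ids"].append([self.ids[i] for _, i in scored])
            out["documents"].append([self.documents[i] for _, i in scored])
            out["distances"].append([d for d, _ in scored])
        return out

store = ToyVectorStore()
store.add(ids=["a", "b", "c"], embeddings=[[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]], documents=["origin", "unit-x", "far"])
res = store.query(query_embeddings=[[0.9, 0.1]], n_results=2)
print(res["ids"][0], res["documents"][0])  # ['b', 'a'] ['unit-x', 'origin']
```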


#Step 6: Accept a Query and Search ChromaDB


In [10]:
# User query
query = "Who developed Gemini?"

# Convert query to embedding using Gemini
query_embedding = embedding_model.embed_query(query)

# Search in ChromaDB
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=3  # top 3 matches; with only 3 chunks stored, this returns all of them
)

# Display retrieved chunks
retrieved_docs = results["documents"][0]
print("\nTop Retrieved Chunks:")
for doc in retrieved_docs:
    print("-", doc)



Top Retrieved Chunks:
- Gemini is designed to compete with OpenAI's GPT-4 and is integrated into Google products.
- Gemini is Google's next-generation large language model developed by DeepMind.
- It supports multimodal input including text, image, and audio.
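Because the collection holds only three chunks, `n_results=3` returns every chunk, including the multimodal one that does not answer the question. Chroma's `query` results also carry a `distances` list, so one option is to drop chunks beyond a distance threshold. The helper and the distances below are hypothetical illustrations, not output from this notebook; a real threshold has to be tuned for the embedding model and distance space.

```python
from typing import Any, Dict, List

def filter_by_distance(results: Dict[str, Any], max_distance: float) -> List[str]:
    """Keep the first query's documents whose distance is <= max_distance.

    Assumes a Chroma-style result dict with nested per-query lists, e.g.
    {"documents": [[...]], "distances": [[...]]}.
    """
    docs = results["documents"][0]
    dists = results["distances"][0]
    return [doc for doc, dist in zip(docs, dists) if dist <= max_distance]

# Made-up distances for illustration only
example = {
    "documents": [["chunk A", "chunk B", "chunk C"]],
    "distances": [[0.35, 0.48, 0.91]],
}
print(filter_by_distance(example, max_distance=0.5))  # ['chunk A', 'chunk B']
```

In the notebook this would be applied as `filter_by_distance(results, max_distance=...)` right after `collection.query`.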


#Step 7: Use Gemini to Generate an Answer from the Retrieved Chunks

In [12]:
# Create prompt with retrieved context
context = "\n".join(retrieved_docs)

prompt = f"""
You are a helpful assistant. Based on the following context, answer the question.

Context:
{context}

Question:
{query}
"""

# Initialize the Gemini model (Gemini 1.5 Flash)
model = genai.GenerativeModel("gemini-1.5-flash-latest")

# Generate the final answer
response = model.generate_content(prompt)

# Output the answer
print("\nGemini Answer:\n", response.text)



Gemini Answer:
 DeepMind developed Gemini.
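The prompt above asks the model to use the context but does not say what to do when the context lacks the answer. A common refinement is to number the chunks and tell the model to admit when it does not know. The `build_prompt` helper below is a hypothetical sketch of that pattern, not part of any library.

```python
from typing import Sequence

def build_prompt(chunks: Sequence[str], question: str) -> str:
    """Number each retrieved chunk and instruct the model to stay within the context."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "You are a helpful assistant. Answer the question using only the context below.\n"
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n"
    )

print(build_prompt(
    ["Gemini is Google's next-generation large language model developed by DeepMind.",
     "It supports multimodal input including text, image, and audio."],
    "Who developed Gemini?",
))
```

In the notebook, `model.generate_content(build_prompt(retrieved_docs, query))` would replace the inline f-string.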



#✅ Summary

| Component    | Tool                                          |
| ------------ | --------------------------------------------- |
| Chunking     | LangChain `CharacterTextSplitter`             |
| Embeddings   | Gemini `models/embedding-001`                 |
| Vector Store | ChromaDB (in-memory `chromadb.Client()`)      |
| Generation   | Gemini 1.5 Flash (`gemini-1.5-flash-latest`)  |


#🔄 Optional: Clear ChromaDB Collection

In [17]:
# Delete collection (if needed)
# chroma_client.delete_collection(name="gemini_rag_demo")


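#📘 What This Demonstrated
- RAG with a fully local, in-memory ChromaDB vector store
- Answer generation grounded in retrieved context, powered by Gemini
- Lightweight vector search that runs locally (embedding and generation still call the Gemini API, so the pipeline as a whole needs network access)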

ChromaDB (the `chromadb` Python package) stores embeddings but has no built-in `.view()` or `.get_all_embeddings()` method. You can still inspect them by retrieving them with `.get()` or `.query()`; both return embeddings only when you request them with `include=["embeddings"]`.

#✅ Ways to View Embeddings in ChromaDB
#🔹 1. When You Add Data
If you keep the embedding list you used in .add(), you can view it directly:

In [13]:
# Print first embedding (768 dimensions)
print("Embedding for first chunk:\n", embeddings[0])


Embedding for first chunk:
 [0.01272655837237835, -0.007440745830535889, -0.0675327479839325, 0.00744202733039856, 0.0733533650636673, ...]
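Printing all 768 values is hard to read. A compact summary (dimension, first few values, range, and L2 norm) is often enough to sanity-check a vector; `summarize_embedding` below is a hypothetical helper for that.

```python
import math
from typing import Dict, Sequence

def summarize_embedding(vec: Sequence[float], head: int = 5) -> Dict[str, object]:
    """Compact summary of an embedding vector instead of printing every value."""
    return {
        "dim": len(vec),
        "head": [round(float(x), 4) for x in vec[:head]],
        "min": float(min(vec)),
        "max": float(max(vec)),
        "l2_norm": math.sqrt(sum(float(x) * float(x) for x in vec)),
    }

print(summarize_embedding([3.0, 4.0, 0.0]))
# {'dim': 3, 'head': [3.0, 4.0, 0.0], 'min': 0.0, 'max': 4.0, 'l2_norm': 5.0}
```

With the notebook's data, `summarize_embedding(embeddings[0])` should report `'dim': 768` for `models/embedding-001`.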

#🔹 2. Retrieve Embeddings Using .get()
Pass the IDs you used when adding the data, and request the embeddings explicitly through `include`:

In [14]:
# Get embeddings for specific IDs
result = collection.get(ids=["chunk-0"], include=["embeddings", "documents"])

# View the embedding
print("Document:", result["documents"][0])
print("Embedding:", result["embeddings"][0])


Document: Gemini is Google's next-generation large language model developed by DeepMind.
Embedding: [ 1.27265584e-02 -7.44074583e-03 -6.75327480e-02  7.44202733e-03
  7.33533651e-02 -9.06026806e-04 -1.05227705e-03 -4.03707959e-02
  ...]

#🔹 3. Loop Through All Stored Embeddings
If you don’t know the IDs, call `.get()` without `ids`; it returns every record in the collection. For large collections, page through the results with the `limit` and `offset` arguments instead of fetching everything at once:

In [15]:
# Passing no IDs returns every record in the collection.
# Embeddings are not included by default, so request them explicitly.
result = collection.get(include=["embeddings", "documents"])

# Print each record's ID, text, and the start of its embedding
for id_, text, embedding in zip(result["ids"], result["documents"], result["embeddings"]):
    print(f"\nID: {id_}")
    print("Text:", text)
    print("Embedding[:5]:", embedding[:5])  # just the first 5 values



ID: chunk-0
Text: Gemini is Google's next-generation large language model developed by DeepMind.
Embedding[:5]: [ 0.01272656 -0.00744075 -0.06753275  0.00744203  0.07335337]

ID: chunk-1
Text: It supports multimodal input including text, image, and audio.
Embedding[:5]: [ 0.04877108 -0.01827895 -0.09199723 -0.01355792  0.06735586]

ID: chunk-2
Text: Gemini is designed to compete with OpenAI's GPT-4 and is integrated into Google products.
Embedding[:5]: [ 0.0217246  -0.03876981 -0.07529283 -0.02459843  0.06027972]
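For collections too large to fetch in one call, `collection.get` accepts `limit` and `offset`, so a small generator can walk the collection page by page. `iter_pages` is a hypothetical helper; a fake `get` function stands in for the collection here so the sketch runs without ChromaDB.

```python
from typing import Any, Callable, Dict, Iterator

def iter_pages(get_fn: Callable[..., Dict[str, Any]], page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield successive pages from a Chroma-style get(limit=..., offset=...) call."""
    offset = 0
    while True:
        page = get_fn(limit=page_size, offset=offset)
        if not page["ids"]:
            break
        yield page
        if len(page["ids"]) < page_size:
            break  # a short page means we reached the end
        offset += page_size

# Fake get function standing in for collection.get, for illustration only
_ids = [f"chunk-{i}" for i in range(5)]

def fake_get(limit: int, offset: int) -> Dict[str, Any]:
    return {"ids": _ids[offset:offset + limit]}

pages = list(iter_pages(fake_get, page_size=2))
print([p["ids"] for p in pages])
# [['chunk-0', 'chunk-1'], ['chunk-2', 'chunk-3'], ['chunk-4']]
```

Against the real collection, `iter_pages(functools.partial(collection.get, include=["embeddings", "documents"]), page_size=100)` would yield result dicts shaped like the single `.get()` call above.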


#✅ Pro Tip: Save Embeddings to File (Optional)
You can write them to a `.json` or `.csv` file for later inspection:

In [16]:
import json

data = [
    {"id": f"chunk-{i}", "text": text, "embedding": embedding}
    for i, (text, embedding) in enumerate(zip(texts, embeddings))
]

with open("chroma_embeddings.json", "w") as f:
    json.dump(data, f, indent=2)
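The cell above covers JSON. For `.csv`, one workable layout is one row per chunk with `id`, `text`, and one column per embedding dimension; the `csv` module handles quoting of commas inside the text. `save_embeddings_csv` and `load_embeddings_csv` are hypothetical helpers, demonstrated on toy data.

```python
import csv
from typing import List, Sequence

def save_embeddings_csv(path: str, ids: Sequence[str], texts: Sequence[str],
                        embeddings: Sequence[Sequence[float]]) -> None:
    """Write one row per chunk: id, text, then one column per embedding dimension."""
    dim = len(embeddings[0]) if len(embeddings) > 0 else 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "text"] + [f"dim_{j}" for j in range(dim)])
        for id_, text, emb in zip(ids, texts, embeddings):
            writer.writerow([id_, text] + [float(x) for x in emb])

def load_embeddings_csv(path: str) -> List[dict]:
    """Read the file back into dicts, restoring each embedding as a list of floats."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [{"id": row[0], "text": row[1], "embedding": [float(x) for x in row[2:]]}
                for row in reader]

# Toy data; with the notebook's variables you would pass the chunk IDs, texts, and embeddings
save_embeddings_csv("toy_embeddings.csv", ["chunk-0", "chunk-1"],
                    ["first, with a comma", "second"], [[0.1, 0.2], [0.3, 0.4]])
rows = load_embeddings_csv("toy_embeddings.csv")
print(rows[0])
```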
