# Retrieval-Augmented Generation with ChromaDB

## Learning Goals
- Understand the architecture of **RAG** (indexing → retrieval → generation).
- Implement a minimal RAG pipeline with **LangChain** and **ChromaDB**.
- Compare different chunking strategies and their effect on retrieval.
- Query the vector database and inject retrieved context into the LLM prompt.

This notebook corresponds to Section *1.6 Retrieval-Augmented Generation (RAG)* in the lecture notes.

In [1]:
# %load get_llm.py
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_community.chat_models import ChatOllama

# Load environment variables from .env
load_dotenv()

def get_llm(provider: str = "openai"):
    """
    Return a language model instance configured for either OpenAI or Ollama.

    This function centralizes the initialization of chat-based LLMs so that 
    notebooks and applications can switch seamlessly between cloud-based models 
    (OpenAI) and local models (Ollama).

    Parameters
    ----------
    provider : str, optional
        The backend provider to use. Options are:
        - "openai": returns a ChatOpenAI instance (requires OPENAI_API_KEY in .env).
        - "ollama": returns a ChatOllama instance (requires Ollama installed locally).
        Default is "openai".

    Returns
    -------
    langchain_core.language_models.chat_models.BaseChatModel
        A chat model instance that can be invoked with messages.

    Examples
    --------
    Initialize an OpenAI model (requires API key):

    >>> llm = get_llm("openai")
    >>> llm.invoke("Hello, how are you?")

    Initialize a local Ollama model (e.g., Gemma2 2B):

    >>> llm = get_llm("ollama")
    >>> llm.invoke("Summarize the benefits of reinforcement learning.")
    """
    if provider == "openai":
        return ChatOpenAI(
            model="gpt-4o-mini",  # can also be "gpt-4.1" or "gpt-4o"
            temperature=0
        )
    elif provider == "ollama":
        return ChatOllama(
            model="gemma2:2b",   # replace with any local model installed in Ollama
            temperature=0
        )
    else:
        raise ValueError("Unsupported provider. Use 'openai' or 'ollama'.")


In [2]:
import os
from dotenv import load_dotenv

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain.schema import HumanMessage

load_dotenv()

llm = get_llm("openai")  # or get_llm("ollama")

## Step 1 — Load sample documents
A small text file is created and used for demonstration.

In [3]:
sample_text = """
Databases are organized collections of structured information or data,
typically stored electronically in a computer system.
Artificial intelligence (AI) refers to systems that can perform tasks that
normally require human intelligence, such as reasoning and learning.
LLM-based agents combine large language models with external tools.
"""

os.makedirs("data", exist_ok=True)
with open("data/sample_doc.txt", "w") as f:
    f.write(sample_text)

loader = TextLoader("data/sample_doc.txt")
docs = loader.load()

print("Loaded document:")
print(docs[0].page_content)

Loaded document:

Databases are organized collections of structured information or data,
typically stored electronically in a computer system.
Artificial intelligence (AI) refers to systems that can perform tasks that
normally require human intelligence, such as reasoning and learning.
LLM-based agents combine large language models with external tools.



## Step 2 — Chunking
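The document is split into chunks of at most 80 characters, with up to 20 characters of overlap between neighboring chunks, so that each chunk can be embedded and retrieved on its own.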

In [4]:
splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=20)
chunks = splitter.split_documents(docs)

print("Number of chunks:", len(chunks))
for i, c in enumerate(chunks):
    print(f"Chunk {i}: {c.page_content}")

Number of chunks: 5
Chunk 0: Databases are organized collections of structured information or data,
Chunk 1: typically stored electronically in a computer system.
Chunk 2: Artificial intelligence (AI) refers to systems that can perform tasks that
Chunk 3: normally require human intelligence, such as reasoning and learning.
Chunk 4: LLM-based agents combine large language models with external tools.
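The output above shows five chunks with no visible overlap, even though `chunk_overlap=20` was requested. A likely explanation, assuming the splitter breaks on newlines first and only merges whole pieces, is that every line already fits in 80 characters while no two adjacent lines do, and no single line is short enough to be carried over as overlap. The following sketch checks that arithmetic in plain Python; it does not call LangChain, and the variable names are illustrative.

```python
# The five lines of the sample document, copied from Step 1.
lines = [
    "Databases are organized collections of structured information or data,",
    "typically stored electronically in a computer system.",
    "Artificial intelligence (AI) refers to systems that can perform tasks that",
    "normally require human intelligence, such as reasoning and learning.",
    "LLM-based agents combine large language models with external tools.",
]

CHUNK_SIZE = 80      # same value as in the splitter above
CHUNK_OVERLAP = 20   # same value as in the splitter above

# Length of each line on its own.
lengths = [len(line) for line in lines]

# Length of each adjacent pair when joined by the newline that separated them.
pair_lengths = [len(a) + 1 + len(b) for a, b in zip(lines, lines[1:])]

for line, n in zip(lines, lengths):
    print(f"{n:3d} chars: {line}")
print("Adjacent pairs:", pair_lengths)

every_line_fits = all(n <= CHUNK_SIZE for n in lengths)
no_pair_fits = all(n > CHUNK_SIZE for n in pair_lengths)
no_line_fits_in_overlap = all(n > CHUNK_OVERLAP for n in lengths)
print(every_line_fits, no_pair_fits, no_line_fits_in_overlap)
```

If all three checks hold, each line becomes its own chunk. Overlap would only become visible with a larger `chunk_size` or with shorter pieces.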


## Step 3 — Embedding + Indexing in ChromaDB
Each chunk is converted into an embedding vector and stored in a local ChromaDB collection that is persisted under `data/chroma_store`.

In [5]:
# embedding_model = HuggingFaceEmbeddings(
#   model_name="sentence-transformers/all-MiniLM-L6-v2"
#)
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"}
)

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="data/chroma_store"
)
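To make the idea of "embed, store, look up" concrete, here is a minimal pure-Python sketch of an in-memory index. It is not how ChromaDB or the sentence-transformer model work internally: `toy_embed` and `ToyVectorIndex` are hypothetical stand-ins, and the word-count vectors only capture shared words, whereas a real embedding model also places paraphrases close together.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    words = [w.strip(".,()?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Stores (text, vector) pairs and returns the k most similar texts."""

    def __init__(self):
        self.entries = []

    def add_texts(self, texts):
        for text in texts:
            self.entries.append((text, toy_embed(text)))

    def search(self, query, k=2):
        query_vec = toy_embed(query)
        scored = [(cosine(query_vec, vec), text) for text, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


index = ToyVectorIndex()
index.add_texts([
    "Databases are organized collections of structured information or data,",
    "Artificial intelligence (AI) refers to systems that can perform tasks that",
    "LLM-based agents combine large language models with external tools.",
])
top = index.search("What is artificial intelligence?", k=1)
print(top)
```

The structure mirrors the real pipeline: the same embedding function is applied to chunks at indexing time and to the query at search time, and ranking is done by vector similarity.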


## Step 4 — Retrieval
The retriever embeds the query with the same embedding model and returns the `k` most similar chunks from the vector store.

In [6]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2})

query = "What is artificial intelligence?"
results = retriever.invoke(query)  # get_relevant_documents() is deprecated in recent LangChain releases

print("Retrieved chunks:")
for r in results:
    print("-", r.page_content)

Retrieved chunks:
- Artificial intelligence (AI) refers to systems that can perform tasks that
- Artificial intelligence (AI) refers to systems that can perform tasks that
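The two identical hits above suggest that the collection contains the same chunk more than once. A plausible cause, stated here as an assumption, is that the indexing cell was run several times against the same `persist_directory`, so each run added the chunks again. Deleting `data/chroma_store` before re-indexing is one remedy; another is to drop repeated texts after retrieval, as in this small sketch (the helper name is illustrative).

```python
def dedupe_preserving_order(texts):
    """Drop repeated chunk texts while keeping the first (highest-ranked) copy."""
    seen = set()
    unique = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


# The retrieved chunk texts from the output above.
retrieved = [
    "Artificial intelligence (AI) refers to systems that can perform tasks that",
    "Artificial intelligence (AI) refers to systems that can perform tasks that",
]

unique_chunks = dedupe_preserving_order(retrieved)
print(unique_chunks)
```

With a retriever, the input list would be `[doc.page_content for doc in results]`. Note that deduplicating after retrieval can leave fewer than `k` chunks, so cleaning the store is usually the better fix.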




## Step 5 — Generation with retrieved context
The retrieved chunks are injected into a prompt template before calling the LLM.

In [7]:
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the following context to answer:\n\n{context}"),
    ("human", "{question}")
])

def rag_pipeline(question: str):
    # Retrieve
    retrieved_docs = retriever.invoke(question)
    context = "\n".join([doc.page_content for doc in retrieved_docs])

    # Generate
    chain = prompt_template | llm
    return chain.invoke({"context": context, "question": question}).content

print(rag_pipeline("What is a database?"))
print(rag_pipeline("What are LLM-based agents?"))

A database is an organized collection of structured information or data. It allows for the efficient storage, retrieval, and management of data, often using a database management system (DBMS). Databases can store various types of information and are commonly used in applications ranging from business operations to personal data management. They enable users to perform operations such as querying, updating, and analyzing data in a systematic way.
LLM-based agents are systems that integrate large language models (LLMs) with external tools to perform various tasks. These agents leverage the natural language processing capabilities of LLMs to understand and generate human-like text while also utilizing external resources or tools to enhance their functionality. This combination allows them to perform complex tasks, such as answering questions, providing recommendations, or executing commands, by accessing real-time data or specialized applications beyond their training data.
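The first answer above mentions database management systems, which the sample document never does, so the model is evidently mixing retrieved context with its own knowledge. One common way to keep answers closer to the retrieved text is a stricter system prompt with numbered context entries. The sketch below only builds the messages in plain Python; the prompt wording and the `build_messages` helper are assumptions for illustration and have not been evaluated against any model.

```python
# Assumed wording for a stricter system prompt (illustrative only).
SYSTEM_TEMPLATE = (
    "You are a helpful assistant. Answer using only the context below. "
    "If the context does not contain the answer, say so.\n\n{context}"
)


def build_messages(chunks, question):
    """Number the retrieved chunks and place them in the system message."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


messages = build_messages(
    [
        "Databases are organized collections of structured information or data,",
        "typically stored electronically in a computer system.",
    ],
    "What is a database?",
)

for role, content in messages:
    print(f"--- {role} ---")
    print(content)
```

The resulting list of `(role, content)` tuples has the same shape as the input to `ChatPromptTemplate.from_messages` used above, so the system text could be swapped into the existing template.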


### Reflection
- RAG decouples **knowledge storage** (ChromaDB) from **reasoning** (LLM).
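- Chunking is critical: chunks that are too small fragment the context, while chunks that are too large dilute the embedding, can exceed the embedding model's input limit, and consume more of the LLM's context window.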
- The vector database enables **semantic search**, not keyword search.
- This architecture is the basis for practical applications like Q&A over documents.
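The token-limit point can be illustrated with a small budget check that keeps retrieved chunks, in rank order, only while they fit. This is a rough sketch: the "about 4 characters per token" rule is a common heuristic for English text and an assumption here, not an exact tokenizer, and both function names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def pack_context(chunks, max_tokens):
    """Keep chunks in rank order until the next one would exceed the budget."""
    selected = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected, used


# Three dummy chunks of 40 characters each, so roughly 10 tokens apiece.
ranked_chunks = ["a" * 40, "b" * 40, "c" * 40]
selected, used = pack_context(ranked_chunks, max_tokens=25)
print(len(selected), "chunks kept, about", used, "tokens")
```

For accurate counts, the estimate would be replaced by the tokenizer that belongs to the chosen LLM.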

## Exercises
1. Replace the sample document with a **PDF loader** (e.g., `PyPDFLoader`) and index a real article.
2. Change the chunk size and observe how retrieval quality changes.
3. Experiment with different embedding models (e.g., `all-MiniLM-L6-v2`, `multi-qa-mpnet-base-dot-v1`).
4. Persist the ChromaDB index and reload it in a new notebook.

# Misc

In [3]:
from langchain_openai import OpenAIEmbeddings  # the langchain.embeddings import path is deprecated
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Example texts (chunks from a document)
texts = [
    "The cat sits outside.",
    "It is sunny today.",
    "The dog barks loudly."
]

# Create embedding model
embedding_model = OpenAIEmbeddings()

# Generate vector representations
vectors = embedding_model.embed_documents(texts)

print(len(vectors), "embeddings generated.")
print("Dimension of each embedding:", len(vectors[0]))

# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(vectors)

# Pretty print with pandas DataFrame
df = pd.DataFrame(similarity_matrix, index=texts, columns=texts)
print("\nSimilarity matrix:")
df.round(2)

3 embeddings generated.
Dimension of each embedding: 1536

Similarity matrix:


| | The cat sits outside. | It is sunny today. | The dog barks loudly. |
|---|---|---|---|
| The cat sits outside. | 1.0 | 0.83 | 0.83 |
| It is sunny today. | 0.83 | 1.0 | 0.79 |
| The dog barks loudly. | 0.83 | 0.79 | 1.0 |
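For reference, the quantity in each cell of the matrix is the cosine of the angle between two embedding vectors: their dot product divided by the product of their lengths. The sketch below computes the same kind of matrix in plain Python for three hand-picked 2-D vectors; the vectors are made up for illustration and are not real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors: the first and third are orthogonal, the second lies between them.
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

matrix = [[round(cosine(u, v), 2) for v in vectors] for u in vectors]
for row in matrix:
    print(row)
```

In the table above, even unrelated sentences score around 0.8, so for retrieval the relative ordering of scores is more informative than their absolute values.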


In [1]:
from langchain_community.vectorstores import Chroma  # the langchain.vectorstores import path is deprecated
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Example document
text = "The cat sits outside. It is sunny today. The dog barks loudly."
chunks = CharacterTextSplitter(chunk_size=20, chunk_overlap=5, separator=" ").split_text(text)

print("Chunks:")
print(chunks)

# Embedding model
embedding_model = OpenAIEmbeddings()

# Store chunks + embeddings in Chroma vector DB
vectorstore = Chroma.from_texts(chunks, embedding_model)

# Example query
query = "What is the weather like?"
docs = vectorstore.similarity_search(query, k=2)

print("Query:", query)
print("Result:")
for d in docs:
    print(d.page_content)

Chunks:
['The cat sits', 'sits outside. It is', 'It is sunny today.', 'The dog barks', 'barks loudly.']
Query: What is the weather like?
Result:
It is sunny today.
sits outside. It is


In [2]:
from langchain.text_splitter import CharacterTextSplitter

text = "The cat sits outside. It is sunny today. The dog barks loudly."
splitter = CharacterTextSplitter(chunk_size=20, chunk_overlap=5, separator=" ")
chunks = splitter.split_text(text)

print(chunks)


['The cat sits', 'sits outside. It is', 'It is sunny today.', 'The dog barks', 'barks loudly.']
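The chunk boundaries above follow from a greedy merge: words are appended until the next one would push the chunk past `chunk_size`, then leading words are dropped until what remains is no longer than `chunk_overlap`, and that tail starts the next chunk. The function below is a simplified re-implementation of this idea in plain Python, written for illustration; it is not LangChain's code, and `merge_words` is a hypothetical name.

```python
def merge_words(text, chunk_size, chunk_overlap, separator=" "):
    """Greedy word merging with overlap (simplified sketch, not the library code)."""
    words = [w for w in text.split(separator) if w]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for word in words:
        # Close the current chunk when adding the next word would overflow it.
        if current and joined_len(current + [word]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading words until the carried-over tail is no longer than
            # chunk_overlap and still leaves room for the next word.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [word]) > chunk_size
            ):
                current = current[1:]
        current.append(word)

    if current:
        chunks.append(separator.join(current))
    return chunks


text = "The cat sits outside. It is sunny today. The dog barks loudly."
chunks = merge_words(text, chunk_size=20, chunk_overlap=5)
print(chunks)
```

This explains why `'sits'`, `'It is'`, and `'barks'` are repeated (each is at most 5 characters) while nothing is carried over after `'It is sunny today.'`: its last word, `'today.'`, is 6 characters long and therefore exceeds the overlap.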
