# 🔍 RAG + Vector Databases

## Build Smarter AI Apps with LangChain

Welcome to the live session on Retrieval Augmented Generation (RAG) and Vector Databases! In this notebook, you'll learn how to build production-ready RAG systems that ground LLM responses in your own data.

---

## 🚀 Getting Started

### Prerequisites
- Python 3.10 or higher (required by LangChain 1.x, which provides the `create_agent` API used in this notebook)
- Jupyter Notebook installed
- OpenAI API key (for embeddings and LLM calls)

### Setup Instructions

**Option 1: Using requirements.txt (Recommended)**
```bash
pip install -r requirements.txt
```

**Option 2: Manual Installation**
```bash
pip install langchain langchain-openai langchain-community langchain-text-splitters chromadb pypdf python-dotenv
```

### Environment Setup

Create a `.env` file in this directory with your OpenAI API key:
```
OPENAI_API_KEY=your_api_key_here
```

---

## 📚 What You'll Learn

1. **Why RAG Exists** - Understanding LLM limitations and how RAG solves them
2. **RAG Architecture** - The complete flow from indexing to query
3. **Embeddings Deep Dive** - How text becomes vectors and why it matters
4. **Chunking Strategies** - The make-or-break step most people get wrong
5. **Vector Databases** - Storing and searching semantic data at scale
6. **Live Build** - Build a working Document Q&A system
7. **Evaluation** - How to measure and improve your RAG system

### 🎯 By the End of This Notebook

- ✅ Understand the complete RAG architecture
- ✅ Master embeddings and vector similarity search
- ✅ Build a production-ready Document Q&A system
- ✅ Know how to evaluate and debug RAG systems
- ✅ Be ready to build RAG applications on your own data

---

## 🔑 Key Concepts

| Concept | What It Is | Why It Matters |
|---------|------------|----------------|
| **RAG** | Retrieval + Augmentation + Generation | Grounds LLM responses in your data |
| **Embeddings** | Vector representations of text | Enables semantic search |
| **Vector DB** | Database optimized for high-dimensional vectors | Fast similarity search at scale |
| **Chunking** | Splitting documents into smaller pieces | Critical for retrieval quality |
| **Retrieval** | Finding relevant context for queries | Determines answer quality |

---

## 📖 How to Use This Notebook

1. **Run cells in order** - Each section builds on the previous one
2. **Read the markdown cells** - They contain important explanations
3. **Experiment** - Try modifying parameters to see what happens
4. **Test with your data** - Replace example documents with your own

**Ready? Let's start by installing dependencies!**


In [None]:
# Install dependencies (skip this cell if you already ran `pip install -r requirements.txt`)
%pip install langchain langchain-core langchain-openai langchain-community langchain-text-splitters chromadb pypdf python-dotenv -q

# Verify installation
import sys
print(f"Python version: {sys.version}")

# Verify key packages are installed
try:
    import langchain
    import langchain_core
    print(f"✅ LangChain version: {langchain.__version__}")
    print("✅ All core packages installed successfully!")
except ImportError as e:
    print(f"⚠️  Error: {e}")
    print("Please ensure all packages are installed correctly.")


---

## 🛠️ Setup & Imports

Let's import all the essential components we'll need throughout this notebook.

In [None]:
# Standard library imports
import os
from pathlib import Path
from dotenv import load_dotenv

# LangChain core imports
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
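# Note: newer releases also expose this class as langchain_chroma.Chroma (package
# langchain-chroma); the community import below may emit a deprecation warning
from langchain_community.vectorstores import Chroma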
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate

# Modern LangChain agent imports (latest pattern from LangChain docs)
from langchain.agents import create_agent
from langchain.tools import tool

# Load environment variables - check current directory first, then parent
current_dir = Path.cwd()
env_path = current_dir / '.env'
if env_path.exists():
    load_dotenv(env_path)
    print(f"✅ Loading .env from: {env_path}")
else:
    # Try parent directory
    parent_env = current_dir.parent / '.env'
    if parent_env.exists():
        load_dotenv(parent_env)
        print(f"✅ Loading .env from: {parent_env}")
    else:
        # Fallback to default behavior (searches current and parent directories)
        load_dotenv()
        print("⚠️  No .env file found. Using default load_dotenv() search")

# Verify API key is set
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    print("⚠️  WARNING: OPENAI_API_KEY not found in environment variables!")
    print("Please create a .env file in this directory with your OpenAI API key:")
    print(f"   {current_dir / '.env'}")
    print("\nFormat: OPENAI_API_KEY=sk-...")
else:
    # Validate API key format (basic check)
    if not api_key.startswith('sk-'):
        print("⚠️  WARNING: API key format looks incorrect. Should start with 'sk-'")
    else:
        print("✅ OpenAI API key loaded successfully!")
        # Test the API key by making a simple call
        try:
            test_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, max_tokens=5)
            test_llm.invoke("Hi")
            print("✅ API key validated successfully!")
        except Exception as e:
            print("❌ ERROR: API key validation failed!")
            print(f"   Error: {str(e)}")
            print("\n💡 Please check:")
            print("   1. Your API key is correct (get it from https://platform.openai.com/api-keys)")
            print("   2. You have sufficient credits in your OpenAI account")
            print("   3. The .env file is in the correct location")
            raise

print("\n✅ All imports successful!")
print("📚 Using modern LangChain patterns: create_agent with @tool decorator")


---

## 🎯 Part 1: Why RAG Exists

### The Problem with LLMs

LLMs are incredibly powerful, but they have critical limitations:

1. **Knowledge Cutoff** - They only know what they were trained on (e.g., the original GPT-4 was trained on data up to September 2021)
2. **Hallucination** - They can confidently make up information
3. **No Access to Private Data** - They can't access your documents, databases, or internal knowledge
4. **Static Knowledge** - They can't learn new information after training

### How RAG Solves This

**RAG = Retrieval + Augmentation + Generation**

1. **Retrieval**: Find relevant information from your knowledge base
2. **Augmentation**: Add that information to the LLM's prompt
3. **Generation**: LLM generates an answer grounded in the retrieved context

This gives you:
- ✅ Up-to-date information
- ✅ Access to private data
- ✅ Reduced hallucination
- ✅ Source attribution
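
To make the three steps concrete, here is a minimal sketch with no external dependencies. Everything in it is a stand-in: the word-overlap scoring replaces real embeddings, `fake_llm` replaces a real chat model, and the `KNOWLEDGE_BASE` contents are invented for illustration.

```python
# Toy illustration of Retrieval -> Augmentation -> Generation.
# Word-overlap scoring stands in for embeddings; fake_llm stands in for a model.

KNOWLEDGE_BASE = [
    "The refund window for all orders is 30 days from delivery.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium plans include priority email support.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its words with punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question: str, k: int = 1) -> list[str]:
    """Retrieval: rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def augment(question: str, context: list[str]) -> str:
    """Augmentation: place the retrieved text into the prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def fake_llm(prompt: str) -> str:
    """Generation stand-in: a real system would send the prompt to an LLM here."""
    return f"[model answer based on a {len(prompt)}-character prompt]"


question = "How long is the refund window?"
context = retrieve(question)
prompt = augment(question, context)
print(prompt)
print(fake_llm(prompt))
```

The point to notice is that the model never sees the whole knowledge base, only the question plus whatever retrieval selected.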

### RAG vs Fine-Tuning vs Prompt Engineering

| Approach | What It Does | Best For |
|----------|-------------|----------|
| **RAG** | Adds knowledge to prompts | New information, private data, facts |
| **Fine-tuning** | Changes model behavior | Style, format, domain-specific tasks |
| **Prompt Engineering** | Optimizes prompts | Simple tasks, no code changes needed |

**Key insight:** RAG handles *knowledge*, fine-tuning handles *behavior*. They're not either/or!


---

## 🏗️ Part 2: The RAG Architecture

The RAG architecture has two main phases:

### Phase 1: Indexing (One-time setup)

1. **Load your knowledge base** - Documents, PDFs, text files, etc.
2. **Split into chunks** - Break documents into manageable pieces
3. **Convert to vectors** - Use an embedding model to create vector representations
4. **Store in Vector DB** - Save vectors for fast retrieval

### Phase 2: Query (Real-time)

1. **User asks a question** - "What are the key findings?"
2. **Embed the query** - Convert the question to a vector
3. **Find similar chunks** - Search the vector database for relevant documents
4. **Add context to prompt** - Combine retrieved chunks with the original question
5. **LLM generates answer** - The model uses the context to answer
6. **Return grounded response** - Answer with source attribution

### Visual Flow

```
INDEXING:
Knowledge Base → Chunks → Embedding Model → Vector DB

QUERY:
User Query → Embedding Model → Vector DB (search) → Top K Docs →
Augment Prompt → LLM → Output
```
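
The two phases can be sketched in plain Python before bringing in any libraries. This is an illustrative toy, not how a vector database is implemented: `toy_embed` is a bag-of-words stand-in for an embedding model, and `ToyIndex` is a hypothetical in-memory list playing the role of the vector DB.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory 'vector DB': a list of (vector, chunk) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.entries.append((toy_embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        query_vec = toy_embed(query)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vec, entry[0]), reverse=True
        )
        return [chunk for _, chunk in ranked[:k]]


# Phase 1: indexing (load -> chunk -> embed -> store)
document = (
    "Chroma stores vectors on disk. Embeddings map text to vectors. "
    "Chunking splits long documents."
)
toy_chunks = [s.strip() + "." for s in document.split(".") if s.strip()]
index = ToyIndex()
index.add(toy_chunks)

# Phase 2: query (embed the question -> search -> top K chunks)
top = index.search("How are long documents split?", k=1)
print(top)
```

In the rest of the notebook, the embedding model and the index are replaced by real components, but the shape of the flow stays the same.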

Let's build this step by step!


---

## 🔢 Part 3: Embeddings Deep Dive

### What are Embeddings?

Embeddings are **vector representations of text** that capture semantic meaning. Similar texts have similar vectors.

### Key Properties

1. **Semantic Similarity** - "dog" and "puppy" are close in vector space
2. **Fixed Dimensions** - Each embedding has a fixed size (e.g., 1536 for OpenAI's text-embedding-3-small)
3. **Distance = Meaning** - Closer vectors = more similar meaning

### Why Embeddings Matter

- **Traditional search**: An exact-phrase search for "Python programming" won't match "coding in Python"
- **Semantic search**: "Python programming" WILL match "coding in Python" because their embeddings are close, even though the wording differs
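
To see what "close in vector space" means numerically, here is a small sketch using hand-made 3-dimensional vectors so the arithmetic can be checked by hand. The vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math


def cosine_sim_plain(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)


# Made-up vectors: the first two point in nearly the same direction
dog = [0.9, 0.8, 0.1]
puppy = [0.8, 0.9, 0.2]
invoice = [0.1, 0.2, 0.9]

sim_dog_puppy = cosine_sim_plain(dog, puppy)
sim_dog_invoice = cosine_sim_plain(dog, invoice)

print(f"dog vs puppy:   {sim_dog_puppy:.3f}")
print(f"dog vs invoice: {sim_dog_invoice:.3f}")
```

Cosine similarity depends only on direction, not on vector length, which is why it is a common choice for comparing embeddings.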

Let's see embeddings in action:


In [None]:
# Initialize the embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Example texts to embed
texts = [
    "Python is a programming language",
    "Coding in Python is fun",
    "Dogs are loyal pets",
    "Cats are independent animals"
]

# Generate embeddings
text_embeddings = embeddings.embed_documents(texts)

print(f"Number of texts: {len(texts)}")
print(f"Embedding dimensions: {len(text_embeddings[0])}")
print(f"\nFirst embedding (first 10 values): {text_embeddings[0][:10]}")

# Calculate similarity between first two texts
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Compare similarities
similarity_01 = cosine_similarity(text_embeddings[0], text_embeddings[1])  # Python texts
similarity_02 = cosine_similarity(text_embeddings[0], text_embeddings[2])  # Python vs Dog

print(f"\n📊 Similarity Scores:")
print(f"  'Python is a programming language' vs 'Coding in Python is fun': {similarity_01:.4f}")
print(f"  'Python is a programming language' vs 'Dogs are loyal pets': {similarity_02:.4f}")
print(f"\n✅ Notice: Similar topics have higher similarity scores!")


---

## ✂️ Part 4: Chunking Strategies

### Why Chunking Matters

**Chunking is the make-or-break step in RAG.** Bad chunking = bad retrieval = bad answers.

### The Challenge

- **Too small**: Lose context, incomplete information
- **Too large**: Noise, irrelevant information, token limits
- **Wrong boundaries**: Split sentences, lose meaning

### Chunking Strategies

| Strategy | How It Works | Best For |
|----------|-------------|----------|
| **Fixed-size** | Split every N characters/tokens | Simple, predictable |
| **Recursive** | Split by paragraphs → sentences → words | General-purpose (LangChain default) |
| **Semantic** | Split when meaning shifts | High-quality but slower |
| **Document-aware** | Split by headers, sections, pages | Structured docs (markdown, HTML) |
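
As an illustration of the document-aware row, the sketch below splits markdown on header lines using only the standard library. The helper `split_by_headers` is hypothetical and deliberately naive (for example, it would misread a `#` comment inside a fenced code block as a header); LangChain ships a `MarkdownHeaderTextSplitter` in `langchain_text_splitters` for real use.

```python
import re


def split_by_headers(markdown_text: str) -> list[dict]:
    """Split markdown into sections, starting a new one at each header line."""
    sections: list[dict] = []
    header, body = None, []

    def flush() -> None:
        # Store the section collected so far, skipping empty leading content
        text = "\n".join(body).strip()
        if header is not None or text:
            sections.append({"header": header, "text": text})

    for line in markdown_text.splitlines():
        if re.match(r"#{1,6}\s", line):
            flush()
            header, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return sections


doc = """# Guide
Intro text.

## Install
Run the installer.

## Usage
Call the API.
"""

sections = split_by_headers(doc)
for section in sections:
    print(section)
```

Keeping the header alongside each chunk is useful later: it can be stored as metadata and shown as the source of an answer.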

### Best Practices

1. **Use overlap** - 10-20% overlap prevents losing context at boundaries
2. **Respect structure** - Don't split in the middle of sentences
3. **Consider your use case** - Technical docs need larger chunks, Q&A needs smaller
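
To show what overlap does mechanically, here is a minimal fixed-size chunker. The function `chunk_with_overlap` is a hypothetical helper written for this illustration; it ignores sentence boundaries entirely, which is exactly the weakness the recursive splitter addresses.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into chunk_size pieces where neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this piece already reached the end of the text
        start += step
    return pieces


pieces = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=2)
print(pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each piece starts with the last two characters of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk (as long as it is shorter than the overlap).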

Let's see different chunking strategies:


In [None]:
# Sample document for chunking demonstration
sample_document = """
Artificial Intelligence (AI) is transforming how we work and live. 
Machine learning, a subset of AI, enables computers to learn from data without explicit programming.

Natural Language Processing (NLP) is another key area of AI. 
It allows computers to understand and generate human language.

Large Language Models (LLMs) like GPT-4 represent a breakthrough in NLP. 
They can generate human-like text and understand context.

Retrieval Augmented Generation (RAG) combines LLMs with information retrieval. 
This allows AI systems to answer questions using up-to-date information from knowledge bases.
"""

# Strategy 1: Recursive Character Text Splitter (Recommended)
recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=30,
    separators=["\n\n", "\n", ". ", " ", ""]
)

chunks_recursive = recursive_splitter.split_text(sample_document)

print("📄 Original document length:", len(sample_document), "characters")
print(f"\n✂️  Recursive Splitter Results ({len(chunks_recursive)} chunks):")
print("-" * 60)
for i, chunk in enumerate(chunks_recursive, 1):
    print(f"\nChunk {i} ({len(chunk)} chars):")
    print(f"  {chunk[:100]}..." if len(chunk) > 100 else f"  {chunk}")

# Strategy 2: Fixed-size chunks (for comparison)
fixed_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
    separators=[""]
)

chunks_fixed = fixed_splitter.split_text(sample_document)

print(f"\n\n✂️  Fixed-size Splitter Results ({len(chunks_fixed)} chunks):")
print("-" * 60)
for i, chunk in enumerate(chunks_fixed[:3], 1):  # Show first 3
    print(f"\nChunk {i} ({len(chunk)} chars):")
    print(f"  {chunk[:100]}..." if len(chunk) > 100 else f"  {chunk}")

print("\n💡 Notice how the recursive splitter respects sentence boundaries better!")


---

## 🗄️ Part 5: Vector Databases

### What is a Vector Database?

A **vector database** is specialized storage optimized for:
- High-dimensional vectors (hundreds to thousands of dimensions)
- Fast similarity search (finding nearest neighbors)
- Scalability (millions of vectors)

### Why Not Regular Databases?

Regular SQL databases are great for exact matches, but terrible for:
- "Find documents similar to this query"
- Semantic search
- High-dimensional data

### Vector Database Options

| Database | Best For | Notes |
|----------|----------|-------|
| **ChromaDB** | Development, small-medium scale | Easy to use, Python-native |
| **Pinecone** | Production, large scale | Managed service, fast |
| **Weaviate** | Enterprise, complex queries | Open source, feature-rich |
| **Qdrant** | Performance-critical | Fast, open source |
| **Milvus** | Very large scale | Distributed, enterprise |

For this session, we'll use **ChromaDB** - it's simple to set up, works well for learning, and can also serve small-to-medium production workloads.

### How Vector Search Works

1. Query is embedded → becomes a vector
2. Vector DB finds the stored vectors closest to it under the configured metric (cosine similarity, or L2 distance, which is Chroma's default)
3. Returns top K most similar documents
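
The three steps above can be sketched as a brute-force search that scores every stored vector against the query. This is only a mental model: the vectors and document IDs are made up, and real vector databases avoid scanning everything by using approximate nearest-neighbor indexes such as HNSW.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int):
    """Return the k (score, doc_id) pairs with the highest similarity to the query."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in stored)
    return heapq.nlargest(k, scored)


# Made-up 3-dimensional "embeddings" keyed by a hypothetical document ID
stored = [
    ("doc-python", [0.9, 0.1, 0.0]),
    ("doc-ml", [0.6, 0.7, 0.1]),
    ("doc-pets", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, stored, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

Brute force costs one comparison per stored vector, which is fine for a few thousand chunks and is the reason specialised indexes matter at millions.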

Let's set up ChromaDB:


In [None]:
# Create a simple vector store with sample documents
sample_docs = [
    "Python is a high-level programming language known for its simplicity.",
    "Machine learning is a subset of artificial intelligence.",
    "Vector databases enable semantic search over large text collections.",
    "LangChain is a framework for building LLM applications.",
    "RAG combines retrieval and generation for grounded AI responses."
]

# Each sample text is a single short sentence, so no splitting is needed here
# (chunking strategies were covered in Part 4)
chunks = sample_docs

# Create embeddings and store in ChromaDB
# Note: re-running this cell adds the same texts to the persisted collection again;
# delete ./chroma_demo_db first if you see duplicate results
vectorstore = Chroma.from_texts(
    texts=chunks,
    embedding=embeddings,
    persist_directory="./chroma_demo_db"
)

print(f"✅ Created vector store with {len(chunks)} documents")
print("📁 Database saved to: ./chroma_demo_db")

# Test similarity search
query = "What is Python?"
results = vectorstore.similarity_search(query, k=2)

print(f"\n🔍 Query: '{query}'")
print(f"\n📄 Top {len(results)} results:")
for i, doc in enumerate(results, 1):
    print(f"\n  {i}. {doc.page_content}")


---

## 🛠️ Part 6: Live Build - Document Q&A System

Now let's build a complete RAG system step by step!

### What We're Building

A Document Q&A system that can:
1. Load PDF or text documents
2. Chunk and embed them
3. Store in a vector database
4. Answer questions based on the documents

### Architecture

```
PDF/Text Files → Load → Chunk → Embed → Store in ChromaDB
                                                      ↓
User Query → Embed → Search → Retrieve Top K → Augment Prompt → LLM → Answer
```

Let's build it!


### Step 1: Create Sample Document

First, let's create a sample document to work with. In production, you'd load your own PDFs or text files.


In [None]:
# Create a sample document file for demonstration
sample_content = """
# Introduction to RAG Systems

## What is RAG?

Retrieval Augmented Generation (RAG) is a technique that enhances the capabilities of Large Language Models (LLMs) by providing them with access to external knowledge sources.

## How RAG Works

RAG systems work in two phases:

### Phase 1: Indexing
1. Documents are loaded from various sources (PDFs, databases, APIs)
2. Documents are split into smaller chunks
3. Each chunk is converted to a vector using an embedding model
4. Vectors are stored in a vector database

### Phase 2: Querying
1. User asks a question
2. Question is converted to a vector
3. Vector database finds similar document chunks
4. Retrieved chunks are added to the LLM prompt
5. LLM generates an answer based on the retrieved context

## Benefits of RAG

- Reduces hallucination by grounding answers in real data
- Allows access to up-to-date information
- Enables use of private/internal documents
- Provides source attribution for answers

## Best Practices

1. Use appropriate chunk sizes (typically 500-1000 characters)
2. Include overlap between chunks (10-20%)
3. Choose the right embedding model for your domain
4. Test retrieval quality before optimizing generation
5. Use metadata filtering when possible
"""

# Save to a text file
with open("sample_rag_document.txt", "w") as f:
    f.write(sample_content)

print("✅ Created sample document: sample_rag_document.txt")
print(f"📄 Document length: {len(sample_content)} characters")


### Step 2: Load Documents

LangChain supports 80+ document loaders for different formats.


In [None]:
# Load the document
loader = TextLoader("sample_rag_document.txt")
documents = loader.load()

print(f"✅ Loaded {len(documents)} document(s)")
print(f"📄 First document preview:")
print("-" * 60)
print(documents[0].page_content[:300] + "...")
print("-" * 60)
print(f"\n📊 Document metadata: {documents[0].metadata}")


### Step 3: Chunk Documents

Split documents into manageable chunks with overlap.


In [None]:
# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # Size of each chunk
    chunk_overlap=100,   # Overlap between chunks (important!)
    separators=["\n\n", "\n", ". ", " ", ""]  # Try to split at these boundaries
)

# Split documents into chunks
chunks = text_splitter.split_documents(documents)

print(f"✅ Split document into {len(chunks)} chunks")
print(f"\n📊 Chunk statistics:")
print(f"  Average chunk size: {sum(len(c.page_content) for c in chunks) / len(chunks):.0f} characters")
print(f"  Min chunk size: {min(len(c.page_content) for c in chunks)} characters")
print(f"  Max chunk size: {max(len(c.page_content) for c in chunks)} characters")

print(f"\n📄 First chunk preview:")
print("-" * 60)
print(chunks[0].page_content)
print("-" * 60)


### Step 4: Create Embeddings and Vector Store

Convert chunks to vectors and store in ChromaDB.


In [None]:
# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create vector store from chunks
# This will:
# 1. Generate embeddings for all chunks
# 2. Store them in ChromaDB
# 3. Persist to disk for later use

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./rag_vector_db"
)

print("✅ Created vector store!")
print(f"📁 Database saved to: ./rag_vector_db")
print(f"📊 Stored {len(chunks)} document chunks")

# Test retrieval
test_query = "What are the benefits of RAG?"
results = vectorstore.similarity_search(test_query, k=1)

print(f"\n🔍 Test query: '{test_query}'")
print(f"\n📄 Retrieved {len(results)} relevant chunks:")
print("-" * 60)
for i, doc in enumerate(results, 1):
    print(f"\n{i}. {doc.page_content[:200]}...")
    print(f"   Metadata: {doc.metadata}")


### Step 5: Create the RAG Agent

Now let's connect retrieval to the LLM using the modern LangChain agent pattern with tools. This is the recommended approach from the latest LangChain documentation.
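
Before wiring up the real agent, it may help to see the tool-calling loop in miniature. The sketch below is an assumption-laden toy and not how `create_agent` is implemented: `scripted_model` is a hard-coded stand-in for an LLM, `search_docs` is a hypothetical tool, and the message dictionaries are simplified shapes invented for this example.

```python
def search_docs(query: str) -> str:
    """Hypothetical retrieval tool; a real one would query the vector store."""
    notes = {"rag": "RAG retrieves relevant chunks and adds them to the prompt."}
    hits = [text for key, text in notes.items() if key in query.lower()]
    return "\n".join(hits) or "No matching documents."


TOOLS = {"search_docs": search_docs}


def scripted_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: request the tool first, then answer from its output."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        question = messages[-1]["content"]
        return {
            "role": "assistant",
            "tool_call": {"name": "search_docs", "args": {"query": question}},
        }
    return {
        "role": "assistant",
        "content": f"Based on the documents: {tool_results[-1]['content']}",
    }


def run_agent(question: str, max_steps: int = 5) -> list[dict]:
    """Loop: ask the model, run any tool it requests, stop when it answers."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = scripted_model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            break  # no tool requested, so this reply is the final answer
        output = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": output})
    return messages


history = run_agent("What is RAG?")
for message in history:
    print(message)
```

The difference from a fixed retrieve-then-generate chain is that the model decides whether and what to retrieve, which is why the conversation history contains a tool request and a tool result between the question and the answer.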


In [None]:
# Initialize LLM
model = ChatOpenAI(
    model="gpt-4o-mini",  # Fast and cost-effective
    temperature=0  # Deterministic responses
)

# Create a retrieval tool using the modern @tool decorator pattern
# This follows the latest LangChain documentation pattern
@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information from the document store to help answer a query."""
    retrieved_docs = vectorstore.similarity_search(query, k=3)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

# Create the RAG agent using create_agent (modern pattern)
tools = [retrieve_context]
system_prompt = (
    "You have access to a tool that retrieves context from documents. "
    "Use the tool to help answer user queries. Always cite your sources."
)

rag_agent = create_agent(model, tools, system_prompt=system_prompt)

print("✅ RAG agent created using modern LangChain pattern!")
print("\n📋 Agent configuration:")
print(f"  LLM: {model.model_name}")
print(f"  Retrieval: Top 3 chunks")
print(f"  Pattern: Agent with @tool decorator (latest LangChain docs)")


### Step 6: Ask Questions!

Now let's test our RAG system with some questions.


In [None]:
# Test questions using the modern agent pattern
questions = [
    "What is RAG?",
    "How does RAG work?",
    "What are the benefits of RAG?",
]

for question in questions:
    print(f"\n{'='*70}")
    print(f"❓ Question: {question}")
    print(f"{'='*70}\n")
    
    # Use agent.stream() with stream_mode="values": each event is the full state so far
    for event in rag_agent.stream(
        {"messages": [{"role": "user", "content": question}]},
        stream_mode="values",
    ):
        # Get the most recent message from the event
        last_message = event["messages"][-1]

        # Skip the user's question and raw tool output; only AI messages are shown
        if last_message.type != "ai":
            continue

        if last_message.tool_calls:
            # The model is requesting a retrieval before it answers
            print("🔧 Tool Calls:")
            for tool_call in last_message.tool_calls:
                print(f"  - {tool_call.get('name', 'unknown')}: {tool_call.get('args', {})}")
            print()
        elif last_message.content:
            # An AI message without tool calls is the final answer
            print(f"💬 Answer:\n{last_message.content}\n")
    
    print("-"*70)


---

## 🧪 Part 7: Evaluation & Debugging

### Common RAG Failure Modes

| Problem | Symptom | Fix |
|---------|---------|-----|
| **Bad chunking** | Half-relevant text | Adjust chunk size, use semantic splitting |
| **Wrong K value** | Too much noise or missing context | Experiment with K=3 to K=10 |
| **Embedding mismatch** | Irrelevant results | Try different embedding model |
| **Weak grounding** | LLM ignores context | Strengthen grounding instructions |
| **Boundary issues** | Answer split across chunks | Increase chunk overlap |

### Testing Your RAG System

1. **Test retrieval first** - If retrieval is bad, generation will be bad
2. **Check chunk quality** - Are chunks meaningful and complete?
3. **Verify embedding quality** - Do similar queries return similar results?
4. **Evaluate answers** - Are answers grounded in the context?
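
"Test retrieval first" can be turned into a number with a small labeled set: for each question, note which chunk should come back, then measure how often it appears in the top K. The sketch below assumes a hypothetical retriever with canned results so it runs without an API key; the chunk IDs and questions are invented for illustration.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=3):
    """Fraction of queries whose expected chunk ID appears in the top-k results."""
    hits = sum(
        1 for query, expected_id in labeled_queries if expected_id in retrieve(query, k)
    )
    return hits / len(labeled_queries)


# Canned rankings standing in for a real vector store (hypothetical chunk IDs)
CANNED_RESULTS = {
    "What are the benefits of RAG?": ["chunk-benefits", "chunk-intro", "chunk-phases"],
    "How should I size chunks?": ["chunk-intro", "chunk-phases", "chunk-best-practices"],
    "What happens during indexing?": ["chunk-intro", "chunk-benefits", "chunk-phases"],
}


def fake_retrieve(query: str, k: int) -> list[str]:
    """Stand-in retriever: returns the first k canned IDs for the query."""
    return CANNED_RESULTS[query][:k]


labeled = [
    ("What are the benefits of RAG?", "chunk-benefits"),
    ("How should I size chunks?", "chunk-best-practices"),
    ("What happens during indexing?", "chunk-phases"),
]

rate_k1 = hit_rate_at_k(fake_retrieve, labeled, k=1)
rate_k3 = hit_rate_at_k(fake_retrieve, labeled, k=3)
print(f"Hit rate @1: {rate_k1:.2f}")
print(f"Hit rate @3: {rate_k3:.2f}")
```

With a real store, `fake_retrieve` would be replaced by a wrapper around `vectorstore.similarity_search` that returns an identifier taken from each document's metadata. A large gap between the K=1 and K=3 scores suggests the right chunks are being found but ranked too low.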

Let's create a simple evaluation function:


In [None]:
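def evaluate_retrieval(vectorstore, query, k=3):
    """Show retrieved chunks and flag those sharing no content words with the query"""
    results = vectorstore.similarity_search(query, k=k)

    # Rough heuristic only: ignore common stopwords and punctuation so that
    # words like "the" or "of" do not mark every chunk as relevant
    stopwords = {"what", "are", "is", "the", "of", "a", "an", "how", "does", "do", "in", "to"}
    keywords = {w.strip("?.,!").lower() for w in query.split()} - stopwords - {""}

    print(f"🔍 Query: '{query}'")
    print(f"📊 Retrieved {len(results)} chunks\n")

    for i, doc in enumerate(results, 1):
        content = doc.page_content.lower()
        flag = "✅" if any(keyword in content for keyword in keywords) else "⚠️"
        print(f"{flag} Chunk {i} ({len(doc.page_content)} chars):")
        print(f"   {doc.page_content[:200]}...")
        print()

    return results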

# Test retrieval quality
print("="*70)
print("RETRIEVAL QUALITY TEST")
print("="*70)
evaluate_retrieval(vectorstore, "What are the benefits of RAG?", k=3)


---

## 🎓 Summary & Next Steps

### What We Built

✅ Complete RAG system with:
- Document loading
- Intelligent chunking
- Vector embeddings
- Vector database storage
- Retrieval-augmented Q&A
- Evaluation tools

### Key Takeaways

1. **RAG = Retrieval + Augmentation + Generation**
2. **Chunking is critical** - Bad chunking = bad results
3. **Test retrieval before generation** - Garbage in, garbage out
4. **Embeddings enable semantic search** - Similar meaning = similar vectors
5. **Vector databases scale** - Fast similarity search at scale

### Next Steps

1. **Try with your own documents** - Replace the sample document with your PDFs/text files
2. **Experiment with chunk sizes** - Find what works best for your data
3. **Try different embedding models** - Some models work better for specific domains
4. **Add metadata filtering** - Filter by date, source, category before retrieval
5. **Implement hybrid search** - Combine vector search with keyword matching
6. **Add reranking** - Use a cross-encoder to improve top results
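
As a starting point for the hybrid search step, one common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below is a minimal version under stated assumptions: the two input rankings and document IDs are made up, and `k=60` is the constant commonly used with RRF rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a vector search and a keyword search for one query
vector_ranking = ["doc-a", "doc-b", "doc-c"]
keyword_ranking = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

RRF only looks at rank positions, so it works even when the two searches produce scores on incompatible scales. Documents that both searches rank well rise to the top.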

### Resources

- **LangChain Docs**: https://python.langchain.com/
- **ChromaDB Docs**: https://docs.trychroma.com/
- **OpenAI Embeddings**: https://platform.openai.com/docs/guides/embeddings
- **RAG Evaluation**: Consider using RAGAS (https://docs.ragas.io/)

### Your Challenge

Build a RAG system on YOUR data this week! Start simple, iterate, and remember:
- **Start simple** - Basic RAG with good chunking beats complex RAG with bad data
- **Test retrieval first** - If retrieval is bad, nothing else matters
- **Iterate** - RAG is an iterative process, not a one-time setup

---

## 🎉 Congratulations!

You've built a complete RAG system! You now have the foundation to build production-ready AI applications that can answer questions based on your own data.

**Happy building! 🚀**
