# 04 - RAG Fundamentals

**Build your first Retrieval-Augmented Generation system!**

## Learning Objectives

By the end of this notebook, you will:
- Understand the RAG architecture
- Load and chunk documents
- Create embeddings and store them in a vector database
- Implement retrieval and generation

## Table of Contents

1. [What is RAG?](#what-is-rag)
2. [Document Loading](#loading)
3. [Chunking Strategies](#chunking)
4. [Vector Storage](#vector)
5. [Retrieval & Generation](#retrieval)
6. [Complete Pipeline](#pipeline)
7. [Exercises](#exercises)
8. [Checkpoint](#checkpoint)

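```python
# GUIDED: Setup
import os
import sys
from pathlib import Path

# Make the project root (the parent of this notebook's folder) importable,
# so the `from src...` imports below work
sys.path.append(str(Path.cwd().parent))

from dotenv import load_dotenv
# Load environment variables such as API keys from the project's .env file
load_dotenv(Path.cwd().parent / ".env")

print("Setup complete!")
```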

---
## 1. What is RAG? <a id='what-is-rag'></a>

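**RAG = Retrieval-Augmented Generation**

Instead of relying only on what the model learned during training, a RAG system first retrieves passages relevant to the query from an indexed document collection, then passes them to the LLM as context for generating the answer.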

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────▶│  Retrieve   │────▶│  Generate   │
│             │     │  Relevant   │     │  Answer     │
│             │     │  Context    │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │  Vector DB  │
                    │  (indexed   │
                    │  documents) │
                    └─────────────┘
```
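To make the three boxes concrete, here is a toy sketch of the same flow in plain Python. It is an illustration under simplifying assumptions, not how `src.rag_pipeline` works: word overlap stands in for embedding search, a hard-coded list stands in for the vector DB, and `generate` only assembles the augmented prompt instead of calling an LLM. All names (`CORPUS`, `retrieve`, `generate`, `rag`) are hypothetical.

```python
import re

# Hypothetical in-memory corpus standing in for the vector DB
CORPUS = [
    "RAG retrieves relevant context before generating an answer.",
    "Embeddings are vector representations of text.",
    "Chunking splits long documents into smaller passages.",
]

def tokenize(text: str) -> set[str]:
    # Lowercased set of words; a crude stand-in for embeddings
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank passages by how many words they share with the query
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for an LLM call: just build the augmented prompt
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

def rag(query: str) -> str:
    # Query -> Retrieve relevant context -> Generate answer
    return generate(query, retrieve(query, CORPUS))

print(rag("What are embeddings?"))
```

Keeping `retrieve` and `generate` as separate functions mirrors the diagram: either stage can be swapped (real embeddings, a real LLM) without touching the other.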

### Why RAG?

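- **Current Information**: Answer from documents that are newer than the model's training data
- **Source Citation**: Trace each answer back to the retrieved chunks it was based on
- **Domain Knowledge**: Add private or domain-specific documents without retraining the model
- **Reduced Hallucination**: Grounding answers in retrieved text makes unsupported claims less likely (it does not eliminate them)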

---
## 2. Document Loading <a id='loading'></a>

```python
# GUIDED: Load documents using our loader
from src.rag_pipeline import DocumentLoader, Document

loader = DocumentLoader()

# Create sample document
sample_text = """
# Introduction to RAG

Retrieval-Augmented Generation (RAG) is a technique that combines retrieval
and generation for better AI responses.

## Key Components

1. Document Store: Where your documents are indexed
2. Embeddings: Vector representations of text
3. Retrieval: Finding relevant documents
4. Generation: Creating the final answer

## Benefits

- More accurate answers
- Traceable sources
- Up-to-date information
- Domain-specific knowledge
"""

# Save the sample document (path is relative to this notebook's folder)
sample_path = Path("../data/documents/sample.txt")
sample_path.parent.mkdir(parents=True, exist_ok=True)
sample_path.write_text(sample_text, encoding="utf-8")

# Load it: load_file returns a list of documents
docs = loader.load_file(str(sample_path))
print(f"Loaded {len(docs)} document(s)")
print(f"Content preview: {docs[0].content[:200]}...")
```
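`DocumentLoader.load_file` handles a single file, while `rag.load_documents(...)` in Section 6 takes a whole folder. Neither implementation is shown, so the following is only a hypothetical sketch of a directory loader: walk the folder, read each text file, and keep the file path in `metadata["source"]` so retrieved chunks can later be traced back to their file. `SimpleDocument` and `load_directory` are illustrative names, not the `src.rag_pipeline` API, and the `.txt`/`.md` filter is an assumption.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SimpleDocument:
    # Hypothetical stand-in for src.rag_pipeline.Document
    content: str
    metadata: dict = field(default_factory=dict)

def load_directory(directory: str, suffixes=(".txt", ".md")) -> list[SimpleDocument]:
    """Read every file with a matching suffix (recursively) into a document,
    recording its path as metadata so answers can cite it later."""
    docs = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8")
            docs.append(SimpleDocument(content=text, metadata={"source": str(path)}))
    return docs

# Usage with a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("RAG combines retrieval and generation.", encoding="utf-8")
    Path(tmp, "notes.md").write_text("# Notes\nEmbeddings are vectors.", encoding="utf-8")
    Path(tmp, "image.png").write_bytes(b"\x89PNG")  # skipped: not a text suffix
    loaded = load_directory(tmp)
    print([Path(d.metadata["source"]).name for d in loaded])  # ['a.txt', 'notes.md']
```

Storing the source path at load time is what makes the "Source Citation" benefit possible: every chunk cut from a document can inherit that metadata.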

---
## 3. Chunking Strategies <a id='chunking'></a>

```python
# GUIDED: Chunk documents
from src.rag_pipeline import Chunker

# Create chunker
chunker = Chunker(
    chunk_size=200,  # Max characters per chunk
    overlap=50       # Overlap between chunks
)

# Chunk the document
chunks = chunker.chunk_all(docs)

print(f"Created {len(chunks)} chunks:")
for i, chunk in enumerate(chunks):
    print(f"\nChunk {i+1} ({len(chunk.content)} chars):")
    print(f"  {chunk.content[:100]}...")
```
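`Chunker`'s implementation isn't shown here. Assuming it uses a plain sliding window over characters (the real class may instead split on sentences or paragraphs), the arithmetic behind `chunk_size` and `overlap` looks like this hypothetical `chunk_text` helper: each window starts `chunk_size - overlap` characters after the previous one, so text near a boundary appears in both neighbouring chunks.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows
    share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

text = "".join(str(i % 10) for i in range(500))  # 500 characters
pieces = chunk_text(text, chunk_size=200, overlap=50)
print([len(p) for p in pieces])  # [200, 200, 200]
```

With the notebook's settings (200/50) the step is 150 characters, so a 500-character text yields windows starting at 0, 150, and 300. Smaller chunks give more precise matches but less context per hit; larger overlap costs more storage and embedding calls.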

---
## 4. Vector Storage <a id='vector'></a>

```python
# GUIDED: Create embeddings and store
from src.embedding_utils import EmbeddingModel, SimpleVectorStore

# Create embedding model
embedder = EmbeddingModel(provider="openai", model="text-embedding-3-small")

# Create vector store
store = SimpleVectorStore(embedding_model=embedder)

# Embed each chunk and store it together with its metadata
store.add(
    texts=[c.content for c in chunks],
    metadata=[c.metadata for c in chunks]
)

print(f"Vector store contains {len(store)} chunks")
```
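`SimpleVectorStore` and `EmbeddingModel` live in `src.embedding_utils`, whose code isn't shown. The sketch below is a hypothetical stand-in that mimics the interface used in this notebook (`add(texts=..., metadata=...)`, `len(store)`, and `search(query, k)` returning dicts with `"text"` and `"score"`), assuming the real store ranks by cosine similarity. To stay runnable offline, `toy_embed` replaces the OpenAI model with a hashed bag-of-words vector, which matches words rather than meaning.

```python
import math
import re
import zlib
from collections import Counter

def toy_embed(text: str, dims: int = 256) -> list[float]:
    # Hypothetical stand-in for an embedding model: hash each word into one of
    # `dims` buckets (crc32 keeps it deterministic), then L2-normalize.
    vec = [0.0] * dims
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ToyVectorStore:
    """In-memory store of (vector, text, metadata) rows, searched by cosine similarity."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []

    def add(self, texts, metadata=None):
        metadata = metadata or [{} for _ in texts]
        for text, meta in zip(texts, metadata):
            self.rows.append((self.embed(text), text, meta))

    def __len__(self):
        return len(self.rows)

    def search(self, query, k=3):
        q = self.embed(query)
        # Vectors are unit length (all-zero only for word-less text),
        # so the dot product equals cosine similarity
        scored = [
            {"text": text, "metadata": meta, "score": sum(a * b for a, b in zip(q, vec))}
            for vec, text, meta in self.rows
        ]
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:k]

demo_store = ToyVectorStore()
demo_store.add(texts=["Embeddings are vectors of numbers.", "Chunking splits documents."])
print(len(demo_store))  # 2
for r in demo_store.search("what are embeddings", k=2):
    print(f"{r['score']:.3f}  {r['text']}")
```

A linear scan like this is fine for a notebook-sized corpus; dedicated vector databases add approximate nearest-neighbour indexes so search stays fast at millions of chunks.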

---
## 5. Retrieval & Generation <a id='retrieval'></a>

```python
# GUIDED: Search and retrieve
query = "What are the benefits of RAG?"

results = store.search(query, k=3)

print(f"Query: {query}")
print(f"\nTop {len(results)} results:")
for i, r in enumerate(results):
    print(f"\n{i+1}. Score: {r['score']:.3f}")
    print(f"   {r['text'][:100]}...")
```
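How to read the `Score` column: `SimpleVectorStore`'s metric isn't shown in this notebook, but assuming it uses cosine similarity (a common choice for text embeddings), the score between a query vector $\mathbf{q}$ and a chunk vector $\mathbf{d}$ with $n$ dimensions is

$$
\operatorname{sim}(\mathbf{q}, \mathbf{d})
= \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
= \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}
$$

It ranges from -1 to 1, and higher means more similar. For example, $\mathbf{q} = (1, 0)$ and $\mathbf{d} = (1, 1)$ give $1 / \sqrt{2} \approx 0.707$. When embeddings are already normalized to length 1, as OpenAI states for its embedding models, the denominator is 1 and the score reduces to the dot product.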

```python
# GUIDED: Generate answer with context
from src.llm_utils import LLMClient

llm = LLMClient(provider="openai", model="gpt-4o-mini")

# Build context from the retrieved chunks, separated by blank lines
context = "\n\n".join(r["text"] for r in results)

# Create a prompt that keeps the model grounded in the retrieved context
prompt = f"""Use only the following context to answer the question.
If the context does not contain the answer, say that you don't know.

Context:
{context}

Question: {query}

Answer:"""

answer = llm.chat(prompt)
print(f"Answer: {answer}")
```
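The "Source Citation" benefit from Section 1 only materializes if the model can tell chunks apart. One option, sketched here as a hypothetical helper `build_cited_prompt` (not part of `src.llm_utils`), is to number each retrieved chunk and ask the model to cite those numbers. It takes results shaped like the `store.search` output above (dicts with a `"text"` key); `demo_results` is made-up placeholder input, not real retrieval output.

```python
def build_cited_prompt(query: str, results: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] {r['text']}" for i, r in enumerate(results, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite the chunks you used, e.g. [1]. If the context does not "
        "contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

# Placeholder input for illustration only
demo_results = [
    {"text": "RAG grounds answers in retrieved text.", "score": 0.82},
    {"text": "Sources make answers traceable.", "score": 0.74},
]
print(build_cited_prompt("What are the benefits of RAG?", demo_results))
```

In the notebook you could pass the real `results` instead and send the prompt with `llm.chat(build_cited_prompt(query, results))`; the numbers in the answer then map back to `results[i - 1]` and its metadata.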

---
## 6. Complete Pipeline <a id='pipeline'></a>

```python
# GUIDED: Use our RAG pipeline class
from src.rag_pipeline import RAGPipeline
from src.llm_utils import LLMClient

# Create pipeline
llm = LLMClient(provider="openai", model="gpt-4o-mini")
rag = RAGPipeline(
    llm_client=llm,
    chunk_size=200,
    chunk_overlap=50,
    collection_name="demo_rag"
)

# Load documents
rag.load_documents("../data/documents/")

# Query
result = rag.query("What are the key components of RAG?")

print("Answer:")
print(result["answer"])
print(f"\nSources: {len(result['sources'])} documents used")
```

---
## 7. Exercises <a id='exercises'></a>

### Exercise 1: Add Your Documents

Put your own documents in `../data/documents/`, load them with `rag.load_documents(...)`, and query them.

```python
# TODO: Add your own documents and query them

# Your code here:

```

### Exercise 2: Experiment with Chunk Size

Try different chunk sizes and compare how the number of chunks and the retrieved results change.

```python
# TODO: Compare chunk sizes 100, 500, 1000

# Your code here:

```

---
## 8. Checkpoint <a id='checkpoint'></a>

Before moving on, verify:

- [ ] You understand the RAG pipeline
- [ ] You can load and chunk documents
- [ ] You created a vector store
- [ ] You can query and get answers

### Next Steps

In the next notebook, we'll explore **Advanced RAG**: hybrid search, re-ranking, and evaluation!