# 05 - Advanced RAG Techniques

**Production-grade retrieval and generation.**

## Learning Objectives

By the end of this notebook, you will be able to:
- Implement advanced chunking strategies
- Use hybrid search (BM25 + vectors)
- Apply re-ranking for better results
- Transform queries (expansion, HyDE) to improve retrieval
- Evaluate RAG performance

## Table of Contents

1. [Chunking Strategies](#chunking)
2. [Hybrid Search](#hybrid)
3. [Re-ranking](#reranking)
4. [Query Transformation](#query)
5. [Evaluation Metrics](#evaluation)
6. [Exercises](#exercises)
7. [Checkpoint](#checkpoint)

In [None]:
# GUIDED: Setup
import os
import sys
from pathlib import Path

sys.path.append(str(Path.cwd().parent))

from dotenv import load_dotenv
load_dotenv(Path.cwd().parent / ".env")

print("Setup complete!")

---
## 1. Chunking Strategies <a id='chunking'></a>

How you split documents affects retrieval quality:

```
Strategy          Best For                    Trade-offs
────────────────────────────────────────────────────────
Fixed size        Simple, predictable         May break context
Recursive         Preserves structure         More complex
Semantic          Meaning-aware               Slower, needs model
Sentence          Natural boundaries          Variable sizes
```
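
The table lists sentence-based chunking, which the cells below do not implement. Here is a minimal sketch of the idea, assuming a naive regex split on `.`, `!` or `?` followed by whitespace. `chunk_sentences` and `max_chars` are illustrative names, not part of `src.rag_pipeline`, and a production splitter would need to handle abbreviations and decimals:

```python
import re

def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate  # Still fits (or is the first sentence of a chunk)
        else:
            chunks.append(current)  # Close the chunk at a sentence boundary
            current = sentence
    if current:
        chunks.append(current)
    return chunks

demo = "RAG retrieves context. It then generates an answer! Does chunking matter? Yes."
print(chunk_sentences(demo, max_chars=40))
```

Because chunks always end at a sentence boundary, their sizes vary, which is the trade-off the table names.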

In [None]:
# GUIDED: Compare chunking strategies
sample_text = """
# Introduction to Machine Learning

Machine learning is a subset of artificial intelligence that enables computers to learn from data.

## Types of Machine Learning

### Supervised Learning
In supervised learning, the algorithm learns from labeled training data. Examples include:
- Classification: Predicting categories (spam/not spam)
- Regression: Predicting continuous values (house prices)

### Unsupervised Learning
Unsupervised learning finds patterns in unlabeled data. Common techniques:
- Clustering: Grouping similar data points
- Dimensionality reduction: Simplifying data

### Reinforcement Learning
The agent learns by interacting with an environment and receiving rewards or penalties.

## Applications

Machine learning powers many modern applications:
- Image recognition
- Natural language processing
- Recommendation systems
- Autonomous vehicles
"""

print(f"Total length: {len(sample_text)} characters")

In [None]:
# GUIDED: Fixed-size chunking
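def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Simple fixed-size chunking with overlap."""
    # An overlap >= chunk_size would never advance `start` (infinite loop)
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    start = 0

    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break  # Last window reached; skip the redundant overlap-only tail
        start = end - overlap  # Step forward by chunk_size - overlap

    return chunks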

fixed_chunks = chunk_fixed(sample_text, chunk_size=300, overlap=50)
print(f"Fixed chunking: {len(fixed_chunks)} chunks")
for i, chunk in enumerate(fixed_chunks[:2]):
    print(f"\nChunk {i+1} ({len(chunk)} chars):")
    print(f"  {chunk[:100]}...")

In [None]:
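# GUIDED: Recursive chunking (respects structure)
from typing import Optional

def chunk_recursive(
    text: str,
    chunk_size: int = 300,
    separators: Optional[list[str]] = None
) -> list[str]:
    """Recursively split text, trying to preserve structure."""
    # Use `is None` rather than `or`: an empty list means every separator has
    # been tried, so it must reach the fixed-size fallback, not reset to defaults.
    if separators is None:
        separators = ["\n\n", "\n", ". ", " "]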
    
    if len(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    
    # Try each separator
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            chunks = []
            current = ""
            
            for part in parts:
                if len(current) + len(part) + len(sep) <= chunk_size:
                    current += (sep if current else "") + part
                else:
                    if current:
                        chunks.append(current.strip())
                    current = part
            
            if current:
                chunks.append(current.strip())
            
            # Recursively split any chunks that are still too large
            final_chunks = []
            for chunk in chunks:
                if len(chunk) > chunk_size:
                    final_chunks.extend(chunk_recursive(chunk, chunk_size, separators[1:]))
                else:
                    final_chunks.append(chunk)
            
            return [c for c in final_chunks if c]
    
    # Fallback: split at chunk_size
    return chunk_fixed(text, chunk_size, overlap=0)

recursive_chunks = chunk_recursive(sample_text, chunk_size=300)
print(f"Recursive chunking: {len(recursive_chunks)} chunks")
for i, chunk in enumerate(recursive_chunks[:3]):
    print(f"\nChunk {i+1} ({len(chunk)} chars):")
    print(f"  {chunk[:100]}...")

In [None]:
# GUIDED: Use our Chunker class
from src.rag_pipeline import Chunker, Document

doc = Document(content=sample_text, source="sample.md")

# Compare different settings
for size, overlap in [(200, 50), (500, 100), (1000, 200)]:
    chunker = Chunker(chunk_size=size, overlap=overlap)
    chunks = chunker.chunk(doc)
    print(f"Size {size}, Overlap {overlap}: {len(chunks)} chunks")

---
## 2. Hybrid Search <a id='hybrid'></a>

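BM25 matches exact terms (names, identifiers, rare words), while vector search matches meaning. Combining the two covers queries where either one alone misses relevant documents.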

In [None]:
# GUIDED: Simple BM25 implementation
import math
from collections import Counter

class SimpleBM25:
    """Basic BM25 implementation."""
    
    def __init__(self, documents: list[str], k1: float = 1.5, b: float = 0.75):
        self.documents = documents
        self.k1 = k1
        self.b = b
        
        # Tokenize
        self.doc_tokens = [doc.lower().split() for doc in documents]
        self.doc_lengths = [len(tokens) for tokens in self.doc_tokens]
        self.avg_length = sum(self.doc_lengths) / len(self.doc_lengths)
        
        # Calculate IDF
        self.idf = {}
        n_docs = len(documents)
        all_terms = set()
        for tokens in self.doc_tokens:
            all_terms.update(tokens)
        
        for term in all_terms:
            doc_freq = sum(1 for tokens in self.doc_tokens if term in tokens)
            self.idf[term] = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
    
    def score(self, query: str) -> list[tuple[int, float]]:
        """Score all documents for a query."""
        query_tokens = query.lower().split()
        scores = []
        
        for i, (tokens, length) in enumerate(zip(self.doc_tokens, self.doc_lengths)):
            score = 0
            token_counts = Counter(tokens)
            
            for term in query_tokens:
                if term in token_counts:
                    tf = token_counts[term]
                    idf = self.idf.get(term, 0)
                    
                    # BM25 formula
                    numerator = tf * (self.k1 + 1)
                    denominator = tf + self.k1 * (1 - self.b + self.b * length / self.avg_length)
                    score += idf * numerator / denominator
            
            scores.append((i, score))
        
        return sorted(scores, key=lambda x: x[1], reverse=True)

# Test BM25
docs = [
    "Python is a programming language for data science",
    "Machine learning algorithms learn from data",
    "Data visualization helps understand patterns",
    "Python and JavaScript are popular languages"
]

bm25 = SimpleBM25(docs)
results = bm25.score("Python programming")

print("BM25 Results for 'Python programming':")
for idx, score in results:
    print(f"  {score:.3f}: {docs[idx]}")
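
For reference, the scoring that `SimpleBM25` implements can be written out as follows. The symbols are introduced here for illustration: $N$ is the number of documents, $n_t$ the number of documents containing term $t$, $f_{t,d}$ the count of $t$ in document $d$, $|d|$ the document length in tokens, and $\mathrm{avgdl}$ the average document length.

$$
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n_t + 0.5}{n_t + 0.5} + 1\right)
$$

$$
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t)\,
\frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
$$

In the four-document test corpus, "python" appears in two documents, so $\mathrm{IDF} = \ln(2.5/2.5 + 1) = \ln 2 \approx 0.693$. "programming" appears in one, so $\mathrm{IDF} = \ln(3.5/1.5 + 1) \approx 1.204$. The rarer term contributes more to the score.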

In [None]:
# GUIDED: Hybrid search combining BM25 and vectors
from src.embedding_utils import EmbeddingModel, cosine_similarity

class HybridSearch:
    """Combine BM25 and vector search."""
    
    def __init__(self, documents: list[str], embedder: EmbeddingModel):
        self.documents = documents
        self.embedder = embedder
        
        # Initialize BM25
        self.bm25 = SimpleBM25(documents)
        
        # Create embeddings
        self.embeddings = embedder.embed_batch(documents)
    
    def search(
        self, 
        query: str, 
        k: int = 5,
        alpha: float = 0.5  # Weight for vector search (0=BM25 only, 1=vector only)
    ) -> list[tuple[str, float]]:
        """Hybrid search with weighted combination."""
        # BM25 scores
        bm25_results = self.bm25.score(query)
        bm25_scores = {idx: score for idx, score in bm25_results}
        
        # Normalize BM25 scores; fall back to 1.0 when no query term matches
        # (max score 0) to avoid dividing by zero
        max_bm25 = max(bm25_scores.values(), default=0.0) or 1.0
        bm25_norm = {idx: score / max_bm25 for idx, score in bm25_scores.items()}
        
        # Vector scores
        query_embedding = self.embedder.embed(query)
        vector_scores = {}
        for i, emb in enumerate(self.embeddings):
            vector_scores[i] = cosine_similarity(query_embedding, emb)
        
        # Combine scores
        combined = []
        for i in range(len(self.documents)):
            bm25_s = bm25_norm.get(i, 0)
            vec_s = vector_scores.get(i, 0)
            final_score = alpha * vec_s + (1 - alpha) * bm25_s
            combined.append((self.documents[i], final_score))
        
        return sorted(combined, key=lambda x: x[1], reverse=True)[:k]

# Test hybrid search
embedder = EmbeddingModel(provider="openai", model="text-embedding-3-small")
hybrid = HybridSearch(docs, embedder)

print("Hybrid Search Results (alpha=0.5):")
for doc, score in hybrid.search("Python programming", alpha=0.5):
    print(f"  {score:.3f}: {doc}")
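
The weighted combination above needs BM25 and cosine scores on comparable scales, which is why the BM25 scores are normalized first. A different technique, Reciprocal Rank Fusion (RRF), avoids this by using only each document's rank in each list. The sketch below is not part of `HybridSearch`. The function name and document ids are hypothetical, and `k=60` is a commonly used default rather than a tuned value:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: a document scores sum(1 / (k + rank)) over the lists it appears in."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # Ranks are 1-based
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword ranking
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical embedding ranking

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, vector_ranking]):
    print(f"{score:.4f}  {doc_id}")
```

Documents ranked well by both retrievers (`doc_b`, `doc_a`) rise above those that only one retriever found.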

---
## 3. Re-ranking <a id='reranking'></a>

Retrieve a broad candidate set cheaply, then use a slower but more accurate model to re-order it and keep only the best few.

In [None]:
# GUIDED: LLM-based re-ranking
from src.llm_utils import LLMClient
import json

def rerank_with_llm(
    query: str,
    documents: list[str],
    llm: LLMClient,
    top_k: int = 3
) -> list[tuple[str, float]]:
    """Re-rank documents using an LLM."""
    # Format documents for the prompt
    doc_list = "\n".join([f"{i+1}. {doc}" for i, doc in enumerate(documents)])
    
    prompt = f"""Given the query and documents, rank the documents by relevance.
Return JSON with 'rankings' array of objects with 'index' (1-based) and 'score' (0-1).

Query: {query}

Documents:
{doc_list}

Return only the top {top_k} most relevant. JSON:"""
    
    response = llm.chat(prompt)
    
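    try:
        # Parse the JSON object embedded in the response
        start = response.find("{")
        end = response.rfind("}") + 1
        data = json.loads(response[start:end])
        
        results = []
        for item in data.get("rankings", []):
            idx = int(item["index"]) - 1  # Convert 1-based index to 0-based
            if 0 <= idx < len(documents):
                results.append((documents[idx], float(item["score"])))
        
        # Enforce ordering and top_k here; the model may ignore the instruction
        results.sort(key=lambda x: x[1], reverse=True)
        return results[:top_k]
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fallback: return original order with a neutral score
        return [(doc, 0.5) for doc in documents[:top_k]]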

# Test re-ranking
llm = LLMClient(provider="openai", model="gpt-4o-mini")

initial_docs = [
    "Python is used for web development with Django and Flask.",
    "Machine learning in Python uses libraries like scikit-learn.",
    "JavaScript runs in web browsers for frontend development.",
    "Python's data science stack includes pandas and numpy."
]

reranked = rerank_with_llm(
    "Python for machine learning",
    initial_docs,
    llm,
    top_k=3
)

print("Re-ranked results:")
for doc, score in reranked:
    print(f"  {score:.2f}: {doc}")
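
The two-stage pattern is independent of which model does the scoring. The sketch below separates the re-ranking loop from the scorer so that an LLM, a cross-encoder, or anything else can be plugged in. `rerank`, `score_fn` and `token_overlap` are hypothetical names, and the token-overlap scorer is only a toy stand-in so the example runs offline:

```python
from typing import Callable

def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every candidate against the query and keep the top_k highest."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in scorer: Jaccard overlap of lowercase tokens (NOT a real re-ranker)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

candidates = [
    "JavaScript runs in web browsers",
    "Machine learning in Python uses scikit-learn",
    "Python is used for web development",
]
for doc, score in rerank("python machine learning", candidates, token_overlap, top_k=2):
    print(f"{score:.2f}: {doc}")
```

Since the scorer runs once per candidate, the first-stage retriever should keep the candidate list small enough for the slower model to be affordable.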

---
## 4. Query Transformation <a id='query'></a>

Rewrite or expand the user's query before retrieval so that it matches the wording of relevant documents more closely.

In [None]:
# GUIDED: Query expansion
import json

from src.llm_utils import LLMClient

def expand_query(query: str, llm: LLMClient) -> list[str]:
    """Generate multiple search queries from a single question."""
    prompt = f"""Generate 3 different search queries that could help answer this question.
Make them diverse but relevant.

Question: {query}

Return as a JSON array of strings. JSON:"""
    
    response = llm.chat(prompt)
    
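    try:
        # Parse the JSON array embedded in the response
        start = response.find("[")
        end = response.rfind("]") + 1
        queries = json.loads(response[start:end])
        # Include the original query first; keep only string entries
        return [query] + [q for q in queries if isinstance(q, str)]
    except (json.JSONDecodeError, TypeError):
        return [query]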

# Test
llm = LLMClient(provider="openai", model="gpt-4o-mini")
expanded = expand_query("How do neural networks learn?", llm)

print("Expanded queries:")
for i, q in enumerate(expanded):
    print(f"  {i+1}. {q}")
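
Expanded queries only help once their results are merged. Here is a minimal sketch of one way to do that, assuming each document keeps its best score across all queries. `multi_query_search`, `SearchFn` and the canned results are hypothetical and stand in for a real retriever:

```python
from typing import Callable

SearchFn = Callable[[str], list[tuple[str, float]]]

def multi_query_search(queries: list[str], search_fn: SearchFn, k: int = 3) -> list[tuple[str, float]]:
    """Run search_fn for every query and keep each document's best score."""
    best: dict[str, float] = {}
    for q in queries:
        for doc, score in search_fn(q):
            if score > best.get(doc, float("-inf")):
                best[doc] = score  # Deduplicate by keeping the maximum
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical canned results standing in for a real retriever
canned = {
    "how do neural networks learn": [("backprop doc", 0.82), ("python doc", 0.31)],
    "backpropagation explained": [("backprop doc", 0.91), ("gradient descent doc", 0.77)],
}
merged = multi_query_search(list(canned), lambda q: canned.get(q, []), k=2)
print(merged)
```

With the objects defined in this notebook, `search_fn` could be `hybrid.search`, which already returns `(document, score)` pairs.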

In [None]:
# GUIDED: Hypothetical Document Embedding (HyDE)
def hyde_search(
    query: str,
    documents: list[str],
    llm: LLMClient,
    embedder: EmbeddingModel,
    k: int = 3
) -> list[tuple[str, float]]:
    """Generate hypothetical answer, then search for similar docs."""
    
    # Generate hypothetical answer
    prompt = f"""Write a short, factual answer to this question as if you were a document:

Question: {query}

Answer:"""
    
    hypothetical = llm.chat(prompt)
    print(f"Hypothetical answer: {hypothetical[:100]}...")
    
    # Embed the hypothetical answer
    query_embedding = embedder.embed(hypothetical)
    
    # Search with hypothetical embedding
    doc_embeddings = embedder.embed_batch(documents)
    
    results = []
    for doc, emb in zip(documents, doc_embeddings):
        score = cosine_similarity(query_embedding, emb)
        results.append((doc, score))
    
    return sorted(results, key=lambda x: x[1], reverse=True)[:k]

# Test HyDE
test_docs = [
    "Neural networks learn through backpropagation, adjusting weights based on errors.",
    "Gradient descent optimizes the loss function by iteratively updating parameters.",
    "Deep learning uses multiple layers to learn hierarchical representations.",
    "Python is a popular programming language for machine learning."
]

hyde_results = hyde_search(
    "How do neural networks learn?",
    test_docs,
    llm,
    embedder
)

print("\nHyDE Results:")
for doc, score in hyde_results:
    print(f"  {score:.3f}: {doc}")

---
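## 5. Evaluation Metrics <a id='evaluation'></a>

Measure the two stages separately: retrieval quality (precision, recall, F1, MRR) and answer quality (scored by an LLM judge).
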
In [None]:
# GUIDED: RAG evaluation metrics
from src.evaluation import calculate_retrieval_metrics, calculate_mrr

# Example: Evaluating retrieval quality
retrieved = ["doc1", "doc2", "doc3", "doc4", "doc5"]
relevant = ["doc2", "doc4", "doc6"]

metrics = calculate_retrieval_metrics(retrieved, relevant)
print("Retrieval Metrics:")
print(f"  Precision: {metrics['precision']:.3f}")
print(f"  Recall: {metrics['recall']:.3f}")
print(f"  F1 Score: {metrics['f1']:.3f}")

# MRR
mrr = calculate_mrr(retrieved, "doc2")
print(f"  MRR: {mrr:.3f}")
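
To make the numbers above easier to check by hand, here is a sketch of the standard set-based definitions. It assumes `calculate_retrieval_metrics` and `calculate_mrr` follow these definitions; the actual code in `src.evaluation` may differ, and the function names below are illustrative:

```python
def precision_recall_f1(retrieved: list[str], relevant: list[str]) -> dict[str, float]:
    """Set-based precision, recall and F1 over document ids."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def reciprocal_rank(retrieved: list[str], target: str) -> float:
    """1 / (1-based position of target), or 0.0 if it was not retrieved."""
    for position, doc_id in enumerate(retrieved, start=1):
        if doc_id == target:
            return 1.0 / position
    return 0.0

retrieved = ["doc1", "doc2", "doc3", "doc4", "doc5"]
relevant = ["doc2", "doc4", "doc6"]
print(precision_recall_f1(retrieved, relevant))
print(reciprocal_rank(retrieved, "doc2"))
```

Two of the five retrieved documents are relevant (precision 2/5) and two of the three relevant documents were found (recall 2/3). MRR is the mean of the reciprocal rank over a set of queries, so for a single query it equals the reciprocal rank itself.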

In [None]:
# GUIDED: End-to-end RAG evaluation
from src.evaluation import LLMJudge
from src.llm_utils import LLMClient

llm = LLMClient(provider="openai", model="gpt-4o-mini")
judge = LLMJudge(llm)

# Evaluate a RAG answer
question = "What are the benefits of RAG?"
context = "RAG reduces hallucination, provides citations, and enables domain-specific knowledge."
answer = "The benefits of RAG include reduced hallucinations, ability to cite sources, and adding custom knowledge."

score, feedback = judge.evaluate(
    question=question,
    answer=answer,
    criteria="accuracy, completeness, relevance to context",
    reference=context
)

print(f"Score: {score:.2f}")
print(f"Feedback: {feedback}")

---
## 6. Exercises <a id='exercises'></a>

### Exercise 1: Optimize Chunking

Find the optimal chunk size for your documents.

In [None]:
# TODO: Compare retrieval quality with different chunk sizes

# Your code here:


### Exercise 2: Build Advanced RAG Pipeline

Combine multiple techniques into one pipeline.

In [None]:
# TODO: Create pipeline with: hybrid search + re-ranking + query expansion

# Your code here:


---
## 7. Checkpoint <a id='checkpoint'></a>

Before moving on, verify:

- [ ] You understand different chunking strategies
- [ ] You implemented hybrid search
- [ ] You used re-ranking techniques
- [ ] You can evaluate RAG systems

### Next Steps

In the next notebook, we'll explore **Fine-tuning** - training models on your own data!