# 🎓 Week 15: Capstone Project - End-to-End RAG System

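This notebook walks you through building an end-to-end RAG system, from document ingestion to evaluation. Answer generation is mocked, so the result is a scaffold to extend rather than a finished production deployment.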

## Project Overview

Build a **Document Q&A System** that:
1. Ingests and processes documents
2. Retrieves relevant context
3. Generates accurate answers
4. Provides citations

---

## System Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                     DOCUMENT Q&A SYSTEM                              │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Documents → [Chunking] → [Embedding] → [Vector Store]               │
│                                               ↓                      │
│  Query → [Query Processing] → [Retrieval] → [Context]                │
│                                               ↓                      │
│                               [Reranking] → [Top-K Docs]             │
│                                               ↓                      │
│                               [Generation] → [Answer + Citations]    │
│                                               ↓                      │
│                               [Evaluation] → [Quality Metrics]       │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
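
The two flows in the diagram (indexing on the first row, querying below it) can be read as function composition. The sketch below is a toy illustration only: `index_documents` and `answer_query` are hypothetical names, each document is treated as a single chunk, and word overlap stands in for embeddings, reranking, and generation.

```python
from typing import Dict, List, Set, Tuple

def index_documents(documents: Dict[str, str]) -> List[Tuple[str, Set[str]]]:
    """Indexing flow: documents -> chunks -> 'embedding' -> store.

    Toy assumption: one chunk per document, and the 'embedding'
    is simply the set of lowercase words in the text.
    """
    return [(doc_id, set(text.lower().split())) for doc_id, text in documents.items()]

def answer_query(query: str, store: List[Tuple[str, Set[str]]], top_k: int = 2) -> Dict:
    """Query flow: query -> retrieval -> top-k -> answer + citations."""
    query_terms = set(query.lower().split())
    # Retrieval: score every chunk by word overlap with the query
    scored = [(len(query_terms & terms), doc_id) for doc_id, terms in store]
    # Keep the best top_k chunks that share at least one word with the query
    ranked = sorted((s for s in scored if s[0] > 0), reverse=True)[:top_k]
    citations = [doc_id for _, doc_id in ranked]
    # Generation is mocked: a real system would prompt an LLM with the chunks
    return {"answer": f"(answer drawn from {len(citations)} chunk(s))", "citations": citations}

store = index_documents({
    "a": "deep learning uses neural networks",
    "b": "bm25 is a lexical ranking function",
})
result = answer_query("deep learning networks", store)
print(result["citations"])  # -> ['a']
```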

In [None]:
# Setup
import sys
sys.path.insert(0, '../..')

from dataclasses import dataclass
from typing import List, Dict, Optional
import numpy as np

# Import our modules
from src.embeddings import TextEmbedder, EmbeddingCache
from src.retrieval import BM25Retriever, DenseRetriever, HybridRetriever
from src.reranking import CrossEncoderReranker, DiversityReranker
from src.orchestration import Chain, SequentialChain, LambdaChain, ConversationMemory, Message, MessageRole
from src.evaluation import RAGEvaluator, EvaluationReport

print("✅ All modules imported!")

---

## Step 1: Document Processing

In [None]:
@dataclass
class Chunk:
    """A document chunk with metadata."""
    id: str
    content: str
    doc_id: str
    start_idx: int
    end_idx: int
    metadata: Optional[Dict] = None  # None means "no metadata attached"

class DocumentProcessor:
    """
    Process documents into chunks for retrieval.
    """
    
    def __init__(self, chunk_size: int = 500, overlap: int = 50):
        self.chunk_size = chunk_size
        self.overlap = overlap
    
    def chunk_document(self, doc_id: str, text: str) -> List[Chunk]:
        """Split document into overlapping chunks."""
        chunks = []
        start = 0
        chunk_idx = 0
        
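        while start < len(text):
            end = min(start + self.chunk_size, len(text))
            
            # Try to end at the last sentence boundary inside the window
            if end < len(text):
                window = text[start:end]
                cuts = [window.rfind(p) + len(p)
                        for p in ('. ', '? ', '! ', '\n') if p in window]
                # Ignore boundaries so early that the next chunk could not advance
                if cuts and max(cuts) > self.overlap:
                    end = start + max(cuts)
            
            chunk = Chunk(
                id=f"{doc_id}_chunk_{chunk_idx}",
                content=text[start:end].strip(),
                doc_id=doc_id,
                start_idx=start,
                end_idx=end
            )
            chunks.append(chunk)
            chunk_idx += 1
            
            # Stop after the final chunk; stepping back by the overlap here
            # would re-emit the tail of the text forever
            if end >= len(text):
                break
            
            # Step back by the overlap, but always move forward
            start = max(end - self.overlap, start + 1)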
        
        return chunks

# Test
processor = DocumentProcessor(chunk_size=200, overlap=20)

sample_doc = """
Machine learning is a subset of artificial intelligence. It enables computers to learn from data.
Deep learning is a type of machine learning that uses neural networks. Neural networks have many layers.
Natural language processing helps computers understand text. It is used in chatbots and translation.
"""

chunks = processor.chunk_document("doc1", sample_doc)
print(f"Created {len(chunks)} chunks from document")
for c in chunks:
    print(f"  [{c.id}]: {c.content[:50]}...")
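
Because each `Chunk` records `start_idx` and `end_idx`, a chunk can be traced back to its exact span in the source text, which is what citations eventually need. It also makes the chunker easy to sanity-check. A small sketch, assuming spans are plain `(start, end)` tuples; `check_spans` is a hypothetical helper, not part of the course modules:

```python
from typing import List, Tuple

def check_spans(spans: List[Tuple[int, int]], text_len: int) -> List[str]:
    """Return a list of problems found in a sequence of chunk spans."""
    if not spans:
        return ["no chunks produced"]
    problems = []
    if spans[0][0] != 0:
        problems.append("first chunk does not start at 0")
    if spans[-1][1] != text_len:
        problems.append("last chunk does not reach the end of the text")
    # Compare each chunk with the one that follows it
    for (s1, e1), (s2, _) in zip(spans, spans[1:]):
        if s2 > e1:
            problems.append(f"gap between {e1} and {s2}")
        if s2 <= s1:
            problems.append(f"chunk at {s2} does not advance past {s1}")
    return problems

print(check_spans([(0, 100), (80, 180), (160, 200)], text_len=200))  # -> []
print(check_spans([(0, 100), (120, 200)], text_len=200))  # -> ['gap between 100 and 120']
```

To apply it to the chunks created in this step, pass `[(c.start_idx, c.end_idx) for c in chunks]` and `len(sample_doc)`.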

---

## Step 2: Indexing Pipeline

In [None]:
from src.retrieval.retrieval import Document

class IndexingPipeline:
    """
    Pipeline for indexing documents.
    """
    
    def __init__(self):
        self.processor = DocumentProcessor(chunk_size=300, overlap=30)
        self.embedder = TextEmbedder()
        self.retriever = HybridRetriever(alpha=0.5)
        self.chunks = []
    
    def ingest(self, documents: Dict[str, str]):
        """
        Ingest documents into the index.
        
        Args:
            documents: Dict of doc_id -> content
        """
        print(f"Ingesting {len(documents)} documents...")
        
        # Chunk all documents
        all_chunks = []
        for doc_id, content in documents.items():
            chunks = self.processor.chunk_document(doc_id, content)
            all_chunks.extend(chunks)
        
        self.chunks = all_chunks
        print(f"Created {len(all_chunks)} chunks")
        
        # Convert to Document objects for retriever
        docs = [
            Document(id=c.id, content=c.content, metadata={"doc_id": c.doc_id})
            for c in all_chunks
        ]
        
        # Index
        self.retriever.index(docs)
        print("✅ Indexing complete!")
    
    def search(self, query: str, top_k: int = 5):
        """Search for relevant chunks."""
        return self.retriever.retrieve(query, top_k=top_k)

# Test
pipeline = IndexingPipeline()

docs = {
    "ml_intro": "Machine learning is AI that learns from data. It powers recommendations and predictions.",
    "dl_intro": "Deep learning uses neural networks with many layers. It excels at image and text tasks.",
    "nlp_intro": "NLP enables computers to understand human language. Applications include chatbots and translation."
}

pipeline.ingest(docs)

results = pipeline.search("What is deep learning?")
print("\nSearch results:")
for r in results[:3]:
    print(f"  [{r.score:.3f}] {r.document.content[:60]}...")
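
`HybridRetriever(alpha=0.5)` blends lexical and dense scores. The exact formula lives in `src.retrieval` and is not shown in this notebook. One common scheme, sketched here purely as an assumption, min-max normalizes each score list and computes `alpha * dense + (1 - alpha) * bm25`. The function names `min_max` and `fuse` are illustrative.

```python
from typing import Dict

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def fuse(bm25: Dict[str, float], dense: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Weighted sum of normalized scores; a missing score counts as 0."""
    b, d = min_max(bm25), min_max(dense)
    ids = set(b) | set(d)
    return {i: alpha * d.get(i, 0.0) + (1 - alpha) * b.get(i, 0.0) for i in ids}

# Raw scores live on different scales, which is why they are normalized first
bm25_scores = {"c1": 4.0, "c2": 1.0, "c3": 0.0}
dense_scores = {"c1": 0.2, "c2": 0.8, "c3": 0.5}

fused = fuse(bm25_scores, dense_scores, alpha=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)  # -> ['c2', 'c1', 'c3']
```

Under this assumed convention, `alpha=1.0` ranks purely by dense similarity and `alpha=0.0` purely by BM25.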

---

## Step 3: RAG System

In [None]:
@dataclass
class RAGResponse:
    """Response from RAG system."""
    answer: str
    sources: List[str]
    confidence: float

class RAGSystem:
    """
    Complete Retrieval-Augmented Generation system.
    """
    
    def __init__(self):
        self.indexer = IndexingPipeline()
        self.reranker = CrossEncoderReranker()
        self.memory = ConversationMemory(max_messages=10)
        self.evaluator = RAGEvaluator()
    
    def ingest_documents(self, documents: Dict[str, str]):
        """Ingest documents for retrieval."""
        self.indexer.ingest(documents)
    
    def query(self, question: str, top_k: int = 3) -> RAGResponse:
        """Answer a question using RAG."""
        # 1. Retrieve
        initial_results = self.indexer.search(question, top_k=top_k * 2)
        
        # 2. Rerank
        docs = [r.document for r in initial_results]
        reranked = self.reranker.rerank(question, docs, top_k=top_k)
        
        # 3. Build context
        context_docs = [r.document for r in reranked]
        context = "\n\n".join([f"[{i+1}] {d.content}" for i, d in enumerate(context_docs)])
        
        # 4. Generate answer (mock - replace with actual LLM)
        answer = self._generate_answer(question, context)
        
        # 5. Update memory
        self.memory.add_message(Message(role=MessageRole.USER, content=question))
        self.memory.add_message(Message(role=MessageRole.ASSISTANT, content=answer))
        
        # 6. Compute confidence
        # Cast to a plain float so the value matches RAGResponse.confidence
        avg_score = float(np.mean([r.rerank_score for r in reranked])) if reranked else 0.0
        
        return RAGResponse(
            answer=answer,
            sources=[d.id for d in context_docs],
            confidence=avg_score
        )
    
    def _generate_answer(self, question: str, context: str) -> str:
        """Generate answer from context (mock implementation)."""
        # In production, call your LLM here
        return f"Based on the provided context about {context[:50]}..., the answer is: [Generated answer]"

# Test complete system
rag = RAGSystem()
rag.ingest_documents(docs)

response = rag.query("What can deep learning do?")
print(f"\nAnswer: {response.answer}")
print(f"Sources: {response.sources}")
print(f"Confidence: {response.confidence:.2f}")
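
The context string built in `query` numbers each chunk as `[1]`, `[2]`, and so on, which gives a generator something to cite. A hedged sketch of the two pieces the mock leaves out, prompt assembly and mapping citation markers back to chunk IDs. `build_prompt` and `extract_citations` are hypothetical helpers, and the prompt wording is only an example.

```python
import re
from typing import List

def build_prompt(question: str, passages: List[str]) -> str:
    """Number the passages and ask the model to cite them as [n]."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "Cite passages as [n].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

def extract_citations(answer: str, source_ids: List[str]) -> List[str]:
    """Map [n] markers in the answer to source IDs, in order of first use."""
    cited: List[str] = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        idx = int(match.group(1)) - 1
        # Ignore markers that point outside the retrieved passages
        if 0 <= idx < len(source_ids) and source_ids[idx] not in cited:
            cited.append(source_ids[idx])
    return cited

ids = ["dl_intro_chunk_0", "ml_intro_chunk_0"]
answer = "Deep learning uses layered neural networks [1]. It is a kind of ML [2][1] [7]."
print(extract_citations(answer, ids))  # -> ['dl_intro_chunk_0', 'ml_intro_chunk_0']
```

Returning only the IDs that the answer actually cites is stricter than returning every retrieved chunk, which is what `RAGResponse.sources` currently holds.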

---

## Step 4: Evaluation

In [None]:
# Evaluate the RAG system
evaluator = RAGEvaluator()

# Test data
test_cases = [
    {
        "question": "What is machine learning?",
        "expected": "Machine learning is AI that learns from data."
    },
    {
        "question": "What are neural networks?",
        "expected": "Neural networks are used in deep learning."
    }
]

# Run evaluation
report = EvaluationReport(title="RAG System Evaluation")

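for i, test in enumerate(test_cases):
    response = rag.query(test["question"])
    
    # response.sources holds chunk IDs, so look up the chunk text to judge
    # faithfulness against (assumes evaluate_faithfulness takes context strings)
    contexts = [c.content for c in rag.indexer.chunks if c.id in response.sources]
    
    # Evaluate
    faithfulness = evaluator.evaluate_faithfulness(response.answer, contexts)
    relevance = evaluator.evaluate_relevance(test["question"], response.answer)
    
    report.add_metrics(f"Test Case {i+1}", {
        "faithfulness": faithfulness,
        "relevance": relevance,
        "confidence": response.confidence
    })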

print(report.to_markdown())
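
The `expected` field in `test_cases` is not used by the loop above. One simple way to put it to work is a token-level F1 between the generated answer and the reference, in the style of extractive QA benchmarks. A minimal sketch; `token_f1` is a hypothetical helper, not a method of `RAGEvaluator`:

```python
import re
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, ignoring case and punctuation."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not pred or not ref:
        # Two empty strings match perfectly; one empty string matches nothing
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = token_f1(
    "Machine learning learns from data.",
    "Machine learning is AI that learns from data.",
)
print(f"{score:.3f}")  # -> 0.769
```

With the mock generator the score mostly reflects the retrieved context echoed in the answer, so the metric becomes meaningful only once a real model is plugged in.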

---

## 📝 Capstone Checklist

### Core Components
- [x] Document chunking with overlap
- [x] Hybrid retrieval (BM25 + Dense)
- [x] Cross-encoder reranking
- [x] Answer generation with context
- [x] Conversation memory
- [x] Evaluation metrics

### Next Steps
- [ ] Add actual LLM integration
- [ ] Implement citation extraction
- [ ] Add streaming responses
- [ ] Deploy as API
- [ ] Add monitoring
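
For the LLM integration step, one option is to hide the model behind a small interface so the mock in `_generate_answer` can be swapped out without touching retrieval or reranking. A sketch under that assumption; `Generator`, `EchoGenerator`, and `answer_with` are illustrative names, and no real LLM client is called.

```python
from typing import Protocol

class Generator(Protocol):
    """Anything with a generate(prompt) -> str method can act as the model."""
    def generate(self, prompt: str) -> str: ...

class EchoGenerator:
    """Stand-in model that reports the size of the prompt it was given."""
    def generate(self, prompt: str) -> str:
        return f"[mock answer based on {len(prompt.split())} prompt words]"

def answer_with(generator: Generator, question: str, context: str) -> str:
    """Assemble a prompt from the retrieved context and delegate to the generator."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generator.generate(prompt)

reply = answer_with(
    EchoGenerator(),
    "What is NLP?",
    "NLP enables computers to understand language.",
)
print(reply)  # -> [mock answer based on 12 prompt words]
```

A class wrapping a hosted or local model would only need to implement `generate` to be used in place of `EchoGenerator`.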