# 10 - Capstone: Build Your AI Application

**Put everything together in a complete project.**

## Project Overview

Build a **Document Q&A Assistant** that:
- Loads and indexes your documents
- Answers questions using RAG
- Provides source citations
- Includes production-oriented features: response caching, metrics, and cost tracking

## What You'll Build

```
┌─────────────────────────────────────────────────┐
│              Document Q&A Assistant              │
├─────────────────────────────────────────────────┤
│  ┌─────────┐    ┌─────────┐    ┌─────────────┐ │
│  │Documents│───▶│ Indexer │───▶│Vector Store │ │
│  └─────────┘    └─────────┘    └─────────────┘ │
│                                       │        │
│  ┌─────────┐    ┌─────────┐    ┌─────┴───────┐ │
│  │ Answer  │◀───│   LLM   │◀───│  Retriever  │ │
│  └─────────┘    └─────────┘    └─────────────┘ │
│        │                                       │
│        ▼                                       │
│  ┌─────────────────────────────────────────┐  │
│  │  + Caching + Metrics + Cost Tracking    │  │
│  └─────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
```
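
The data flow in the diagram can be illustrated with a dependency-free toy. In this sketch, word-overlap scoring stands in for the embedding model and vector store, and the LLM call is replaced by printing the prompt that would be sent. The names `tokenize`, `retrieve`, and `build_prompt` are hypothetical and are not part of the course's `src` package.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by shared words with the question (a stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Number the retrieved chunks so the answer can cite them as [1], [2], ..."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

chunks = [
    "Employees receive 20 days of paid time off per year.",
    "Pro: $10/month, 100 GB storage, 10 users",
    "Click Forgot Password on the login page to reset your password.",
]
question = "How do I reset my password?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The steps below replace each stand-in with a real component: a chunker and vector store for retrieval, and an LLM client for answer generation.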

In [None]:
# Setup
import os
import sys
import json
from pathlib import Path
from datetime import datetime

sys.path.append(str(Path.cwd().parent))

from dotenv import load_dotenv
load_dotenv(Path.cwd().parent / ".env")

print("Setup complete!")

---
## Step 1: Create the Document Store

In [None]:
# Create sample documents for the Q&A system
from pathlib import Path

docs_dir = Path("../data/documents")
docs_dir.mkdir(parents=True, exist_ok=True)

# Sample documents
documents = {
    "company_policies.txt": """
# Company Policies

## Work Hours
Standard work hours are 9 AM to 5 PM, Monday through Friday.
Flexible hours are available with manager approval.
Remote work is permitted up to 3 days per week.

## Time Off
Employees receive 20 days of paid time off per year.
Sick leave is separate and unlimited with documentation.
Holidays follow the company calendar (12 days per year).

## Benefits
Health insurance is provided for all full-time employees.
401(k) matching up to 4% of salary.
Professional development budget of $2,000 per year.
""",
    "product_guide.txt": """
# Product Guide

## Getting Started
1. Create an account at our website
2. Download the desktop or mobile app
3. Log in with your credentials
4. Complete the onboarding tutorial

## Features
- Document Management: Store and organize files
- Collaboration: Share with team members
- Search: Find anything instantly
- Analytics: Track usage and trends

## Pricing
- Free: Up to 5 GB storage, 3 users
- Pro: $10/month, 100 GB storage, 10 users
- Enterprise: Custom pricing, unlimited storage and users
""",
    "faq.txt": """
# Frequently Asked Questions

## Account
Q: How do I reset my password?
A: Click "Forgot Password" on the login page and follow the instructions.

Q: Can I change my email address?
A: Yes, go to Settings > Account > Email to update it.

## Billing
Q: What payment methods are accepted?
A: We accept all major credit cards and PayPal.

Q: Can I get a refund?
A: Yes, within 30 days of purchase with no questions asked.

## Technical
Q: What browsers are supported?
A: Chrome, Firefox, Safari, and Edge (latest versions).

Q: Is my data secure?
A: Yes, we use industry-standard encryption and security practices.
"""
}

# Write documents
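for filename, content in documents.items():
    # Write with an explicit encoding so the files read back identically on every platform
    (docs_dir / filename).write_text(content, encoding="utf-8")
    print(f"Created: {filename}")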

print(f"\nDocuments saved to {docs_dir}")

---
## Step 2: Build the RAG Pipeline

In [None]:
# Build the complete Q&A system
from dataclasses import dataclass, field
from typing import Optional
import hashlib
import time

from src.rag_pipeline import DocumentLoader, Chunker, VectorStore, Document
from src.llm_utils import LLMClient
from src.embedding_utils import EmbeddingModel

@dataclass
class QAResult:
    """Result from Q&A query."""
    answer: str
    sources: list[dict]
    cached: bool
    latency_ms: float
    tokens_used: int
    cost: float

class DocumentQA:
    """Production-ready Document Q&A System."""
    
    def __init__(
        self,
        llm_provider: str = "openai",
        llm_model: str = "gpt-4o-mini",
        embedding_model: str = "text-embedding-3-small",
        chunk_size: int = 500,
        chunk_overlap: int = 50,
        cache_ttl: float = 3600
    ):
        # Core components
        self.llm = LLMClient(provider=llm_provider, model=llm_model)
        self.loader = DocumentLoader()
        self.chunker = Chunker(chunk_size=chunk_size, overlap=chunk_overlap)
        self.vector_store = VectorStore(collection_name="qa_docs")
        
        # Production features
        self.cache: dict = {}
        self.cache_ttl = cache_ttl
        self.metrics = {
            "queries": 0,
            "cache_hits": 0,
            "total_tokens": 0,
            "total_cost": 0,
            "latencies": []
        }
        
        self.documents_loaded = 0
    
    def load_documents(self, path: str) -> int:
        """Load and index documents."""
        path = Path(path)
        
        if path.is_file():
            docs = self.loader.load_file(str(path))
        else:
            docs = self.loader.load_directory(str(path))
        
        # Chunk documents
        chunks = self.chunker.chunk_all(docs)
        
        # Add to vector store
        self.vector_store.add_documents(chunks)
        # Accumulate, so repeated calls report every chunk indexed so far
        self.documents_loaded += len(chunks)
        
        return len(chunks)
    
    def _cache_key(self, query: str) -> str:
        """Generate cache key."""
        return hashlib.sha256(query.lower().strip().encode()).hexdigest()[:16]
    
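    def _get_cached(self, query: str) -> Optional[QAResult]:
        """Return a cached result if one exists and has not expired."""
        key = self._cache_key(query)
        entry = self.cache.get(key)
        if entry is None:
            return None
        if time.time() - entry["timestamp"] < self.cache_ttl:
            return entry["result"]
        # Entry has expired: evict it so stale results do not accumulate
        del self.cache[key]
        return None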
    
    def _set_cached(self, query: str, result: QAResult):
        """Store result in cache."""
        key = self._cache_key(query)
        self.cache[key] = {
            "result": result,
            "timestamp": time.time()
        }
    
    def query(self, question: str, k: int = 3) -> QAResult:
        """Answer a question using the indexed documents."""
        self.metrics["queries"] += 1
        start_time = time.time()
        
        # Check cache
        cached = self._get_cached(question)
        if cached:
            self.metrics["cache_hits"] += 1
            cached.cached = True
            cached.latency_ms = (time.time() - start_time) * 1000
            return cached
        
        # Retrieve relevant documents
        results = self.vector_store.search(question, k=k)
        
        # Build context
        context_parts = []
        sources = []
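        for i, r in enumerate(results):
            context_parts.append(f"[{i+1}] {r['content']}")
            # Only add an ellipsis when the preview was actually truncated
            preview = r["content"][:200]
            if len(r["content"]) > 200:
                preview += "..."
            sources.append({
                "id": i + 1,
                "content": preview,
                "metadata": r.get("metadata", {})
            })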
        
        context = "\n\n".join(context_parts)
        
        # Generate answer
        prompt = f"""Use the following context to answer the question. 
Include source references like [1], [2] when citing information.
If you cannot answer from the context, say so.

Context:
{context}

Question: {question}

Answer:"""
        
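        # Snapshot usage before the call. The attribute names suggest that
        # get_stats() reports cumulative totals (an assumption about LLMClient),
        # so per-query usage is the difference between the two snapshots.
        before = self.llm.get_stats()
        tokens_before = before.total_input_tokens + before.total_output_tokens
        cost_before = before.total_cost

        answer = self.llm.chat(prompt)

        # Calculate per-query metrics
        latency = (time.time() - start_time) * 1000
        after = self.llm.get_stats()
        tokens = after.total_input_tokens + after.total_output_tokens - tokens_before
        cost = after.total_cost - cost_before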
        
        # Update metrics
        self.metrics["total_tokens"] += tokens
        self.metrics["total_cost"] += cost
        self.metrics["latencies"].append(latency)
        
        result = QAResult(
            answer=answer,
            sources=sources,
            cached=False,
            latency_ms=latency,
            tokens_used=tokens,
            cost=cost
        )
        
        # Cache result
        self._set_cached(question, result)
        
        return result
    
    def get_metrics(self) -> dict:
        """Get system metrics."""
        latencies = self.metrics["latencies"]
        return {
            "documents_indexed": self.documents_loaded,
            "total_queries": self.metrics["queries"],
            "cache_hit_rate": self.metrics["cache_hits"] / max(1, self.metrics["queries"]),
            "total_tokens": self.metrics["total_tokens"],
            "total_cost": f"${self.metrics['total_cost']:.4f}",
            "avg_latency_ms": sum(latencies) / max(1, len(latencies)),
            "cache_size": len(self.cache)
        }

print("DocumentQA class defined!")
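
The caching logic above rests on two ideas: normalizing the query before hashing it, and comparing an entry's age against a TTL. The standalone sketch below isolates both. `TTLCache` and its `now` parameter are illustrative names introduced here so that expiry can be demonstrated without sleeping; they are not part of `DocumentQA`.

```python
import hashlib
import time
from typing import Any, Optional

def cache_key(query: str) -> str:
    """Same normalization as DocumentQA._cache_key: case and outer whitespace are ignored."""
    return hashlib.sha256(query.lower().strip().encode()).hexdigest()[:16]

class TTLCache:
    """Minimal in-memory cache with a time-to-live, for illustration only."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, query: str, value: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[cache_key(query)] = (now, value)

    def get(self, query: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        key = cache_key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at < self.ttl:
            return value
        del self._entries[key]  # expired: evict the stale entry
        return None

cache = TTLCache(ttl=60)
cache.set("How many PTO days?", "20 days", now=0)
hit = cache.get("  how many pto days?  ", now=30)   # same key after normalization
miss = cache.get("How many PTO days?", now=120)    # older than the TTL
print(hit, miss)
```

Note that the key depends only on the question text. If callers vary `k`, or if conversation history is added later, those inputs would also need to be part of the key.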

---
## Step 3: Initialize and Load Documents

In [None]:
# Create and initialize the Q&A system
qa = DocumentQA(
    llm_provider="openai",
    llm_model="gpt-4o-mini",
    chunk_size=300,
    chunk_overlap=50
)

# Load documents
num_chunks = qa.load_documents("../data/documents")
print(f"\nIndexed {num_chunks} document chunks")
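
`chunk_size=300` and `chunk_overlap=50` control how each document is split before indexing. The `Chunker` implementation is not shown in this notebook, so the sketch below is only an assumption about the general idea: a character-based sliding window in which each chunk starts `chunk_size - overlap` characters after the previous one. The real class may split on sentences or tokens instead; `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into character windows that share `overlap` characters with their neighbor."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous window
    last_start = max(len(text) - overlap, 1)
    return [text[s:s + chunk_size] for s in range(0, last_start, step)]

sample = "".join(str(i % 10) for i in range(25))  # 25 characters of repeating digits
pieces = chunk_text(sample, chunk_size=10, overlap=3)
for p in pieces:
    print(repr(p))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.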

---
## Step 4: Test the System

In [None]:
# Test with various questions
questions = [
    "How many days of PTO do employees get?",
    "What is the pricing for the Pro plan?",
    "How do I reset my password?",
    "What payment methods are accepted?",
    "Is remote work allowed?"
]

print("Testing Document Q&A System")
print("=" * 50)

for q in questions:
    print(f"\nQ: {q}")
    result = qa.query(q)
    print(f"A: {result.answer}")
    print(f"   [Latency: {result.latency_ms:.0f}ms, Cached: {result.cached}]")

In [None]:
# Test caching - repeat a question
print("Testing cache (repeating first question)...")
result = qa.query(questions[0])
print(f"Q: {questions[0]}")
print(f"A: {result.answer}")
print(f"   [Cached: {result.cached}, Latency: {result.latency_ms:.2f}ms]")

---
## Step 5: View Metrics

In [None]:
# Display system metrics
metrics = qa.get_metrics()

print("System Metrics")
print("=" * 40)
for key, value in metrics.items():
    if isinstance(value, float):
        print(f"{key}: {value:.2f}")
    else:
        print(f"{key}: {value}")
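
To make the numbers easier to interpret, here is the same arithmetic on made-up values. The latencies are hypothetical; real values depend on the API and network. The query counts assume a fresh run of Step 4 (five distinct questions plus one repeat). The median is an addition that `get_metrics()` does not report; it is included to show how a single slow call skews the average.

```python
import statistics

# Hypothetical latencies (ms) for five uncached queries
latencies = [820.0, 640.0, 910.0, 700.0, 2300.0]
queries, cache_hits = 6, 1  # five distinct questions plus one repeated question

cache_hit_rate = cache_hits / max(1, queries)
avg_latency = sum(latencies) / max(1, len(latencies))
median_latency = statistics.median(latencies)

print(f"cache_hit_rate: {cache_hit_rate:.2f}")
print(f"avg_latency_ms: {avg_latency:.2f}")
print(f"median_latency_ms: {median_latency:.2f}")
```

Cache hits return before a latency is recorded, so `avg_latency_ms` reflects only queries that reached the LLM.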

---
## Step 6: Interactive Demo

In [None]:
# Interactive Q&A (run this cell and enter questions)
def interactive_qa():
    print("\n" + "="*50)
    print("Document Q&A Assistant")
    print("Type 'quit' to exit, 'metrics' to see stats")
    print("="*50 + "\n")
    
    while True:
        question = input("You: ").strip()
        
        if not question:
            continue
        
        if question.lower() == 'quit':
            print("Goodbye!")
            break
        
        if question.lower() == 'metrics':
            print(json.dumps(qa.get_metrics(), indent=2))
            continue
        
        result = qa.query(question)
        print(f"\nAssistant: {result.answer}")
        
        if result.sources:
            print(f"\n[Sources: {len(result.sources)} documents | "
                  f"Latency: {result.latency_ms:.0f}ms | "
                  f"Cached: {result.cached}]")
        print()

# Uncomment to run interactive mode
# interactive_qa()

print("Uncomment interactive_qa() above to run interactive Q&A!")

---
## Capstone Challenges

Extend your Document Q&A system with these features:

### Challenge 1: Add Conversation History

Implement multi-turn conversations that remember context.
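
One possible starting point, offered as a sketch and not as the intended solution: keep the last few question/answer turns and place them ahead of the retrieved context in the prompt. `ConversationHistory` and `build_followup_prompt` are hypothetical names, and the prompt layout is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationHistory:
    """Keeps the most recent question/answer turns (hypothetical helper)."""
    max_turns: int = 3
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]  # drop the oldest turns

    def as_text(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

def build_followup_prompt(history: ConversationHistory, context: str, question: str) -> str:
    """Place prior turns ahead of the retrieved context so follow-ups can be resolved."""
    return (
        "Conversation so far:\n"
        f"{history.as_text() or '(none)'}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )

history = ConversationHistory(max_turns=2)
history.add("How many days of PTO do employees get?", "20 days per year [1].")
history.add("Is sick leave included?", "No, sick leave is separate [1].")
history.add("Is remote work allowed?", "Yes, up to 3 days per week [1].")
prompt = build_followup_prompt(history, "[1] Holidays follow the company calendar.", "And holidays?")
print(prompt)
```

Two things to watch when wiring this into `DocumentQA.query`: the cache key would need to include the history, because the same follow-up can mean different things in different conversations, and a terse follow-up such as "And holidays?" may retrieve poorly unless it is first expanded into a standalone question.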

In [None]:
# TODO: Add conversation history to enable follow-up questions

# Your code here:


### Challenge 2: Add Source Highlighting

Show which parts of the source documents were used for the answer.
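
A simple heuristic to start from, under the assumption that word overlap is a good enough signal: mark each source sentence in which at least half of the words also appear in the answer. The function names and the `0.5` threshold are illustrative choices; an embedding-based comparison would likely be more robust.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def highlight(source: str, answer: str, threshold: float = 0.5) -> str:
    """Wrap source sentences in ** when enough of their words occur in the answer."""
    answer_words = tokens(answer)
    sentences = re.split(r"(?<=[.!?])\s+", source.strip())
    marked = []
    for sentence in sentences:
        words = tokens(sentence)
        ratio = len(words & answer_words) / len(words) if words else 0.0
        marked.append(f"**{sentence}**" if ratio >= threshold else sentence)
    return " ".join(marked)

source = (
    "Standard work hours are 9 AM to 5 PM. "
    "Remote work is permitted up to 3 days per week. "
    "Health insurance is provided for all full-time employees."
)
answer = "Remote work is allowed up to 3 days per week [1]."
highlighted = highlight(source, answer)
print(highlighted)
```

Using a ratio in place of a raw count keeps common words such as "is" or "to" from marking unrelated sentences.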

In [None]:
# TODO: Highlight relevant passages in source documents

# Your code here:


### Challenge 3: Add Feedback Loop

Implement user feedback to improve answers over time.
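
As a minimal sketch, feedback can be appended to a JSON Lines file and summarized later. The file name, the event fields, and both function names are assumptions made for this example; a real deployment might use a database instead.

```python
import json
import tempfile
import time
from collections import Counter
from pathlib import Path

def record_feedback(log_path: Path, question: str, answer: str, helpful: bool) -> None:
    """Append one feedback event as a JSON line."""
    event = {"ts": time.time(), "question": question, "answer": answer, "helpful": helpful}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def summarize_feedback(log_path: Path) -> dict:
    """Count thumbs up/down and collect the questions that were rated unhelpful."""
    counts = Counter()
    needs_review = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            counts["up" if event["helpful"] else "down"] += 1
            if not event["helpful"]:
                needs_review.append(event["question"])
    return {"up": counts["up"], "down": counts["down"], "needs_review": needs_review}

# A temporary directory keeps the example from touching project files
log_path = Path(tempfile.mkdtemp()) / "feedback.jsonl"
record_feedback(log_path, "How many days of PTO do employees get?", "20 days [1].", helpful=True)
record_feedback(log_path, "Can I get a refund?", "I cannot answer from the context.", helpful=False)
summary = summarize_feedback(log_path)
print(summary)
```

The `needs_review` list is the part that closes the loop: questions that repeatedly receive a thumbs down point to missing documents, poor chunking, or a prompt that needs adjusting.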

In [None]:
# TODO: Add thumbs up/down feedback and store for analysis

# Your code here:


### Challenge 4: Create API Endpoint

Wrap the system in a FastAPI endpoint.

In [None]:
# TODO: Create FastAPI app with /query endpoint

# Your code here:


---
## Congratulations!

You've completed the AI Engineering track! You now know how to:

- Work with LLM APIs and structured outputs
- Build RAG systems with vector databases
- Fine-tune models for specific tasks
- Evaluate and test AI systems
- Deploy production-ready AI applications

### What's Next?

1. **Expand this project** with the challenges above
2. **Try the Agentic AI track** to build autonomous agents
3. **Build your own project** applying these concepts
4. **Contribute** to open-source AI tools

Happy building!