# Project Planning & Architecture Design

**Module:** 4.3 - Capstone Project (Domain 4: Production AI)
**Time:** 4-6 hours
**Difficulty:** ⭐⭐⭐⭐

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Design a complete system architecture for your project
- [ ] Create detailed component specifications
- [ ] Plan your data pipeline and model strategy
- [ ] Define your API contracts
- [ ] Set up monitoring and evaluation infrastructure

---

## 📚 Prerequisites

- Completed: `lab-4.3.0-project-kickoff.ipynb`
- Selected: Your project option (A, B, C, or D)
- Created: Initial project proposal

---

## 🌍 Real-World Context

Engineering teams at companies like Google, Meta, and OpenAI are often said to spend 20-30% of project time on planning and design. This isn't wasted time - it's the foundation for everything that follows.

**Why Architecture Matters:**

| Without Architecture | With Architecture |
|---------------------|-------------------|
| "Let me just start coding..." | "Here's what we're building..." |
| Constant rewrites | Incremental progress |
| Integration nightmares | Clean interfaces |
| "It works on my machine" | Reproducible everywhere |
| Unclear requirements | Testable specifications |

This notebook guides you through the same planning process used in production AI systems.

---

## 🧒 ELI5: System Architecture

> **Imagine you're building a treehouse.** Before picking up a hammer, you'd want to:
>
> 1. **Sketch a plan** - Where does the door go? How big is the window?
> 2. **List materials** - How much wood? What kind of nails?
> 3. **Plan the order** - Build the floor before the walls!
> 4. **Think about problems** - What if it rains during construction?
>
> **System architecture is your blueprint.** It shows:
> - What pieces you're building (components)
> - How they connect (interfaces)
> - What they're made of (technologies)
> - What order to build them (dependencies)
>
> **Without a blueprint**, you might build the roof first and realize you can't attach it. With one, you build systematically and everything fits together.


---

## Part 1: Architecture Patterns for AI Systems

Let's explore common patterns used in production AI systems. Understanding these helps you design your own.

In [None]:
# Architecture Pattern Reference
# These are the building blocks for your system design

architecture_patterns = {
    "rag_pipeline": {
        "name": "RAG Pipeline",
        "description": "Retrieval-Augmented Generation for knowledge-grounded responses",
        "components": [
            "Document Loader",
            "Embedding Model",
            "Vector Store",
            "Retriever",
            "LLM Generator",
            "Response Formatter",
        ],
        "data_flow": "Query → Embed → Search → Retrieve → Augment → Generate → Response",
        "best_for": ["Option A", "Option B"],
        "diagram": """
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────▶│  Embedder   │────▶│ Vector DB   │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Response   │◀────│     LLM     │◀────│  Documents  │
└─────────────┘     └─────────────┘     └─────────────┘
        """
    },
    
    "agent_orchestrator": {
        "name": "Agent Orchestrator",
        "description": "Central coordinator managing specialized agents",
        "components": [
            "Orchestrator",
            "Task Planner",
            "Agent Pool",
            "Tool Registry",
            "Memory Manager",
            "Safety Layer",
        ],
        "data_flow": "Task → Plan → Dispatch → Execute → Aggregate → Verify → Output",
        "best_for": ["Option C"],
        "diagram": """
                    ┌─────────────────┐
                    │  Orchestrator   │
                    └────────┬────────┘
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Agent A  │  │ Agent B  │  │ Agent C  │
        └────┬─────┘  └────┬─────┘  └────┬─────┘
             │             │             │
             ▼             ▼             ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Tools A  │  │ Tools B  │  │ Tools C  │
        └──────────┘  └──────────┘  └──────────┘
        """
    },
    
    "training_pipeline": {
        "name": "Training Pipeline",
        "description": "End-to-end model training and deployment workflow",
        "components": [
            "Data Collector",
            "Preprocessor",
            "Trainer",
            "Evaluator",
            "Model Registry",
            "Deployment Manager",
        ],
        "data_flow": "Collect → Clean → Train → Evaluate → Register → Deploy",
        "best_for": ["Option D"],
        "diagram": """
┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│  Data   │──▶│  Clean  │──▶│  Train  │──▶│  Eval   │
└─────────┘   └─────────┘   └─────────┘   └────┬────┘
                                               │
                                     Pass?  ───┼─── Fail?
                                        │      │      │
                                        ▼      │      ▼
                                   ┌─────────┐ │ ┌─────────┐
                                   │ Deploy  │ │ │ Iterate │
                                   └─────────┘ │ └────┬────┘
                                               │      │
                                               └──────┘
        """
    },
    
    "multimodal_processor": {
        "name": "Multimodal Processor",
        "description": "Process and understand multiple data modalities",
        "components": [
            "Input Router",
            "Vision Encoder",
            "Text Encoder",
            "Fusion Layer",
            "Task Head",
            "Output Formatter",
        ],
        "data_flow": "Input → Route → Encode → Fuse → Process → Format → Output",
        "best_for": ["Option B"],
        "diagram": """
        ┌─────────┐         ┌─────────┐
        │  Image  │────────▶│ Vision  │──┐
        └─────────┘         │ Encoder │  │
                            └─────────┘  │     ┌─────────┐
                                         ├────▶│ Fusion  │
        ┌─────────┐         ┌─────────┐  │     └────┬────┘
        │  Text   │────────▶│  Text   │──┘          │
        └─────────┘         │ Encoder │             ▼
                            └─────────┘        ┌─────────┐
                                               │  Output │
                                               └─────────┘
        """
    },
}

def show_pattern(pattern_name: str):
    """Display details of an architecture pattern."""
    pattern = architecture_patterns.get(pattern_name)
    if not pattern:
        print(f"❌ Unknown pattern: {pattern_name}")
        return
    
    print(f"\n🏗️ {pattern['name']}")
    print("="*60)
    print(f"\n📝 {pattern['description']}")
    print(f"\n🎯 Best for: {', '.join(pattern['best_for'])}")
    print(f"\n📦 Components:")
    for comp in pattern['components']:
        print(f"   • {comp}")
    print(f"\n🔄 Data Flow:\n   {pattern['data_flow']}")
    print(f"\n📊 Architecture Diagram:")
    print(pattern['diagram'])

# Show all patterns
for pattern in architecture_patterns:
    show_pattern(pattern)
    print("\n" + "-"*60)

### 🔍 What Just Happened?

We explored four common architecture patterns:

1. **RAG Pipeline** - For knowledge-grounded generation (Options A, B)
2. **Agent Orchestrator** - For multi-agent coordination (Option C)
3. **Training Pipeline** - For model development (Option D)
4. **Multimodal Processor** - For mixed media processing (Option B)

Your project will likely combine elements from multiple patterns. For example, an AI Assistant (Option A) might use:
- RAG Pipeline for knowledge retrieval
- Agent-like tool calling for actions
- Training elements for fine-tuning
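
To see at a glance which patterns apply to a given option, you can invert the `best_for` mapping. The sketch below is self-contained: it uses a trimmed copy of the pattern table (names and options only), and the helper name `patterns_for` is hypothetical, not part of the notebook's code.

```python
from collections import defaultdict
from typing import Dict, List

# Trimmed copy of the pattern table: pattern name -> options it suits.
BEST_FOR: Dict[str, List[str]] = {
    "RAG Pipeline": ["Option A", "Option B"],
    "Agent Orchestrator": ["Option C"],
    "Training Pipeline": ["Option D"],
    "Multimodal Processor": ["Option B"],
}

def patterns_for(option: str) -> List[str]:
    """Return the patterns whose best_for list includes the given option."""
    by_option = defaultdict(list)
    for pattern, options in BEST_FOR.items():
        for opt in options:
            by_option[opt].append(pattern)
    return by_option.get(option, [])

print(patterns_for("Option B"))
```

Treat the result as a starting point only: as noted above, most projects borrow pieces from patterns beyond the ones listed for their option.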

---

## Part 2: Component Design Template

Every component in your system should be well-defined. Here's a template for component specifications.

In [None]:
from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional
from enum import Enum

class ComponentStatus(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"
    BLOCKED = "blocked"

@dataclass
class ComponentSpec:
    """Specification for a system component."""
    
    name: str
    purpose: str
    inputs: List[Dict[str, str]]  # [{"name": ..., "type": ..., "description": ...}]
    outputs: List[Dict[str, str]]
    dependencies: List[str]  # Names of components this depends on
    technologies: List[str]
    estimated_hours: float
    status: ComponentStatus = ComponentStatus.PLANNED
    notes: str = ""
    
    def to_markdown(self) -> str:
        """Generate markdown documentation for this component."""
        md = f"## {self.name}\n\n"
        md += f"**Purpose:** {self.purpose}\n\n"
        md += f"**Status:** {self.status.value}\n\n"
        md += f"**Estimated Hours:** {self.estimated_hours}\n\n"
        
        md += "### Inputs\n\n"
        md += "| Name | Type | Description |\n"
        md += "|------|------|-------------|\n"
        for inp in self.inputs:
            md += f"| {inp['name']} | `{inp['type']}` | {inp['description']} |\n"
        
        md += "\n### Outputs\n\n"
        md += "| Name | Type | Description |\n"
        md += "|------|------|-------------|\n"
        for out in self.outputs:
            md += f"| {out['name']} | `{out['type']}` | {out['description']} |\n"
        
        md += f"\n### Dependencies\n\n"
        for dep in self.dependencies:
            md += f"- {dep}\n"
        
        md += f"\n### Technologies\n\n"
        for tech in self.technologies:
            md += f"- {tech}\n"
        
        if self.notes:
            md += f"\n### Notes\n\n{self.notes}\n"
        
        return md

# Example: RAG Retriever Component
example_component = ComponentSpec(
    name="RAG Retriever",
    purpose="Retrieve relevant documents from the knowledge base for a given query",
    inputs=[
        {"name": "query", "type": "str", "description": "User's natural language question"},
        {"name": "top_k", "type": "int", "description": "Number of documents to retrieve"},
        {"name": "filters", "type": "dict", "description": "Optional metadata filters"},
    ],
    outputs=[
        {"name": "documents", "type": "List[Document]", "description": "Retrieved documents with scores"},
        {"name": "metadata", "type": "dict", "description": "Retrieval metadata (time, scores, etc.)"},
    ],
    dependencies=["Embedding Model", "Vector Store"],
    technologies=["sentence-transformers", "FAISS", "LangChain"],
    estimated_hours=4.0,
    notes="Consider hybrid search (dense + sparse) for better recall."
)

print(example_component.to_markdown())

In [None]:
# System Architecture Builder

@dataclass
class SystemArchitecture:
    """Complete system architecture specification."""
    
    name: str
    description: str
    components: List[ComponentSpec] = field(default_factory=list)
    
    def add_component(self, component: ComponentSpec):
        """Add a component to the architecture."""
        self.components.append(component)
    
    def get_build_order(self) -> List[str]:
        """Get components in dependency order (topological sort)."""
        # Build dependency graph
        graph = {c.name: set(c.dependencies) for c in self.components}
        all_names = set(graph.keys())
        
        # Topological sort
        order = []
        while graph:
            # Find nodes with no dependencies (or deps outside our system)
            ready = [name for name, deps in graph.items() 
                     if not deps.intersection(all_names - set(order))]
            if not ready:
                raise ValueError("Circular dependency detected!")
            order.extend(sorted(ready))  # Alphabetical for consistency
            for name in ready:
                del graph[name]
        
        return order
    
    def total_hours(self) -> float:
        """Calculate total estimated hours."""
        return sum(c.estimated_hours for c in self.components)
    
    def summary(self):
        """Print architecture summary."""
        print(f"\n🏛️ ARCHITECTURE: {self.name}")
        print("="*60)
        print(f"\n{self.description}\n")
        
        print(f"📦 Components ({len(self.components)}):")
        for c in self.components:
            status_emoji = {
                ComponentStatus.PLANNED: "📋",
                ComponentStatus.IN_PROGRESS: "🔄",
                ComponentStatus.COMPLETE: "✅",
                ComponentStatus.BLOCKED: "🚫",
            }
            print(f"  {status_emoji[c.status]} {c.name} ({c.estimated_hours}h)")
        
        print(f"\n⏱️ Total Estimated Hours: {self.total_hours()}")
        
        print(f"\n🔨 Build Order:")
        for i, name in enumerate(self.get_build_order(), 1):
            print(f"  {i}. {name}")

# Example: Build an architecture for Option A (AI Assistant)
assistant_arch = SystemArchitecture(
    name="Domain-Specific AI Assistant",
    description="Fine-tuned LLM with RAG, custom tools, and streaming API"
)

# Add components
assistant_arch.add_component(ComponentSpec(
    name="Embedding Model",
    purpose="Convert text to vector embeddings",
    inputs=[{"name": "text", "type": "str", "description": "Text to embed"}],
    outputs=[{"name": "embedding", "type": "np.ndarray", "description": "1024-dim vector"}],
    dependencies=[],
    technologies=["sentence-transformers", "BGE-M3"],
    estimated_hours=2.0
))

assistant_arch.add_component(ComponentSpec(
    name="Vector Store",
    purpose="Store and search document embeddings",
    inputs=[
        {"name": "embeddings", "type": "np.ndarray", "description": "Vectors to store"},
        {"name": "query", "type": "np.ndarray", "description": "Query vector"},
    ],
    outputs=[{"name": "results", "type": "List[tuple]", "description": "(id, score) pairs"}],
    dependencies=["Embedding Model"],
    technologies=["FAISS", "ChromaDB"],
    estimated_hours=3.0
))

assistant_arch.add_component(ComponentSpec(
    name="Document Processor",
    purpose="Parse and chunk documents for indexing",
    inputs=[{"name": "documents", "type": "List[Path]", "description": "Files to process"}],
    outputs=[{"name": "chunks", "type": "List[Chunk]", "description": "Processed chunks"}],
    dependencies=[],
    technologies=["LangChain", "unstructured"],
    estimated_hours=4.0
))

assistant_arch.add_component(ComponentSpec(
    name="RAG Retriever",
    purpose="Retrieve relevant context for queries",
    inputs=[{"name": "query", "type": "str", "description": "User question"}],
    outputs=[{"name": "context", "type": "str", "description": "Retrieved context"}],
    dependencies=["Embedding Model", "Vector Store", "Document Processor"],
    technologies=["LangChain"],
    estimated_hours=4.0
))

assistant_arch.add_component(ComponentSpec(
    name="Fine-tuned LLM",
    purpose="Generate domain-specific responses",
    inputs=[
        {"name": "prompt", "type": "str", "description": "System + user prompt"},
        {"name": "context", "type": "str", "description": "RAG context"},
    ],
    outputs=[{"name": "response", "type": "str", "description": "Model response"}],
    dependencies=[],
    technologies=["transformers", "PEFT", "bitsandbytes"],
    estimated_hours=12.0
))

assistant_arch.add_component(ComponentSpec(
    name="Tool Registry",
    purpose="Manage available tools and their execution",
    inputs=[{"name": "tool_call", "type": "ToolCall", "description": "Tool request"}],
    outputs=[{"name": "result", "type": "str", "description": "Tool output"}],
    dependencies=[],
    technologies=["LangChain", "custom"],
    estimated_hours=6.0
))

assistant_arch.add_component(ComponentSpec(
    name="Orchestrator",
    purpose="Coordinate RAG, LLM, and tools for query handling",
    inputs=[{"name": "user_message", "type": "str", "description": "User input"}],
    outputs=[{"name": "response", "type": "AssistantResponse", "description": "Full response"}],
    dependencies=["RAG Retriever", "Fine-tuned LLM", "Tool Registry"],
    technologies=["custom"],
    estimated_hours=6.0
))

assistant_arch.add_component(ComponentSpec(
    name="Streaming API",
    purpose="FastAPI endpoint with SSE streaming",
    inputs=[{"name": "request", "type": "ChatRequest", "description": "API request"}],
    outputs=[{"name": "stream", "type": "AsyncGenerator", "description": "Token stream"}],
    dependencies=["Orchestrator"],
    technologies=["FastAPI", "SSE"],
    estimated_hours=4.0
))

# Display summary
assistant_arch.summary()

### ✋ Try It Yourself

Create a `SystemArchitecture` for your chosen project. Use the template above as a starting point.

<details>
<summary>💡 Hints for Each Option</summary>

**Option A (AI Assistant):**
- Start with the example above
- Add domain-specific tools
- Consider evaluation components

**Option B (Document Intelligence):**
- Document Ingestion (PDF, images)
- OCR/Vision components
- Extraction pipeline
- QA system
- Export formatters

**Option C (Agent Swarm):**
- Individual agent definitions
- Coordinator/Orchestrator
- Shared memory system
- Tool registry
- Safety/approval layer

**Option D (Training Pipeline):**
- Data collection
- Preprocessing
- Training loop
- Evaluation framework
- Model registry
- Deployment automation

</details>
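
As a starting point for Option D, here is a minimal, self-contained sketch of how a build order can be derived from a dependency map using the standard library's `graphlib` (Python 3.9+). The component names come from the hints above; the dependency edges and the helper name `build_order` are assumptions for illustration, so adjust them to your own design.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map for Option D: component -> components it depends on.
option_d_deps: Dict[str, Set[str]] = {
    "Data Collection": set(),
    "Preprocessing": {"Data Collection"},
    "Training Loop": {"Preprocessing"},
    "Evaluation Framework": {"Training Loop"},
    "Model Registry": {"Evaluation Framework"},
    "Deployment Automation": {"Model Registry"},
}

def build_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return component names so that every dependency precedes its dependents."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"Circular dependency detected: {exc.args[1]}") from exc

for step, name in enumerate(build_order(option_d_deps), 1):
    print(f"{step}. {name}")
```

This plays the same role as `get_build_order()` in the template; the difference is that cycle detection is delegated to the standard library.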

---

## Part 3: DGX Spark Resource Planning

Your DGX Spark provides 128 GB of unified CPU+GPU memory. Let's plan how to use it effectively.

In [None]:
# DGX Spark Resource Planner

@dataclass
class ModelFootprint:
    """Memory footprint of a model."""
    name: str
    params: str  # e.g., "70B"
    fp32_gb: float
    bf16_gb: float
    int8_gb: float
    int4_gb: float
    fp4_gb: float  # Blackwell native
    
# Common models and their footprints
MODEL_FOOTPRINTS = [
    ModelFootprint("Llama 3.3 8B", "8B", 32.0, 16.0, 8.0, 4.0, 4.0),
    ModelFootprint("Llama 3.3 70B", "70B", 280.0, 140.0, 70.0, 35.0, 35.0),
    ModelFootprint("Llama 3.1 405B", "405B", 1620.0, 810.0, 405.0, 202.0, 202.0),
    ModelFootprint("Qwen2.5 7B", "7B", 28.0, 14.0, 7.0, 3.5, 3.5),
    ModelFootprint("Qwen2.5 72B", "72B", 288.0, 144.0, 72.0, 36.0, 36.0),
    ModelFootprint("BGE-M3 (embedding)", "568M", 2.3, 1.1, 0.6, 0.3, 0.3),
    ModelFootprint("LLaVA 1.6 34B", "34B", 136.0, 68.0, 34.0, 17.0, 17.0),
    ModelFootprint("Whisper Large v3", "1.5B", 6.0, 3.0, 1.5, 0.8, 0.8),
]

DGX_SPARK_MEMORY_GB = 128.0

def plan_memory_usage(models: List[tuple], additional_gb: float = 10.0):
    """
    Plan memory usage for a set of models.
    
    Args:
        models: List of (model_name, precision) tuples
        additional_gb: Buffer for KV cache, activations, etc.
    """
    print("\n💾 DGX SPARK MEMORY PLANNING")
    print("="*60)
    print(f"Available Memory: {DGX_SPARK_MEMORY_GB} GB\n")
    
    total_used = 0
    
    print("📦 Model Allocations:")
    for model_name, precision in models:
        # Find model footprint
        footprint = next((m for m in MODEL_FOOTPRINTS if m.name == model_name), None)
        if not footprint:
            print(f"  ⚠️ Unknown model: {model_name}")
            continue
        
        # Get memory for precision
        precision_map = {
            "fp32": footprint.fp32_gb,
            "bf16": footprint.bf16_gb,
            "int8": footprint.int8_gb,
            "int4": footprint.int4_gb,
            "fp4": footprint.fp4_gb,
        }
        memory = precision_map.get(precision, footprint.bf16_gb)
        total_used += memory
        
        print(f"  • {model_name} ({precision}): {memory:.1f} GB")
    
    # Add additional memory
    total_used += additional_gb
    print(f"  • Additional (cache, activations): {additional_gb:.1f} GB")
    
    remaining = DGX_SPARK_MEMORY_GB - total_used
    
    print(f"\n📊 Summary:")
    print(f"  Total Used: {total_used:.1f} GB")
    print(f"  Remaining: {remaining:.1f} GB")
    
    if remaining < 0:
        print(f"  ❌ OVER BUDGET by {-remaining:.1f} GB!")
        print("  Consider: Use lower precision or smaller models")
    elif remaining < 10:
        print(f"  ⚠️ Tight on memory - be careful with batch sizes")
    else:
        print(f"  ✅ Good memory headroom!")
    
    # Visualize
    print("\n  Memory Bar:")
    used_pct = min(100, (total_used / DGX_SPARK_MEMORY_GB) * 100)
    bar = "█" * int(used_pct / 2) + "░" * (50 - int(used_pct / 2))
    print(f"  [{bar}] {used_pct:.0f}%")

# Example: Plan for Option A (AI Assistant)
print("\n🎯 Example: Option A - AI Assistant")
plan_memory_usage([
    ("Llama 3.3 70B", "int4"),
    ("BGE-M3 (embedding)", "bf16"),
], additional_gb=15.0)

print("\n" + "-"*60)

# Example: Plan for Option C (Agent Swarm with multiple smaller models)
print("\n🎯 Example: Option C - Agent Swarm")
plan_memory_usage([
    ("Llama 3.3 8B", "bf16"),  # Main agent
    ("Qwen2.5 7B", "bf16"),     # Code agent
    ("BGE-M3 (embedding)", "bf16"),  # Embedding
], additional_gb=20.0)

In [None]:
# Show all available models
print("\n📋 AVAILABLE MODEL FOOTPRINTS")
print("="*80)
print(f"{'Model':<25} {'Params':<8} {'FP32':<10} {'BF16':<10} {'INT8':<10} {'INT4':<10} {'FP4':<10}")
print("-"*80)

for model in MODEL_FOOTPRINTS:
    print(f"{model.name:<25} {model.params:<8} {model.fp32_gb:<10.1f} {model.bf16_gb:<10.1f} "
          f"{model.int8_gb:<10.1f} {model.int4_gb:<10.1f} {model.fp4_gb:<10.1f}")

print("\n" + "-"*80)
print(f"DGX Spark Memory: {DGX_SPARK_MEMORY_GB} GB (unified CPU+GPU)")
print("Note: Native FP4 tensor-core support was introduced with NVIDIA's Blackwell architecture.")
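
The footprints in the table follow from a simple rule: parameter count times bytes per parameter. The sketch below reproduces that arithmetic so you can estimate models that are not in the table. The helper name `estimate_weights_gb` is hypothetical, it counts weights only (no KV cache or activations), and it uses 1 GB = 1e9 bytes to match the table.

```python
# Bits used to store one weight at each precision.
BITS_PER_PARAM = {"fp32": 32, "bf16": 16, "int8": 8, "int4": 4, "fp4": 4}

def estimate_weights_gb(num_params: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes); weights only."""
    bytes_per_param = BITS_PER_PARAM[precision] / 8
    return num_params * bytes_per_param / 1e9

for precision in BITS_PER_PARAM:
    print(f"70B @ {precision}: {estimate_weights_gb(70e9, precision):.1f} GB")
```

Real quantized checkpoints are usually somewhat larger than this estimate because they also store scales and often keep some layers at higher precision, so leave a margin.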

---

## Part 4: API Contract Design

If your project includes an API, defining contracts early prevents integration headaches.

In [None]:
# API Contract Templates

from typing import List, Optional, Dict, Any
from datetime import datetime
from enum import Enum

# Try to import pydantic for schema validation
try:
    from pydantic import BaseModel, Field
    PYDANTIC_AVAILABLE = True
except ImportError:
    PYDANTIC_AVAILABLE = False
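    # Fallback stubs so the names exist. A dataclass decorator cannot serve as a
    # base class, so the schemas below are only defined when pydantic is installed.
    BaseModel = object
    def Field(default=None, description=""):
        return default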

# Common API schemas for AI projects

class MessageRole(str, Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"

if PYDANTIC_AVAILABLE:
    class Message(BaseModel):
        """A chat message."""
        role: MessageRole = Field(description="Role of the message sender")
        content: str = Field(description="Message content")
        name: Optional[str] = Field(default=None, description="Name for tool messages")
        tool_calls: Optional[List[Dict]] = Field(default=None, description="Tool calls made")

    class ChatRequest(BaseModel):
        """Request to the chat API."""
        messages: List[Message] = Field(description="Conversation history")
        stream: bool = Field(default=True, description="Enable streaming")
        temperature: float = Field(default=0.7, description="Sampling temperature")
        max_tokens: int = Field(default=2048, description="Max tokens to generate")
        tools: Optional[List[Dict]] = Field(default=None, description="Available tools")

    class ChatResponse(BaseModel):
        """Response from the chat API."""
        id: str = Field(description="Response ID")
        message: Message = Field(description="Assistant's response")
        usage: Dict[str, int] = Field(description="Token usage stats")
        latency_ms: float = Field(description="Response latency in milliseconds")

    class DocumentUploadRequest(BaseModel):
        """Request to upload documents to knowledge base."""
        files: List[str] = Field(description="File paths to upload")
        collection: str = Field(default="default", description="Target collection")
        chunk_size: int = Field(default=512, description="Chunk size for splitting")
        chunk_overlap: int = Field(default=50, description="Overlap between chunks")

    class SearchRequest(BaseModel):
        """Request to search the knowledge base."""
        query: str = Field(description="Search query")
        top_k: int = Field(default=5, description="Number of results")
        collection: Optional[str] = Field(default=None, description="Collection to search")
        filters: Optional[Dict] = Field(default=None, description="Metadata filters")

    class SearchResult(BaseModel):
        """A single search result."""
        content: str = Field(description="Document content")
        score: float = Field(description="Relevance score")
        metadata: Dict = Field(description="Document metadata")

    print("✅ API schemas defined with Pydantic")
    
    # Show schema
    print("\n📋 ChatRequest Schema:")
    print(ChatRequest.model_json_schema())
else:
    print("⚠️ Pydantic not available. Install with: pip install pydantic")
    print("   Schemas shown as examples only.")

In [None]:
# API Endpoint Documentation Template

api_endpoints = [
    {
        "method": "POST",
        "path": "/api/v1/chat",
        "description": "Send a message and get a response",
        "request": "ChatRequest",
        "response": "ChatResponse (or SSE stream)",
        "example": {
            "messages": [{"role": "user", "content": "How do I create an S3 bucket?"}],
            "stream": True
        }
    },
    {
        "method": "POST",
        "path": "/api/v1/documents",
        "description": "Upload documents to knowledge base",
        "request": "DocumentUploadRequest",
        "response": "{status, document_ids, chunks_created}",
        "example": {
            "files": ["/data/aws-docs/s3-guide.pdf"],
            "collection": "aws-docs"
        }
    },
    {
        "method": "POST",
        "path": "/api/v1/search",
        "description": "Search the knowledge base",
        "request": "SearchRequest",
        "response": "List[SearchResult]",
        "example": {
            "query": "S3 bucket policy",
            "top_k": 5
        }
    },
    {
        "method": "GET",
        "path": "/api/v1/health",
        "description": "Health check endpoint",
        "request": "None",
        "response": "{status, model_loaded, memory_usage}",
        "example": None
    },
]

import json

print("\n🔌 API ENDPOINTS")
print("="*70)
for endpoint in api_endpoints:
    print(f"\n{endpoint['method']} {endpoint['path']}")
    print(f"  Description: {endpoint['description']}")
    print(f"  Request: {endpoint['request']}")
    print(f"  Response: {endpoint['response']}")
    if endpoint['example']:
        print(f"  Example: {json.dumps(endpoint['example'], indent=4)}")
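
The chat endpoint lists "SSE stream" as a possible response. As a rough illustration of what that means on the wire, the sketch below frames tokens as server-sent events, where each event is a `data:` line followed by a blank line. The function name `sse_events`, the `{"delta": ...}` payload shape, and the `[DONE]` sentinel are assumptions for illustration; the sentinel is a convention some APIs use and is not part of SSE itself.

```python
import json
from typing import Iterable, Iterator

def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token as one server-sent event: 'data: <json>' plus a blank line."""
    for token in tokens:
        payload = json.dumps({"delta": token})  # hypothetical payload shape
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"  # assumed end-of-stream sentinel, not part of SSE itself

for event in sse_events(["Hello", ",", " world"]):
    print(event, end="")
```

Agreeing on the payload shape and the end-of-stream signal at this stage is part of the contract, just like the request and response schemas.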

---

## Part 5: Evaluation Framework Planning

How will you know if your project is successful? Define metrics and evaluation strategy now.

In [None]:
# Evaluation Framework Template

@dataclass
class EvaluationMetric:
    """Definition of an evaluation metric."""
    name: str
    description: str
    target: str  # Target value or range
    measurement_method: str
    frequency: str  # How often to measure

@dataclass  
class EvaluationPlan:
    """Complete evaluation plan for the project."""
    project_name: str
    metrics: List[EvaluationMetric]
    datasets: List[Dict[str, str]]  # [{"name": ..., "source": ..., "size": ...}]
    baselines: List[str]
    
    def display(self):
        print(f"\n📊 EVALUATION PLAN: {self.project_name}")
        print("="*60)
        
        print("\n📏 Metrics:")
        for metric in self.metrics:
            print(f"\n  {metric.name}")
            print(f"    Description: {metric.description}")
            print(f"    Target: {metric.target}")
            print(f"    Method: {metric.measurement_method}")
            print(f"    Frequency: {metric.frequency}")
        
        print("\n📚 Datasets:")
        for ds in self.datasets:
            print(f"  • {ds['name']}: {ds['size']} samples from {ds['source']}")
        
        print("\n🎯 Baselines:")
        for baseline in self.baselines:
            print(f"  • {baseline}")

# Example: Evaluation plan for AI Assistant
assistant_eval = EvaluationPlan(
    project_name="AWS AI Assistant",
    metrics=[
        EvaluationMetric(
            name="Answer Accuracy",
            description="Percentage of correct answers on test set",
            target="≥ 80%",
            measurement_method="Human evaluation + automated checks",
            frequency="Weekly + final"
        ),
        EvaluationMetric(
            name="Retrieval Recall@5",
            description="Relevant docs in top 5 retrieved",
            target="≥ 90%",
            measurement_method="Test with known-answer queries",
            frequency="After RAG changes"
        ),
        EvaluationMetric(
            name="Latency P95",
            description="95th percentile response time",
            target="< 3 seconds",
            measurement_method="API benchmarking",
            frequency="After optimization changes"
        ),
        EvaluationMetric(
            name="Throughput",
            description="Requests per second",
            target="≥ 10 req/s",
            measurement_method="Load testing",
            frequency="Final evaluation"
        ),
        EvaluationMetric(
            name="User Satisfaction",
            description="Self-reported helpfulness score",
            target="≥ 4.0/5.0",
            measurement_method="Demo session feedback",
            frequency="Demo sessions"
        ),
    ],
    datasets=[
        {"name": "AWS FAQ Test Set", "source": "Hand-curated from AWS forums", "size": "100"},
        {"name": "CLI Command Dataset", "source": "Generated from AWS CLI docs", "size": "500"},
        {"name": "Edge Cases", "source": "Identified during development", "size": "50"},
    ],
    baselines=[
        "Raw Llama 3.3 70B (no fine-tuning, no RAG)",
        "GPT-4 with AWS docs in context (if available)",
        "AWS official documentation search",
    ]
)

assistant_eval.display()
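
Two of the metrics above, Retrieval Recall@5 and Latency P95, are easy to compute by hand. The sketch below shows one possible definition of each: recall as the fraction of relevant document IDs found in the top k, and P95 via the nearest-rank method. Both definitions, the function names, and the toy data are assumptions for illustration; pick the definitions that fit your project and state them in your plan.

```python
import math
from typing import List, Sequence, Set

def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant document IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Toy data, invented for illustration only.
print(recall_at_k(["d1", "d7", "d3", "d9", "d2"], {"d3", "d4"}))
latencies_s = [0.8, 1.1, 0.9, 2.7, 1.3, 1.0, 3.4, 1.2, 0.95, 1.05]
print(percentile_nearest_rank(latencies_s, 95))
```

With only ten samples the P95 is simply the slowest request, which is a reminder to collect enough measurements before comparing against the "< 3 seconds" target.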

---

## ⚠️ Common Mistakes

### Mistake 1: Vague Architecture
```python
# ❌ Wrong: Too vague
components = ["data stuff", "model", "api"]

# ✅ Right: Specific and actionable
components = [
    "DocumentParser: Extract text from PDFs using pypdf",
    "ChunkingService: Split into 512-token chunks with 50 overlap",
    "EmbeddingService: Use BGE-M3 to create 1024-dim vectors",
    "VectorStore: FAISS index with IVF for fast search",
    "..."
]
```

### Mistake 2: No Interface Definitions
```python
# ❌ Wrong: "The retriever will talk to the LLM somehow"

# ✅ Right: Clear interface
def retrieve(query: str, top_k: int = 5) -> List[Document]:
    """Returns documents with content, score, and metadata."""
    ...

def generate(prompt: str, context: List[Document]) -> str:
    """Generates response using context."""
    ...
```
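
One way to make such an interface checkable is `typing.Protocol`: downstream code depends on the interface, and any class with a matching `retrieve` method satisfies it. The sketch below is a toy under stated assumptions: the `Document` fields, the `KeywordRetriever` class, and the `answer` helper are all hypothetical stand-ins, not the project's real components.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol

@dataclass
class Document:
    content: str
    score: float
    metadata: Dict[str, str] = field(default_factory=dict)

class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int = 5) -> List[Document]: ...

class KeywordRetriever:
    """Toy retriever: scores documents by how many query words they contain."""
    def __init__(self, corpus: List[str]):
        self.corpus = corpus

    def retrieve(self, query: str, top_k: int = 5) -> List[Document]:
        words = set(query.lower().split())
        scored = [Document(text, float(len(words & set(text.lower().split()))))
                  for text in self.corpus]
        scored.sort(key=lambda d: d.score, reverse=True)
        return scored[:top_k]

def answer(retriever: Retriever, query: str) -> str:
    """Depends only on the Retriever interface, not on a concrete class."""
    docs = retriever.retrieve(query, top_k=1)
    return docs[0].content if docs else ""

corpus = ["create an s3 bucket with the cli", "launch an ec2 instance"]
print(answer(KeywordRetriever(corpus), "how do I create a bucket"))
```

Because `answer` only knows about `Retriever`, you can start with a toy implementation like this one and swap in a vector-store retriever later without touching the calling code.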

### Mistake 3: Ignoring Memory Constraints
```python
# ❌ Wrong: "I'll just load everything"
models = ["llama-70b-fp16", "llava-34b-fp16", "whisper-large"]  # 140+68+3 = 211 GB!

# ✅ Right: Plan memory carefully
models = [
    ("llama-70b", "int4", 35),  # 35 GB
    ("bge-m3", "bf16", 1.1),     # 1.1 GB  
    ("overhead", "-", 20),       # 20 GB for KV cache, etc.
]  # Total: 56.1 GB - plenty of headroom!
```
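
The "overhead" line deserves its own estimate, because the KV cache grows with context length and batch size. A rough sketch of the usual formula follows: each token stores one key and one value vector per layer and per KV head. The function name `kv_cache_gb` is hypothetical, and the configuration values (80 layers, 8 KV heads, head dimension 128) are assumed for a 70B-class model with grouped-query attention; check your model's config file for the real numbers.

```python
def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int = 1, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes).

    Each token stores one key and one value vector per layer and KV head.
    """
    values_per_token = 2 * num_layers * num_kv_heads * head_dim  # 2 = key + value
    return values_per_token * seq_len * batch_size * bytes_per_value / 1e9

# Assumed 70B-class config with grouped-query attention; check your model's config file.
size = kv_cache_gb(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=8192, batch_size=4)
print(f"KV cache: {size:.1f} GB")
```

Under these assumptions, four concurrent 8K-token sequences already take roughly half of a 20 GB overhead budget, so revisit the budget if you plan on long contexts or larger batches.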

---

## 🎉 Checkpoint

You've completed project planning! You should now have:

- ✅ System architecture with all components defined
- ✅ Build order based on dependencies
- ✅ Memory plan for DGX Spark
- ✅ API contracts (if applicable)
- ✅ Evaluation plan with metrics and datasets

---

## 🚀 Next Steps

1. **Document your architecture** in `docs/architecture.md`
2. **Review your project proposal** - update based on planning
3. **Open your project-specific guide:**
   - Option A: `lab-4.3.2-option-a-ai-assistant.ipynb`
   - Option B: `lab-4.3.3-option-b-document-intelligence.ipynb`
   - Option C: `lab-4.3.4-option-c-agent-swarm.ipynb`
   - Option D: `lab-4.3.5-option-d-training-pipeline.ipynb`
4. **Start building!**

---

## 📖 Further Reading

- [Designing Machine Learning Systems (Chip Huyen)](https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/)
- [Building LLM Applications for Production (Chip Huyen)](https://huyenchip.com/2023/04/11/llm-engineering.html)
- [The Architecture of Open Source Applications](https://aosabook.org/en/index.html)

---

In [None]:
# 🧹 Cleanup
print("✅ No cleanup needed - your architecture plans are saved!")
print("\n📝 Next: Document your architecture and start implementation.")