# Lab 4.6.1: Project Planning & Architecture Design

**Module:** 4.6 - Capstone Project (Domain 4: Production AI)
**Time:** 4-6 hours
**Difficulty:** ⭐⭐⭐⭐

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Design a complete system architecture for your project
- [ ] Create detailed component specifications
- [ ] Plan your DGX Spark memory budget
- [ ] Define API contracts and interfaces
- [ ] Plan your safety considerations 🛡️
- [ ] Set up evaluation infrastructure

---

## 📚 Prerequisites

- Completed: `lab-4.6.0-project-kickoff.ipynb`
- Selected: Your project option (A, B, C, or D)
- Started: Initial project proposal

---

## 🌍 Real-World Context

At companies like Google, Meta, and OpenAI, engineers invest substantial time in planning and design before writing code; a common rule of thumb is **20-30% of project time**. This isn't wasted time: it's the foundation for everything that follows.

### Why Architecture Matters

| Without Architecture | With Architecture |
|---------------------|-------------------|
| "Let me just start coding..." | "Here's what we're building..." |
| Constant rewrites | Incremental progress |
| Integration nightmares | Clean interfaces |
| "It works on my machine" | Reproducible everywhere |
| Safety as an afterthought | Safety built-in 🛡️ |

This notebook guides you through the same planning process used in production AI systems.

---

## 🧒 ELI5: System Architecture

> **Imagine you're building a treehouse.** Before picking up a hammer, you'd want to:
>
> 1. **Sketch a plan** - Where does the door go? How big is the window?
> 2. **List materials** - How much wood? What kind of nails?
> 3. **Plan the order** - Build the floor before the walls!
> 4. **Think about safety** - Add railings so nobody falls! 🛡️
> 5. **Think about problems** - What if it rains during construction?
>
> **System architecture is your blueprint.** It shows:
> - What pieces you're building (components)
> - How they connect (interfaces)
> - What they're made of (technologies)
> - What order to build them (dependencies)
> - How to keep it safe (guardrails)
>
> **Without a blueprint**, you might build the roof first and realize you can't attach it. With one, you build systematically and everything fits together.

---

## Part 1: Architecture Patterns for AI Systems

Let's explore common patterns used in production AI systems. Your project will combine elements from these.

In [None]:
# Architecture Pattern Reference
# These are the building blocks for your system design

architecture_patterns = {
    "rag_pipeline": {
        "name": "RAG Pipeline",
        "description": "Retrieval-Augmented Generation for knowledge-grounded responses",
        "components": [
            "Document Loader",
            "Chunker/Splitter",
            "Embedding Model",
            "Vector Store",
            "Retriever (with reranking)",
            "LLM Generator",
            "Response Formatter",
        ],
        "data_flow": "Query → Embed → Search → Retrieve → Rerank → Augment → Generate → Response",
        "best_for": ["Option A", "Option B"],
        "dgx_advantage": "Keep embedding model + LLM in memory simultaneously",
        "diagram": r"""
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────▶│  Embedder   │────▶│ Vector DB   │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Response   │◀────│     LLM     │◀────│  Documents  │
└─────────────┘     └─────────────┘     └─────────────┘
        """
    },
    
    "agent_orchestrator": {
        "name": "Agent Orchestrator",
        "description": "Central coordinator managing specialized agents with safety",
        "components": [
            "Orchestrator/Router",
            "Task Planner",
            "Agent Pool",
            "Tool Registry",
            "Memory Manager",
            "Safety Layer 🛡️",
            "Human Approval Gate",
        ],
        "data_flow": "Task → Plan → Route → Execute → Verify Safety → Aggregate → Output",
        "best_for": ["Option C"],
        "dgx_advantage": "Run multiple smaller models concurrently",
        "diagram": r"""
                    ┌─────────────────┐
                    │  Orchestrator   │
                    │  + Safety 🛡️    │
                    └────────┬────────┘
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Agent A  │  │ Agent B  │  │ Agent C  │
        └────┬─────┘  └────┬─────┘  └────┬─────┘
             │             │             │
             ▼             ▼             ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Tools A  │  │ Tools B  │  │ Tools C  │
        └──────────┘  └──────────┘  └──────────┘
        """
    },
    
    "training_pipeline": {
        "name": "Training Pipeline",
        "description": "End-to-end model training and deployment workflow",
        "components": [
            "Data Collector",
            "Quality Filter",
            "Preprocessor",
            "Trainer (SFT/DPO)",
            "Evaluator",
            "Model Registry",
            "Deployment Manager",
            "Red Team Evaluator 🛡️",
        ],
        "data_flow": "Collect → Filter → Clean → Train → Evaluate → Safety Check → Register → Deploy",
        "best_for": ["Option D"],
        "dgx_advantage": "QLoRA for 100B+ models, full fine-tune for 16B",
        "diagram": r"""
┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│  Data   │──▶│  Clean  │──▶│  Train  │──▶│  Eval   │
└─────────┘   └─────────┘   └─────────┘   └────┬────┘
                                               │
                                     Pass?  ───┼─── Fail?
                                        │      │      │
                                        ▼      │      ▼
                                   ┌─────────┐ │ ┌─────────┐
                                   │ Deploy  │ │ │ Iterate │
                                   └─────────┘ │ └────┬────┘
                                               │      │
                                               └──────┘
        """
    },
    
    "multimodal_processor": {
        "name": "Multimodal Processor",
        "description": "Process and understand multiple data modalities",
        "components": [
            "Input Router",
            "Vision Encoder (OCR/Layout)",
            "Text Encoder",
            "Fusion Layer",
            "Vision-Language Model",
            "Task Head",
            "Output Formatter",
            "Content Filter 🛡️",
        ],
        "data_flow": "Input → Route → Encode → Fuse → Process → Filter → Format → Output",
        "best_for": ["Option B"],
        "dgx_advantage": "34B VLMs with full resolution images",
        "diagram": r"""
        ┌─────────┐         ┌─────────┐
        │  Image  │────────▶│ Vision  │──┐
        └─────────┘         │ Encoder │  │
                            └─────────┘  │     ┌─────────┐
                                         ├────▶│ Fusion  │
        ┌─────────┐         ┌─────────┐  │     │   VLM   │
        │  Text   │────────▶│  Text   │──┘     └────┬────┘
        └─────────┘         │ Encoder │             │
                            └─────────┘             ▼
                                               ┌─────────┐
                                               │  Output │
                                               └─────────┘
        """
    },
}

def show_pattern(pattern_name: str):
    """Display details of an architecture pattern."""
    pattern = architecture_patterns.get(pattern_name)
    if not pattern:
        print(f"❌ Unknown pattern: {pattern_name}")
        return
    
    print(f"\n🏗️ {pattern['name']}")
    print("="*70)
    print(f"\n📝 {pattern['description']}")
    print(f"\n🎯 Best for: {', '.join(pattern['best_for'])}")
    print(f"🚀 DGX Advantage: {pattern['dgx_advantage']}")
    print(f"\n📦 Components:")
    for comp in pattern['components']:
        print(f"   • {comp}")
    print(f"\n🔄 Data Flow:")
    print(f"   {pattern['data_flow']}")
    print(f"\n📊 Architecture:{pattern['diagram']}")

# Show all patterns
for pattern in architecture_patterns:
    show_pattern(pattern)
    print("\n" + "-"*70)

### 🔍 What Just Happened?

We explored four common architecture patterns:

1. **RAG Pipeline** - For knowledge-grounded generation (Options A, B)
2. **Agent Orchestrator** - For multi-agent coordination (Option C)
3. **Training Pipeline** - For model development (Option D)
4. **Multimodal Processor** - For mixed media processing (Option B)

Your project will likely **combine elements** from multiple patterns. For example, an AI Assistant (Option A) might use:
- RAG Pipeline for knowledge retrieval
- Agent-like tool calling for actions
- Safety layer from orchestrator pattern
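The combination listed above (RAG retrieval, tool-style actions, and a safety layer) can be wired together as a chain of plain functions. This is a minimal sketch under assumed interfaces: `answer`, the stage callables, and the toy keyword retriever are hypothetical placeholders, not part of this lab's code. Checking safety on both the input and the output mirrors the "Verify Safety" step in the orchestrator pattern's data flow.

```python
from typing import Callable, List

REFUSAL = "Sorry, I can't help with that request."

def answer(query: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str, List[str]], str],
           is_safe: Callable[[str], bool]) -> str:
    """Input guardrail -> RAG retrieval -> generation -> output guardrail."""
    if not is_safe(query):              # safety layer: screen the user input
        return REFUSAL
    context = retrieve(query)           # RAG pattern: fetch grounding text
    draft = generate(query, context)    # LLM step (tool calls would happen here)
    if not is_safe(draft):              # safety layer: screen the model output
        return REFUSAL
    return draft

# Toy stand-ins (hypothetical) so the sketch runs end to end
DOCS = {"s3": "Use `aws s3 mb s3://my-bucket` to create a bucket."}

def toy_retrieve(query: str) -> List[str]:
    # Keyword match instead of real embeddings, purely for illustration
    return [text for key, text in DOCS.items() if key in query.lower()]

def toy_generate(query: str, context: List[str]) -> str:
    return context[0] if context else "I don't know."

def toy_is_safe(text: str) -> bool:
    return "rm -rf" not in text

print(answer("How do I create an S3 bucket?", toy_retrieve, toy_generate, toy_is_safe))
print(answer("Please run rm -rf /", toy_retrieve, toy_generate, toy_is_safe))
```

Keeping each stage behind a plain callable means the toy retriever can later be swapped for a real vector store without touching the control flow.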

---

## Part 2: Component Specification Template

Every component in your system should be well-defined. Here's a template for component specifications.
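Before filling in specs, it helps to check that every dependency name refers to a real component. The `get_build_order` method defined later in this part intersects each component's `dependencies` with the set of known component names, so a typo (say, "Vector DB" instead of "Vector Store") silently drops that edge instead of raising an error. A minimal stdlib sketch of such a check, using a hypothetical helper `find_missing_dependencies` on a plain name-to-dependencies mapping:

```python
from typing import Dict, List

def find_missing_dependencies(deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per component, any dependency names that match no component.

    `deps` maps each component name to the names it depends on, mirroring
    the `name` / `dependencies` fields of the spec template.
    """
    known = set(deps)
    missing = {}
    for name, needs in deps.items():
        unknown = [d for d in needs if d not in known]
        if unknown:
            missing[name] = unknown
    return missing

# "Vector DB" does not match the component actually named "Vector Store"
deps = {
    "Embedding Model": [],
    "Vector Store": ["Embedding Model"],
    "RAG Retriever": ["Embedding Model", "Vector DB"],
}
print(find_missing_dependencies(deps))  # {'RAG Retriever': ['Vector DB']}
```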

In [None]:
from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional
from enum import Enum

class ComponentStatus(Enum):
    """Status of a component in development."""
    PLANNED = "📋 Planned"
    IN_PROGRESS = "🔄 In Progress"
    COMPLETE = "✅ Complete"
    BLOCKED = "🚫 Blocked"

@dataclass
class ComponentSpec:
    """
    Specification for a system component.
    
    Use this to document each piece of your system.
    """
    
    name: str
    purpose: str
    inputs: List[Dict[str, str]]  # [{"name": ..., "type": ..., "description": ...}]
    outputs: List[Dict[str, str]]
    dependencies: List[str]  # Names of components this depends on
    technologies: List[str]
    estimated_hours: float
    memory_gb: float = 0.0  # GPU memory requirement
    status: ComponentStatus = ComponentStatus.PLANNED
    safety_considerations: str = ""  # 🛡️
    notes: str = ""
    
    def to_markdown(self) -> str:
        """Generate markdown documentation for this component."""
        md = f"## {self.name}\n\n"
        md += f"**Purpose:** {self.purpose}\n\n"
        md += f"**Status:** {self.status.value}\n\n"
        md += f"**Estimated Hours:** {self.estimated_hours}\n\n"
        md += f"**Memory Requirement:** {self.memory_gb} GB\n\n"
        
        md += "### Inputs\n\n"
        md += "| Name | Type | Description |\n"
        md += "|------|------|-------------|\n"
        for inp in self.inputs:
            md += f"| {inp['name']} | `{inp['type']}` | {inp['description']} |\n"
        
        md += "\n### Outputs\n\n"
        md += "| Name | Type | Description |\n"
        md += "|------|------|-------------|\n"
        for out in self.outputs:
            md += f"| {out['name']} | `{out['type']}` | {out['description']} |\n"
        
        md += f"\n### Dependencies\n\n"
        for dep in self.dependencies:
            md += f"- {dep}\n"
        
        md += f"\n### Technologies\n\n"
        for tech in self.technologies:
            md += f"- {tech}\n"
        
        if self.safety_considerations:
            md += f"\n### Safety Considerations 🛡️\n\n{self.safety_considerations}\n"
        
        if self.notes:
            md += f"\n### Notes\n\n{self.notes}\n"
        
        return md

# Example: RAG Retriever Component
example_component = ComponentSpec(
    name="RAG Retriever",
    purpose="Retrieve relevant documents from the knowledge base for a given query",
    inputs=[
        {"name": "query", "type": "str", "description": "User's natural language question"},
        {"name": "top_k", "type": "int", "description": "Number of documents to retrieve"},
        {"name": "filters", "type": "dict", "description": "Optional metadata filters"},
    ],
    outputs=[
        {"name": "documents", "type": "List[Document]", "description": "Retrieved documents with scores"},
        {"name": "metadata", "type": "dict", "description": "Retrieval metadata (time, scores, etc.)"},
    ],
    dependencies=["Embedding Model", "Vector Store"],
    technologies=["sentence-transformers", "FAISS", "LangChain"],
    estimated_hours=4.0,
    memory_gb=1.5,
    safety_considerations="Filter retrieved content for sensitive information before passing to LLM.",
    notes="Consider hybrid search (dense + sparse) for better recall."
)

print(example_component.to_markdown())

In [None]:
# System Architecture Builder

@dataclass
class SystemArchitecture:
    """
    Complete system architecture specification.
    
    Use this to plan your entire capstone project.
    """
    
    name: str
    description: str
    option: str  # A, B, C, or D
    components: List[ComponentSpec] = field(default_factory=list)
    
    def add_component(self, component: ComponentSpec):
        """Add a component to the architecture."""
        self.components.append(component)
    
    def get_build_order(self) -> List[str]:
        """
        Get components in dependency order (topological sort).
        Build these in order!
        """
        # Build dependency graph
        all_names = {c.name for c in self.components}
        graph = {c.name: set(c.dependencies).intersection(all_names) for c in self.components}
        
        # Topological sort
        order = []
        remaining = set(graph.keys())
        
        while remaining:
            # Find nodes with no unprocessed dependencies
            ready = [
                name for name in remaining 
                if not graph[name].intersection(remaining - {name})
            ]
            if not ready:
                raise ValueError("Circular dependency detected!")
            
            order.extend(sorted(ready))
            for name in ready:
                remaining.remove(name)
        
        return order
    
    def total_hours(self) -> float:
        """Calculate total estimated hours."""
        return sum(c.estimated_hours for c in self.components)
    
    def total_memory(self) -> float:
        """Sum of per-component memory estimates (may overcount if components share a model or are not all loaded at once)."""
        return sum(c.memory_gb for c in self.components)
    
    def summary(self):
        """Print architecture summary."""
        status_counts = {}
        for c in self.components:
            status_counts[c.status.value] = status_counts.get(c.status.value, 0) + 1
        
        print(f"\n🏛️ ARCHITECTURE: {self.name}")
        print("="*70)
        print(f"Option: {self.option}")
        print(f"\n{self.description}\n")
        
        print(f"📦 Components ({len(self.components)}):")
        for c in self.components:
            mem_str = f" [{c.memory_gb}GB]" if c.memory_gb > 0 else ""
            print(f"  {c.status.value} {c.name} ({c.estimated_hours}h){mem_str}")
        
        print(f"\n📊 Status Summary:")
        for status, count in status_counts.items():
            print(f"  {status}: {count}")
        
        print(f"\n⏱️ Total Estimated Hours: {self.total_hours():.0f}")
        print(f"💾 Total Memory Estimate: {self.total_memory():.1f} GB")
        print(f"   DGX Spark Available: 128 GB → {'✅ Fits!' if self.total_memory() < 110 else '⚠️ Review memory plan'}")
        
        print(f"\n🔨 Recommended Build Order:")
        for i, name in enumerate(self.get_build_order(), 1):
            print(f"  {i}. {name}")

# Example: Architecture for Option A (AI Assistant)
print("\n📝 EXAMPLE: Building an architecture for Option A\n")

assistant_arch = SystemArchitecture(
    name="AWS Infrastructure AI Assistant",
    description="Fine-tuned LLM with RAG, custom tools, safety guardrails, and streaming API",
    option="A"
)

In [None]:
# Add components to the example architecture

assistant_arch.add_component(ComponentSpec(
    name="Embedding Model",
    purpose="Convert text to vector embeddings for similarity search",
    inputs=[{"name": "text", "type": "str", "description": "Text to embed"}],
    outputs=[{"name": "embedding", "type": "np.ndarray", "description": "1024-dim vector"}],
    dependencies=[],
    technologies=["sentence-transformers", "BGE-M3"],
    estimated_hours=2.0,
    memory_gb=1.5,
))

assistant_arch.add_component(ComponentSpec(
    name="Vector Store",
    purpose="Store and search document embeddings efficiently",
    inputs=[
        {"name": "embeddings", "type": "np.ndarray", "description": "Vectors to store"},
        {"name": "query", "type": "np.ndarray", "description": "Query vector"},
    ],
    outputs=[{"name": "results", "type": "List[tuple]", "description": "(id, score) pairs"}],
    dependencies=["Embedding Model"],
    technologies=["FAISS-GPU", "ChromaDB"],
    estimated_hours=3.0,
    memory_gb=2.0,
))

assistant_arch.add_component(ComponentSpec(
    name="Document Processor",
    purpose="Parse, chunk, and prepare documents for indexing",
    inputs=[{"name": "documents", "type": "List[Path]", "description": "Files to process"}],
    outputs=[{"name": "chunks", "type": "List[Chunk]", "description": "Processed chunks"}],
    dependencies=[],
    technologies=["LangChain", "unstructured", "PyPDF"],
    estimated_hours=4.0,
    memory_gb=0.5,
))

assistant_arch.add_component(ComponentSpec(
    name="RAG Retriever",
    purpose="Retrieve relevant context for user queries",
    inputs=[{"name": "query", "type": "str", "description": "User question"}],
    outputs=[{"name": "context", "type": "str", "description": "Retrieved context"}],
    dependencies=["Embedding Model", "Vector Store", "Document Processor"],
    technologies=["LangChain", "Hybrid Search"],
    estimated_hours=4.0,
    memory_gb=0.5,
    safety_considerations="Filter retrieved content for sensitive PII before use.",
))

assistant_arch.add_component(ComponentSpec(
    name="Fine-tuned LLM",
    purpose="Generate domain-specific responses using QLoRA-trained 70B model",
    inputs=[
        {"name": "prompt", "type": "str", "description": "System + user prompt"},
        {"name": "context", "type": "str", "description": "RAG context"},
    ],
    outputs=[{"name": "response", "type": "str", "description": "Model response"}],
    dependencies=[],
    technologies=["transformers", "PEFT", "bitsandbytes"],
    estimated_hours=12.0,
    memory_gb=38.0,  # 70B in INT4
    safety_considerations="Apply guardrails BEFORE returning response.",
))

assistant_arch.add_component(ComponentSpec(
    name="Tool Registry",
    purpose="Manage available tools and execute them safely",
    inputs=[{"name": "tool_call", "type": "ToolCall", "description": "Tool request"}],
    outputs=[{"name": "result", "type": "str", "description": "Tool output"}],
    dependencies=[],
    technologies=["LangChain Tools", "custom"],
    estimated_hours=6.0,
    memory_gb=0.1,
    safety_considerations="Validate tool inputs, limit destructive operations.",
))

assistant_arch.add_component(ComponentSpec(
    name="Safety Guardrails 🛡️",
    purpose="Filter inputs/outputs for safety violations",
    inputs=[{"name": "text", "type": "str", "description": "Text to check"}],
    outputs=[{"name": "safe", "type": "bool", "description": "Whether text is safe"}],
    dependencies=[],
    technologies=["NeMo Guardrails", "Llama Guard"],
    estimated_hours=5.0,
    memory_gb=4.0,  # Llama Guard 8B
    safety_considerations="This IS the safety component!",
))

assistant_arch.add_component(ComponentSpec(
    name="Orchestrator",
    purpose="Coordinate RAG, LLM, tools, and safety for query handling",
    inputs=[{"name": "user_message", "type": "str", "description": "User input"}],
    outputs=[{"name": "response", "type": "AssistantResponse", "description": "Full response"}],
    dependencies=["RAG Retriever", "Fine-tuned LLM", "Tool Registry", "Safety Guardrails 🛡️"],
    technologies=["custom"],
    estimated_hours=6.0,
    memory_gb=0.1,
))

assistant_arch.add_component(ComponentSpec(
    name="Streaming API",
    purpose="FastAPI endpoint with SSE streaming support",
    inputs=[{"name": "request", "type": "ChatRequest", "description": "API request"}],
    outputs=[{"name": "stream", "type": "AsyncGenerator", "description": "Token stream"}],
    dependencies=["Orchestrator"],
    technologies=["FastAPI", "SSE", "uvicorn"],
    estimated_hours=4.0,
    memory_gb=0.1,
))

assistant_arch.add_component(ComponentSpec(
    name="Gradio Demo",
    purpose="Interactive chat interface for demonstrations",
    inputs=[{"name": "message", "type": "str", "description": "User message"}],
    outputs=[{"name": "response", "type": "str", "description": "Assistant response"}],
    dependencies=["Streaming API"],
    technologies=["Gradio"],
    estimated_hours=3.0,
    memory_gb=0.1,
))

# Display the architecture summary
assistant_arch.summary()

### ✋ Try It Yourself

Create a `SystemArchitecture` for YOUR chosen project. Use the template above as a starting point.

<details>
<summary>💡 Component Ideas for Each Option</summary>

**Option A (AI Assistant):** Use the example above!

**Option B (Document Intelligence):**
- PDF Parser, Image Processor, OCR Engine
- Vision-Language Model (LLaVA/Qwen-VL)
- Schema Extractor, Entity Recognizer
- Multimodal RAG, QA System
- Export Formatter (JSON, CSV)
- Content Filter (PII detection)

**Option C (Agent Swarm):**
- Coordinator Agent
- Specialized Agents (Research, Code, Data, etc.)
- Tool Registry
- Shared Memory Store
- Human Approval Gate 🛡️
- Action Validator 🛡️

**Option D (Training Pipeline):**
- Data Collector, Quality Filter
- Preprocessor, Format Converter
- SFT Trainer, DPO Trainer
- Evaluation Suite
- Model Registry (MLflow)
- Red Team Evaluator 🛡️
</details>
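Once you have a component list, the build order is a topological sort of its dependency graph. `SystemArchitecture.get_build_order` above implements one by hand; Python's standard library also ships `graphlib.TopologicalSorter` (Python 3.9+), which raises `graphlib.CycleError` when the graph has a cycle. A sketch for a hypothetical Option C graph; the edges are illustrative assumptions, not a prescribed design:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each component maps to the set of components it depends on (illustrative edges)
option_c = {
    "Tool Registry": set(),
    "Shared Memory Store": set(),
    "Action Validator": {"Tool Registry"},
    "Human Approval Gate": {"Action Validator"},
    "Research Agent": {"Tool Registry", "Shared Memory Store"},
    "Code Agent": {"Tool Registry", "Shared Memory Store"},
    "Coordinator Agent": {"Research Agent", "Code Agent", "Human Approval Gate"},
}

# static_order() yields every node only after all of its dependencies
build_order = list(TopologicalSorter(option_c).static_order())
for step, name in enumerate(build_order, 1):
    print(f"{step}. {name}")
```

Nodes with no ordering constraint between them may appear in any relative order, so don't rely on a specific sequence beyond "dependencies first".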

---

## Part 3: DGX Spark Memory Planning

Your DGX Spark has 128GB unified memory. Let's plan how to use it effectively.
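The footprint figures below follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch, with a hypothetical helper `weight_gb` that counts weights only (no KV cache, activations, or quantization metadata):

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Raw weight memory in GB (1 GB = 1e9 bytes): parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"70B @ {precision:>4}: {weight_gb(70, precision):6.1f} GB")
```

For 70B this gives 280, 140, 70, and 35 GB. Quantized checkpoints usually come out somewhat above the raw figure (per-group scales, and some layers such as embeddings are often kept in higher precision), which is consistent with the ~38 GB INT4 entry in the footprint table.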

In [None]:
# DGX Spark Memory Planner

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ModelFootprint:
    """Memory footprint of a model at different precisions."""
    name: str
    params: str  # e.g., "70B"
    fp32_gb: float
    bf16_gb: float
    int8_gb: float
    int4_gb: float
    nvfp4_gb: float  # Blackwell native

# Common models and their footprints (approximate)
MODEL_FOOTPRINTS = [
    ModelFootprint("Llama 3.3 8B", "8B", 32, 16, 8, 4.5, 4.5),
    ModelFootprint("Llama 3.3 70B", "70B", 280, 140, 70, 38, 38),
    ModelFootprint("Llama 3.1 405B", "405B", 1620, 810, 405, 210, 210),
    ModelFootprint("Qwen2.5 7B", "7B", 28, 14, 7, 4, 4),
    ModelFootprint("Qwen2.5 72B", "72B", 288, 144, 72, 40, 40),
    ModelFootprint("LLaVA 1.6 34B", "34B", 136, 68, 34, 18, 18),
    ModelFootprint("Qwen2-VL 7B", "7B", 28, 14, 7, 4, 4),
    ModelFootprint("BGE-M3 (embedding)", "568M", 2.3, 1.2, 0.6, 0.4, 0.4),
    ModelFootprint("Llama Guard 3 8B", "8B", 32, 16, 8, 4.5, 4.5),
    ModelFootprint("Whisper Large v3", "1.5B", 6, 3, 1.5, 0.8, 0.8),
]

DGX_SPARK_MEMORY_GB = 128.0
SYSTEM_RESERVE_GB = 8.0  # Leave some headroom

def plan_memory(
    models: List[Tuple[str, str]],  # [(model_name, precision), ...]
    additional_gb: float = 10.0,    # KV cache, activations, etc.
    training: bool = False          # Training needs more memory
):
    """
    Plan memory usage for a set of models.
    
    Args:
        models: List of (model_name, precision) tuples
                precision: "fp32", "bf16", "int8", "int4", "nvfp4"
        additional_gb: Extra memory for KV cache, activations
        training: If True, add training overhead (gradients, optimizer)
    """
    print("\nüíæ DGX SPARK MEMORY PLAN")
    print("="*70)
    print(f"Available: {DGX_SPARK_MEMORY_GB} GB (unified CPU+GPU)")
    print(f"System Reserve: {SYSTEM_RESERVE_GB} GB")
    print(f"Usable: {DGX_SPARK_MEMORY_GB - SYSTEM_RESERVE_GB} GB\n")
    
    total_used = 0
    
    print("üì¶ Model Allocations:")
    print("-"*70)
    
    for model_name, precision in models:
        # Find model footprint
        footprint = None
        for m in MODEL_FOOTPRINTS:
            if m.name.lower() == model_name.lower():
                footprint = m
                break
        
        if not footprint:
            print(f"  ‚ö†Ô∏è Unknown model: {model_name}")
            continue
        
        # Get memory for precision
        precision_map = {
            "fp32": footprint.fp32_gb,
            "bf16": footprint.bf16_gb,
            "fp16": footprint.bf16_gb,
            "int8": footprint.int8_gb,
            "int4": footprint.int4_gb,
            "nvfp4": footprint.nvfp4_gb,
        }
        memory = precision_map.get(precision.lower(), footprint.bf16_gb)
        total_used += memory
        
        print(f"  ‚Ä¢ {model_name:<25} ({precision:>6}): {memory:>6.1f} GB")
    
    # Training overhead
    training_overhead = 0
    if training:
        # Gradients + optimizer states for LoRA
        training_overhead = 8.0
        print(f"  ‚Ä¢ Training overhead (LoRA):          {training_overhead:>6.1f} GB")
        total_used += training_overhead
    
    # Additional memory
    print(f"  ‚Ä¢ KV cache, activations, buffer:     {additional_gb:>6.1f} GB")
    total_used += additional_gb
    
    print("-"*70)
    remaining = DGX_SPARK_MEMORY_GB - SYSTEM_RESERVE_GB - total_used
    
    print(f"\nüìä Summary:")
    print(f"  Total Model Memory: {total_used - additional_gb - training_overhead:.1f} GB")
    print(f"  Total Used: {total_used:.1f} GB")
    print(f"  Remaining: {remaining:.1f} GB")
    
    # Status bar
    used_pct = min(100, (total_used / (DGX_SPARK_MEMORY_GB - SYSTEM_RESERVE_GB)) * 100)
    bar = "‚ñà" * int(used_pct / 2) + "‚ñë" * (50 - int(used_pct / 2))
    print(f"\n  [{bar}] {used_pct:.0f}%")
    
    if remaining < 0:
        print(f"\n  ‚ùå OVER BUDGET by {-remaining:.1f} GB!")
        print("  Consider: Use INT4/NVFP4 or smaller models")
    elif remaining < 15:
        print(f"\n  ‚ö†Ô∏è Tight on memory - reduce batch size if needed")
    else:
        print(f"\n  ‚úÖ Good memory headroom!")
    
    return total_used, remaining

# Example: Option A - AI Assistant
print("\nüéØ EXAMPLE: Option A - AI Assistant Memory Plan")
plan_memory([
    ("Llama 3.3 70B", "int4"),        # Main LLM
    ("BGE-M3 (embedding)", "bf16"),    # Embedding model
    ("Llama Guard 3 8B", "int4"),      # Safety model
], additional_gb=15.0, training=True)
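`additional_gb` is a flat allowance. Its largest inference-time component is often the KV cache, which grows linearly with context length and batch size. A sketch of the standard estimate (2 tensors, K and V, × layers × KV heads × head dimension × tokens × batch × bytes per element), using assumed Llama-3-70B-style dimensions (80 layers, 8 grouped-query KV heads, head dimension 128, bf16 cache); check your model's `config.json` for the real values. The helper `kv_cache_gb` is hypothetical:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB (1 GB = 1e9 bytes).

    Every layer stores one key and one value vector per KV head per token,
    hence the leading factor of 2.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem
    return total_bytes / 1e9

# Assumed shape: 80 layers, 8 KV heads (GQA), head_dim 128, bf16 (2 bytes)
print(f"{kv_cache_gb(80, 8, 128, seq_len=8192):.2f} GB per 8K-token sequence")
```

Under these assumptions an 8K-token sequence needs about 2.7 GB, so a batch of 4 would take roughly 10.7 GB of the 15 GB allowance before counting activations.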

In [None]:
# More examples for other options

print("\nüéØ EXAMPLE: Option B - Document Intelligence Memory Plan")
plan_memory([
    ("LLaVA 1.6 34B", "int4"),         # Vision-Language model
    ("BGE-M3 (embedding)", "bf16"),    # Multimodal embedding
], additional_gb=20.0)  # Higher for image processing

print("\n" + "="*70)

print("\nüéØ EXAMPLE: Option C - Agent Swarm Memory Plan")
plan_memory([
    ("Llama 3.3 8B", "bf16"),          # Coordinator
    ("Qwen2.5 7B", "bf16"),            # Code agent
    ("Qwen2.5 7B", "bf16"),            # Research agent
    ("BGE-M3 (embedding)", "bf16"),    # Memory embedding
    ("Llama Guard 3 8B", "int4"),      # Safety
], additional_gb=15.0)

print("\n" + "="*70)

print("\nüéØ EXAMPLE: Option D - Training Pipeline Memory Plan")
plan_memory([
    ("Llama 3.3 70B", "int4"),         # Base model for QLoRA
], additional_gb=20.0, training=True)
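`plan_memory` charges a flat 8 GB for LoRA training. A back-of-envelope count shows how small the adapter itself can be. The sketch assumes rank 16 on the four attention projections of a Llama-3-70B-style model and about 16 bytes of state per trainable parameter; the helper `lora_params`, the rank, and the target modules are illustrative assumptions, so substitute your own PEFT configuration:

```python
from typing import List, Tuple

def lora_params(rank: int, shapes: List[Tuple[int, int]], layers: int) -> int:
    """Trainable LoRA parameters: each adapted (d_in x d_out) matrix gets
    A (d_in x r) and B (r x d_out), i.e. r * (d_in + d_out) parameters."""
    return layers * sum(rank * (d_in + d_out) for d_in, d_out in shapes)

# Assumed Llama-3-70B attention shapes: hidden size 8192, K/V projections
# 1024 wide (8 KV heads x head_dim 128), 80 decoder layers
attn_shapes = [(8192, 8192), (8192, 1024), (8192, 1024), (8192, 8192)]  # q, k, v, o
n = lora_params(rank=16, shapes=attn_shapes, layers=80)

# Assumed ~16 bytes per trainable parameter: bf16 weight (2) + bf16 grad (2)
# + fp32 master copy (4) + two fp32 AdamW moments (8)
state_gb = n * 16 / 1e9
print(f"{n / 1e6:.1f}M trainable params -> ~{state_gb:.2f} GB of adapter state")
```

On these assumptions the adapter state is only about 1 GB, so most of the 8 GB overhead would in practice go to activations, which scale with sequence length and batch size and shrink with gradient checkpointing.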

In [None]:
# Model footprint reference table

print("\nüìã MODEL FOOTPRINT REFERENCE")
print("="*90)
print(f"{'Model':<28} {'Params':<8} {'FP32':>8} {'BF16':>8} {'INT8':>8} {'INT4':>8} {'NVFP4':>8}")
print("-"*90)

for model in MODEL_FOOTPRINTS:
    print(f"{model.name:<28} {model.params:<8} {model.fp32_gb:>7.1f}G {model.bf16_gb:>7.1f}G "
          f"{model.int8_gb:>7.1f}G {model.int4_gb:>7.1f}G {model.nvfp4_gb:>7.1f}G")

print("-"*90)
print(f"\nüí° DGX Spark Capacity: {DGX_SPARK_MEMORY_GB}GB unified memory")
print("   NVFP4 is exclusive to Blackwell architecture!")

---

## Part 4: API Contract Design

If your project includes an API, defining contracts early prevents integration headaches.
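The core of a contract is that invalid requests are rejected at the boundary, before they reach the model. Pydantic does this declaratively; the same idea with only the standard library looks like the sketch below. `ChatRequestContract` is a hypothetical, simplified stand-in that reuses the temperature (0 to 2) and max-token (1 to 8192) limits from the Pydantic `ChatRequest` schema in this part:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChatRequestContract:
    """Stdlib-only stand-in for a chat request schema (hypothetical)."""
    prompt: str
    temperature: float = 0.7
    max_tokens: int = 2048

    def __post_init__(self):
        # Enforce the contract at construction time so bad requests fail fast
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature must be in [0, 2], got {self.temperature}")
        if not 1 <= self.max_tokens <= 8192:
            raise ValueError(f"max_tokens must be in [1, 8192], got {self.max_tokens}")

ok = ChatRequestContract("How do I create an S3 bucket?")
try:
    ChatRequestContract("Hi", temperature=3.0)
except ValueError as err:
    print("Rejected:", err)
```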

In [None]:
# API Contract Templates using Pydantic

from typing import List, Optional, Dict, Any
from enum import Enum
import json

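# This cell requires Pydantic v2 (it uses model_dump_json, which v1 lacks)
try:
    import pydantic
    from pydantic import BaseModel, Field
    PYDANTIC_AVAILABLE = int(pydantic.VERSION.split(".")[0]) >= 2
    if not PYDANTIC_AVAILABLE:
        print(f"⚠️ Pydantic {pydantic.VERSION} found; this cell needs v2. Run: pip install -U 'pydantic>=2'")
except ImportError:
    PYDANTIC_AVAILABLE = False
    print("⚠️ Pydantic not installed. Run: pip install 'pydantic>=2'")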

if PYDANTIC_AVAILABLE:
    # Common schemas for AI projects
    
    class MessageRole(str, Enum):
        SYSTEM = "system"
        USER = "user"
        ASSISTANT = "assistant"
        TOOL = "tool"

    class Message(BaseModel):
        """A chat message."""
        role: MessageRole = Field(description="Role of the message sender")
        content: str = Field(description="Message content")
        name: Optional[str] = Field(default=None, description="Name for tool messages")

    class ToolCall(BaseModel):
        """A tool call request."""
        id: str = Field(description="Unique tool call ID")
        name: str = Field(description="Tool name")
        arguments: Dict[str, Any] = Field(description="Tool arguments")

    class ChatRequest(BaseModel):
        """Request to the chat API."""
        messages: List[Message] = Field(description="Conversation history")
        stream: bool = Field(default=True, description="Enable streaming")
        temperature: float = Field(default=0.7, ge=0, le=2, description="Sampling temperature")
        max_tokens: int = Field(default=2048, ge=1, le=8192, description="Max tokens to generate")
        tools: Optional[List[Dict]] = Field(default=None, description="Available tools")

    class UsageStats(BaseModel):
        """Token usage statistics."""
        prompt_tokens: int
        completion_tokens: int
        total_tokens: int

    class ChatResponse(BaseModel):
        """Response from the chat API."""
        id: str = Field(description="Response ID")
        message: Message = Field(description="Assistant's response")
        tool_calls: Optional[List[ToolCall]] = Field(default=None, description="Tool calls if any")
        sources: List[Dict[str, Any]] = Field(default_factory=list, description="Retrieved sources")
        usage: UsageStats = Field(description="Token usage")
        latency_ms: float = Field(description="Response latency")
        safety_filtered: bool = Field(default=False, description="Whether safety filter was triggered")

    class HealthResponse(BaseModel):
        """Health check response."""
        status: str = Field(description="Service status")
        model_loaded: bool = Field(description="Whether model is loaded")
        gpu_memory_used_gb: float = Field(description="GPU memory in use")
        guardrails_active: bool = Field(default=True, description="Safety guardrails status")

    print("✅ API schemas defined with Pydantic")
    
    # Show example request
    example_request = ChatRequest(
        messages=[
            Message(role=MessageRole.SYSTEM, content="You are a helpful assistant."),
            Message(role=MessageRole.USER, content="How do I create an S3 bucket?"),
        ],
        stream=True,
        temperature=0.7,
        max_tokens=1024,
    )
    
    print("\nüìã Example ChatRequest:")
    print(example_request.model_dump_json(indent=2))

In [None]:
# API Endpoint Documentation

api_endpoints = [
    {
        "method": "POST",
        "path": "/v1/chat/completions",
        "description": "Send messages and get an AI response (OpenAI-style path and request)",
        "request": "ChatRequest",
        "response": "ChatResponse or SSE stream",
        "safety": "Input validated by guardrails before processing",
    },
    {
        "method": "POST",
        "path": "/v1/embeddings",
        "description": "Generate embeddings for text",
        "request": "{input: str | List[str]}",
        "response": "{embeddings: List[List[float]]}",
        "safety": "N/A",
    },
    {
        "method": "POST",
        "path": "/v1/documents",
        "description": "Upload documents to knowledge base",
        "request": "{files: List[File], collection: str}",
        "response": "{document_ids: List[str], chunks_created: int}",
        "safety": "Files scanned for malicious content",
    },
    {
        "method": "POST",
        "path": "/v1/search",
        "description": "Search the knowledge base",
        "request": "{query: str, top_k: int, filters: dict}",
        "response": "{results: List[SearchResult]}",
        "safety": "N/A",
    },
    {
        "method": "GET",
        "path": "/health",
        "description": "Health check endpoint",
        "request": "None",
        "response": "HealthResponse",
        "safety": "Reports guardrails status",
    },
]

print("\nüîå API ENDPOINTS")
print("="*70)

for endpoint in api_endpoints:
    print(f"\n{endpoint['method']} {endpoint['path']}")
    print(f"  üìù {endpoint['description']}")
    print(f"  ‚û°Ô∏è Request: {endpoint['request']}")
    print(f"  ‚¨ÖÔ∏è Response: {endpoint['response']}")
    print(f"  üõ°Ô∏è Safety: {endpoint['safety']}")

---

## Part 5: Safety Planning 🛡️

Every capstone project must include safety considerations. Let's plan yours.
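The Option A example below lists per-user rate limiting as one input-validation layer. A common way to implement it is a token bucket. Each user gets a small burst allowance that refills at a steady rate.

The sketch below is an in-memory, single-process version. The class names and default limits are hypothetical: a 5-request burst, then one request every 2 seconds sustained.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full so a new user can send a short burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request spends one token
            return True
        return False


class PerUserRateLimiter:
    """One in-memory token bucket per user ID (single process only)."""

    def __init__(self, capacity: float = 5, rate: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can use a fake clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        if user_id not in self.buckets:
            self.buckets[user_id] = TokenBucket(self.capacity, self.rate, now)
        return self.buckets[user_id].allow(now)


# Demo with a fake clock: a burst of 4 requests against a capacity of 3.
fake_now = [0.0]
limiter = PerUserRateLimiter(capacity=3, rate=1.0, clock=lambda: fake_now[0])
burst = [limiter.allow("alice") for _ in range(4)]
fake_now[0] = 1.0                        # one second later: one token refilled
after_refill = limiter.allow("alice")
other_user = limiter.allow("bob")        # separate bucket, unaffected by alice
print(burst, after_refill, other_user)   # [True, True, True, False] True True
```

Rejected requests would usually map to HTTP 429 (Too Many Requests). If you run several API workers, the buckets need to live in a shared store rather than a Python dict.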

In [None]:
# Safety Planning Template

@dataclass
class SafetyPlan:
    """Safety plan for your capstone project."""
    
    project_option: str
    input_validation: List[str]      # How you validate inputs
    output_filtering: List[str]      # How you filter outputs
    guardrails_used: List[str]       # What guardrails you're using
    human_oversight: List[str]       # Where humans are in the loop
    evaluation_plan: List[str]       # How you'll evaluate safety
    risk_mitigations: Dict[str, str] # Risk -> Mitigation
    
    def display(self):
        print(f"\nüõ°Ô∏è SAFETY PLAN - Option {self.project_option}")
        print("="*70)
        
        print("\nüì• Input Validation:")
        for item in self.input_validation:
            print(f"   ‚Ä¢ {item}")
        
        print("\nüì§ Output Filtering:")
        for item in self.output_filtering:
            print(f"   ‚Ä¢ {item}")
        
        print("\nüöß Guardrails:")
        for item in self.guardrails_used:
            print(f"   ‚Ä¢ {item}")
        
        print("\nüë§ Human Oversight:")
        for item in self.human_oversight:
            print(f"   ‚Ä¢ {item}")
        
        print("\nüìä Safety Evaluation:")
        for item in self.evaluation_plan:
            print(f"   ‚Ä¢ {item}")
        
        print("\n‚ö†Ô∏è Risk Mitigations:")
        for risk, mitigation in self.risk_mitigations.items():
            print(f"   ‚Ä¢ {risk}")
            print(f"     ‚Üí {mitigation}")

# Example safety plan for Option A
option_a_safety = SafetyPlan(
    project_option="A",
    input_validation=[
        "Check input length (max 4096 tokens)",
        "Run through Llama Guard for harmful content",
        "Detect and reject prompt injection attempts",
        "Rate limiting per user",
    ],
    output_filtering=[
        "Run all outputs through NeMo Guardrails",
        "Filter PII from responses",
        "Check for hallucinated commands (AWS)",
        "Validate code snippets before returning",
    ],
    guardrails_used=[
        "NeMo Guardrails with custom rails",
        "Llama Guard 3 for content classification",
        "Custom AWS command validator",
    ],
    human_oversight=[
        "Destructive AWS commands require confirmation",
        "Production deployment changes need approval",
        "Logging all interactions for review",
    ],
    evaluation_plan=[
        "Test with red team prompts (PromptFoo)",
        "Run harmful content benchmark",
        "Measure guardrail false positive rate",
        "User study for edge cases",
    ],
    risk_mitigations={
        "Harmful output": "Multi-layer filtering: Llama Guard + NeMo + custom rules",
        "Prompt injection": "Input sanitization + jailbreak detection",
        "Dangerous commands": "Whitelist safe commands, require confirmation for others",
        "Data leakage": "PII detection and masking in all outputs",
    }
)

option_a_safety.display()
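The Option A plan calls for a custom AWS command validator. Safe commands should be allowlisted, and destructive ones should require confirmation. A minimal sketch of such a validator follows, under these assumptions:

- The allowlist is a hypothetical starting set of read-only operations.
- Commands containing shell operators are rejected outright, since they could chain extra commands onto a legitimate call.
- Everything else is routed to a human for confirmation.
- No global options such as `--region` appear between `aws` and the service name.

```python
import shlex

# Hypothetical allowlist of read-only (service, operation) pairs that may run
# without confirmation. Extend it for your own project.
SAFE_OPERATIONS = {
    ("s3", "ls"),
    ("ec2", "describe-instances"),
    ("sts", "get-caller-identity"),
}

# Shell metacharacters that could chain extra commands onto an AWS call.
SHELL_OPERATORS = (";", "|", "&", "`", "$(", ">", "<")


def check_aws_command(command: str) -> str:
    """Return "allow", "confirm", or "reject" for a model-suggested AWS CLI command."""
    if any(op in command for op in SHELL_OPERATORS):
        return "reject"  # possible command chaining / injection
    try:
        parts = shlex.split(command)
    except ValueError:
        return "reject"  # unparseable, e.g. unbalanced quotes
    # Simplification: assumes no global options between "aws" and the service.
    if len(parts) < 3 or parts[0] != "aws":
        return "reject"
    if (parts[1], parts[2]) in SAFE_OPERATIONS:
        return "allow"
    # Everything else (including delete/terminate operations) needs a human.
    return "confirm"


for cmd in [
    "aws s3 ls",
    "aws ec2 terminate-instances --instance-ids i-0123456789abcdef0",
    "aws s3 ls; rm -rf /",
    "curl http://example.com",
]:
    print(f"{check_aws_command(cmd):8} {cmd}")
```

Defaulting to `confirm` rather than `allow` keeps the failure mode conservative. A missing allowlist entry costs the user a click, not a deleted resource.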

---

## Part 6: Evaluation Planning

How will you know if your project is successful? Define metrics now.

In [None]:
# Evaluation Plan Template

@dataclass
class EvaluationMetric:
    """An evaluation metric definition."""
    name: str
    description: str
    target: str
    measurement: str
    frequency: str

@dataclass
class EvaluationPlan:
    """Complete evaluation plan."""
    project_name: str
    metrics: List[EvaluationMetric]
    datasets: List[Dict[str, str]]
    baselines: List[str]
    safety_metrics: List[EvaluationMetric]  # 🛡️
    
    def display(self):
        print(f"\n📊 EVALUATION PLAN: {self.project_name}")
        print("="*70)
        
        print("\n📏 Performance Metrics:")
        for m in self.metrics:
            print(f"\n  📌 {m.name}")
            print(f"     {m.description}")
            print(f"     Target: {m.target}")
            print(f"     Measured by: {m.measurement}")
            print(f"     Frequency: {m.frequency}")
        
        print("\n🛡️ Safety Metrics:")
        for m in self.safety_metrics:
            print(f"\n  📌 {m.name}")
            print(f"     {m.description}")
            print(f"     Target: {m.target}")
        
        print("\n📚 Evaluation Datasets:")
        for ds in self.datasets:
            print(f"  • {ds['name']}: {ds['size']} samples from {ds['source']}")
        
        print("\n🎯 Baselines for Comparison:")
        for b in self.baselines:
            print(f"  • {b}")

# Example evaluation plan for Option A
option_a_eval = EvaluationPlan(
    project_name="AWS AI Assistant",
    metrics=[
        EvaluationMetric(
            name="Answer Accuracy",
            description="Correct answers on AWS-specific test set",
            target="≥ 80%",
            measurement="Human eval + automated checks",
            frequency="Weekly + final"
        ),
        EvaluationMetric(
            name="Retrieval Recall@5",
            description="Relevant docs in top 5 retrieved",
            target="≥ 90%",
            measurement="Test with known-answer queries",
            frequency="After RAG changes"
        ),
        EvaluationMetric(
            name="Response Latency (P95)",
            description="95th percentile response time",
            target="< 3 seconds",
            measurement="API load testing",
            frequency="After optimization"
        ),
        EvaluationMetric(
            name="User Satisfaction",
            description="Helpfulness rating",
            target="≥ 4.0/5.0",
            measurement="Demo session feedback",
            frequency="Demo sessions"
        ),
    ],
    safety_metrics=[
        EvaluationMetric(
            name="Harmful Output Rate",
            description="% of outputs flagged as harmful",
            target="< 0.1%",
            measurement="Red team prompts + Llama Guard",
            frequency="Final evaluation"
        ),
        EvaluationMetric(
            name="Guardrail False Positive Rate",
            description="% of safe outputs incorrectly blocked",
            target="< 2%",
            measurement="Benign test set",
            frequency="After guardrail changes"
        ),
        EvaluationMetric(
            name="Jailbreak Success Rate",
            description="% of jailbreak attempts that succeed",
            target="< 1%",
            measurement="Jailbreak prompt suite",
            frequency="Final evaluation"
        ),
    ],
    datasets=[
        {"name": "AWS FAQ Test Set", "source": "Curated from AWS forums", "size": "100"},
        {"name": "CLI Command Dataset", "source": "AWS CLI docs", "size": "500"},
        {"name": "Safety Red Team Set", "source": "PromptFoo + custom", "size": "200"},
    ],
    baselines=[
        "Raw Llama 3.3 70B (no fine-tuning, no RAG)",
        "GPT-4 with AWS docs in context",
        "AWS official documentation search",
    ]
)

option_a_eval.display()
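Several metrics in this plan reduce to a few lines of arithmetic once you have logged results. The sketch below shows one way to compute Recall@5, P95 latency, and the guardrail false positive rate. It makes these assumptions:

- Recall@k is the share of a query's relevant documents found in the top k. For known-answer queries with a single relevant document, this equals the hit rate.
- P95 uses the nearest-rank method.
- The input lists are toy values, not real results.

```python
import math
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of a query's relevant doc IDs that appear in its top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def percentile_nearest_rank(values: Sequence[float], p: float) -> float:
    """Smallest sample with at least p% of all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def false_positive_rate(blocked: Sequence[bool]) -> float:
    """Share of known-benign samples that the guardrail blocked."""
    return sum(blocked) / len(blocked) if blocked else 0.0


# Toy inputs for illustration only, not measured results.
queries = [
    (["d1", "d7", "d3", "d9", "d2"], {"d3"}),  # relevant doc found at rank 3
    (["d4", "d5", "d6", "d8", "d0"], {"d2"}),  # relevant doc missed
]
mean_recall = sum(recall_at_k(r, rel) for r, rel in queries) / len(queries)
latencies_ms = [float(ms) for ms in range(100, 2100, 100)]  # 20 samples
p95_ms = percentile_nearest_rank(latencies_ms, 95)
fpr = false_positive_rate([False] * 49 + [True])  # 1 of 50 benign prompts blocked

print(mean_recall, p95_ms, fpr)  # 0.5 1900.0 0.02
```

Compare each value against the plan's targets: ≥ 90% recall, < 3 s P95, and < 2% false positives. P95 is only meaningful with enough samples. With 20 requests it is simply the second-slowest one.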

---

## ‚ö†Ô∏è Common Planning Mistakes

### Mistake 1: Vague Architecture
```python
# ❌ Too vague
components = ["data stuff", "model", "api"]

# ✅ Specific and actionable
components = [
    "DocumentParser: Extract text from PDFs using pypdf",
    "ChunkingService: 512-token chunks with 50-token overlap",
    "EmbeddingService: BGE-M3 for 1024-dim vectors",
    "VectorStore: FAISS-GPU IVF index",
]
```
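To make the `ChunkingService` line concrete: 512-token chunks with a 50-token overlap means a sliding window that advances 462 tokens at a time. The sketch below uses a hypothetical `chunk_tokens` helper. It assumes the text has already been tokenized, ideally with the same tokenizer as the embedding model.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], chunk_size: int = 512,
                 overlap: int = 50) -> List[List[int]]:
    """Split a token sequence into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # 462 new tokens per chunk with the defaults
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks


token_ids = list(range(1200))  # stand-in for real tokenizer output
chunks = chunk_tokens(token_ids)
print([len(c) for c in chunks])  # [512, 512, 276]
print(chunks[1][0])              # 462
```

Because of the overlap, any passage of up to 50 tokens that straddles a chunk boundary still appears intact in the next chunk.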

### Mistake 2: No Memory Planning
```python
# ❌ "I'll just load everything"
models = ["llama-70b-fp16", "llava-34b-fp16"]  # 140 + 68 = 208 GB > 128 GB! 💥

# ✅ Plan memory carefully
models = [
    ("llama-70b", "int4", 38),      # 38 GB
    ("bge-m3", "bf16", 1.2),        # 1.2 GB
    ("overhead", "-", 15),          # 15 GB for KV cache
]  # Total: 54.2 GB, leaving ample headroom within DGX Spark's 128 GB
```
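The numbers in the corrected plan come from simple arithmetic:

- Weight memory is roughly parameter count times bytes per parameter.
- The KV cache grows linearly with the number of cached tokens.

The sketch below assumes a Llama-3-70B-style configuration for the KV cache: 80 layers, 8 key/value heads, head dimension 128, and a bf16 cache. It treats 1 GB as 10^9 bytes. The 32K-token cache budget is a hypothetical choice.

The plain 4-bit estimate for a 70B model is 35 GB. The 38 GB figure above presumably also covers quantization scales and any layers kept at higher precision.

```python
# Bytes per parameter for common weight formats (ignores quantization metadata).
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

DGX_SPARK_MEMORY_GB = 128  # unified memory shared by the CPU and GPU


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight memory: parameters x bytes per parameter (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: float = 2.0) -> float:
    """KV cache: one key and one value vector per layer, KV head, and cached token.

    Defaults assume a Llama-3-70B-style config with a bf16 cache.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9


naive = weight_gb(70, "fp16") + weight_gb(34, "fp16")
planned = 38 + 1.2 + kv_cache_gb(32_768)  # hypothetical 32K-token cache budget

print(f"naive:   {naive:.1f} GB")    # naive:   208.0 GB
print(f"planned: {planned:.1f} GB")  # planned: 49.9 GB
print(f"fits in {DGX_SPARK_MEMORY_GB} GB: {planned <= DGX_SPARK_MEMORY_GB}")
```

Under these assumptions, a 32K-token cache needs about 10.7 GB, so the 15 GB reserved above is a plausible margin. Rerun the numbers with your own model's configuration and context length.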

### Mistake 3: Safety as Afterthought
```python
# ❌ "I'll add guardrails at the end"
week_6 = ["Add safety stuff", "Fix issues", "Demo"]

# ✅ Safety built in from the start
week_1 = ["Plan safety architecture"]
week_4 = ["Implement guardrails", "Safety testing"]
week_5 = ["Red team evaluation", "Fix safety gaps"]
```

---

## 🎉 Checkpoint

You've completed project planning! You should now have:

- ✅ System architecture with all components defined
- ✅ Build order based on dependencies
- ✅ Memory plan for DGX Spark
- ✅ API contracts (if applicable)
- ✅ Safety plan 🛡️
- ✅ Evaluation plan with metrics and datasets

---

## 🚀 Next Steps

1. **Document your architecture** in `docs/architecture.md`

2. **Complete your project proposal** using `templates/project-proposal.md`

3. **Open your project-specific guide:**
   - Option A: `lab-4.6.2-option-a-ai-assistant.ipynb`
   - Option B: `lab-4.6.3-option-b-document-intelligence.ipynb`
   - Option C: `lab-4.6.4-option-c-agent-swarm.ipynb`
   - Option D: `lab-4.6.5-option-d-training-pipeline.ipynb`

4. **Start building!**

---

## 📖 Further Reading

- [Designing Machine Learning Systems](https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/) by Chip Huyen
- [Building LLM Applications for Production](https://huyenchip.com/2023/04/11/llm-engineering.html) by Chip Huyen
- [NeMo Guardrails Documentation](https://github.com/NVIDIA/NeMo-Guardrails)

In [None]:
# 🧹 Cleanup
print("✅ No cleanup needed.")
print("\n📝 Next: Save your plans to docs/architecture.md, then open your project-specific guide and start implementing.")