# Lab 4.6.0: Capstone Project Kickoff

**Module:** 4.6 - Capstone Project (Domain 4: Production AI)
**Time:** 2-3 hours
**Difficulty:** ⭐⭐⭐⭐

---

## 🎉 Congratulations on Reaching the Capstone!

You've completed an incredible journey through the DGX Spark AI Curriculum. From understanding the fundamentals of neural networks to fine-tuning 70B parameter models, from building RAG systems to deploying production APIs with safety guardrails - you've acquired a remarkable set of skills.

**Now it's time to put it all together.**

This capstone is your chance to build something substantial - a portfolio piece that demonstrates your mastery of modern AI engineering on cutting-edge hardware.

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Understand the four capstone project options
- [ ] Evaluate which project best matches your interests and goals
- [ ] Verify your DGX Spark environment is ready
- [ ] Complete your project selection
- [ ] Create your project timeline

---

## 📚 Prerequisites

- Completed: All modules in Domains 1-4
- Knowledge of: LLM fine-tuning, RAG systems, agents, deployment, AI safety
- Access to: DGX Spark with 128GB unified memory

---

## 🌍 Real-World Context

The capstone project mirrors what AI engineers do in industry every day: identify a problem, design a solution, implement it with production-quality code, ensure it's safe, and evaluate its effectiveness.

Companies like OpenAI, Anthropic, Google, and Meta all follow similar processes when building AI products:

1. **Problem Definition** → What are we solving?
2. **Architecture Design** → How will we solve it?
3. **Implementation** → Build the solution
4. **Safety Evaluation** → Is it safe to deploy? 🛡️
5. **Optimization** → Make it fast and efficient
6. **Documentation** → Can others use and extend it?

Your capstone follows this exact pattern, preparing you for real-world AI engineering roles.

### What Makes This Special: DGX Spark

You have access to hardware that enables things impossible on consumer GPUs:

| Capability | Consumer GPU (24GB) | DGX Spark (128GB) | Advantage |
|------------|--------------------|--------------------|----------|
| Max model size (FP16) | ~12B | **~55B** | 4.5x larger |
| Max model (INT4) | ~24B | **~120B** | 5x larger |
| Fine-tune (QLoRA) | ~13B | **~100B** | 8x larger |
| NVFP4 (Blackwell) | ❌ | ✅ **~200B** | Exclusive! |

Your capstone should showcase what's possible with this unique hardware.
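
The sizes in the table follow from simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below is a back-of-the-envelope estimator, not a measurement. The function names and the 1.2 overhead factor (standing in for KV cache and activations) are assumptions made for illustration.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return params_billion * 1e9 * bytes_per_param / 1e9


def fits_in_memory(params_billion: float, bits_per_param: float,
                   memory_gb: float, overhead: float = 1.2) -> bool:
    """True if weights times an assumed overhead factor fit in memory_gb."""
    needed = estimate_weight_memory_gb(params_billion, bits_per_param) * overhead
    return needed <= memory_gb


if __name__ == "__main__":
    # (label, parameters in billions, bits per parameter)
    candidates = [("70B FP16", 70, 16), ("70B INT4", 70, 4), ("8B FP16", 8, 16)]
    for label, params, bits in candidates:
        weights = estimate_weight_memory_gb(params, bits)
        verdict = "fits" if fits_in_memory(params, bits, 128) else "does not fit"
        print(f"{label}: ~{weights:.0f} GB of weights, {verdict} in 128 GB")
```

Real usage depends on context length, batch size, and the runtime, so treat the result as a first filter when sizing your capstone model rather than a guarantee.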

---

## 🧒 ELI5: What is a Capstone Project?

> **Imagine you've been learning to cook for months.** You've mastered chopping vegetables, making sauces, baking bread, grilling meat, and plating dishes. Each skill was practiced in isolation.
>
> **Now, you're going to prepare a complete dinner party.** You need to:
> - Plan a menu that works together
> - Prep all the ingredients
> - Cook multiple dishes that complement each other
> - Time everything so it's ready together
> - Present it beautifully
> - Make sure nobody gets food poisoning! 🛡️
>
> **That's a capstone.** It's not about learning one new thing - it's about combining everything you've learned into one impressive, complete creation.
>
> **In AI terms:** You've learned fine-tuning, RAG, agents, deployment, safety, and more. Your capstone combines these into a complete, working AI system that solves a real problem - safely.

---

## Part 1: Environment Verification

Before choosing your project, let's verify your DGX Spark environment is properly configured. Your capstone will push the hardware to its limits!

In [None]:
# Capstone Environment Verification
# This cell verifies your DGX Spark is ready for capstone development

import sys
import os
from datetime import datetime

print("="*70)
print("🚀 DGX SPARK CAPSTONE ENVIRONMENT CHECK")
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*70)

# Python version
print(f"\n🐍 Python Version: {sys.version.split()[0]}")

# Packages to check: (import name, display name, required?)
packages_status = []
critical_packages = [
    ("torch", "PyTorch", True),
    ("transformers", "Transformers", True),
    ("peft", "PEFT (LoRA/QLoRA)", True),
    ("bitsandbytes", "BitsAndBytes (Quantization)", True),
    ("sentence_transformers", "Sentence Transformers", True),
    ("langchain", "LangChain", False),
    ("langgraph", "LangGraph", False),
    ("fastapi", "FastAPI", True),
    ("gradio", "Gradio", True),
    ("faiss", "FAISS (Vector Search)", False),
    ("nemoguardrails", "NeMo Guardrails", False),  # import name has no underscore
]

print("\n📦 Package Status:")
for pkg_name, display_name, critical in critical_packages:
    try:
        module = __import__(pkg_name.replace('-', '_'))
        version = getattr(module, '__version__', 'installed')
        print(f"  ✅ {display_name}: {version}")
        packages_status.append(True)
    except ImportError:
        icon = "❌" if critical else "⚠️"
        status = "REQUIRED" if critical else "recommended"
        print(f"  {icon} {display_name}: NOT INSTALLED ({status})")
        packages_status.append(False)

In [None]:
# GPU and Memory Check
import torch

print("\n🎮 GPU Status:")
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"  ✅ GPU: {gpu_name}")
    print(f"  ✅ GPU Memory: {gpu_memory:.1f} GB")
    
    # Check for Blackwell features
    compute_capability = torch.cuda.get_device_capability(0)
    cc_str = f"{compute_capability[0]}.{compute_capability[1]}"
    print(f"  ✅ Compute Capability: {cc_str}")
    
    # Blackwell-generation GPUs report compute capability 10.x or higher
    if compute_capability[0] >= 10:
        print("  🌟 Blackwell architecture detected! NVFP4 available.")
    
    # Memory allocation test
    print("\n💾 Memory Status:")
    print(f"  Current allocation: {torch.cuda.memory_allocated()/1e9:.2f} GB")
    print(f"  Current reserved: {torch.cuda.memory_reserved()/1e9:.2f} GB")
    print(f"  Available for models: ~{gpu_memory - 5:.1f} GB (with 5GB system reserve)")
else:
    print("  ❌ CUDA not available!")
    print("  Make sure you're running in an NGC container with GPU access.")
    print("  Make sure you're running in an NGC container with GPU access.")

# System memory (unified memory detection)
try:
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            if 'MemTotal' in line:
                mem_gb = int(line.split()[1]) / 1e6  # /proc/meminfo reports kB
                print(f"\n🖥️ System Memory: {mem_gb:.1f} GB")
                if mem_gb > 100:
                    print("  ✅ 128GB unified memory configuration detected!")
                    print("  ✅ No CPU↔GPU transfers needed - massive advantage!")
                break
except (OSError, ValueError, IndexError):
    # /proc/meminfo is missing or has an unexpected layout (e.g. non-Linux host)
    print("\n🖥️ Could not read system memory info")

In [None]:
# Disk Space and Cache Check
import shutil

print("\n💿 Disk Space:")
paths_to_check = [
    ("/workspace", "Workspace"),
    (os.path.expanduser("~/.cache/huggingface"), "HuggingFace Cache"),
]

for path, name in paths_to_check:
    if os.path.exists(path):
        total, used, free = shutil.disk_usage(path)
        status = "✅" if free/1e9 > 100 else "⚠️" if free/1e9 > 50 else "❌"
        print(f"  {name} ({path}):")
        print(f"    Total: {total/1e9:.1f} GB")
        print(f"    Used: {used/1e9:.1f} GB")
        print(f"    Free: {free/1e9:.1f} GB {status}")
    else:
        print(f"  ⚠️ {name}: Path not found")

# Capstone-specific capacity estimates
print("\n📊 Capstone Model Capacity (with your 128GB):")
print("  • Llama 3.3 70B (INT4): ~35GB → ✅ FITS with 93GB headroom")
print("  • Llama 3.3 70B (QLoRA training): ~50GB → ✅ FITS")
print("  • Qwen2.5 72B + Embedding model: ~40GB → ✅ FITS")
print("  • Multi-agent: 3× 8B models: ~15GB → ✅ FITS easily")

print("\n" + "="*70)
print("Environment check complete!")
print("="*70)

### 🔍 What Just Happened?

We verified:
1. **Python Environment** - All required packages are installed
2. **GPU Access** - Blackwell GPU is available with sufficient memory
3. **Unified Memory** - 128GB configuration is active
4. **Disk Space** - Enough space for models and data

**If any checks failed**, resolve them before proceeding. You'll need full access to DGX Spark capabilities for your capstone.

**Common fixes:**
- Missing packages: `pip install langchain langgraph faiss-gpu nemoguardrails`
- Low disk space: remove unused models from the Hugging Face cache with `huggingface-cli delete-cache`
- No GPU: Ensure you're in an NGC container started with `--gpus all`
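
To see at a glance which packages still need installing, a short check like the one below can help. It is a sketch only: `find_missing` is a hypothetical helper, and it takes import names, which sometimes differ from the pip package names.

```python
import importlib.util


def find_missing(import_names):
    """Return the top-level import names that cannot be located in this environment."""
    missing = []
    for name in import_names:
        # find_spec returns None when a top-level module is not installed;
        # unlike a real import, it does not execute the package
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    wanted = ["torch", "transformers", "peft", "fastapi", "gradio"]
    missing = find_missing(wanted)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Because `find_spec` only locates a module, this runs quickly, but it will not catch a package that is installed yet fails on import (for example, a CUDA mismatch). The full environment check above still covers that case.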

---

## Part 2: Project Options Overview

You have four project options, each emphasizing different skills. All are designed to showcase DGX Spark's unique capabilities and include safety considerations.

### Quick Comparison

| Option | Focus | Model Size | Key Skills | Safety Component |
|--------|-------|------------|------------|-----------------|
| **A** | AI Assistant | 70B | Fine-tuning, RAG, Tools | NeMo Guardrails |
| **B** | Document Intelligence | 34B VLM | Vision, OCR, Extraction | Content filtering |
| **C** | Agent Swarm | Multi-model | Agents, Planning, Coordination | Human-in-the-loop |
| **D** | Training Pipeline | 70B | SFT, DPO, MLOps | Red teaming eval |

In [None]:
# Detailed Project Options

project_options = {
    "A": {
        "name": "Domain-Specific AI Assistant",
        "tagline": "Build a complete AI assistant specialized for a domain of your choice",
        "components": [
            "Fine-tuned LLM (70B with QLoRA)",
            "RAG with domain knowledge base",
            "Custom tools and API integrations",
            "NeMo Guardrails for safety",
            "FastAPI with streaming",
            "Gradio demo interface",
        ],
        "dgx_advantage": "70B models fit entirely in memory - no offloading needed",
        "example_domains": ["DevOps/AWS", "Financial Analysis", "Code Review", "Medical Literature", "Legal Documents"],
        "best_for": "Those interested in conversational AI, LLM customization, and practical applications",
        "skills_used": ["Module 3.1 (Fine-tuning)", "Module 3.5 (RAG)", "Module 3.6 (Agents)", "Module 4.2 (Safety)"],
        "hours": "35-45",
    },
    "B": {
        "name": "Multimodal Document Intelligence",
        "tagline": "Build a system that processes and understands complex documents",
        "components": [
            "Document ingestion (PDF, images, diagrams)",
            "Vision-Language Model (LLaVA/Qwen-VL)",
            "Structured information extraction",
            "Multimodal RAG",
            "Export to JSON/CSV",
            "Interactive demo",
        ],
        "dgx_advantage": "34B VLMs with high-res image processing fit easily",
        "example_domains": ["Invoice Processing", "Research Paper Analysis", "Technical Manual QA", "Contract Review"],
        "best_for": "Those interested in computer vision, document processing, and multimodal AI",
        "skills_used": ["Module 2.2 (Vision)", "Module 4.1 (Multimodal)", "Module 3.5 (RAG)"],
        "hours": "35-45",
    },
    "C": {
        "name": "AI Agent Swarm with Safety",
        "tagline": "Build a multi-agent system where specialized agents collaborate safely",
        "components": [
            "4+ specialized agents",
            "Central coordinator/orchestrator",
            "Tool registry and execution",
            "Shared + individual memory",
            "Human-in-the-loop approval",
            "Safety guardrails on actions",
        ],
        "dgx_advantage": "Multiple smaller models can run concurrently in memory",
        "example_domains": ["Research Team", "Software Dev Team", "Data Analysis Pipeline", "Content Creation"],
        "best_for": "Those interested in agentic AI, planning systems, and complex automation",
        "skills_used": ["Module 3.6 (Agents)", "Module 3.4 (TTC)", "Module 4.2 (Safety)"],
        "hours": "35-45",
    },
    "D": {
        "name": "Custom Training Pipeline",
        "tagline": "Build infrastructure for continuous model improvement",
        "components": [
            "Data collection and curation",
            "SFT + DPO/ORPO training",
            "Automated evaluation",
            "Model versioning (MLflow)",
            "A/B testing framework",
            "Red teaming evaluation",
        ],
        "dgx_advantage": "Full fine-tuning of 16B models possible, QLoRA for 100B+",
        "example_domains": ["Domain Adaptation", "Preference Learning", "Distillation Pipeline", "Continual Learning"],
        "best_for": "Those interested in MLOps, training infrastructure, and model development",
        "skills_used": ["Module 3.1 (Fine-tuning)", "Module 4.3 (MLOps)", "Module 3.2 (Quantization)"],
        "hours": "35-45",
    },
}

# Display options
for key, option in project_options.items():
    print(f"\n{'='*70}")
    print(f"📌 OPTION {key}: {option['name']}")
    print(f"{'='*70}")
    print(f"\n\"{option['tagline']}\"")
    print(f"\n⏱️ Estimated time: {option['hours']} hours")
    print(f"\n🏗️ Components you'll build:")
    for comp in option['components']:
        print(f"   • {comp}")
    print(f"\n🚀 DGX Spark Advantage: {option['dgx_advantage']}")
    print(f"\n💡 Example domains: {', '.join(option['example_domains'])}")
    print(f"\n👤 Best for: {option['best_for']}")

---

## Part 3: Project Selection Decision Helper

Use this interactive tool to find your best project match based on your interests.

In [None]:
# Project Selection Helper

def project_selector():
    """
    Interactive project selection based on interests.
    Run this cell and answer the prompts!
    """
    
    print("🎯 CAPSTONE PROJECT SELECTOR")
    print("="*60)
    print("Rate your interest in each area (1-5)")
    print("1 = Not interested, 5 = Very interested\n")
    
    questions = {
        "A": [
            "Building chatbots and conversational AI",
            "Fine-tuning LLMs for specific domains",
            "Building RAG systems with knowledge bases",
            "Creating practical, deployable AI services",
        ],
        "B": [
            "Working with images and visual data",
            "Processing PDFs and documents",
            "Extracting structured data from unstructured sources",
            "Combining vision and language models",
        ],
        "C": [
            "Building autonomous AI agents",
            "Multi-step planning and reasoning",
            "Tool use and function calling",
            "Coordinating multiple AI systems safely",
        ],
        "D": [
            "Training and fine-tuning workflows",
            "Building ML infrastructure and pipelines",
            "Model evaluation and benchmarking",
            "Experiment tracking and versioning",
        ],
    }
    
    scores = {"A": 0, "B": 0, "C": 0, "D": 0}
    
    # Flatten and shuffle questions
    all_questions = []
    for option, q_list in questions.items():
        for q in q_list:
            all_questions.append((option, q))
    
    import random
    random.seed(42)  # Reproducible order
    random.shuffle(all_questions)
    
    for i, (option, question) in enumerate(all_questions, 1):
        while True:
            try:
                response = input(f"{i}. {question}: ")
                score = int(response)
                if 1 <= score <= 5:
                    scores[option] += score
                    break
                print("   Please enter 1-5")
            except ValueError:
                print("   Please enter a number 1-5")
            except KeyboardInterrupt:
                print("\n\nSelection cancelled.")
                return
    
    # Results
    print("\n" + "="*60)
    print("📊 YOUR RESULTS")
    print("="*60)
    
    option_names = {
        "A": "Domain-Specific AI Assistant",
        "B": "Multimodal Document Intelligence",
        "C": "AI Agent Swarm",
        "D": "Custom Training Pipeline",
    }
    
    max_possible = 20  # 4 questions × 5 max
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    
    medals = ["🥇", "🥈", "🥉", "  "]
    
    for rank, (option, score) in enumerate(sorted_scores):
        pct = (score / max_possible) * 100
        bar = "█" * int(pct / 5) + "░" * (20 - int(pct / 5))
        print(f"\n{medals[rank]} Option {option}: {option_names[option]}")
        print(f"   {bar} {pct:.0f}% ({score}/{max_possible})")
    
    winner = sorted_scores[0][0]
    print("\n" + "="*60)
    print(f"🎯 RECOMMENDED: Option {winner} - {option_names[winner]}")
    print("="*60)
    
    return winner

# Uncomment to run interactively:
# recommended = project_selector()

print("💡 To use the interactive selector, uncomment the last line and run this cell.")
print("   Or choose directly using the quick selection below.")

In [None]:
# Quick Project Selection - Set your choice here!

# ═══════════════════════════════════════════════════════════════════
# 🎯 SELECT YOUR PROJECT OPTION (Uncomment ONE line)
# ═══════════════════════════════════════════════════════════════════

# SELECTED_PROJECT = "A"  # Domain-Specific AI Assistant
# SELECTED_PROJECT = "B"  # Multimodal Document Intelligence
# SELECTED_PROJECT = "C"  # AI Agent Swarm
# SELECTED_PROJECT = "D"  # Custom Training Pipeline

# ═══════════════════════════════════════════════════════════════════

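# Each option's guide has its own notebook number, so look it up explicitly
project_guides = {
    "A": "lab-4.6.2-option-a-ai-assistant.ipynb",
    "B": "lab-4.6.3-option-b-document-intelligence.ipynb",
    "C": "lab-4.6.4-option-c-agent-swarm.ipynb",
    "D": "lab-4.6.5-option-d-training-pipeline.ipynb",
}

try:
    project = project_options[SELECTED_PROJECT]
    print(f"\n✅ You've selected: Option {SELECTED_PROJECT} - {project['name']}")
    print(f"\n📓 Your project guide: {project_guides[SELECTED_PROJECT]}")
except NameError:
    print("⚠️ No project selected yet!")
    print("\nUncomment one of the SELECTED_PROJECT lines above and run this cell again.")
    print("\nOptions:")
    print("  A: Domain-Specific AI Assistant (Fine-tuning + RAG + Tools)")
    print("  B: Multimodal Document Intelligence (Vision + OCR + Extraction)")
    print("  C: AI Agent Swarm (Multi-agent + Planning + Safety)")
    print("  D: Custom Training Pipeline (SFT + DPO + MLOps)")
except KeyError:
    print(f"❌ Invalid option: {SELECTED_PROJECT!r}. Choose A, B, C, or D.")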

---

## Part 4: Skills Mapping

Each project builds on skills from previous modules. Let's see which skills you'll apply.

In [None]:
# Skills Matrix

skills_matrix = {
    "A": {
        "required": [
            ("Module 3.1: LLM Fine-tuning", "QLoRA for 70B models"),
            ("Module 3.5: RAG Systems", "Vector databases, retrieval"),
            ("Module 3.3: Deployment", "FastAPI, streaming"),
            ("Module 4.2: AI Safety", "NeMo Guardrails"),
        ],
        "helpful": [
            "Module 3.2: Quantization",
            "Module 2.5: HuggingFace",
            "Module 4.5: Demo Building",
        ]
    },
    "B": {
        "required": [
            ("Module 2.2: Computer Vision", "Image processing"),
            ("Module 4.1: Multimodal", "Vision-Language models"),
            ("Module 3.5: RAG Systems", "Multimodal retrieval"),
        ],
        "helpful": [
            "Module 2.3: NLP & Transformers",
            "Module 3.3: Deployment",
            "Module 4.5: Demo Building",
        ]
    },
    "C": {
        "required": [
            ("Module 3.6: AI Agents", "Agent frameworks, tools"),
            ("Module 3.4: Test-Time Compute", "Reasoning chains"),
            ("Module 4.2: AI Safety", "Human-in-the-loop, guardrails"),
        ],
        "helpful": [
            "Module 3.1: LLM Fine-tuning",
            "Module 3.5: RAG Systems",
            "Module 4.1: Multimodal",
        ]
    },
    "D": {
        "required": [
            ("Module 3.1: LLM Fine-tuning", "SFT, DPO, ORPO"),
            ("Module 4.3: MLOps", "Experiment tracking, versioning"),
            ("Module 2.5: HuggingFace", "Trainer API, datasets"),
        ],
        "helpful": [
            "Module 3.2: Quantization",
            "Module 3.3: Deployment",
            "Module 4.2: AI Safety",
        ]
    },
}

def show_skills(option):
    """Display required and helpful skills for a project."""
    skills = skills_matrix[option]
    name = project_options[option]['name']
    
    print(f"\n📚 SKILLS FOR OPTION {option}: {name}")
    print("="*60)
    
    print("\n✅ Required Skills (must be comfortable with these):")
    for module, skill in skills["required"]:
        print(f"   • {module}")
        print(f"     Key: {skill}")
    
    print("\n📘 Helpful Background (nice to have):")
    for module in skills["helpful"]:
        print(f"   • {module}")

# Show all options
for opt in ["A", "B", "C", "D"]:
    show_skills(opt)

---

## Part 5: Timeline Planning

Your capstone spans 6 weeks. Here's how to structure your time effectively.

In [None]:
# Timeline Generator
from datetime import datetime, timedelta

def generate_timeline(start_date=None):
    """Generate a 6-week capstone timeline."""
    
    if start_date:
        start = datetime.strptime(start_date, "%Y-%m-%d")
    else:
        start = datetime.now()
    
    weeks = [
        {
            "week": 1,
            "name": "Planning & Setup",
            "hours": "6-8",
            "tasks": [
                "Complete project proposal (use template)",
                "Design system architecture",
                "Set up development environment",
                "Create git repository with structure",
                "Identify and download base models",
            ],
            "deliverable": "Approved proposal + Architecture diagram"
        },
        {
            "week": 2,
            "name": "Foundation (Part 1)",
            "hours": "8-10",
            "tasks": [
                "Implement core component #1",
                "Set up data pipeline",
                "Create initial tests",
                "Document as you build",
            ],
            "deliverable": "Working prototype of primary component"
        },
        {
            "week": 3,
            "name": "Foundation (Part 2)",
            "hours": "8-10",
            "tasks": [
                "Implement core component #2",
                "Model training/fine-tuning",
                "Basic integration tests",
                "Establish performance baseline",
            ],
            "deliverable": "All core components working independently"
        },
        {
            "week": 4,
            "name": "Integration",
            "hours": "8-10",
            "tasks": [
                "Connect all components end-to-end",
                "Build API layer",
                "Add safety guardrails 🛡️",
                "End-to-end testing",
            ],
            "deliverable": "Complete integrated system with safety"
        },
        {
            "week": 5,
            "name": "Optimization & Evaluation",
            "hours": "6-8",
            "tasks": [
                "Performance profiling",
                "Memory optimization",
                "Run evaluation suite",
                "Red teaming / safety testing 🛡️",
            ],
            "deliverable": "Optimized system with benchmark results"
        },
        {
            "week": 6,
            "name": "Documentation & Demo",
            "hours": "6-8",
            "tasks": [
                "Complete technical report",
                "Create model card with safety info",
                "Build Gradio demo",
                "Record demo video (5-10 min)",
                "Final code cleanup",
            ],
            "deliverable": "All deliverables complete!"
        },
    ]
    
    print("\n📅 YOUR CAPSTONE TIMELINE")
    print("="*70)
    
    total_hours = 0
    for week_info in weeks:
        week_start = start + timedelta(weeks=week_info["week"]-1)
        week_end = week_start + timedelta(days=6)
        
        print(f"\n📌 Week {week_info['week']}: {week_info['name']}")
        print(f"   {week_start.strftime('%b %d')} - {week_end.strftime('%b %d')}")
        print(f"   ⏱️ Estimated: {week_info['hours']} hours")
        
        print("\n   Tasks:")
        for task in week_info["tasks"]:
            print(f"   [ ] {task}")
        
        print(f"\n   📦 Deliverable: {week_info['deliverable']}")
        
        # Add the midpoint of the hours range (e.g. "6-8" -> 7) to the total
        hours_range = week_info["hours"].split("-")
        total_hours += (int(hours_range[0]) + int(hours_range[1])) / 2
    
    # Last day of week 6, so the target matches the final week's end date
    final_date = start + timedelta(weeks=6, days=-1)
    print("\n" + "="*70)
    print(f"🎯 Target Completion: {final_date.strftime('%B %d, %Y')}")
    print(f"⏱️ Total Estimated: {total_hours:.0f} hours")
    print("="*70)

# Generate timeline starting today
generate_timeline()

---

## Part 6: Create Your Project Structure

Let's create a well-organized project folder.

In [None]:
# Project Structure Creator
from pathlib import Path

def create_project(project_name: str, option: str, base_path: str = "/workspace"):
    """
    Create a complete project structure for your capstone.
    
    Args:
        project_name: Name of your project (e.g., "aws-assistant")
        option: Project option (A, B, C, or D)
        base_path: Where to create the project
    """
    
    structures = {
        "A": [  # AI Assistant
            "src/models", "src/rag", "src/tools", "src/api", "src/safety",
            "data/raw", "data/processed", "data/knowledge_base",
            "training/configs", "training/outputs",
            "evaluation/benchmarks", "evaluation/results",
            "notebooks", "tests", "docs", "demo",
        ],
        "B": [  # Document Intelligence
            "src/ingestion", "src/vision", "src/extraction", "src/qa", "src/export",
            "data/documents", "data/processed", "data/outputs",
            "models", "evaluation", "notebooks", "tests", "docs", "demo",
        ],
        "C": [  # Agent Swarm
            "src/agents", "src/coordinator", "src/tools", "src/memory", "src/safety",
            "workflows", "evaluation", "notebooks", "tests", "docs", "demo",
        ],
        "D": [  # Training Pipeline
            "src/data", "src/training", "src/evaluation", "src/serving",
            "configs", "data/raw", "data/processed",
            "experiments", "models/checkpoints", "models/exported",
            "notebooks", "tests", "docs",
        ],
    }
    
    # Normalize once so lowercase input ("a") works for every lookup below
    option = option.upper()
    dirs = structures.get(option)
    if not dirs:
        print(f"❌ Invalid option: {option}")
        return
    
    project_path = Path(base_path) / project_name
    
    print(f"\n🏗️ Creating project: {project_name}")
    print(f"   Option: {option}")
    print(f"   Location: {project_path}")
    print("="*60)
    
    # Create directories
    for dir_path in dirs:
        full_path = project_path / dir_path
        full_path.mkdir(parents=True, exist_ok=True)
        (full_path / ".gitkeep").touch()
        print(f"  📁 {dir_path}/")
    
    # Create common files
    files = {
        "README.md": f"""# {project_name}

Capstone Project - Option {option}: {project_options[option]['name']}

## Overview

[Describe your project here]

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Run the demo
python demo/app.py
```

## Project Structure

```
{project_name}/
├── src/          # Source code
├── data/         # Data files
├── notebooks/    # Jupyter notebooks
├── tests/        # Test files
├── docs/         # Documentation
└── demo/         # Demo application
```

## DGX Spark Optimization

This project is optimized for DGX Spark with 128GB unified memory.

## License

MIT
""",
        "requirements.txt": """# Core
torch>=2.5.0
transformers>=4.46.0
accelerate>=1.0.0

# Fine-tuning
peft>=0.13.0
bitsandbytes>=0.44.0
trl>=0.12.0

# RAG
sentence-transformers>=3.0.0
faiss-gpu>=1.7.0
chromadb>=0.5.0

# API & Demo
fastapi>=0.115.0
uvicorn>=0.32.0
gradio>=5.0.0

# Safety
nemoguardrails>=0.10.0

# Utils
python-dotenv>=1.0.0
pydantic>=2.0.0
tqdm>=4.66.0
""",
        ".gitignore": """# Python
__pycache__/
*.pyc
*.pyo
.ipynb_checkpoints/

# Environment
.env
*.env
venv/

# Data (don't commit large files)
data/raw/*
!data/raw/.gitkeep
*.parquet
*.csv

# Models (never commit model weights)
*.bin
*.safetensors
*.gguf
models/checkpoints/*
!models/checkpoints/.gitkeep

# Logs
*.log
logs/
wandb/
mlruns/

# IDE
.vscode/
.idea/
*.swp
""",
    }
    
    for filename, content in files.items():
        file_path = project_path / filename
        file_path.write_text(content)
        print(f"  📄 {filename}")
    
    # Create __init__.py files
    for dir_path in dirs:
        if dir_path.startswith("src/"):
            init_path = project_path / dir_path / "__init__.py"
            init_path.touch()
    
    print("\n" + "="*60)
    print(f"✅ Project created at: {project_path}")
    print("\nNext steps:")
    print(f"  1. cd {project_path}")
    print("  2. git init")
    print("  3. Open the project proposal template")
    print("  4. Start building!")
    
    return project_path

# Example - uncomment to create your project:
# create_project("my-aws-assistant", "A")

print("💡 Uncomment the create_project() call above to create your project structure!")
print("   Example: create_project('my-aws-assistant', 'A')")

---

## ⚠️ Common Mistakes to Avoid

### Mistake 1: Scope Creep
```python
# ❌ Too ambitious
project_goals = [
    "Fine-tune 70B model",
    "Build RAG with 1M documents",
    "Add multimodal support",
    "Create mobile app",
    "Deploy to Kubernetes",
    "Build training pipeline",
]

# ✅ Focused and achievable
project_goals = [
    "Fine-tune 70B model for AWS CLI help",
    "Build RAG with 1000 AWS doc pages",
    "Create FastAPI endpoint with streaming",
]
stretch_goals = ["Gradio UI", "Guardrails"]
```
**Why:** A complete, polished project is better than an ambitious, unfinished one.

---

### Mistake 2: Waiting Until Week 5 to Test
```python
# ❌ Testing at the end
week_5_plan = ["Integration testing", "Fix all bugs", "Add safety"]

# ✅ Test continuously
every_week = [
    "Unit tests for new code",
    "Integration check",
    "Quick safety audit",
]
```
**Why:** A bug caught in the week it was introduced is far cheaper to fix than one discovered during final integration.
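
As a sketch of what "unit tests for new code" can look like from week 2 onward, the example below tests a small text-chunking helper of the kind a RAG pipeline might use. Both `chunk_text` and its parameters are hypothetical and exist only for illustration. The tests use plain `assert` statements, so they run under pytest or when called directly.

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size characters, with optional overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def test_chunks_cover_text_without_overlap():
    chunks = chunk_text("abcdefghij", chunk_size=4)
    assert chunks == ["abcd", "efgh", "ij"]
    assert "".join(chunks) == "abcdefghij"


def test_overlap_repeats_trailing_characters():
    chunks = chunk_text("abcdef", chunk_size=4, overlap=2)
    assert chunks == ["abcd", "cdef", "ef"]


def test_invalid_overlap_is_rejected():
    try:
        chunk_text("abc", chunk_size=2, overlap=2)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for overlap >= chunk_size")
```

Writing a few tests like these alongside each new component keeps the weekly integration check short, because failures point at one small function rather than the whole system.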

---

### Mistake 3: "I'll Document Later"
```python
# ❌ Undocumented code
def proc(q, ctx, opts):
    ...

# ✅ Documented as you write
def process_query(
    query: str,
    context: list[Document],
    options: ProcessingOptions
) -> QueryResult:
    """
    Process a user query using RAG.
    
    Args:
        query: User's natural language question
        context: Retrieved documents for context
        options: Processing configuration
        
    Returns:
        QueryResult with answer and sources
    """
```
**Why:** You WILL forget why you did things. Future you will thank present you.

---

## 🎉 Checkpoint

You've completed the capstone kickoff! You should now have:

- ✅ Verified your DGX Spark environment is ready
- ✅ Understood all four project options
- ✅ Selected your project (or know which one you're leaning toward)
- ✅ Understood the 6-week timeline
- ✅ (Optional) Created your project structure

---

## 🚀 Next Steps

1. **Complete your project proposal** using `templates/project-proposal.md`

2. **Open the planning notebook:** `lab-4.6.1-project-planning.ipynb`

3. **Then open your project-specific guide:**
   - Option A: `lab-4.6.2-option-a-ai-assistant.ipynb`
   - Option B: `lab-4.6.3-option-b-document-intelligence.ipynb`
   - Option C: `lab-4.6.4-option-c-agent-swarm.ipynb`
   - Option D: `lab-4.6.5-option-d-training-pipeline.ipynb`

4. **Don't forget the shared notebooks:**
   - `lab-4.6.6-evaluation-framework.ipynb` - How to evaluate your project
   - `lab-4.6.7-documentation-guide.ipynb` - How to document your work

---

## 📖 Resources

- [DGX Spark Playbooks](https://build.nvidia.com/spark) - Official NVIDIA examples
- [Hugging Face Hub](https://huggingface.co/) - Models and datasets
- [Papers With Code](https://paperswithcode.com/) - Research and benchmarks
- [LangChain Documentation](https://python.langchain.com/) - Agent frameworks
- [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) - AI Safety

In [None]:
# 🧹 Cleanup
print("✅ No cleanup needed - ready to proceed!")
print("\n🎯 Next: Open lab-4.6.1-project-planning.ipynb to design your architecture.")