# Day 2, Session 5 Lab: Deploy Invoice Agent as Production API

## 🎯 Learning Objectives

By the end of this 45-minute hands-on lab, you will:

1. **Transform LangGraph agents into production APIs** using FastAPI
2. **Implement real-time streaming** with Server-Sent Events (SSE) 
3. **Add enterprise security** with authentication and rate limiting
4. **Create monitoring endpoints** for health checks and metrics
5. **Containerize with Docker** for consistent deployment
6. **Deploy to cloud platforms** using Hugging Face Spaces

## 🏗️ What We're Building

A **production-ready Invoice Processing API** that:
- ✅ Accepts file uploads and processing requests
- ✅ Streams results as they're generated (no waiting!)
- ✅ Authenticates users and prevents abuse
- ✅ Runs anywhere with Docker containers
- ✅ Deploys to free cloud hosting

## 🚀 Why This Matters

**Real-world scenario**: Your company's procurement team needs to process hundreds of invoices daily. They need:
- **Fast processing** - stream results as they come
- **Secure access** - only authorized users can process invoices  
- **Reliable deployment** - works consistently across environments
- **Scalable architecture** - handles multiple users simultaneously

This lab teaches you to **ship AI agents to production**, one of the most valuable skills for an AI engineer!

## ⏰ Time Allocation
- **Setup & Core API** (8 minutes)
- **Authentication & Security** (7 minutes) 
- **File Upload Handling** (8 minutes)
- **Streaming Implementation** (10 minutes)
- **Health & Monitoring** (5 minutes)
- **Containerization** (5 minutes)
- **Deployment** (7 minutes)

## 📋 Prerequisites Checklist
- [x] Completed Day 2 Sessions 1-4 labs
- [x] Understanding of FastAPI basics
- [x] Familiarity with LangGraph agents
- [x] Docker basics (for deployment)

---

**Ready to ship your first AI agent to production?** Let's build something amazing! 🎉

In [None]:
# Global configuration - Instructor will fill these
OLLAMA_URL = "http://XX.XX.XX.XX"  # Course server IP (port 80)
API_TOKEN = "YOUR_TOKEN_HERE"      # Instructor provides token
MODEL = "qwen3:8b"                  # Default model on server

# Load API keys from environment variables (NEVER hardcode in notebooks!)
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Voice API Keys (loaded securely from environment)
CARTESIA_API_KEY = os.getenv("CARTESIA_API_KEY")  # TTS
DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")  # STT

# Verify keys are loaded
if not CARTESIA_API_KEY:
    print("⚠️  Warning: CARTESIA_API_KEY not found in environment variables")
if not DEEPGRAM_API_KEY:
    print("⚠️  Warning: DEEPGRAM_API_KEY not found in environment variables")

print("✅ Configuration loaded from environment variables")

## Task 1: Setup Environment and Core API (8 minutes)

### 🎯 Goal
Create the foundational FastAPI application with proper configuration, error handling, and CORS setup for web clients.

### 💡 What You'll Learn
- How to structure a production FastAPI application
- Why CORS is essential for web-based clients
- How to manage application state and lifecycle events
- Best practices for API documentation

### 📝 Implementation Guide

**FastAPI** is perfect for AI APIs because it provides:
- **Automatic documentation** (Swagger UI)
- **Type validation** with Pydantic models
- **Async support** for concurrent requests
- **Easy deployment** with Docker/cloud platforms

**CORS (Cross-Origin Resource Sharing)** allows your API to be called from web browsers. Without it, browser security blocks requests from different domains.
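
To make the browser-side rule concrete, here is a minimal, framework-free sketch of the decision a CORS layer makes for each request. The `ALLOWED_ORIGINS` set and the `cors_headers` helper are hypothetical names used only for illustration; in the lab, FastAPI's `CORSMiddleware` does this work for you.

```python
from typing import Dict, Optional

# Hypothetical allowlist; replace with the domains that host your web client.
ALLOWED_ORIGINS = {"https://app.example.com", "http://localhost:3000"}


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return the CORS response headers for a request's Origin header."""
    if origin is None or origin not in ALLOWED_ORIGINS:
        # No CORS headers: the browser blocks the cross-origin response.
        return {}
    return {
        # Echo the specific origin rather than "*" so credentials are allowed.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        # Tell caches that the response varies by requesting origin.
        "Vary": "Origin",
    }


if __name__ == "__main__":
    print(cors_headers("https://app.example.com"))
    print(cors_headers("https://unknown.example.net"))
```

The server still processes the request either way; CORS only controls whether the browser lets the calling page read the response.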

In [None]:
# Install required packages
!pip install fastapi uvicorn python-multipart sse-starlette redis pydantic python-dotenv requests

In [None]:
# Import all required libraries
from fastapi import FastAPI, File, UploadFile, HTTPException, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from fastapi.responses import StreamingResponse, JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from sse_starlette.sse import EventSourceResponse
from pydantic import BaseModel
import asyncio
from typing import Optional, Dict, List
import uuid
from datetime import datetime
import json
import time
from functools import wraps
import requests

In [None]:
# TODO: Initialize FastAPI app with metadata
# HINT: Use title, description, version parameters for good documentation
app = FastAPI(
    title="Invoice Processing Agent API",
    description="Multimodal agent for invoice extraction and processing with voice support",
    version="1.0.0"
)

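# TODO: Add CORS middleware for web access
# HINT: Configure origins, credentials, methods, and headers
# NOTE: In production, restrict origins to your domain! Browsers reject a
# wildcard origin on credentialed requests, so list explicit origins
# whenever allow_credentials=True.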
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # TODO: Replace with your domain in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

print("✅ FastAPI app initialized with CORS support")

In [None]:
# TODO: Create global state management class
# HINT: This will store our LangGraph agent, Redis connection, and active jobs
class AgentState:
    def __init__(self):
        self.langgraph_app = None  # Will be initialized
        self.redis_client = None   # For caching and sessions
        self.processing_jobs = {}  # In-memory job tracking
        self.startup_time = datetime.now()

# Create global state instance
agent_state = AgentState()

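# TODO: Add startup event handler
# HINT: This runs when the API starts - perfect for loading models
# NOTE: Recent FastAPI releases deprecate on_event in favor of a lifespan
# handler; on_event still works.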
@app.on_event("startup")
async def startup_event():
    """Initialize agent and connections on startup"""
    print("🚀 Starting Invoice Agent API...")
    
    # TODO: Initialize LangGraph agent here
    # agent_state.langgraph_app = load_invoice_agent()
    
    # TODO: Connect to Redis for caching (optional)
    # agent_state.redis_client = redis.Redis(host='localhost', port=6379)
    
    # TODO: Preload any required models
    # Example: Load OCR models, embedding models, etc.
    
    print("✅ Invoice Agent API started successfully")

# TODO: Add shutdown event handler
@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown"""
    print("👋 Shutting down Invoice Agent API...")
    
    # TODO: Close connections gracefully
    # TODO: Save any pending state if needed
    
    print("✅ Shutdown complete")

## Task 2: Implement Authentication & Rate Limiting (7 minutes)

### 🎯 Goal
Secure your API with token-based authentication and prevent abuse with rate limiting.

### 💡 What You'll Learn
- How to implement Bearer token authentication in FastAPI
- Token bucket algorithm for rate limiting
- User tier management (free vs premium)
- Security best practices for production APIs

### 📝 Why Authentication Matters

**Without authentication**, anyone can:
- ❌ Overload your API with requests
- ❌ Process sensitive documents
- ❌ Run up your hosting costs
- ❌ Access your AI models for free

**With authentication**, you can:
- ✅ Track usage per user
- ✅ Implement pricing tiers
- ✅ Prevent abuse and spam
- ✅ Meet compliance requirements
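
The mock key table in this task keeps API keys in plain text for simplicity. As a hedged sketch of what "hashed tokens" could look like, the example below stores only SHA-256 digests and compares them in constant time. `hash_key`, `HASHED_KEYS`, and `lookup_user` are hypothetical names, not part of the lab code.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical store: digest -> user record. The raw key is never kept.
HASHED_KEYS: Dict[str, Dict[str, str]] = {
    hash_key("demo-key-123"): {"user": "demo_user", "tier": "free"},
}


def lookup_user(presented_key: str) -> Optional[Dict[str, str]]:
    """Find the user for a presented key using constant-time comparison."""
    presented = hash_key(presented_key)
    for stored, user in HASHED_KEYS.items():
        # compare_digest avoids leaking how many characters matched.
        if hmac.compare_digest(stored, presented):
            return user
    return None


if __name__ == "__main__":
    print(lookup_user("demo-key-123"))
    print(lookup_user("not-a-real-key"))
```

A fast hash such as SHA-256 is reasonable here only because API keys are long random strings; user-chosen passwords need a deliberately slow password-hashing function instead.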

In [None]:
# TODO: Set up HTTP Bearer token security
# HINT: This extracts the token from 'Authorization: Bearer <token>' header
security = HTTPBearer()

# TODO: Create mock user database
# NOTE: In production, use a real database with hashed tokens!
API_KEYS = {
    "demo-key-123": {
        "user": "demo_user",
        "tier": "free",
        "requests_remaining": 100,
        "requests_per_hour": 10
    },
    "premium-key-456": {
        "user": "premium_user",
        "tier": "premium",
        "requests_remaining": 1000,
        "requests_per_hour": 100
    },
    # TODO: Add your own test keys here
}

print(f"🔑 Configured {len(API_KEYS)} API keys")

In [None]:
# TODO: Implement token verification function
# HINT: This will be used as a FastAPI dependency
async def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
    """
    Verify API key and check rate limits
    Returns user info if valid, raises HTTPException if invalid
    """
    token = credentials.credentials
    
    # TODO: Check if token exists
    if token not in API_KEYS:
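        raise HTTPException(
            status_code=401,  # 401 = not authenticated; 403 is for valid users without access
            detail="Invalid API key. Please check your authorization header.",
            headers={"WWW-Authenticate": "Bearer"}
        )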
    
    user_info = API_KEYS[token]
    
    # TODO: Check rate limits
    if user_info["requests_remaining"] <= 0:
        raise HTTPException(
            status_code=429, 
            detail="Rate limit exceeded. Upgrade to premium for higher limits."
        )
    
    # TODO: Decrement request counter
    # NOTE: In production, this should be atomic and persistent
    user_info["requests_remaining"] -= 1
    
    return user_info

print("🔒 Token verification configured")

In [None]:
# TODO: Implement rate limiter using token bucket algorithm
# HINT: Token bucket allows burst traffic while maintaining average rate
class RateLimiter:
    """Token bucket rate limiter for controlling request frequency"""
    
    def __init__(self, rate: int = 10, per: int = 60):
        """
        Args:
            rate: Number of requests allowed
            per: Time period in seconds
        """
        self.rate = rate  # requests
        self.per = per    # seconds
        self.buckets = {}  # user_id -> bucket state
    
    def is_allowed(self, key: str) -> bool:
        """Check if request is allowed for given key"""
        now = time.time()
        
        # TODO: Initialize bucket if not exists
        # The first request consumes one token from a full bucket
        if key not in self.buckets:
            self.buckets[key] = {"tokens": self.rate - 1, "last": now}
            return True
        
        bucket = self.buckets[key]
        
        # TODO: Refill tokens based on time passed
        time_passed = now - bucket["last"]
        tokens_to_add = time_passed * (self.rate / self.per)
        bucket["tokens"] = min(self.rate, bucket["tokens"] + tokens_to_add)
        bucket["last"] = now
        
        # TODO: Check if tokens available
        if bucket["tokens"] >= 1:
            bucket["tokens"] -= 1
            return True
        
        return False

# Create global rate limiter: 10 requests per minute
rate_limiter = RateLimiter(rate=10, per=60)

print("⏱️ Rate limiter initialized (10 req/min)")
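
To see the burst-then-refill behavior of a token bucket without waiting in real time, here is a small self-contained sketch driven by a fake clock. The `TokenBucket` class and its `clock` parameter are assumptions made for this illustration; the logic mirrors the refill arithmetic used in `RateLimiter` but tracks a single bucket.

```python
import time
from typing import Callable


class TokenBucket:
    """Single token bucket with an injectable clock for easy testing."""

    def __init__(self, rate: int, per: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = rate
        self.refill_per_second = rate / per
        self.clock = clock
        self.tokens = float(rate)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Fake clock: a one-element list we can advance by hand.
fake_now = [0.0]
bucket = TokenBucket(rate=3, per=60, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(4)]  # three allowed, fourth rejected
fake_now[0] += 30                           # 30 s refills 1.5 tokens
after_wait = bucket.allow()                 # one token is available again

print(burst, after_wait)
```

Injecting the clock is a design choice for testability: the same class uses `time.monotonic` by default, which is not affected by system clock changes.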

## Task 3: File Upload & Processing Endpoints (8 minutes)

### 🎯 Goal
Create endpoints to receive invoice images and initiate asynchronous processing jobs.

### 💡 What You'll Learn
- How to handle file uploads in FastAPI
- Pydantic models for request/response validation
- Asynchronous job processing patterns
- File validation and security considerations

### 📝 Why Async Processing?

**Synchronous processing** (bad):
- ❌ User waits 10-30 seconds for response
- ❌ Request timeouts if processing takes too long
- ❌ Server can't handle other requests meanwhile
- ❌ Poor user experience

**Asynchronous processing** (good):
- ✅ Immediate response with job ID
- ✅ User can check status or stream updates
- ✅ Server handles multiple requests concurrently
- ✅ Better scalability and user experience
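
The "immediate response with job ID" pattern can be sketched with plain `asyncio`, independent of FastAPI. In this minimal example, `submit`, `slow_work`, and the `jobs` dict are hypothetical stand-ins for the endpoint, the agent call, and the job store.

```python
import asyncio
import uuid
from typing import Dict, Tuple

jobs: Dict[str, dict] = {}
_background_tasks = set()  # strong references keep tasks alive


async def slow_work(job_id: str) -> None:
    """Stand-in for OCR + LLM processing."""
    jobs[job_id]["status"] = "processing"
    await asyncio.sleep(0.05)
    jobs[job_id]["status"] = "completed"


async def submit() -> str:
    """Register a job, schedule the work, and return at once."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    task = asyncio.create_task(slow_work(job_id))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return job_id


async def demo() -> Tuple[str, str]:
    job_id = await submit()
    status_at_submit = jobs[job_id]["status"]  # work has not started yet
    await asyncio.sleep(0.2)                   # caller is free to do other things
    return status_at_submit, jobs[job_id]["status"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Inside a notebook, where an event loop is already running, call `await demo()` instead of `asyncio.run(demo())`.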

In [None]:
# TODO: Define Pydantic models for request/response validation
# HINT: These ensure type safety and automatic API documentation

class ProcessingRequest(BaseModel):
    """Request model for invoice processing configuration"""
    instructions: Optional[str] = "Extract all invoice data including vendor, amounts, dates, and line items"
    output_format: str = "json"  # json, text, or both
    include_confidence: bool = True
    voice_response: bool = False  # Enable voice synthesis
    webhook_url: Optional[str] = None  # For notifications

class ProcessingResponse(BaseModel):
    """Response with job information"""
    job_id: str
    status: str  # queued, processing, completed, failed
    message: str
    estimated_time_seconds: int
    result_url: Optional[str] = None
    stream_url: Optional[str] = None

print("📝 Request/response models defined")

In [None]:
# TODO: Create main file upload endpoint
@app.post("/process/invoice", response_model=ProcessingResponse)
async def process_invoice(
    file: UploadFile = File(..., description="Invoice image file (PNG, JPG, PDF)"),
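    # Depends() reads these options from query parameters; a JSON body
    # cannot be combined with a multipart file upload in one request.
    request: ProcessingRequest = Depends(),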
    user_info: Dict = Depends(verify_token)
):
    """
    Upload invoice image for processing
    
    This endpoint:
    1. Validates the uploaded file
    2. Creates an async processing job
    3. Returns job ID for status tracking
    """
    
    # TODO: Validate file type
    # HINT: Check content_type to ensure it's an image or PDF
    allowed_types = ["image/png", "image/jpeg", "image/jpg", "application/pdf"]
    if file.content_type not in allowed_types:
        raise HTTPException(
            status_code=400, 
            detail=f"File type {file.content_type} not supported. Use PNG, JPG, or PDF."
        )
    
    # TODO: Check file size limit (10MB max)
    # HINT: Large files slow down processing and use more memory
    max_size = 10 * 1024 * 1024  # 10MB
    if file.size and file.size > max_size:
        raise HTTPException(
            status_code=400, 
            detail=f"File too large ({file.size/1024/1024:.1f}MB). Maximum size is 10MB."
        )
    
    # TODO: Check rate limit
    if not rate_limiter.is_allowed(user_info["user"]):
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded. Please wait before uploading another file."
        )
    
    # TODO: Generate unique job ID
    job_id = str(uuid.uuid4())
    
    # TODO: Read and store file content
    # HINT: In production, save to S3/GCS instead of memory
    file_content = await file.read()
    
    # TODO: Create processing job record
    job = {
        "id": job_id,
        "user": user_info["user"],
        "status": "queued",
        "created_at": datetime.now().isoformat(),
        "file_name": file.filename,
        "file_size": len(file_content),
        "file_type": file.content_type,
        "instructions": request.instructions,
        "output_format": request.output_format,
        "voice_response": request.voice_response,
        "progress": 0
    }
    
    # TODO: Store job in global state
    # NOTE: In production, use Redis or database
    agent_state.processing_jobs[job_id] = job
    
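    # TODO: Start async processing task
    # HINT: Use asyncio.create_task for background processing
    # Keep a reference: the event loop holds only weak references to tasks,
    # so an unreferenced task can be garbage-collected before it finishes
    job["task"] = asyncio.create_task(process_job_async(job_id, file_content, request))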
    
    # TODO: Estimate processing time based on file size and user tier
    base_time = 15  # seconds
    size_factor = len(file_content) / (1024 * 1024)  # MB
    tier_factor = 0.5 if user_info["tier"] == "premium" else 1.0
    estimated_time = int(base_time + (size_factor * 5) * tier_factor)
    
    return ProcessingResponse(
        job_id=job_id,
        status="queued",
        message="Invoice processing started. Use the stream URL for real-time updates.",
        estimated_time_seconds=estimated_time,
        result_url=f"/results/{job_id}",
        stream_url=f"/stream/{job_id}"
    )

print("📤 File upload endpoint created")

In [None]:
# TODO: Implement async job processing function
async def process_job_async(job_id: str, image_data: bytes, request: ProcessingRequest):
    """
    Process invoice asynchronously in background
    
    This simulates the full LangGraph processing pipeline:
    1. Image preprocessing 
    2. OCR extraction
    3. LLM analysis
    4. Result formatting
    5. Optional voice synthesis
    """
    
    # Look up the job before the try block so the except handler can use it
    job = agent_state.processing_jobs[job_id]
    
    try:
        
        # TODO: Update status to processing
        job["status"] = "processing"
        job["started_at"] = datetime.now().isoformat()
        
        # TODO: Simulate processing steps with progress updates
        # HINT: In real implementation, call your LangGraph agent here
        
        # Step 1: Image preprocessing (10%)
        job["progress"] = 10
        job["current_step"] = "Preprocessing image..."
        await asyncio.sleep(1)  # Simulate processing time
        
        # Step 2: OCR extraction (30%)
        job["progress"] = 30
        job["current_step"] = "Extracting text with OCR..."
        await asyncio.sleep(2)
        
        # Step 3: LLM analysis (70%)
        job["progress"] = 70
        job["current_step"] = "Analyzing with LLM..."
        await asyncio.sleep(3)
        
        # Step 4: Result formatting (90%)
        job["progress"] = 90
        job["current_step"] = "Formatting results..."
        await asyncio.sleep(1)
        
        # TODO: Generate mock results
        # HINT: Replace with actual LangGraph agent call:
        # result = await agent_state.langgraph_app.ainvoke({
        #     "image_data": image_data,
        #     "instructions": request.instructions
        # })
        
        mock_result = {
            "vendor": "ABC Supplies Inc.",
            "invoice_number": "INV-2024-001",
            "date": "2024-01-15",
            "total_amount": 1250.00,
            "currency": "USD",
            "line_items": [
                {"description": "Office Supplies", "quantity": 10, "unit_price": 25.00, "total": 250.00},
                {"description": "Printing Paper", "quantity": 20, "unit_price": 50.00, "total": 1000.00}
            ],
            "confidence_scores": {
                "vendor": 0.95,
                "total_amount": 0.98,
                "date": 0.92
            }
        }
        
        # TODO: Add voice synthesis if requested
        voice_url = None
        if request.voice_response:
            job["current_step"] = "Generating voice response..."
            # TODO: Call Cartesia TTS API here
            # voice_url = await synthesize_voice_response(mock_result)
            await asyncio.sleep(2)  # Simulate TTS processing
            voice_url = f"/audio/{job_id}.mp3"  # Mock URL
        
        # TODO: Mark job as completed
        job["status"] = "completed"
        job["progress"] = 100
        job["current_step"] = "Complete!"
        job["completed_at"] = datetime.now().isoformat()
        job["result"] = mock_result
        job["voice_url"] = voice_url
        
        print(f"✅ Job {job_id} completed successfully")
        
    except Exception as e:
        # TODO: Handle errors gracefully
        job["status"] = "failed"
        job["error"] = str(e)
        job["failed_at"] = datetime.now().isoformat()
        print(f"❌ Job {job_id} failed: {e}")

print("⚙️ Async processing function defined")

## Task 4: Streaming Results with Server-Sent Events (10 minutes)

### 🎯 Goal
Implement real-time streaming of processing updates using Server-Sent Events (SSE) for better user experience.

### 💡 What You'll Learn
- How Server-Sent Events provide real-time updates
- EventSourceResponse for streaming in FastAPI
- Progress tracking and status updates
- Browser compatibility considerations

### 📝 Why Streaming Matters

**Without streaming** (polling approach):
- ❌ User repeatedly calls `/status/{job_id}` every few seconds
- ❌ Wastes bandwidth and server resources
- ❌ Delayed updates (only on poll intervals)
- ❌ Poor mobile battery life

**With streaming** (SSE approach):
- ✅ Server pushes updates as they happen
- ✅ Minimal bandwidth usage
- ✅ Instant notifications
- ✅ Better user experience

**SSE vs WebSockets:**
- SSE: Simple, unidirectional (server → client), works through proxies
- WebSockets: Complex, bidirectional, may have firewall issues
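
Part of what makes SSE simple is its wire format: plain text over a long-lived HTTP response, with each event ending in a blank line. The sketch below builds one such frame by hand. `format_sse` is a hypothetical helper for illustration; in the lab, `EventSourceResponse` from `sse-starlette` produces this formatting for you.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event frame as text."""
    lines = []
    if event is not None:
        # Named events are dispatched to listeners registered for that name.
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse({"progress": 50}, event="progress")

if __name__ == "__main__":
    print(frame, end="")
```

Because the stream is ordinary HTTP text, it can be inspected with standard tools, which is harder with a WebSocket connection.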

In [None]:
# TODO: Create streaming endpoint with Server-Sent Events
@app.get("/stream/{job_id}")
async def stream_results(
    job_id: str,
    user_info: Dict = Depends(verify_token)
):
    """
    Stream processing updates via Server-Sent Events
    
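    Client usage (the server sends named events, so listen by name;
    onmessage does not receive them):
    ```javascript
    const eventSource = new EventSource('/stream/job_123');
    eventSource.addEventListener('progress', (event) => {
        const data = JSON.parse(event.data);
        console.log('Progress:', data.progress);
    });
    ```
    Note: the browser EventSource API cannot set an Authorization header,
    so browser clients need a fetch-based SSE reader or another way to
    send the token.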
    """
    
    # TODO: Validate job exists
    if job_id not in agent_state.processing_jobs:
        raise HTTPException(status_code=404, detail="Job not found")
    
    job = agent_state.processing_jobs[job_id]
    
    # TODO: Verify job ownership for security
    if job["user"] != user_info["user"]:
        raise HTTPException(status_code=403, detail="Access denied to this job")
    
    async def event_generator():
        """
        Generate Server-Sent Events for job progress
        
        SSE format:
        event: progress
        data: {"progress": 50, "message": "Processing..."}
        
        """
        
        # TODO: Send initial status
        yield {
            "event": "status",
            "data": json.dumps({
                "job_id": job_id,
                "status": job["status"],
                "created_at": job["created_at"]
            })
        }
        
        last_progress = -1
        
        # TODO: Stream updates while job is active
        # HINT: Poll job state and send updates when changed
        while job["status"] in ["queued", "processing"]:
            await asyncio.sleep(0.5)  # Check every 500ms
            
            # Only send update if progress changed
            current_progress = job.get("progress", 0)
            if current_progress != last_progress:
                yield {
                    "event": "progress",
                    "data": json.dumps({
                        "progress": current_progress,
                        "message": job.get("current_step", "Processing..."),
                        "estimated_remaining": max(0, 15 - (current_progress * 0.15))
                    })
                }
                last_progress = current_progress
        
        # TODO: Send final result or error
        if job["status"] == "completed":
            yield {
                "event": "completed",
                "data": json.dumps({
                    "result": job["result"],
                    "voice_url": job.get("voice_url"),
                    "processing_time": calculate_processing_time(job)
                })
            }
        elif job["status"] == "failed":
            yield {
                "event": "error",
                "data": json.dumps({
                    "error": job.get("error", "Unknown error occurred"),
                    "failed_at": job.get("failed_at")
                })
            }
        
        # TODO: Send close event
        yield {
            "event": "close",
            "data": json.dumps({"message": "Stream ended"})
        }
    
    return EventSourceResponse(event_generator())

print("📡 Streaming endpoint created")

In [None]:
# TODO: Add helper function for processing time calculation
def calculate_processing_time(job: dict) -> float:
    """Calculate total processing time in seconds"""
    if "started_at" not in job or "completed_at" not in job:
        return 0.0
    
    start = datetime.fromisoformat(job["started_at"])
    end = datetime.fromisoformat(job["completed_at"])
    return (end - start).total_seconds()

print("⏱️ Helper functions defined")

## Task 5: Status and Results Endpoints (5 minutes)

### 🎯 Goal
Add traditional polling endpoints for clients that can't use Server-Sent Events.

### 💡 What You'll Learn
- REST API design patterns for async operations
- Status codes and their meanings
- Backward compatibility considerations
- Mobile app integration patterns

### 📝 When to Use Polling vs Streaming

**Use Streaming (SSE) when:**
- ✅ Web browsers with modern JavaScript
- ✅ Real-time dashboard applications
- ✅ Long-running operations (>10 seconds)
- ✅ Network supports persistent connections

**Use Polling when:**
- ✅ Mobile apps that are often backgrounded, where persistent connections get dropped
- ✅ Legacy systems or limited JavaScript
- ✅ Corporate firewalls block SSE
- ✅ Short operations (<5 seconds)
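
When a client does poll, spacing the requests out with exponential backoff reduces the wasted traffic that fixed-interval polling causes. Here is a minimal client-side sketch; `poll_until_done` and its parameters are hypothetical, and `fetch_status` stands in for an HTTP call to `/status/{job_id}`.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = ("completed", "failed", "cancelled")


def poll_until_done(
    fetch_status: Callable[[], Dict],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job reaches a terminal state, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job not finished after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, but cap the interval


# Simulated server responses; a real client would call the status endpoint.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed"},
])
delays = []  # records the requested sleeps instead of really sleeping
final = poll_until_done(lambda: next(responses), sleep=delays.append)

print(final, delays)
```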

In [None]:
# TODO: Create job status endpoint for polling
@app.get("/status/{job_id}")
async def get_job_status(
    job_id: str,
    user_info: Dict = Depends(verify_token)
):
    """
    Get current job status (polling endpoint)
    
    Status codes:
    - 200: Job found and accessible
    - 404: Job not found
    - 403: Access denied (not your job)
    """
    
    # TODO: Check job exists
    if job_id not in agent_state.processing_jobs:
        raise HTTPException(status_code=404, detail="Job not found")
    
    job = agent_state.processing_jobs[job_id]
    
    # TODO: Verify ownership
    if job["user"] != user_info["user"]:
        raise HTTPException(status_code=403, detail="Access denied")
    
    # TODO: Return status information
    # HINT: Don't include full results here, just status
    return {
        "job_id": job_id,
        "status": job["status"],
        "progress": job.get("progress", 0),
        "current_step": job.get("current_step", "Initializing..."),
        "created_at": job["created_at"],
        "estimated_completion": calculate_estimated_completion(job),
        "file_name": job["file_name"]
    }

print("📊 Status endpoint created")

In [None]:
# TODO: Create results retrieval endpoint  
@app.get("/results/{job_id}")
async def get_results(
    job_id: str,
    format: str = "json",  # json, text, or voice
    user_info: Dict = Depends(verify_token)
):
    """
    Get processing results once job is complete
    
    Status codes:
    - 200: Results available
    - 202: Still processing (check back later)
    - 404: Job not found
    - 410: Job failed
    """
    
    # TODO: Validate job exists and check ownership
    if job_id not in agent_state.processing_jobs:
        raise HTTPException(status_code=404, detail="Job not found")
    
    job = agent_state.processing_jobs[job_id]
    
    if job["user"] != user_info["user"]:
        raise HTTPException(status_code=403, detail="Access denied")
    
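    # TODO: Check job completion status
    if job["status"] in ["queued", "processing"]:
        # 202 is a success code, so return it rather than raising an error
        return JSONResponse(
            status_code=202,
            content={"detail": f"Job still {job['status']}. Progress: {job.get('progress', 0)}%"}
        )
    
    if job["status"] == "failed":
        raise HTTPException(
            status_code=410, 
            detail=f"Job failed: {job.get('error', 'Unknown error')}"
        )
    
    # Cancelled jobs have no result to return
    if job["status"] == "cancelled":
        raise HTTPException(status_code=410, detail="Job was cancelled")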
    
    # TODO: Return results in requested format
    result = job["result"]
    
    if format == "text":
        # Convert JSON to human-readable text
        text_result = format_as_text(result)
        return {"result": text_result, "format": "text"}
    elif format == "voice" and job.get("voice_url"):
        return {
            "voice_url": job["voice_url"],
            "text": format_as_text(result),
            "format": "voice"
        }
    else:
        # Default JSON format
        return {
            "result": result,
            "format": "json",
            "processing_time": calculate_processing_time(job),
            "voice_available": bool(job.get("voice_url"))
        }

print("📥 Results endpoint created")

In [None]:
# TODO: Add job management endpoints
@app.delete("/jobs/{job_id}")
async def cancel_job(
    job_id: str,
    user_info: Dict = Depends(verify_token)
):
    """Cancel a processing job"""
    
    if job_id not in agent_state.processing_jobs:
        raise HTTPException(status_code=404, detail="Job not found")
    
    job = agent_state.processing_jobs[job_id]
    
    if job["user"] != user_info["user"]:
        raise HTTPException(status_code=403, detail="Access denied")
    
    if job["status"] in ["completed", "failed"]:
        return {"message": "Job already finished", "status": job["status"]}
    
    # TODO: Implement actual cancellation logic
    job["status"] = "cancelled"
    job["cancelled_at"] = datetime.now().isoformat()
    
    return {"message": "Job cancelled successfully", "job_id": job_id}

@app.get("/jobs")
async def list_user_jobs(
    limit: int = 10,
    user_info: Dict = Depends(verify_token)
):
    """List user's recent jobs"""
    
    user_jobs = [
        {
            "job_id": job_id,
            "status": job["status"],
            "file_name": job["file_name"],
            "created_at": job["created_at"]
        }
        for job_id, job in agent_state.processing_jobs.items()
        if job["user"] == user_info["user"]
    ]
    
    # Sort by creation time (newest first) and limit
    user_jobs.sort(key=lambda x: x["created_at"], reverse=True)
    
    return {
        "jobs": user_jobs[:limit],
        "total": len(user_jobs)
    }

# TODO: Helper functions
def format_as_text(result: dict) -> str:
    """Convert JSON result to human-readable text"""
    return f"""
Invoice Analysis Results:
========================
Vendor: {result.get('vendor', 'Unknown')}
Invoice #: {result.get('invoice_number', 'Unknown')}
Date: {result.get('date', 'Unknown')}
Total: {result.get('currency', '')} {result.get('total_amount', 0)}

Line Items:
{chr(10).join([f"- {item['description']}: {item['total']}" for item in result.get('line_items', [])])}
""".strip()

def calculate_estimated_completion(job: dict) -> str:
    """Calculate estimated completion time"""
    if job["status"] in ["completed", "failed", "cancelled"]:
        return "N/A"
    
    progress = job.get("progress", 0)
    if progress == 0:
        return "Calculating..."
    
    # Simple estimation based on current progress
    remaining_seconds = int(15 * (100 - progress) / 100)
    return f"{remaining_seconds} seconds"

print("🔧 Job management endpoints created")
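
The `cancel_job` endpoint above only flips the status flag; the background coroutine keeps running. One hedged way to implement real cancellation is to keep the `asyncio.Task` and call `cancel()` on it. The sketch below shows the mechanics in isolation; `long_job` and `demo` are hypothetical names.

```python
import asyncio


async def long_job(job: dict) -> None:
    """Stand-in for a processing pipeline that can be cancelled."""
    try:
        job["status"] = "processing"
        await asyncio.sleep(10)  # cancellation is delivered at this await
        job["status"] = "completed"
    except asyncio.CancelledError:
        # Record the outcome, then re-raise so the task ends as cancelled.
        job["status"] = "cancelled"
        raise


async def demo() -> str:
    job = {"status": "queued"}
    task = asyncio.create_task(long_job(job))
    await asyncio.sleep(0.01)  # give the task a chance to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the task was cancelled on purpose
    return job["status"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cancellation only takes effect at an `await`, so long CPU-bound steps need to yield control or run in a separate worker to be interruptible.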

## Task 6: Health Checks and Monitoring (5 minutes)

### 🎯 Goal
Add operational endpoints for monitoring the API in production environments.

### 💡 What You'll Learn
- Health check patterns for microservices
- Prometheus-style metrics collection
- Dependency health monitoring
- Production readiness indicators

### 📝 Why Health Checks Matter

**Load balancers** use health checks to:
- ✅ Route traffic only to healthy instances
- ✅ Automatically remove failed instances
- ✅ Trigger auto-scaling based on health

**Monitoring systems** use health checks to:
- ✅ Alert on service degradation
- ✅ Track uptime and availability
- ✅ Diagnose performance issues
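
A load balancer typically looks only at the HTTP status code of a health probe, not at the JSON body. One possible design, sketched here under that assumption, is to map component states to a status code so that an instance with a failing critical dependency is taken out of rotation. The `readiness` helper and the component names are illustrative only.

```python
from typing import Dict, Iterable, Tuple


def readiness(components: Dict[str, str],
              critical: Iterable[str]) -> Tuple[int, Dict]:
    """Return (HTTP status code, body) for a readiness probe."""
    failing = sorted(
        name for name in critical if components.get(name) != "healthy"
    )
    if failing:
        # 503 signals that this instance should not receive traffic.
        return 503, {"status": "unhealthy", "failing": failing}
    return 200, {"status": "healthy", "failing": []}


code, body = readiness(
    {
        "api": "healthy",
        "llm_backend": "unhealthy: timeout",
        "redis": "not_configured",  # non-critical, so it does not fail the probe
    },
    critical=["api", "llm_backend"],
)

print(code, body)
```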

In [None]:
# TODO: Create basic health check endpoint
@app.get("/health")
async def health_check():
    """
    Basic health check for load balancers
    
    Returns 200 if service is running
    Load balancers should use this endpoint
    """
    return {
        "status": "healthy",
        "timestamp": datetime.now().isoformat(),
        "version": "1.0.0",
        "uptime_seconds": (datetime.now() - agent_state.startup_time).total_seconds()
    }

print("❤️ Basic health check created")

In [None]:
# TODO: Create detailed health check with dependencies
@app.get("/health/detailed")
async def detailed_health():
    """
    Detailed health check with component status
    
    Checks all dependencies and returns detailed status
    Useful for debugging and monitoring dashboards
    """
    
    health_status = {
        "api": "healthy",
        "langgraph": "unknown",
        "redis": "unknown",
        "llm_backend": "unknown",
        "cartesia_tts": "unknown",
        "deepgram_stt": "unknown"
    }
    
    # TODO: Check LangGraph agent
    try:
        if agent_state.langgraph_app:
            health_status["langgraph"] = "healthy"
        else:
            health_status["langgraph"] = "not_initialized"
    except Exception as e:
        health_status["langgraph"] = f"unhealthy: {str(e)}"
    
    # TODO: Check Redis connection
    try:
        if agent_state.redis_client:
            agent_state.redis_client.ping()
            health_status["redis"] = "healthy"
        else:
            health_status["redis"] = "not_configured"
    except Exception as e:
        health_status["redis"] = f"unhealthy: {str(e)}"
    
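    # TODO: Check Ollama LLM backend
    # NOTE: requests is blocking; asyncio.to_thread (Python 3.9+) runs it in a
    # worker thread so the event loop stays free for other requests
    try:
        response = await asyncio.to_thread(
            requests.get, f"{OLLAMA_URL}/health", timeout=2
        )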
        if response.status_code == 200:
            health_status["llm_backend"] = "healthy"
        else:
            health_status["llm_backend"] = f"unhealthy: HTTP {response.status_code}"
    except Exception as e:
        health_status["llm_backend"] = f"unhealthy: {str(e)}"
    
    # TODO: Check Cartesia TTS API
    try:
        if CARTESIA_API_KEY:
            # Mock check - in production, make actual API call
            health_status["cartesia_tts"] = "configured"
        else:
            health_status["cartesia_tts"] = "not_configured"
    except Exception as e:
        health_status["cartesia_tts"] = f"error: {str(e)}"
    
    # TODO: Check Deepgram STT API
    try:
        if DEEPGRAM_API_KEY:
            health_status["deepgram_stt"] = "configured"
        else:
            health_status["deepgram_stt"] = "not_configured"
    except Exception as e:
        health_status["deepgram_stt"] = f"error: {str(e)}"
    
    # TODO: Determine overall status
    critical_components = ["api", "llm_backend"]
    critical_healthy = all(
        health_status[comp] == "healthy" 
        for comp in critical_components
    )
    
    overall_status = "healthy" if critical_healthy else "degraded"
    
    return {
        "status": overall_status,
        "components": health_status,
        "active_jobs": len(agent_state.processing_jobs),
        "jobs_by_status": get_job_stats(),
        "uptime_seconds": (datetime.now() - agent_state.startup_time).total_seconds(),
        "timestamp": datetime.now().isoformat()
    }

print("🔍 Detailed health check created")
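
The detailed check above calls `requests.get` and `redis_client.ping()` directly inside an `async def`, so the event loop is blocked while each dependency responds. The following is a minimal sketch of one alternative, assuming Python 3.9+ for `asyncio.to_thread`. `run_checks` and the check callables are hypothetical names used only for illustration.

```python
import asyncio
from typing import Callable, Dict


async def run_checks(
    checks: Dict[str, Callable[[], None]], timeout: float = 2.0
) -> Dict[str, str]:
    """Run blocking dependency checks in worker threads, concurrently.

    Each check is a plain function that raises on failure.
    """

    async def run_one(name: str, check: Callable[[], None]):
        try:
            # to_thread keeps the blocking call off the event loop
            await asyncio.wait_for(asyncio.to_thread(check), timeout)
            return name, "healthy"
        except asyncio.TimeoutError:
            return name, "unhealthy: timeout"
        except Exception as exc:
            return name, f"unhealthy: {exc}"

    results = await asyncio.gather(*(run_one(n, c) for n, c in checks.items()))
    return dict(results)


def check_ok() -> None:
    """Stand-in for a dependency that responds normally."""


def check_broken() -> None:
    """Stand-in for a dependency that is down."""
    raise RuntimeError("connection refused")


statuses = asyncio.run(run_checks({"llm_backend": check_ok, "redis": check_broken}))
print(statuses)
```

Inside the endpoint, the same idea would be `await run_checks({...})` with each check wrapping one of the existing `try` bodies. Because the checks run concurrently, the slowest dependency bounds the total latency instead of the sum of all of them.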

In [None]:
# TODO: Create metrics endpoint for Prometheus
@app.get("/metrics")
async def get_metrics():
    """
    Prometheus-style metrics for monitoring
    
    These metrics can be scraped by Prometheus/Grafana
    for dashboards and alerting
    """
    
    # TODO: Calculate various metrics
    job_stats = get_job_stats()
    uptime = (datetime.now() - agent_state.startup_time).total_seconds()
    
    # TODO: Format as Prometheus metrics
    metrics = []
    
    # System metrics
    metrics.append(f"invoice_api_uptime_seconds {uptime}")
    metrics.append(f"invoice_api_active_jobs {len(agent_state.processing_jobs)}")
    
    # Job status metrics
    for status, count in job_stats.items():
        metrics.append(f'invoice_api_jobs_total{{status="{status}"}} {count}')
    
    # User tier metrics
    user_stats = get_user_stats()
    for tier, count in user_stats.items():
        metrics.append(f'invoice_api_users_total{{tier="{tier}"}} {count}')
    
    # Success rate (all finished jobs currently held in memory)
    success_rate = calculate_success_rate()
    metrics.append(f"invoice_api_success_rate {success_rate}")
    
    # Average processing time
    avg_time = calculate_average_processing_time()
    metrics.append(f"invoice_api_avg_processing_seconds {avg_time}")
    
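    # FastAPI would serialize a (body, headers) tuple as a JSON array,
    # so return an explicit plain-text response instead
    from fastapi.responses import PlainTextResponse
    return PlainTextResponse("\n".join(metrics) + "\n")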

print("📊 Metrics endpoint created")

In [None]:
# TODO: Helper functions for metrics
def get_job_stats() -> Dict[str, int]:
    """Count jobs by status"""
    stats = {"queued": 0, "processing": 0, "completed": 0, "failed": 0, "cancelled": 0}
    
    for job in agent_state.processing_jobs.values():
        status = job["status"]
        if status in stats:
            stats[status] += 1
    
    return stats

def get_user_stats() -> Dict[str, int]:
    """Count users by tier"""
    stats = {"free": 0, "premium": 0}
    
    for user_info in API_KEYS.values():
        tier = user_info["tier"]
        if tier in stats:
            stats[tier] += 1
    
    return stats

def calculate_success_rate() -> float:
    """Calculate success rate across all finished jobs (1.0 if none have finished)"""
    completed_jobs = [
        job for job in agent_state.processing_jobs.values()
        if job["status"] in ["completed", "failed"]
    ]
    
    if not completed_jobs:
        return 1.0
    
    successful = sum(1 for job in completed_jobs if job["status"] == "completed")
    return successful / len(completed_jobs)

def calculate_average_processing_time() -> float:
    """Calculate average processing time for completed jobs"""
    completed_jobs = [
        job for job in agent_state.processing_jobs.values()
        if job["status"] == "completed" and "processing_time" in job
    ]
    
    if not completed_jobs:
        return 0.0
    
    total_time = sum(calculate_processing_time(job) for job in completed_jobs)
    return total_time / len(completed_jobs)

print("📈 Metrics helper functions defined")
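
The `/metrics` endpoint emits bare `name value` lines. Prometheus' text exposition format also allows `# HELP` and `# TYPE` comment lines per metric and requires certain characters in label values to be escaped. The sketch below shows both. `render_gauge` and `escape_label_value` are hypothetical helpers, not part of the lab code.

```python
from typing import Dict, List


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(
    name: str, help_text: str, samples: Dict[str, float], label: str = "status"
) -> List[str]:
    """Render one labelled gauge as text exposition lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in samples.items():
        lines.append(f'{name}{{{label}="{escape_label_value(label_value)}"}} {value}')
    return lines


example_lines = render_gauge(
    "invoice_api_jobs_total",
    "Jobs currently tracked, by status",
    {"completed": 3, "failed": 1},
)
print("\n".join(example_lines))
```

In a real service the official `prometheus_client` package handles this formatting. The hand-rolled version here only illustrates what the scraper expects to read.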

## Task 7: Docker Containerization (5 minutes)

### 🎯 Goal
Create Docker configuration for consistent deployment across environments.

### 💡 What You'll Learn
- Multi-stage Docker builds for smaller images
- Security best practices (non-root user)
- Health checks in containers
- Environment variable management

### 📝 Why Docker?

**Without Docker:**
- ❌ "Works on my machine" problems
- ❌ Complex deployment procedures
- ❌ Dependency conflicts
- ❌ Different environments behave differently

**With Docker:**
- ✅ Identical behavior everywhere
- ✅ Simple deployment (`docker run`)
- ✅ Isolated dependencies
- ✅ Easy scaling with orchestration
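
For the container health check, Docker only looks at the exit code of the probe command: 0 means healthy, 1 means unhealthy. Here is a minimal sketch of such a probe using only the standard library, so it works even if `requests` is not installed in the image. `probe` and `exit_code` are hypothetical helper names.

```python
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx responses)
        return False


def exit_code(healthy: bool) -> int:
    """Map a probe result to the exit code Docker expects from a health check."""
    return 0 if healthy else 1


# Port 1 is assumed to be closed here, so this probe is expected to fail
print(exit_code(probe("http://127.0.0.1:1/health", timeout=0.5)))
```

Saved as a script that ends with `sys.exit(exit_code(probe("http://localhost:8000/health")))`, this could be used as the `HEALTHCHECK` command in place of an inline one-liner.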

In [None]:
# TODO: Create requirements.txt file
requirements_content = """
# FastAPI and server
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6

# Streaming and SSE
sse-starlette==1.6.5

# Data models and validation
pydantic==2.5.0

# LangGraph and AI
langgraph==0.1.0
langchain==0.1.0

# Storage and caching
redis==5.0.1

# HTTP requests
requests==2.31.0

# Image processing
Pillow==10.1.0

# Voice APIs
# cartesia-tts==1.0.0  # Add when available
# deepgram-sdk==3.0.0  # Add when available
""".strip()

# Write requirements.txt
with open('/tmp/requirements.txt', 'w') as f:
    f.write(requirements_content)

print("📦 requirements.txt created")
print(requirements_content)

In [None]:
# TODO: Create Dockerfile with multi-stage build
dockerfile_content = """
# Multi-stage build for smaller production image
FROM python:3.10-slim AS builder

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Production stage
FROM python:3.10-slim

# Set working directory
WORKDIR /app

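# Create non-root user for security
RUN useradd -m -u 1000 appuser

# Copy installed packages from builder stage into the non-root user's home
# (/root/.local is not readable once the container runs as appuser)
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local

# Copy application code and hand it to the non-root user
COPY . .
RUN chown -R appuser:appuser /app
USER appuser

# Make sure scripts in .local are usable
ENV PATH=/home/appuser/.local/bin:$PATH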

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=2).raise_for_status()"

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]
""".strip()

# Write Dockerfile
with open('/tmp/Dockerfile', 'w') as f:
    f.write(dockerfile_content)

print("🐳 Dockerfile created")
print(dockerfile_content)

In [None]:
# TODO: Create docker-compose.yml for local development
docker_compose_content = """
version: '3.8'

services:
  invoice-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_URL=http://host.docker.internal:80
      - API_TOKEN=your_token_here
      - MODEL=qwen3:8b
      # Secrets are read from your shell or a local .env file; never commit real keys
      - CARTESIA_API_KEY=${CARTESIA_API_KEY}
      - DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
    volumes:
      - ./:/app
    depends_on:
      - redis
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  redis_data:
""".strip()

# Write docker-compose.yml
with open('/tmp/docker-compose.yml', 'w') as f:
    f.write(docker_compose_content)

print("🔧 docker-compose.yml created")
print(docker_compose_content)

## Task 8: Deploy to Hugging Face Spaces (7 minutes)

### 🎯 Goal
Deploy your API to free cloud hosting using Hugging Face Spaces.

### 💡 What You'll Learn
- Hugging Face Spaces deployment process
- Gradio wrapper for FastAPI applications
- Environment variable management in cloud
- Production deployment considerations
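
For environment variable management, a common pattern is to read all configuration once at startup and fail fast when something required is missing. The sketch below uses the variable names from this lab's `docker-compose.yml`. The `Settings` class, `load_settings`, and the choice of which variables are required are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    api_token: str
    cartesia_api_key: Optional[str]  # Voice features are treated as optional
    deepgram_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    required = ("OLLAMA_URL", "API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        ollama_url=env["OLLAMA_URL"].rstrip("/"),
        api_token=env["API_TOKEN"],
        # Empty strings are normalized to None so callers can test "is None"
        cartesia_api_key=env.get("CARTESIA_API_KEY") or None,
        deepgram_api_key=env.get("DEEPGRAM_API_KEY") or None,
    )


# Passing a dict instead of os.environ keeps the example reproducible
settings = load_settings({"OLLAMA_URL": "http://example.invalid/", "API_TOKEN": "demo"})
print(settings.ollama_url, settings.cartesia_api_key)
```

Keeping secrets out of source files and injecting them through the platform's secret settings means the same image can run locally and in the cloud without edits.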

### 📝 Deployment Options

**Hugging Face Spaces:**
- ✅ Free hosting for public projects
- ✅ Automatic HTTPS and domain
- ✅ Docker support
- ✅ Great for demos and prototypes
- ❌ Limited resources on the free tier (2 vCPU, 16 GB RAM)

**Other options:**
- **Railway**: Fast deployment, good for startups
- **Render**: Free tier, easy to use
- **Google Cloud Run**: Pay-per-use, auto-scaling
- **AWS Lambda**: Serverless, very cost-effective

In [None]:
# TODO: Create app.py for Hugging Face Spaces
spaces_app_content = """
# Hugging Face Spaces entry point
import gradio as gr
import requests
import json
from PIL import Image
import io
import time

# Import your FastAPI app
# from main import app

# Configuration
API_BASE_URL = "http://localhost:8000"  # Update for production
API_KEY = "demo-key-123"  # Use environment variable in production

def process_invoice_gradio(image, instructions, voice_enabled):
    \"\"\"
    Gradio interface function for invoice processing
    \"\"\"
    
    if image is None:
        return "Please upload an image", None, None
    
    try:
        # Convert PIL image to bytes
        img_bytes = io.BytesIO()
        image.save(img_bytes, format='PNG')
        img_bytes.seek(0)
        
        # Upload to API
        files = {'file': ('invoice.png', img_bytes, 'image/png')}
        data = {
            'instructions': instructions,
            'voice_response': voice_enabled
        }
        headers = {'Authorization': f'Bearer {API_KEY}'}
        
        response = requests.post(
            f"{API_BASE_URL}/process/invoice",
            files=files,
            data=data,
            headers=headers,
            timeout=30  # Avoid hanging the UI if the API is unreachable
        )
        
        if response.status_code == 200:
            job_info = response.json()
            job_id = job_info['job_id']
            
            # Poll for results
            max_wait = 60  # seconds
            waited = 0
            
            while waited < max_wait:
                time.sleep(2)
                waited += 2
                
                status_response = requests.get(
                    f"{API_BASE_URL}/status/{job_id}",
                    headers=headers,
                    timeout=10
                )
                
                if status_response.status_code == 200:
                    status = status_response.json()
                    
                    if status['status'] == 'completed':
                        # Get results
                        results_response = requests.get(
                            f"{API_BASE_URL}/results/{job_id}",
                            headers=headers,
                            timeout=10
                        )
                        
                        if results_response.status_code == 200:
                            results = results_response.json()
                            
                            # Format results
                            result_text = format_results(results['result'])
                            result_json = json.dumps(results['result'], indent=2)
                            
                            # Audio URL if available
                            audio_url = results.get('voice_url')
                            
                            return result_text, result_json, audio_url
                    
                    elif status['status'] == 'failed':
                        return f"Processing failed: {status.get('error', 'Unknown error')}", None, None
            
            return "Processing timed out. Please try again.", None, None
        
        else:
            return f"Upload failed: {response.text}", None, None
    
    except Exception as e:
        return f"Error: {str(e)}", None, None

def format_results(result):
    \"\"\"Format JSON results as readable text\"\"\"
    return f\"\"\"Invoice Analysis Results:
========================
Vendor: {result.get('vendor', 'Unknown')}
Invoice #: {result.get('invoice_number', 'Unknown')}
Date: {result.get('date', 'Unknown')}
Total: {result.get('currency', '')} {result.get('total_amount', 0)}

Line Items:
{chr(10).join([f"- {item['description']}: {item['total']}" for item in result.get('line_items', [])])}
\"\"\"

# Create Gradio interface
with gr.Blocks(title="Invoice Processing Agent") as demo:
    gr.Markdown(\"\"\"# 📄 Invoice Processing Agent
    
    Upload an invoice image to extract structured data using our multimodal AI agent.
    
    **Features:**
    - 🔍 Automatic text extraction with OCR
    - 🧠 Smart data structuring with LLM
    - 🎵 Optional voice response
    - ⚡ Real-time processing updates
    \"\"\")
    
    with gr.Row():
        with gr.Column():
            image_input = gr.Image(
                type="pil", 
                label="📤 Upload Invoice Image",
                height=300
            )
            
            instructions_input = gr.Textbox(
                label="📝 Processing Instructions",
                value="Extract all invoice data including vendor, amounts, dates, and line items",
                lines=3
            )
            
            voice_checkbox = gr.Checkbox(
                label="🎵 Generate Voice Response",
                value=False
            )
            
            process_btn = gr.Button(
                "🚀 Process Invoice", 
                variant="primary",
                size="lg"
            )
        
        with gr.Column():
            result_text = gr.Textbox(
                label="📋 Extracted Data (Formatted)",
                lines=10,
                max_lines=20
            )
            
            result_json = gr.JSON(
                label="🔧 Raw JSON Output"
            )
            
            audio_output = gr.Audio(
                label="🎵 Voice Response",
                visible=False
            )
    
    # Event handlers
    process_btn.click(
        fn=process_invoice_gradio,
        inputs=[image_input, instructions_input, voice_checkbox],
        outputs=[result_text, result_json, audio_output]
    )
    
    # Show audio player when voice is enabled
    voice_checkbox.change(
        fn=lambda x: gr.update(visible=x),
        inputs=[voice_checkbox],
        outputs=[audio_output]
    )
    
    gr.Markdown(\"\"\"### 📚 How to Use
    
    1. **Upload** an invoice image (PNG or JPG)
    2. **Customize** processing instructions if needed
    3. **Enable** voice response for audio output
    4. **Click** "Process Invoice" and wait for results
    
    ### 🛠️ Technical Details
    
    - **Backend**: FastAPI with LangGraph agents
    - **Vision**: Multiple OCR engines + LLM analysis
    - **Voice**: Cartesia TTS for natural speech
    - **Deployment**: Docker container on Hugging Face Spaces
    \"\"\")

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
""".strip()

# Write app.py
with open('/tmp/app.py', 'w') as f:
    f.write(spaces_app_content)

print("🎨 Gradio app.py created for Hugging Face Spaces")

In [None]:
# TODO: Create README.md for Hugging Face Spaces
readme_content = """
---
title: Invoice Processing Agent API
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
---

# 📄 Invoice Processing Agent API

A production-ready multimodal AI agent for invoice processing with voice support.

## 🚀 Features

- **📤 File Upload**: Support for PNG, JPG, and PDF invoices
- **🔍 Smart Extraction**: Advanced OCR + LLM analysis
- **⚡ Real-time Streaming**: Server-Sent Events for live updates
- **🔐 Authentication**: Secure API key-based access
- **🎵 Voice Support**: Optional TTS responses
- **📊 Monitoring**: Health checks and metrics
- **🐳 Containerized**: Docker-ready for any deployment

## 🛠️ Technology Stack

- **Framework**: FastAPI with async processing
- **AI Pipeline**: LangGraph for document workflows
- **Vision**: Multiple OCR engines (Tesseract, EasyOCR)
- **Language Model**: Qwen3 8B via Ollama
- **Voice**: Cartesia TTS + Deepgram STT
- **Deployment**: Docker on Hugging Face Spaces

## 📚 API Documentation

Once deployed, visit `/docs` for interactive Swagger documentation.

### Key Endpoints

- `POST /process/invoice` - Upload and process invoice
- `GET /stream/{job_id}` - Real-time processing updates
- `GET /results/{job_id}` - Get final results
- `GET /health` - Service health check

### Example Usage

```bash
# Upload invoice for processing
curl -X POST "https://your-space.hf.space/process/invoice" \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@invoice.jpg" \
  -F 'request={"instructions":"Extract vendor and total"}'

# Stream real-time updates
curl -N "https://your-space.hf.space/stream/job-id" \
  -H "Authorization: Bearer your-api-key" \
  -H "Accept: text/event-stream"
```

## 🔧 Local Development

```bash
# Clone and setup
git clone <your-repo>
cd invoice-agent-api

# Run with Docker Compose
docker-compose up --build

# Or run directly
pip install -r requirements.txt
uvicorn main:app --reload
```

## 🔐 Security

- API key authentication required
- Rate limiting per user tier
- File type and size validation
- Non-root container execution

## 📊 Monitoring

- Health checks at `/health` and `/health/detailed`
- Prometheus metrics at `/metrics`
- Job tracking and user statistics

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details.
""".strip()

# Write README.md
with open('/tmp/README.md', 'w') as f:
    f.write(readme_content)

print("📖 README.md created for Hugging Face Spaces")

In [None]:
# TODO: Show deployment instructions
deployment_instructions = """
🚀 DEPLOYMENT INSTRUCTIONS
==========================

### Option 1: Hugging Face Spaces (Recommended)

1. **Create new Space**:
   - Go to https://huggingface.co/new-space
   - Choose "Docker" as SDK
   - Set app_port: 8000 in the README.md metadata block

2. **Upload files**:
   - Dockerfile
   - requirements.txt
   - app.py (Gradio wrapper)
   - main.py (your FastAPI code)
   - README.md

3. **Set variables and secrets** (Space Settings → Variables and secrets):
   - OLLAMA_URL=your_ollama_server
   - API_TOKEN=your_token
   - CARTESIA_API_KEY=your_cartesia_key
   - DEEPGRAM_API_KEY=your_deepgram_key

### Option 2: Railway (Fast deployment)

1. Connect GitHub repo to Railway
2. Railway auto-detects Dockerfile
3. Set environment variables
4. Deploy automatically

### Option 3: Google Cloud Run

```bash
# Build and deploy
gcloud builds submit --tag gcr.io/PROJECT_ID/invoice-api
gcloud run deploy --image gcr.io/PROJECT_ID/invoice-api --platform managed
```

### Option 4: Local Docker

```bash
# Build image
docker build -t invoice-api .

# Run container
docker run -p 8000:8000 \
  -e OLLAMA_URL=http://your-server \
  -e API_TOKEN=your-token \
  invoice-api
```

### Testing Your Deployment

1. **Health check**: GET /health
2. **API docs**: Visit /docs
3. **Upload test**: POST /process/invoice
4. **Monitor**: GET /metrics

🎉 Your API is now live and ready for production!
"""

print(deployment_instructions)

## 🎯 Lab Completion & Testing

### ✅ Assessment Checklist

**Core Functionality:**
- [ ] FastAPI application starts without errors
- [ ] File upload endpoint accepts images and PDFs
- [ ] Authentication blocks invalid API keys
- [ ] Rate limiting prevents abuse
- [ ] Async processing returns job IDs
- [ ] Streaming endpoint provides real-time updates
- [ ] Results endpoint returns structured data

**Production Readiness:**
- [ ] Health checks return 200 status
- [ ] Metrics endpoint provides monitoring data
- [ ] Docker container builds successfully
- [ ] Application handles errors gracefully
- [ ] API documentation is accessible at `/docs`

**Advanced Features:**
- [ ] Voice synthesis integration works
- [ ] Job cancellation functions
- [ ] User job listing works
- [ ] Detailed health checks all components

### 🧪 Testing Your API

Run these commands to test your deployment:

In [None]:
# TODO: Test the API endpoints (if running locally)
# HINT: Uncomment and run these tests if your API is running

import requests
import time

# Configuration
BASE_URL = "http://localhost:8000"  # Change to your deployed URL
TEST_API_KEY = "demo-key-123"
headers = {"Authorization": f"Bearer {TEST_API_KEY}"}

def test_health_check():
    """Test basic health endpoint"""
    try:
        response = requests.get(f"{BASE_URL}/health", timeout=5)
        response.raise_for_status()  # Treat non-2xx responses as failures
        print(f"✅ Health check: {response.status_code} - {response.json()['status']}")
        return True
    except Exception as e:
        print(f"❌ Health check failed: {e}")
        return False

def test_authentication():
    """Test authentication with invalid key"""
    try:
        bad_headers = {"Authorization": "Bearer invalid-key"}
        response = requests.get(f"{BASE_URL}/status/test", headers=bad_headers)
        if response.status_code == 403:
            print("✅ Authentication: Correctly rejects invalid keys")
            return True
        else:
            print(f"❌ Authentication: Expected 403, got {response.status_code}")
            return False
    except Exception as e:
        print(f"❌ Authentication test failed: {e}")
        return False

def test_file_upload():
    """Test file upload with mock image"""
    try:
        # Create a small test image
        from PIL import Image
        import io
        
        # Create test image
        img = Image.new('RGB', (100, 100), color='white')
        img_bytes = io.BytesIO()
        img.save(img_bytes, format='PNG')
        img_bytes.seek(0)
        
        # Upload test
        files = {'file': ('test.png', img_bytes, 'image/png')}
        data = {'instructions': 'Test processing'}
        
        response = requests.post(
            f"{BASE_URL}/process/invoice",
            files=files,
            data=data,
            headers=headers
        )
        
        if response.status_code == 200:
            result = response.json()
            print(f"✅ File upload: Job {result['job_id']} created")
            return result['job_id']
        else:
            print(f"❌ File upload failed: {response.status_code} - {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ File upload test failed: {e}")
        return None

def test_job_status(job_id):
    """Test job status endpoint"""
    try:
        response = requests.get(f"{BASE_URL}/status/{job_id}", headers=headers)
        if response.status_code == 200:
            status = response.json()
            print(f"✅ Job status: {status['status']} ({status['progress']}%)")
            return True
        else:
            print(f"❌ Job status failed: {response.status_code}")
            return False
    except Exception as e:
        print(f"❌ Job status test failed: {e}")
        return False

# Run tests if API is available
print("🧪 Starting API Tests...")
print("\n⚠️  Note: These tests require the API to be running locally")
print("   Start your API first: uvicorn main:app --reload\n")

# Uncomment these lines to run tests:
# if test_health_check():
#     test_authentication()
#     job_id = test_file_upload()
#     if job_id:
#         time.sleep(2)
#         test_job_status(job_id)

print("\n🎉 Lab completed! Your production API is ready to deploy.")

## 🏆 Congratulations!

You've successfully built a **production-ready Invoice Processing API** with:

### 🎯 What You Accomplished
- ✅ **FastAPI Backend** with async processing
- ✅ **Authentication & Security** with API keys and rate limiting
- ✅ **File Upload & Processing** with validation and job management
- ✅ **Real-time Streaming** with Server-Sent Events
- ✅ **Voice Integration** with Cartesia TTS and Deepgram STT
- ✅ **Health Monitoring** with metrics and diagnostics
- ✅ **Docker Containerization** for consistent deployment
- ✅ **Cloud Deployment** ready for Hugging Face Spaces

### 🚀 Key Skills Learned
1. **Production API Design** - REST patterns, async processing, error handling
2. **Security Implementation** - Authentication, rate limiting, input validation
3. **Real-time Communication** - Server-Sent Events for live updates
4. **Containerization** - Docker best practices for AI applications
5. **Cloud Deployment** - Platform-agnostic deployment strategies
6. **Monitoring & Observability** - Health checks, metrics, logging

### 🔄 Next Steps
- **Deploy** your API to Hugging Face Spaces or your preferred platform
- **Integrate** with a frontend application (React, Vue, or mobile app)
- **Scale** with Redis for session management and job queuing
- **Monitor** with Prometheus/Grafana for production insights
- **Enhance** with additional AI capabilities (document classification, fraud detection)

### 💡 Real-World Applications
This pattern applies to any AI service:
- **Document Processing** (contracts, receipts, forms)
- **Image Analysis** (medical scans, quality inspection)
- **Voice Assistants** (customer service, accessibility)
- **Content Generation** (reports, summaries, translations)

**You now have the skills to ship AI agents to production!** 🎉

---

## 📚 Additional Resources

- **FastAPI Documentation**: https://fastapi.tiangolo.com/
- **Docker Best Practices**: https://docs.docker.com/develop/dev-best-practices/
- **Hugging Face Spaces**: https://huggingface.co/docs/hub/spaces
- **Server-Sent Events**: https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events
- **API Security**: https://owasp.org/www-project-api-security/

## 🏃‍♂️ Bonus Challenges (For Fast Finishers)

### Challenge 1: Add WebSocket Support
Implement bidirectional communication for real-time voice chat with the invoice agent.

### Challenge 2: Implement Caching
Add Redis caching for processed invoices to avoid reprocessing identical documents.
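
One possible starting point is to key the cache on a hash of the uploaded bytes plus the instructions, so identical uploads map to the same entry. In the sketch below a plain dict stands in for Redis, and `cache_key`, `process_with_cache`, and `fake_process` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(file_bytes: bytes, instructions: str) -> str:
    """Derive a stable key from the file content and the instructions."""
    digest = hashlib.sha256()
    digest.update(file_bytes)
    digest.update(b"\x00")  # Separator so content and instructions cannot blur together
    digest.update(instructions.encode("utf-8"))
    return f"invoice:{digest.hexdigest()}"


def process_with_cache(
    file_bytes: bytes,
    instructions: str,
    process: Callable[[bytes, str], dict],
    cache: Dict[str, str],
) -> dict:
    """Return a cached result when available, otherwise process and store it."""
    key = cache_key(file_bytes, instructions)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = process(file_bytes, instructions)
    cache[key] = json.dumps(result)  # Stored as JSON text, as it would be in Redis
    return result


calls = []


def fake_process(file_bytes: bytes, instructions: str) -> dict:
    """Stand-in for the real agent; records each call it receives."""
    calls.append(len(file_bytes))
    return {"vendor": "ACME", "total_amount": 42}


store: Dict[str, str] = {}
first = process_with_cache(b"same-bytes", "extract totals", fake_process, store)
second = process_with_cache(b"same-bytes", "extract totals", fake_process, store)
print(first == second, len(calls))
```

With redis-py, the dict operations would become `client.get(key)` and `client.set(key, value, ex=ttl_seconds)`, where the expiry keeps stale results from living forever.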

### Challenge 3: Add Batch Processing
Create an endpoint that accepts multiple invoices and processes them in parallel.
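
A minimal sketch of the parallel part, assuming each invoice is handled by an async worker: a semaphore caps how many run at once, and one failure does not abort the rest of the batch. `process_batch` and `fake_worker` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def process_batch(
    items: List[str],
    worker: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 4,
) -> List[Dict[str, Any]]:
    """Process items concurrently, returning one result dict per item, in order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: str) -> Dict[str, Any]:
        async with semaphore:  # At most max_concurrency workers run at a time
            try:
                return {"ok": True, "result": await worker(item)}
            except Exception as exc:
                return {"ok": False, "error": str(exc)}

    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_worker(filename: str) -> str:
    """Stand-in for the invoice agent; fails on one specific file."""
    await asyncio.sleep(0)
    if filename == "bad.png":
        raise ValueError("unreadable image")
    return filename.upper()


batch = asyncio.run(process_batch(["a.png", "bad.png", "c.png"], fake_worker, max_concurrency=2))
print(batch)
```

Capping concurrency matters here because every invoice ultimately hits the same LLM backend, and an unbounded fan-out would just move the queue there.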

### Challenge 4: Build a Frontend
Create a React/Vue.js frontend that consumes your API with drag-and-drop file upload.

### Challenge 5: Add Database Integration
Replace in-memory job storage with PostgreSQL or MongoDB for persistence.

### Challenge 6: Implement A/B Testing
Add multiple processing pipelines and test which performs better for different invoice types.

**Time remaining? Pick a challenge and level up your production skills!** 🚀