# 🔧 Week 5: Backend Development with FastAPI

This notebook covers building production-ready ML APIs with FastAPI.

## Table of Contents
1. [FastAPI Fundamentals](#1-fastapi-fundamentals)
2. [Request/Response Models](#2-requestresponse-models)
3. [Model Serving](#3-model-serving)
4. [Async Processing](#4-async-processing)
5. [Error Handling](#5-error-handling)
6. [API Best Practices](#6-api-best-practices)

---

## 1. FastAPI Fundamentals

### 1.1 Why FastAPI?

| Feature | Benefit |
|---------|--------|
| **Fast** | Built on Starlette (ASGI) and typically served with Uvicorn; among the faster Python web frameworks |
| **Type hints** | Automatic validation and documentation |
| **Async support** | Native async/await for I/O-bound operations |
| **OpenAPI** | Auto-generated Swagger/ReDoc documentation |
| **Pydantic** | Data validation with Python type hints |

### 1.2 Basic Application Structure

In [None]:
# Basic FastAPI application structure
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import uvicorn

# Create app instance
app = FastAPI(
    title="ML API",
    description="Machine Learning Model Serving API",
    version="1.0.0"
)

# Health check endpoint
@app.get("/health")
async def health_check():
    """Check if API is running."""
    return {"status": "healthy"}

# Root endpoint
@app.get("/")
async def root():
    return {"message": "Welcome to the ML API"}

print("✅ Basic FastAPI app structure defined!")
print("Run with: uvicorn app:app --reload")  # assumes this code is saved as app.py

---

## 2. Request/Response Models

### 2.1 Pydantic Models

Pydantic models define the schema for request/response data with automatic validation.

In [None]:
from pydantic import BaseModel, Field, validator
from typing import List, Optional
from enum import Enum

# Enum for model types
class ModelType(str, Enum):
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    EMBEDDING = "embedding"

# Request model for predictions
class PredictionRequest(BaseModel):
    """
    Request schema for ML predictions.
    
    Pydantic automatically validates:
    - Type correctness
    - Required fields
    - Value constraints
    """
    text: str = Field(..., min_length=1, description="Input text for prediction")
    model_type: ModelType = Field(default=ModelType.CLASSIFICATION)
    top_k: int = Field(default=5, ge=1, le=100, description="Number of results")
    include_confidence: bool = Field(default=True)
    
    @validator('text')
    def text_not_empty(cls, v):
        if not v.strip():
            raise ValueError('Text cannot be empty or whitespace only')
        return v.strip()
    
    class Config:
        schema_extra = {
            "example": {
                "text": "This product is amazing!",
                "model_type": "classification",
                "top_k": 3
            }
        }

# Response model
class PredictionResult(BaseModel):
    label: str
    confidence: float = Field(..., ge=0, le=1)

class PredictionResponse(BaseModel):
    """Response schema for ML predictions."""
    request_id: str
    predictions: List[PredictionResult]
    model_version: str
    processing_time_ms: float

# Test validation
try:
    valid_request = PredictionRequest(text="Hello world", top_k=3)
    print(f"✅ Valid request: {valid_request}")
except Exception as e:
    print(f"❌ Validation error: {e}")

try:
    invalid_request = PredictionRequest(text="", top_k=200)
except Exception as e:
    print(f"✅ Caught invalid request: {e}")
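
When validation fails, Pydantic raises a `ValidationError` whose `errors()` method returns one dictionary per failing field. The sketch below shows one way to pull out the offending field names. `TopKRequest` and `failed_fields` are hypothetical names introduced only for this illustration, and the model is a trimmed-down stand-in for `PredictionRequest`.

```python
from pydantic import BaseModel, Field, ValidationError


class TopKRequest(BaseModel):
    """Hypothetical trimmed-down request model, for illustration only."""
    text: str = Field(..., min_length=1)
    top_k: int = Field(default=5, ge=1, le=100)


def failed_fields(payload: dict) -> list:
    """Return the sorted names of the fields that failed validation."""
    try:
        TopKRequest(**payload)
    except ValidationError as exc:
        # Each entry of errors() is a dict with "loc", "msg" and "type" keys.
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


bad_fields = failed_fields({"text": "", "top_k": 200})
ok_fields = failed_fields({"text": "Hello world", "top_k": 3})
print(bad_fields, ok_fields)
```

Both constraint violations are reported together rather than one at a time, which is what lets FastAPI return a single response listing every problem with a request.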

### 2.2 API Endpoint with Models

In [None]:
import uuid
import time

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    """
    Generate predictions for input text.
    
    - **text**: Input text to classify
    - **model_type**: Type of model to use
    - **top_k**: Number of predictions to return
    """
    start_time = time.time()
    
    # Simulate prediction (replace with actual model)
    predictions = [
        PredictionResult(label="positive", confidence=0.85),
        PredictionResult(label="neutral", confidence=0.10),
        PredictionResult(label="negative", confidence=0.05),
    ][:request.top_k]
    
    processing_time = (time.time() - start_time) * 1000
    
    return PredictionResponse(
        request_id=str(uuid.uuid4()),
        predictions=predictions,
        model_version="1.0.0",
        processing_time_ms=processing_time
    )

print("✅ Prediction endpoint defined!")

---

## 3. Model Serving

### 3.1 Loading Models at Startup

Use lifespan events to load models once at startup.

In [None]:
from contextlib import asynccontextmanager

# Global model registry
class ModelRegistry:
    """Singleton to hold loaded models."""
    models: dict = {}
    
    @classmethod
    def load_model(cls, name: str, model):
        cls.models[name] = model
        print(f"Loaded model: {name}")
    
    @classmethod
    def get_model(cls, name: str):
        return cls.models.get(name)

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Lifespan context manager for startup/shutdown."""
    # Startup: Load models
    print("🚀 Starting up...")
    
    # Load your ML models here
    # ModelRegistry.load_model("classifier", load_classifier())
    # ModelRegistry.load_model("embedder", load_embedder())
    
    yield  # Application runs here
    
    # Shutdown: Cleanup
    print("👋 Shutting down...")
    ModelRegistry.models.clear()

# Create app with lifespan
app_with_lifespan = FastAPI(lifespan=lifespan)

print("✅ Lifespan management configured!")
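
The ordering that a lifespan context manager guarantees can be observed without starting a server. The following standard-library sketch drives a similar context manager by hand. `lifespan_sketch`, `registry` and `events` are hypothetical names, and the `async with` block stands in for the period during which FastAPI would be serving requests.

```python
import asyncio
from contextlib import asynccontextmanager

events = []    # records the order in which things happen
registry = {}  # hypothetical stand-in for ModelRegistry.models


@asynccontextmanager
async def lifespan_sketch():
    # Startup: runs once, before anything inside the `async with` body.
    registry["classifier"] = object()  # placeholder for a real model
    events.append("startup")
    try:
        yield
    finally:
        # Shutdown: runs once on exit, even if the body raised.
        registry.clear()
        events.append("shutdown")


async def serve():
    async with lifespan_sketch():
        # The application would handle requests here.
        events.append(f"serving with {len(registry)} model(s)")


asyncio.run(serve())
print(events)
```

Inside a notebook, where an event loop is already running, `await serve()` would replace `asyncio.run(serve())`. Wrapping the `yield` in `try`/`finally` is a design choice that keeps cleanup running even when the body fails.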

### 3.2 Dependency Injection for Models

In [None]:
from fastapi import Depends

# Dependency to get model
def get_classifier():
    """Dependency injection for classifier model."""
    model = ModelRegistry.get_model("classifier")
    if model is None:
        raise HTTPException(
            status_code=503,
            detail="Model not loaded"
        )
    return model

# Use dependency in endpoint
@app.post("/classify")
async def classify_text(
    request: PredictionRequest,
    model = Depends(get_classifier)
):
    """
    Classify text using injected model.
    
    Dependencies are resolved automatically by FastAPI.
    """
    # In real code: result = model.predict(request.text)
    return {"result": "predicted"}

print("✅ Dependency injection configured!")

---

## 4. Async Processing

### 4.1 Background Tasks

In [None]:
from fastapi import BackgroundTasks

def log_prediction(request_id: str, text: str, result: dict):
    """Background task to log predictions."""
    # In real code: write to database or log file
    print(f"Logged prediction {request_id}: {text[:50]}...")

@app.post("/predict_async")
async def predict_with_logging(
    request: PredictionRequest,
    background_tasks: BackgroundTasks
):
    """Return a prediction and log it in a background task."""
    request_id = str(uuid.uuid4())
    
    # Make prediction
    result = {"label": "positive", "confidence": 0.9}
    
    # Add logging as background task (doesn't block response)
    background_tasks.add_task(
        log_prediction, 
        request_id, 
        request.text, 
        result
    )
    
    return {"request_id": request_id, "result": result}

print("✅ Background tasks configured!")

### 4.2 Batch Processing

In [None]:
from typing import List

class BatchPredictionRequest(BaseModel):
    texts: List[str] = Field(..., min_items=1, max_items=100)
    model_type: ModelType = ModelType.CLASSIFICATION

class BatchPredictionResponse(BaseModel):
    request_id: str
    results: List[PredictionResult]
    total_items: int
    processing_time_ms: float

@app.post("/predict/batch", response_model=BatchPredictionResponse)
async def predict_batch(request: BatchPredictionRequest):
    """
    Process multiple predictions in a batch.
    
    More efficient than individual requests for multiple items.
    """
    start_time = time.time()
    
    # Batch process (replace with actual model batch inference)
    results = [
        PredictionResult(label="positive", confidence=min(0.8 + i * 0.01, 1.0))  # clamp: confidence must stay <= 1
        for i, _ in enumerate(request.texts)
    ]
    
    processing_time = (time.time() - start_time) * 1000
    
    return BatchPredictionResponse(
        request_id=str(uuid.uuid4()),
        results=results,
        total_items=len(request.texts),
        processing_time_ms=processing_time
    )

print("✅ Batch processing configured!")
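
Real batch inference is usually a blocking call, so calling it directly inside an `async def` endpoint would stall the event loop. One possible approach, sketched below with the standard library only, is to split the input into mini-batches and run each one in a worker thread with `asyncio.to_thread` (Python 3.9+). `chunked`, `fake_batch_predict` and `predict_in_batches` are hypothetical helpers, and the "model" is a toy keyword rule.

```python
import asyncio
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_batch_predict(batch: List[str]) -> List[str]:
    """Hypothetical blocking model call; returns one label per input text."""
    return ["positive" if "good" in text else "negative" for text in batch]


async def predict_in_batches(texts: List[str], batch_size: int = 2) -> List[str]:
    labels: List[str] = []
    for batch in chunked(texts, batch_size):
        # Run the blocking call in a worker thread so the event loop stays free.
        labels.extend(await asyncio.to_thread(fake_batch_predict, batch))
    return labels


labels = asyncio.run(
    predict_in_batches(["good", "bad", "so good", "meh", "good day"])
)
print(labels)
```

Inside a notebook, `await predict_in_batches(...)` would replace the `asyncio.run(...)` call. The batch size is a tuning knob: larger batches amortize per-call overhead, smaller ones bound memory use and latency.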

---

## 5. Error Handling

### 5.1 Custom Exception Handlers

In [None]:
from fastapi import Request
from fastapi.responses import JSONResponse

# Custom exception
class ModelError(Exception):
    """Custom exception for model errors."""
    def __init__(self, message: str, model_name: str):
        super().__init__(message)  # so str(exc) returns the message
        self.message = message
        self.model_name = model_name

# Exception handler
@app.exception_handler(ModelError)
async def model_error_handler(request: Request, exc: ModelError):
    return JSONResponse(
        status_code=500,
        content={
            "error": "model_error",
            "message": exc.message,
            "model": exc.model_name,
            "path": str(request.url)
        }
    )

# Global exception handler
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={
            "error": "internal_error",
            "message": "An unexpected error occurred",
            "detail": str(exc) if app.debug else None
        }
    )

print("✅ Exception handlers configured!")

---

## 6. API Best Practices

### 6.1 Rate Limiting

In [None]:
from collections import defaultdict
from datetime import datetime, timedelta

class RateLimiter:
    """Simple in-memory rate limiter."""
    def __init__(self, max_requests: int = 100, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)
    
    def is_allowed(self, client_id: str) -> bool:
        now = datetime.now()
        window_start = now - timedelta(seconds=self.window_seconds)
        
        # Remove old requests
        self.requests[client_id] = [
            t for t in self.requests[client_id] if t > window_start
        ]
        
        if len(self.requests[client_id]) >= self.max_requests:
            return False
        
        self.requests[client_id].append(now)
        return True

rate_limiter = RateLimiter(max_requests=100, window_seconds=60)

# Middleware for rate limiting
from fastapi import Request
from fastapi.responses import JSONResponse

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
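    # request.client can be None (e.g. under some test clients or proxies)
    client_ip = request.client.host if request.client else "unknown"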
    
    if not rate_limiter.is_allowed(client_ip):
        return JSONResponse(
            status_code=429,
            content={"error": "Rate limit exceeded"}
        )
    
    return await call_next(request)

print("✅ Rate limiting configured!")
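
Sliding-window behavior is easiest to verify when the clock can be controlled. The variant below is a sketch, not a replacement for the class above: it swaps `datetime.now()` for an injectable clock (defaulting to `time.monotonic`) and uses a `deque` per client. `SlidingWindowLimiter` and `fake_now` are hypothetical names.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """In-memory sliding-window limiter with an injectable clock."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def is_allowed(self, client_id: str) -> bool:
        now = self.clock()
        hits = self.hits[client_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Fake clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(
    max_requests=2, window_seconds=60, clock=lambda: fake_now[0]
)

first_three = [limiter.is_allowed("10.0.0.1") for _ in range(3)]
other_client = limiter.is_allowed("10.0.0.2")  # tracked independently
fake_now[0] = 61.0                             # move past the window
after_window = limiter.is_allowed("10.0.0.1")
print(first_three, other_client, after_window)
```

Like any in-memory limiter, this keeps state per process, so several worker processes would each enforce their own limit. A shared store is the usual remedy when that matters.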

### 6.2 API Versioning

In [None]:
from fastapi import APIRouter

# Create versioned routers
router_v1 = APIRouter(prefix="/api/v1", tags=["v1"])
router_v2 = APIRouter(prefix="/api/v2", tags=["v2"])

@router_v1.get("/models")
async def list_models_v1():
    """V1: List available models."""
    return {"models": ["classifier-v1"]}

@router_v2.get("/models")
async def list_models_v2():
    """V2: List available models with metadata."""
    return {
        "models": [
            {"name": "classifier-v2", "version": "2.0", "type": "classification"}
        ]
    }

# Include routers in app
app.include_router(router_v1)
app.include_router(router_v2)

print("✅ API versioning configured!")
print("Endpoints: /api/v1/models and /api/v2/models")

---

## 📝 Summary

### Key Concepts

| Concept | Best Practice |
|---------|---------------|
| **Request Models** | Use Pydantic with validators |
| **Model Loading** | Load at startup with lifespan |
| **Dependencies** | Use Depends() for injection |
| **Background Tasks** | Defer non-critical work (e.g. logging) until after the response is sent |
| **Error Handling** | Custom exception handlers |
| **Rate Limiting** | Protect against abuse |
| **Versioning** | Maintain backward compatibility |

### Production Checklist

- [ ] Health check endpoint
- [ ] Request validation with Pydantic
- [ ] Model loading at startup
- [ ] Proper error handling
- [ ] Rate limiting
- [ ] Logging and monitoring
- [ ] API versioning
- [ ] CORS configuration
- [ ] Authentication (if needed)