# Part IX: Production Deployment

## Chapter 21: Performance Tuning

Deploying to production is only the beginning. To handle real-world traffic efficiently, you must tune concurrency settings, implement strategic caching, and optimize connection pooling. This chapter covers the quantitative adjustments that transform a working application into a high-performance system capable of handling thousands of concurrent requests.

---

### 21.1 Concurrency Settings: Workers, Threads, and Limits

Concurrency tuning balances resource utilization against throughput and latency. The optimal configuration depends on your workload: I/O-bound (database/API calls) vs CPU-bound (calculations, data processing).

#### Understanding Concurrency Models

```
┌──────────────────────────────────────────────────────────────────┐
│                   Concurrency Model Comparison                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Synchronous Workers (sync)                                      │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐                             │
│  │ Worker 1│ │ Worker 2│ │ Worker 3│                             │
│  │ [====]  │ │ [====]  │ │ [====]  │                             │
│  │ Request │ │ Request │ │ Request │                             │
│  │ (blocks)│ │ (blocks)│ │ (blocks)│                             │
│  └─────────┘ └─────────┘ └─────────┘                             │
│                                                                  │
│  • One request per worker at a time                              │
│  • Worker blocked during I/O (database, API calls)               │
│  • Need many workers for I/O-bound workloads                     │
│  • Memory overhead per worker                                    │
│                                                                  │
│  ──────────────────────────────────────────────────────────────  │
│                                                                  │
│  Async Workers (Uvicorn)                                         │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │                        Event Loop                        │    │
│  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐         │    │
│  │  │ Req │ │ Req │ │ Req │ │ Req │ │ Req │ │ Req │         │    │
│  │  │ [==]│ │ [==]│ │ [==]│ │ [==]│ │ [==]│ │ [==]│         │    │
│  │  │await│ │await│ │await│ │await│ │await│ │await│         │    │
│  │  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘         │    │
│  │                                                          │    │
│  │  Single worker handles hundreds of concurrent requests   │    │
│  │  Event loop switches during await (I/O operations)       │    │
│  └──────────────────────────────────────────────────────────┘    │
│                                                                  │
│  • Thousands of concurrent connections per worker                │
│  • Efficient for I/O-bound workloads                             │
│  • Lower memory footprint                                        │
│  • Requires async libraries (asyncpg, httpx, etc.)               │
│                                                                  │
│  ──────────────────────────────────────────────────────────────  │
│                                                                  │
│  Hybrid: Gunicorn + Uvicorn Workers                              │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │                    Gunicorn (Master)                     │    │
│  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐         │    │
│  │  │   Worker 1  │ │   Worker 2  │ │   Worker 3  │         │    │
│  │  │  (Uvicorn)  │ │  (Uvicorn)  │ │  (Uvicorn)  │         │    │
│  │  │  Event Loop │ │  Event Loop │ │  Event Loop │         │    │
│  │  │  + App      │ │  + App      │ │  + App      │         │    │
│  │  └─────────────┘ └─────────────┘ └─────────────┘         │    │
│  └──────────────────────────────────────────────────────────┘    │
│                                                                  │
│  Best of both worlds:                                            │
│  • Gunicorn manages worker processes (spawn, monitor, restart)   │
│  • Uvicorn workers handle async concurrency efficiently          │
│  • Multiple workers utilize all CPU cores                        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
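The "requires async libraries" point is easiest to see by timing it. The following is a minimal sketch using only `asyncio` and `time`; `fake_io_async`, `fake_io_blocking`, and `serve` are hypothetical names, standing in for an awaitable driver call (such as an asyncpg query) and a synchronous driver called from inside a coroutine.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def fake_io_async(delay: float) -> None:
    # Hypothetical stand-in for an awaitable driver call (e.g. an asyncpg query).
    # While this coroutine waits, the event loop is free to run other requests.
    await asyncio.sleep(delay)


async def fake_io_blocking(delay: float) -> None:
    # Hypothetical stand-in for a synchronous driver used inside async code.
    # time.sleep never yields, so the whole event loop (and worker) stalls.
    time.sleep(delay)


async def serve(handler: Callable[[float], Awaitable[None]], n: int, delay: float) -> float:
    """Start n simulated requests at once on one event loop; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    overlapped = asyncio.run(serve(fake_io_async, 50, 0.05))
    serialized = asyncio.run(serve(fake_io_blocking, 10, 0.05))
    print(f"50 awaited waits:  {overlapped:.2f}s")
    print(f"10 blocking waits: {serialized:.2f}s")
```

The 50 awaited waits should finish in roughly the time of a single 0.05 s delay because they overlap, while the 10 blocking waits take at least 0.5 s because each one freezes the loop. When a blocking library cannot be avoided, `await asyncio.to_thread(...)` moves the call off the event loop.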

#### Tuning Formulas and Configuration

```python
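# tuning_config.py - Production tuning guide

"""
Tuning Formulas (starting points; confirm with load tests):

1. Worker Count (Gunicorn):
   - CPU-bound: workers = CPU cores
   - I/O-bound: workers = (2 x CPU cores) + 1
     (async Uvicorn workers often need fewer, since each one overlaps
     I/O on its own event loop)
   - With threads (gthread worker): workers = CPU cores, threads = 2-4 per worker

2. Uvicorn Workers (within Gunicorn):
   - Each Uvicorn worker runs one event loop
   - Handles hundreds of concurrent connections
   - Limited by memory (each worker loads the full app)

3. Database Pool Size:
   - Pools are per process: every worker holds its own pool
   - Worst case: total connections = workers x (pool_size + max_overflow)
   - Keep that total below the database's max_connections
   - Example: 4 workers x (pool_size 5 + max_overflow 5) = 40 connections

4. Connection Limits:
   - Nginx worker_connections: 1024-4096 per worker
   - System file descriptors: ulimit -n 65535
"""

# gunicorn_prod.conf.py - Production tuning
import multiprocessing
import os

# Server socket
bind = "0.0.0.0:8000"

# Worker configuration
# cpu_count() reports the host's CPUs, which can exceed a container's CPU quota,
# so set GUNICORN_WORKERS explicitly when running in containers.
workers = int(os.getenv("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))
# Recent Uvicorn releases deprecate uvicorn.workers in favor of the separate
# uvicorn-worker package ("uvicorn_worker.UvicornWorker").
worker_class = "uvicorn.workers.UvicornWorker"
# Only affects the gthread, eventlet, and gevent workers; UvicornWorker ignores it.
worker_connections = 1000

# Thread configuration (for the gthread worker, not UvicornWorker)
# threads = 4

# Worker lifecycle
max_requests = 10000  # Recycle a worker after this many requests (limits memory leaks)
max_requests_jitter = 100  # Add 0-100 random requests per worker so they don't all restart at once
timeout = 120  # Seconds a worker may stay silent before the master kills and replaces it
graceful_timeout = 30  # Seconds a worker gets to finish in-flight requests on shutdown
keepalive = 5  # Seconds to hold an idle keep-alive connection open

# Preload application (saves memory with copy-on-write).
# Create database pools and HTTP clients in the app's startup/lifespan handler,
# not at import time, so each forked worker opens its own connections.
preload_app = True

# Logging
accesslog = "-"  # stdout
errorlog = "-"   # stderr
loglevel = os.getenv("LOG_LEVEL", "info")
# UvicornWorker emits its own access-log lines, so this format may not be applied.
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process naming
proc_name = "fastapi_prod"

# Server mechanics
daemon = False
pidfile = "/tmp/gunicorn.pid"

# Proxy headers: TLS ends at Nginx or the load balancer, so trust its X-Forwarded-*
# headers. "*" is only safe when the app is reachable solely through that proxy.
forwarded_allow_ips = "*"

# Keep worker heartbeat files on tmpfs (avoids stalls when /tmp is disk-backed, e.g. in Docker)
worker_tmp_dir = "/dev/shm"


# Server hooks
def on_starting(server):
    """Called just before the master process is initialized."""
    server.log.info("Starting Gunicorn with %s workers", server.cfg.workers)


def when_ready(server):
    """Called just after the server is started (listening sockets are bound)."""
    server.log.info("Gunicorn is ready to accept connections")


def worker_int(worker):
    """Called when a worker handles SIGINT or SIGQUIT."""
    worker.log.info("Worker %s interrupted", worker.pid)


def on_exit(server):
    """Called just before the master process exits."""
    server.log.info("Gunicorn shutting down")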
```
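The worker and pool formulas are easy to misapply by hand, mainly because SQLAlchemy-style connection pools live inside each worker process rather than being shared. The sketch below turns them into plain functions; `suggested_workers`, `total_db_connections`, `fits_db_limit`, and the `reserved` headroom parameter are hypothetical names, and their results are starting points for load testing, not tuned values.

```python
import os


def suggested_workers(cores: int, workload: str) -> int:
    """Starting-point Gunicorn worker count for a 'cpu' or 'io' workload."""
    if workload == "cpu":
        return cores
    if workload == "io":
        return 2 * cores + 1
    raise ValueError(f"unknown workload: {workload!r}")


def total_db_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst case: every worker process fills its own pool plus its overflow."""
    return workers * (pool_size + max_overflow)


def fits_db_limit(workers: int, pool_size: int, max_overflow: int,
                  db_max_connections: int, reserved: int = 5) -> bool:
    """True if the worst case stays under the server limit while keeping
    `reserved` connections free (hypothetical headroom for migrations,
    admin sessions, and monitoring)."""
    worst_case = total_db_connections(workers, pool_size, max_overflow)
    return worst_case <= db_max_connections - reserved


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    workers = suggested_workers(cores, "io")
    # pool_size=5 and max_overflow=10 are SQLAlchemy's QueuePool defaults;
    # 100 is PostgreSQL's default max_connections.
    print(workers, total_db_connections(workers, 5, 10), fits_db_limit(workers, 5, 10, 100))
```

For example, 4 cores give 9 I/O-bound workers; with SQLAlchemy's default pool (5 connections plus 10 overflow) that is 9 x 15 = 135 possible connections, already above PostgreSQL's default limit of 100, so either the per-worker pool or the worker count has to shrink.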

---

### 21.2 Reverse Proxies: Nginx Configuration for FastAPI

Nginx serves as the entry point to your application, handling SSL termination, static files, load balancing, and protection against slow clients.

#### Complete Nginx Production Config

```nginx
# /etc/nginx/nginx.conf - Main configuration
user nginx;
worker_processes auto;  # Auto-detect CPU cores
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

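# Load modules (headers-more is a third-party module that must be installed
# separately; it provides the more_clear_headers directive)
load_module modules/ngx_http_headers_more_filter_module.so;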

events {
    worker_connections 4096;  # Per worker
    use epoll;  # Linux-specific, efficient
    multi_accept on;
}

http {
    # MIME types
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging format with detailed timing
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'rt=$request_time uct="$upstream_connect_time" '
                    'uht="$upstream_header_time" urt="$upstream_response_time"';

    access_log /var/log/nginx/access.log main;

    # Performance tuning
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;  # Hide nginx version

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_min_length 1000;
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/json
        application/javascript
        application/xml+rss
        application/rss+xml
        font/truetype
        font/opentype
        application/vnd.ms-fontobject
        image/svg+xml;

    # Rate limiting zones
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Send "Connection: upgrade" only on WebSocket handshakes; for ordinary
    # requests the empty value drops the header so upstream keepalive works
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      '';
    }

    # Upstream (Gunicorn backend)
    upstream fastapi_app {
        least_conn;  # Load balancing method

        # Multiple app instances for horizontal scaling
        server app:8000 weight=5;
        # server app2:8000 weight=5;  # If running multiple containers

        # Idle backend connections cached per Nginx worker; requires
        # proxy_http_version 1.1 and a cleared Connection header in each location
        keepalive 32;
    }

    # HTTP to HTTPS redirect
    server {
        listen 80;
        server_name api.example.com;
        return 301 https://$server_name$request_uri;
    }

    # HTTPS Server
    server {
        listen 443 ssl http2;  # nginx 1.25.1+: use "listen 443 ssl;" plus "http2 on;"
        server_name api.example.com;

        # SSL certificates (Let's Encrypt or commercial)
        ssl_certificate /etc/nginx/ssl/fullchain.pem;
        ssl_certificate_key /etc/nginx/ssl/privkey.pem;
        ssl_trusted_certificate /etc/nginx/ssl/chain.pem;

        # Modern SSL configuration
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_prefer_server_ciphers off;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
        ssl_session_timeout 1d;
        ssl_session_cache shared:SSL:50m;
        ssl_stapling on;
        ssl_stapling_verify on;

        # Security headers
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;
        # Legacy XSS filter; current guidance (e.g. OWASP) is to disable it and rely on CSP
        add_header X-XSS-Protection "0" always;
        add_header Referrer-Policy "strict-origin-when-cross-origin" always;
        add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

        # Remove the Server header entirely (requires the headers-more module)
        more_clear_headers Server;

        # Static files, served directly by Nginx instead of through FastAPI
        location /static {
            alias /var/www/static;
            expires 1y;
            # Any add_header here stops this location from inheriting the
            # server-level security headers above; repeat them if needed
            add_header Cache-Control "public, immutable";
            access_log off;
        }

        # Health check (bypass rate limiting)
        location /health {
            access_log off;
            proxy_pass http://fastapi_app;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # Main API location
        location / {
            # Rate limiting
            limit_req zone=api burst=20 nodelay;
            limit_conn addr 10;

            # Proxy to Gunicorn
            proxy_pass http://fastapi_app;
            proxy_http_version 1.1;

            # Headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;

            # Timeouts
            proxy_connect_timeout 30s;
            proxy_send_timeout 30s;
            proxy_read_timeout 30s;

            # Buffering
            proxy_buffering on;
            proxy_buffer_size 4k;
            proxy_buffers 8 4k;
            proxy_busy_buffers_size 8k;

            # WebSocket support: Connection comes from the $connection_upgrade map
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
        }

        # Error pages
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root /var/www/errors;
            internal;
        }
    }
}
```
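The `limit_req zone=api burst=20 nodelay` line is easiest to reason about with a small model. The following is a simplified, hypothetical simulation of nginx's leaky-bucket accounting for a single client key (nginx itself keeps this state per `$binary_remote_addr` in the shared-memory zone, at millisecond granularity); `LimitReqModel` is an illustrative name, meant to show how `rate` and `burst` interact rather than to reproduce nginx exactly.

```python
from typing import Optional


class LimitReqModel:
    """Simplified model of `limit_req ... burst=N nodelay` for one client key."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec   # e.g. rate=10r/s in limit_req_zone
        self.burst = burst         # e.g. burst=20 in limit_req
        self.excess = 0.0          # requests above the steady rate still "in the bucket"
        self.last: Optional[float] = None  # time of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            # First request from this client: the bucket starts empty.
            self.last = now
            return True
        # The bucket drains at `rate` per second; this request adds one.
        excess = max(self.excess - self.rate * (now - self.last) + 1, 0.0)
        if excess > self.burst:
            # Rejected (nginx answers 503 by default); state is left unchanged.
            return False
        # With nodelay, accepted requests are forwarded immediately.
        self.excess = excess
        self.last = now
        return True


if __name__ == "__main__":
    client = LimitReqModel(rate_per_sec=10, burst=20)
    burst_at_start = sum(client.allow(0.0) for _ in range(30))
    one_second_later = sum(client.allow(1.0) for _ in range(15))
    print(burst_at_start, one_second_later)
```

Under this model, a client that fires 30 requests at once gets 21 through (the first request plus `burst=20`); one second later the bucket has drained by 10, so 10 more are accepted. Without `nodelay`, nginx would instead queue the burst and release it at the configured 10 r/s.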

---

### Summary

In this chapter, you deployed FastAPI to production environments:

1. **Process Managers**: Configured Gunicorn with Uvicorn workers, derived a starting worker count from the `(2 x CPU) + 1` formula, set up graceful shutdown handling, and configured logging and health monitoring.

2. **Reverse Proxies**: Deployed Nginx for SSL termination, static file serving with proper caching headers, rate limiting to prevent abuse, and load balancing across application instances, with timeout and buffering settings tuned for the Gunicorn backend.

3. **Cloud Deployment**: Deployed to AWS ECS with Fargate for serverless containers, Google Cloud Run for automatic scaling to zero, and modern platforms like Render and Fly.io for simplified developer experience with global edge deployment.

4. **CI/CD Pipelines**: Built GitHub Actions workflows for automated testing, Docker image building with layer caching, container registry publishing, database migrations with Alembic, and zero-downtime deployment with health verification and automatic rollback.

**Production Readiness Checklist:**
- [ ] Gunicorn with Uvicorn workers (not development server)
- [ ] Nginx reverse proxy with SSL termination
- [ ] Database migrations run before code deployment
- [ ] Health check endpoints implemented
- [ ] Environment variables for all configuration (12-factor)
- [ ] Logging aggregation (CloudWatch, Stackdriver, etc.)
- [ ] Monitoring and alerting (CPU, memory, response times)
- [ ] Database connection pooling configured
- [ ] Redis/caching layer for hot data
- [ ] Backup strategy for database and file storage

---

### What's Next?

**Chapter 22: Error Handling** will cover:
- **HTTPException**: Raising structured HTTP errors with appropriate status codes and detail messages
- **Custom Exception Handlers**: Global error handling with consistent JSON error responses, logging integration, and user-friendly messages
- **Structured Logging**: Implementing production-grade logging with correlation IDs, log levels, and centralized log aggregation

This next chapter ensures your application handles failures gracefully and provides observability for debugging production issues.