# Tutorial 4: Production Deployment

Deploy Aragora in production with proper configuration, monitoring, and scaling.

**What you'll learn:**
- Production configuration and environment variables
- Starting the HTTP/WebSocket server
- Monitoring and observability
- Rate limiting and resilience

**Time:** ~15 minutes

## Production Configuration

Aragora validates configuration at startup. Set these environment variables for production.

In [None]:
import os

# Show required production configuration
production_config = {
    # Required in production
    "ARAGORA_ENV": "production",
    "DATABASE_URL": "postgresql://user:pass@localhost:5432/aragora",
    "ARAGORA_API_TOKEN": "your-secret-token-at-least-16-chars",

    # At least one LLM API key
    "ANTHROPIC_API_KEY": "sk-ant-...",

    # Recommended
    "REDIS_URL": "redis://localhost:6379",
    "ARAGORA_ALLOWED_ORIGINS": "https://yourdomain.com",
    "ARAGORA_RATE_LIMIT": "60",  # requests per minute
}

# Report which variables are set, without echoing their values
for key, example in production_config.items():
    is_set = key in os.environ
    status = "✓" if is_set else "✗"
    print(f"{status} {key} (example: {example})")
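As a complement to the cell above, a deployment script may prefer to fail fast when something is missing. The following is a minimal sketch, not Aragora's own validation logic: the helper name `find_config_problems` is hypothetical, and the 16-character minimum is an assumption inferred only from the placeholder token text.

```python
from typing import Mapping

# Variable names taken from the configuration cell above.
REQUIRED_KEYS = ("ARAGORA_ENV", "DATABASE_URL", "ARAGORA_API_TOKEN")
LLM_KEYS = ("ANTHROPIC_API_KEY",)  # extend with the provider keys you use
MIN_TOKEN_LENGTH = 16  # assumption, inferred from the placeholder token text


def find_config_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    if not any(env.get(key) for key in LLM_KEYS):
        problems.append("no LLM API key is set")
    token = env.get("ARAGORA_API_TOKEN", "")
    if token and len(token) < MIN_TOKEN_LENGTH:
        problems.append("ARAGORA_API_TOKEN is shorter than 16 characters")
    return problems
```

Calling `find_config_problems(os.environ)` in an entrypoint script and exiting when the list is non-empty stops a misconfigured container before it serves traffic.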

## Configuration Validation

Use the ConfigValidator to check your setup before deployment.

In [None]:
from aragora.server.config_validator import ConfigValidator

# Run validation
result = ConfigValidator.validate_all()

# Report every error and warning found
if result.errors:
    for error in result.errors:
        print(f"ERROR: {error}")

if result.warnings:
    for warning in result.warnings:
        print(f"WARNING: {warning}")
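In a CI pipeline, the validation result can be turned into an exit code so that a failed check blocks the deployment. This sketch assumes only what the cell above shows, namely that the result exposes `errors` and `warnings` sequences. The function name `exit_code_for` and its `strict` flag are hypothetical, and the demo object is a stand-in so the example runs without Aragora installed.

```python
from types import SimpleNamespace


def exit_code_for(result, strict: bool = False) -> int:
    """Map a validation result to a process exit code.

    `result` is any object with `errors` and `warnings` sequences.
    `strict` (hypothetical) makes warnings fail the check as well.
    """
    if result.errors:
        return 1
    if strict and result.warnings:
        return 1
    return 0


# Stand-in result with a made-up warning message, for illustration only.
demo = SimpleNamespace(errors=[], warnings=["REDIS_URL is not set"])
print(exit_code_for(demo))               # 0: warnings alone do not fail
print(exit_code_for(demo, strict=True))  # 1: strict mode treats warnings as failures
```

In a script, `sys.exit(exit_code_for(ConfigValidator.validate_all()))` would apply the same rule to the real result.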

In [None]:
# Get configuration summary (secrets are masked)
summary = ConfigValidator.get_config_summary()

for key, value in summary.items():
    print(f"{key}: {value}")

## Starting the Server

Aragora provides a unified HTTP + WebSocket server.

```bash
# Start with default settings
aragora serve

# Custom ports
aragora serve --api-port 8080 --ws-port 8765

# Bind to all interfaces (production)
aragora serve --host 0.0.0.0
```

In [None]:
# You can also start the server programmatically

# This would start the server (don't run in notebook)
# await run_unified_server(
#     http_port=8080,
#     ws_port=8765,
#     static_dir=None  # Serve static files from this directory
# )


## Health Checks

Aragora provides health endpoints for Kubernetes and monitoring.

In [None]:
import urllib.request
import json

SERVER_URL = "http://localhost:8080"

def check_health(endpoint):
    try:
        with urllib.request.urlopen(f"{SERVER_URL}{endpoint}", timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except Exception as e:
        return None, str(e)

# Health endpoints
endpoints = [
    "/healthz",           # Kubernetes liveness
    "/readyz",            # Kubernetes readiness
    "/api/health",        # Basic health
    "/api/health/detailed" # Detailed component status
]

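for endpoint in endpoints:
    status, data = check_health(endpoint)
    if status:
        print(f"{endpoint}: HTTP {status} {data}")
    else:
        # check_health returns (None, error message) on failure
        print(f"{endpoint}: unavailable ({data})")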
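Deployment scripts often need to wait for the server to become ready before sending traffic. The helper below is a generic polling sketch under stated assumptions: `wait_until_ready` and its parameters are hypothetical names, and the probe is passed in as a callable so the logic can be exercised without a running server.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

With the `check_health` function from the cell above, a probe could be `lambda: check_health("/readyz")[0] == 200`, assuming the readiness endpoint answers with HTTP 200 once the server is ready.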

## Rate Limiting

Aragora rate-limits requests per provider to avoid exhausting API quotas.

In [None]:
from aragora.agents.api_agents.rate_limiter import RateLimiter, PROVIDER_LIMITS

for provider, limits in PROVIDER_LIMITS.items():
    print(f"{provider}: {limits}")

In [None]:
# Create a rate limiter
limiter = RateLimiter("anthropic")

# Check if request is allowed
if limiter.is_allowed():
    # Make API call...
    limiter.record_request(tokens_used=1000)
else:
    print("Rate limit reached; wait before retrying")
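To make the check-then-record pattern concrete, here is a minimal sliding-window limiter. It illustrates the general technique and is not Aragora's `RateLimiter`: the class name `SlidingWindowLimiter` and the injectable `clock` parameter are assumptions, and token accounting is left out for brevity.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps = deque()  # request times, oldest first

    def _evict_expired(self) -> None:
        # Drop timestamps that have fallen out of the window.
        cutoff = self._clock() - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def is_allowed(self) -> bool:
        self._evict_expired()
        return len(self._timestamps) < self.max_requests

    def record_request(self) -> None:
        self._timestamps.append(self._clock())
```

Passing the clock as a parameter keeps the limiter testable: a test can advance time by hand instead of sleeping for a full window.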

## Circuit Breaker

The circuit breaker prevents cascading failures when agents fail repeatedly.

In [None]:
from aragora.resilience import CircuitBreaker

# Create a circuit breaker
breaker = CircuitBreaker(
    name="anthropic-api",
    failure_threshold=5,    # Open after 5 failures
    recovery_timeout=60,    # Try again after 60 seconds
    half_open_requests=2    # Allow 2 test requests
)


In [None]:
# Using the circuit breaker
async def call_with_breaker():
    if not breaker.is_available():
        return None

    try:
        # Make API call
        result = "success"  # Simulated
        breaker.record_success()
        return result
    except Exception:
        breaker.record_failure()
        raise
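The state machine behind a circuit breaker is small enough to sketch in full. The class below is a simplified illustration, not the `aragora.resilience.CircuitBreaker` implementation: it reuses the `failure_threshold` and `recovery_timeout` names from the cell above, omits the `half_open_requests` budget, and takes a hypothetical `clock` parameter so that time can be controlled in tests.

```python
import time
from typing import Callable, Optional


class SimpleBreaker:
    """closed -> open after N consecutive failures -> half_open after a timeout."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"
        return "open"

    def is_available(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self.state == "half_open" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open and restart the timeout
```

A failure during the half-open phase reopens the circuit immediately, so a provider that is still down gets another full recovery window before the next probe.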


## Redis Caching

Use Redis for distributed caching across multiple server instances.

In [None]:
from aragora.utils.redis_cache import RedisTTLCache

# Create Redis cache (falls back to in-memory if Redis unavailable)
cache = RedisTTLCache(
    redis_url=os.getenv("REDIS_URL"),
    default_ttl=3600,  # 1 hour
    prefix="aragora:"
)

# Cache a debate result
cache.set("debate:123", {"result": "consensus", "confidence": 0.85})

# Retrieve (returns the stored value while the TTL has not expired)
cached = cache.get("debate:123")
print(cached)
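The comment in the cell above notes that the cache falls back to in-memory storage when Redis is unavailable. The sketch below shows what a TTL cache of that kind might look like. It is an assumption-laden illustration rather than Aragora's fallback code: the class name `InMemoryTTLCache`, the per-call `ttl` argument, and the `clock` parameter are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class InMemoryTTLCache:
    """Process-local cache whose entries expire after a time-to-live."""

    def __init__(
        self,
        default_ttl: float = 3600.0,
        prefix: str = "",
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.default_ttl = default_ttl
        self.prefix = prefix
        self._clock = clock
        self._store = {}  # full key -> (expiry time, value)

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        lifetime = self.default_ttl if ttl is None else ttl
        self._store[self.prefix + key] = (self._clock() + lifetime, value)

    def get(self, key: str, default: Any = None) -> Any:
        full_key = self.prefix + key
        entry = self._store.get(full_key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[full_key]  # lazily evict the expired entry
            return default
        return value
```

An in-memory cache like this is private to one process, so entries are not shared between server instances; that is the gap Redis fills in a multi-instance deployment.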

## Observability

Aragora exports Prometheus metrics and supports OpenTelemetry tracing.

In [None]:
# Prometheus metrics are exposed at /metrics
metrics = [
    "aragora_debates_total",
    "aragora_debate_duration_seconds",
    "aragora_consensus_rate",
    "aragora_agent_response_time_seconds",
    "aragora_agent_errors_total",
    "aragora_tokens_used_total",
    "aragora_memory_tier_size",
]

for metric in metrics:
    print(f"  {metric}")
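A quick smoke test after deployment is to confirm that the expected metrics are actually declared in the `/metrics` output. The sketch below parses the `# TYPE <name> <type>` lines of the Prometheus text exposition format. The function names are hypothetical, and the sample payload is made up for illustration; real output will differ.

```python
def declared_metrics(exposition_text: str) -> dict:
    """Map metric name -> metric type using the `# TYPE` comment lines."""
    declared = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            declared[parts[2]] = parts[3]
    return declared


def missing_metrics(exposition_text: str, expected: list) -> list:
    """Return the expected metric names that the payload does not declare."""
    declared = declared_metrics(exposition_text)
    return [name for name in expected if name not in declared]


# Made-up sample payload, for illustration only.
SAMPLE = """\
# HELP aragora_debates_total Total debates
# TYPE aragora_debates_total counter
aragora_debates_total 12
# TYPE aragora_debate_duration_seconds histogram
aragora_debate_duration_seconds_count 12
"""

print(missing_metrics(SAMPLE, ["aragora_debates_total", "aragora_consensus_rate"]))
```

Matching on `# TYPE` lines rather than sample lines avoids false negatives for histograms, whose samples carry suffixes such as `_bucket`, `_sum`, and `_count`.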

## Docker Deployment

Deploy with Docker Compose for production.

```yaml
# docker-compose.yml
version: '3.8'
services:
  aragora:
    build: .
    ports:
      - "8080:8080"
      - "8765:8765"
    environment:
      - ARAGORA_ENV=production
      - DATABASE_URL=postgresql://postgres:password@db:5432/aragora
      - REDIS_URL=redis://redis:6379
      - ARAGORA_API_TOKEN=${ARAGORA_API_TOKEN}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    depends_on:
      - db
      - redis
  
  db:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=aragora
    volumes:
      - postgres_data:/var/lib/postgresql/data
  
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```

## Production Checklist

Before deploying to production:

- [ ] Set `ARAGORA_ENV=production`
- [ ] Use PostgreSQL (not SQLite)
- [ ] Configure Redis for caching
- [ ] Set up CORS allowed origins
- [ ] Configure rate limits
- [ ] Set up monitoring (Prometheus/Grafana)
- [ ] Configure health check endpoints
- [ ] Set API authentication tokens
- [ ] Review security configuration

## Next Steps

- **Tutorial 5**: Advanced features (Gauntlet, ELO tournaments)
- **RUNBOOK.md**: Detailed operations guide
- **TROUBLESHOOTING.md**: Common issues and solutions