# Lab 4.4.2: Docker Compose Stack - SOLUTION

**Module:** 4.4 - Containerization & Cloud Deployment  
**This is the complete solution notebook with all exercises solved.**

---

## Exercise 1 Solution: Add Redis Cache

In [None]:
# Complete Redis service configuration

redis_config = '''
  # ============================================
  # Redis - Response Cache
  # ============================================
  redis:
    image: redis:7-alpine
    container_name: redis-cache
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes --maxmemory 2gb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
    networks:
      - ml-network
'''

print("REDIS SERVICE CONFIGURATION:")
print("=" * 60)
print(redis_config)

print("\nKEY FEATURES:")
print("  - redis:7-alpine: lightweight Alpine-based image")
print("  - appendonly yes: AOF persistence, so cached entries survive restarts")
print("  - maxmemory 2gb: caps the memory Redis uses for data")
print("  - allkeys-lru: evicts least-recently-used keys once maxmemory is reached")
print("  - healthcheck: redis-cli ping, so dependents can wait with condition: service_healthy")
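
`allkeys-lru` tells Redis to evict the least recently used keys, drawn from the whole keyspace, once memory use reaches `maxmemory`. Redis only approximates LRU (it samples a few keys per eviction, controlled by `maxmemory-samples`), so the exact-LRU sketch below illustrates the intended behavior rather than Redis's algorithm. `LRUCache` is a hypothetical class, and it counts only value bytes against the budget, whereas Redis also counts keys and internal overhead:

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Exact LRU cache with a byte budget (illustrative; Redis samples keys instead)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        if key in self._data:
            self.used_bytes -= len(self._data.pop(key).encode())
        self._data[key] = value
        self.used_bytes += len(value.encode())
        # Evict from the least recently used end until back under budget
        while self.used_bytes > self.max_bytes and self._data:
            _, evicted = self._data.popitem(last=False)
            self.used_bytes -= len(evicted.encode())

    def keys(self) -> List[str]:
        """Keys from least to most recently used."""
        return list(self._data)


cache = LRUCache(max_bytes=10)
cache.set("a", "xxxx")   # 4 bytes used
cache.set("b", "yyyy")   # 8 bytes used
cache.get("a")           # "a" becomes the most recently used key
cache.set("c", "zzzz")   # 12 > 10 bytes, so "b" (least recently used) is evicted
print(cache.keys())      # ['a', 'c']
```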

In [None]:
# Updated inference server to use Redis cache

inference_with_cache = '''
  inference:
    image: llm-inference:latest
    container_name: llm-inference
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models
      - CUDA_VISIBLE_DEVICES=0
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - CACHE_TTL=3600
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    depends_on:
      redis:
        condition: service_healthy
      vectordb:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    restart: unless-stopped
    networks:
      - ml-network
'''

print("INFERENCE SERVER WITH REDIS CACHE:")
print("=" * 60)
print(inference_with_cache)
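
With `condition: service_healthy`, Compose waits for the `redis` and `vectordb` healthchecks to pass before starting `inference`, rather than only waiting for their containers to start. The start order itself comes from the `depends_on` graph. A minimal sketch of that ordering step with the standard library's `graphlib` (Python 3.9+); the `depends_on` map is hand-copied from this lab's services, not parsed from a compose file:

```python
from graphlib import TopologicalSorter

# service -> services it depends_on (hand-copied from the compose snippets in this lab)
depends_on = {
    "inference": {"redis", "vectordb"},
    "redis": set(),
    "vectordb": set(),
    "grafana": {"prometheus"},
    "prometheus": set(),
    "traefik": set(),
}

# static_order() yields every service only after all of its dependencies
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)
```

Services with no dependency path between them (for example `traefik` and `redis`) may appear in any relative order; Compose starts such independent services in parallel.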

In [None]:
# Python code for using Redis cache in inference server

cache_code = '''
"""Redis caching for LLM inference."""

import os
import json
import hashlib
from typing import Optional
import redis


class ResponseCache:
    """Cache LLM responses to reduce latency and costs."""
    
    def __init__(
        self,
        host: Optional[str] = None,
        port: Optional[int] = None,
        ttl: Optional[int] = None,
    ):
        self.host = host or os.environ.get("REDIS_HOST", "localhost")
        self.port = port or int(os.environ.get("REDIS_PORT", 6379))
        self.ttl = ttl or int(os.environ.get("CACHE_TTL", 3600))
        
        self.client = redis.Redis(
            host=self.host,
            port=self.port,
            decode_responses=True,
        )
    
    def _make_key(self, prompt: str, params: dict) -> str:
        """Create cache key from prompt and parameters."""
        key_data = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return f"llm:response:{hashlib.sha256(key_data.encode()).hexdigest()[:16]}"
    
    def get(self, prompt: str, params: dict) -> Optional[str]:
        """Get cached response."""
        key = self._make_key(prompt, params)
        return self.client.get(key)
    
    def set(self, prompt: str, params: dict, response: str) -> None:
        """Cache a response."""
        key = self._make_key(prompt, params)
        self.client.setex(key, self.ttl, response)
    
    def stats(self) -> dict:
        """Get cache statistics (hit/miss counters are server-wide, not per key prefix)."""
        info = self.client.info("stats")
        return {
            "hits": info.get("keyspace_hits", 0),
            "misses": info.get("keyspace_misses", 0),
            "keys": self.client.dbsize(),
        }


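# Usage in a FastAPI endpoint. `app`, `GenerateRequest`, and `model` are assumed
# to be defined elsewhere in the inference server.
cache = ResponseCache()

# A plain `def` endpoint: FastAPI runs it in a threadpool, so the blocking Redis
# client and model call do not stall the event loop.
@app.post("/predict")
def predict(request: GenerateRequest):
    params = {"max_tokens": request.max_tokens, "temperature": request.temperature}

    # Check cache first (an empty string is still a valid cached response)
    cached = cache.get(request.prompt, params)
    if cached is not None:
        return {"generated_text": cached, "cached": True}

    # Generate response
    response = model.generate(request.prompt, **params)

    # Cache the response
    cache.set(request.prompt, params, response)

    return {"generated_text": response, "cached": False}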
'''

print("REDIS CACHE IMPLEMENTATION:")
print("=" * 60)
print(cache_code)
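
To exercise this caching flow without a Redis server, one option is a small in-memory object that implements only the two client calls `ResponseCache` relies on, `get` and `setex`. In the sketch below, `InMemoryTTLStore`, `make_key`, and `cached_generate` are hypothetical helpers for local testing: `make_key` reuses the key scheme from `_make_key`, and `fake_generate` stands in for `model.generate`. An injectable clock makes TTL expiry testable:

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryTTLStore:
    """Hypothetical stand-in for redis.Redis that supports only get() and setex()."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def setex(self, key: str, ttl: int, value: str) -> None:
        # Store the value together with its absolute expiry time
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value


def make_key(prompt: str, params: dict) -> str:
    # Same scheme as ResponseCache._make_key: sorted JSON, then a SHA-256 prefix
    key_data = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return f"llm:response:{hashlib.sha256(key_data.encode()).hexdigest()[:16]}"


def cached_generate(store, generate: Callable[..., str], prompt: str,
                    params: dict, ttl: int = 3600) -> dict:
    """Cache-aside flow from the /predict endpoint above."""
    key = make_key(prompt, params)
    cached = store.get(key)
    if cached is not None:
        return {"generated_text": cached, "cached": True}
    response = generate(prompt, **params)
    store.setex(key, ttl, response)
    return {"generated_text": response, "cached": False}


# Usage with a controllable clock and a fake model
now = [0.0]
store = InMemoryTTLStore(clock=lambda: now[0])
calls = []


def fake_generate(prompt: str, max_tokens: int, temperature: float) -> str:
    calls.append(prompt)
    return prompt.upper()


params = {"max_tokens": 16, "temperature": 0.0}
first = cached_generate(store, fake_generate, "hello", params, ttl=60)
# Same parameters in a different order produce the same key, so this is a hit
second = cached_generate(store, fake_generate, "hello", {"temperature": 0.0, "max_tokens": 16}, ttl=60)
now[0] = 61.0  # move past the 60 s TTL
third = cached_generate(store, fake_generate, "hello", params, ttl=60)
print(first["cached"], second["cached"], third["cached"], len(calls))  # False True False 2
```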

## Exercise 2 Solution: Add Traefik API Gateway

In [None]:
# Complete Traefik configuration with automatic service discovery

traefik_config = '''
  # ============================================
  # Traefik - API Gateway / Load Balancer
  # ============================================
  traefik:
    image: traefik:v3.0
    container_name: traefik
    command:
      # API, dashboard, and ping (ping backs the healthcheck below)
      - "--api.dashboard=true"
      - "--api.insecure=true"
      - "--ping=true"
      # Docker provider
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      # Compose prefixes network names with the project name (<project>_ml-network)
      # unless the top-level network declares `name: ml-network`
      - "--providers.docker.network=ml-network"
      # Entrypoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      # Access logs
      - "--accesslog=true"
      - "--accesslog.format=json"
      # Metrics
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.entrypoint=metrics"
      - "--entrypoints.metrics.address=:8082"
    ports:
      - "80:80"       # HTTP
      - "443:443"     # HTTPS
      - "8080:8080"   # Dashboard
      - "8082:8082"   # Metrics
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik/acme.json:/acme.json
    healthcheck:
      # The healthcheck process does not inherit the server's flags, so pass --ping here too
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
    networks:
      - ml-network
    labels:
      # Dashboard route
      - "traefik.enable=true"
      - "traefik.http.routers.dashboard.rule=Host(`traefik.localhost`)"
      - "traefik.http.routers.dashboard.service=api@internal"
'''

print("TRAEFIK API GATEWAY:")
print("=" * 60)
print(traefik_config)
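
With `--accesslog=true` and `--accesslog.format=json`, Traefik writes one JSON object per request, to stdout by default, so the lines appear in `docker logs traefik` alongside Traefik's own log output. A minimal sketch that tallies status codes per router; the field names `RouterName` and `DownstreamStatus` are taken from Traefik's access-log field list and should be checked against the Traefik version in use, and the sample lines are hand-written rather than real output:

```python
import json
from collections import Counter
from typing import Iterable


def summarize_access_log(lines: Iterable[str]) -> Counter:
    """Count (router, status) pairs from Traefik JSON access-log lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip Traefik's own non-JSON log lines
        if not isinstance(entry, dict) or "DownstreamStatus" not in entry:
            continue
        counts[(entry.get("RouterName", "<none>"), entry["DownstreamStatus"])] += 1
    return counts


# Hand-written sample lines (not captured output)
sample = [
    '{"RouterName": "inference@docker", "DownstreamStatus": 200}',
    '{"RouterName": "inference@docker", "DownstreamStatus": 429}',
    '{"RouterName": "inference@docker", "DownstreamStatus": 200}',
    'level=info msg="Configuration loaded"',
]
for (router, status), n in sorted(summarize_access_log(sample).items()):
    print(router, status, n)
```

A spike in `429` responses for the inference router would point at the rate-limit middleware configured in the next cell rather than at the model server.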

In [None]:
# Updated inference service with Traefik labels for automatic discovery

inference_with_traefik = '''
  inference:
    image: llm-inference:latest
    container_name: llm-inference
    # No need to expose ports - Traefik handles routing
    environment:
      - MODEL_PATH=/models
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    labels:
      # Enable Traefik routing
      - "traefik.enable=true"
      # HTTP router
      - "traefik.http.routers.inference.rule=Host(`api.localhost`) && PathPrefix(`/v1`)"
      - "traefik.http.routers.inference.entrypoints=web"
      # Service configuration
      - "traefik.http.services.inference.loadbalancer.server.port=8000"
      # Rate limiting middleware
      - "traefik.http.middlewares.inference-ratelimit.ratelimit.average=100"
      - "traefik.http.middlewares.inference-ratelimit.ratelimit.burst=50"
      # Retry middleware
      - "traefik.http.middlewares.inference-retry.retry.attempts=3"
      - "traefik.http.middlewares.inference-retry.retry.initialinterval=100ms"
      # Attach both middlewares to the router (a defined middleware that is not listed here is never applied)
      - "traefik.http.routers.inference.middlewares=inference-ratelimit,inference-retry"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - ml-network
'''

print("INFERENCE WITH TRAEFIK LABELS:")
print("=" * 60)
print(inference_with_traefik)
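
Traefik describes its `ratelimit` middleware as a token bucket: `average` is the sustained rate per `period` (one second unless set), and `burst` is the bucket size, i.e. how many requests may pass at once. The `TokenBucket` class below is a hypothetical, single-client simulation of those two parameters with an explicit clock, not Traefik's implementation (Traefik keeps a bucket per source, by default the client's remote address):

```python
class TokenBucket:
    """Token bucket: refills at `average` tokens per `period`, holds at most `burst`."""

    def __init__(self, average: float, burst: int, period: float = 1.0) -> None:
        self.rate = average / period   # tokens added per second
        self.capacity = burst
        self.tokens = float(burst)     # the bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # Traefik would answer 429 Too Many Requests


bucket = TokenBucket(average=100, burst=50)
# 60 requests arriving at the same instant: only the burst gets through
allowed = sum(bucket.allow(now=0.0) for _ in range(60))
print(allowed)  # 50
# After 0.1 s, 0.1 * 100 = 10 tokens have been refilled
print(sum(bucket.allow(now=0.1) for _ in range(20)))  # 10
```

With `average=100` and `burst=50`, a client can sustain 100 requests per second but never send more than 50 in the same instant.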

## Complete Production Stack

In [None]:
# Complete production docker-compose.yml

complete_stack = '''

# ==============================================
# Production ML Inference Stack
# ==============================================
# Components:
#   - Traefik (API Gateway)
#   - Inference Server (GPU-enabled)
#   - Redis (Response Cache)
#   - ChromaDB (Vector Database)
#   - Prometheus (Metrics)
#   - Grafana (Dashboards)
# ==============================================

services:
  # API Gateway
  traefik:
    image: traefik:v3.0
    container_name: traefik
    command:
      - "--api.dashboard=true"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--metrics.prometheus=true"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - ml-network

  # LLM Inference
  inference:
    image: llm-inference:latest
    container_name: llm-inference
    environment:
      - MODEL_PATH=/models
      - REDIS_HOST=redis
      - CHROMADB_HOST=vectordb
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.inference.rule=PathPrefix(`/v1`)"
      - "traefik.http.services.inference.loadbalancer.server.port=8000"
    depends_on:
      redis:
        condition: service_healthy
      vectordb:
        condition: service_healthy
    networks:
      - ml-network

  # Response Cache
  redis:
    image: redis:7-alpine
    container_name: redis
    command: redis-server --appendonly yes --maxmemory 2gb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - ml-network

  # Vector Database
  vectordb:
    image: chromadb/chroma:latest
    container_name: chromadb
    ports:
      - "8001:8000"
    volumes:
      - chroma_data:/chroma/chroma
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - ml-network

  # Metrics
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    networks:
      - ml-network

  # Dashboards
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
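      # Override with GRAFANA_ADMIN_PASSWORD (shell or .env); "admin" is only a lab default
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin}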
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
    networks:
      - ml-network

volumes:
  redis_data:
  chroma_data:
  prometheus_data:
  grafana_data:

networks:
  ml-network:
    driver: bridge
'''

print("COMPLETE PRODUCTION STACK:")
print("=" * 60)
print(complete_stack)

# Save the file
import os
os.makedirs("../docker-examples/production-stack", exist_ok=True)
with open("../docker-examples/production-stack/docker-compose.yml", "w") as f:
    f.write(complete_stack)

print("\nSaved to: ../docker-examples/production-stack/docker-compose.yml")
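
The stack bind-mounts `./monitoring/prometheus.yml`, which this notebook never creates; if the file is missing, Docker creates a directory at that path and Prometheus cannot load its config. A minimal sketch of one, assuming Prometheus should scrape only itself and Traefik. In the complete stack, `--metrics.prometheus=true` has no dedicated entrypoint, so Traefik serves `/metrics` on its default `traefik` entrypoint (`:8080`). Whether the inference server exposes metrics is not covered here, so it gets no scrape job, and `write_prometheus_config` is a hypothetical helper:

```python
import os

PROMETHEUS_YML = """\
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping its own metrics
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Traefik metrics on the default "traefik" entrypoint (:8080/metrics)
  - job_name: traefik
    static_configs:
      - targets: ["traefik:8080"]
"""


def write_prometheus_config(stack_dir: str) -> str:
    """Write monitoring/prometheus.yml under stack_dir and return its path."""
    path = os.path.join(stack_dir, "monitoring", "prometheus.yml")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(PROMETHEUS_YML)
    return path
```

Calling `write_prometheus_config("../docker-examples/production-stack")` places the file next to the saved `docker-compose.yml`, where the `./monitoring/prometheus.yml` mount expects it.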

---

## Summary

This solution demonstrated:

1. **Redis Cache Service**
   - Alpine-based lightweight image
   - Persistence with appendonly
   - Memory limits and LRU eviction
   - Integration with inference server

2. **Traefik API Gateway**
   - Automatic service discovery via Docker labels
   - Rate limiting middleware
   - Retry middleware
   - Prometheus metrics integration

3. **Complete Production Stack**
   - All services on a shared bridge network (`ml-network`)
   - Health checks on Redis and ChromaDB
   - Start ordering via `depends_on`
   - Named volumes for persistent data
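
Once the stack is up (`docker compose up -d`), a short readiness poll can confirm that the HTTP-facing services respond before traffic is sent. A minimal standard-library sketch; the URLs assume the host port mappings from the complete stack (ChromaDB on 8001, Prometheus on 9090, Grafana on 3000) and the health endpoints those projects document (`/api/v1/heartbeat` as in the compose healthcheck, `/-/ready`, `/api/health`). Redis and the inference server are not HTTP-checked here, and `wait_until_ready` is a hypothetical helper:

```python
import time
import urllib.request
from typing import Callable, Dict

ENDPOINTS = {
    "chromadb": "http://localhost:8001/api/v1/heartbeat",
    "prometheus": "http://localhost:9090/-/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False


def wait_until_ready(
    endpoints: Dict[str, str],
    check: Callable[[str], bool] = http_ok,
    timeout: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, bool]:
    """Poll each endpoint until all pass or the timeout expires."""
    status = {name: False for name in endpoints}
    deadline = clock() + timeout
    while True:
        for name, url in endpoints.items():
            if not status[name]:
                status[name] = check(url)
        if all(status.values()) or clock() >= deadline:
            return status
        sleep(interval)
```

Running `print(wait_until_ready(ENDPOINTS))` returns a name-to-bool map; any `False` entry names a service worth inspecting with `docker compose logs <service>`.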