# Lab 4.4.2: Building ML Stacks with Docker Compose

**Module:** 4.4 - Containerization & Cloud Deployment  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐ (Intermediate)

---

## Learning Objectives

By the end of this lab, you will:
- [ ] Understand Docker Compose for multi-container applications
- [ ] Create a complete ML stack with inference server, vector DB, and monitoring
- [ ] Configure networking between services
- [ ] Set up GPU allocation for ML services
- [ ] Implement health checks and dependencies
- [ ] Use volumes for persistent storage

---

## Prerequisites

- Completed: Lab 4.4.1 (Docker ML Image)
- Docker Compose installed
- Basic understanding of YAML

---

## Real-World Context

**Production ML systems are never just one container.**

A typical production ML system includes:
- **Inference Server**: Runs your model (GPU)
- **Vector Database**: Stores embeddings for RAG (ChromaDB, Qdrant, Milvus)
- **Monitoring**: Tracks performance (Prometheus, Grafana)
- **API Gateway**: Handles load balancing, auth (Traefik, Nginx)
- **Cache**: Reduces latency (Redis)

Docker Compose lets you define and run all these services together with a single command.

---

## ELI5: Docker Compose

> **Imagine you're running a restaurant...**
>
> You don't just have a chef. You have:
> - A chef (makes the food) - *Inference Server*
> - A pantry (stores ingredients) - *Vector Database*
> - A manager (tracks orders, timing) - *Monitoring*
> - A host (greets customers, assigns tables) - *API Gateway*
>
> **Docker Compose is like the restaurant blueprints** - it shows where each station goes, how they connect, and what each one needs to operate.
>
> With one command (`docker compose up`), the entire restaurant opens for business!

---

## Part 1: Docker Compose Fundamentals

### Compose File Structure

```yaml
version: '3.8'  # Optional: Compose v2 ignores this field and warns that it is obsolete

services:
  service-name:
    image: image-name:tag
    ports:
      - "host:container"
    environment:
      - KEY=value
    volumes:
      - host-path:container-path
    depends_on:
      - other-service

volumes:
  named-volume:

networks:
  custom-network:
```
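
A common slip in this structure is a `depends_on` entry that names a service that was never defined. The sketch below mirrors the skeleton above as a plain Python dict and checks for that. The helper name `find_undefined_dependencies` and the sample services are illustrative assumptions, not part of the lab's utilities; `docker compose config` performs a fuller validation of a real file.

```python
# Hypothetical helper: report depends_on entries that point at undefined services.
def find_undefined_dependencies(compose: dict) -> dict[str, list[str]]:
    services = compose.get("services", {})
    problems = {}
    for name, spec in services.items():
        deps = spec.get("depends_on", [])
        # depends_on is either a list (short form) or a mapping (long form with conditions)
        dep_names = list(deps.keys()) if isinstance(deps, dict) else list(deps)
        missing = [dep for dep in dep_names if dep not in services]
        if missing:
            problems[name] = missing
    return problems


# Sample structure with the same top-level keys as the YAML skeleton.
compose = {
    "services": {
        "inference": {
            "image": "llm-inference:latest",
            "depends_on": ["vectordb", "cache"],  # "cache" is deliberately undefined
        },
        "vectordb": {"image": "chromadb/chroma:latest"},
    },
    "volumes": {"named-volume": None},
    "networks": {"custom-network": None},
}

print(find_undefined_dependencies(compose))
```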

### Key Concepts

| Concept | Description | Example |
|---------|-------------|---------|
| **Services** | Individual containers | inference, vectordb, prometheus |
| **Ports** | Expose container ports to host | "8000:8000" |
| **Volumes** | Persist data across restarts | models:/models |
| **Networks** | Communication between services | Internal DNS |
| **depends_on** | Startup order | vectordb before inference |

In [None]:
# Let's start by checking Docker Compose is available
import subprocess
import os

def run_command(cmd):
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout.strip(), result.returncode

print("Docker Compose Check")
print("=" * 60)

# Check Docker Compose (v2 style)
output, code = run_command("docker compose version")
if code == 0:
    print(f"Docker Compose: {output}")
else:
    # Try v1 style
    output, code = run_command("docker-compose --version")
    if code == 0:
        print(f"Docker Compose (legacy): {output}")
    else:
        print("Docker Compose is not installed!")

print("\n" + "=" * 60)

---

## Understanding Our Docker Compose Utilities

This curriculum provides a `DockerComposeManager` class to simplify multi-container stack creation. Let's understand its API before using it.

### DockerComposeManager Class

The `DockerComposeManager` class helps generate docker-compose.yml files programmatically:

| Method | Description | Parameters |
|--------|-------------|------------|
| `add_inference_service()` | Add an ML inference container | `name`, `image`, `port`, `model_path`, `gpu=True` |
| `add_vector_db(type)` | Add a vector database | `type`: `"chromadb"`, `"qdrant"`, or `"milvus"`; `port` |
| `add_monitoring(level)` | Add monitoring stack | `level`: `"minimal"` (Prometheus only) or `"full"` (Prometheus + Grafana); `prometheus_port`, `grafana_port` |
| `generate()` | Generate YAML content | Returns string |
| `save(path)` | Save to file | Path to docker-compose.yml |

### Example Usage

```python
from scripts.docker_utils import DockerComposeManager

compose = DockerComposeManager()
compose.add_inference_service(name="llm", image="my-llm:v1", port=8000, gpu=True)
compose.add_vector_db("chromadb", port=8001)
compose.add_monitoring("full", prometheus_port=9090, grafana_port=3000)
print(compose.generate())
```

---

## Part 2: Creating an ML Stack

Let's build a complete ML stack with:

1. **Inference Server** - Our LLM from Lab 4.4.1
2. **ChromaDB** - Vector database for RAG
3. **Prometheus** - Metrics collection
4. **Grafana** - Metrics visualization

### Architecture

```
                    ┌─────────────┐
                    │   Client    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  Inference  │◄────── GPU
                    │   :8000     │
                    └──────┬──────┘
                           │
               ┌───────────┴───────────┐
               │                       │
        ┌──────▼──────┐         ┌──────▼──────┐
        │  ChromaDB   │         │ Prometheus  │
        │    :8001    │         │   :9090     │
        └─────────────┘         └──────┬──────┘
                                       │
                                ┌──────▼──────┐
                                │   Grafana   │
                                │    :3000    │
                                └─────────────┘
```

In [None]:
# Use our DockerComposeManager to create the stack
import sys
sys.path.insert(0, '..')

from scripts.docker_utils import DockerComposeManager

# Create compose manager
compose = DockerComposeManager()

# Add inference server (GPU-enabled)
compose.add_inference_service(
    name="inference",
    image="llm-inference:latest",
    port=8000,
    model_path="/models",
    gpu=True,
)

# Add vector database
compose.add_vector_db("chromadb", port=8001)

# Add monitoring (Prometheus + Grafana)
compose.add_monitoring("full", prometheus_port=9090, grafana_port=3000)

print("Generated docker-compose.yml:")
print("=" * 60)
print(compose.generate())
print("=" * 60)

In [None]:
# Let's create a more complete, production-ready compose file manually
# This gives us more control over the configuration

docker_compose_yaml = '''version: '3.8'

# ==============================================
# ML Inference Stack for DGX Spark
# ==============================================
# Components:
#   - Inference Server (GPU-enabled)
#   - ChromaDB (Vector Database)
#   - Prometheus (Metrics)
#   - Grafana (Dashboards)
# ==============================================

services:
  # ============================================
  # LLM Inference Server
  # ============================================
  inference:
    image: llm-inference:latest
    build:
      context: ./inference-server
      dockerfile: Dockerfile
    container_name: llm-inference
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models
      - MODEL_NAME=gpt2
      - CUDA_VISIBLE_DEVICES=0
      - TRANSFORMERS_CACHE=/models/cache
      - CHROMADB_HOST=vectordb
      - CHROMADB_PORT=8000
    volumes:
      - ./models:/models
      - model_cache:/models/cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    depends_on:
      vectordb:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - ml-network

  # ============================================
  # ChromaDB - Vector Database
  # ============================================
  vectordb:
    image: chromadb/chroma:latest
    container_name: chromadb
    ports:
      - "8001:8000"
    environment:
      - ANONYMIZED_TELEMETRY=false
      - CHROMA_SERVER_AUTH_PROVIDER=
    volumes:
      - chroma_data:/chroma/chroma
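    healthcheck:
      # Assumes the image includes curl and serves the v1 API; with the "latest" tag,
      # newer Chroma releases may require /api/v2/heartbeat instead. Pin a tag you have tested.
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat"]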
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - ml-network

  # ============================================
  # Prometheus - Metrics Collection
  # ============================================
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
      - "--web.enable-lifecycle"
    restart: unless-stopped
    networks:
      - ml-network

  # ============================================
  # Grafana - Visualization
  # ============================================
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      # Default credentials for local use only; change them before exposing Grafana
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/var/lib/grafana/dashboards/ml-dashboard.json
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning:ro
      - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards:ro
    depends_on:
      - prometheus
    restart: unless-stopped
    networks:
      - ml-network

# ============================================
# Volumes - Persistent Storage
# ============================================
volumes:
  model_cache:
    name: ml-model-cache
  chroma_data:
    name: ml-chroma-data
  prometheus_data:
    name: ml-prometheus-data
  grafana_data:
    name: ml-grafana-data

# ============================================
# Networks
# ============================================
networks:
  ml-network:
    name: ml-inference-network
    driver: bridge
'''

# Save the compose file
os.makedirs("../docker-examples/ml-stack", exist_ok=True)

with open("../docker-examples/ml-stack/docker-compose.yml", "w") as f:
    f.write(docker_compose_yaml)

print("Created: docker-compose.yml")
print("\nThis configuration includes:")
print("  - Inference server with GPU support")
print("  - ChromaDB for vector storage")
print("  - Prometheus for metrics")
print("  - Grafana for dashboards")
print("  - Health checks for the inference server and ChromaDB")
print("  - Persistent volumes for data")
print("  - Custom network for inter-service communication")

---

## Part 3: Prometheus Configuration

Prometheus needs a configuration file to know what to scrape.

In [None]:
# Create Prometheus configuration
prometheus_config = '''# Prometheus Configuration for ML Stack
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    environment: development
    stack: ml-inference

# Alerting configuration (optional)
alerting:
  alertmanagers:
    - static_configs:
        - targets: []

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: /metrics

  # LLM Inference Server
  - job_name: 'inference-server'
    static_configs:
      - targets: ['inference:8000']
    metrics_path: /metrics
    scrape_interval: 10s
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'llm-inference'

  # ChromaDB metrics (if exposed)
  - job_name: 'chromadb'
    static_configs:
      - targets: ['vectordb:8000']
    metrics_path: /api/v1/metrics
    scrape_interval: 30s

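  # GPU metrics (NVIDIA DCGM exporter if running on the host)
  # On Linux, host.docker.internal resolves only if the prometheus service
  # sets extra_hosts: ["host.docker.internal:host-gateway"] in docker-compose.yml
  - job_name: 'gpu-metrics'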
    static_configs:
      - targets: ['host.docker.internal:9400']
    scrape_interval: 15s
'''

# Create monitoring directory structure
os.makedirs("../docker-examples/ml-stack/monitoring", exist_ok=True)

with open("../docker-examples/ml-stack/monitoring/prometheus.yml", "w") as f:
    f.write(prometheus_config)

print("Created: monitoring/prometheus.yml")
print("\nPrometheus will scrape:")
print("  - Itself (self-monitoring)")
print("  - Inference server metrics")
print("  - ChromaDB metrics")
print("  - GPU metrics (if DCGM exporter is running)")
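
Each scrape target above is expected to answer at its `metrics_path` with plain text in the Prometheus exposition format. The sketch below renders one counter in that format using only the standard library, to show what Prometheus would read from `inference:8000/metrics`. The function `render_counter`, the metric name, and the sample labels are assumptions for illustration; a real server would normally use a client library such as `prometheus_client`, and label-value escaping is omitted here.

```python
# Minimal sketch of the Prometheus text exposition format for a single counter.
def render_counter(name, help_text, samples):
    """samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        # Omit the braces entirely when a sample has no labels.
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [
        ({"method": "GET", "path": "/health"}, 3),
        ({"method": "POST", "path": "/predict"}, 12),
    ],
)
print(text)
```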

In [None]:
# Create Grafana provisioning configuration
os.makedirs("../docker-examples/ml-stack/monitoring/grafana/provisioning/datasources", exist_ok=True)
os.makedirs("../docker-examples/ml-stack/monitoring/grafana/provisioning/dashboards", exist_ok=True)
os.makedirs("../docker-examples/ml-stack/monitoring/grafana/dashboards", exist_ok=True)

# Datasource configuration
datasource_config = '''apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
'''

with open("../docker-examples/ml-stack/monitoring/grafana/provisioning/datasources/datasources.yml", "w") as f:
    f.write(datasource_config)

# Dashboard provisioning
dashboard_provisioning = '''apiVersion: 1

providers:
  - name: 'ML Dashboards'
    orgId: 1
    folder: 'ML Monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    options:
      path: /var/lib/grafana/dashboards
'''

with open("../docker-examples/ml-stack/monitoring/grafana/provisioning/dashboards/dashboards.yml", "w") as f:
    f.write(dashboard_provisioning)

print("Created Grafana provisioning configuration")

In [None]:
# Create a simple ML monitoring dashboard for Grafana
import json

ml_dashboard = {
    "annotations": {"list": []},
    "editable": True,
    "fiscalYearStartMonth": 0,
    "graphTooltip": 0,
    "id": None,
    "links": [],
    "liveNow": False,
    "panels": [
        {
            "id": 1,
            "title": "Inference Requests/sec",
            "type": "timeseries",
            "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
            "targets": [
                {
                    "expr": 'rate(http_requests_total{job="inference-server"}[5m])',
                    "legendFormat": "{{method}} {{path}}",
                }
            ],
        },
        {
            "id": 2,
            "title": "Response Latency (p99)",
            "type": "timeseries",
            "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
            "targets": [
                {
                    "expr": 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{job="inference-server"}[5m]))',
                    "legendFormat": "p99 latency",
                }
            ],
        },
        {
            "id": 3,
            "title": "GPU Memory Usage",
            "type": "gauge",
            "gridPos": {"h": 8, "w": 8, "x": 0, "y": 8},
            "targets": [
                {
                    "expr": 'DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_TOTAL * 100',
                    "legendFormat": "GPU Memory %",
                }
            ],
            "fieldConfig": {
                "defaults": {
                    "max": 100,
                    "min": 0,
                    "unit": "percent",
                    "thresholds": {
                        "mode": "absolute",
                        "steps": [
                            {"color": "green", "value": None},
                            {"color": "yellow", "value": 70},
                            {"color": "red", "value": 90},
                        ]
                    }
                }
            }
        },
        {
            "id": 4,
            "title": "GPU Utilization",
            "type": "gauge",
            "gridPos": {"h": 8, "w": 8, "x": 8, "y": 8},
            "targets": [
                {
                    "expr": 'DCGM_FI_DEV_GPU_UTIL',
                    "legendFormat": "GPU Util %",
                }
            ],
        },
        {
            "id": 5,
            "title": "Tokens Generated/sec",
            "type": "timeseries",
            "gridPos": {"h": 8, "w": 8, "x": 16, "y": 8},
            "targets": [
                {
                    "expr": 'rate(tokens_generated_total{job="inference-server"}[5m])',
                    "legendFormat": "tokens/sec",
                }
            ],
        },
    ],
    "refresh": "5s",
    "schemaVersion": 38,
    "style": "dark",
    "tags": ["ml", "inference"],
    "templating": {"list": []},
    "time": {"from": "now-1h", "to": "now"},
    "timepicker": {},
    "timezone": "",
    "title": "ML Inference Dashboard",
    "uid": "ml-inference-dashboard",
    "version": 1,
    "weekStart": ""
}

with open("../docker-examples/ml-stack/monitoring/grafana/dashboards/ml-dashboard.json", "w") as f:
    json.dump(ml_dashboard, f, indent=2)

print("Created: monitoring/grafana/dashboards/ml-dashboard.json")
print("\nDashboard includes:")
print("  - Request rate graph")
print("  - p99 latency graph")
print("  - GPU memory gauge")
print("  - GPU utilization gauge")
print("  - Token generation rate")

---

## Part 4: Service Dependencies and Health Checks

### ELI5: Why Dependencies Matter

> **Imagine making a sandwich...**
>
> You can't put on the top bread before the fillings. Order matters!
>
> In Docker Compose:
> - **depends_on** = "Don't start me until these services are running"
> - **condition: service_healthy** = "Wait until they pass health checks"
>
> Without proper dependencies, your inference server might crash trying to connect to a database that isn't ready yet!

In [None]:
# Let's look at dependency patterns in detail

dependency_examples = '''
# ============================================
# Dependency Patterns in Docker Compose
# ============================================

# Pattern 1: Simple dependency (just wait for container to start)
inference:
  depends_on:
    - vectordb

# Pattern 2: Wait for service to be healthy (RECOMMENDED)
inference:
  depends_on:
    vectordb:
      condition: service_healthy
    prometheus:
      condition: service_started

# Pattern 3: Multiple dependencies with mixed conditions
api-gateway:
  depends_on:
    inference:
      condition: service_healthy
    auth-service:
      condition: service_healthy
    cache:
      condition: service_started

# Health check best practices:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s      # How often to check
  timeout: 10s       # How long to wait for response
  retries: 3         # How many failures before unhealthy
  start_period: 60s  # Grace period for slow-starting services (like ML models)
'''

print(dependency_examples)
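
The `depends_on` entries form a dependency graph, and a valid startup order is any topological order of that graph. The sketch below computes one with the standard library's `graphlib` (Python 3.9+), using the two dependencies declared in this lab's compose file. It only illustrates the ordering constraint; it is not how Compose itself schedules containers, and Compose may start independent services in parallel.

```python
from graphlib import TopologicalSorter

# Map each service to the set of services it depends on,
# matching the depends_on entries in this lab's docker-compose.yml.
depends_on = {
    "inference": {"vectordb"},
    "grafana": {"prometheus"},
    "vectordb": set(),
    "prometheus": set(),
}

# static_order() yields every service after all of its dependencies.
# A circular dependency would raise graphlib.CycleError here.
startup_order = list(TopologicalSorter(depends_on).static_order())
print(startup_order)
```

Several orders can be valid at once; the only guarantees are that `vectordb` appears before `inference` and `prometheus` before `grafana`.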

---

## Part 5: GPU Allocation in Docker Compose

### GPU Configuration Options

```yaml
# Option 1: Use all GPUs
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

# Option 2: Specific number of GPUs
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

# Option 3: Specific GPUs by ID (set device_ids instead of count, not both)
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
```
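
If you generate compose files from Python, the three options differ only in whether `count` or `device_ids` is set. The sketch below builds the `deploy` block as a dict and rejects the case where both (or neither) are given, since the Compose specification treats the two fields as mutually exclusive. The function name `gpu_reservation` is a hypothetical helper, not part of `DockerComposeManager`.

```python
# Hypothetical helper: build the GPU reservation block shown in the options above.
def gpu_reservation(count=None, device_ids=None):
    # Exactly one of count / device_ids must be provided.
    if (count is None) == (device_ids is None):
        raise ValueError("set exactly one of count or device_ids")
    device = {"driver": "nvidia", "capabilities": ["gpu"]}
    if count is not None:
        device["count"] = count  # an integer, or the string "all"
    else:
        device["device_ids"] = [str(gpu_id) for gpu_id in device_ids]
    return {"deploy": {"resources": {"reservations": {"devices": [device]}}}}


print(gpu_reservation(count=1))
print(gpu_reservation(device_ids=[0, 1]))
```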

### DGX Spark Consideration

DGX Spark has **one** powerful Blackwell GPU with 128GB unified memory. You'll typically use:
```yaml
count: 1  # or just 'all' since there's only one
```

---

## Part 6: Networking Between Services

### How Services Talk to Each Other

Docker Compose creates a private network where:
- Services can reach each other by **name** (DNS)
- No need for IP addresses
- Ports are published to the host only when listed under `ports`

```
inference container:  http://vectordb:8000  (internal)
host machine:         http://localhost:8001 (exposed port)
```
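
The two addresses come from the two halves of a `"HOST:CONTAINER"` port mapping. The sketch below derives both URLs from a service name and its mapping. `service_urls` is an illustrative helper and handles only the short `"HOST:CONTAINER"` form, not mappings with an IP prefix or a protocol suffix.

```python
# Illustrative helper: derive the in-network and host-side URLs for a service.
def service_urls(service: str, port_mapping: str) -> dict[str, str]:
    host_port, container_port = port_mapping.split(":")
    return {
        # Other containers on the same network use the service name and container port.
        "internal": f"http://{service}:{container_port}",
        # The host machine uses localhost and the published port.
        "host": f"http://localhost:{host_port}",
    }


print(service_urls("vectordb", "8001:8000"))
# {'internal': 'http://vectordb:8000', 'host': 'http://localhost:8001'}
```

This is why the compose file sets `CHROMADB_PORT=8000` for the inference container even though ChromaDB is reached on port 8001 from the host.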

In [None]:
# Create a helper script to test the stack
test_script = '''#!/bin/bash
# Test script for ML Stack

echo "=================================================="
echo "Testing ML Inference Stack"
echo "=================================================="

# Wait for services to be ready
echo "\nWaiting for services to start..."
sleep 5

# Test inference server health
echo "\n[1/4] Testing Inference Server..."
curl -s http://localhost:8000/health | python3 -m json.tool

# Test ChromaDB
echo "\n[2/4] Testing ChromaDB..."
curl -s http://localhost:8001/api/v1/heartbeat | python3 -m json.tool

# Test Prometheus
echo "\n[3/4] Testing Prometheus..."
curl -s http://localhost:9090/-/healthy
echo ""

# Test Grafana
echo "\n[4/4] Testing Grafana..."
curl -s http://localhost:3000/api/health | python3 -m json.tool

echo "\n=================================================="
echo "Stack Test Complete!"
echo "=================================================="
echo "\nAccess URLs:"
echo "  - Inference API:  http://localhost:8000/docs"
echo "  - ChromaDB:       http://localhost:8001"
echo "  - Prometheus:     http://localhost:9090"
echo "  - Grafana:        http://localhost:3000 (admin/admin)"
'''

with open("../docker-examples/ml-stack/test-stack.sh", "w") as f:
    f.write(test_script)

os.chmod("../docker-examples/ml-stack/test-stack.sh", 0o755)

print("Created: test-stack.sh")
print("\nTo run the full stack:")
print("  cd ../docker-examples/ml-stack")
print("  docker compose up -d")
print("  ./test-stack.sh")

In [None]:
# Create a README for the stack
readme = '''# ML Inference Stack

A complete ML inference stack for DGX Spark with:
- LLM Inference Server (GPU-enabled)
- ChromaDB Vector Database
- Prometheus Monitoring
- Grafana Dashboards

## Quick Start

```bash
# Start the stack
docker compose up -d

# View logs
docker compose logs -f

# Check status
docker compose ps

# Stop the stack
docker compose down
```

## Services

| Service | Port | Description |
|---------|------|-------------|
| Inference | 8000 | LLM API server |
| ChromaDB | 8001 | Vector database |
| Prometheus | 9090 | Metrics |
| Grafana | 3000 | Dashboards |

## API Endpoints

### Inference Server

- `GET /health` - Health check
- `POST /predict` - Text generation
- `POST /chat` - Chat completion
- `GET /docs` - API documentation

### Example Usage

```bash
# Generate text
curl -X POST http://localhost:8000/predict \\
  -H "Content-Type: application/json" \\
  -d '{"prompt": "Hello, how are you?", "max_tokens": 100}'

# Chat
curl -X POST http://localhost:8000/chat \\
  -H "Content-Type: application/json" \\
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

## Configuration

Environment variables for the inference server:

- `MODEL_PATH` - Path to model weights
- `MODEL_NAME` - HuggingFace model ID
- `CUDA_VISIBLE_DEVICES` - GPU selection

## Monitoring

Access Grafana at http://localhost:3000 (admin/admin)

Pre-configured dashboards:
- ML Inference Dashboard

## Volumes

Persistent data is stored in Docker volumes:
- `ml-model-cache` - Model weights cache
- `ml-chroma-data` - Vector database
- `ml-prometheus-data` - Metrics history
- `ml-grafana-data` - Dashboard configs

## Troubleshooting

### GPU not detected
```bash
# Verify NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```

### Service not healthy
```bash
# Check logs
docker compose logs inference

# Restart specific service
docker compose restart inference
```
'''

with open("../docker-examples/ml-stack/README.md", "w") as f:
    f.write(readme)

print("Created: README.md")

---

## Project Structure

Let's see the complete structure we've created:

In [None]:
# Show the final project structure
print("ML Stack Project Structure:")
print("=" * 60)
!find ../docker-examples/ml-stack -type f | sort | head -20
print("=" * 60)

---

## Common Mistakes

### Mistake 1: Not Using Health Checks with depends_on

```yaml
# BAD - Service might start before database is ready
inference:
  depends_on:
    - vectordb

# GOOD - Wait for database to be healthy
inference:
  depends_on:
    vectordb:
      condition: service_healthy
```
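
`depends_on` only governs startup inside Compose. A script or notebook on the host that calls the stack can apply the same idea with a small retry loop. The sketch below is one possible version: `wait_until_healthy` and `http_ok` are hypothetical helpers, the retry count and interval are arbitrary, and the demo uses a fake probe so it runs without the stack.

```python
import time
import urllib.request


def wait_until_healthy(probe, retries=3, interval=1.0, sleep=time.sleep):
    """Call probe() up to `retries` times; return True on the first success."""
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused, timeouts, and HTTP errors count as a failed attempt
        if attempt < retries:
            sleep(interval)
    return False


def http_ok(url, timeout=10):
    """Return True if the URL answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200


# Demo with a fake probe that succeeds on the third attempt.
attempts = iter([False, False, True])
print(wait_until_healthy(lambda: next(attempts), retries=3, interval=0))
```

Against the running stack, the probe could be `lambda: http_ok("http://localhost:8000/health")`.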

---

### Mistake 2: Exposing All Ports

```yaml
# BAD - Exposes internal service to host
internal-service:
  ports:
    - "5432:5432"  # Now accessible from outside!

# GOOD - Only expose what's needed
internal-service:
  # No ports exposed - only accessible within network
  networks:
    - internal
```
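
A quick way to catch this in a larger stack is to list every service that publishes ports and compare it with the services you intend to expose. The sketch below does that over a compose file represented as a dict. The helper name `unexpected_published_ports`, the allowlist, and the sample `postgres` service are assumptions for illustration.

```python
# Hypothetical audit: flag services that publish host ports but are not meant to be public.
def unexpected_published_ports(compose: dict, allowed: set[str]) -> dict[str, list[str]]:
    flagged = {}
    for name, spec in compose.get("services", {}).items():
        ports = spec.get("ports", [])
        if ports and name not in allowed:
            flagged[name] = list(ports)
    return flagged


stack = {
    "services": {
        "inference": {"ports": ["8000:8000"]},
        "postgres": {"ports": ["5432:5432"]},  # internal service published by mistake
        "vectordb": {},  # no ports: reachable only inside the network
    }
}

print(unexpected_published_ports(stack, allowed={"inference"}))
```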

---

### Mistake 3: Not Using Named Volumes

```yaml
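# FRAGILE - Bind mount: data survives container removal, but depends on the host path and its permissions
volumes:
  - ./data:/data

# GOOD - Named volume: managed by Docker, independent of the host directory layout
volumes:
  - app_data:/data

# Top-level declaration of the named volume
volumes:
  app_data: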
```

---

## Try It Yourself

### Exercise 1: Add Redis Cache

Add a Redis service to the stack for caching inference results.

<details>
<summary>Hint</summary>
Use the official redis:alpine image and expose port 6379.
</details>

In [None]:
# TODO: Add Redis service configuration
redis_config = '''
# Add Redis to docker-compose.yml
redis:
  # TODO: Complete this configuration
'''

print(redis_config)

### Exercise 2: Add Traefik as API Gateway

Add Traefik to handle load balancing and SSL termination.

<details>
<summary>Hint</summary>
Traefik can automatically discover services using Docker labels.
</details>

---

## Checkpoint

You've learned:
- How to create multi-container ML stacks with Docker Compose
- Service dependencies and health checks
- GPU allocation for ML workloads
- Networking between containers
- Prometheus and Grafana for monitoring
- Best practices for production deployments

---

## Challenge (Optional)

Create a complete RAG (Retrieval-Augmented Generation) stack with:
1. Document ingestion service
2. Embedding service (GPU)
3. ChromaDB for vector storage
4. LLM inference service (GPU)
5. API gateway with rate limiting
6. Full monitoring

---

## Further Reading

- [Docker Compose Documentation](https://docs.docker.com/compose/)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/)
- [Prometheus Monitoring](https://prometheus.io/docs/)
- [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards/)

---

## Cleanup

In [None]:
# To clean up the stack:
print("To stop and clean up the stack:")
print("  cd ../docker-examples/ml-stack")
print("  docker compose down -v  # -v removes volumes too")
print("\nTo remove all related images:")
print("  docker compose down --rmi all")