# Module 8: Packaging and Containerization with Docker

**Course**: End-to-End Machine Learning (Datacamp)  
**Case Study**: CardioCare Heart Disease Prediction  
**Author**: Seif

---

## Why containerize?

Containers package your app and its dependencies into a portable, reproducible unit that runs the same way across environments. Key benefits:
- Reproducibility and isolation
- Fast deploys and consistent runtime
- Easy to integrate with CI/CD

## What we'll build

We'll create a tiny Flask service `app.py` with a `/predict` endpoint that accepts a JSON payload with a few heart-disease-style features (cp, thalach, ca, thal) and returns a dummy risk prediction. Then we'll containerize it with a Dockerfile, and show how to build/run the image locally.

In [None]:
# Write a minimal Flask service to app.py
app_code = '''
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.get('/health')
def health():
    return {'status': 'ok'}

def simple_rule_predict(payload):
    # Very simple rule-based demo: NOT A REAL MODEL!
    # risk = 1 if (thalach < 150) or (ca >= 2) or (cp >= 2), else 0
    cp = int(payload.get('cp', 0))
    thalach = int(payload.get('thalach', 190))
    ca = int(payload.get('ca', 0))
    thal = int(payload.get('thal', 0))  # unused in this rule, but included
    risk = 1 if (thalach < 150) or (ca >= 2) or (cp >= 2) else 0
    return risk

@app.post('/predict')
def predict():
    try:
        payload = request.get_json(force=True, silent=False) or {}
    except Exception as e:
        return jsonify({'error': f'Invalid JSON: {str(e)}'}), 400

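    if not isinstance(payload, dict):
        return jsonify({'error': 'Payload must be a JSON object'}), 400

    missing = [k for k in ['cp','thalach','ca','thal'] if k not in payload]
    if missing:
        return jsonify({'error': f'Missing keys: {missing}'}), 400

    try:
        pred = simple_rule_predict(payload)
    except (TypeError, ValueError):
        # int() failed on a non-numeric feature value
        return jsonify({'error': 'Feature values must be integers'}), 400
    return jsonify({'prediction': int(pred)})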

if __name__ == '__main__':
    # Flask's built-in development server (for demo purposes). In production, use a WSGI server such as gunicorn.
    app.run(host='0.0.0.0', port=8000, debug=False)
'''

with open('app.py', 'w', encoding='utf-8') as f:
    f.write(app_code)
print('Wrote app.py')
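The `/predict` handler checks for missing keys, but `int(...)` inside `simple_rule_predict` raises on a value such as `"abc"`. A minimal sketch of stricter validation is shown below. The helper name `validate_features` is hypothetical and not part of `app.py`; it only illustrates one way to collect every problem before predicting.

```python
FEATURES = ["cp", "thalach", "ca", "thal"]


def validate_features(payload):
    """Return (clean, errors): integer features, or a list of problems.

    Hypothetical helper for illustration; it is not part of app.py.
    """
    if not isinstance(payload, dict):
        return None, ["payload must be a JSON object"]

    clean, errors = {}, []
    for key in FEATURES:
        if key not in payload:
            errors.append(f"missing key: {key}")
            continue
        value = payload[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool):
            errors.append(f"{key} must be an integer, got a boolean")
            continue
        try:
            clean[key] = int(value)
        except (TypeError, ValueError):
            errors.append(f"{key} must be an integer, got {value!r}")
    return (clean, []) if not errors else (None, errors)


# A numeric string is coerced; a non-numeric one is reported
print(validate_features({"cp": 2, "thalach": "140", "ca": 1, "thal": 2}))
print(validate_features({"cp": "abc", "thalach": 140, "ca": 1}))
```

In a handler, a non-empty `errors` list would map naturally to a 400 response that lists all problems at once.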

## Dockerfile

We'll use the official Python slim image, copy our code, install Flask, and expose port 8000.

In [None]:
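# Write a Dockerfile to the project root.
# The raw string (r''') preserves the trailing backslash after ENV, and starting the
# text right after the quotes keeps the syntax directive on the file's first line.
dockerfile = r'''# syntax=docker/dockerfile:1.7-labs
FROM python:3.11-slim AS base

# Skip .pyc files and unbuffer stdout/stderr so logs show up immediately
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1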

WORKDIR /app

# Optionally copy a requirements file and install
# If you have a project requirements.txt, uncomment the next two lines and remove the RUN pip install flask below
# COPY requirements.txt .
# RUN pip install --no-cache-dir -r requirements.txt

# For this demo, we only need Flask
RUN pip install --no-cache-dir flask

# Copy the app code
COPY app.py ./

# Document the listening port and set runtime config
EXPOSE 8000
ENV APP_ENV=production

# Run the service
CMD ["python", "app.py"]
'''

with open('Dockerfile', 'w', encoding='utf-8') as f:
    f.write(dockerfile)
print('Wrote Dockerfile')

## Build and run (PowerShell)

```powershell
# Build the image from the Dockerfile in the project root
docker build -t heart_disease_model:latest .

# Run the container, mapping port 8000
docker run --rm -p 8000:8000 -e APP_ENV=production heart_disease_model:latest
```
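A freshly started container may need a moment before it accepts connections. The sketch below polls the `/health` endpoint until it answers `{"status": "ok"}`. The function name `wait_for_health` and its `opener` parameter (which exists only so the function can be exercised without a network) are assumptions for illustration.

```python
import http.client
import json
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout_s=30.0, interval_s=0.5,
                    opener=urllib.request.urlopen):
    """Poll `url` until it returns {"status": "ok"} or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener(url, timeout=2) as resp:
                body = json.loads(resp.read().decode("utf-8"))
                if (resp.status == 200 and isinstance(body, dict)
                        and body.get("status") == "ok"):
                    return True
        except (urllib.error.URLError, OSError, ValueError,
                http.client.HTTPException):
            # Not accepting connections yet, or the body was not valid JSON
            pass
        time.sleep(interval_s)
    return False
```

With the container from the previous step running, `wait_for_health("http://localhost:8000/health")` should return `True` once Flask is listening, and `False` if nothing answers within the timeout.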

In another terminal, test the endpoint:

```powershell
$body = @{ cp = 2; thalach = 140; ca = 1; thal = 2 } | ConvertTo-Json
Invoke-RestMethod -Uri http://localhost:8000/predict -Method Post -ContentType 'application/json' -Body $body | ConvertTo-Json
```
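For readers not on PowerShell, the same request can be sketched in Python with only the standard library. The function names `build_predict_request` and `predict` are illustrative; the URL, port, and payload keys come from the service above.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build a POST request for /predict without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://localhost:8000", timeout_s=5):
    """Send the request and return the decoded JSON response."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `predict({"cp": 2, "thalach": 140, "ca": 1, "thal": 2})` should return `{"prediction": 1}` under the demo rule, since `thalach < 150`. A 4xx response surfaces as `urllib.error.HTTPError`.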

## Tagging images

```powershell
docker tag heart_disease_model:latest heart_disease_model:v1
```

Push to a registry (example):

```powershell
docker tag heart_disease_model:latest <your-registry>/heart_disease_model:latest
docker push <your-registry>/heart_disease_model:latest
```
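If tagging and pushing becomes repetitive, the commands can be generated from a version string. This is a small sketch under stated assumptions: the helper names are hypothetical, `registry.example.com/team` is a placeholder rather than a real registry, and the commands are only printed, not executed.

```python
def image_refs(name, version, registry=None):
    """Return the references to tag: a version tag plus 'latest'."""
    repo = f"{registry.rstrip('/')}/{name}" if registry else name
    return [f"{repo}:{version}", f"{repo}:latest"]


def tag_and_push_commands(source, name, version, registry):
    """Build (but do not run) the docker CLI calls for each reference."""
    commands = []
    for ref in image_refs(name, version, registry):
        commands.append(["docker", "tag", source, ref])
        commands.append(["docker", "push", ref])
    return commands


for cmd in tag_and_push_commands(
    "heart_disease_model:latest",
    "heart_disease_model",
    "v1",
    "registry.example.com/team",  # placeholder, not a real host
):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to actually run it once the registry name is real and you are logged in.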

## Best practices & security

- Use trusted base images and pin versions
- Keep images small (slim base, no-cache, remove build deps)
- Don't bake secrets into images; pass via env vars or secret managers
- Consider a non-root user in containers for production
- Use multi-stage builds for compiled deps
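
As an illustration of the "no secrets in images" point, the sketch below reads configuration from environment variables at startup and fails fast when a required value is absent. `APP_ENV` is the variable set in the Dockerfile above; `API_TOKEN` and the `Settings` class are hypothetical names used only for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    app_env: str
    api_token: str  # hypothetical secret, used only for illustration


def load_settings(env=None):
    """Read configuration from environment variables, failing fast if a secret is absent."""
    env = os.environ if env is None else env
    token = env.get("API_TOKEN")  # assumed variable name
    if not token:
        raise RuntimeError(
            "API_TOKEN is not set; pass it at runtime (for example with "
            "`docker run -e`) or through a secret manager"
        )
    return Settings(app_env=env.get("APP_ENV", "development"), api_token=token)
```

Because the value arrives at `docker run` time, nothing sensitive is stored in an image layer.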

In [None]:
# Write a .dockerignore to keep images smaller and safer
contents = """
# Bytecode and caches
__pycache__/
*.pyc
*.pyo

# Venvs
.venv/
venv/

# VCS
.git/
.gitignore

# Jupyter
.ipynb_checkpoints/
notebooks/

# Local data and artifacts
mlruns/
data/
*.parquet
*.csv

# OS/editor files
.DS_Store
Thumbs.db
.vscode/

# Tests (optional exclude)
tests/
"""
with open('.dockerignore', 'w', encoding='utf-8') as f:
    f.write(contents)
print('Wrote .dockerignore')

## Load a real model: MLflow or local joblib

We'll create a more realistic variant, `app_model.py`, that tries the following in order:
- Load a model from MLflow using `MLFLOW_MODEL_URI` (e.g., `models:/CardioCareHeartDiseaseLR/Production`)
- Otherwise load a local `model.joblib`
- Otherwise fall back to the simple rule from earlier

It accepts JSON with keys `cp`, `thalach`, `ca`, `thal` and returns predictions.


In [None]:
# Write app_model.py that loads a model from MLflow or joblib
app_code = '''
from flask import Flask, request, jsonify
import os
import numpy as np
import pandas as pd

MLFLOW_MODEL_URI = os.getenv("MLFLOW_MODEL_URI")
MODEL_PATH = os.getenv("MODEL_PATH", "model.joblib")

_mlflow_model = None
_sklearn_model = None

# Try MLflow first
if MLFLOW_MODEL_URI:
    try:
        import mlflow.pyfunc
        _mlflow_model = mlflow.pyfunc.load_model(MLFLOW_MODEL_URI)
        print(f"Loaded MLflow model from {MLFLOW_MODEL_URI}")
    except Exception as e:
        print(f"Could not load MLflow model: {e}")

# Try local joblib if MLflow not available
if _mlflow_model is None and os.path.exists(MODEL_PATH):
    try:
        import joblib
        _sklearn_model = joblib.load(MODEL_PATH)
        print(f"Loaded local model from {MODEL_PATH}")
    except Exception as e:
        print(f"Could not load local joblib model: {e}")

app = Flask(__name__)

@app.get('/health')
def health():
    return {'status': 'ok'}

FEATURES = ["cp", "thalach", "ca", "thal"]

def _predict_payloads(records):
    if _mlflow_model is not None:
        # MLflow pyfunc models generally accept DataFrame inputs;
        # keep only the expected columns, in a fixed order
        df = pd.DataFrame(records, columns=FEATURES)
        preds = _mlflow_model.predict(df)
        # Ensure JSON-serializable
        try:
            arr = np.array(preds).astype(int).tolist()
        except Exception:
            arr = np.array(preds).tolist()
        return arr
    elif _sklearn_model is not None:
        X = []
        for r in records:
            X.append([
                int(r.get('cp', 0)),
                int(r.get('thalach', 190)),
                int(r.get('ca', 0)),
                int(r.get('thal', 0)),
            ])
        X = np.array(X)
        preds = _sklearn_model.predict(X)
        return np.array(preds).astype(int).tolist()
    else:
        # Fallback to simple rule
        out = []
        for r in records:
            cp = int(r.get('cp', 0))
            thalach = int(r.get('thalach', 190))
            ca = int(r.get('ca', 0))
            # thal unused in this toy rule
            risk = 1 if (thalach < 150) or (ca >= 2) or (cp >= 2) else 0
            out.append(risk)
        return out

@app.post('/predict')
def predict():
    try:
        payload = request.get_json(force=True, silent=False)
    except Exception as e:
        return jsonify({'error': f'Invalid JSON: {e}'}), 400

    if payload is None:
        return jsonify({'error': 'Empty payload'}), 400

    # Accept a single object or a list of objects
    if isinstance(payload, dict):
        records = [payload]
    elif isinstance(payload, list):
        records = payload
    else:
        return jsonify({'error': 'Payload must be an object or array of objects'}), 400

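    if not records:
        return jsonify({'error': 'Payload contains no records'}), 400

    # Basic key check on every record, not just the first
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            return jsonify({'error': f'Record {i} is not an object'}), 400
        missing = [k for k in FEATURES if k not in rec]
        if missing:
            return jsonify({'error': f'Record {i} is missing keys: {missing}'}), 400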

    preds = _predict_payloads(records)
    # Return single or list depending on input
    if isinstance(payload, dict):
        return jsonify({'prediction': int(preds[0])})
    return jsonify({'predictions': preds})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, debug=False)
'''

with open('app_model.py', 'w', encoding='utf-8') as f:
    f.write(app_code)
print('Wrote app_model.py')

## Dockerfile for model-serving (requirements + MLflow/joblib)

We'll write `Dockerfile.model` which:
- Installs your repo `requirements.txt` (it must cover everything `app_model.py` imports: flask, numpy, pandas, plus mlflow and joblib/scikit-learn for model loading)
- Copies `app_model.py`
- Optionally copies a `model.joblib` if you add it to the project root

Build and run examples are provided after this cell.


In [None]:
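# Write Dockerfile.model that serves app_model.py.
# The raw string (r''') preserves the trailing backslash after ENV, and starting the
# text right after the quotes keeps the syntax directive on the file's first line.
content = r'''# syntax=docker/dockerfile:1.7-labs
FROM python:3.11-slim

# Skip .pyc files and unbuffer stdout/stderr so logs show up immediately
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1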

WORKDIR /app

# Install requirements (includes mlflow, scikit-learn, etc.)
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving app and (optionally) a local model artifact
COPY app_model.py ./
# If you have a model file in the project root, uncomment next line
# COPY model.joblib ./model.joblib

EXPOSE 8000

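# Config via env vars (set at runtime). Inside a container, 127.0.0.1 is the
# container itself; with Docker Desktop the host is reachable as host.docker.internal
# ENV MLFLOW_TRACKING_URI="http://host.docker.internal:5000"
# ENV MLFLOW_MODEL_URI="models:/CardioCareHeartDiseaseLR/Production"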

CMD ["python", "app_model.py"]
'''
with open('Dockerfile.model', 'w', encoding='utf-8') as f:
    f.write(content)
print('Wrote Dockerfile.model')
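
Since `Dockerfile.model` relies entirely on `requirements.txt`, a quick check that the file lists what `app_model.py` needs can save a failed build. The sketch below is a rough, assumption-laden check: `missing_requirements` is a hypothetical helper, the `NEEDED` list is inferred from the imports in `app_model.py`, and it only looks at names listed explicitly (it does not know about transitive dependencies).

```python
import re

# Packages app_model.py imports, plus scikit-learn for loading a joblib model
NEEDED = ["flask", "numpy", "pandas", "scikit-learn", "mlflow", "joblib"]


def _normalize(name):
    # Treat case and runs of '-', '_', '.' as equivalent, as pip does
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_requirements(text, needed=NEEDED):
    """Return the names from `needed` that `text` (a requirements file) does not list."""
    listed = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        # The package name ends at the first version/extras/marker character
        name = re.split(r"[<>=!~\[;@\s]", line, maxsplit=1)[0]
        listed.add(_normalize(name))
    return [pkg for pkg in needed if _normalize(pkg) not in listed]


sample = "Flask>=2.0\nnumpy\npandas==2.2.0  # data\nscikit_learn\n"
print(missing_requirements(sample))  # ['mlflow', 'joblib']
```

Against a real project it could be called as `missing_requirements(open('requirements.txt', encoding='utf-8').read())`.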

## Build and run the model-serving image (PowerShell)

```powershell
# Build from Dockerfile.model
docker build -f Dockerfile.model -t heart_disease_model:mlflow .

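# Option A: Run with MLflow model registry (requires a running tracking server, Databricks, etc.)
# Replace MLFLOW_MODEL_URI with your own (e.g., a registered model + stage).
# Inside the container, 127.0.0.1 is the container itself; with Docker Desktop,
# a tracking server on the host is reachable at host.docker.internal.
docker run --rm -p 8000:8000 `
  -e MLFLOW_TRACKING_URI="http://host.docker.internal:5000" `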
  -e MLFLOW_MODEL_URI="models:/CardioCareHeartDiseaseLR/Production" `
  heart_disease_model:mlflow

# Option B: Run with a local joblib model (place model.joblib in project root and uncomment COPY in Dockerfile.model)
# docker run --rm -p 8000:8000 heart_disease_model:mlflow

# Test the endpoint (single record)
$body = @{ cp = 2; thalach = 140; ca = 1; thal = 2 } | ConvertTo-Json
Invoke-RestMethod -Uri http://localhost:8000/predict -Method Post -ContentType 'application/json' -Body $body | ConvertTo-Json

# Test the endpoint (batch)
$batch = @(
  @{ cp = 0; thalach = 170; ca = 0; thal = 2 },
  @{ cp = 3; thalach = 130; ca = 2; thal = 3 }
) | ConvertTo-Json
Invoke-RestMethod -Uri http://localhost:8000/predict -Method Post -ContentType 'application/json' -Body $batch | ConvertTo-Json
```
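
When neither an MLflow model nor `model.joblib` is available, `app_model.py` falls back to the toy rule, so the responses to the two test requests above are predictable. The sketch below re-implements that rule locally (the function name `fallback_rule` is an assumption for illustration) to show what to expect in that case; a real model may of course return different values.

```python
def fallback_rule(record):
    """Same toy rule as the fallback branch of app_model.py."""
    cp = int(record.get("cp", 0))
    thalach = int(record.get("thalach", 190))
    ca = int(record.get("ca", 0))
    return 1 if (thalach < 150) or (ca >= 2) or (cp >= 2) else 0


single = {"cp": 2, "thalach": 140, "ca": 1, "thal": 2}
batch = [
    {"cp": 0, "thalach": 170, "ca": 0, "thal": 2},
    {"cp": 3, "thalach": 130, "ca": 2, "thal": 3},
]

print({"prediction": fallback_rule(single)})               # {'prediction': 1}
print({"predictions": [fallback_rule(r) for r in batch]})  # {'predictions': [0, 1]}
```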