# Module 9: CI/CD for ML Deployment

**Course**: End-to-End Machine Learning (DataCamp)  
**Case Study**: CardioCare Heart Disease Prediction  
**Author**: Seif

---

## Overview

- What CI/CD is and why it matters in ML
- CI (continuous integration): merge changes often, run automated tests on every change, and block regressions
- CD (continuous delivery/deployment): automatically promote artifacts to the next environment when all checks pass
- Practical: write a model validation gate, create a GitHub Actions workflow, and outline AWS Elastic Beanstalk (EB) deployment steps

## CI/CD principles in ML

- Commit frequently and run tests automatically (unit, integration, notebook smoke tests)
- Rebuild and scan container images on changes
- Validate model performance on new data; halt the pipeline if a metric falls below its threshold
- Promote only artifacts that pass quality gates (to Staging/Prod)
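
The principles above can be sketched as a single promotion decision. This is a minimal illustration, not a prescribed design: `GateResult`, `evaluate_gates`, `can_promote`, and the choice of three gates (tests, image scan, model metric) are hypothetical names and inputs that a real pipeline would populate from its own jobs.

```python
from typing import NamedTuple


class GateResult(NamedTuple):
    """Outcome of one quality gate (illustrative structure)."""
    name: str
    passed: bool
    detail: str


def evaluate_gates(tests_passed, critical_vulns, metric, threshold):
    """Evaluate every gate so the report lists all failures, not just the first."""
    return [
        GateResult("tests", bool(tests_passed), "automated test suite"),
        GateResult("image_scan", critical_vulns == 0,
                   f"{critical_vulns} critical findings"),
        GateResult("model_metric", metric >= threshold,
                   f"{metric:.3f} vs threshold {threshold:.3f}"),
    ]


def can_promote(results):
    """Promote only when every gate passed."""
    return all(r.passed for r in results)


if __name__ == "__main__":
    # Example inputs only; a real pipeline would read these from earlier jobs.
    results = evaluate_gates(tests_passed=True, critical_vulns=0,
                             metric=0.85, threshold=0.80)
    for r in results:
        print(f"{'PASS' if r.passed else 'FAIL'} {r.name}: {r.detail}")
    print("Promote to staging:", can_promote(results))
```

Evaluating all gates before deciding makes the CI log show every reason a promotion was blocked, which tends to shorten the fix cycle.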

## Model validation gate (performance threshold)

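We'll create a small script, `scripts/validate_model.py`, that exits with a non-zero status when a metric falls below a threshold. It reads the metric from the `MODEL_METRIC` environment variable or from the JSON file named by `METRICS_JSON` (reading `f1`, falling back to `accuracy`; the file takes precedence when it exists), and the threshold from `METRIC_THRESHOLD` (default `0.80`). Because CI treats a non-zero exit as a failed step, running this script in a job blocks deployment.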

In [None]:
# Write a simple validation script that gates deployment by metric threshold
import pathlib
pathlib.Path('scripts').mkdir(parents=True, exist_ok=True)
code = '''
import os, sys, json

def main():
    # Read metric from env or a JSON file path
    threshold = float(os.getenv('METRIC_THRESHOLD', '0.80'))
    metric = os.getenv('MODEL_METRIC')
    metrics_file = os.getenv('METRICS_JSON')

    if metrics_file and os.path.exists(metrics_file):
        with open(metrics_file, 'r', encoding='utf-8') as f:
            data = json.load(f)
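            # Prefer F1; fall back to accuracy only when F1 is absent.
            # (A plain `or` would also discard a legitimate F1 of 0.0.)
            metric = data['f1'] if 'f1' in data else data.get('accuracy')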

    if metric is None:
        print('No metric provided; set MODEL_METRIC or METRICS_JSON.')
        sys.exit(1)

    metric = float(metric)
    print(f'Model metric: {metric} (threshold: {threshold})')
    if metric < threshold:
        print('Threshold not met. Failing.')
        sys.exit(1)
    print('Threshold met. Passing.')

if __name__ == '__main__':
    main()
'''
with open('scripts/validate_model.py', 'w', encoding='utf-8') as f:
    f.write(code)
print('Wrote scripts/validate_model.py')
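
The gate expects a JSON file with an `f1` or `accuracy` key when `METRICS_JSON` is set. As a rough sketch of how an evaluation step might produce that file, the example below computes both metrics for binary labels in plain Python and writes them out. The helper names `f1_score_binary` and `write_metrics` are hypothetical, and the labels are made up for illustration; they are not CardioCare results.

```python
import json
import tempfile
from pathlib import Path


def f1_score_binary(y_true, y_pred):
    """F1 for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def write_metrics(y_true, y_pred, path):
    """Write metrics under the 'f1' and 'accuracy' keys that the gate reads."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    metrics = {
        "f1": f1_score_binary(y_true, y_pred),
        "accuracy": correct / len(y_true),
    }
    Path(path).write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return metrics


if __name__ == "__main__":
    # Made-up labels purely for illustration.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "metrics.json"
        print(write_metrics(y_true, y_pred, out))
```

In a pipeline, the file would be written to a stable path (for example `metrics.json`) and the gate run with `METRICS_JSON` pointing at it.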

## GitHub Actions: CI workflow

We'll generate a minimal workflow that:
- Sets up Python
- Installs dependencies
- Runs unit tests (if any)
- Runs the model validation gate (using an example env metric)
- Builds a Docker image (no push)

You can extend this to push images to a registry and trigger EB deploys on release tags.

In [None]:
# Write .github/workflows/ci.yml
import pathlib
pathlib.Path('.github/workflows').mkdir(parents=True, exist_ok=True)
workflow = '''
name: CI
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
          pip install pytest
      - name: Run unit tests (if any)
        run: |
          if [ -d tests ]; then pytest -q; else echo 'No tests folder'; fi
      - name: Model validation gate
        env:
          MODEL_METRIC: '0.85'   # Example value; replace with real metric from a prior job or artifact
          METRIC_THRESHOLD: '0.80'
        run: |
          python scripts/validate_model.py
  build-image:
    runs-on: ubuntu-latest
    needs: [ test ]
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: |
          docker build -t cardiocare/heart_disease_model:ci .
'''
with open('.github/workflows/ci.yml', 'w', encoding='utf-8') as f:
    f.write(workflow)
print('Wrote .github/workflows/ci.yml')
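
The workflow hard-codes `MODEL_METRIC` as an example value. One way to feed in a real metric, sketched here under assumptions, is a small script that reads a metrics file and appends `MODEL_METRIC=<value>` to the file GitHub Actions exposes through the `GITHUB_ENV` environment variable, which makes the variable available to later steps in the same job. The function name `export_metric` and the `metrics.json` path are hypothetical, and the gate step would presumably need to drop its hard-coded `MODEL_METRIC` entry so the exported value is the one in effect.

```python
import json
import os
from pathlib import Path


def export_metric(metrics_path, env_file, key="f1", var_name="MODEL_METRIC"):
    """Append VAR=value to env_file (the path GitHub Actions gives in $GITHUB_ENV)."""
    metrics = json.loads(Path(metrics_path).read_text(encoding="utf-8"))
    if key not in metrics:
        raise KeyError(f"{key!r} not found in {metrics_path}")
    value = float(metrics[key])
    # Append rather than overwrite: other steps may have written variables too.
    with open(env_file, "a", encoding="utf-8") as f:
        f.write(f"{var_name}={value}\n")
    return value


if __name__ == "__main__":
    env_file = os.getenv("GITHUB_ENV")  # set by the runner; absent locally
    if env_file and Path("metrics.json").exists():
        print("Exported", export_metric("metrics.json", env_file))
    else:
        print("GITHUB_ENV or metrics.json missing; nothing exported.")
```

Since `scripts/validate_model.py` already accepts `METRICS_JSON`, pointing the gate at the file directly is the simpler route; exporting a variable is mainly useful when other steps also need the value.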

## AWS Elastic Beanstalk (EB) basics

EB can deploy Dockerized apps with a handful of EB CLI commands, once the EB CLI is installed and AWS credentials are configured.

PowerShell example (run from project root):

```powershell
# One-time: initialize EB app (choose Docker platform when prompted)
eb init

# Create an environment (e.g., staging)
eb create cardiocare-staging

# Deploy current version
eb deploy

# Open the app URL in your browser
eb open
```

Tip: For full automation, run these commands from a release workflow and supply AWS credentials through environment variables backed by repository secrets, never through committed files.
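
As a rough sketch of how a release script might wrap the deploy step, the example below builds an `eb deploy` command for a named environment with a version label and only prints it unless asked to execute. The function names and the `v1.0.0` label are hypothetical, and `dry_run` is a parameter of this wrapper, not an EB CLI flag. Actually running it assumes the EB CLI is installed and credentials are available in the environment.

```python
import shlex
import subprocess


def build_deploy_command(environment, label=None):
    """Build the `eb deploy` argument list for a named environment."""
    cmd = ["eb", "deploy", environment]
    if label:
        cmd += ["--label", label]  # version label for this deployment
    return cmd


def deploy(environment, label=None, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = build_deploy_command(environment, label)
    print("Would run:" if dry_run else "Running:", shlex.join(cmd))
    if dry_run:
        return 0
    # Return the CLI's exit code so the calling job can fail on a bad deploy.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    deploy("cardiocare-staging", label="v1.0.0")
```

Defaulting to a dry run keeps an accidental local invocation from touching a live environment; a release job would pass `dry_run=False` explicitly.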

## Alternatives

- Azure Machine Learning: real-time endpoints, managed compute, monitoring
- Google App Engine / Cloud Run: simple managed container hosting
- Kubernetes: advanced orchestration (AKS/EKS/GKE), higher control and complexity

## Best practices

- Keep pipelines fast and deterministic; fail fast on test/quality gates
- Separate CI (test/build) from CD (promote/deploy) with clear approvals
- Store artifacts (models, images) with versioning and metadata
- Track lineage from data → features → model → image → deployment
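
To make the versioning and lineage points concrete, here is a minimal sketch of a lineage record that ties a data file, a model file, a Git commit, and an image tag together using content hashes. The field names and the helpers `sha256_of`, `build_record`, and `write_record` are assumptions for illustration; a feature-store or model-registry tool would typically capture more, including the feature step omitted here.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Content hash, so the record identifies the exact artifact bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(model_path, data_path, git_commit, image_tag, metrics):
    """Collect the data -> model -> image links in one JSON-serializable dict."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit,
        "data_sha256": sha256_of(data_path),
        "model_sha256": sha256_of(model_path),
        "image_tag": image_tag,
        "metrics": metrics,
    }


def write_record(record, out_path):
    """Persist the record next to the artifact it describes."""
    Path(out_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
```

A CI job could call `build_record` after training and store the resulting JSON alongside the model, so any deployed image tag can be traced back to the data and commit that produced it.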