# CI/CD for ML Projects

In this notebook, we’ll learn how to **automate ML workflows** using **Continuous Integration (CI)** and **Continuous Deployment (CD)**, a crucial part of **MLOps**.

## 🎯 Objectives
- Understand CI/CD in ML systems.
- Set up automated testing and deployment pipelines.
- Learn how to use **GitHub Actions** for ML model automation.
- Explore best practices for reliable ML CI/CD pipelines.

---

## 🚀 1. What is CI/CD?

### 🧩 Continuous Integration (CI)
- Automatically tests and validates every code change.
- Keeps the project in a working state.
- Runs unit tests, lint checks, and style checks.

### 🌐 Continuous Deployment (CD)
- Automates the release of new model versions.
- Deploys to staging or production environments.
- Supports smooth rollouts and provides a way to roll back when a release fails.
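
The rollback idea in the last bullet can be sketched in a few lines. This is only an illustration, not how any particular platform implements it: `deploy_with_rollback`, `activate`, and `health_check` are hypothetical names, and the deployment target is simulated with a plain list.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    current_version: str,
    activate: Callable[[str], None],
    health_check: Callable[[str], bool],
) -> str:
    """Activate new_version; revert to current_version if the health check fails.

    Returns the version that is live when the function finishes.
    """
    activate(new_version)
    if health_check(new_version):
        return new_version
    # Health check failed: restore the previously working version.
    activate(current_version)
    return current_version


# Usage with in-memory stand-ins for a real deployment target (hypothetical).
live = []
result = deploy_with_rollback(
    new_version="v2",
    current_version="v1",
    activate=live.append,
    health_check=lambda version: False,  # simulate a failing release
)
print(result, live)  # v1 ['v2', 'v1']
```

In a real pipeline, `activate` might switch traffic to a new container and `health_check` might send a test request to the model API.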

### 🔄 CI/CD in ML Context
In ML workflows, CI/CD helps automate:
- Model training and evaluation pipelines.
- Data validation and drift detection.
- Model registry updates.
- Containerized deployment to cloud environments.
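
As a rough illustration of the data validation item above, the sketch below checks Iris-style rows before they reach training. The function name `validate_rows` and the specific rules (four finite, positive features and a label in 0 to 2) are assumptions chosen for this example, not part of the notebook's project.

```python
import math

EXPECTED_FEATURES = 4      # sepal length/width, petal length/width
VALID_LABELS = {0, 1, 2}   # the three Iris classes


def validate_rows(rows):
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems = []
    for i, (features, label) in enumerate(rows):
        if len(features) != EXPECTED_FEATURES:
            problems.append(
                f"row {i}: expected {EXPECTED_FEATURES} features, got {len(features)}"
            )
        # Measurements are lengths in cm, so NaN, inf, zero, or negatives are suspicious.
        if any(not math.isfinite(v) or v <= 0 for v in features):
            problems.append(f"row {i}: feature values must be finite and positive")
        if label not in VALID_LABELS:
            problems.append(f"row {i}: unknown label {label!r}")
    return problems


rows = [
    ([5.1, 3.5, 1.4, 0.2], 0),           # valid
    ([6.2, float("nan"), 4.5, 1.5], 1),  # NaN feature
    ([5.9, 3.0, 5.1], 7),                # missing feature and bad label
]
for problem in validate_rows(rows):
    print(problem)
```

A CI step could fail the build whenever the returned list is non-empty, so bad data never triggers a training run.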

---

## 🧱 2. Example ML Project Structure

```
ml_project/
├── data/
│   └── iris.csv
├── model/
│   ├── train_model.py
│   └── predict.py
├── tests/
│   └── test_model.py
├── requirements.txt
└── .github/
    └── workflows/
        └── ci_cd.yml
```

## ⚗️ 3. Sample ML Training Script

We'll use a simple training script to generate a model file.

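```python
%%writefile model/train_model.py
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def train():
    # Load the built-in Iris dataset (150 samples, 4 features, 3 classes).
    data = load_iris()
    X, y = data.data, data.target

    # Fix the seed so repeated CI runs produce the same model.
    model = RandomForestClassifier(random_state=42)
    model.fit(X, y)

    # Save the fitted model where the tests expect to find it.
    joblib.dump(model, 'model/iris_model.pkl')
    print('✅ Model trained and saved successfully.')


if __name__ == '__main__':
    train()
```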

## 🧪 4. Test Script for CI

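These tests check that the trained model file exists and that the model returns a valid class label. They assume the training script has already run, which is why the workflow trains the model before running `pytest`.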

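```python
%%writefile tests/test_model.py
import os

import joblib
from sklearn.datasets import load_iris

# Paths are relative to the repository root, where CI runs pytest.
MODEL_PATH = 'model/iris_model.pkl'


def test_model_training():
    # The training step must have run before the tests.
    assert os.path.exists(MODEL_PATH), 'Model file not found!'


def test_model_prediction():
    model = joblib.load(MODEL_PATH)
    iris = load_iris()
    # predict() expects a 2D array, so reshape one sample to (1, n_features).
    sample = iris.data[0].reshape(1, -1)
    pred = model.predict(sample)
    assert pred[0] in [0, 1, 2], 'Invalid prediction!'
```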

## 🧩 5. Define GitHub Actions Workflow

We’ll create a **GitHub Actions YAML file** to automate CI/CD.

### File: `.github/workflows/ci_cd.yml`

```yaml
name: CI/CD for ML Project

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build-and-test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Train model
        run: |
          python model/train_model.py

      - name: Run tests
        run: |
          pytest tests/

  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy Model API (Simulated)
        run: echo '🚀 Model deployed to production environment'
```

## ⚡ 6. Running the CI/CD Pipeline

Once committed and pushed to GitHub:
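1. GitHub Actions starts the workflow automatically on every push or pull request targeting `main`.
2. The `build-and-test` job installs dependencies, trains the model, and runs the tests.
3. If all tests pass and the run was triggered by a push to `main`, the `deploy` job executes (simulated here). On pull requests, the `deploy` job is skipped.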

You can monitor the progress under **Actions → CI/CD for ML Project** in your repository.

## 🧰 7. Requirements File

Your `requirements.txt` should include:

```
scikit-learn==1.5.1
joblib
pytest
```
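
Only `scikit-learn` is pinned to an exact version in the list above. A small check like the following could flag the unpinned entries in CI. It is a simplified sketch: `find_unpinned` is a hypothetical helper that only looks for `==`, and it does not handle every form a requirements file allows (URLs, extras, environment markers).

```python
def find_unpinned(requirement_lines):
    """Return requirement entries that lack an exact '==' version pin."""
    unpinned = []
    for line in requirement_lines:
        entry = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not entry or entry.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in entry:
            unpinned.append(entry)
    return unpinned


requirements = ["scikit-learn==1.5.1", "joblib", "pytest"]
print(find_unpinned(requirements))  # ['joblib', 'pytest']
```

To check a real file, pass `open("requirements.txt").read().splitlines()` instead of the hard-coded list.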

## 🧩 8. CI/CD Best Practices for ML

✅ **Data Versioning:** Use tools like DVC or Git LFS for large datasets.

✅ **Model Registry:** Automate model version control using MLflow or Weights & Biases.

✅ **Test Coverage:** Include both functional tests (does the model load and predict?) and performance tests (does accuracy stay above an agreed threshold?).

✅ **Environment Reproducibility:** Pin exact versions of every dependency in `requirements.txt` so that CI and production install the same packages.

✅ **Deployment Automation:** Push Docker images or APIs to cloud environments (AWS, GCP, etc.).
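
One way to turn the performance-test practice into an automated check is a quality gate that decides whether a candidate model may be deployed. The sketch below is illustrative only: `passes_quality_gate`, the 0.90 floor, and the 0.01 tolerance are assumptions for this example, and the accuracy values would come from your own evaluation step.

```python
def passes_quality_gate(accuracy, min_accuracy=0.90, baseline=None, tolerance=0.01):
    """Decide whether a candidate model is good enough to deploy.

    accuracy:     held-out accuracy of the candidate model
    min_accuracy: absolute floor the candidate must reach
    baseline:     accuracy of the currently deployed model, if known
    tolerance:    how far below the baseline the candidate may fall
    """
    if accuracy < min_accuracy:
        return False
    if baseline is not None and accuracy < baseline - tolerance:
        return False
    return True


print(passes_quality_gate(0.95))                 # True: above the floor
print(passes_quality_gate(0.85))                 # False: below the floor
print(passes_quality_gate(0.93, baseline=0.96))  # False: regresses past the tolerance
```

In a pytest-based pipeline, a test could simply `assert passes_quality_gate(measured_accuracy)` so that a weak model fails the build before the deploy job starts.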

## 📊 9. CI/CD Flow Summary

```mermaid
graph TD
A[Commit or Pull Request] --> B[CI: Build and Test]
B --> C{Tests Passed?}
C -->|No| D[Fail Pipeline ❌]
C -->|Yes| E[CD: Deploy Model 🚀]
E --> F[Production or Staging Environment]
```

---

## ✅ Summary

- Understood the role of CI/CD in ML workflows.
- Built a GitHub Actions pipeline for automation.
- Automated model training, testing, and simulated deployment.
- Followed best practices for scalable ML system delivery.

---
Next → **08-Monitoring_and_Model_Tracking.ipynb** 📈