# Lesson 4: GitHub Actions (CI for ML)

**Module 6: Monitoring & CI/CD**  
**Estimated Time**: 60 mins  
**Difficulty**: Intermediate

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand the **CI/CD** philosophy for Machine Learning.  
✅ Write your first **GitHub Actions Workflow** (`.yaml`).  
✅ Automate **PyTest** execution on every push.  
✅ Automate **Code Formatting** (Black).  

---

## 📚 Table of Contents

1. [What is CI/CD for ML?](#1-ci-cd)
2. [Anatomy of a Workflow](#2-workflow)
3. [Hands-On: Creating the YAML](#3-hands-on)
4. [Interview Preparation](#4-interview)

---

### 🛠️ Setup
No pip install needed. This lesson involves writing YAML files.

## 1. What is CI/CD for ML?

- **CI (Continuous Integration)**: "I push code, and a robot tests it."  
  - Run Unit Tests (PyTest).
  - Check formatting (Black).
  - Check Data Validity (Pandas Schema).
  
- **CD (Continuous Deployment)**: "I merge code, and a robot deploys it."  
  - Build Docker Image.
  - Push to ECR.
  - Update Kubernetes Deployment.
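
The CI checks listed above can be sketched in plain Python. This is a minimal illustration, not the course's actual test suite: `validate_rows`, `SCHEMA`, and the column names are hypothetical, and the standard library stands in for a pandas-based schema check so the snippet runs without extra dependencies.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"age": int, "income": float, "label": int}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means the data is valid."""
    problems = []
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: {column!r} should be {expected_type.__name__}"
                )
    return problems


# PyTest collects functions named test_*; a failing assert fails the CI job.
def test_clean_data_passes():
    rows = [{"age": 31, "income": 52000.0, "label": 1}]
    assert validate_rows(rows) == []


def test_bad_type_is_reported():
    rows = [{"age": "31", "income": 52000.0, "label": 1}]
    assert validate_rows(rows) == ["row 0: 'age' should be int"]
```

Saved under `tests/` in a file named `test_*.py`, functions like these would be picked up by `pytest tests/` on every push.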

## 2. Anatomy of a Workflow

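A GitHub Actions workflow consists of:
- **Triggers** (`on: push`): the events that start the workflow.
- **Jobs** (`build-and-test`): named groups of steps, each running on its own runner.
  - **Steps**: commands or reusable actions executed in order; by default, a failing step stops the job.
    - Checkout Code.
    - Install Python.
    - Install Dependencies.
    - Run Tests.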
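
The trigger, job, and step hierarchy above maps onto nested data. The sketch below writes it out by hand as a Python dictionary to make the nesting explicit. The names come from the list above, and `step_names` is a hypothetical helper. It illustrates the structure only; it is not a parser for real workflow files.

```python
# Hand-written mirror of the workflow structure: triggers -> jobs -> steps.
workflow = {
    "on": ["push"],
    "jobs": {
        "build-and-test": {
            "steps": [
                "Checkout Code",
                "Install Python",
                "Install Dependencies",
                "Run Tests",
            ],
        },
    },
}


def step_names(workflow, job):
    """Return the ordered step names for one job."""
    return list(workflow["jobs"][job]["steps"])


for number, name in enumerate(step_names(workflow, "build-and-test"), start=1):
    print(f"{number}. {name}")
```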

## 3. Hands-On: Creating the YAML

We will write a `ci_pipeline.yaml` file that simulates what you would put in `.github/workflows/`.

In [None]:
%%writefile ci_pipeline.yaml
name: ML Pipeline CI

# Trigger on pushes and pull requests targeting main
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    # Step 1: Clone the Repo
    - uses: actions/checkout@v4

    # Step 2: Set up Python 3.9
    - name: Set up Python 3.9
      uses: actions/setup-python@v5
      with:
        python-version: "3.9"

    # Step 3: Install Dependencies
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest black ruff
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi

    # Step 4: Check Formatting
    - name: Check Code Quality (Black)
      run: |
        black src/ --check

    # Step 5: Run Tests
    - name: Run Tests with PyTest
      run: |
        pytest tests/

In [None]:
# This file is now saved in the current directory.
# To use it, you would move it to .github/workflows/ci_pipeline.yaml
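
Before pushing, it can help to run the same checks locally. The following is a rough sketch, assuming `black` and `pytest` are installed and that the repository uses the `src/` and `tests/` layout from the workflow above. `run_checks` and `CI_COMMANDS` are hypothetical names, not part of GitHub Actions.

```python
import subprocess
import sys

# Mirrors the workflow's formatting and test steps, invoked via `python -m`.
CI_COMMANDS = [
    [sys.executable, "-m", "black", "src/", "--check"],
    [sys.executable, "-m", "pytest", "tests/"],
]


def run_checks(commands):
    """Run each command in order; stop at the first failure, as a CI job would.

    Returns the exit code of the failing command, or 0 if all succeed.
    """
    for command in commands:
        print("$", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0


# Usage from the repository root (not executed here):
#     sys.exit(run_checks(CI_COMMANDS))
```

Stopping at the first non-zero exit code keeps the local behaviour close to the hosted job, where a failed formatting check prevents the tests from running.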

## 4. Interview Preparation

**Q1: What prevents a bad model from breaking production?**  
*A1: A CI/CD pipeline. CI runs unit tests and checks code quality. CD runs integration tests and might deploy to a staging environment first for manual verification.*
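
One concrete form of the unit tests mentioned in A1 is a quality gate: a test that fails the pipeline when a model scores below an agreed threshold on a small held-out set. The sketch below is illustrative only. The toy `predict` rule, the `HOLDOUT` examples, and `MIN_ACCURACY` are hypothetical stand-ins for a real model and dataset.

```python
# Hypothetical stand-ins: a real project would load a trained model and a fixture file.
HOLDOUT = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # (feature, expected label)
MIN_ACCURACY = 0.75


def predict(feature):
    """Toy model: classify as 1 when the feature exceeds 0.5."""
    return 1 if feature > 0.5 else 0


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    correct = sum(1 for feature, label in examples if model(feature) == label)
    return correct / len(examples)


def test_model_meets_quality_gate():
    # A failing assert makes pytest exit non-zero, which fails the CI job.
    assert accuracy(predict, HOLDOUT) >= MIN_ACCURACY
```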

**Q2: How do you handle large datasets in CI?**  
*A2: You don't. You use a small, representative subset of data (stored in Git or DVC-pulled) to run fast tests. Full training tests happen in a separate, possibly scheduled job, not on every commit.*
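
A small CI fixture can be produced by sampling the full dataset once with a fixed seed, so every run sees the same rows. This is a minimal sketch using only the standard library; `sample_rows`, the seed, and the sample size are assumptions for illustration.

```python
import random


def sample_rows(rows, n, seed=42):
    """Return n rows chosen reproducibly; the same seed always yields the same subset."""
    if n >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    return rng.sample(rows, n)


full_dataset = [{"id": i, "value": i * 0.5} for i in range(10_000)]
ci_subset = sample_rows(full_dataset, n=100)
print(len(ci_subset))  # 100
```

Uniform sampling is only one option. With imbalanced labels, sampling a fixed number of rows per class would likely give a more representative subset.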