# GitHub Actions: Practical Training

This notebook provides a practical guide to GitHub Actions, focusing on two key workflows:
1. Building and pushing Docker images to registries
2. Setting up environments for running AI models and plotting their outputs

GitHub Actions is a powerful CI/CD platform that allows you to automate various workflows directly from your GitHub repository.

## What are GitHub Actions?

GitHub Actions is a CI/CD (Continuous Integration and Continuous Deployment) platform that allows you to automate your build, test, and deployment pipeline. It provides:

1. **Workflows**: YAML files that define automation processes
2. **Events**: Triggers that start workflows (push, pull request, schedule, etc.)
3. **Jobs**: Sets of steps that execute on the same runner; jobs run in parallel unless one declares a dependency on another with `needs`
4. **Steps**: Individual tasks that run commands or actions
5. **Actions**: Reusable units of code that can be shared
6. **Runners**: Servers that run your workflows (GitHub-hosted or self-hosted)

## Basic GitHub Actions Workflow Structure

```yaml
name: My Workflow

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Run a one-line script
        run: echo Hello, world!
```

Now, let's dive into our specific use cases.

## Use Case 1: Building and Pushing Docker Images

This workflow demonstrates how to:
1. Build a Docker image from your repository
2. Test the built image
3. Push the image to a Docker registry (Docker Hub or GitHub Container Registry)

### Step 1: Create a Dockerfile

First, you need a Dockerfile in your repository. Here's a simple example:

```dockerfile
# Example Dockerfile for a Python application

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Set a default command
CMD ["python", "app.py"]
```

### Step 2: Create GitHub Actions Workflow for Docker

Create a file at `.github/workflows/docker-build.yml` with the following content:

```yaml
name: Build and Push Docker Image

on:
  push:
    branches: [ main ]
    # Optionally trigger on tags to create versioned releases
    tags: [ 'v*' ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      # Log in to Docker Hub (skipped for pull requests, which only build the image)
      - name: Login to Docker Hub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      
      # Extract metadata (tags and labels) for the Docker image
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          # Replace "username" with your Docker Hub namespace
          images: username/my-application
          # Generate tags based on the following events
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha
      
      # Build the Docker image; push it only for non-PR events
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
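To reason about which tags the `tags:` rules above will publish, the sketch below approximates them in Python. It is a simplified model under stated assumptions, not `docker/metadata-action`'s implementation: it covers only the four rule types used here, replaces characters other than letters, digits, `.`, `_` and `-` in branch names with `-`, uses a 7-character short SHA with a `sha-` prefix, and ignores pre-release versions and the action's automatic `latest` tag. The function name `approx_metadata_tags` is hypothetical.

```python
import re
from typing import List


def approx_metadata_tags(ref: str, sha: str) -> List[str]:
    """Approximate the tags the `tags:` rules above would generate for one event.

    Simplified model (an assumption, not docker/metadata-action's actual code):
      type=ref,event=branch -> branch name, unusual characters replaced by '-'
      type=ref,event=pr     -> 'pr-<number>'
      type=semver patterns  -> '<major>.<minor>.<patch>' and '<major>.<minor>'
      type=sha              -> 'sha-' + first 7 characters of the commit SHA
    Pre-releases and the automatic 'latest' tag are not modelled.
    """
    tags: List[str] = []
    if ref.startswith("refs/heads/"):
        branch = ref[len("refs/heads/"):]
        tags.append(re.sub(r"[^A-Za-z0-9._-]", "-", branch))
    elif ref.startswith("refs/pull/"):
        # Pull request refs look like refs/pull/<number>/merge
        tags.append(f"pr-{ref.split('/')[2]}")
    elif ref.startswith("refs/tags/"):
        version = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", ref[len("refs/tags/"):])
        if version:
            major, minor, patch = version.groups()
            tags += [f"{major}.{minor}.{patch}", f"{major}.{minor}"]
    tags.append(f"sha-{sha[:7]}")
    return tags


print(approx_metadata_tags("refs/tags/v1.2.3", "860c1904a1ce19322e91ac35af1ab07466440c37"))
# → ['1.2.3', '1.2', 'sha-860c190']
```

A push of tag `v1.2.3` therefore yields both a precise and a floating `major.minor` tag, while every event also gets an immutable SHA-based tag.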

### Setting Up the Required Secrets

For the workflow above, you need to add secrets to your repository:

1. Go to your repository on GitHub
2. Navigate to "Settings" > "Secrets and variables" > "Actions"
3. Add the following secrets:
   - `DOCKER_USERNAME`: Your Docker Hub username
   - `DOCKER_PASSWORD`: Your Docker Hub access token (not your account password)

### Alternative: Using GitHub Container Registry

If you prefer to use GitHub Container Registry (ghcr.io) instead of Docker Hub:

```yaml
# Replace the Docker Hub login step with:
- name: Login to GitHub Container Registry
  if: github.event_name != 'pull_request'
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.repository_owner }}
    password: ${{ secrets.GITHUB_TOKEN }}

# And update the images in metadata:
images: ghcr.io/${{ github.repository }}

# Let GITHUB_TOKEN publish packages (add under the job, next to runs-on):
permissions:
  contents: read
  packages: write
```

No additional secrets are required for GitHub Container Registry because `GITHUB_TOKEN` is provided automatically; the job only needs the `packages: write` permission so that the token is allowed to push the image.

## Use Case 2: Environment for Running AI Models and Plotting Outputs

This workflow demonstrates how to:
1. Set up a Python environment with ML libraries
2. Run AI model training or inference
3. Generate and save plots/visualizations
4. Upload results as artifacts

### Step 1: Create GitHub Actions Workflow for AI Model Training

Create a file at `.github/workflows/ai-model-training.yml` with the following content:

```yaml
name: AI Model Training and Visualization

on:
  push:
    branches: [ main ]
    paths:
      - 'model/**'
      - 'data/**'
  workflow_dispatch:  # Allows manual triggering
    inputs:
      epochs:
        description: 'Number of training epochs'
        required: true
        default: '10'
      batch_size:
        description: 'Batch size for training'
        required: true
        default: '32'

jobs:
  train:
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'pip'
      
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          # Install additional visualization libraries
          pip install matplotlib seaborn plotly
      
      - name: Download dataset (if needed)
        run: |
          mkdir -p data
          # Example: Download a dataset from a public source
          # wget -O data/dataset.csv https://example.com/dataset.csv
          echo "Using existing dataset or downloading as needed"
      
      - name: Train model
        env:
          # Inputs exist only for workflow_dispatch runs; other events fall back to the defaults.
          # Passing them through env (not inline ${{ }}) keeps user input out of the shell script.
          EPOCHS: ${{ github.event.inputs.epochs || '10' }}
          BATCH_SIZE: ${{ github.event.inputs.batch_size || '32' }}
        run: |
          # Run training script with parameters
          python model/train.py --epochs "$EPOCHS" --batch_size "$BATCH_SIZE" --output_dir ./output
      
      - name: Generate visualizations
        run: |
          # Run script to generate plots
          python model/visualize.py --model_dir ./output --plots_dir ./plots
      
      - name: Archive model artifacts
        uses: actions/upload-artifact@v4
        with:
          name: model-artifacts
          path: |
            output/*.h5
            output/*.pkl
            output/metrics.json
      
      - name: Archive plots
        uses: actions/upload-artifact@v4
        with:
          name: training-plots
          path: plots/
```
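On `push` runs there are no `workflow_dispatch` inputs, so `${{ github.event.inputs.epochs || '10' }}` falls back to the default: in GitHub expressions, `||` returns the right operand when the left one is falsy, such as null or an empty string. The sketch below mimics that fallback in Python and adds the kind of validation you might run before training. `resolve_int_input` and the upper bound of 1000 are hypothetical choices, not part of the workflow above.

```python
import re
from typing import Optional


def resolve_int_input(raw: Optional[str], default: str, maximum: int = 1000) -> int:
    """Mimic `${{ github.event.inputs.X || 'default' }}`, then validate the value."""
    # `||` yields the right operand when the left is falsy (null or '' for strings)
    value = raw if raw else default
    # Accept digits only, so values such as "10; echo pwned" are rejected outright
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError(f"expected a whole number, got {value!r}")
    number = int(value)
    if not 1 <= number <= maximum:
        raise ValueError(f"{number} is outside the allowed range 1..{maximum}")
    return number


print(resolve_int_input(None, "10"))  # push event: no input, default used
print(resolve_int_input("25", "10"))  # manual run with epochs=25
```

`train.py` already rejects non-integers through `type=int`; the range check is the extra safeguard against accidentally huge or zero values.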

### Step 2: Example Python Scripts

To complement the GitHub Actions workflow, here are example Python scripts for training and visualization:

```python
# Example: model/train.py
import argparse
import json
import os

import tensorflow as tf
from tensorflow import keras

def main():
    # Parse command line arguments
    parser = argparse.ArgumentParser(description='Train a simple neural network')
    parser.add_argument('--epochs', type=int, default=10, help='Number of training epochs')
    parser.add_argument('--batch_size', type=int, default=32, help='Batch size for training')
    parser.add_argument('--output_dir', type=str, default='./output', help='Directory to save model and outputs')
    args = parser.parse_args()
    
    # Create output directory if it doesn't exist
    os.makedirs(args.output_dir, exist_ok=True)
    
    # Load and preprocess data (example with MNIST); scale pixel values to [0, 1]
    print("Loading dataset...")
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    
    # Build a simple model; an explicit Input layer replaces the input_shape
    # argument, which Keras 3 discourages on Sequential layers
    print("Building model...")
    model = keras.Sequential([
        keras.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Create callbacks
    callbacks = [
        # Checkpoint in the native .keras format: Keras 3 (TensorFlow >= 2.16)
        # may reject other extensions for full-model checkpoints
        tf.keras.callbacks.ModelCheckpoint(
            filepath=os.path.join(args.output_dir, 'model_checkpoint.keras'),
            save_best_only=True,
            monitor='val_accuracy'
        ),
        tf.keras.callbacks.CSVLogger(os.path.join(args.output_dir, 'training_log.csv'))
    ]
    
    # Train the model
    print(f"Training model for {args.epochs} epochs with batch size {args.batch_size}...")
    history = model.fit(
        x_train, y_train,
        epochs=args.epochs,
        batch_size=args.batch_size,
        validation_split=0.1,
        callbacks=callbacks
    )
    
    # Evaluate the model; evaluate() returns [loss, accuracy] for this compile setup
    print("Evaluating model...")
    test_scores = model.evaluate(x_test, y_test, verbose=2)
    print(f"Test loss: {test_scores[0]}")
    print(f"Test accuracy: {test_scores[1]}")
    
    # Save the final model (HDF5 is a legacy format in Keras 3 but still supported;
    # the workflows and Dockerfile in this notebook expect this filename)
    model.save(os.path.join(args.output_dir, 'final_model.h5'))
    print(f"Saved model to {args.output_dir}/final_model.h5")
    
    # Save training history and test metrics
    metrics = {
        'test_loss': float(test_scores[0]),
        'test_accuracy': float(test_scores[1]),
        'training_history': {
            'accuracy': [float(x) for x in history.history['accuracy']],
            'val_accuracy': [float(x) for x in history.history['val_accuracy']],
            'loss': [float(x) for x in history.history['loss']],
            'val_loss': [float(x) for x in history.history['val_loss']]
        }
    }
    
    with open(os.path.join(args.output_dir, 'metrics.json'), 'w') as f:
        json.dump(metrics, f, indent=2)
    
    print("Training complete!")

if __name__ == "__main__":
    main()
```

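Because `train.py` writes `test_accuracy` to `metrics.json`, a later workflow step can turn it into a quality gate that fails the run when the model regresses. A minimal sketch, assuming a hypothetical helper file such as `model/check_metrics.py`; the threshold is yours to choose:

```python
import json
from pathlib import Path


def check_metrics(metrics_path: str, min_accuracy: float) -> bool:
    """Return True if the test accuracy recorded by train.py meets the threshold."""
    metrics = json.loads(Path(metrics_path).read_text())
    accuracy = float(metrics["test_accuracy"])  # key written by train.py
    passed = accuracy >= min_accuracy
    print(f"test_accuracy={accuracy:.4f} threshold={min_accuracy} -> {'PASS' if passed else 'FAIL'}")
    return passed
```

A step could import it and call `sys.exit(0 if check_metrics('output/metrics.json', 0.95) else 1)`; any non-zero exit code marks the step, and therefore the job, as failed.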
```python
# Example: model/visualize.py
import argparse
import json
import os

import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: CI runners have no display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from tensorflow import keras

def plot_training_history(history_dict, output_dir):
    """Plot training and validation metrics."""
    # Plot accuracy
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(history_dict['accuracy'], label='Training Accuracy')
    plt.plot(history_dict['val_accuracy'], label='Validation Accuracy')
    plt.title('Model Accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend()
    
    # Plot loss
    plt.subplot(1, 2, 2)
    plt.plot(history_dict['loss'], label='Training Loss')
    plt.plot(history_dict['val_loss'], label='Validation Loss')
    plt.title('Model Loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend()
    
    plt.tight_layout()
    plt.savefig(os.path.join(output_dir, 'training_history.png'))
    plt.close()

def plot_confusion_matrix(model, output_dir):
    """Generate and plot confusion matrix."""
    # Load test data
    (_, _), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_test = x_test.astype("float32") / 255.0
    
    # Get predictions
    y_pred = np.argmax(model.predict(x_test), axis=1)
    
    # Create confusion matrix
    cm = tf.math.confusion_matrix(y_test, y_pred).numpy()
    
    # Plot
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.savefig(os.path.join(output_dir, 'confusion_matrix.png'))
    plt.close()

def plot_sample_predictions(model, output_dir):
    """Plot sample images and their predictions."""
    # Load test data (keep the raw images for display, normalize a copy for the model)
    (_, _), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_test_norm = x_test.astype("float32") / 255.0
    
    # Get predictions for a few samples
    predictions = model.predict(x_test_norm[:20])
    predicted_classes = np.argmax(predictions, axis=1)
    
    # Plot: green titles for correct predictions, red for mistakes
    plt.figure(figsize=(15, 10))
    for i in range(20):
        plt.subplot(4, 5, i+1)
        plt.imshow(x_test[i], cmap='gray')
        color = 'green' if predicted_classes[i] == y_test[i] else 'red'
        plt.title(f"True: {y_test[i]}, Pred: {predicted_classes[i]}", 
                  color=color)
        plt.axis('off')
    
    plt.tight_layout()
    plt.savefig(os.path.join(output_dir, 'sample_predictions.png'))
    plt.close()

def main():
    # Parse command line arguments
    parser = argparse.ArgumentParser(description='Generate visualizations for trained model')
    parser.add_argument('--model_dir', type=str, default='./output', 
                        help='Directory where model and metrics are saved')
    parser.add_argument('--plots_dir', type=str, default='./plots', 
                        help='Directory to save plots')
    args = parser.parse_args()
    
    # Create output directory if it doesn't exist
    os.makedirs(args.plots_dir, exist_ok=True)
    
    # Load metrics. Exiting with an error (instead of returning) makes the
    # workflow step fail, so a missing file is not silently ignored in CI.
    print("Loading metrics...")
    metrics_path = os.path.join(args.model_dir, 'metrics.json')
    try:
        with open(metrics_path, 'r') as f:
            metrics = json.load(f)
    except FileNotFoundError:
        raise SystemExit(f"Error: metrics file not found at {metrics_path}")
    
    # Plot training history
    print("Generating training history plot...")
    plot_training_history(metrics['training_history'], args.plots_dir)
    
    # Load model
    print("Loading model...")
    model_path = os.path.join(args.model_dir, 'final_model.h5')
    try:
        model = keras.models.load_model(model_path)
    except Exception as err:  # the exception type varies across TensorFlow/Keras versions
        raise SystemExit(f"Error: could not load model from {model_path}: {err}")
    
    # Generate confusion matrix
    print("Generating confusion matrix...")
    plot_confusion_matrix(model, args.plots_dir)
    
    # Plot sample predictions
    print("Generating sample prediction plots...")
    plot_sample_predictions(model, args.plots_dir)
    
    print(f"All visualizations saved to {args.plots_dir}")

if __name__ == "__main__":
    main()
```

### Step 3: Setting Up `requirements.txt`

Create a `requirements.txt` file in the root of your repository:

```
numpy>=1.20.0
tensorflow>=2.8.0
matplotlib>=3.5.0
seaborn>=0.11.0
scikit-learn>=1.0.0
```
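The `>=` lower bounds let pip install the newest compatible releases, so two runs weeks apart can resolve different versions (for example, TensorFlow 2.16 switched to Keras 3). A sketch that logs what was actually installed next to each minimum; it only understands simple `name>=version` lines, and the function names are illustrative:

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional

# Only plain "name>=version" lines are understood; anything else is ignored
LOWER_BOUND = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*>=\s*([0-9][0-9A-Za-z.]*)\s*$")


def parse_lower_bounds(requirements_text: str) -> Dict[str, str]:
    """Map package name -> minimum version for simple 'name>=X.Y' lines."""
    bounds = {}
    for line in requirements_text.splitlines():
        match = LOWER_BOUND.match(line)
        if match:
            bounds[match.group(1)] = match.group(2)
    return bounds


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up the installed distribution version for each name (None if absent)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def report(requirements_path: str = "requirements.txt") -> None:
    """Print required minimum and installed version for every listed package."""
    with open(requirements_path) as f:
        bounds = parse_lower_bounds(f.read())
    for name, installed in installed_versions(bounds).items():
        print(f"{name}: required >= {bounds[name]}, installed {installed or 'missing'}")
```

Calling `report()` right after the install step records the resolved versions in the job log, which makes it easier to explain why a later run behaves differently.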

## Combining Both Workflows: AI Model in Docker

Now, let's combine our two use cases by creating a workflow that:
1. Trains an AI model
2. Creates visualizations
3. Packages everything into a Docker image
4. Pushes the image to a registry

### Create a Dockerfile for the AI Model

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model code and trained artifacts
COPY model/ ./model/
COPY output/ ./output/
COPY plots/ ./plots/

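# Set up a command to serve predictions or show results.
# model/serve.py is not included in this notebook; supply your own script
# that loads the model and listens on the given port.
CMD ["python", "model/serve.py", "--model_path", "output/final_model.h5", "--port", "8000"]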
```

### Create a Combined Workflow

Create a file at `.github/workflows/train-and-dockerize.yml`:

```yaml
name: Train AI Model and Dockerize

on:
  push:
    branches: [ main ]
    paths:
      - 'model/**'
      - 'data/**'
  workflow_dispatch:
    inputs:
      epochs:
        description: 'Number of training epochs'
        required: true
        default: '10'

jobs:
  train:
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'pip'
      
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      
      - name: Train model
        env:
          # Pass the input through env rather than interpolating it into the script
          EPOCHS: ${{ github.event.inputs.epochs || '10' }}
        run: |
          python model/train.py --epochs "$EPOCHS" --output_dir ./output
      
      - name: Generate visualizations
        run: |
          python model/visualize.py --model_dir ./output --plots_dir ./plots
      
      - name: Upload artifacts for next job
        uses: actions/upload-artifact@v4
        with:
          name: model-artifacts
          path: |
            model/
            output/
            plots/
            requirements.txt
  
  dockerize:
    needs: train
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          name: model-artifacts
          path: .
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: username/ai-model-app
          tags: |
            type=sha,format=short
            type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) }}
      
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
```

This workflow first trains the model and generates visualizations, then builds and pushes a Docker image containing the trained model and visualization outputs.
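The `needs: train` line is what orders the two jobs; without it, GitHub would start both at once. The sketch below uses Python's standard `graphlib` module to show how a `needs` graph resolves into waves of jobs. The extra `lint` and `deploy` jobs are hypothetical, and grouping into waves is a simplification: on GitHub, a job starts as soon as all of its own `needs` have finished.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def job_waves(needs: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs into waves; every job in a wave has all of its `needs` satisfied."""
    sorter = TopologicalSorter(needs)  # job -> jobs it needs (its predecessors)
    sorter.prepare()                   # raises graphlib.CycleError on circular needs
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The combined workflow above
print(job_waves({"train": [], "dockerize": ["train"]}))
# → [['train'], ['dockerize']]

# Adding two hypothetical jobs
print(job_waves({"lint": [], "train": [], "dockerize": ["train"], "deploy": ["dockerize", "lint"]}))
# → [['lint', 'train'], ['dockerize'], ['deploy']]
```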

## Advanced GitHub Actions Features

### 1. Matrix Builds for Testing Different Configurations

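You can use matrix builds to train the model with several configurations in parallel; each combination of values becomes its own job. Note that the `train.py` shown earlier has no `--learning_rate` argument, so add one before using this matrix: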

```yaml
jobs:
  train:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        learning_rate: [0.001, 0.01]
        batch_size: [32, 64]
    
    steps:
      # ... other steps
      - name: Train model
        run: |
          python model/train.py \
            --learning_rate ${{ matrix.learning_rate }} \
            --batch_size ${{ matrix.batch_size }} \
            --output_dir ./output/${{ matrix.learning_rate }}_${{ matrix.batch_size }}
```
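A matrix expands to every combination of its values, so this one yields four jobs (two learning rates times two batch sizes). The sketch below reproduces that expansion with `itertools.product` and shows a possible `argparse` change for `train.py`. The `--learning_rate` flag and its `0.001` default (which matches the Keras `Adam` default) are assumptions; `train.py` would also need to use the value, e.g. `optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate)`.

```python
import argparse
import itertools
from typing import Dict, List


def expand_matrix(matrix: Dict[str, list]) -> List[dict]:
    """Return one dict per job: the Cartesian product of all matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in itertools.product(*matrix.values())]


def build_parser() -> argparse.ArgumentParser:
    """train.py's parser with an added (hypothetical) --learning_rate option."""
    parser = argparse.ArgumentParser(description="Train a simple neural network")
    parser.add_argument("--epochs", type=int, default=10, help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=32, help="Batch size for training")
    parser.add_argument("--learning_rate", type=float, default=0.001, help="Adam learning rate")
    parser.add_argument("--output_dir", type=str, default="./output", help="Where to save outputs")
    return parser


parser = build_parser()
for job in expand_matrix({"learning_rate": [0.001, 0.01], "batch_size": [32, 64]}):
    # The same command-line arguments the matrix step builds from ${{ matrix.* }}
    args = parser.parse_args([
        "--learning_rate", str(job["learning_rate"]),
        "--batch_size", str(job["batch_size"]),
        "--output_dir", f"./output/{job['learning_rate']}_{job['batch_size']}",
    ])
    print(args.learning_rate, args.batch_size, args.output_dir)
```

Giving each job its own `--output_dir` keeps the four runs from overwriting each other's results.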

### 2. Caching Dependencies and Data

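Improve workflow performance by caching dependencies and datasets. `actions/setup-python` with `cache: 'pip'` (as used in the workflows above) already caches pip downloads, so the first step below is only needed if you do not use that option:

```yaml
- name: Cache pip dependencies
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-

- name: Cache dataset
  uses: actions/cache@v4
  with:
    # Path to your dataset; for the MNIST example, Keras caches downloads in ~/.keras/datasets
    path: ./data
    key: dataset-v1  # Increment this version when your dataset changes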
```
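Two mechanisms drive the cache steps above: the `key` changes whenever `requirements.txt` changes, and on a miss, `restore-keys` falls back to the most recently created cache whose key starts with the given prefix. Below is a simplified Python model of both, assuming cache entries are `(key, created_at)` pairs. The digest is a plain SHA-256 of the file, which illustrates the idea but is not guaranteed to equal what `hashFiles()` returns, and branch scoping of caches is ignored.

```python
import hashlib
from pathlib import Path
from typing import List, Optional, Tuple


def pip_cache_key(requirements_path: str, runner_os: str) -> str:
    """Key shaped like '${{ runner.os }}-pip-${{ hashFiles(...) }}' (digest differs from hashFiles)."""
    digest = hashlib.sha256(Path(requirements_path).read_bytes()).hexdigest()
    return f"{runner_os}-pip-{digest}"


def select_cache(key: str, restore_keys: List[str],
                 entries: List[Tuple[str, int]]) -> Optional[str]:
    """Exact key first, then each restore-key prefix in order; newest match wins."""
    for entry_key, _created in entries:
        if entry_key == key:
            return entry_key  # cache hit
    for prefix in restore_keys:
        matches = [entry for entry in entries if entry[0].startswith(prefix)]
        if matches:
            return max(matches, key=lambda entry: entry[1])[0]  # most recent partial match
    return None  # cache miss: the job starts cold
```

After a partial restore, pip only downloads what changed, and the job saves a new cache under the new exact key.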

### 3. Scheduled Training Runs

Run periodic training on new data with a schedule:

```yaml
on:
  schedule:
    # Run every Monday at 03:00 UTC (GitHub evaluates cron schedules in UTC)
    - cron: '0 3 * * 1'
```
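Cron fields are minute, hour, day of month, month, and day of week, and GitHub evaluates them in UTC, so `'0 3 * * 1'` means 03:00 UTC every Monday. The sketch computes the next run for this fixed weekly shape only, not general cron syntax; `next_weekly_run` is a hypothetical helper, and real scheduled runs can start later than the nominal time when GitHub's queues are busy.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(after: datetime, minute: int, hour: int, cron_weekday: int) -> datetime:
    """Next time a 'minute hour * * weekday' cron fires strictly after `after`, in UTC.

    `after` must be timezone-aware. Cron counts weekdays from Sunday (0 or 7),
    while Python's weekday() counts from Monday (0).
    """
    after_utc = after.astimezone(timezone.utc)
    target = (cron_weekday - 1) % 7  # cron 1 (Monday) -> Python 0
    candidate = after_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(target - candidate.weekday()) % 7)
    if candidate <= after_utc:
        candidate += timedelta(days=7)  # this week's slot already passed
    return candidate


# '0 3 * * 1': at 04:00 UTC on Monday 2024-01-01, that day's run has already happened
print(next_weekly_run(datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc), 0, 3, 1))
# → 2024-01-08 03:00:00+00:00
```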

### 4. Environment-specific Deployment

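Deploy trained models to different environments based on the branch. A job-level `environment` also determines which environment-scoped secrets (such as `API_KEY` below) the job can read:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}
    env:
      # env.ENVIRONMENT_NAME is not set automatically, so define the name explicitly
      DEPLOY_ENV: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}
    steps:
      # ... check out the code and download the trained model artifact first
      - name: Deploy model
        env:
          API_KEY: ${{ secrets.API_KEY }}  # pass secrets through env rather than inline
        run: |
          # deploy.py stands in for your own deployment script
          python deploy.py \
            --model ./output/final_model.h5 \
            --environment "$DEPLOY_ENV" \
            --api_key "$API_KEY"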
```
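Since `deploy.py` is not provided here, the following is a hypothetical sketch of its argument handling. It restricts `--environment` to the two names the job can produce and falls back to an `API_KEY` environment variable when `--api_key` is omitted, which keeps the secret off the command line entirely. All names are assumptions.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_deploy_args(argv: Optional[Sequence[str]] = None,
                      environ: Mapping[str, str] = os.environ) -> argparse.Namespace:
    """Parse arguments for a hypothetical deploy.py."""
    parser = argparse.ArgumentParser(description="Deploy a trained model (hypothetical deploy.py)")
    parser.add_argument("--model", required=True, help="Path to the trained model file")
    # Only the names the workflow expression can produce are accepted
    parser.add_argument("--environment", required=True, choices=["staging", "production"])
    parser.add_argument("--api_key", default=None, help="Defaults to the API_KEY environment variable")
    args = parser.parse_args(argv)
    args.api_key = args.api_key or environ.get("API_KEY")
    if not args.api_key:
        parser.error("no API key: pass --api_key or set API_KEY")  # exits with status 2
    return args
```

With this in place, the workflow step could export the secret as `API_KEY` in its `env:` block instead of passing `--api_key` at all.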

## GitHub Actions Security Best Practices

1. **Use Secrets for Sensitive Data**: Always use GitHub Secrets for API keys, tokens, and credentials.

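2. **Limit Permissions**: Use the `permissions` keyword to limit what `GITHUB_TOKEN` can do, for example `contents: read` at the workflow level, adding broader scopes such as `packages: write` only to the jobs that need them.

3. **Pin Action Versions**: Pin actions to a release tag or, for third-party actions, to a full commit SHA; avoid branch references such as `@master`, which can change without notice.

4. **Validate External Inputs**: Treat pull request titles, branch names, and `workflow_dispatch` inputs as untrusted. Pass them to scripts through `env:` instead of interpolating `${{ ... }}` directly into `run:` commands, and validate values before using them.

5. **Secure Docker Images**: Scan Docker images for vulnerabilities, for example with Trivy:

```yaml
- name: Scan Docker image
  uses: aquasecurity/trivy-action@0.28.0  # pin a release tag or commit SHA instead of @master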
  with:
    image-ref: 'username/ai-model-app:latest'
    format: 'table'
    exit-code: '1'
    severity: 'CRITICAL'
```
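To check the pinning advice mechanically, a small script can scan workflow files for `uses:` references. A sketch under simple assumptions: it reads only unquoted `uses: owner/repo@ref` lines, treats 40-character hex refs as pinned commit SHAs, accepts version-like tags unless `allow_tags=False`, and ignores local (`./...`) and `docker://` references. `find_unpinned_actions` is a hypothetical name, not an existing tool.

```python
import re
from typing import List

# Matches "uses: owner/repo@ref" or "- uses: owner/repo/path@ref" at the start of a line
USES_LINE = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"[0-9a-f]{40}")
VERSION_TAG = re.compile(r"v?\d+(?:\.\d+)*")


def find_unpinned_actions(workflow_text: str, allow_tags: bool = True) -> List[str]:
    """Return 'action@ref' strings whose ref is neither a commit SHA nor (optionally) a version tag."""
    findings = []
    for action, ref in USES_LINE.findall(workflow_text):
        if action.startswith("docker://"):
            continue  # container image references are out of scope here
        if FULL_SHA.fullmatch(ref):
            continue  # immutable commit pin
        if allow_tags and VERSION_TAG.fullmatch(ref):
            continue  # release tag: readable, but a publisher can move it
        findings.append(f"{action}@{ref}")
    return findings


sample = """
steps:
  - uses: actions/checkout@v4
  - uses: aquasecurity/trivy-action@master
"""
print(find_unpinned_actions(sample))
# → ['aquasecurity/trivy-action@master']
```

Running it over every file in `.github/workflows/` (for example with `pathlib.Path(".github/workflows").glob("*.yml")`) gives a quick list of references worth pinning.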

## Conclusion

GitHub Actions provides a powerful platform for automating your AI model training and Docker image building workflows. By leveraging these tools, you can:

1. Automatically train and evaluate models with each code change
2. Generate and archive visualizations
3. Package models into Docker containers for deployment
4. Ensure a consistent environment across development and production

For more complex workflows, you can explore additional features like:
- Self-hosted runners for specialized hardware (e.g., GPU instances)
- Integration with cloud services (AWS, Azure, GCP)
- Notifications via email, Slack, or other platforms
- Model deployment to production environments

This notebook provides a foundation for implementing these workflows in your own repositories.

## Exercise: Create Your Own GitHub Actions Workflow

Try creating a simple GitHub Actions workflow that:

1. Sets up a Python environment
2. Installs dependencies from a requirements.txt file
3. Runs a simple script to train a model (you can use the provided examples)
4. Saves the results as artifacts

Steps:
1. Create a `.github/workflows/exercise.yml` file
2. Define a workflow that triggers on push to the main branch
3. Set up a job that runs on ubuntu-latest
4. Add steps to checkout code, set up Python, and install dependencies
5. Add a step to run a training script
6. Add a step to save artifacts

Good luck! Once you push this workflow to GitHub, it will automatically run whenever you push changes to the main branch.