# 🚀 MLOps & Production Deployment

## From Experiment to Production-Ready ML Systems

**What You'll Learn:**
- Experiment tracking and model versioning
- Model deployment (Flask, FastAPI, Docker)
- CI/CD pipelines for ML
- Monitoring and logging
- A/B testing strategies
- Data drift detection
- Production best practices

**Prerequisites:** Basic ML knowledge, Python programming

---

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import json
import joblib
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# ML libraries
from sklearn.datasets import load_iris, make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Plotting
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries imported successfully!")

---

## Part 1: Experiment Tracking with MLflow

### 1.1 Setting up MLflow

**MLflow:** Open-source platform for ML lifecycle management

**Key Features:**
- Track experiments (parameters, metrics, artifacts)
- Package code in reproducible format
- Deploy models to various platforms
- Model registry for versioning

In [None]:
# Install MLflow if needed
# !pip install mlflow

try:
    import mlflow
    import mlflow.sklearn
    print(f"MLflow version: {mlflow.__version__}")
    MLFLOW_AVAILABLE = True
except ImportError:
    print("MLflow not installed. Install with: pip install mlflow")
    MLFLOW_AVAILABLE = False

### 1.2 Tracking Experiments

In [None]:
# Load data outside the MLflow guard so later sections can use it even without MLflow
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=15,
    n_redundant=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

if MLFLOW_AVAILABLE:
    # Set experiment (the data is synthetic, not the Iris dataset)
    mlflow.set_experiment("synthetic_classification")
    
    # Function to train and track
    def train_with_mlflow(n_estimators, max_depth, min_samples_split):
        """Train model and track with MLflow."""
        
        with mlflow.start_run():
            # Log parameters
            mlflow.log_param("n_estimators", n_estimators)
            mlflow.log_param("max_depth", max_depth)
            mlflow.log_param("min_samples_split", min_samples_split)
            
            # Train model
            model = RandomForestClassifier(
                n_estimators=n_estimators,
                max_depth=max_depth,
                min_samples_split=min_samples_split,
                random_state=42
            )
            model.fit(X_train, y_train)
            
            # Predictions
            y_pred = model.predict(X_test)
            
            # Calculate metrics
            accuracy = accuracy_score(y_test, y_pred)
            precision = precision_score(y_test, y_pred, average='weighted')
            recall = recall_score(y_test, y_pred, average='weighted')
            f1 = f1_score(y_test, y_pred, average='weighted')
            
            # Log metrics
            mlflow.log_metric("accuracy", accuracy)
            mlflow.log_metric("precision", precision)
            mlflow.log_metric("recall", recall)
            mlflow.log_metric("f1_score", f1)
            
            # Log model
            mlflow.sklearn.log_model(model, "model")
            
            # Log artifacts (e.g., a confusion-matrix plot)
            from sklearn.metrics import confusion_matrix
            
            cm = confusion_matrix(y_test, y_pred)
            plt.figure(figsize=(8, 6))
            sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
            plt.title('Confusion Matrix')
            plt.ylabel('True Label')
            plt.xlabel('Predicted Label')
            plt.savefig('confusion_matrix.png')
            mlflow.log_artifact('confusion_matrix.png')
            plt.close()
            
            print(f"Run completed: Accuracy = {accuracy:.4f}")
            
            return model, accuracy
    
    # Run multiple experiments
    print("Running experiments with different hyperparameters...\n")
    
    experiments = [
        (50, 5, 2),
        (100, 10, 2),
        (100, 15, 5),
        (200, 20, 2),
    ]
    
    results = []
    for n_est, max_d, min_split in experiments:
        print(f"Training: n_estimators={n_est}, max_depth={max_d}, min_samples_split={min_split}")
        model, acc = train_with_mlflow(n_est, max_d, min_split)
        results.append((n_est, max_d, min_split, acc))
        print()
    
    # Display results
    results_df = pd.DataFrame(
        results, 
        columns=['n_estimators', 'max_depth', 'min_samples_split', 'accuracy']
    )
    print("\nExperiment Results:")
    print(results_df.to_string(index=False))
    print("\nTo view experiments: mlflow ui")
else:
    print("MLflow not available")

### 1.3 Model Registry

In [None]:
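if MLFLOW_AVAILABLE:
    # Name under which the best model would be registered
    model_name = "synthetic_classifier"
    
    # In practice, you would:
    # 1. Find the best run in the experiment (e.g., with mlflow.search_runs)
    # 2. Register that run's model
    # 3. Mark the registered version as the production model
    
    print("Model Registry Workflow:")
    print("""
    1. Register the best run's model (run_id is the ID of the best run):
       result = mlflow.register_model(
           model_uri=f"runs:/{run_id}/model",
           name=model_name
       )
    
    2. Mark that version as production (aliases, MLflow >= 2.3):
       client = mlflow.tracking.MlflowClient()
       client.set_registered_model_alias(
           name=model_name,
           alias="production",
           version=result.version
       )
       # Stages such as "Production" (transition_model_version_stage)
       # are deprecated since MLflow 2.9 but still work in older setups.
    
    3. Load the production model:
       model = mlflow.pyfunc.load_model(
           model_uri=f"models:/{model_name}@production"
       )
    """)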
else:
    print("MLflow not available")

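The registry's core idea (immutable, numbered versions plus a movable pointer saying which version serves traffic) can be sketched without MLflow. The class below is hypothetical and purely illustrative: it is not MLflow's API, and the names `ToyModelRegistry`, `promote`, and `get` are invented for this example. Strings stand in for trained models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyModelRegistry:
    """Hypothetical in-memory registry: numbered versions plus named aliases."""
    versions: Dict[str, Dict[int, Any]] = field(default_factory=dict)
    aliases: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def register(self, name: str, model: Any) -> int:
        # Each registration gets the next integer version; older versions are kept
        model_versions = self.versions.setdefault(name, {})
        version = len(model_versions) + 1
        model_versions[version] = model
        return version

    def promote(self, name: str, version: int, alias: str = "production") -> None:
        # Moving an alias is only a pointer update, so rollback = promoting an older version
        if version not in self.versions.get(name, {}):
            raise KeyError(f"{name} has no version {version}")
        self.aliases.setdefault(name, {})[alias] = version

    def get(self, name: str, alias: str = "production") -> Any:
        # Resolve alias -> version -> model
        version = self.aliases[name][alias]
        return self.versions[name][version]


registry = ToyModelRegistry()
v1 = registry.register("synthetic_classifier", "model-v1")
v2 = registry.register("synthetic_classifier", "model-v2")
registry.promote("synthetic_classifier", v2)
print(registry.get("synthetic_classifier"))  # model-v2
registry.promote("synthetic_classifier", v1)  # roll back
print(registry.get("synthetic_classifier"))  # model-v1
```

A real registry such as MLflow's adds persistence and a link from each version back to the run that produced it, but serving code still only asks "which version is production?".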
---

## Part 2: Model Deployment

### 2.1 Flask API for Model Serving

In [None]:
# Save a simple model for deployment examples
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, 'model.pkl')
print("Model saved to model.pkl")

In [None]:
%%writefile app_flask.py
"""
Flask API for model serving.

Run with: python app_flask.py
Test with: curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"features": [1.0, 2.0, ...]}'
(the "features" list must contain all 20 values the model was trained on)
"""

from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model at startup
model = joblib.load('model.pkl')

@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint."""
    return jsonify({'status': 'healthy'})

@app.route('/predict', methods=['POST'])
def predict():
    """Prediction endpoint."""
    try:
        # Get data from request
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)
        
        # Make prediction
        prediction = model.predict(features)
        probability = model.predict_proba(features)
        
        # Return response
        response = {
            'prediction': int(prediction[0]),
            'probability': probability[0].tolist()
        }
        
        return jsonify(response)
    
    except Exception as e:
        return jsonify({'error': str(e)}), 400

@app.route('/batch_predict', methods=['POST'])
def batch_predict():
    """Batch prediction endpoint."""
    try:
        data = request.get_json()
        features = np.array(data['features'])
        
        predictions = model.predict(features)
        probabilities = model.predict_proba(features)
        
        response = {
            'predictions': predictions.tolist(),
            'probabilities': probabilities.tolist()
        }
        
        return jsonify(response)
    
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=False)

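The `/predict` handler above reshapes whatever list it receives, so a request with the wrong number of features only fails deep inside `model.predict` with a cryptic message. A small validation step makes that failure explicit. This is a sketch: the helper name `validate_features` and the expected count of 20 features (matching `make_classification(n_features=20)` above) are assumptions.

```python
import math
from typing import Any

import numpy as np

N_FEATURES = 20  # assumption: matches the synthetic training data above


def validate_features(payload: Any, n_features: int = N_FEATURES) -> np.ndarray:
    """Return a (1, n_features) float array or raise ValueError with a clear message."""
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError("Request body must be a JSON object with a 'features' key")
    features = payload["features"]
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in features):
        raise ValueError("All feature values must be numbers")
    if not all(math.isfinite(v) for v in features):
        raise ValueError("Feature values must be finite (no NaN or infinity)")
    return np.asarray(features, dtype=float).reshape(1, -1)
```

In the Flask handler, `features = validate_features(request.get_json(silent=True))` would replace the first two lines (`silent=True` makes Flask return `None` for a non-JSON body instead of raising), a `ValueError` can map to a 400 response, and any other exception can be treated as a server-side 500.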
### 2.2 FastAPI for High-Performance Serving

In [None]:
%%writefile app_fastapi.py
"""
FastAPI for high-performance model serving.

Run with: uvicorn app_fastapi:app --reload
Docs: http://localhost:8000/docs
"""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from typing import List
import os
import joblib
import numpy as np
import uvicorn

app = FastAPI(
    title="ML Model API",
    description="API for serving ML model predictions",
    version="1.0.0"
)

# Load model once at startup; MODEL_PATH is set in docker-compose.yml
model = joblib.load(os.environ.get('MODEL_PATH', 'model.pkl'))

# Request/Response schemas
class PredictionInput(BaseModel):
    features: List[float] = Field(..., description="Input features for prediction (20 values)")
    
    # Pydantic v2 syntax (requirements.txt pins pydantic 2.x; v1 used `class Config: schema_extra`)
    model_config = {
        "json_schema_extra": {
            "example": {
                "features": [0.0] * 20
            }
        }
    }

class PredictionOutput(BaseModel):
    prediction: int
    probability: List[float]
    confidence: float

class BatchPredictionInput(BaseModel):
    features: List[List[float]]

class BatchPredictionOutput(BaseModel):
    predictions: List[int]
    probabilities: List[List[float]]

@app.get("/")
async def root():
    return {"message": "ML Model API", "version": "1.0.0"}

@app.get("/health")
async def health():
    return {"status": "healthy"}

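@app.post("/predict", response_model=PredictionOutput)
def predict(input_data: PredictionInput):
    """Make a single prediction.
    
    Declared with plain `def` so FastAPI runs it in a worker thread; CPU-bound
    model inference inside `async def` would block the event loop.
    """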
    try:
        features = np.array(input_data.features).reshape(1, -1)
        
        prediction = model.predict(features)
        probability = model.predict_proba(features)
        
        return PredictionOutput(
            prediction=int(prediction[0]),
            probability=probability[0].tolist(),
            confidence=float(max(probability[0]))
        )
    
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

@app.post("/batch_predict", response_model=BatchPredictionOutput)
def batch_predict(input_data: BatchPredictionInput):
    """Make batch predictions (plain `def`: FastAPI runs it in a worker thread so inference does not block the event loop)."""
    try:
        features = np.array(input_data.features)
        
        predictions = model.predict(features)
        probabilities = model.predict_proba(features)
        
        return BatchPredictionOutput(
            predictions=predictions.tolist(),
            probabilities=probabilities.tolist()
        )
    
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

### 2.3 Dockerfile for Containerization

In [None]:
%%writefile Dockerfile
# Multi-stage build for smaller image
FROM python:3.9-slim AS builder

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Final stage
FROM python:3.9-slim

WORKDIR /app

# Copy dependencies from builder
COPY --from=builder /root/.local /root/.local

# Copy application files
COPY app_fastapi.py .
COPY model.pkl .

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

# Expose port
EXPOSE 8000

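# Health check (python:*-slim images do not ship curl, so use Python's standard library)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1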

# Run application
CMD ["uvicorn", "app_fastapi:app", "--host", "0.0.0.0", "--port", "8000"]

In [None]:
%%writefile requirements.txt
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
numpy==1.24.3
scikit-learn==1.3.2  # must match the version used to train and pickle model.pkl
joblib==1.3.2

In [None]:
%%writefile docker-compose.yml
version: '3.8'

services:
  ml-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/app/model.pkl
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s

---

## Part 3: Monitoring and Logging

### 3.1 Logging Setup

In [None]:
import logging
import time
from datetime import datetime

# Setup logging
def setup_logging():
    """Configure logging for production."""
    
    # Create logs directory
    Path('logs').mkdir(exist_ok=True)
    
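    # Configure logger
    logger = logging.getLogger('ml_api')
    logger.setLevel(logging.INFO)
    
    # Re-running this cell would otherwise attach duplicate handlers (and duplicate log lines)
    if logger.handlers:
        return logger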
    
    # File handler
    file_handler = logging.FileHandler(
        f'logs/ml_api_{datetime.now().strftime("%Y%m%d")}.log'
    )
    file_handler.setLevel(logging.INFO)
    
    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    
    # Formatter
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )
    file_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)
    
    # Add handlers
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    
    return logger

logger = setup_logging()
logger.info("Logging configured successfully")

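Plain-text lines like the ones above are easy to read but hard to query. A common alternative is one JSON object per line, which log aggregators can parse field by field. A minimal sketch using only the standard library; the extra field names `model_version` and `latency_ms` are choices made for this example, not a standard.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via logger.info(..., extra={...}) become record attributes
        for key in ("model_version", "latency_ms"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


json_logger = logging.getLogger("ml_api_json")
json_logger.setLevel(logging.INFO)
json_logger.propagate = False  # avoid duplicate output via the root logger
if not json_logger.handlers:   # avoid stacking handlers when the cell is re-run
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    json_logger.addHandler(handler)

json_logger.info("prediction served", extra={"model_version": "1", "latency_ms": 3.2})
```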
### 3.2 Prediction Monitoring

In [None]:
class PredictionMonitor:
    """Monitor predictions in production."""
    
    def __init__(self, log_path='logs/predictions.jsonl'):
        self.log_path = log_path
        Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    
    def log_prediction(self, features, prediction, probability, latency_ms):
        """Log a prediction."""
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'features': features.tolist() if hasattr(features, 'tolist') else features,
            'prediction': int(prediction),
            'probability': probability.tolist() if hasattr(probability, 'tolist') else probability,
            'confidence': float(max(probability)),
            'latency_ms': latency_ms
        }
        
        # Append to log file
        with open(self.log_path, 'a') as f:
            f.write(json.dumps(log_entry) + '\n')
    
    def get_statistics(self, n=1000):
        """Get statistics from recent predictions."""
        predictions = []
        
        # Read last n predictions
        try:
            with open(self.log_path, 'r') as f:
                lines = f.readlines()
                for line in lines[-n:]:
                    predictions.append(json.loads(line))
        except FileNotFoundError:
            return {}
        
        if not predictions:
            return {}
        
        # Calculate statistics
        confidences = [p['confidence'] for p in predictions]
        latencies = [p['latency_ms'] for p in predictions]
        
        stats = {
            'total_predictions': len(predictions),
            'avg_confidence': np.mean(confidences),
            'min_confidence': np.min(confidences),
            'avg_latency_ms': np.mean(latencies),
            'p95_latency_ms': np.percentile(latencies, 95),
            'p99_latency_ms': np.percentile(latencies, 99),
        }
        
        return stats

# Example usage
monitor = PredictionMonitor()

# Simulate some predictions
for _ in range(10):
    features = np.random.randn(20)
    
    start = time.time()
    prediction = model.predict(features.reshape(1, -1))[0]
    probability = model.predict_proba(features.reshape(1, -1))[0]
    latency = (time.time() - start) * 1000  # ms
    
    monitor.log_prediction(features, prediction, probability, latency)

# Get statistics
stats = monitor.get_statistics()
print("\nPrediction Statistics:")
for key, value in stats.items():
    print(f"{key}: {value:.4f}")

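`get_statistics` calls `readlines()`, which holds every line of the log in memory just to keep the last `n`. Iterating the file into a `collections.deque` with `maxlen=n` keeps only the tail in memory (the file is still read once from start to end). A sketch of that variant as a standalone function; the name `read_last_jsonl` is chosen for this example.

```python
import json
from collections import deque
from pathlib import Path
from typing import List


def read_last_jsonl(path: str, n: int = 1000) -> List[dict]:
    """Return the last n JSON records from a JSON-lines file (empty list if missing)."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open("r") as f:
        # deque(maxlen=n) drops older lines as it iterates, so memory stays O(n)
        tail = deque(f, maxlen=n)
    return [json.loads(line) for line in tail if line.strip()]
```

For logs that grow without bound, rotating files (for example daily, as the logger above already does by date) keeps this linear scan cheap.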
### 3.3 Data Drift Detection
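
The detector below compares a reference (training) sample with a current (production) sample using two statistics. For PSI, `psi()` splits each feature's reference range into $B$ bins; with $r_i$ and $c_i$ the fractions of reference and current values that fall in bin $i$:

$$
\mathrm{PSI} = \sum_{i=1}^{B} (c_i - r_i)\,\ln\frac{c_i}{r_i}
$$

Each term is non-negative because $c_i - r_i$ and $\ln(c_i/r_i)$ always have the same sign, so PSI is zero only when the two histograms match. For example, with two bins, $r = (0.5, 0.5)$ and $c = (0.6, 0.4)$ give $\mathrm{PSI} = 0.1\ln 1.2 + (-0.1)\ln 0.8 \approx 0.0182 + 0.0223 \approx 0.041$, well inside the "stable" range. The KS statistic used by `kolmogorov_smirnov_test()` is instead the largest gap between the two empirical CDFs, $D = \sup_x \lvert F_{\text{ref}}(x) - F_{\text{cur}}(x) \rvert$.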

In [None]:
from scipy import stats as scipy_stats

class DataDriftDetector:
    """Detect data drift in production."""
    
    def __init__(self, reference_data):
        """Initialize with reference (training) data."""
        self.reference_data = reference_data
        self.reference_mean = np.mean(reference_data, axis=0)
        self.reference_std = np.std(reference_data, axis=0)
    
    def kolmogorov_smirnov_test(self, current_data, threshold=0.05):
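        """
        Perform a two-sample KS test per feature to detect drift.
        
        Args:
            current_data: Production data with the same columns as the reference data
            threshold: Significance level applied to each feature separately
        
        Returns:
            dict: Drift detection results per feature
        """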
        n_features = self.reference_data.shape[1]
        drift_detected = {}
        
        for i in range(n_features):
            # KS test
            statistic, p_value = scipy_stats.ks_2samp(
                self.reference_data[:, i],
                current_data[:, i]
            )
            
            drift_detected[f'feature_{i}'] = {
                'statistic': statistic,
                'p_value': p_value,
                'drift': p_value < threshold
            }
        
        return drift_detected
    
    def psi(self, current_data, bins=10):
        """
        Calculate Population Stability Index (PSI).
        
        PSI < 0.1: no significant change
        0.1 <= PSI < 0.2: moderate change
        PSI >= 0.2: significant change
        (These cut-offs are common rules of thumb, not statistical tests.)
        """
        n_features = self.reference_data.shape[1]
        psi_values = {}
        
        for i in range(n_features):
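            # Calculate bins based on reference data
            _, bin_edges = np.histogram(self.reference_data[:, i], bins=bins)
            
            # np.histogram ignores values outside the edges, so clip current data into the
            # reference range; otherwise out-of-range (drifted) values would be silently dropped
            cur_values = np.clip(current_data[:, i], bin_edges[0], bin_edges[-1])
            
            # Calculate distributions
            ref_dist, _ = np.histogram(self.reference_data[:, i], bins=bin_edges)
            cur_dist, _ = np.histogram(cur_values, bins=bin_edges)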
            
            # Normalize
            ref_dist = ref_dist / ref_dist.sum()
            cur_dist = cur_dist / cur_dist.sum()
            
            # Avoid division by zero
            ref_dist = np.where(ref_dist == 0, 0.0001, ref_dist)
            cur_dist = np.where(cur_dist == 0, 0.0001, cur_dist)
            
            # Calculate PSI
            psi = np.sum((cur_dist - ref_dist) * np.log(cur_dist / ref_dist))
            
            psi_values[f'feature_{i}'] = {
                'psi': psi,
                'status': 'stable' if psi < 0.1 else ('moderate' if psi < 0.2 else 'significant')
            }
        
        return psi_values
    
    def visualize_drift(self, current_data, feature_idx=0):
        """Visualize distribution drift for a feature."""
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
        
        # Distributions
        axes[0].hist(self.reference_data[:, feature_idx], bins=30, alpha=0.5, label='Reference', density=True)
        axes[0].hist(current_data[:, feature_idx], bins=30, alpha=0.5, label='Current', density=True)
        axes[0].set_xlabel('Value')
        axes[0].set_ylabel('Density')
        axes[0].set_title(f'Distribution Comparison - Feature {feature_idx}')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
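        # Q-Q plot of reference vs current quantiles (points on y = x mean no drift);
        # comparing against a normal distribution would not show drift
        quantiles = np.linspace(0.01, 0.99, 99)
        ref_q = np.quantile(self.reference_data[:, feature_idx], quantiles)
        cur_q = np.quantile(current_data[:, feature_idx], quantiles)
        axes[1].scatter(ref_q, cur_q, s=10, alpha=0.7)
        lims = [min(ref_q.min(), cur_q.min()), max(ref_q.max(), cur_q.max())]
        axes[1].plot(lims, lims, 'r--', label='y = x (no drift)')
        axes[1].set_xlabel('Reference quantiles')
        axes[1].set_ylabel('Current quantiles')
        axes[1].set_title(f'Q-Q Plot (Reference vs Current) - Feature {feature_idx}')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)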
        
        plt.tight_layout()
        plt.show()

# Example
detector = DataDriftDetector(X_train)

# Simulate drifted data
drifted_data = X_test + np.random.randn(*X_test.shape) * 0.5

# Detect drift
drift_results = detector.kolmogorov_smirnov_test(drifted_data)
psi_results = detector.psi(drifted_data)

print("\nDrift Detection (KS Test):")
drifted_features = [k for k, v in drift_results.items() if v['drift']]
print(f"Features with drift detected: {len(drifted_features)}/{len(drift_results)}")

print("\nPSI Analysis:")
for feature, result in list(psi_results.items())[:5]:  # Show first 5
    print(f"{feature}: PSI={result['psi']:.4f} ({result['status']})")

# Visualize
detector.visualize_drift(drifted_data, feature_idx=0)

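`kolmogorov_smirnov_test` applies `threshold=0.05` to each of the 20 features separately, so even with no drift at all, roughly one feature in twenty can be flagged by chance. One simple correction, swapped in here as an example, is Bonferroni: divide the significance level by the number of features. A sketch as a standalone function using the same `scipy.stats.ks_2samp` call; the function name is an assumption.

```python
import numpy as np
from scipy import stats


def ks_drift_bonferroni(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> dict:
    """Per-feature two-sample KS tests with a Bonferroni-corrected threshold."""
    n_features = reference.shape[1]
    threshold = alpha / n_features  # keeps the family-wise false-alarm rate <= alpha
    results = {}
    for i in range(n_features):
        statistic, p_value = stats.ks_2samp(reference[:, i], current[:, i])
        results[f"feature_{i}"] = {
            "statistic": float(statistic),
            "p_value": float(p_value),
            "drift": bool(p_value < threshold),
        }
    return results


rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 5))
current = reference.copy()
current[:, 0] += 1.0  # shift only the first feature
flags = {k: v["drift"] for k, v in ks_drift_bonferroni(reference, current).items()}
print(flags)
```

Bonferroni is conservative; with many correlated features, a less strict procedure such as Benjamini-Hochberg is a common alternative.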
---

## Part 4: A/B Testing

### 4.1 A/B Test Framework

In [None]:
class ABTestFramework:
    """A/B testing framework for model comparison."""
    
    def __init__(self, model_a, model_b, traffic_split=0.5):
        """
        Args:
            model_a: Control model
            model_b: Treatment model
            traffic_split: Fraction of traffic to model_b (0-1)
        """
        self.model_a = model_a
        self.model_b = model_b
        self.traffic_split = traffic_split
        
        self.results_a = []
        self.results_b = []
    
    def predict(self, features):
        """Route request to model A or B."""
        # Random assignment
        use_b = np.random.random() < self.traffic_split
        
        if use_b:
            prediction = self.model_b.predict(features)
            probability = self.model_b.predict_proba(features)
            model_used = 'B'
        else:
            prediction = self.model_a.predict(features)
            probability = self.model_a.predict_proba(features)
            model_used = 'A'
        
        return prediction, probability, model_used
    
    def record_outcome(self, model_used, predicted, actual, latency_ms):
        """Record prediction outcome."""
        result = {
            'predicted': predicted,
            'actual': actual,
            'correct': predicted == actual,
            'latency_ms': latency_ms
        }
        
        if model_used == 'A':
            self.results_a.append(result)
        else:
            self.results_b.append(result)
    
    def analyze_results(self):
        """Analyze A/B test results."""
        if not self.results_a or not self.results_b:
            return None
        
        # Calculate metrics
        accuracy_a = np.mean([r['correct'] for r in self.results_a])
        accuracy_b = np.mean([r['correct'] for r in self.results_b])
        
        latency_a = np.mean([r['latency_ms'] for r in self.results_a])
        latency_b = np.mean([r['latency_ms'] for r in self.results_b])
        
        # Statistical test
        correct_a = [r['correct'] for r in self.results_a]
        correct_b = [r['correct'] for r in self.results_b]
        
        # Two-proportion z-test
        n_a, n_b = len(correct_a), len(correct_b)
        p_a, p_b = accuracy_a, accuracy_b
        p_pooled = (sum(correct_a) + sum(correct_b)) / (n_a + n_b)
        
        se = np.sqrt(p_pooled * (1 - p_pooled) * (1/n_a + 1/n_b))
        z_score = (p_b - p_a) / se if se > 0 else 0
        p_value = 2 * (1 - scipy_stats.norm.cdf(abs(z_score)))
        
        results = {
            'model_a': {
                'samples': n_a,
                'accuracy': accuracy_a,
                'latency_ms': latency_a
            },
            'model_b': {
                'samples': n_b,
                'accuracy': accuracy_b,
                'latency_ms': latency_b
            },
            'difference': {
                'accuracy_diff': accuracy_b - accuracy_a,
                'latency_diff': latency_b - latency_a,
                'z_score': z_score,
                'p_value': p_value,
                'significant': p_value < 0.05
            }
        }
        
        return results
    
    def visualize_results(self):
        """Visualize A/B test results."""
        results = self.analyze_results()
        if results is None:
            print("No results to visualize")
            return
        
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
        
        # Accuracy comparison
        models = ['Model A', 'Model B']
        accuracies = [
            results['model_a']['accuracy'],
            results['model_b']['accuracy']
        ]
        
        axes[0].bar(models, accuracies, color=['blue', 'orange'])
        axes[0].set_ylabel('Accuracy')
        axes[0].set_title('Model Accuracy Comparison')
        axes[0].set_ylim([min(accuracies) * 0.95, max(accuracies) * 1.05])
        axes[0].grid(True, alpha=0.3)
        
        # Latency comparison
        latencies = [
            results['model_a']['latency_ms'],
            results['model_b']['latency_ms']
        ]
        
        axes[1].bar(models, latencies, color=['blue', 'orange'])
        axes[1].set_ylabel('Latency (ms)')
        axes[1].set_title('Model Latency Comparison')
        axes[1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Print summary
        print("\nA/B Test Results:")
        print(f"Model A: {results['model_a']['samples']} samples, "
              f"Accuracy={results['model_a']['accuracy']:.4f}")
        print(f"Model B: {results['model_b']['samples']} samples, "
              f"Accuracy={results['model_b']['accuracy']:.4f}")
        print(f"\nDifference: {results['difference']['accuracy_diff']:.4f}")
        print(f"P-value: {results['difference']['p_value']:.4f}")
        print(f"Statistically significant: {results['difference']['significant']}")

# Example
# Train two different models
model_a = RandomForestClassifier(n_estimators=50, random_state=42)
model_b = RandomForestClassifier(n_estimators=100, random_state=42)

model_a.fit(X_train, y_train)
model_b.fit(X_train, y_train)

# Run A/B test
ab_test = ABTestFramework(model_a, model_b, traffic_split=0.5)

# Simulate requests
for i in range(200):
    features = X_test[i:i+1]
    actual = y_test[i]
    
    start = time.time()
    prediction, probability, model_used = ab_test.predict(features)
    latency = (time.time() - start) * 1000
    
    ab_test.record_outcome(model_used, prediction[0], actual, latency)

# Analyze and visualize
ab_test.visualize_results()

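`ABTestFramework.predict` draws a fresh random number per request, so the same user can be served by model A on one call and model B on the next, which muddies per-user metrics. A common alternative is deterministic ("sticky") assignment: hash a stable identifier into a bucket. A sketch; `user_id`, the salt string, and the 10,000-bucket granularity are assumptions for illustration.

```python
import hashlib


def assign_variant(user_id: str, traffic_split: float = 0.5, salt: str = "ab-test-1") -> str:
    """Deterministically map a user to 'A' or 'B'; traffic_split is the share sent to B."""
    # Salting with an experiment name reshuffles users between experiments
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in [0, 9999]
    return "B" if bucket < traffic_split * 10_000 else "A"


assignments = [assign_variant(f"user_{i}") for i in range(10_000)]
print(assignments.count("B") / len(assignments))  # close to 0.5
```

Because the bucket depends only on the salt and the ID, raising `traffic_split` during a gradual rollout keeps every user already in B in B.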
---

## Part 5: CI/CD for ML

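### 5.1 GitHub Actions Workflow

`%%writefile` does not create missing directories, so create `.github/workflows/` first (for example with `!mkdir -p .github/workflows`).
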
In [None]:
%%writefile .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install pytest pytest-cov
    
    - name: Run tests
      run: |
        pytest tests/ --cov=. --cov-report=xml
    
    - name: Upload coverage
      uses: codecov/codecov-action@v4
      with:
        files: ./coverage.xml
        token: ${{ secrets.CODECOV_TOKEN }}
  
  train:
    needs: test
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
    
    - name: Train model
      run: |
        python train.py
    
    - name: Validate model
      run: |
        python validate.py
    
    - name: Upload model artifact
      uses: actions/upload-artifact@v4
      with:
        name: model
        path: model.pkl
  
  deploy:
    needs: train
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Download model
      uses: actions/download-artifact@v4
      with:
        name: model
    
    - name: Build Docker image
      run: |
        docker build -t ml-api:latest .
    
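    - name: Push to registry
      run: |
        echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
        docker tag ml-api:latest ${{ secrets.DOCKER_USERNAME }}/ml-api:latest
        docker push ${{ secrets.DOCKER_USERNAME }}/ml-api:latest
        # Also push an immutable per-commit tag so a bad deploy can be rolled back
        docker tag ml-api:latest ${{ secrets.DOCKER_USERNAME }}/ml-api:${{ github.sha }}
        docker push ${{ secrets.DOCKER_USERNAME }}/ml-api:${{ github.sha }}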

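The workflow calls `train.py` and `validate.py`, which are not shown here. One way `validate.py` could act as a quality gate is to compare the new model's metrics with minimum thresholds and exit with a non-zero status, which fails the `train` job and therefore blocks `deploy`. A hypothetical sketch; the `metrics.json` file, its keys, and the threshold values are assumptions that `train.py` would have to match.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical minimum acceptable values; tune these to your own requirements
THRESHOLDS: Dict[str, float] = {"accuracy": 0.85, "f1_score": 0.85}


def check_metrics(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return human-readable failures (an empty list means the gate passes)."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"missing metric: {name}")
        elif value < minimum:
            failures.append(f"{name}={value:.4f} is below the minimum {minimum:.4f}")
    return failures


def main(metrics_path: str = "metrics.json") -> int:
    # Assumes train.py wrote something like {"accuracy": 0.91, "f1_score": 0.90}
    metrics = json.loads(Path(metrics_path).read_text())
    failures = check_metrics(metrics, THRESHOLDS)
    for failure in failures:
        print(f"VALIDATION FAILED: {failure}")
    return 1 if failures else 0
```

Saved as `validate.py`, the script would end with `import sys` and `sys.exit(main())` under an `if __name__ == "__main__":` guard, so the exit code reaches GitHub Actions.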
---

## 📝 Summary

### Key Concepts

1. **Experiment Tracking:**
   - Use MLflow to track parameters, metrics, artifacts
   - Model registry for versioning
   - Reproducible experiments

2. **Model Deployment:**
   - Flask for simple APIs
   - FastAPI for high performance
   - Docker for containerization
   - Health checks and monitoring

3. **Monitoring:**
   - Log all predictions
   - Track latency and confidence
   - Detect data drift (KS test, PSI)
   - Alert on anomalies

4. **A/B Testing:**
   - Compare models in production
   - Statistical significance testing
   - Traffic splitting

5. **CI/CD:**
   - Automated testing
   - Model training pipelines
   - Automated deployment
   - Version control

### Production Checklist

**Before Deployment:**
- ✅ Model performance meets requirements
- ✅ Comprehensive tests written
- ✅ API documentation complete
- ✅ Logging configured
- ✅ Monitoring setup
- ✅ Docker image tested

**After Deployment:**
- ✅ Health checks passing
- ✅ Predictions logged
- ✅ Latency within limits
- ✅ Error rate low
- ✅ Drift detection active
- ✅ Alerts configured

### Interview Questions

1. **How do you deploy an ML model to production?**
   - Containerize with Docker
   - Expose via REST API (Flask/FastAPI)
   - Deploy to cloud (AWS/GCP/Azure)
   - Add monitoring and logging

2. **How do you detect if a model is degrading in production?**
   - Monitor prediction confidence
   - Track labeled feedback accuracy
   - Detect data drift (KS test, PSI)
   - Compare to baseline metrics

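3. **What is data drift and how do you handle it?**
   - Data drift: the input distribution P(X) changes over time; concept drift: the relationship P(y | X) changes
   - Detect input drift with statistical tests (KS, PSI); concept drift needs labeled feedback
   - Retrain on recent data when drift degrades performance
   - Validate the retrained model before updating it in production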

4. **How do you do A/B testing for models?**
   - Split traffic between models
   - Track performance metrics
   - Statistical significance test
   - Gradual rollout of winner

5. **What should you log in production?**
   - Input features
   - Predictions and probabilities
   - Latency
   - Errors and exceptions
   - Model version
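
A practical follow-up to question 4 is how much traffic a test needs before the two-proportion z-test can detect a given accuracy difference. The usual normal-approximation formula, for accuracies `p1` and `p2`, significance level `alpha`, and power `power`, can be sketched with the standard library; the function name `samples_per_arm` is an assumption.

```python
import math
from statistics import NormalDist


def samples_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per arm for a two-sided two-proportion z-test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


print(samples_per_arm(0.85, 0.90))
```

Detecting a 5-point gap between 85% and 90% accuracy needs several hundred labeled samples per arm, far more than the roughly 100 per arm in the Part 4 simulation, and halving the gap roughly quadruples the requirement.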

### Next Steps

- **Deploy:** Build and deploy your own model API
- **Monitor:** Set up comprehensive monitoring
- **Scale:** Learn Kubernetes for scaling
- **Advanced:** Feature stores, model governance

---

**Congratulations!** You've learned MLOps and production deployment. You can now take models from experiment to production!

**Next:** [12 - Reinforcement Learning](./12_reinforcement_learning.ipynb)