# Project 21: MLOps Pipeline

**Implement model versioning, monitoring, and retraining**

In this tutorial, we'll build a complete MLOps framework that handles the entire ML lifecycle:

```
┌─────────────────────────────────────────────────────────────────────┐
│                        MLOps Pipeline                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐ │
│  │   Data   │──▶│  Model   │──▶│  Model   │──▶│   Deployment     │ │
│  │  Prep    │   │ Training │   │ Registry │   │   (Staging/Prod) │ │
│  └──────────┘   └──────────┘   └──────────┘   └──────────────────┘ │
│       │              │              │                   │          │
│       ▼              ▼              ▼                   ▼          │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐ │
│  │ Feature  │   │Experiment│   │ Version  │   │   Monitoring     │ │
│  │  Store   │   │ Tracking │   │ Control  │   │   & Alerting     │ │
│  └──────────┘   └──────────┘   └──────────┘   └──────────────────┘ │
│                                                       │            │
│                      ┌────────────────────────────────┘            │
│                      ▼                                             │
│              ┌──────────────┐                                      │
│              │  Auto        │◀── Drift Detection                   │
│              │  Retrain     │◀── Performance Decay                 │
│              └──────────────┘                                      │
└─────────────────────────────────────────────────────────────────────┘
```

**Components we'll build:**
1. Experiment Tracker (like MLflow)
2. Model Registry & Versioning
3. Feature Store
4. Data Drift Detection
5. Model Performance Monitoring
6. Automated Retraining Pipeline
7. A/B Testing Framework
8. Complete MLOps Pipeline

## Table of Contents

1. [Setup and Installation](#1-setup-and-installation)
2. [MLOps Concepts Overview](#2-mlops-concepts-overview)
3. [Experiment Tracker](#3-experiment-tracker)
4. [Model Registry](#4-model-registry)
5. [Feature Store](#5-feature-store)
6. [Data Drift Detection](#6-data-drift-detection)
7. [Model Performance Monitoring](#7-model-performance-monitoring)
8. [Automated Retraining Pipeline](#8-automated-retraining-pipeline)
9. [A/B Testing Framework](#9-ab-testing-framework)
10. [Complete MLOps Pipeline](#10-complete-mlops-pipeline)
11. [Demo: Full MLOps Simulation](#11-demo-full-mlops-simulation)
12. [Summary](#12-summary)

## 1. Setup and Installation

In [None]:
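# Third-party libraries used throughout: numpy, pandas, matplotlib, seaborn, joblib, scikit-learn, scipy.
# The feature store writes Parquet files, which also requires pyarrow or fastparquet.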
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
import os
import hashlib
import pickle
import joblib
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple, Union
from dataclasses import dataclass, field, asdict
from enum import Enum
import uuid
import warnings
import time
from collections import defaultdict
warnings.filterwarnings('ignore')

# Sklearn
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix, classification_report
)
from sklearn.datasets import make_classification

# Statistical tests for drift detection
from scipy import stats
from scipy.stats import ks_2samp, chi2_contingency, wasserstein_distance

# Set seeds
SEED = 42
np.random.seed(SEED)

# Create MLOps directory
MLOPS_DIR = Path('./mlops_artifacts')
MLOPS_DIR.mkdir(exist_ok=True)

print("MLOps Pipeline - All libraries loaded!")
print(f"Artifacts directory: {MLOPS_DIR}")

## 2. MLOps Concepts Overview

### What is MLOps?

MLOps (Machine Learning Operations) is a set of practices that combines ML, DevOps, and Data Engineering to deploy and maintain ML systems in production reliably and efficiently.

### Key Components:

| Component | Purpose | Production Tools |
|-----------|---------|------------------|
| **Experiment Tracking** | Log parameters, metrics, artifacts | MLflow, W&B, Neptune |
| **Model Registry** | Version and manage models | MLflow, SageMaker |
| **Feature Store** | Store and serve features | Feast, Tecton |
| **Drift Detection** | Detect data/concept drift | Evidently, WhyLabs |
| **Monitoring** | Track model performance | Prometheus, Grafana |
| **CI/CD** | Automate testing/deployment | GitHub Actions, Jenkins |
| **Orchestration** | Schedule and manage pipelines | Airflow, Kubeflow |

### Model Lifecycle:

```
Development → Staging → Production → Monitoring → Retraining
     ↑                                              ↓
     └──────────────────────────────────────────────┘
```

In [None]:
# Define enums and data classes for MLOps

class ModelStage(Enum):
    """Model deployment stages."""
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"

class AlertSeverity(Enum):
    """Alert severity levels."""
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"

class DriftType(Enum):
    """Types of drift."""
    DATA_DRIFT = "data_drift"
    CONCEPT_DRIFT = "concept_drift"
    PREDICTION_DRIFT = "prediction_drift"

@dataclass
class ModelMetadata:
    """Metadata for a trained model."""
    model_id: str
    model_name: str
    version: str
    created_at: str
    stage: str
    metrics: Dict[str, float]
    parameters: Dict[str, Any]
    tags: Dict[str, str] = field(default_factory=dict)
    description: str = ""

@dataclass
class Alert:
    """Monitoring alert."""
    alert_id: str
    timestamp: str
    severity: str
    message: str
    metric_name: str
    metric_value: float
    threshold: float

print("MLOps data structures defined!")
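
The lifecycle diagram above moves a model from development through staging to production, with archival at the end. One way to make those rules explicit is a transition table that rejects jumps such as development → production. The sketch below is a minimal illustration and not part of the tutorial's registry: `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and the policy itself (for example, letting an archived model return to staging) is an assumption you would adapt. The enum is repeated only so the block runs on its own; in the notebook you would reuse the existing `ModelStage`.

```python
from enum import Enum


class ModelStage(Enum):
    """Copy of the ModelStage enum defined above, repeated so this runs standalone."""
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


# Hypothetical policy: which target stages are reachable from each stage.
ALLOWED_TRANSITIONS = {
    ModelStage.DEVELOPMENT: {ModelStage.STAGING, ModelStage.ARCHIVED},
    ModelStage.STAGING: {ModelStage.PRODUCTION, ModelStage.DEVELOPMENT, ModelStage.ARCHIVED},
    ModelStage.PRODUCTION: {ModelStage.ARCHIVED},
    ModelStage.ARCHIVED: {ModelStage.STAGING},
}


def can_transition(current: ModelStage, target: ModelStage) -> bool:
    """Return True if the assumed policy allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


print(can_transition(ModelStage.DEVELOPMENT, ModelStage.STAGING))     # allowed by the table
print(can_transition(ModelStage.DEVELOPMENT, ModelStage.PRODUCTION))  # skips staging, rejected
```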

## 3. Experiment Tracker

Track each experiment run's parameters, metrics, and artifacts, in the style of MLflow tracking but backed by a single JSON file.

In [None]:
class ExperimentTracker:
    """
    Track ML experiments with parameters, metrics, and artifacts.
    
    Similar to MLflow tracking but file-based for portability.
    """
    
    def __init__(self, tracking_dir: str = './mlops_artifacts/experiments'):
        self.tracking_dir = Path(tracking_dir)
        self.tracking_dir.mkdir(parents=True, exist_ok=True)
        
        self.current_run = None
        self.runs = self._load_runs()
    
    def _load_runs(self) -> Dict:
        """Load existing runs from disk."""
        runs_file = self.tracking_dir / 'runs.json'
        if runs_file.exists():
            with open(runs_file, 'r') as f:
                return json.load(f)
        return {}
    
    def _save_runs(self):
        """Save runs to disk."""
        with open(self.tracking_dir / 'runs.json', 'w') as f:
            json.dump(self.runs, f, indent=2, default=str)
    
    def start_run(self, experiment_name: str, run_name: str = None) -> str:
        """
        Start a new experiment run.
        
        Returns:
            run_id: Unique identifier for this run
        """
        run_id = str(uuid.uuid4())[:8]
        run_name = run_name or f"run_{run_id}"
        
        self.current_run = {
            'run_id': run_id,
            'run_name': run_name,
            'experiment_name': experiment_name,
            'start_time': datetime.now().isoformat(),
            'end_time': None,
            'status': 'running',
            'parameters': {},
            'metrics': {},
            'artifacts': [],
            'tags': {}
        }
        
        print(f"Started run: {run_name} (ID: {run_id})")
        return run_id
    
    def log_param(self, key: str, value: Any):
        """Log a parameter."""
        if self.current_run is None:
            raise ValueError("No active run. Call start_run() first.")
        self.current_run['parameters'][key] = value
    
    def log_params(self, params: Dict[str, Any]):
        """Log multiple parameters."""
        for key, value in params.items():
            self.log_param(key, value)
    
    def log_metric(self, key: str, value: float, step: int = None):
        """Log a metric."""
        if self.current_run is None:
            raise ValueError("No active run. Call start_run() first.")
        
        if key not in self.current_run['metrics']:
            self.current_run['metrics'][key] = []
        
        self.current_run['metrics'][key].append({
            'value': value,
            'step': step,
            'timestamp': datetime.now().isoformat()
        })
    
    def log_metrics(self, metrics: Dict[str, float], step: int = None):
        """Log multiple metrics."""
        for key, value in metrics.items():
            self.log_metric(key, value, step)
    
    def log_artifact(self, artifact_path: str, artifact_name: str = None):
        """Log an artifact (file)."""
        if self.current_run is None:
            raise ValueError("No active run. Call start_run() first.")
        
        artifact_name = artifact_name or Path(artifact_path).name
        self.current_run['artifacts'].append({
            'name': artifact_name,
            'path': str(artifact_path),
            'logged_at': datetime.now().isoformat()
        })
    
    def set_tag(self, key: str, value: str):
        """Set a tag on the current run."""
        if self.current_run is None:
            raise ValueError("No active run. Call start_run() first.")
        self.current_run['tags'][key] = value
    
    def end_run(self, status: str = 'completed'):
        """End the current run."""
        if self.current_run is None:
            raise ValueError("No active run.")
        
        self.current_run['end_time'] = datetime.now().isoformat()
        self.current_run['status'] = status
        
        # Store run
        run_id = self.current_run['run_id']
        self.runs[run_id] = self.current_run
        self._save_runs()
        
        print(f"Ended run: {self.current_run['run_name']} (Status: {status})")
        self.current_run = None
    
    def get_run(self, run_id: str) -> Dict:
        """Get a specific run by ID."""
        return self.runs.get(run_id)
    
    def search_runs(self, experiment_name: str = None, 
                    metric_name: str = None, 
                    min_value: float = None) -> List[Dict]:
        """Search runs with filters."""
        results = []
        
        for run_id, run in self.runs.items():
            if experiment_name and run['experiment_name'] != experiment_name:
                continue
            
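            if metric_name is not None:
                # Skip runs that never logged the requested metric
                if metric_name not in run['metrics']:
                    continue
                metric_values = [m['value'] for m in run['metrics'][metric_name]]
                # Compare against None explicitly so that min_value=0 still filters
                if min_value is not None and max(metric_values) < min_value:
                    continue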
            
            results.append(run)
        
        return results
    
    def get_best_run(self, experiment_name: str, metric_name: str, 
                     maximize: bool = True) -> Dict:
        """Get the best run for an experiment based on a metric."""
        runs = self.search_runs(experiment_name=experiment_name)
        
        best_run = None
        best_value = -np.inf if maximize else np.inf
        
        for run in runs:
            if metric_name in run['metrics']:
                values = [m['value'] for m in run['metrics'][metric_name]]
                value = max(values) if maximize else min(values)
                
                if (maximize and value > best_value) or (not maximize and value < best_value):
                    best_value = value
                    best_run = run
        
        return best_run
    
    def get_runs_df(self) -> pd.DataFrame:
        """Get all runs as a DataFrame."""
        rows = []
        for run_id, run in self.runs.items():
            row = {
                'run_id': run_id,
                'run_name': run['run_name'],
                'experiment': run['experiment_name'],
                'status': run['status'],
                'start_time': run['start_time']
            }
            # Add final metric values
            for metric_name, values in run['metrics'].items():
                row[metric_name] = values[-1]['value'] if values else None
            rows.append(row)
        
        return pd.DataFrame(rows)

# Test experiment tracker
tracker = ExperimentTracker()
print("\nExperimentTracker created!")
print("\nCapabilities:")
print("  - log_param(), log_params()")
print("  - log_metric(), log_metrics()")
print("  - log_artifact()")
print("  - search_runs(), get_best_run()")
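
Each metric is stored as a list of `{value, step, timestamp}` entries, so a run can log the same metric many times (for example, once per epoch). The sketch below builds a small dictionary in the shape that `end_run()` writes to `runs.json` and picks the best run by peak accuracy, mirroring what `get_best_run()` does. The run IDs, names, and values are made up for illustration.

```python
import json

# Two hypothetical runs in the shape the tracker persists to runs.json.
runs = {
    "a1b2c3d4": {
        "run_name": "rf_baseline",
        "experiment_name": "churn",
        "metrics": {
            "accuracy": [
                {"value": 0.81, "step": 0, "timestamp": "2024-01-01T10:00:00"},
                {"value": 0.84, "step": 1, "timestamp": "2024-01-01T10:05:00"},
            ]
        },
    },
    "e5f6a7b8": {
        "run_name": "gb_tuned",
        "experiment_name": "churn",
        "metrics": {
            "accuracy": [
                {"value": 0.88, "step": 0, "timestamp": "2024-01-02T09:00:00"},
                {"value": 0.86, "step": 1, "timestamp": "2024-01-02T09:05:00"},
            ]
        },
    },
}

# Round-trip through JSON to confirm the structure is serializable as-is.
runs = json.loads(json.dumps(runs))


def peak(run: dict, metric: str) -> float:
    """Highest value a run ever logged for the metric."""
    return max(entry["value"] for entry in run["metrics"][metric])


best_id = max(runs, key=lambda run_id: peak(runs[run_id], "accuracy"))
print(best_id, peak(runs[best_id], "accuracy"))
```

For the second run the peak (0.88) and the last logged value (0.86) differ. With `maximize=True`, `get_best_run()` ranks runs by the peak, while `get_runs_df()` reports the last value, so the two views can disagree.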

## 4. Model Registry

Version control for models with staging capabilities.

In [None]:
class ModelRegistry:
    """
    Registry for versioning and managing ML models.
    
    Features:
    - Model versioning
    - Stage transitions (dev → staging → production)
    - Model metadata storage
    - Model loading/saving
    """
    
    def __init__(self, registry_dir: str = './mlops_artifacts/model_registry'):
        self.registry_dir = Path(registry_dir)
        self.registry_dir.mkdir(parents=True, exist_ok=True)
        
        self.registry = self._load_registry()
    
    def _load_registry(self) -> Dict:
        """Load registry from disk."""
        registry_file = self.registry_dir / 'registry.json'
        if registry_file.exists():
            with open(registry_file, 'r') as f:
                return json.load(f)
        return {'models': {}, 'versions': {}}
    
    def _save_registry(self):
        """Save registry to disk."""
        with open(self.registry_dir / 'registry.json', 'w') as f:
            json.dump(self.registry, f, indent=2, default=str)
    
    def register_model(self, model, model_name: str, 
                       metrics: Dict[str, float],
                       parameters: Dict[str, Any] = None,
                       description: str = "",
                       tags: Dict[str, str] = None) -> str:
        """
        Register a new model version.
        
        Returns:
            version: Version string (e.g., 'v1', 'v2')
        """
        # Initialize model entry if new
        if model_name not in self.registry['models']:
            self.registry['models'][model_name] = {
                'created_at': datetime.now().isoformat(),
                'versions': [],
                'latest_version': None,
                'production_version': None,
                'staging_version': None
            }
        
        # Generate version
        version_num = len(self.registry['models'][model_name]['versions']) + 1
        version = f"v{version_num}"
        
        # Generate model ID
        model_id = f"{model_name}_{version}_{str(uuid.uuid4())[:8]}"
        
        # Save model artifact
        model_path = self.registry_dir / model_name / version
        model_path.mkdir(parents=True, exist_ok=True)
        joblib.dump(model, model_path / 'model.joblib')
        
        # Create metadata
        metadata = ModelMetadata(
            model_id=model_id,
            model_name=model_name,
            version=version,
            created_at=datetime.now().isoformat(),
            stage=ModelStage.DEVELOPMENT.value,
            metrics=metrics,
            parameters=parameters or {},
            tags=tags or {},
            description=description
        )
        
        # Save metadata
        with open(model_path / 'metadata.json', 'w') as f:
            json.dump(asdict(metadata), f, indent=2)
        
        # Update registry
        self.registry['models'][model_name]['versions'].append(version)
        self.registry['models'][model_name]['latest_version'] = version
        self.registry['versions'][model_id] = asdict(metadata)
        
        self._save_registry()
        
        print(f"Registered model: {model_name} {version}")
        return version
    
    def transition_stage(self, model_name: str, version: str, stage: ModelStage):
        """
        Transition model to a new stage.
        """
        model_path = self.registry_dir / model_name / version / 'metadata.json'
        
        if not model_path.exists():
            raise ValueError(f"Model {model_name} {version} not found")
        
        # Update metadata
        with open(model_path, 'r') as f:
            metadata = json.load(f)
        
        old_stage = metadata['stage']
        metadata['stage'] = stage.value
        
        with open(model_path, 'w') as f:
            json.dump(metadata, f, indent=2)
        
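        # Update registry
        model_entry = self.registry['models'][model_name]
        if stage == ModelStage.PRODUCTION:
            # Archive the previous production model (unless this version is being re-promoted)
            old_prod = model_entry.get('production_version')
            if old_prod and old_prod != version:
                self.transition_stage(model_name, old_prod, ModelStage.ARCHIVED)
            model_entry['production_version'] = version
        elif stage == ModelStage.STAGING:
            model_entry['staging_version'] = version
        
        # Clear any stage pointer that still names this version after it has left that stage
        if stage != ModelStage.PRODUCTION and model_entry.get('production_version') == version:
            model_entry['production_version'] = None
        if stage != ModelStage.STAGING and model_entry.get('staging_version') == version:
            model_entry['staging_version'] = None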
        
        # Update version metadata in registry
        model_id = metadata['model_id']
        self.registry['versions'][model_id]['stage'] = stage.value
        
        self._save_registry()
        
        print(f"Transitioned {model_name} {version}: {old_stage} → {stage.value}")
    
    def load_model(self, model_name: str, version: str = None, 
                   stage: ModelStage = None):
        """
        Load a model from the registry.
        
        Args:
            model_name: Name of the model
            version: Specific version (e.g., 'v1')
            stage: Load model at specific stage (e.g., PRODUCTION)
        """
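        if stage in (ModelStage.PRODUCTION, ModelStage.STAGING):
            # Resolve the version currently assigned to the requested stage
            version = self.registry['models'][model_name].get(f"{stage.value}_version")
            if version is None:
                # Fail loudly rather than silently serving the latest (possibly untested) version
                raise ValueError(f"No {stage.value} version of {model_name} is registered")
        
        if version is None:
            version = self.registry['models'][model_name]['latest_version']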
        
        model_path = self.registry_dir / model_name / version / 'model.joblib'
        
        if not model_path.exists():
            raise ValueError(f"Model {model_name} {version} not found")
        
        return joblib.load(model_path)
    
    def get_model_metadata(self, model_name: str, version: str) -> Dict:
        """Get metadata for a specific model version."""
        metadata_path = self.registry_dir / model_name / version / 'metadata.json'
        
        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                return json.load(f)
        return None
    
    def list_models(self) -> List[str]:
        """List all registered models."""
        return list(self.registry['models'].keys())
    
    def list_versions(self, model_name: str) -> List[str]:
        """List all versions of a model."""
        if model_name in self.registry['models']:
            return self.registry['models'][model_name]['versions']
        return []
    
    def get_production_model(self, model_name: str):
        """Get the production model."""
        return self.load_model(model_name, stage=ModelStage.PRODUCTION)
    
    def compare_versions(self, model_name: str, versions: List[str] = None) -> pd.DataFrame:
        """Compare metrics across model versions."""
        versions = versions or self.list_versions(model_name)
        
        rows = []
        for version in versions:
            metadata = self.get_model_metadata(model_name, version)
            if metadata:
                row = {
                    'version': version,
                    'stage': metadata['stage'],
                    'created_at': metadata['created_at'],
                    **metadata['metrics']
                }
                rows.append(row)
        
        return pd.DataFrame(rows)

# Test model registry
registry = ModelRegistry()
print("\nModelRegistry created!")
print("\nCapabilities:")
print("  - register_model()")
print("  - transition_stage()")
print("  - load_model()")
print("  - compare_versions()")
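
Version numbers alone do not tell you whether two registered versions contain the same model. A content hash of the serialized artifact is one common way to spot duplicates or to verify that a file has not changed. The sketch below is an assumption layered on top of the registry, not something `ModelRegistry` does: `fingerprint` is a hypothetical helper, and plain dictionaries stand in for trained models so the example has no dependencies.

```python
import hashlib
import pickle


def fingerprint(obj) -> str:
    """Return a short SHA-256 hex digest of the pickled object (hypothetical helper)."""
    # Pin the pickle protocol so the bytes do not depend on the
    # interpreter's default protocol.
    payload = pickle.dumps(obj, protocol=4)
    return hashlib.sha256(payload).hexdigest()[:12]


# Plain dicts stand in for fitted models here.
model_a = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}
model_b = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}   # same content as model_a
model_c = {"weights": [0.1, 0.2, 0.4], "bias": 0.5}   # one weight changed

print(fingerprint(model_a) == fingerprint(model_b))  # identical content
print(fingerprint(model_a) == fingerprint(model_c))  # different content
```

Pickle bytes are not guaranteed to be stable across library versions or for objects with shared internal references, so in practice it is usually safer to hash the bytes of the saved `model.joblib` file and store the digest alongside the metadata.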

## 5. Feature Store

A simple feature store for managing and serving features.

In [None]:
class FeatureStore:
    """
    Simple feature store for managing ML features.
    
    Features:
    - Store feature definitions
    - Track feature statistics
    - Serve features for training/inference
    """
    
    def __init__(self, store_dir: str = './mlops_artifacts/feature_store'):
        self.store_dir = Path(store_dir)
        self.store_dir.mkdir(parents=True, exist_ok=True)
        
        self.feature_groups = self._load_store()
    
    def _load_store(self) -> Dict:
        """Load feature store from disk."""
        store_file = self.store_dir / 'feature_store.json'
        if store_file.exists():
            with open(store_file, 'r') as f:
                return json.load(f)
        return {}
    
    def _save_store(self):
        """Save feature store to disk."""
        with open(self.store_dir / 'feature_store.json', 'w') as f:
            json.dump(self.feature_groups, f, indent=2, default=str)
    
    def create_feature_group(self, name: str, description: str = ""):
        """Create a new feature group."""
        if name not in self.feature_groups:
            self.feature_groups[name] = {
                'name': name,
                'description': description,
                'created_at': datetime.now().isoformat(),
                'features': {},
                'statistics': {}
            }
            self._save_store()
            print(f"Created feature group: {name}")
    
    def register_features(self, group_name: str, df: pd.DataFrame, 
                          feature_cols: List[str] = None):
        """
        Register features from a DataFrame.
        """
        if group_name not in self.feature_groups:
            self.create_feature_group(group_name)
        
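        # Keep only requested columns that exist, so the Parquet write below cannot fail on a missing name
        feature_cols = [c for c in (feature_cols or df.columns.tolist()) if c in df.columns]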
        
        for col in feature_cols:
            if col in df.columns:
                # Store feature definition
                self.feature_groups[group_name]['features'][col] = {
                    'dtype': str(df[col].dtype),
                    'registered_at': datetime.now().isoformat()
                }
                
                # Calculate and store statistics
                if pd.api.types.is_numeric_dtype(df[col]):
                    self.feature_groups[group_name]['statistics'][col] = {
                        'mean': float(df[col].mean()),
                        'std': float(df[col].std()),
                        'min': float(df[col].min()),
                        'max': float(df[col].max()),
                        'missing_pct': float(df[col].isnull().mean() * 100)
                    }
                else:
                    self.feature_groups[group_name]['statistics'][col] = {
                        'n_unique': int(df[col].nunique()),
                        'top_value': str(df[col].mode().iloc[0]) if len(df[col].mode()) > 0 else None,
                        'missing_pct': float(df[col].isnull().mean() * 100)
                    }
        
        # Save feature data
        data_path = self.store_dir / group_name
        data_path.mkdir(exist_ok=True)
        df[feature_cols].to_parquet(data_path / 'features.parquet', index=False)
        
        self._save_store()
        print(f"Registered {len(feature_cols)} features to group '{group_name}'")
    
    def get_features(self, group_name: str, feature_cols: List[str] = None) -> pd.DataFrame:
        """Get features from the store."""
        data_path = self.store_dir / group_name / 'features.parquet'
        
        if not data_path.exists():
            raise ValueError(f"Feature group '{group_name}' has no data")
        
        df = pd.read_parquet(data_path)
        
        if feature_cols:
            df = df[feature_cols]
        
        return df
    
    def get_statistics(self, group_name: str) -> Dict:
        """Get feature statistics."""
        if group_name in self.feature_groups:
            return self.feature_groups[group_name]['statistics']
        return {}
    
    def list_feature_groups(self) -> List[str]:
        """List all feature groups."""
        return list(self.feature_groups.keys())
    
    def list_features(self, group_name: str) -> List[str]:
        """List features in a group."""
        if group_name in self.feature_groups:
            return list(self.feature_groups[group_name]['features'].keys())
        return []

# Test feature store
feature_store = FeatureStore()
print("\nFeatureStore created!")
print("\nCapabilities:")
print("  - create_feature_group()")
print("  - register_features()")
print("  - get_features()")
print("  - get_statistics()")
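
The statistics saved by `register_features()` can double as a cheap sanity check on new data before it reaches a model. The sketch below assumes a statistics dictionary shaped like the numeric entries above (`mean`, `std`, `min`, `max`, `missing_pct`) and flags values outside the range seen at registration time. `out_of_range` is a hypothetical helper and the numbers are invented.

```python
from typing import Dict, List

# Hypothetical stored statistics for one numeric feature, shaped like
# the entries FeatureStore keeps under 'statistics'.
stored_stats: Dict[str, Dict[str, float]] = {
    "age": {"mean": 41.0, "std": 12.0, "min": 18.0, "max": 90.0, "missing_pct": 0.0},
}


def out_of_range(feature: str, values: List[float],
                 stats: Dict[str, Dict[str, float]]) -> List[float]:
    """Return the values that fall outside the registered [min, max] range."""
    low, high = stats[feature]["min"], stats[feature]["max"]
    return [v for v in values if v < low or v > high]


new_batch = [25.0, 17.0, 64.0, 131.0]
print(out_of_range("age", new_batch, stored_stats))
```

A range check like this catches obvious data-entry or unit errors; it is not a substitute for the distribution-level tests in the drift detection section.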

## 6. Data Drift Detection

Detect when the distribution of incoming data shifts away from a reference (baseline) dataset.
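
As a quick illustration of the idea, the two-sample Kolmogorov-Smirnov test compares the empirical distributions of a reference sample and a current sample; a small p-value suggests they were not drawn from the same distribution. The sketch below uses synthetic normal data with an artificial mean shift of 0.5, so the sample sizes and the shift are assumptions chosen to make the effect obvious.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

reference = rng.normal(loc=0.0, scale=1.0, size=1000)  # baseline sample
shifted = rng.normal(loc=0.5, scale=1.0, size=1000)    # mean moved by 0.5

statistic, p_value = ks_2samp(reference, shifted)

# With a shift this large and 1000 points per sample, the p-value
# falls far below the usual 0.05 significance level.
print(f"KS statistic: {statistic:.3f}, drift flagged: {p_value < 0.05}")
```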

In [None]:
class DriftDetector:
    """
    Detect data drift using statistical tests.
    
    Methods:
    - KS Test (Kolmogorov-Smirnov) for numeric features
    - Chi-Square test for categorical features
    - PSI (Population Stability Index)
    - Wasserstein distance
    """
    
    def __init__(self, significance_level: float = 0.05, psi_threshold: float = 0.2):
        """
        Args:
            significance_level: P-value threshold for statistical tests
            psi_threshold: PSI threshold (>0.2 indicates significant drift)
        """
        self.significance_level = significance_level
        self.psi_threshold = psi_threshold
        self.reference_data = None
        self.reference_stats = {}
    
    def set_reference(self, df: pd.DataFrame):
        """
        Set reference (baseline) data for comparison.
        """
        self.reference_data = df.copy()
        
        # Calculate reference statistics
        for col in df.columns:
            if pd.api.types.is_numeric_dtype(df[col]):
                self.reference_stats[col] = {
                    'type': 'numeric',
                    'mean': df[col].mean(),
                    'std': df[col].std(),
                    'quantiles': df[col].quantile([0.25, 0.5, 0.75]).to_dict()
                }
            else:
                self.reference_stats[col] = {
                    'type': 'categorical',
                    'value_counts': df[col].value_counts(normalize=True).to_dict()
                }
        
        print(f"Reference data set with {len(df)} samples, {len(df.columns)} features")
    
    def detect_drift(self, current_df: pd.DataFrame) -> Dict:
        """
        Detect drift between reference and current data.
        
        Returns:
            Dict with drift results per feature
        """
        if self.reference_data is None:
            raise ValueError("Reference data not set. Call set_reference() first.")
        
        results = {
            'overall_drift': False,
            'n_drifted_features': 0,
            'features': {},
            'timestamp': datetime.now().isoformat()
        }
        
        common_cols = set(self.reference_data.columns) & set(current_df.columns)
        
        for col in common_cols:
            if col not in self.reference_stats:
                continue
            
            ref_data = self.reference_data[col].dropna()
            cur_data = current_df[col].dropna()
            
            if self.reference_stats[col]['type'] == 'numeric':
                drift_result = self._detect_numeric_drift(ref_data, cur_data, col)
            else:
                drift_result = self._detect_categorical_drift(ref_data, cur_data, col)
            
            results['features'][col] = drift_result
            
            if drift_result['is_drifted']:
                results['n_drifted_features'] += 1
        
        # Overall drift if more than 20% of features drifted
        drift_ratio = results['n_drifted_features'] / len(common_cols) if common_cols else 0
        results['overall_drift'] = drift_ratio > 0.2
        results['drift_ratio'] = drift_ratio
        
        return results
    
    def _detect_numeric_drift(self, ref_data: pd.Series, cur_data: pd.Series, 
                               col_name: str) -> Dict:
        """
        Detect drift in numeric feature using KS test and PSI.
        """
        # KS Test
        ks_stat, ks_pvalue = ks_2samp(ref_data, cur_data)
        
        # PSI (Population Stability Index)
        psi = self._calculate_psi(ref_data, cur_data)
        
        # Wasserstein distance
        wasserstein = wasserstein_distance(ref_data, cur_data)
        
        # Mean shift
        mean_shift = abs(ref_data.mean() - cur_data.mean()) / (ref_data.std() + 1e-10)
        
        is_drifted = (ks_pvalue < self.significance_level) or (psi > self.psi_threshold)
        
        return {
            'type': 'numeric',
            'is_drifted': is_drifted,
            'ks_statistic': float(ks_stat),
            'ks_pvalue': float(ks_pvalue),
            'psi': float(psi),
            'wasserstein_distance': float(wasserstein),
            'mean_shift': float(mean_shift),
            'ref_mean': float(ref_data.mean()),
            'cur_mean': float(cur_data.mean())
        }
    
    def _detect_categorical_drift(self, ref_data: pd.Series, cur_data: pd.Series,
                                   col_name: str) -> Dict:
        """
        Detect drift in categorical feature using Chi-Square test.
        """
        # Get all categories
        all_cats = set(ref_data.unique()) | set(cur_data.unique())
        
        # Calculate frequencies
        ref_counts = ref_data.value_counts()
        cur_counts = cur_data.value_counts()
        
        # Align categories
        ref_freq = [ref_counts.get(cat, 0) for cat in all_cats]
        cur_freq = [cur_counts.get(cat, 0) for cat in all_cats]
        
        # Chi-Square test (need at least some counts)
        if sum(ref_freq) > 0 and sum(cur_freq) > 0:
            # Create contingency table
            contingency = np.array([ref_freq, cur_freq])
            # Remove zero columns
            contingency = contingency[:, contingency.sum(axis=0) > 0]
            
            if contingency.shape[1] > 1:
                chi2, pvalue, dof, expected = chi2_contingency(contingency)
            else:
                chi2, pvalue = 0, 1.0
        else:
            chi2, pvalue = 0, 1.0
        
        is_drifted = pvalue < self.significance_level
        
        return {
            'type': 'categorical',
            'is_drifted': is_drifted,
            'chi2_statistic': float(chi2),
            'chi2_pvalue': float(pvalue),
            'n_categories_ref': len(ref_counts),
            'n_categories_cur': len(cur_counts)
        }
    
    def _calculate_psi(self, ref_data: pd.Series, cur_data: pd.Series, 
                       n_bins: int = 10) -> float:
        """
        Calculate Population Stability Index (PSI).
        
        PSI < 0.1: No significant change
        0.1 <= PSI < 0.2: Moderate change
        PSI >= 0.2: Significant change
        """
        # Create bins based on reference data
        bins = np.quantile(ref_data, np.linspace(0, 1, n_bins + 1))
        bins = np.unique(bins)  # Remove duplicates
        
        if len(bins) < 2:
            return 0.0
        
        # Open the outer bins so current values outside the reference range are still counted
        bins[0], bins[-1] = -np.inf, np.inf
        
        # Calculate proportions
        ref_counts, _ = np.histogram(ref_data, bins=bins)
        cur_counts, _ = np.histogram(cur_data, bins=bins)
        
        ref_pct = ref_counts / len(ref_data)
        cur_pct = cur_counts / len(cur_data)
        
        # Avoid division by zero
        ref_pct = np.clip(ref_pct, 0.0001, None)
        cur_pct = np.clip(cur_pct, 0.0001, None)
        
        # Calculate PSI
        psi = np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct))
        
        return psi
    
    def get_drift_report(self, drift_results: Dict) -> pd.DataFrame:
        """
        Get drift results as a DataFrame.
        """
        rows = []
        for feature, result in drift_results['features'].items():
            row = {'feature': feature, **result}
            rows.append(row)
        
        return pd.DataFrame(rows).sort_values('is_drifted', ascending=False)

# Test drift detector
drift_detector = DriftDetector()
print("\nDriftDetector created!")
print("\nMethods:")
print("  - KS Test (numeric)")
print("  - Chi-Square Test (categorical)")
print("  - PSI (Population Stability Index)")
print("  - Wasserstein Distance")
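
To make the PSI thresholds in the docstring concrete, here is a minimal standalone sketch of the same calculation applied to synthetic data. The function name `psi`, the `eps` floor, and the sample sizes are illustrative choices, not part of `DriftDetector`; the sketch also assumes that out-of-range current values should be counted in the edge bins.

```python
import numpy as np

def psi(reference, current, n_bins=10, eps=1e-4):
    """Population Stability Index of `current` relative to `reference`."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges are reference quantiles, so each bin holds roughly 1/n_bins of it.
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)))
    if len(edges) < 2:
        return 0.0
    # Clip so that out-of-range current values fall into the first or last bin.
    current = np.clip(current, edges[0], edges[-1])
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the proportions to avoid log(0) and division by zero.
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)       # drawn from the same distribution
shifted = rng.normal(1.0, 1.0, 5000)    # mean moved by one standard deviation

psi_same = psi(reference, same)
psi_shifted = psi(reference, shifted)
print(f"PSI, same distribution:    {psi_same:.4f}")
print(f"PSI, mean shifted by 1 sd: {psi_shifted:.4f}")
```

A fresh sample from the same distribution should land well under the 0.1 "no significant change" mark, while the shifted sample should clear the 0.2 "significant change" mark.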

## 7. Model Performance Monitoring

Track model performance over time and alert on degradation.

In [None]:
class PerformanceMonitor:
    """
    Monitor model performance over time with alerting.
    
    Features:
    - Track metrics over time
    - Detect performance degradation
    - Generate alerts
    - Visualize trends
    """
    
    def __init__(self, monitor_dir: str = './mlops_artifacts/monitoring'):
        self.monitor_dir = Path(monitor_dir)
        self.monitor_dir.mkdir(parents=True, exist_ok=True)
        
        self.metrics_history = defaultdict(list)
        self.alerts = []
        self.thresholds = {}
        self.baseline_metrics = {}
    
    def set_baseline(self, metrics: Dict[str, float]):
        """
        Set baseline metrics for comparison.
        """
        self.baseline_metrics = metrics
        print(f"Baseline set: {metrics}")
    
    def set_threshold(self, metric_name: str, min_value: float = None, 
                      max_value: float = None, max_degradation: float = None):
        """
        Set alerting thresholds for a metric.
        
        Args:
            metric_name: Name of the metric
            min_value: Alert if metric falls below this
            max_value: Alert if metric exceeds this
            max_degradation: Max allowed % degradation from baseline
        """
        self.thresholds[metric_name] = {
            'min_value': min_value,
            'max_value': max_value,
            'max_degradation': max_degradation
        }
    
    def log_metrics(self, metrics: Dict[str, float], 
                    timestamp: datetime = None,
                    model_version: str = None) -> List[Alert]:
        """
        Log metrics and check for alerts.
        
        Returns:
            List of triggered alerts
        """
        timestamp = timestamp or datetime.now()
        new_alerts = []
        
        for metric_name, value in metrics.items():
            # Store metric
            self.metrics_history[metric_name].append({
                'value': value,
                'timestamp': timestamp.isoformat(),
                'model_version': model_version
            })
            
            # Check thresholds
            if metric_name in self.thresholds:
                alert = self._check_threshold(metric_name, value, timestamp)
                if alert:
                    new_alerts.append(alert)
                    self.alerts.append(alert)
        
        return new_alerts
    
    def _check_threshold(self, metric_name: str, value: float, 
                         timestamp: datetime) -> Optional[Alert]:
        """
        Check if metric violates thresholds.
        """
        threshold = self.thresholds[metric_name]
        
        # Check minimum value
        if threshold['min_value'] is not None and value < threshold['min_value']:
            return Alert(
                alert_id=str(uuid.uuid4())[:8],
                timestamp=timestamp.isoformat(),
                severity=AlertSeverity.CRITICAL.value,
                message=f"{metric_name} ({value:.4f}) below minimum threshold ({threshold['min_value']})",
                metric_name=metric_name,
                metric_value=value,
                threshold=threshold['min_value']
            )
        
        # Check maximum value
        if threshold['max_value'] is not None and value > threshold['max_value']:
            return Alert(
                alert_id=str(uuid.uuid4())[:8],
                timestamp=timestamp.isoformat(),
                severity=AlertSeverity.WARNING.value,
                message=f"{metric_name} ({value:.4f}) above maximum threshold ({threshold['max_value']})",
                metric_name=metric_name,
                metric_value=value,
                threshold=threshold['max_value']
            )
        
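        # Check degradation from baseline (assumes a higher-is-better metric).
        # A zero baseline is skipped: relative degradation is undefined there.
        if (threshold['max_degradation'] is not None
                and self.baseline_metrics.get(metric_name)):
            baseline = self.baseline_metrics[metric_name]
            degradation = (baseline - value) / baseline * 100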
            
            if degradation > threshold['max_degradation']:
                return Alert(
                    alert_id=str(uuid.uuid4())[:8],
                    timestamp=timestamp.isoformat(),
                    severity=AlertSeverity.WARNING.value,
                    message=f"{metric_name} degraded by {degradation:.1f}% from baseline",
                    metric_name=metric_name,
                    metric_value=value,
                    threshold=baseline * (1 - threshold['max_degradation']/100)
                )
        
        return None
    
    def get_metrics_df(self, metric_name: str = None) -> pd.DataFrame:
        """
        Get metrics history as DataFrame.
        """
        rows = []
        metrics_to_get = [metric_name] if metric_name else self.metrics_history.keys()
        
        for name in metrics_to_get:
            if name in self.metrics_history:
                for entry in self.metrics_history[name]:
                    rows.append({
                        'metric': name,
                        'value': entry['value'],
                        'timestamp': entry['timestamp'],
                        'model_version': entry.get('model_version')
                    })
        
        return pd.DataFrame(rows)
    
    def get_alerts_df(self) -> pd.DataFrame:
        """Get alerts as DataFrame."""
        return pd.DataFrame([asdict(a) for a in self.alerts])
    
    def plot_metrics(self, metric_names: List[str] = None, figsize=(12, 6)):
        """
        Plot metrics over time.
        """
        metric_names = metric_names or list(self.metrics_history.keys())
        
        fig, axes = plt.subplots(len(metric_names), 1, figsize=(figsize[0], figsize[1] * len(metric_names)))
        if len(metric_names) == 1:
            axes = [axes]
        
        for ax, metric_name in zip(axes, metric_names):
            if metric_name in self.metrics_history:
                values = [m['value'] for m in self.metrics_history[metric_name]]
                timestamps = range(len(values))
                
                ax.plot(timestamps, values, 'b-o', label=metric_name)
                
                # Plot baseline
                if metric_name in self.baseline_metrics:
                    ax.axhline(y=self.baseline_metrics[metric_name], 
                              color='g', linestyle='--', label='Baseline')
                
                # Plot threshold
                if metric_name in self.thresholds:
                    # Compare against None so a threshold of 0.0 is still drawn
                    if self.thresholds[metric_name]['min_value'] is not None:
                        ax.axhline(y=self.thresholds[metric_name]['min_value'],
                                  color='r', linestyle='--', label='Min Threshold')
                
                ax.set_xlabel('Time')
                ax.set_ylabel(metric_name)
                ax.set_title(f'{metric_name} Over Time')
                ax.legend()
                ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        return fig
    
    def check_health(self) -> Dict:
        """
        Check overall model health.
        """
        health = {
            'status': 'healthy',
            'issues': [],
            'recent_alerts': len([a for a in self.alerts[-10:] 
                                  if a.severity == AlertSeverity.CRITICAL.value])
        }
        
        # Check recent metrics
        for metric_name, history in self.metrics_history.items():
            if history:
                recent_value = history[-1]['value']
                
                # Skip missing or zero baselines (relative degradation is undefined)
                if self.baseline_metrics.get(metric_name):
                    baseline = self.baseline_metrics[metric_name]
                    degradation = (baseline - recent_value) / baseline * 100
                    
                    if degradation > 20:
                        health['status'] = 'degraded'
                        health['issues'].append(f"{metric_name} degraded by {degradation:.1f}%")
                    elif degradation > 10:
                        if health['status'] == 'healthy':
                            health['status'] = 'warning'
                        health['issues'].append(f"{metric_name} degraded by {degradation:.1f}%")
        
        if health['recent_alerts'] > 3:
            health['status'] = 'critical'
        
        return health

# Test performance monitor
monitor = PerformanceMonitor()
print("\nPerformanceMonitor created!")
print("\nCapabilities:")
print("  - log_metrics()")
print("  - set_threshold()")
print("  - check_health()")
print("  - plot_metrics()")
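
The degradation logic is easier to follow in isolation. The sketch below is a standalone illustration, not the `PerformanceMonitor` API: the helper names `relative_degradation` and `classify` are hypothetical, the 10% and 20% cut-offs mirror the ones used in `check_health()`, and the metric is assumed to be higher-is-better (accuracy, F1).

```python
from typing import Optional

def relative_degradation(baseline: float, value: float) -> Optional[float]:
    """Percent drop of `value` relative to `baseline` (positive means worse).

    Returns None when the baseline is zero, because the ratio is undefined.
    """
    if baseline == 0:
        return None
    return (baseline - value) / baseline * 100

def classify(baseline: float, value: float,
             warn_pct: float = 10.0, degraded_pct: float = 20.0) -> str:
    """Map a metric value to a coarse health status."""
    drop = relative_degradation(baseline, value)
    if drop is None:
        return "unknown"
    if drop > degraded_pct:
        return "degraded"
    if drop > warn_pct:
        return "warning"
    return "healthy"

baseline_accuracy = 0.90
for observed in (0.88, 0.80, 0.70):
    drop = relative_degradation(baseline_accuracy, observed)
    status = classify(baseline_accuracy, observed)
    print(f"accuracy={observed:.2f}  drop={drop:.1f}%  status={status}")
```

For lower-is-better metrics such as latency or log loss, the sign of the difference would need to be flipped before applying the same cut-offs.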

## 8. Automated Retraining Pipeline

Automatically retrain models when drift is detected or performance degrades.

In [None]:
class RetrainingPipeline:
    """
    Automated model retraining pipeline.
    
    Triggers:
    - Data drift detected
    - Performance below threshold
    - Manual trigger (call retrain() directly)
    """
    
    def __init__(self, model_registry: ModelRegistry,
                 drift_detector: DriftDetector,
                 performance_monitor: PerformanceMonitor,
                 experiment_tracker: ExperimentTracker):
        
        self.registry = model_registry
        self.drift_detector = drift_detector
        self.monitor = performance_monitor
        self.tracker = experiment_tracker
        
        self.retrain_history = []
        self.model_factory = None  # Function to create new model
    
    def set_model_factory(self, factory_fn):
        """
        Set the function used to create new models.
        
        Args:
            factory_fn: Function that returns an untrained model
        """
        self.model_factory = factory_fn
    
    def check_retrain_triggers(self, X_current: pd.DataFrame, 
                                y_current: pd.Series = None,
                                model_name: str = None) -> Dict:
        """
        Check if retraining should be triggered.
        
        Returns:
            Dict with trigger status and reasons
        """
        triggers = {
            'should_retrain': False,
            'reasons': [],
            'drift_detected': False,
            'performance_degraded': False
        }
        
        # Check data drift
        if self.drift_detector.reference_data is not None:
            drift_results = self.drift_detector.detect_drift(X_current)
            if drift_results['overall_drift']:
                triggers['should_retrain'] = True
                triggers['drift_detected'] = True
                triggers['reasons'].append(
                    f"Data drift detected: {drift_results['n_drifted_features']} features drifted"
                )
        
        # Check performance
        health = self.monitor.check_health()
        if health['status'] in ['degraded', 'critical']:
            triggers['should_retrain'] = True
            triggers['performance_degraded'] = True
            triggers['reasons'].extend(health['issues'])
        
        return triggers
    
    def retrain(self, X_train: pd.DataFrame, y_train: pd.Series,
                X_val: pd.DataFrame, y_val: pd.Series,
                model_name: str,
                reason: str = "Manual retrain") -> str:
        """
        Retrain the model with new data.
        
        Returns:
            new_version: Version of the newly trained model
        """
        if self.model_factory is None:
            raise ValueError("Model factory not set. Call set_model_factory() first.")
        
        print(f"\n{'='*50}")
        print(f"RETRAINING TRIGGERED")
        print(f"Reason: {reason}")
        print(f"{'='*50}")
        
        # Start experiment tracking
        run_id = self.tracker.start_run(
            experiment_name=f"{model_name}_retrain",
            run_name=f"retrain_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        )
        
        try:
            # Create and train new model
            model = self.model_factory()
            
            # Log parameters
            if hasattr(model, 'get_params'):
                self.tracker.log_params(model.get_params())
            
            self.tracker.log_param('train_samples', len(X_train))
            self.tracker.log_param('retrain_reason', reason)
            
            # Train
            print("Training new model...")
            model.fit(X_train, y_train)
            
            # Evaluate
            y_pred = model.predict(X_val)
            
            metrics = {
                'accuracy': accuracy_score(y_val, y_pred),
                'precision': precision_score(y_val, y_pred, average='weighted'),
                'recall': recall_score(y_val, y_pred, average='weighted'),
                'f1': f1_score(y_val, y_pred, average='weighted')
            }
            
            # Log metrics
            self.tracker.log_metrics(metrics)
            self.tracker.set_tag('retrain_reason', reason)
            
            print(f"New model metrics: {metrics}")
            
            # Register new model
            new_version = self.registry.register_model(
                model=model,
                model_name=model_name,
                metrics=metrics,
                parameters=model.get_params() if hasattr(model, 'get_params') else {},
                description=f"Retrained due to: {reason}",
                tags={'retrain_reason': reason}
            )
            
            # Update reference data for drift detection
            self.drift_detector.set_reference(X_train)
            
            # Update baseline metrics
            self.monitor.set_baseline(metrics)
            
            # Log retrain event
            self.retrain_history.append({
                'timestamp': datetime.now().isoformat(),
                'model_name': model_name,
                'new_version': new_version,
                'reason': reason,
                'metrics': metrics
            })
            
            self.tracker.end_run('completed')
            
            print(f"\nNew model registered: {model_name} {new_version}")
            return new_version
            
        except Exception:
            # Mark the run as failed, then re-raise with the original traceback
            self.tracker.end_run('failed')
            raise
    
    def auto_retrain_if_needed(self, X_current: pd.DataFrame,
                                y_current: pd.Series,
                                X_train: pd.DataFrame,
                                y_train: pd.Series,
                                model_name: str) -> Optional[str]:
        """
        Automatically retrain if triggers are met.
        
        Returns:
            new_version if retrained, None otherwise
        """
        triggers = self.check_retrain_triggers(X_current, y_current, model_name)
        
        if triggers['should_retrain']:
            reason = "; ".join(triggers['reasons'])
            
            # Split training data for validation
            X_tr, X_val, y_tr, y_val = train_test_split(
                X_train, y_train, test_size=0.2, random_state=42
            )
            
            return self.retrain(X_tr, y_tr, X_val, y_val, model_name, reason)
        
        print("No retraining needed.")
        return None
    
    def get_retrain_history(self) -> pd.DataFrame:
        """Get retraining history as DataFrame."""
        return pd.DataFrame(self.retrain_history)

print("\nRetrainingPipeline created!")
print("\nTriggers:")
print("  - Data drift detected")
print("  - Performance degradation")
print("  - Manual trigger")
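
The trigger logic in `check_retrain_triggers()` reduces to combining two signals: a drift flag and a health status. The standalone sketch below shows that decision without the surrounding classes. The function name `decide_retrain` and the fallback reason string are assumptions made for illustration.

```python
from typing import Dict, List

def decide_retrain(overall_drift: bool, n_drifted_features: int,
                   health_status: str, health_issues: List[str]) -> Dict:
    """Combine a drift flag and a health status into one retrain decision."""
    reasons: List[str] = []
    if overall_drift:
        reasons.append(f"Data drift detected: {n_drifted_features} features drifted")
    performance_degraded = health_status in ("degraded", "critical")
    if performance_degraded:
        # Fall back to a generic reason when the health check lists no issues,
        # e.g. when the status is 'critical' purely because of alert counts.
        reasons.extend(health_issues or [f"Model health is {health_status}"])
    return {
        "should_retrain": overall_drift or performance_degraded,
        "drift_detected": overall_drift,
        "performance_degraded": performance_degraded,
        "reasons": reasons,
    }

scenarios = {
    "stable": decide_retrain(False, 0, "healthy", []),
    "drift only": decide_retrain(True, 3, "warning", ["accuracy degraded by 12.0%"]),
    "critical alerts": decide_retrain(False, 0, "critical", []),
}
for name, decision in scenarios.items():
    print(f"{name:<16} retrain={decision['should_retrain']}  reasons={decision['reasons']}")
```

Note that a `warning` status alone does not trigger retraining; only drift, `degraded`, or `critical` do.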

## 9. A/B Testing Framework

Compare model versions statistically.
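
One way to make "statistically" concrete is a paired bootstrap on accuracy: both models are scored on the same resampled rows, and the spread of the resampled differences indicates whether the observed gap could be noise. The sketch below is a standalone illustration on simulated predictions. The function name, the 80%/90% accuracies, and the percentile-style two-sided p-value are assumptions; this is one common approximation, not the only valid test.

```python
import numpy as np

def paired_bootstrap_accuracy(y_true, pred_a, pred_b, n_iterations=2000, seed=0):
    """Bootstrap the accuracy difference (B minus A) on shared resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    correct_a = (np.asarray(pred_a) == y_true).astype(float)
    correct_b = (np.asarray(pred_b) == y_true).astype(float)
    n = len(y_true)
    # Each row of idx is one resample; the same rows score both models (paired).
    idx = rng.integers(0, n, size=(n_iterations, n))
    diffs = correct_b[idx].mean(axis=1) - correct_a[idx].mean(axis=1)
    # Two-sided p-value: share of resamples on the far side of zero, doubled.
    p_value = min(1.0, 2 * min(np.mean(diffs <= 0), np.mean(diffs >= 0)))
    ci_low, ci_high = np.quantile(diffs, [0.025, 0.975])
    return {
        "accuracy_diff": float(correct_b.mean() - correct_a.mean()),
        "p_value": float(p_value),
        "ci_95": (float(ci_low), float(ci_high)),
    }

# Simulated labels and predictions: A is right about 80% of the time, B about 90%.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)
pred_a = np.where(rng.random(1000) < 0.80, y_true, 1 - y_true)
pred_b = np.where(rng.random(1000) < 0.90, y_true, 1 - y_true)

result = paired_bootstrap_accuracy(y_true, pred_a, pred_b)
print(result)
```

With only a few hundred bootstrap iterations the smallest non-zero p-value is coarse, so a larger `n_iterations` is worth the extra compute when the decision is close.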

In [None]:
class ABTester:
    """
    A/B Testing framework for comparing model versions.
    
    Features:
    - Split traffic between models
    - Statistical significance testing
    - Performance comparison
    """
    
    def __init__(self, significance_level: float = 0.05):
        self.significance_level = significance_level
        self.experiments = {}
        self.results_history = []
    
    def create_experiment(self, experiment_name: str,
                          model_a, model_b,
                          traffic_split: float = 0.5):
        """
        Create an A/B experiment.
        
        Args:
            experiment_name: Name of the experiment
            model_a: Control model (current production)
            model_b: Treatment model (challenger)
            traffic_split: Fraction of traffic to model B
        """
        self.experiments[experiment_name] = {
            'model_a': model_a,
            'model_b': model_b,
            'traffic_split': traffic_split,
            'results_a': [],
            'results_b': [],
            'created_at': datetime.now().isoformat(),
            'status': 'running'
        }
        print(f"Created A/B experiment: {experiment_name}")
        print(f"  Traffic split: {(1-traffic_split)*100:.0f}% A / {traffic_split*100:.0f}% B")
    
    def route_prediction(self, experiment_name: str, X: np.ndarray) -> Tuple[np.ndarray, str]:
        """
        Route prediction to appropriate model based on traffic split.
        
        Returns:
            predictions, model_used ('A' or 'B')
        """
        exp = self.experiments[experiment_name]
        
        if np.random.random() < exp['traffic_split']:
            predictions = exp['model_b'].predict(X)
            return predictions, 'B'
        else:
            predictions = exp['model_a'].predict(X)
            return predictions, 'A'
    
    def record_result(self, experiment_name: str, model_used: str,
                      y_true: np.ndarray, y_pred: np.ndarray):
        """
        Record results for an experiment.
        """
        exp = self.experiments[experiment_name]
        
        accuracy = accuracy_score(y_true, y_pred)
        
        result = {
            'timestamp': datetime.now().isoformat(),
            'accuracy': accuracy,
            'n_samples': len(y_true)
        }
        
        if model_used == 'A':
            exp['results_a'].append(result)
        else:
            exp['results_b'].append(result)
    
    def run_comparison(self, experiment_name: str, X: pd.DataFrame, 
                       y: pd.Series, n_iterations: int = 100) -> Dict:
        """
        Run full A/B comparison.
        """
        exp = self.experiments[experiment_name]
        
        # Get predictions from both models
        pred_a = exp['model_a'].predict(X)
        pred_b = exp['model_b'].predict(X)
        
        # Calculate metrics
        metrics_a = {
            'accuracy': accuracy_score(y, pred_a),
            'precision': precision_score(y, pred_a, average='weighted'),
            'recall': recall_score(y, pred_a, average='weighted'),
            'f1': f1_score(y, pred_a, average='weighted')
        }
        
        metrics_b = {
            'accuracy': accuracy_score(y, pred_b),
            'precision': precision_score(y, pred_b, average='weighted'),
            'recall': recall_score(y, pred_b, average='weighted'),
            'f1': f1_score(y, pred_b, average='weighted')
        }
        
        # Bootstrap for statistical testing
        bootstrap_diffs = []
        n_samples = len(y)
        
        for _ in range(n_iterations):
            idx = np.random.choice(n_samples, n_samples, replace=True)
            acc_a = accuracy_score(y.iloc[idx], pred_a[idx])
            acc_b = accuracy_score(y.iloc[idx], pred_b[idx])
            bootstrap_diffs.append(acc_b - acc_a)
        
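        # Two-sided bootstrap p-value: the share of resampled differences on
        # the far side of zero, doubled and capped at 1. A one-sided value
        # (only diffs <= 0) could never declare model A the winner.
        diffs = np.array(bootstrap_diffs)
        p_value = float(min(1.0, 2 * min(np.mean(diffs <= 0), np.mean(diffs >= 0))))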
        
        # Determine winner
        diff = metrics_b['accuracy'] - metrics_a['accuracy']
        
        if p_value < self.significance_level and diff > 0:
            winner = 'B'
            conclusion = 'Model B is significantly better'
        elif p_value < self.significance_level and diff < 0:
            winner = 'A'
            conclusion = 'Model A is significantly better'
        else:
            winner = 'None'
            conclusion = 'No significant difference'
        
        result = {
            'experiment_name': experiment_name,
            'metrics_a': metrics_a,
            'metrics_b': metrics_b,
            'accuracy_diff': diff,
            'p_value': p_value,
            'significant': p_value < self.significance_level,
            'winner': winner,
            'conclusion': conclusion,
            'n_samples': n_samples
        }
        
        self.results_history.append(result)
        
        return result
    
    def print_results(self, result: Dict):
        """
        Print A/B test results.
        """
        print(f"\n{'='*50}")
        print(f"A/B TEST RESULTS: {result['experiment_name']}")
        print(f"{'='*50}")
        
        print(f"\nModel A (Control):")
        for metric, value in result['metrics_a'].items():
            print(f"  {metric}: {value:.4f}")
        
        print(f"\nModel B (Treatment):")
        for metric, value in result['metrics_b'].items():
            print(f"  {metric}: {value:.4f}")
        
        print(f"\nStatistical Analysis:")
        print(f"  Accuracy Difference: {result['accuracy_diff']:+.4f}")
        print(f"  P-value: {result['p_value']:.4f}")
        print(f"  Significant: {result['significant']}")
        
        print(f"\nConclusion: {result['conclusion']}")
        winner = result['winner']
        print("Winner: none" if winner == 'None' else f"Winner: Model {winner}")

# Test A/B tester
ab_tester = ABTester()
print("\nABTester created!")
print("\nCapabilities:")
print("  - create_experiment()")
print("  - run_comparison()")
print("  - Statistical significance testing")
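
`route_prediction()` draws a fresh random number per call, so the same caller can bounce between models. When assignments need to be stable, a common alternative is to hash an identifier into a bucket. The sketch below assumes a hypothetical `user_id` string is available per request; neither that field nor the `assign_variant` helper exists in `ABTester`.

```python
import hashlib

def assign_variant(experiment_name: str, user_id: str,
                   traffic_split: float = 0.5) -> str:
    """Deterministically map a user to 'A' or 'B'.

    `traffic_split` is the fraction of users sent to model B.
    """
    if not 0.0 <= traffic_split <= 1.0:
        raise ValueError("traffic_split must be between 0 and 1")
    key = f"{experiment_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex characters give an integer in [0, 2**32); scale it to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "B" if bucket < traffic_split else "A"

# The same user always gets the same variant within an experiment.
first = assign_variant("rf_vs_gb", "user_123", 0.2)
second = assign_variant("rf_vs_gb", "user_123", 0.2)
print(f"user_123 -> {first} (repeat call: {second})")

# Across many users the share routed to B approaches the requested split.
assignments = [assign_variant("rf_vs_gb", f"user_{i}", 0.2) for i in range(10_000)]
share_b = assignments.count("B") / len(assignments)
print(f"Share routed to B: {share_b:.3f}")
```

Including the experiment name in the hash key means a user's bucket in one experiment does not determine their bucket in the next.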

## 10. Complete MLOps Pipeline

Combine all components into a unified MLOps system.

In [None]:
class MLOpsPipeline:
    """
    Complete MLOps Pipeline combining all components.
    
    Features:
    - End-to-end ML lifecycle management
    - Automated monitoring and retraining
    - Model versioning and registry
    - A/B testing
    """
    
    def __init__(self, project_name: str = "mlops_project"):
        self.project_name = project_name
        self.base_dir = Path(f'./mlops_artifacts/{project_name}')
        self.base_dir.mkdir(parents=True, exist_ok=True)
        
        # Initialize components
        self.tracker = ExperimentTracker(str(self.base_dir / 'experiments'))
        self.registry = ModelRegistry(str(self.base_dir / 'model_registry'))
        self.feature_store = FeatureStore(str(self.base_dir / 'feature_store'))
        self.drift_detector = DriftDetector()
        self.monitor = PerformanceMonitor(str(self.base_dir / 'monitoring'))
        self.ab_tester = ABTester()
        
        # Retraining pipeline
        self.retrain_pipeline = RetrainingPipeline(
            self.registry, self.drift_detector, self.monitor, self.tracker
        )
        
        # State
        self.current_model_name = None
        self.current_model = None
        
        print(f"MLOps Pipeline initialized: {project_name}")
    
    def train_initial_model(self, model, X_train: pd.DataFrame, y_train: pd.Series,
                            X_val: pd.DataFrame, y_val: pd.Series,
                            model_name: str) -> str:
        """
        Train and register initial model.
        """
        print(f"\n{'='*50}")
        print(f"TRAINING INITIAL MODEL: {model_name}")
        print(f"{'='*50}")
        
        # Start experiment
        self.tracker.start_run(f"{model_name}_initial", "initial_training")
        
        # Log parameters
        if hasattr(model, 'get_params'):
            self.tracker.log_params(model.get_params())
        self.tracker.log_param('train_samples', len(X_train))
        
        # Train
        print("Training model...")
        model.fit(X_train, y_train)
        
        # Evaluate
        y_pred = model.predict(X_val)
        metrics = {
            'accuracy': accuracy_score(y_val, y_pred),
            'precision': precision_score(y_val, y_pred, average='weighted'),
            'recall': recall_score(y_val, y_pred, average='weighted'),
            'f1': f1_score(y_val, y_pred, average='weighted')
        }
        
        self.tracker.log_metrics(metrics)
        print(f"Metrics: {metrics}")
        
        # Register model
        version = self.registry.register_model(
            model=model,
            model_name=model_name,
            metrics=metrics,
            parameters=model.get_params() if hasattr(model, 'get_params') else {},
            description="Initial model"
        )
        
        # Set up monitoring
        self.monitor.set_baseline(metrics)
        self.monitor.set_threshold('accuracy', min_value=metrics['accuracy'] * 0.9, max_degradation=10)
        self.monitor.set_threshold('f1', min_value=metrics['f1'] * 0.9, max_degradation=10)
        
        # Set reference data for drift detection
        self.drift_detector.set_reference(X_train)
        
        # Store features
        self.feature_store.register_features(f"{model_name}_features", X_train)
        
        # Update state
        self.current_model_name = model_name
        self.current_model = model
        
        # Promote to production
        self.registry.transition_stage(model_name, version, ModelStage.PRODUCTION)
        
        self.tracker.end_run('completed')
        
        print(f"\nModel {model_name} {version} deployed to production!")
        return version
    
    def predict(self, X: pd.DataFrame) -> np.ndarray:
        """
        Make predictions using production model.
        """
        if self.current_model is None:
            self.current_model = self.registry.get_production_model(self.current_model_name)
        
        return self.current_model.predict(X)
    
    def monitor_and_evaluate(self, X: pd.DataFrame, y: pd.Series,
                              model_version: str = None) -> Dict:
        """
        Monitor model performance on new data.
        """
        print(f"\n{'='*50}")
        print("MONITORING MODEL PERFORMANCE")
        print(f"{'='*50}")
        
        # Check drift
        drift_results = self.drift_detector.detect_drift(X)
        print(f"\nDrift Detection:")
        print(f"  Overall drift: {drift_results['overall_drift']}")
        print(f"  Drifted features: {drift_results['n_drifted_features']}")
        
        # Evaluate performance
        y_pred = self.predict(X)
        metrics = {
            'accuracy': accuracy_score(y, y_pred),
            'precision': precision_score(y, y_pred, average='weighted'),
            'recall': recall_score(y, y_pred, average='weighted'),
            'f1': f1_score(y, y_pred, average='weighted')
        }
        
        # Log metrics and get alerts
        alerts = self.monitor.log_metrics(metrics, model_version=model_version)
        
        print(f"\nPerformance Metrics:")
        for name, value in metrics.items():
            baseline = self.monitor.baseline_metrics.get(name, 0)
            diff = ((value - baseline) / baseline * 100) if baseline else 0
            print(f"  {name}: {value:.4f} ({diff:+.1f}% from baseline)")
        
        if alerts:
            print(f"\nAlerts ({len(alerts)}):")
            for alert in alerts:
                print(f"  [{alert.severity}] {alert.message}")
        
        # Check health
        health = self.monitor.check_health()
        print(f"\nModel Health: {health['status'].upper()}")
        
        return {
            'drift_results': drift_results,
            'metrics': metrics,
            'alerts': alerts,
            'health': health
        }
    
    def auto_retrain_check(self, X_current: pd.DataFrame, y_current: pd.Series,
                           X_train: pd.DataFrame, y_train: pd.Series) -> Optional[str]:
        """
        Check if retraining is needed and trigger if so.
        """
        print(f"\n{'='*50}")
        print("CHECKING RETRAINING TRIGGERS")
        print(f"{'='*50}")
        
        triggers = self.retrain_pipeline.check_retrain_triggers(
            X_current, y_current, self.current_model_name
        )
        
        print(f"Should retrain: {triggers['should_retrain']}")
        if triggers['reasons']:
            print("Reasons:")
            for reason in triggers['reasons']:
                print(f"  - {reason}")
        
        if triggers['should_retrain']:
            # Set model factory if not set
            if self.retrain_pipeline.model_factory is None:
                model_class = type(self.current_model)
                model_params = self.current_model.get_params() if hasattr(self.current_model, 'get_params') else {}
                self.retrain_pipeline.set_model_factory(lambda: model_class(**model_params))
            
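            # Note: retrain() registers the new version, but this method neither
            # promotes it nor refreshes self.current_model, so predict() keeps
            # using the previous model until that is done explicitly.
            return self.retrain_pipeline.auto_retrain_if_needed(
                X_current, y_current, X_train, y_train, self.current_model_name
            )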
        
        return None
    
    def run_ab_test(self, challenger_model, X_test: pd.DataFrame, 
                    y_test: pd.Series, experiment_name: str = None) -> Dict:
        """
        Run A/B test between current production model and challenger.
        """
        experiment_name = experiment_name or f"ab_test_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        
        self.ab_tester.create_experiment(
            experiment_name,
            model_a=self.current_model,
            model_b=challenger_model
        )
        
        result = self.ab_tester.run_comparison(experiment_name, X_test, y_test)
        self.ab_tester.print_results(result)
        
        return result
    
    def get_dashboard_data(self) -> Dict:
        """
        Get data for monitoring dashboard.
        """
        return {
            'model_name': self.current_model_name,
            'production_version': self.registry.registry['models'].get(
                self.current_model_name, {}).get('production_version'),
            'health': self.monitor.check_health(),
            'recent_alerts': self.monitor.get_alerts_df().tail(10).to_dict('records'),
            'metrics_history': self.monitor.get_metrics_df().to_dict('records'),
            'retrain_history': self.retrain_pipeline.get_retrain_history().to_dict('records')
        }

print("="*60)
print("MLOPS PIPELINE READY!")
print("="*60)

## 11. Demo: Full MLOps Simulation

Let's simulate a complete MLOps workflow with drift and retraining!

In [None]:
# Create synthetic dataset
print("Creating synthetic dataset for MLOps demo...")

X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=15,
    n_redundant=5, random_state=42
)

X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(20)])
y = pd.Series(y)

# Split data
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(f"Train: {len(X_train)}, Val: {len(X_val)}, Test: {len(X_test)}")

In [None]:
# Initialize MLOps Pipeline
mlops = MLOpsPipeline(project_name="fraud_detection_demo")

# Train initial model
initial_model = RandomForestClassifier(n_estimators=100, random_state=42)
version = mlops.train_initial_model(
    model=initial_model,
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val,
    model_name="fraud_classifier"
)

In [None]:
# Monitor on test data (no drift expected)
print("\n" + "="*60)
print("WEEK 1: Monitoring on similar data (no drift)")
print("="*60)

results_week1 = mlops.monitor_and_evaluate(X_test, y_test)

In [None]:
# Simulate data drift
print("\n" + "="*60)
print("WEEK 4: Simulating data drift...")
print("="*60)

# Create drifted data
X_drifted = X_test.copy()
# Shift some features (seeded generator so the simulated drift is reproducible)
rng = np.random.default_rng(42)
X_drifted['feature_0'] = X_drifted['feature_0'] + 2  # Mean shift
X_drifted['feature_1'] = X_drifted['feature_1'] * 1.5  # Scale change (variance x2.25)
X_drifted['feature_2'] = X_drifted['feature_2'] + rng.normal(0, 1, len(X_drifted))  # Add noise

# Monitor on drifted data
results_week4 = mlops.monitor_and_evaluate(X_drifted, y_test)

In [None]:
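# Check if retraining is needed.
# Note: for simplicity, the demo reuses the test labels as "newly labeled production data".
# If retraining triggers, the new model has seen y_test, so any later score on X_test is optimistic.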
new_version = mlops.auto_retrain_check(
    X_current=X_drifted,
    y_current=y_test,
    X_train=pd.concat([X_train, X_drifted]),  # Include new data
    y_train=pd.concat([y_train, y_test])
)

In [None]:
# Run A/B test with a challenger model
print("\n" + "="*60)
print("A/B TEST: RandomForest vs GradientBoosting")
print("="*60)

# Train challenger
challenger = GradientBoostingClassifier(n_estimators=100, random_state=42)
challenger.fit(X_train, y_train)

# Run A/B test
ab_result = mlops.run_ab_test(challenger, X_test, y_test)

In [None]:
# Visualize monitoring results
fig = mlops.monitor.plot_metrics(['accuracy', 'f1'])
plt.show()

In [None]:
# View model registry
print("\nModel Registry:")
print("="*50)
print(f"Registered models: {mlops.registry.list_models()}")
print("\nVersions for fraud_classifier:")
print(mlops.registry.compare_versions('fraud_classifier'))

In [None]:
# View experiment tracking
print("\nExperiment Tracking:")
print("="*50)
print(mlops.tracker.get_runs_df())

In [None]:
# View drift report
if results_week4['drift_results']['features']:
    print("\nDrift Report:")
    print("="*50)
    drift_df = mlops.drift_detector.get_drift_report(results_week4['drift_results'])
    print(drift_df[['feature', 'is_drifted', 'type']].head(10))

## 12. Summary

### What We Built

A complete **MLOps Pipeline** that handles the entire ML lifecycle:

| Component | Functionality |
|-----------|---------------|
| `ExperimentTracker` | Log params, metrics, artifacts (like MLflow) |
| `ModelRegistry` | Version control, stage transitions |
| `FeatureStore` | Feature storage and statistics |
| `DriftDetector` | Drift detection via KS test, Chi-square, PSI, Wasserstein distance |
| `PerformanceMonitor` | Track metrics, generate alerts |
| `RetrainingPipeline` | Auto-retrain on triggers |
| `ABTester` | Statistical model comparison |
| `MLOpsPipeline` | Unified orchestration |
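
To make the `DriftDetector` row concrete, here is a minimal, standalone sketch of two of the listed checks for a single numeric feature: the two-sample KS test (via `scipy.stats.ks_2samp`) and PSI. It is not the `DriftDetector` implementation from this notebook. The helper name `psi`, the 10 quantile bins, and the `eps` clipping value are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two 1-D numeric samples (hypothetical helper)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges come from reference quantiles, so each bin holds
    # roughly the same share of the reference sample.
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted maps every value to a bin index in [0, n_bins - 1],
    # including values outside the reference range.
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current), minlength=n_bins)
    # Clip fractions to avoid log(0) or division by zero for empty bins.
    ref_frac = np.clip(ref_counts / len(reference), eps, None)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=2000)  # stands in for training data
same = rng.normal(0.0, 1.0, size=2000)       # same distribution
shifted = rng.normal(2.0, 1.0, size=2000)    # mean shift, like feature_0 in the demo

for name, sample in [("same", same), ("shifted", shifted)]:
    result = ks_2samp(reference, sample)
    print(f"{name}: KS={result.statistic:.3f}, p={result.pvalue:.3g}, "
          f"PSI={psi(reference, sample):.3f}")
```

A common rule of thumb treats PSI above roughly 0.25 as a significant shift, but the cut-off is a convention and should be tuned per feature.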

### Key Concepts Covered

1. **Model Versioning**: Track and manage model versions
2. **Experiment Tracking**: Log all experiments reproducibly
3. **Drift Detection**: Detect data distribution changes
4. **Performance Monitoring**: Track metrics and alert on degradation
5. **Automated Retraining**: Trigger retraining based on rules
6. **A/B Testing**: Statistically compare model versions
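
For item 6, one common way to compare two classifiers scored on the same test set is McNemar's test, which looks only at the examples where exactly one model is correct. The sketch below illustrates that technique; it is not necessarily what `ABTester.run_comparison` does. The function name `mcnemar_exact` and the synthetic predictions are hypothetical.

```python
import numpy as np
from scipy.stats import binomtest


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test for two classifiers evaluated on the same examples."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    a_right = pred_a == y_true
    b_right = pred_b == y_true
    only_a = int(np.sum(a_right & ~b_right))  # A correct, B wrong
    only_b = int(np.sum(~a_right & b_right))  # B correct, A wrong
    n_discordant = only_a + only_b
    if n_discordant == 0:
        return only_a, only_b, 1.0  # the models never disagree on correctness
    # Under H0 (equal error rates) each discordant example favours A with probability 0.5.
    p_value = binomtest(only_a, n_discordant, 0.5).pvalue
    return only_a, only_b, float(p_value)


rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
# Hypothetical predictions: model A flips about 10% of labels, model B about 20%.
pred_a = np.where(rng.random(1000) < 0.10, 1 - y_true, y_true)
pred_b = np.where(rng.random(1000) < 0.20, 1 - y_true, y_true)

only_a, only_b, p = mcnemar_exact(y_true, pred_a, pred_b)
print(f"A-only correct: {only_a}, B-only correct: {only_b}, p-value: {p:.3g}")
```

Because both models see the same examples, a paired test like this is usually more sensitive than comparing two accuracy numbers as if they came from independent samples.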

### Production Tools Mapping

| Our Component | Production Equivalent |
|---------------|----------------------|
| ExperimentTracker | MLflow, Weights & Biases |
| ModelRegistry | MLflow Model Registry, SageMaker Model Registry |
| FeatureStore | Feast, Tecton |
| DriftDetector | Evidently, WhyLabs |
| PerformanceMonitor | Prometheus + Grafana |
| RetrainingPipeline | Airflow, Kubeflow Pipelines |
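
Whichever orchestrator ends up scheduling the retraining job, the decision rule itself stays small. The sketch below encodes the three triggers listed in the closing summary cell (more than 20% of features drifted, more than 10% degradation from baseline, manual trigger). It is not the `RetrainingPipeline` code from this notebook. The names `RetrainDecision` and `check_retrain_triggers` are hypothetical, and reading "10% from baseline" as a relative drop is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrainDecision:
    should_retrain: bool
    reasons: List[str]


def check_retrain_triggers(n_drifted, n_features, baseline_metric, current_metric,
                           drift_threshold=0.20, degradation_threshold=0.10,
                           manual=False):
    """Return whether to retrain, plus the reasons that fired (hypothetical helper)."""
    reasons = []

    # Trigger 1: share of drifted features exceeds the threshold.
    drift_fraction = n_drifted / n_features if n_features else 0.0
    if drift_fraction > drift_threshold:
        reasons.append(f"data drift: {drift_fraction:.0%} of features drifted")

    # Trigger 2: metric dropped too far. Assumption: the drop is measured
    # relative to the baseline, not in absolute points.
    relative_drop = 0.0
    if baseline_metric > 0:
        relative_drop = (baseline_metric - current_metric) / baseline_metric
    if relative_drop > degradation_threshold:
        reasons.append(f"performance degradation: {relative_drop:.0%} below baseline")

    # Trigger 3: an operator asked for retraining explicitly.
    if manual:
        reasons.append("manual trigger")

    return RetrainDecision(should_retrain=bool(reasons), reasons=reasons)


decision = check_retrain_triggers(n_drifted=5, n_features=20,
                                  baseline_metric=0.90, current_metric=0.88)
print(decision.should_retrain, decision.reasons)
```

Returning the reasons alongside the boolean keeps the retrain history auditable: each retrain can be logged with the trigger that caused it.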

In [None]:
# Final summary
print("="*60)
print("MLOPS PIPELINE - COMPLETE!")
print("="*60)

print("""
Components Built (100% Kaggle Compatible):
──────────────────────────────────────────
1. ExperimentTracker  - Log experiments (like MLflow)
2. ModelRegistry      - Version control for models
3. FeatureStore       - Feature management
4. DriftDetector      - Statistical drift detection
5. PerformanceMonitor - Metrics tracking & alerting
6. RetrainingPipeline - Automated retraining
7. ABTester           - A/B testing framework
8. MLOpsPipeline      - Complete orchestration

Drift Detection Methods:
────────────────────────
• KS Test (Kolmogorov-Smirnov) - Numeric features
• Chi-Square Test - Categorical features
• PSI (Population Stability Index)
• Wasserstein Distance

Retraining Triggers:
────────────────────
• Data drift detected (>20% of features drifted)
• Performance degradation (>10% drop from baseline)
• Manual trigger

This MLOps framework can be used as a foundation for
production ML systems or as a learning tool!
""")