# Day 01: MLOps Fundamentals for Trading Systems

## Week 23 - Production ML

**Learning Objectives:**
- Understand MLOps principles and their application in quantitative trading
- Build automated ML pipelines for trading models
- Implement model versioning, experiment tracking, and a model registry (the patterns MLflow provides)
- Create feature stores for consistent feature engineering
- Set up monitoring for data drift and model performance degradation
- Design automated retraining triggers based on performance metrics

**Key Concepts:**
- ML Lifecycle Management
- Experiment Tracking & Model Registry
- Feature Store Architecture
- Data & Model Monitoring
- CI/CD for ML Systems
- Automated Retraining Pipelines

---

## 1. Import Required Libraries

Essential libraries for MLOps workflows including experiment tracking, pipeline orchestration, and monitoring.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# ML libraries
from sklearn.model_selection import train_test_split, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    classification_report, confusion_matrix, roc_auc_score
)

# Serialization and utilities
import joblib
import json
import hashlib
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
from dataclasses import dataclass, field
from abc import ABC, abstractmethod
import logging
from collections import defaultdict
import uuid

# Statistical tests for drift detection
from scipy import stats

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("✅ All libraries imported successfully!")
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. MLOps Architecture Overview

**MLOps** bridges the gap between ML development and production deployment. In trading systems, this is critical because:

1. **Model Staleness**: Markets evolve, so models trained on historical data degrade over time
2. **Data Quality**: Live data can have quality issues never seen in backtests
3. **Regulatory Compliance**: Model decisions need an audit trail
4. **Risk Management**: Undetected model degradation can lead to significant losses

```
┌──────────────────────────────────────────────────────────────────────┐
│                    MLOps Architecture for Trading                    │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐           │
│  │  Data    │──▶│ Feature  │──▶│  Model   │──▶│  Model   │           │
│  │  Source  │   │  Store   │   │ Training │   │ Registry │           │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘           │
│       │              │              │              │                 │
│       ▼              ▼              ▼              ▼                 │
│  ┌───────────────────────────────────────────────────────┐           │
│  │             Experiment Tracking (MLflow)              │           │
│  └───────────────────────────────────────────────────────┘           │
│                              │                                       │
│         ┌────────────────────┼────────────────────┐                  │
│         ▼                    ▼                    ▼                  │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐           │
│  │    Model    │      │  Monitoring │      │  Retraining │           │
│  │   Serving   │◀─────│   & Drift   │─────▶│   Trigger   │           │
│  └─────────────┘      └─────────────┘      └─────────────┘           │
│         │                    ▲                                       │
│         ▼                    │                                       │
│  ┌─────────────┐      ┌─────────────┐                                │
│  │   Trading   │─────▶│  Prediction │                                │
│  │   Signals   │      │     Logs    │                                │
│  └─────────────┘      └─────────────┘                                │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
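The feedback loop at the bottom of the diagram (prediction logs feed monitoring, which can fire a retraining trigger) can be reduced to a small decision function. The sketch below is illustrative only: the `should_retrain` helper, the choice of SciPy's two-sample Kolmogorov-Smirnov test (`scipy.stats.ks_2samp`) as the drift detector, and the thresholds `drift_pvalue=0.01` and `min_accuracy=0.52` are assumptions, not values taken from this notebook.

```python
import numpy as np
from scipy import stats


def should_retrain(
    reference: np.ndarray,
    current: np.ndarray,
    recent_accuracy: float,
    drift_pvalue: float = 0.01,   # hypothetical significance threshold
    min_accuracy: float = 0.52,   # hypothetical floor for a direction classifier
) -> dict:
    """Hypothetical trigger: retrain on feature drift OR degraded accuracy."""
    # Two-sample KS test: a small p-value suggests the live feature
    # distribution differs from the training (reference) distribution.
    ks_stat, p_value = stats.ks_2samp(reference, current)
    drift = p_value < drift_pvalue
    degraded = recent_accuracy < min_accuracy
    return {
        "ks_statistic": float(ks_stat),
        "p_value": float(p_value),
        "drift_detected": bool(drift),
        "performance_degraded": bool(degraded),
        "retrain": bool(drift or degraded),
    }


rng = np.random.default_rng(0)
train_returns = rng.normal(0.0, 0.02, 2000)   # reference window (toy data)
live_returns = rng.normal(0.0, 0.04, 500)     # simulated volatility regime shift
print(should_retrain(train_returns, live_returns, recent_accuracy=0.55))
```

With thousands of samples the KS test can flag even small, practically harmless shifts, so a real trigger may also need an effect-size floor (for example a minimum KS statistic) or a cooldown between retrains.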

## 3. Generate Synthetic Trading Data

Create synthetic hourly OHLCV data with common technical indicators to demonstrate MLOps concepts.
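The price path in the generator is a log-normal random walk: per-step log returns are drawn from a normal distribution and cumulatively summed, which is an exact discretization of geometric Brownian motion when the per-step mean is read as the drift of the log price. With the defaults μ = 0.0001 and σ = 0.02 per hourly step:

$$
r_t \sim \mathcal{N}(\mu, \sigma^2), \qquad
P_t = 100 \exp\Big(\sum_{i=1}^{t} r_i\Big), \qquad
\mathbb{E}[\ln P_t] = \ln 100 + \mu t, \qquad
\operatorname{sd}(\ln P_t) = \sigma \sqrt{t}
$$

At t = 5000 steps, μt = 0.5 and σ√t = 0.02 × √5000 ≈ 1.41, so a one-standard-deviation move in log price by the end of the sample is a factor of about e^1.41 ≈ 4.1 away from the starting price.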

In [None]:
def generate_trading_data(
    n_samples: int = 5000,
    start_date: str = '2020-01-01',
    seed: int = 42
) -> pd.DataFrame:
    """
    Generate synthetic trading data with technical features.
    
    Features include price-based features, volume, volatility,
    and momentum indicators commonly used in trading models.
    """
    np.random.seed(seed)
    
    # Generate date index
    dates = pd.date_range(start=start_date, periods=n_samples, freq='h')  # uppercase 'H' is deprecated in pandas >= 2.2
    
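    # Base price simulation (discrete geometric Brownian motion)
    returns = np.random.normal(0.0001, 0.02, n_samples)
    price = 100 * np.exp(np.cumsum(returns))
    
    # Generate OHLCV data. Open is a small relative offset from close, and
    # high/low bracket both open and close so every bar is OHLC-consistent
    # (otherwise the price_consistency validation check would flag violations).
    open_price = price * (1 + np.random.normal(0, 0.005, n_samples))
    high = np.maximum(open_price, price) * (1 + np.abs(np.random.normal(0, 0.01, n_samples)))
    low = np.minimum(open_price, price) * (1 - np.abs(np.random.normal(0, 0.01, n_samples)))
    volume = np.random.lognormal(15, 1, n_samples)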
    
    df = pd.DataFrame({
        'timestamp': dates,
        'open': open_price,
        'high': high,
        'low': low,
        'close': price,
        'volume': volume
    })
    
    # Technical indicators
    df['returns'] = df['close'].pct_change()
    df['log_returns'] = np.log(df['close'] / df['close'].shift(1))
    
    # Moving averages
    for window in [5, 10, 20, 50]:
        df[f'sma_{window}'] = df['close'].rolling(window).mean()
        df[f'ema_{window}'] = df['close'].ewm(span=window).mean()
    
    # Volatility features
    df['volatility_20'] = df['returns'].rolling(20).std()
    df['volatility_50'] = df['returns'].rolling(50).std()
    
    # Momentum features
    df['momentum_5'] = df['close'] / df['close'].shift(5) - 1
    df['momentum_10'] = df['close'] / df['close'].shift(10) - 1
    df['momentum_20'] = df['close'] / df['close'].shift(20) - 1
    
    # RSI
    delta = df['close'].diff()
    gain = (delta.where(delta > 0, 0)).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / loss
    df['rsi'] = 100 - (100 / (1 + rs))
    
    # MACD
    exp1 = df['close'].ewm(span=12).mean()
    exp2 = df['close'].ewm(span=26).mean()
    df['macd'] = exp1 - exp2
    df['macd_signal'] = df['macd'].ewm(span=9).mean()
    
    # Bollinger Bands
    df['bb_middle'] = df['close'].rolling(20).mean()
    df['bb_std'] = df['close'].rolling(20).std()
    df['bb_upper'] = df['bb_middle'] + 2 * df['bb_std']
    df['bb_lower'] = df['bb_middle'] - 2 * df['bb_std']
    df['bb_position'] = (df['close'] - df['bb_lower']) / (df['bb_upper'] - df['bb_lower'])
    
    # Volume features
    df['volume_sma_20'] = df['volume'].rolling(20).mean()
    df['volume_ratio'] = df['volume'] / df['volume_sma_20']
    
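    # Target: next-period direction (1 = up, 0 = down). The last row has no
    # next close, so mark it NaN rather than silently labeling it 0.
    next_close = df['close'].shift(-1)
    df['target'] = (next_close > df['close']).astype(float)
    df.loc[next_close.isna(), 'target'] = np.nan
    
    # Drop NaN rows (indicator warm-up period and the unlabeled last row)
    df = df.dropna().reset_index(drop=True)
    df['target'] = df['target'].astype(int)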
    
    return df

# Generate data
df = generate_trading_data(n_samples=5000)
print(f"✅ Generated {len(df)} samples")
print(f"📊 Features: {df.shape[1]} columns")
print(f"📅 Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"\n🎯 Target distribution:")
print(df['target'].value_counts(normalize=True))
df.head()

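Because `target` looks one bar ahead and many features are rolling windows, a shuffled split (such as `train_test_split` with its default `shuffle=True`) would let the model train on bars that come after the bars it is tested on. The sketch below shows the chronological alternative with scikit-learn's `TimeSeriesSplit`, which the notebook already imports; the 100-row toy array and `n_splits=4` are arbitrary illustration choices, not settings from this notebook.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy stand-in for a time-ordered feature matrix (row i = bar i).
X = np.arange(100).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
folds = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Every training index precedes every test index: no look-ahead.
    folds.append((train_idx, test_idx))
    print(f"fold {fold}: train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

Here each test window holds 20 rows and training always starts at row 0, growing by 20 rows per fold. Passing `gap=1` would additionally drop the last training row before each test window, so no training label (which looks one bar ahead) reads a close from inside the test window.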
## 4. Data Validation and Quality Checks

Robust data validation is critical in production trading systems. Bad data can lead to incorrect signals and significant losses.

In [None]:
@dataclass
class ValidationResult:
    """Result of a data validation check."""
    check_name: str
    passed: bool
    message: str
    severity: str = "ERROR"  # ERROR, WARNING, INFO
    details: Dict = field(default_factory=dict)


class DataValidator:
    """
    Comprehensive data validation for trading data.
    
    Validates:
    - Schema (required columns, data types)
    - Data quality (missing values, duplicates)
    - Statistical properties (outliers, distribution)
    - Temporal consistency (gaps, ordering)
    """
    
    def __init__(self, config: Optional[Dict] = None):
        self.config = config or self._default_config()
        self.validation_results: List[ValidationResult] = []
    
    def _default_config(self) -> Dict:
        return {
            'required_columns': ['timestamp', 'open', 'high', 'low', 'close', 'volume'],
            'numeric_columns': ['open', 'high', 'low', 'close', 'volume'],
            'max_missing_pct': 0.01,  # 1%
            'max_outlier_zscore': 5.0,
            'min_samples': 100,
            'max_gap_hours': 24,
        }
    
    def validate(self, df: pd.DataFrame) -> Tuple[bool, List[ValidationResult]]:
        """Run all validation checks."""
        self.validation_results = []
        
        # Schema validation
        self._check_required_columns(df)
        self._check_data_types(df)
        
        # Data quality
        self._check_missing_values(df)
        self._check_duplicates(df)
        self._check_sample_size(df)
        
        # Statistical validation
        self._check_price_consistency(df)
        self._check_outliers(df)
        self._check_value_ranges(df)
        
        # Temporal validation
        self._check_timestamp_ordering(df)
        self._check_data_gaps(df)
        
        # Calculate overall pass/fail
        errors = [r for r in self.validation_results if not r.passed and r.severity == "ERROR"]
        overall_passed = len(errors) == 0
        
        return overall_passed, self.validation_results
    
    def _check_required_columns(self, df: pd.DataFrame):
        """Check that all required columns are present."""
        missing = set(self.config['required_columns']) - set(df.columns)
        passed = len(missing) == 0
        self.validation_results.append(ValidationResult(
            check_name="required_columns",
            passed=passed,
            message=f"Missing columns: {missing}" if not passed else "All required columns present",
            severity="ERROR",
            details={'missing_columns': list(missing)}
        ))
    
    def _check_data_types(self, df: pd.DataFrame):
        """Validate data types of numeric columns."""
        invalid_types = {}
        for col in self.config['numeric_columns']:
            if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
                invalid_types[col] = str(df[col].dtype)
        
        passed = len(invalid_types) == 0
        self.validation_results.append(ValidationResult(
            check_name="data_types",
            passed=passed,
            message=f"Invalid types: {invalid_types}" if not passed else "All data types valid",
            severity="ERROR",
            details={'invalid_types': invalid_types}
        ))
    
    def _check_missing_values(self, df: pd.DataFrame):
        """Check for excessive missing values."""
        missing_pct = df.isnull().sum() / len(df)
        high_missing = missing_pct[missing_pct > self.config['max_missing_pct']]
        
        passed = len(high_missing) == 0
        self.validation_results.append(ValidationResult(
            check_name="missing_values",
            passed=passed,
            message=f"Columns with >{self.config['max_missing_pct']:.0%} missing: {list(high_missing.index)}" if not passed else "Missing values within threshold",
            severity="WARNING",
            details={'missing_percentages': high_missing.to_dict()}
        ))
    
    def _check_duplicates(self, df: pd.DataFrame):
        """Check for duplicate timestamps."""
        if 'timestamp' in df.columns:
            n_duplicates = df['timestamp'].duplicated().sum()
            passed = n_duplicates == 0
            self.validation_results.append(ValidationResult(
                check_name="duplicates",
                passed=passed,
                message=f"Found {n_duplicates} duplicate timestamps" if not passed else "No duplicates found",
                severity="WARNING",
                details={'n_duplicates': int(n_duplicates)}
            ))
    
    def _check_sample_size(self, df: pd.DataFrame):
        """Ensure minimum sample size."""
        passed = len(df) >= self.config['min_samples']
        self.validation_results.append(ValidationResult(
            check_name="sample_size",
            passed=passed,
            message=f"Only {len(df)} samples (min: {self.config['min_samples']})" if not passed else f"Sample size OK ({len(df)})",
            severity="ERROR",
            details={'n_samples': len(df)}
        ))
    
    def _check_price_consistency(self, df: pd.DataFrame):
        """Check OHLC price consistency (high >= low, etc.)."""
        if all(col in df.columns for col in ['open', 'high', 'low', 'close']):
            violations = (
                (df['high'] < df['low']) | 
                (df['high'] < df['open']) | 
                (df['high'] < df['close']) |
                (df['low'] > df['open']) | 
                (df['low'] > df['close'])
            ).sum()
            
            passed = violations == 0
            self.validation_results.append(ValidationResult(
                check_name="price_consistency",
                passed=passed,
                message=f"Found {violations} OHLC inconsistencies" if not passed else "OHLC data consistent",
                severity="ERROR",
                details={'n_violations': int(violations)}
            ))
    
    def _check_outliers(self, df: pd.DataFrame):
        """Check for extreme outliers using z-score."""
        outlier_cols = {}
        for col in self.config['numeric_columns']:
            if col in df.columns:
                z_scores = np.abs(stats.zscore(df[col].dropna()))
                n_outliers = (z_scores > self.config['max_outlier_zscore']).sum()
                if n_outliers > 0:
                    outlier_cols[col] = int(n_outliers)
        
        passed = len(outlier_cols) == 0
        self.validation_results.append(ValidationResult(
            check_name="outliers",
            passed=passed,
            message=f"Outliers detected: {outlier_cols}" if not passed else "No extreme outliers",
            severity="WARNING",
            details={'outlier_counts': outlier_cols}
        ))
    
    def _check_value_ranges(self, df: pd.DataFrame):
        """Check for non-positive prices or negative volumes."""
        issues = {}
        for col in ['open', 'high', 'low', 'close']:
            if col in df.columns:
                n_negative = (df[col] <= 0).sum()
                if n_negative > 0:
                    issues[col] = int(n_negative)
        
        if 'volume' in df.columns:
            n_negative = (df['volume'] < 0).sum()
            if n_negative > 0:
                issues['volume'] = int(n_negative)
        
        passed = len(issues) == 0
        self.validation_results.append(ValidationResult(
            check_name="value_ranges",
            passed=passed,
            message=f"Invalid values: {issues}" if not passed else "All values in valid range",
            severity="ERROR",
            details={'invalid_counts': issues}
        ))
    
    def _check_timestamp_ordering(self, df: pd.DataFrame):
        """Check that timestamps are in order."""
        if 'timestamp' in df.columns:
            is_sorted = df['timestamp'].is_monotonic_increasing
            self.validation_results.append(ValidationResult(
                check_name="timestamp_ordering",
                passed=is_sorted,
                message="Timestamps not in chronological order" if not is_sorted else "Timestamps properly ordered",
                severity="ERROR"
            ))
    
    def _check_data_gaps(self, df: pd.DataFrame):
        """Check for large gaps in time series."""
        if 'timestamp' in df.columns:
            gaps = df['timestamp'].diff().dropna()  # the first diff is always NaT
            max_gap_hours = gaps.max().total_seconds() / 3600 if len(gaps) > 0 else 0.0
            
            passed = max_gap_hours <= self.config['max_gap_hours']
            self.validation_results.append(ValidationResult(
                check_name="data_gaps",
                passed=passed,
                message=f"Max gap: {max_gap_hours:.1f} hours" if not passed else f"Data gaps within threshold ({max_gap_hours:.1f}h)",
                severity="WARNING",
                details={'max_gap_hours': max_gap_hours}
            ))
    
    def summary(self) -> pd.DataFrame:
        """Get validation summary as DataFrame."""
        return pd.DataFrame([
            {
                'Check': r.check_name,
                'Passed': '✅' if r.passed else '❌',
                'Severity': r.severity,
                'Message': r.message
            }
            for r in self.validation_results
        ])


# Run validation on our data
validator = DataValidator()
passed, results = validator.validate(df)

print(f"{'✅ No ERROR-level checks failed' if passed else '❌ At least one ERROR-level check failed'} (WARNING-level results do not block)\n")
validator.summary()

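The z-score outlier check assumes roughly normal data, but volume in this notebook is log-normal and right-skewed, so a |z| > 5 rule may flag legitimate high-volume bars. It can also miss real outliers in small samples: with the population standard deviation that `scipy.stats.zscore` uses by default (`ddof=0`), no point among n values can reach |z| above (n - 1)/√n, which is about 2.27 for n = 7. A common robust alternative, sketched here as an assumption rather than part of `DataValidator`, is the Iglewicz-Hoaglin modified z-score, which swaps the mean and standard deviation for the median and the median absolute deviation (MAD), with 3.5 as the usual cutoff. The helper name `modified_zscore_outliers` is hypothetical.

```python
import numpy as np


def modified_zscore_outliers(x: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Boolean outlier mask using the median/MAD-based modified z-score."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # Degenerate case: at least half the values equal the median.
        return np.zeros_like(x, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold


data = np.array([10.0, 11.0, 9.5, 10.2, 10.8, 9.9, 250.0])  # toy series
print(modified_zscore_outliers(data))
```

Applied to the toy series, only the last value is flagged, whereas a plain |z| > 5 rule cannot flag anything in a 7-point sample.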
## 5. Feature Store Implementation

A **Feature Store** is a centralized repository for storing, managing, and serving ML features. Key benefits:
- **Consistency**: Same features for training and inference
- **Reusability**: Features can be shared across models
- **Versioning**: Track feature changes over time
- **Freshness**: Ensure features are up-to-date
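In practice, the consistency benefit also requires *point-in-time* correctness: when labels or other event rows are joined to features, each row should only see feature values that existed at or before its own timestamp. A minimal sketch with pandas `merge_asof`, using made-up timestamps and a hypothetical single-column `rsi` feature table (this notebook's `FeatureStore` does not implement such joins):

```python
import pandas as pd

# Feature snapshots, stamped at the close of each hour (toy values).
features = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"]),
    "rsi": [45.0, 52.0, 61.0],
})

# Events (e.g. label or order times) that do not align with the feature grid.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 00:30", "2020-01-01 01:59", "2020-01-01 02:00"]),
    "event_id": [1, 2, 3],
})

# direction="backward" takes the latest feature row with timestamp <= event time,
# so no event can see a feature value from its future. Both frames must be sorted
# by the "on" key.
joined = pd.merge_asof(events, features, on="timestamp", direction="backward")
print(joined)
```

Passing `allow_exact_matches=False` would exclude a feature row stamped exactly at the event time, which is the safer choice if a bar's features are only published some time after its close.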

In [None]:
@dataclass
class FeatureMetadata:
    """Metadata for a feature set."""
    name: str
    version: str
    created_at: datetime
    features: List[str]
    description: str
    data_hash: str
    n_samples: int
    statistics: Dict[str, Dict] = field(default_factory=dict)


class FeatureStore:
    """
    Simple feature store implementation for trading features.
    
    In production, you'd use tools like:
    - Feast (open source)
    - Tecton (managed)
    - Amazon SageMaker Feature Store
    - Databricks Feature Store
    """
    
    def __init__(self, storage_path: str = "./feature_store"):
        self.storage_path = Path(storage_path)
        self.storage_path.mkdir(parents=True, exist_ok=True)
        self.metadata_file = self.storage_path / "metadata.json"
        self.metadata: Dict[str, Dict] = self._load_metadata()
    
    def _load_metadata(self) -> Dict:
        """Load feature store metadata."""
        if self.metadata_file.exists():
            with open(self.metadata_file, 'r') as f:
                return json.load(f)
        return {}
    
    def _save_metadata(self):
        """Persist metadata to disk."""
        with open(self.metadata_file, 'w') as f:
            json.dump(self.metadata, f, indent=2, default=str)
    
    def _compute_hash(self, df: pd.DataFrame) -> str:
        """Compute hash of DataFrame for versioning."""
        return hashlib.md5(
            pd.util.hash_pandas_object(df).values.tobytes()
        ).hexdigest()[:12]
    
    def _compute_statistics(self, df: pd.DataFrame, features: List[str]) -> Dict:
        """Compute feature statistics for monitoring."""
        stats = {}
        for feature in features:
            if feature in df.columns and pd.api.types.is_numeric_dtype(df[feature]):
                stats[feature] = {
                    'mean': float(df[feature].mean()),
                    'std': float(df[feature].std()),
                    'min': float(df[feature].min()),
                    'max': float(df[feature].max()),
                    'median': float(df[feature].median()),
                    'q25': float(df[feature].quantile(0.25)),
                    'q75': float(df[feature].quantile(0.75)),
                }
        return stats
    
    def register_feature_set(
        self,
        name: str,
        df: pd.DataFrame,
        features: List[str],
        description: str = "",
        version: Optional[str] = None
    ) -> FeatureMetadata:
        """
        Register a new feature set in the store.
        
        Args:
            name: Name of the feature set
            df: DataFrame containing features
            features: List of feature column names
            description: Description of the feature set
            version: Optional version string (auto-generated if not provided)
        
        Returns:
            FeatureMetadata object
        """
        # Compute version from data hash if not provided
        data_hash = self._compute_hash(df[features])
        if version is None:
            version = f"v_{data_hash}"
        
        # Compute statistics
        statistics = self._compute_statistics(df, features)
        
        # Create metadata
        metadata = FeatureMetadata(
            name=name,
            version=version,
            created_at=datetime.now(),
            features=features,
            description=description,
            data_hash=data_hash,
            n_samples=len(df),
            statistics=statistics
        )
        
        # Save feature data
        feature_path = self.storage_path / name / version
        feature_path.mkdir(parents=True, exist_ok=True)
        df[features].to_parquet(feature_path / "features.parquet")
        
        # Update metadata
        if name not in self.metadata:
            self.metadata[name] = {'versions': {}}
        
        self.metadata[name]['versions'][version] = {
            'created_at': metadata.created_at.isoformat(),
            'features': features,
            'description': description,
            'data_hash': data_hash,
            'n_samples': len(df),
            'statistics': statistics
        }
        self.metadata[name]['latest'] = version
        self._save_metadata()
        
        logger.info(f"Registered feature set '{name}' version '{version}'")
        return metadata
    
    def get_feature_set(
        self,
        name: str,
        version: Optional[str] = None
    ) -> Tuple[pd.DataFrame, FeatureMetadata]:
        """
        Retrieve a feature set from the store.
        
        Args:
            name: Name of the feature set
            version: Specific version (defaults to latest)
        
        Returns:
            Tuple of (DataFrame, FeatureMetadata)
        """
        if name not in self.metadata:
            raise ValueError(f"Feature set '{name}' not found")
        
        if version is None:
            version = self.metadata[name]['latest']
        
        if version not in self.metadata[name]['versions']:
            raise ValueError(f"Version '{version}' not found for '{name}'")
        
        # Load feature data
        feature_path = self.storage_path / name / version / "features.parquet"
        df = pd.read_parquet(feature_path)
        
        # Create metadata object
        meta_dict = self.metadata[name]['versions'][version]
        metadata = FeatureMetadata(
            name=name,
            version=version,
            created_at=datetime.fromisoformat(meta_dict['created_at']),
            features=meta_dict['features'],
            description=meta_dict['description'],
            data_hash=meta_dict['data_hash'],
            n_samples=meta_dict['n_samples'],
            statistics=meta_dict.get('statistics', {})
        )
        
        return df, metadata
    
    def list_feature_sets(self) -> pd.DataFrame:
        """List all registered feature sets."""
        records = []
        for name, info in self.metadata.items():
            latest = info.get('latest', '')
            if latest in info.get('versions', {}):
                version_info = info['versions'][latest]
                records.append({
                    'Name': name,
                    'Latest Version': latest,
                    'Features': len(version_info['features']),
                    'Samples': version_info['n_samples'],
                    'Created': version_info['created_at'][:10]
                })
        return pd.DataFrame(records)
    
    def get_feature_statistics(self, name: str, version: Optional[str] = None) -> pd.DataFrame:
        """Get statistics for a feature set."""
        if name not in self.metadata:
            raise ValueError(f"Feature set '{name}' not found")
        
        if version is None:
            version = self.metadata[name]['latest']
        
        stats = self.metadata[name]['versions'][version].get('statistics', {})
        return pd.DataFrame(stats).T


# Initialize feature store
feature_store = FeatureStore("./mlops_demo/feature_store")

# Define feature groups
price_features = ['returns', 'log_returns', 'momentum_5', 'momentum_10', 'momentum_20']
technical_features = ['rsi', 'macd', 'macd_signal', 'bb_position']
volatility_features = ['volatility_20', 'volatility_50']
volume_features = ['volume_ratio']

all_features = price_features + technical_features + volatility_features + volume_features

# Register feature set
metadata = feature_store.register_feature_set(
    name="trading_signals_v1",
    df=df,
    features=all_features,
    description="Core trading signal features including price momentum, technicals, and volatility"
)

print(f"✅ Registered feature set: {metadata.name}")
print(f"📌 Version: {metadata.version}")
print(f"📊 Features: {len(metadata.features)}")
print(f"📈 Samples: {metadata.n_samples}")
print(f"\n📋 Feature Statistics:")
feature_store.get_feature_statistics("trading_signals_v1")

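`register_feature_set` derives its default version from `_compute_hash`, an MD5 over pandas row hashes, so re-registering identical data reproduces the same version while any edited value yields a new one. The standalone sketch below mirrors that idea outside the class; `content_version` is a hypothetical helper written for illustration. Because `pd.util.hash_pandas_object` includes the index by default, the same values under a different index also produce a new version.

```python
import hashlib

import pandas as pd


def content_version(df: pd.DataFrame, length: int = 12) -> str:
    """Hypothetical helper mirroring FeatureStore._compute_hash plus the 'v_' prefix."""
    row_hashes = pd.util.hash_pandas_object(df).values  # one uint64 per row, index included
    return "v_" + hashlib.md5(row_hashes.tobytes()).hexdigest()[:length]


base = pd.DataFrame({"rsi": [45.0, 52.0, 61.0], "macd": [0.1, -0.2, 0.05]})  # toy features
v1 = content_version(base)
v2 = content_version(base.copy())     # same data -> same version

changed = base.copy()
changed.loc[1, "rsi"] = 52.5          # one value edited -> new version
v3 = content_version(changed)

reindexed = base.copy()
reindexed.index = [10, 11, 12]        # same values, different index -> new version
v4 = content_version(reindexed)

print(v1, v2, v3, v4)
```

Truncating to 12 hex characters keeps version strings readable; the collision risk is negligible for a handful of feature-set versions but grows with the number of versions stored.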
## 6. Experiment Tracking & Model Registry

Experiment tracking is essential for:
- Reproducibility of results
- Comparing model performance
- Auditing model decisions (regulatory compliance)
- Managing model lifecycle

We'll build a lightweight tracker (in production, use **MLflow**, **Weights & Biases**, or **Neptune**).
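In the tracker that follows, `start_run()` and `end_run()` are separate calls, so an exception raised between them would leave `current_run` set and the run never written to `experiments.json`. MLflow sidesteps this by allowing `mlflow.start_run()` to be used in a `with` block. A minimal sketch of the same pattern, using a hypothetical in-memory `MiniTracker` rather than the class below, records every run and marks it `FAILED` when the body raises:

```python
from contextlib import contextmanager
from datetime import datetime
from typing import Any, Dict, Iterator, List
import uuid


class MiniTracker:
    """Hypothetical in-memory tracker used only to illustrate the pattern."""

    def __init__(self) -> None:
        self.runs: List[Dict[str, Any]] = []

    @contextmanager
    def run(self, experiment_name: str, parameters: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        record: Dict[str, Any] = {
            "run_id": uuid.uuid4().hex[:8],
            "experiment_name": experiment_name,
            "parameters": dict(parameters),
            "metrics": {},
            "start_time": datetime.now(),
        }
        try:
            yield record                  # caller logs metrics into record["metrics"]
            record["status"] = "COMPLETED"
        except Exception:
            record["status"] = "FAILED"
            raise                         # re-raise so the failure is still visible
        finally:
            record["end_time"] = datetime.now()
            self.runs.append(record)      # every run is recorded, success or not


tracker = MiniTracker()
with tracker.run("direction_clf", {"n_estimators": 100}) as run:
    run["metrics"]["accuracy"] = 0.53     # toy value

try:
    with tracker.run("direction_clf", {"n_estimators": 500}) as run:
        raise ValueError("training data failed validation")
except ValueError:
    pass

print([(r["parameters"]["n_estimators"], r["status"]) for r in tracker.runs])
```

Failed runs stay in the history, which keeps the audit trail complete and makes it possible to see which parameter settings never finished.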

In [None]:
@dataclass
class ExperimentRun:
    """Represents a single experiment run."""
    run_id: str
    experiment_name: str
    model_name: str
    parameters: Dict[str, Any]
    metrics: Dict[str, float]
    artifacts: Dict[str, str]
    tags: Dict[str, str]
    start_time: datetime
    end_time: Optional[datetime] = None
    status: str = "RUNNING"


class ExperimentTracker:
    """
    Lightweight experiment tracker for ML experiments.
    
    Tracks:
    - Model parameters
    - Training metrics
    - Model artifacts
    - Metadata and tags
    """
    
    def __init__(self, storage_path: str = "./experiments"):
        self.storage_path = Path(storage_path)
        self.storage_path.mkdir(parents=True, exist_ok=True)
        self.experiments_file = self.storage_path / "experiments.json"
        self.experiments: Dict[str, List[Dict]] = self._load_experiments()
        self.current_run: Optional[ExperimentRun] = None
    
    def _load_experiments(self) -> Dict:
        """Load experiments from disk."""
        if self.experiments_file.exists():
            with open(self.experiments_file, 'r') as f:
                return json.load(f)
        return {}
    
    def _save_experiments(self):
        """Persist experiments to disk."""
        with open(self.experiments_file, 'w') as f:
            json.dump(self.experiments, f, indent=2, default=str)
    
    def start_run(
        self,
        experiment_name: str,
        model_name: str,
        parameters: Dict[str, Any],
        tags: Optional[Dict[str, str]] = None
    ) -> str:
        """Start a new experiment run."""
        run_id = str(uuid.uuid4())[:8]
        
        self.current_run = ExperimentRun(
            run_id=run_id,
            experiment_name=experiment_name,
            model_name=model_name,
            parameters=parameters,
            metrics={},
            artifacts={},
            tags=tags or {},
            start_time=datetime.now()
        )
        
        logger.info(f"Started run {run_id} for experiment '{experiment_name}'")
        return run_id
    
    def log_metric(self, name: str, value: float):
        """Log a metric for the current run."""
        if self.current_run is None:
            raise RuntimeError("No active run. Call start_run() first.")
        self.current_run.metrics[name] = value
    
    def log_metrics(self, metrics: Dict[str, float]):
        """Log multiple metrics."""
        for name, value in metrics.items():
            self.log_metric(name, value)
    
    def log_artifact(self, name: str, artifact: Any, artifact_type: str = "model"):
        """Save an artifact (model, plot, data)."""
        if self.current_run is None:
            raise RuntimeError("No active run. Call start_run() first.")
        
        # Create artifact directory
        artifact_dir = self.storage_path / self.current_run.experiment_name / self.current_run.run_id
        artifact_dir.mkdir(parents=True, exist_ok=True)
        
        # Save artifact based on type
        if artifact_type == "model":
            artifact_path = artifact_dir / f"{name}.joblib"
            joblib.dump(artifact, artifact_path)
        elif artifact_type == "dataframe":
            artifact_path = artifact_dir / f"{name}.parquet"
            artifact.to_parquet(artifact_path)
        elif artifact_type == "figure":
            artifact_path = artifact_dir / f"{name}.png"
            artifact.savefig(artifact_path, dpi=150, bbox_inches='tight')
        else:
            artifact_path = artifact_dir / f"{name}.json"
            with open(artifact_path, 'w') as f:
                json.dump(artifact, f, indent=2, default=str)
        
        self.current_run.artifacts[name] = str(artifact_path)
    
    def end_run(self, status: str = "COMPLETED"):
        """End the current run."""
        if self.current_run is None:
            raise RuntimeError("No active run to end.")
        
        self.current_run.end_time = datetime.now()
        self.current_run.status = status
        
        # Save to experiments
        exp_name = self.current_run.experiment_name
        if exp_name not in self.experiments:
            self.experiments[exp_name] = []
        
        self.experiments[exp_name].append({
            'run_id': self.current_run.run_id,
            'model_name': self.current_run.model_name,
            'parameters': self.current_run.parameters,
            'metrics': self.current_run.metrics,
            'artifacts': self.current_run.artifacts,
            'tags': self.current_run.tags,
            'start_time': self.current_run.start_time.isoformat(),
            'end_time': self.current_run.end_time.isoformat(),
            'status': self.current_run.status,
            'duration_seconds': (self.current_run.end_time - self.current_run.start_time).total_seconds()
        })
        
        self._save_experiments()
        logger.info(f"Ended run {self.current_run.run_id} with status {status}")
        
        run_id = self.current_run.run_id
        self.current_run = None
        return run_id
    
    def get_experiment_runs(self, experiment_name: str) -> pd.DataFrame:
        """Get all runs for an experiment."""
        if experiment_name not in self.experiments:
            return pd.DataFrame()
        
        runs = self.experiments[experiment_name]
        records = []
        for run in runs:
            record = {
                'run_id': run['run_id'],
                'model_name': run['model_name'],
                'status': run['status'],
                'duration_s': run.get('duration_seconds', 0),
                **{f"param_{k}": v for k, v in run['parameters'].items()},
                **{f"metric_{k}": v for k, v in run['metrics'].items()}
            }
            records.append(record)
        
        return pd.DataFrame(records)
    
    def get_best_run(self, experiment_name: str, metric: str, minimize: bool = False) -> Dict:
        """Get the best run based on a metric."""
        if experiment_name not in self.experiments:
            raise ValueError(f"Experiment '{experiment_name}' not found")
        
        runs = self.experiments[experiment_name]
        valid_runs = [r for r in runs if metric in r['metrics']]
        
        if not valid_runs:
            raise ValueError(f"No runs with metric '{metric}'")
        
        key = lambda r: r['metrics'][metric]
        best_run = min(valid_runs, key=key) if minimize else max(valid_runs, key=key)
        return best_run
    
    def load_artifact(self, experiment_name: str, run_id: str, artifact_name: str):
        """Load an artifact from a run."""
        artifact_dir = self.storage_path / experiment_name / run_id
        
        # Try different extensions
        for ext in ['.joblib', '.parquet', '.json', '.png']:
            artifact_path = artifact_dir / f"{artifact_name}{ext}"
            if artifact_path.exists():
                if ext == '.joblib':
                    return joblib.load(artifact_path)
                elif ext == '.parquet':
                    return pd.read_parquet(artifact_path)
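                elif ext == '.json':
                    with open(artifact_path, 'r') as f:
                        return json.load(f)
                else:
                    # '.png' figures come back as an image array
                    return plt.imread(artifact_path)
        
        raise FileNotFoundError(f"Artifact '{artifact_name}' not found in {artifact_dir}")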


# Initialize tracker
tracker = ExperimentTracker("./mlops_demo/experiments")
print("✅ Experiment tracker initialized")
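`log_artifact` records only where each artifact was written. If you also store a content digest next to that path, a later load can detect that the file was overwritten or corrupted. The sketch below uses only the standard library. `file_sha256` and `verify_artifact` are hypothetical helpers, not part of `ExperimentTracker`.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_digest: str) -> bool:
    """True if the file on disk still matches the digest recorded at logging time."""
    return file_sha256(path) == expected_digest


# Usage: record the digest when the artifact is logged, check it before loading
with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "model.joblib"
    artifact_path.write_bytes(b"serialized model bytes")
    recorded = file_sha256(artifact_path)

    ok_before = verify_artifact(artifact_path, recorded)
    artifact_path.write_bytes(b"tampered bytes")  # simulate an accidental overwrite
    ok_after = verify_artifact(artifact_path, recorded)

print(ok_before, ok_after)  # True False
```

In the tracker, the digest would be stored alongside `self.current_run.artifacts[name]`. This is an assumption about how you might extend it, not existing behavior.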

## 7. Model Training Pipeline

Build an automated training pipeline that:
1. Loads features from the feature store
2. Trains multiple model variants
3. Logs all parameters, metrics, and artifacts
4. Tracks experiments for comparison
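Before training, the pipeline splits the data chronologically rather than randomly, so no future row leaks into the training set. The split can be sketched on its own with a hypothetical `chronological_split` helper. It mirrors the ratio arithmetic in `prepare_data`, returns index arrays instead of data, and assumes the rows are already sorted by timestamp.

```python
import numpy as np


def chronological_split(n_rows: int, train_ratio: float = 0.7, val_ratio: float = 0.15):
    """Return (train, val, test) index arrays in time order, with no shuffling."""
    train_end = int(n_rows * train_ratio)
    val_end = int(n_rows * (train_ratio + val_ratio))
    idx = np.arange(n_rows)
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]


train_idx, val_idx, test_idx = chronological_split(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
# Every training row precedes every validation row, which precedes every test row
print(train_idx.max() < val_idx.min() < test_idx.min())  # True
```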

In [None]:
class TradingModelPipeline:
    """
    End-to-end training pipeline for trading signal models.
    
    Integrates with feature store and experiment tracker.
    """
    
    def __init__(
        self,
        feature_store: FeatureStore,
        tracker: ExperimentTracker,
        experiment_name: str = "trading_signals"
    ):
        self.feature_store = feature_store
        self.tracker = tracker
        self.experiment_name = experiment_name
    
    def prepare_data(
        self,
        feature_set_name: str,
        target_column: str,
        train_ratio: float = 0.7,
        val_ratio: float = 0.15
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
        """
        Prepare train/val/test splits using time-based splitting.
        
        For time series, we always split chronologically to prevent look-ahead bias.
        """
        # Get features from store
        features_df, metadata = self.feature_store.get_feature_set(feature_set_name)
        
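        # Get the target from the original `df` (a notebook-level global).
        # In production, the target would also live in the feature store.
        # Assumes the feature set's rows are in the same order as `df`.
        X = features_df.values
        y = df[target_column].iloc[:len(X)].values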
        
        # Time-based split (no shuffling!)
        n = len(X)
        train_end = int(n * train_ratio)
        val_end = int(n * (train_ratio + val_ratio))
        
        X_train, y_train = X[:train_end], y[:train_end]
        X_val, y_val = X[train_end:val_end], y[train_end:val_end]
        X_test, y_test = X[val_end:], y[val_end:]
        
        logger.info(f"Data split - Train: {len(X_train)}, Val: {len(X_val)}, Test: {len(X_test)}")
        
        return X_train, X_val, X_test, y_train, y_val, y_test
    
    def train_model(
        self,
        model_name: str,
        model,
        X_train: np.ndarray,
        X_val: np.ndarray,
        X_test: np.ndarray,
        y_train: np.ndarray,
        y_val: np.ndarray,
        y_test: np.ndarray,
        parameters: Dict[str, Any],
        tags: Optional[Dict[str, str]] = None
    ) -> str:
        """
        Train a model with full experiment tracking.
        
        Returns the run_id for the experiment.
        """
        # Start experiment run
        run_id = self.tracker.start_run(
            experiment_name=self.experiment_name,
            model_name=model_name,
            parameters=parameters,
            tags=tags or {}
        )
        
        try:
            # Create pipeline with scaling
            pipeline = Pipeline([
                ('scaler', StandardScaler()),
                ('model', model)
            ])
            
            # Train
            logger.info(f"Training {model_name}...")
            pipeline.fit(X_train, y_train)
            
            # Predictions
            y_train_pred = pipeline.predict(X_train)
            y_val_pred = pipeline.predict(X_val)
            y_test_pred = pipeline.predict(X_test)
            
            # Probabilities for AUC
            if hasattr(pipeline, 'predict_proba'):
                y_train_proba = pipeline.predict_proba(X_train)[:, 1]
                y_val_proba = pipeline.predict_proba(X_val)[:, 1]
                y_test_proba = pipeline.predict_proba(X_test)[:, 1]
            else:
                y_train_proba = y_val_proba = y_test_proba = None
            
            # Calculate metrics
            metrics = {
                'train_accuracy': accuracy_score(y_train, y_train_pred),
                'val_accuracy': accuracy_score(y_val, y_val_pred),
                'test_accuracy': accuracy_score(y_test, y_test_pred),
                'train_precision': precision_score(y_train, y_train_pred, zero_division=0),
                'val_precision': precision_score(y_val, y_val_pred, zero_division=0),
                'test_precision': precision_score(y_test, y_test_pred, zero_division=0),
                'train_recall': recall_score(y_train, y_train_pred, zero_division=0),
                'val_recall': recall_score(y_val, y_val_pred, zero_division=0),
                'test_recall': recall_score(y_test, y_test_pred, zero_division=0),
                'train_f1': f1_score(y_train, y_train_pred, zero_division=0),
                'val_f1': f1_score(y_val, y_val_pred, zero_division=0),
                'test_f1': f1_score(y_test, y_test_pred, zero_division=0),
            }
            
            # Add AUC if probabilities available
            if y_train_proba is not None:
                metrics['train_auc'] = roc_auc_score(y_train, y_train_proba)
                metrics['val_auc'] = roc_auc_score(y_val, y_val_proba)
                metrics['test_auc'] = roc_auc_score(y_test, y_test_proba)
            
            # Log metrics
            self.tracker.log_metrics(metrics)
            
            # Log model artifact
            self.tracker.log_artifact('model', pipeline, artifact_type='model')
            
            # Log confusion matrix plot
            fig, axes = plt.subplots(1, 3, figsize=(15, 4))
            for ax, (name, y_true, y_pred) in zip(
                axes, 
                [('Train', y_train, y_train_pred), 
                 ('Val', y_val, y_val_pred), 
                 ('Test', y_test, y_test_pred)]
            ):
                cm = confusion_matrix(y_true, y_pred, labels=[0, 1])  # always 2x2, even if a split lacks a class
                ax.imshow(cm, cmap='Blues')
                ax.set_title(f'{name} Confusion Matrix')
                ax.set_xlabel('Predicted')
                ax.set_ylabel('Actual')
                for i in range(cm.shape[0]):
                    for j in range(cm.shape[1]):
                        ax.text(j, i, cm[i, j], ha='center', va='center')
            
            plt.tight_layout()
            self.tracker.log_artifact('confusion_matrices', fig, artifact_type='figure')
            plt.close(fig)
            
            # End run successfully
            self.tracker.end_run(status="COMPLETED")
            
            logger.info(f"✅ Training completed - Run ID: {run_id}")
            logger.info(f"   Val Accuracy: {metrics['val_accuracy']:.4f}")
            logger.info(f"   Test Accuracy: {metrics['test_accuracy']:.4f}")
            
            return run_id
            
        except Exception as e:
            logger.error(f"Training failed: {e}")
            self.tracker.end_run(status="FAILED")
            raise
    
    def run_experiments(
        self,
        feature_set_name: str,
        models: Dict[str, Tuple[Any, Dict[str, Any]]]
    ) -> pd.DataFrame:
        """
        Run multiple model experiments.
        
        Args:
            feature_set_name: Name of feature set in feature store
            models: Dict of model_name -> (model_instance, parameters)
        
        Returns:
            DataFrame with all experiment results
        """
        # Prepare data
        X_train, X_val, X_test, y_train, y_val, y_test = self.prepare_data(feature_set_name, 'target')
        
        run_ids = []
        for model_name, (model, params) in models.items():
            run_id = self.train_model(
                model_name=model_name,
                model=model,
                X_train=X_train, X_val=X_val, X_test=X_test,
                y_train=y_train, y_val=y_val, y_test=y_test,
                parameters=params,
                tags={'feature_set': feature_set_name}
            )
            run_ids.append(run_id)
        
        return self.tracker.get_experiment_runs(self.experiment_name)


# Initialize pipeline
pipeline = TradingModelPipeline(feature_store, tracker, experiment_name="trading_signals_experiment")

# Define models to experiment with
models = {
    'logistic_regression': (
        LogisticRegression(random_state=42, max_iter=1000),
        {'C': 1.0, 'penalty': 'l2', 'max_iter': 1000}
    ),
    'random_forest': (
        RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42),
        {'n_estimators': 100, 'max_depth': 10, 'random_state': 42}
    ),
    'gradient_boosting': (
        GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42),
        {'n_estimators': 100, 'max_depth': 5, 'learning_rate': 0.1}
    )
}

# Run experiments
results = pipeline.run_experiments("trading_signals_v1", models)
print("\n📊 Experiment Results:")
results[['run_id', 'model_name', 'metric_val_accuracy', 'metric_test_accuracy', 'metric_val_auc', 'metric_test_auc']]
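The `models` dict above writes each hyperparameter twice, once in the constructor and once in the logged dict, and the two copies can drift apart. For example, `gradient_boosting` logs a `learning_rate` that is only the default and omits the `random_state` it was built with. The sketch below instead derives the logged parameters from the estimator's own `get_params()`, so the tracker records what was actually trained. `loggable_params` is a hypothetical helper.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression


def loggable_params(model, keys=None) -> dict:
    """Read hyperparameters from the estimator itself, optionally restricted to `keys`."""
    params = model.get_params(deep=False)
    if keys is not None:
        params = {k: params[k] for k in keys}
    # Keep JSON-friendly scalars only, so the tracker can serialize them
    return {k: v for k, v in params.items() if isinstance(v, (int, float, str, bool, type(None)))}


gb = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
gb_params = loggable_params(gb, keys=["n_estimators", "max_depth", "learning_rate", "random_state"])
print(gb_params)
# {'n_estimators': 100, 'max_depth': 5, 'learning_rate': 0.1, 'random_state': 42}

# Same (model, params) structure as `models`, with parameters derived rather than retyped
lr = LogisticRegression(random_state=42, max_iter=1000)
models_sketch = {"logistic_regression": (lr, loggable_params(lr))}
```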

## 8. Model Registry & Deployment

The **Model Registry** manages model versions and their deployment stages:
- **Staging**: Models under evaluation
- **Production**: Live models serving predictions
- **Archived**: Retired models kept for audit
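The registry below only checks that a stage name is legal, so it allows any move, including going from `archived` straight to `production`. The sketch below adds an explicit transition policy. It assumes a hypothetical rule that a version must pass through staging before going live, and that a retired version must be re-staged before it can return. `ALLOWED_TRANSITIONS`, `check_transition`, and `promote` are illustrative names, not part of `ModelRegistry`.

```python
from typing import Dict, Optional, Set

# Hypothetical policy: which stages each stage may move to.
# None means "registered but not yet assigned to a stage".
ALLOWED_TRANSITIONS: Dict[Optional[str], Set[str]] = {
    None: {"staging", "archived"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": {"staging"},  # a retired model must be re-validated before going live
}


def check_transition(current: Optional[str], target: str) -> None:
    """Raise ValueError if moving from `current` to `target` breaks the policy."""
    allowed = ALLOWED_TRANSITIONS.get(current, set())
    if target not in allowed:
        raise ValueError(f"Cannot move from {current!r} to {target!r}; allowed: {sorted(allowed)}")


def promote(stages: Dict[str, Optional[str]], version: str, target: str) -> None:
    """Apply a transition in place, archiving any other production version first."""
    check_transition(stages[version], target)
    if target == "production":
        for v, s in stages.items():
            if s == "production" and v != version:
                stages[v] = "archived"
    stages[version] = target


stages = {"1.0.0": "production", "1.1.0": None}
promote(stages, "1.1.0", "staging")
promote(stages, "1.1.0", "production")
print(stages)  # {'1.0.0': 'archived', '1.1.0': 'production'}
```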

In [None]:
class ModelRegistry:
    """
    Model registry for managing model versions and deployment stages.
    
    Stages:
    - None: Just registered
    - Staging: Under evaluation
    - Production: Live model
    - Archived: Retired
    """
    
    VALID_STAGES = ['staging', 'production', 'archived']
    
    def __init__(self, storage_path: str = "./model_registry"):
        self.storage_path = Path(storage_path)
        self.storage_path.mkdir(parents=True, exist_ok=True)
        self.registry_file = self.storage_path / "registry.json"
        self.registry: Dict[str, Dict] = self._load_registry()
    
    def _load_registry(self) -> Dict:
        """Load registry from disk."""
        if self.registry_file.exists():
            with open(self.registry_file, 'r') as f:
                return json.load(f)
        return {}
    
    def _save_registry(self):
        """Persist registry to disk."""
        with open(self.registry_file, 'w') as f:
            json.dump(self.registry, f, indent=2, default=str)
    
    def register_model(
        self,
        name: str,
        model,
        version: str,
        metrics: Dict[str, float],
        experiment_run_id: Optional[str] = None,
        description: str = ""
    ) -> str:
        """
        Register a new model version.
        
        Args:
            name: Model name
            model: Model object (sklearn pipeline, etc.)
            version: Version string
            metrics: Model metrics
            experiment_run_id: Link to experiment run
            description: Model description
        
        Returns:
            Full model identifier (name/version)
        """
        # Create model directory
        model_dir = self.storage_path / name / version
        model_dir.mkdir(parents=True, exist_ok=True)
        
        # Save model
        model_path = model_dir / "model.joblib"
        joblib.dump(model, model_path)
        
        # Create registry entry
        if name not in self.registry:
            self.registry[name] = {'versions': {}}
        
        self.registry[name]['versions'][version] = {
            'registered_at': datetime.now().isoformat(),
            'metrics': metrics,
            'experiment_run_id': experiment_run_id,
            'description': description,
            'stage': None,
            'model_path': str(model_path)
        }
        
        self._save_registry()
        logger.info(f"Registered model '{name}' version '{version}'")
        
        return f"{name}/{version}"
    
    def transition_stage(self, name: str, version: str, stage: str):
        """
        Transition a model to a new stage.
        
        Only one version per model name can be in the 'production' stage;
        promoting a version archives the current production version.
        """
        if stage not in self.VALID_STAGES:
            raise ValueError(f"Invalid stage. Must be one of {self.VALID_STAGES}")
        
        if name not in self.registry:
            raise ValueError(f"Model '{name}' not found")
        
        if version not in self.registry[name]['versions']:
            raise ValueError(f"Version '{version}' not found for model '{name}'")
        
        # If transitioning to production, archive current production model
        if stage == 'production':
            for v, info in self.registry[name]['versions'].items():
                if info.get('stage') == 'production':
                    info['stage'] = 'archived'
                    logger.info(f"Archived previous production model '{name}/{v}'")
        
        # Update stage
        self.registry[name]['versions'][version]['stage'] = stage
        self.registry[name]['versions'][version]['stage_updated_at'] = datetime.now().isoformat()
        
        self._save_registry()
        logger.info(f"Transitioned '{name}/{version}' to stage '{stage}'")
    
    def get_production_model(self, name: str):
        """Get the production model for a given name."""
        if name not in self.registry:
            raise ValueError(f"Model '{name}' not found")
        
        for version, info in self.registry[name]['versions'].items():
            if info.get('stage') == 'production':
                model_path = info['model_path']
                return joblib.load(model_path), version, info
        
        raise ValueError(f"No production model found for '{name}'")
    
    def get_model(self, name: str, version: str):
        """Get a specific model version."""
        if name not in self.registry:
            raise ValueError(f"Model '{name}' not found")
        
        if version not in self.registry[name]['versions']:
            raise ValueError(f"Version '{version}' not found")
        
        info = self.registry[name]['versions'][version]
        model = joblib.load(info['model_path'])
        return model, info
    
    def list_models(self) -> pd.DataFrame:
        """List all registered models."""
        records = []
        for name, model_info in self.registry.items():
            for version, info in model_info['versions'].items():
                records.append({
                    'Name': name,
                    'Version': version,
                    'Stage': info.get('stage') or 'None',
                    'Registered': info['registered_at'][:10],
                    'Test Accuracy': info['metrics'].get('test_accuracy', 'N/A'),
                    'Test AUC': info['metrics'].get('test_auc', 'N/A')
                })
        return pd.DataFrame(records)
    
    def compare_versions(self, name: str) -> pd.DataFrame:
        """Compare all versions of a model."""
        if name not in self.registry:
            raise ValueError(f"Model '{name}' not found")
        
        records = []
        for version, info in self.registry[name]['versions'].items():
            record = {'Version': version, 'Stage': info.get('stage') or 'None'}
            record.update(info['metrics'])
            records.append(record)
        
        return pd.DataFrame(records).sort_values('test_accuracy', ascending=False)


# Initialize model registry
model_registry = ModelRegistry("./mlops_demo/model_registry")

# Get best model from experiments and register it
best_run = tracker.get_best_run("trading_signals_experiment", metric="val_accuracy")
best_model = tracker.load_artifact("trading_signals_experiment", best_run['run_id'], "model")

# Register the best model
model_registry.register_model(
    name="trading_signal_classifier",
    model=best_model,
    version="1.0.0",
    metrics=best_run['metrics'],
    experiment_run_id=best_run['run_id'],
    description=f"Best model from experiments - {best_run['model_name']}"
)

# Promote to staging
model_registry.transition_stage("trading_signal_classifier", "1.0.0", "staging")

# In practice, promote only after validation; the demo promotes immediately
model_registry.transition_stage("trading_signal_classifier", "1.0.0", "production")

print("\n📋 Model Registry:")
model_registry.list_models()
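`_save_registry` opens `registry.json` in `'w'` mode, which truncates the file before writing. If the process crashes mid-dump, it leaves a partial file that `_load_registry` cannot parse, and the record of which version is in production is lost. A common remedy, sketched here with a hypothetical `atomic_write_json` helper, is to write to a temporary file in the same directory and swap it in with `os.replace`. Python documents that rename as atomic on POSIX systems when it succeeds.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2, default=str)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see either the old or the new file
    except BaseException:
        os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise


with tempfile.TemporaryDirectory() as tmp:
    registry_file = Path(tmp) / "registry.json"
    atomic_write_json(registry_file, {"trading_signal_classifier": {"versions": {}}})
    loaded = json.loads(registry_file.read_text())
    leftovers = [p.name for p in Path(tmp).iterdir() if p.suffix == ".tmp"]

print(loaded, leftovers)
```

The same helper could back the tracker's `_save_experiments`. That is an assumption, since its body is not shown in this section.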

## 9. Model Serving and Inference

Production model serving patterns for trading systems:
1. **Batch Inference**: Generate signals for all assets periodically
2. **Real-time Inference**: On-demand predictions for trading decisions
3. **Streaming Inference**: Continuous predictions on data streams
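The three patterns differ mainly in how rows reach the model. `ModelServer` below covers batch and on-demand calls. Streaming can be sketched as a generator that scores each row as it arrives. The model here is a stand-in `LogisticRegression` fitted on synthetic data; in the notebook it would be the registry's production model. The final check confirms that per-row streaming and a single vectorized batch call produce the same scores.

```python
from typing import Iterable, Iterator, Tuple

import numpy as np
from sklearn.linear_model import LogisticRegression


def stream_predictions(model, rows: Iterable[np.ndarray]) -> Iterator[Tuple[int, float]]:
    """Streaming inference: score each feature row as it arrives."""
    for row in rows:
        x = np.asarray(row, dtype=float).reshape(1, -1)  # one sample, n_features columns
        proba_up = float(model.predict_proba(x)[0, 1])
        yield int(proba_up > 0.5), proba_up  # label thresholded at 0.5


def batch_predictions(model, X: np.ndarray) -> np.ndarray:
    """Batch inference: score a whole block of rows in one vectorized call."""
    return model.predict_proba(X)[:, 1]


# Stand-in model on synthetic data (not the notebook's production model)
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)
demo_model = LogisticRegression().fit(X_demo, y_demo)

incoming = iter(X_demo[-5:])  # pretend these rows arrive one at a time from a feed
stream_out = list(stream_predictions(demo_model, incoming))
labels = [label for label, _ in stream_out]
streamed = [proba for _, proba in stream_out]
batched = batch_predictions(demo_model, X_demo[-5:])
print(np.allclose(streamed, batched))  # True
```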

In [None]:
class ModelServer:
    """
    Model serving layer for trading signal predictions.
    
    Handles:
    - Loading production models
    - Feature preparation
    - Prediction generation
    - Prediction logging for monitoring
    """
    
    def __init__(
        self,
        model_registry: ModelRegistry,
        model_name: str,
        log_predictions: bool = True
    ):
        self.model_registry = model_registry
        self.model_name = model_name
        self.log_predictions = log_predictions
        self.prediction_log: List[Dict] = []
        
        # Load production model
        self._load_model()
    
    def _load_model(self):
        """Load the current production model."""
        try:
            self.model, self.model_version, self.model_info = \
                self.model_registry.get_production_model(self.model_name)
            logger.info(f"Loaded production model: {self.model_name} v{self.model_version}")
        except ValueError as e:
            logger.error(f"Failed to load model: {e}")
            raise
    
    def reload_model(self):
        """Reload model (for hot-reloading on model updates)."""
        self._load_model()
    
    def predict(self, features: np.ndarray, return_proba: bool = False) -> Dict:
        """
        Generate predictions for given features.
        
        Args:
            features: Feature array (n_samples, n_features)
            return_proba: Whether to return probabilities
        
        Returns:
            Dict with predictions and metadata
        """
        start_time = datetime.now()
        
        # Ensure 2D array
        if features.ndim == 1:
            features = features.reshape(1, -1)
        
        # Generate predictions
        predictions = self.model.predict(features)
        
        result = {
            'predictions': predictions.tolist(),
            'model_version': self.model_version,
            'timestamp': start_time.isoformat(),
            'n_samples': len(features)
        }
        
        if return_proba and hasattr(self.model, 'predict_proba'):
            probabilities = self.model.predict_proba(features)
            result['probabilities'] = probabilities.tolist()
            result['confidence'] = np.max(probabilities, axis=1).tolist()
        
        # Calculate latency
        latency_ms = (datetime.now() - start_time).total_seconds() * 1000
        result['latency_ms'] = latency_ms
        
        # Log prediction for monitoring
        if self.log_predictions:
            self._log_prediction(features, result)
        
        return result
    
    def _log_prediction(self, features: np.ndarray, result: Dict):
        """Log prediction for monitoring and drift detection."""
        log_entry = {
            'timestamp': result['timestamp'],
            'model_version': result['model_version'],
            'n_samples': result['n_samples'],
            'predictions': result['predictions'],
            'latency_ms': result['latency_ms'],
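            # Per-feature statistics (one value per column); pooling all features
            # into a single number would mix scales and hide drift in any one feature
            'feature_stats': {
                'mean': np.mean(features, axis=0).tolist(),
                'std': np.std(features, axis=0).tolist(),
                'min': np.min(features, axis=0).tolist(),
                'max': np.max(features, axis=0).tolist()
            }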
        }
        
        if 'confidence' in result:
            log_entry['avg_confidence'] = float(np.mean(result['confidence']))
        
        self.prediction_log.append(log_entry)
    
    def batch_predict(
        self,
        df: pd.DataFrame,
        feature_columns: List[str],
        output_column: str = 'prediction'
    ) -> pd.DataFrame:
        """
        Batch prediction on a DataFrame.
        
        Args:
            df: Input DataFrame
            feature_columns: List of feature column names
            output_column: Name for prediction column
        
        Returns:
            DataFrame with predictions added
        """
        features = df[feature_columns].values
        result = self.predict(features, return_proba=True)
        
        df_out = df.copy()
        df_out[output_column] = result['predictions']
        
        if 'probabilities' in result:
            df_out[f'{output_column}_prob'] = [p[1] for p in result['probabilities']]
            df_out[f'{output_column}_confidence'] = result['confidence']
        
        return df_out
    
    def get_prediction_stats(self) -> Dict:
        """Get statistics from prediction logs."""
        if not self.prediction_log:
            return {}
        
        total_predictions = sum(log['n_samples'] for log in self.prediction_log)
        latencies = [log['latency_ms'] for log in self.prediction_log]
        
        return {
            'total_requests': len(self.prediction_log),
            'total_predictions': total_predictions,
            'avg_latency_ms': np.mean(latencies),
            'p50_latency_ms': np.percentile(latencies, 50),
            'p99_latency_ms': np.percentile(latencies, 99),
            'prediction_distribution': {
                'up': sum(sum(1 for p in log['predictions'] if p == 1) for log in self.prediction_log),
                'down': sum(sum(1 for p in log['predictions'] if p == 0) for log in self.prediction_log)
            }
        }


# Initialize model server
server = ModelServer(model_registry, "trading_signal_classifier")

# Demo: predict on the 10 most recent rows
sample_features = df[all_features].iloc[-10:].values
result = server.predict(sample_features, return_proba=True)

print("🔮 Prediction on the 10 most recent rows:")
print(f"   Predictions: {result['predictions']}")
print(f"   Confidence: {[f'{c:.2f}' for c in result['confidence']]}")
print(f"   Latency: {result['latency_ms']:.2f}ms")
print(f"   Model Version: {result['model_version']}")

# Demo: Batch prediction on DataFrame
df_predictions = server.batch_predict(df.tail(100), all_features, 'signal')
print("\n📊 Batch Prediction Results:")
print(df_predictions[['timestamp', 'close', 'signal', 'signal_prob', 'signal_confidence']].tail())
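`ModelServer.predict` times requests with `datetime.now()`. That is a wall clock, and it can jump when the system time is adjusted. Python's `time.perf_counter()` is the clock documented for measuring short durations. The sketch below pairs a timing context manager with a percentile summary in the spirit of `get_prediction_stats`. `timed` and `latency_summary` are hypothetical names, and the timed work is a placeholder for a real `server.predict` call.

```python
import time
from contextlib import contextmanager
from typing import Dict, List

import numpy as np


@contextmanager
def timed(latencies_ms: List[float]):
    """Append the elapsed time of the with-block, in milliseconds."""
    start = time.perf_counter()  # high-resolution clock intended for durations
    try:
        yield
    finally:
        latencies_ms.append((time.perf_counter() - start) * 1000)


def latency_summary(latencies_ms: List[float]) -> Dict[str, float]:
    """Report tail percentiles alongside the mean; a slow tail hides behind averages."""
    arr = np.asarray(latencies_ms)
    return {
        "count": int(arr.size),
        "mean_ms": float(arr.mean()),
        "p50_ms": float(np.percentile(arr, 50)),
        "p99_ms": float(np.percentile(arr, 99)),
        "max_ms": float(arr.max()),
    }


latencies: List[float] = []
for _ in range(200):
    with timed(latencies):
        np.linalg.norm(np.ones(1_000))  # placeholder for server.predict(...)

summary = latency_summary(latencies)
print(summary)
```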

## 10. Monitoring and Drift Detection

Model monitoring is critical in trading because:
1. **Market Regime Changes**: Models trained in bull markets may fail in bear markets
2. **Data Distribution Shifts**: Feature distributions change over time
3. **Concept Drift**: The relationship between features and target changes

Types of drift:
- **Data Drift**: Input feature distributions P(X) change
- **Concept Drift**: P(Y|X) changes - same features, different outcomes
- **Prediction Drift**: The model's output distribution P(ŷ) changes
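A small synthetic simulation shows how the three drift types differ. The sketch assumes a one-feature world in which a fixed threshold rule stands in for the trained model. A two-sample Kolmogorov-Smirnov test on the inputs flags data drift, but it cannot see concept drift, because there the inputs are literally identical. Concept drift only surfaces once realized labels arrive and you measure accuracy. Data drift need not hurt accuracy either; in this example it shows up as prediction drift instead.

```python
import numpy as np
from scipy import stats

sim_rng = np.random.default_rng(42)
n = 2_000


def rule(x: np.ndarray) -> np.ndarray:
    """Stand-in for a model fitted on the reference period: predict 'up' when x > 0."""
    return (x > 0).astype(int)


# Reference period: feature x ~ N(0, 1); the label is 'up' exactly when x > 0
x_ref = sim_rng.normal(0.0, 1.0, n)

# Data drift: P(X) shifts (mean moves to 1), P(Y|X) is unchanged
x_shift = sim_rng.normal(1.0, 1.0, n)
y_shift = (x_shift > 0).astype(int)

# Concept drift: P(X) is unchanged (the very same draws), but P(Y|X) flips
x_concept = x_ref.copy()
y_concept = (x_concept < 0).astype(int)

ks_data = stats.ks_2samp(x_ref, x_shift)
ks_concept = stats.ks_2samp(x_ref, x_concept)
acc_data_drift = float(np.mean(rule(x_shift) == y_shift))
acc_concept = float(np.mean(rule(x_concept) == y_concept))

# Prediction drift: the share of 'up' calls moves along with the input shift
share_up_ref = float(rule(x_ref).mean())
share_up_drift = float(rule(x_shift).mean())

print(f"data drift:    KS p={ks_data.pvalue:.1e}, accuracy={acc_data_drift:.2f}")
print(f"concept drift: KS p={ks_concept.pvalue:.2f}, accuracy={acc_concept:.2f}")
print(f"share predicted up: reference={share_up_ref:.2f}, data drift={share_up_drift:.2f}")
```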

In [None]:
@dataclass
class DriftResult:
    """Result of a drift detection test."""
    feature: str
    test_name: str
    statistic: float
    p_value: float
    drift_detected: bool
    threshold: float


class DriftDetector:
    """
    Drift detection for production ML models.
    
    Uses statistical tests to compare reference (training) and
    production data distributions.
    """
    
    def __init__(
        self,
        reference_data: pd.DataFrame,
        features: List[str],
        p_value_threshold: float = 0.05,
        psi_threshold: float = 0.2
    ):
        """
        Initialize drift detector with reference data.
        
        Args:
            reference_data: Training/baseline data
            features: Features to monitor
            p_value_threshold: Threshold for statistical tests
            psi_threshold: Population Stability Index threshold
        """
        self.reference_data = reference_data[features].copy()
        self.features = features
        self.p_value_threshold = p_value_threshold
        self.psi_threshold = psi_threshold
        
        # Compute reference statistics
        self.reference_stats = self._compute_stats(self.reference_data)
        
        # Store historical drift results
        self.drift_history: List[Dict] = []
    
    def _compute_stats(self, df: pd.DataFrame) -> Dict[str, Dict]:
        """Compute statistics for each feature."""
        # Named `feature_stats` so it does not shadow the `scipy.stats` module
        feature_stats = {}
        for feature in self.features:
            if feature in df.columns:
                data = df[feature].dropna()
                feature_stats[feature] = {
                    'mean': float(data.mean()),
                    'std': float(data.std()),
                    'min': float(data.min()),
                    'max': float(data.max()),
                    'median': float(data.median()),
                    'skew': float(data.skew()),
                    'kurtosis': float(data.kurtosis()),
                    'histogram': np.histogram(data, bins=10)
                }
        return feature_stats
    
    def _ks_test(self, feature: str, production_data: pd.Series) -> DriftResult:
        """Kolmogorov-Smirnov test for distribution comparison."""
        reference = self.reference_data[feature].dropna()
        production = production_data.dropna()
        
        statistic, p_value = stats.ks_2samp(reference, production)
        
        return DriftResult(
            feature=feature,
            test_name="Kolmogorov-Smirnov",
            statistic=statistic,
            p_value=p_value,
            drift_detected=p_value < self.p_value_threshold,
            threshold=self.p_value_threshold
        )
    
    def _psi(self, feature: str, production_data: pd.Series, n_bins: int = 10) -> DriftResult:
        """
        Population Stability Index (PSI).
        
        PSI < 0.1: No significant change
        0.1 <= PSI < 0.2: Moderate change
        PSI >= 0.2: Significant change
        """
        reference = self.reference_data[feature].dropna()
        production = production_data.dropna()
        
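        # Create bins from reference data
        _, bin_edges = np.histogram(reference, bins=n_bins)
        
        # np.histogram ignores values outside the bin range, so clip production
        # values into the reference range; out-of-range mass then lands in the edge bins
        production = production.clip(bin_edges[0], bin_edges[-1])
        
        # Get counts in each bin
        ref_counts, _ = np.histogram(reference, bins=bin_edges)
        prod_counts, _ = np.histogram(production, bins=bin_edges)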
        
        # Convert to percentages (add small epsilon to avoid division by zero)
        epsilon = 1e-10
        ref_pct = ref_counts / len(reference) + epsilon
        prod_pct = prod_counts / len(production) + epsilon
        
        # Calculate PSI
        psi_value = np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct))
        
        return DriftResult(
            feature=feature,
            test_name="PSI",
            statistic=psi_value,
            p_value=psi_value,  # PSI doesn't have p-value, using value itself
            drift_detected=psi_value >= self.psi_threshold,
            threshold=self.psi_threshold
        )
    
    def _chi_squared_test(self, feature: str, production_data: pd.Series, n_bins: int = 10) -> DriftResult:
        """Chi-squared test for categorical/binned data."""
        reference = self.reference_data[feature].dropna()
        production = production_data.dropna()
        
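        # Create bins from reference data
        _, bin_edges = np.histogram(reference, bins=n_bins)
        
        # Clip production values into the reference range so none are dropped
        production = production.clip(bin_edges[0], bin_edges[-1])
        
        # Get counts in each bin
        ref_counts, _ = np.histogram(reference, bins=bin_edges)
        prod_counts, _ = np.histogram(production, bins=bin_edges)
        
        # Keep bins where the reference has mass (expected counts must be > 0);
        # empty production bins stay in, since they are evidence of drift
        mask = ref_counts > 0
        if mask.sum() < 2:
            return DriftResult(
                feature=feature,
                test_name="Chi-Squared",
                statistic=0,
                p_value=1.0,
                drift_detected=False,
                threshold=self.p_value_threshold
            )
        
        # Scale expected counts to the observed total; recent SciPy versions of
        # stats.chisquare raise an error when the two totals differ
        observed = prod_counts[mask]
        expected = ref_counts[mask] * (observed.sum() / ref_counts[mask].sum())
        
        statistic, p_value = stats.chisquare(observed, expected)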
        
        return DriftResult(
            feature=feature,
            test_name="Chi-Squared",
            statistic=statistic,
            p_value=p_value,
            drift_detected=p_value < self.p_value_threshold,
            threshold=self.p_value_threshold
        )
    
    def detect_drift(
        self,
        production_data: pd.DataFrame,
        tests: Tuple[str, ...] = ('ks', 'psi')
    ) -> Tuple[bool, pd.DataFrame]:
        """
        Detect drift in production data.
        
        Args:
            production_data: Recent production data
            tests: Tests to run ('ks', 'psi', 'chi2')
        
        Returns:
            Tuple of (any_drift_detected, results_dataframe)
        """
        results = []
        
        for feature in self.features:
            if feature not in production_data.columns:
                continue
            
            prod_series = production_data[feature]
            
            if 'ks' in tests:
                results.append(self._ks_test(feature, prod_series))
            if 'psi' in tests:
                results.append(self._psi(feature, prod_series))
            if 'chi2' in tests:
                results.append(self._chi_squared_test(feature, prod_series))
        
        # Create results dataframe
        results_df = pd.DataFrame([
            {
                'Feature': r.feature,
                'Test': r.test_name,
                'Statistic': r.statistic,
                'P-Value/PSI': r.p_value,
                'Threshold': r.threshold,
                'Drift': '⚠️ YES' if r.drift_detected else '✅ NO'
            }
            for r in results
        ])
        
        # Log to history
        timestamp = datetime.now()
        for r in results:
            self.drift_history.append({
                'timestamp': timestamp.isoformat(),
                'feature': r.feature,
                'test': r.test_name,
                'statistic': r.statistic,
                'p_value': r.p_value,
                'drift_detected': r.drift_detected
            })
        
        any_drift = any(r.drift_detected for r in results)
        
        return any_drift, results_df
    
    def get_drift_summary(self, production_data: pd.DataFrame) -> pd.DataFrame:
        """Get summary statistics comparing reference and production data."""
        prod_stats = self._compute_stats(production_data)
        
        records = []
        for feature in self.features:
            if feature in prod_stats:
                ref = self.reference_stats[feature]
                prod = prod_stats[feature]
                
                records.append({
                    'Feature': feature,
                    'Ref Mean': f"{ref['mean']:.4f}",
                    'Prod Mean': f"{prod['mean']:.4f}",
                    'Mean Δ%': f"{((prod['mean'] - ref['mean']) / (abs(ref['mean']) + 1e-10) * 100):.1f}%",
                    'Ref Std': f"{ref['std']:.4f}",
                    'Prod Std': f"{prod['std']:.4f}",
                    'Std Δ%': f"{((prod['std'] - ref['std']) / (abs(ref['std']) + 1e-10) * 100):.1f}%"
                })
        
        return pd.DataFrame(records)


# Chronological split: the first 70% of rows is the reference (training) window, the remaining 30% stands in for production data
n_reference = int(len(df) * 0.7)
reference_data = df.iloc[:n_reference]
production_data = df.iloc[n_reference:]

# Initialize drift detector
drift_detector = DriftDetector(
    reference_data=reference_data,
    features=all_features,
    p_value_threshold=0.05,
    psi_threshold=0.2
)

# Check for drift
drift_detected, drift_results = drift_detector.detect_drift(production_data)

print(f"{'⚠️ DRIFT DETECTED!' if drift_detected else '✅ No significant drift detected'}\n")
print("📊 Drift Detection Results:")
drift_results
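
The binned chi-squared check is easiest to see on synthetic data. Below is a minimal standalone sketch, assuming normally distributed samples and decile bins taken from the reference sample; `chi2_drift_pvalue` is a hypothetical helper, not part of `DriftDetector`. The outer bins are open-ended so production values outside the reference range are still counted, and the expected counts are rescaled to the observed total because recent SciPy versions of `scipy.stats.chisquare` reject frequency arrays whose sums disagree.

```python
import numpy as np
from scipy import stats


def chi2_drift_pvalue(reference: np.ndarray, production: np.ndarray, n_bins: int = 10) -> float:
    """Binned chi-squared drift test for one feature (hypothetical helper)."""
    # Interior quantile edges from the reference sample; np.digitize puts values
    # below the first edge in bin 0 and above the last edge in the final bin
    interior = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1]))
    n_cells = len(interior) + 1
    ref_counts = np.bincount(np.digitize(reference, interior), minlength=n_cells)
    prod_counts = np.bincount(np.digitize(production, interior), minlength=n_cells)

    # Drop bins the reference never populates, then rescale the expected
    # counts so their total equals the observed total
    mask = ref_counts > 0
    observed = prod_counts[mask]
    expected = ref_counts[mask] * (observed.sum() / ref_counts[mask].sum())
    _, p_value = stats.chisquare(observed, expected)
    return float(p_value)


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 2000)
shifted = rng.normal(0.5, 1.0, 2000)

print(f"same distribution:  p = {chi2_drift_pvalue(reference, same):.3f}")
print(f"shifted by 0.5 std: p = {chi2_drift_pvalue(reference, shifted):.2e}")
```

With 2,000 draws, a half-standard-deviation mean shift yields a p-value far below 0.05, while a same-distribution sample falls below 0.05 only about 5% of the time. That false-alarm rate is also why testing many features at once will occasionally flag drift that is not there.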

In [None]:
# Visualize feature distributions
fig, axes = plt.subplots(3, 4, figsize=(16, 10))
axes = axes.flatten()

for idx, feature in enumerate(all_features[:12]):
    ax = axes[idx]
    ax.hist(reference_data[feature].dropna(), bins=30, alpha=0.5, label='Reference', density=True)
    ax.hist(production_data[feature].dropna(), bins=30, alpha=0.5, label='Production', density=True)
    ax.set_title(feature)
    ax.legend()

plt.tight_layout()
plt.suptitle("Feature Distribution Comparison: Reference vs Production", y=1.02)
plt.show()

# Statistics summary
print("\n📈 Distribution Statistics Comparison:")
drift_detector.get_drift_summary(production_data)
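
One caveat when reading the `Mean Δ%` column: many trading features, such as daily returns, have means close to zero, so a tiny absolute move can show up as a huge percentage change. A minimal sketch, assuming toy return values; `pct_change` mirrors the summary's formula, and `standardized_shift` is a hypothetical alternative that expresses the mean shift in reference standard deviations instead.

```python
import numpy as np


def pct_change(ref_val: float, prod_val: float) -> float:
    """Relative change in percent, same formula as the 'Mean Δ%' column."""
    return (prod_val - ref_val) / (abs(ref_val) + 1e-10) * 100


def standardized_shift(ref: np.ndarray, prod: np.ndarray) -> float:
    """Mean shift measured in reference standard deviations (hypothetical helper)."""
    return (np.mean(prod) - np.mean(ref)) / (np.std(ref) + 1e-10)


# Toy daily returns: the mean moves from 0.01% to 0.03%, volatility stays at 1%
ref_returns = np.array([-0.0099, 0.0101])
prod_returns = np.array([-0.0097, 0.0103])

print(f"Mean Δ%: {pct_change(ref_returns.mean(), prod_returns.mean()):.1f}%")
print(f"Standardized shift: {standardized_shift(ref_returns, prod_returns):.3f} std")
```

Here the mean roughly triples, which `pct_change` reports as about a 200% change, yet the shift is only about 0.02 reference standard deviations.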

## 11. Performance Monitoring

Track model performance metrics over time and raise alerts when they degrade relative to the training baseline.
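
`PerformanceMonitor` measures degradation relative to the baseline, `(baseline - current) / baseline`, and escalates an alert to HIGH once it exceeds twice the threshold. A small worked sketch of that rule, assuming a baseline accuracy of 0.56 and a 5% threshold; `classify_degradation` is an illustrative helper, not part of the class.

```python
from typing import Optional, Tuple


def classify_degradation(baseline: float, current: float, threshold: float) -> Tuple[float, Optional[str]]:
    """Relative-degradation rule mirroring PerformanceMonitor.evaluate (illustrative helper)."""
    degradation = (baseline - current) / baseline if baseline > 0 else 0.0
    if degradation <= threshold:
        return degradation, None  # within tolerance: no alert
    # More than twice the threshold is escalated to HIGH
    return degradation, ('HIGH' if degradation > 2 * threshold else 'MEDIUM')


# Baseline accuracy of 0.56 with a 5% relative threshold
for current in (0.54, 0.52, 0.50):
    deg, severity = classify_degradation(0.56, current, threshold=0.05)
    print(f"accuracy={current:.2f}  degradation={deg:.1%}  alert={severity}")
```

A drop from 0.56 to 0.52 is 4 accuracy points but about 7.1% relative, so it triggers a MEDIUM alert; 0.54 (about 3.6%) stays quiet, and 0.50 (about 10.7%) is HIGH.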

In [None]:
class PerformanceMonitor:
    """
    Monitor model performance over time.
    
    Tracks:
    - Accuracy
    - Precision, recall, and F1
    - ROC AUC (when prediction probabilities are supplied)
    """
    
    def __init__(
        self,
        baseline_metrics: Dict[str, float],
        alert_thresholds: Optional[Dict[str, float]] = None
    ):
        """
        Initialize performance monitor.
        
        Args:
            baseline_metrics: Baseline metrics from training
            alert_thresholds: Thresholds for alerts (relative degradation)
        """
        self.baseline_metrics = baseline_metrics
        self.alert_thresholds = alert_thresholds or {
            'accuracy': 0.10,   # Alert if accuracy falls more than 10% below baseline (relative)
            'precision': 0.15,  # Alert if precision falls more than 15% below baseline
            'recall': 0.15,     # Alert if recall falls more than 15% below baseline
            'auc': 0.10,        # Alert if AUC falls more than 10% below baseline
        }
        
        self.performance_history: List[Dict] = []
        self.alerts: List[Dict] = []
    
    def evaluate(
        self,
        y_true: np.ndarray,
        y_pred: np.ndarray,
        y_proba: Optional[np.ndarray] = None,
        timestamp: Optional[datetime] = None
    ) -> Dict[str, Any]:
        """
        Evaluate model performance and check for degradation.
        
        Args:
            y_true: True labels
            y_pred: Predicted labels
            y_proba: Prediction probabilities (optional)
            timestamp: Timestamp for this evaluation
        
        Returns:
            Dict with metrics and alerts
        """
        timestamp = timestamp or datetime.now()
        
        # Calculate metrics
        metrics = {
            'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision_score(y_true, y_pred, zero_division=0),
            'recall': recall_score(y_true, y_pred, zero_division=0),
            'f1': f1_score(y_true, y_pred, zero_division=0),
        }
        
        if y_proba is not None:
            try:
                metrics['auc'] = roc_auc_score(y_true, y_proba)
            except ValueError:
                # roc_auc_score fails when y_true contains only one class
                pass
        
        # Check for degradation
        current_alerts = []
        for metric_name, threshold in self.alert_thresholds.items():
            if metric_name in metrics and metric_name in self.baseline_metrics:
                baseline = self.baseline_metrics[metric_name]
                current = metrics[metric_name]
                degradation = (baseline - current) / baseline if baseline > 0 else 0
                
                if degradation > threshold:
                    alert = {
                        'timestamp': timestamp.isoformat(),
                        'metric': metric_name,
                        'baseline': baseline,
                        'current': current,
                        'degradation_pct': degradation * 100,
                        'threshold_pct': threshold * 100,
                        'severity': 'HIGH' if degradation > threshold * 2 else 'MEDIUM'
                    }
                    current_alerts.append(alert)
                    self.alerts.append(alert)
        
        # Store in history
        record = {
            'timestamp': timestamp.isoformat(),
            'n_samples': len(y_true),
            'metrics': metrics,
            'alerts': len(current_alerts)
        }
        self.performance_history.append(record)
        
        return {
            'metrics': metrics,
            'alerts': current_alerts,
            'degradation_detected': len(current_alerts) > 0
        }
    
    def get_performance_trend(self) -> pd.DataFrame:
        """Get performance metrics over time."""
        if not self.performance_history:
            return pd.DataFrame()
        
        records = []
        for entry in self.performance_history:
            record = {'timestamp': entry['timestamp']}
            record.update(entry['metrics'])
            records.append(record)
        
        return pd.DataFrame(records)
    
    def get_alerts(self) -> pd.DataFrame:
        """Get all performance alerts."""
        if not self.alerts:
            return pd.DataFrame()
        return pd.DataFrame(self.alerts)
    
    def plot_performance(self):
        """Visualize performance trends."""
        df = self.get_performance_trend()
        if df.empty:
            print("No performance data to plot")
            return
        
        metrics_to_plot = ['accuracy', 'precision', 'recall', 'f1']
        available_metrics = [m for m in metrics_to_plot if m in df.columns]
        
        fig, axes = plt.subplots(2, 2, figsize=(12, 8))
        axes = axes.flatten()
        
        for idx, metric in enumerate(available_metrics):
            ax = axes[idx]
            values = df[metric].values
            ax.plot(values, marker='o', label=metric.capitalize())
            
            # Draw baseline and alert-threshold lines only for metrics that define them
            # (e.g. 'f1' has no baseline by default, so no misleading line at 0)
            if metric in self.baseline_metrics:
                baseline = self.baseline_metrics[metric]
                ax.axhline(y=baseline, color='r', linestyle='--', label='Baseline')
                if metric in self.alert_thresholds:
                    threshold = self.alert_thresholds[metric]
                    ax.axhline(y=baseline * (1 - threshold),
                               color='orange', linestyle=':', label='Alert Threshold')
            
            ax.set_title(f'{metric.capitalize()} Over Time')
            ax.set_xlabel('Evaluation Period')
            ax.set_ylabel(metric.capitalize())
            ax.legend()
            ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()


# Initialize performance monitor with baseline from training
baseline_metrics = {
    'accuracy': best_run['metrics']['test_accuracy'],
    'precision': best_run['metrics']['test_precision'],
    'recall': best_run['metrics']['test_recall'],
    'auc': best_run['metrics'].get('test_auc', 0.5)
}

performance_monitor = PerformanceMonitor(
    baseline_metrics=baseline_metrics,
    alert_thresholds={
        'accuracy': 0.05,   # >5% relative drop from baseline triggers an alert
        'precision': 0.10,
        'recall': 0.10,
        'auc': 0.05
    }
)

# Simulate periodic performance evaluation
print("📊 Simulating Performance Monitoring...\n")

# Split production data into chunks to simulate time periods
chunk_size = len(production_data) // 5
production_model, _, _ = model_registry.get_production_model("trading_signal_classifier")

for i in range(5):
    start_idx = i * chunk_size
    end_idx = (i + 1) * chunk_size
    
    chunk = production_data.iloc[start_idx:end_idx]
    X_chunk = chunk[all_features].values
    y_chunk = chunk['target'].values
    
    # Get predictions
    y_pred = production_model.predict(X_chunk)
    y_proba = production_model.predict_proba(X_chunk)[:, 1]
    
    # Evaluate
    result = performance_monitor.evaluate(
        y_true=y_chunk,
        y_pred=y_pred,
        y_proba=y_proba,
        timestamp=datetime.now() + timedelta(days=i*7)
    )
    
    auc = result['metrics'].get('auc')
    auc_str = f"{auc:.3f}" if auc is not None else "N/A"
    print(f"Period {i+1}: Accuracy={result['metrics']['accuracy']:.3f}, "
          f"AUC={auc_str}, "
          f"Alerts={len(result['alerts'])}")

# Plot performance trends
performance_monitor.plot_performance()

# Show any alerts
alerts_df = performance_monitor.get_alerts()
if not alerts_df.empty:
    print("\n⚠️ Performance Alerts:")
    print(alerts_df)
else:
    print("\n✅ No performance alerts generated")
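
The loop above scores five fixed periods. A rolling window gives a smoother, per-prediction view of the same degradation check. A minimal sketch, assuming aligned label and prediction arrays in time order; `rolling_accuracy` and `below_threshold` are hypothetical helpers that reuse the `baseline * (1 - threshold)` alert line from `plot_performance`.

```python
import numpy as np
import pandas as pd


def rolling_accuracy(y_true, y_pred, window: int) -> pd.Series:
    """Accuracy over the last `window` predictions (hypothetical helper)."""
    hits = pd.Series((np.asarray(y_true) == np.asarray(y_pred)).astype(float))
    # NaN until a full window of observations is available
    return hits.rolling(window=window, min_periods=window).mean()


def below_threshold(acc: pd.Series, baseline: float, threshold: float) -> pd.Series:
    """True where rolling accuracy falls under baseline * (1 - threshold); NaN compares as False."""
    return acc < baseline * (1 - threshold)


y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
acc = rolling_accuracy(y_true, y_pred, window=3)
print(acc.round(3).tolist())
print(below_threshold(acc, baseline=0.6, threshold=0.05).tolist())
```

With `window=3`, the first two values are NaN, and only the window ending at index 4 (one hit in three) falls under the 0.57 alert line. In practice the window should span enough predictions for the accuracy estimate to be stable.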