# Week 23: Production ML - Theory

## MLOps & Production Systems for Quantitative Trading

This notebook covers the theoretical foundations and practical implementations of production ML systems in quantitative finance.

### Learning Objectives
1. Understand MLOps fundamentals and lifecycle management
2. Implement model versioning and experiment tracking
3. Build feature stores for production ML
4. Detect model drift and data drift
5. Design A/B testing frameworks for trading models
6. Create CI/CD pipelines for ML systems

### Prerequisites
- Understanding of ML fundamentals
- Python programming proficiency
- Basic knowledge of trading systems

---

In [None]:
# Standard imports for Production ML
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any, Tuple
from dataclasses import dataclass, field
from abc import ABC, abstractmethod
import json
import hashlib
import pickle
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# For statistical tests
from scipy import stats
from scipy.stats import ks_2samp, chi2_contingency

# ML libraries
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.preprocessing import StandardScaler

# Set random seed for reproducibility
np.random.seed(42)

# Plotting settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

print("✅ All imports successful!")
print(f"📅 Notebook executed on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

---
## 1. Production ML System Architecture

### 1.1 Overview of Production ML Systems

A production ML system in quantitative trading consists of several interconnected components:

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                    PRODUCTION ML TRADING SYSTEM                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐                   │
│  │ Data Sources │───▶│ Data Ingestion│───▶│Feature Store │                   │
│  │ - Market Data│    │ - Validation  │    │ - Compute    │                   │
│  │ - Alt Data   │    │ - Transform   │    │ - Store      │                   │
│  │ - News/Social│    │ - Quality     │    │ - Serve      │                   │
│  └──────────────┘    └───────────────┘    └──────┬───────┘                   │
│                                                  │                           │
│                                                  ▼                           │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐                   │
│  │Model Registry│◀───│Model Training │◀───│ Training     │                   │
│  │ - Versioning │    │ - Hyperparams │    │ Pipeline     │                   │
│  │ - Metadata   │    │ - Validation  │    │              │                   │
│  │ - Artifacts  │    │ - Metrics     │    │              │                   │
│  └──────┬───────┘    └───────────────┘    └──────────────┘                   │
│         │                                                                    │
│         ▼                                                                    │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐                   │
│  │Model Serving │───▶│  Prediction   │───▶│  Execution   │                   │
│  │ - Real-time  │    │  Service      │    │  Engine      │                   │
│  │ - Batch      │    │               │    │              │                   │
│  └──────────────┘    └───────────────┘    └──────┬───────┘                   │
│                                                  │                           │
│                                                  ▼                           │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐                   │
│  │  Monitoring  │◀───│   Logging     │◀───│   Orders     │                   │
│  │ - Drift      │    │ - Audit Trail │    │ - Portfolio  │                   │
│  │ - Perf       │    │ - Metrics     │    │ - Risk       │                   │
│  └──────────────┘    └───────────────┘    └──────────────┘                   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```

### Key Components:
1. **Data Ingestion Pipeline**: Collects, validates, and transforms raw data
2. **Feature Store**: Computes, stores, and serves features
3. **Training Pipeline**: Orchestrates model training and validation
4. **Model Registry**: Tracks model versions and metadata
5. **Serving Infrastructure**: Delivers predictions in real-time or batch
6. **Monitoring System**: Tracks model performance and data drift
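
As one illustration of the monitoring component, the sketch below flags data drift by comparing a feature's live distribution against its training distribution with SciPy's two-sample Kolmogorov-Smirnov test (`ks_2samp`). The helper name `detect_feature_drift`, the 0.05 significance level, and the synthetic samples are assumptions made for illustration; they are not part of the pipeline defined in this notebook.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_feature_drift(reference, live, alpha=0.05):
    """Hypothetical helper: two-sample KS test between training-time and live values."""
    statistic, p_value = ks_2samp(reference, live)
    return {
        "ks_statistic": float(statistic),
        "p_value": float(p_value),
        # Reject "same distribution" when the p-value falls below alpha
        "drift_detected": bool(p_value < alpha),
    }


rng = np.random.default_rng(0)
train_values = rng.normal(loc=0.0, scale=1.0, size=2000)    # reference window
shifted_values = rng.normal(loc=1.0, scale=1.0, size=2000)  # live window with a mean shift

drift_report = detect_feature_drift(train_values, shifted_values)
self_report = detect_feature_drift(train_values, train_values)
print(drift_report)
print(self_report)
```

In practice a check like this would likely run per feature on a schedule, and the threshold would need tuning, since with large samples even economically irrelevant shifts become statistically significant.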

In [None]:
# ============================================================================
# 1.2 Production ML Pipeline Architecture Implementation
# ============================================================================

@dataclass
class PipelineConfig:
    """Configuration for production ML pipeline"""
    name: str
    version: str
    data_source: str
    feature_config: Dict[str, Any]
    model_config: Dict[str, Any]
    serving_config: Dict[str, Any]
    monitoring_config: Dict[str, Any]


class DataIngestionComponent:
    """
    Data Ingestion Component - First stage of production pipeline
    Responsible for collecting, validating, and transforming raw data
    """
    
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.validation_rules = config.get('validation_rules', {})
        self.transform_pipeline = config.get('transforms', [])
        
    def validate_data(self, df: pd.DataFrame) -> Tuple[bool, List[str]]:
        """Validate incoming data against predefined rules"""
        errors = []
        
        # Check for required columns
        required_cols = self.validation_rules.get('required_columns', [])
        missing_cols = set(required_cols) - set(df.columns)
        if missing_cols:
            errors.append(f"Missing columns: {missing_cols}")
        
        # Check for null values in critical columns
        critical_cols = self.validation_rules.get('no_null_columns', [])
        for col in critical_cols:
            if col in df.columns and df[col].isnull().any():
                null_count = df[col].isnull().sum()
                errors.append(f"Column {col} has {null_count} null values")
        
        # Check data types
        expected_types = self.validation_rules.get('column_types', {})
        for col, expected_type in expected_types.items():
            if col in df.columns:
                # is_numeric_dtype also handles pandas extension dtypes (e.g. Int64)
                if expected_type == 'numeric' and not pd.api.types.is_numeric_dtype(df[col]):
                    errors.append(f"Column {col} should be numeric")
                elif expected_type == 'datetime' and not pd.api.types.is_datetime64_any_dtype(df[col]):
                    errors.append(f"Column {col} should be datetime")
        
        # Check value ranges
        value_ranges = self.validation_rules.get('value_ranges', {})
        for col, (min_val, max_val) in value_ranges.items():
            if col in df.columns:
                if df[col].min() < min_val or df[col].max() > max_val:
                    errors.append(f"Column {col} values outside range [{min_val}, {max_val}]")
        
        return len(errors) == 0, errors
    
    def ingest(self, df: pd.DataFrame) -> pd.DataFrame:
        """Ingest and transform data"""
        # Validate
        is_valid, errors = self.validate_data(df)
        if not is_valid:
            print(f"⚠️ Data validation warnings: {errors}")
        
        # Apply transforms
        for transform in self.transform_pipeline:
            df = transform(df)
        
        return df


class FeatureEngineeringComponent:
    """
    Feature Engineering Component
    Computes features from raw data for model training and inference
    """
    
    def __init__(self, feature_definitions: Dict[str, callable]):
        self.feature_definitions = feature_definitions
        self.feature_cache = {}
        
    def compute_features(self, df: pd.DataFrame, cache_key: Optional[str] = None) -> pd.DataFrame:
        """Compute all defined features"""
        features = df.copy()
        
        for feature_name, feature_func in self.feature_definitions.items():
            try:
                features[feature_name] = feature_func(df)
            except Exception as e:
                print(f"Error computing {feature_name}: {e}")
                features[feature_name] = np.nan
        
        # Cache if key provided
        if cache_key:
            self.feature_cache[cache_key] = features
        
        return features
    
    def get_cached_features(self, cache_key: str) -> Optional[pd.DataFrame]:
        """Retrieve cached features"""
        return self.feature_cache.get(cache_key)


class ModelTrainingComponent:
    """
    Model Training Component
    Handles model training, validation, and hyperparameter tuning
    """
    
    def __init__(self, model_class, hyperparameters: Dict[str, Any]):
        self.model_class = model_class
        self.hyperparameters = hyperparameters
        self.model = None
        self.training_metrics = {}
        
    def train(self, X_train: np.ndarray, y_train: np.ndarray, 
              X_val: Optional[np.ndarray] = None, 
              y_val: Optional[np.ndarray] = None) -> Dict[str, float]:
        """Train model and compute metrics"""
        self.model = self.model_class(**self.hyperparameters)
        self.model.fit(X_train, y_train)
        
        # Training metrics
        train_pred = self.model.predict(X_train)
        self.training_metrics['train_accuracy'] = accuracy_score(y_train, train_pred)
        
        if hasattr(self.model, 'predict_proba'):
            train_proba = self.model.predict_proba(X_train)[:, 1]
            self.training_metrics['train_auc'] = roc_auc_score(y_train, train_proba)
        
        # Validation metrics
        if X_val is not None and y_val is not None:
            val_pred = self.model.predict(X_val)
            self.training_metrics['val_accuracy'] = accuracy_score(y_val, val_pred)
            
            if hasattr(self.model, 'predict_proba'):
                val_proba = self.model.predict_proba(X_val)[:, 1]
                self.training_metrics['val_auc'] = roc_auc_score(y_val, val_proba)
        
        return self.training_metrics
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """Make predictions"""
        if self.model is None:
            raise ValueError("Model not trained yet")
        return self.model.predict(X)
    
    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """Get probability predictions"""
        if self.model is None:
            raise ValueError("Model not trained yet")
        return self.model.predict_proba(X)


class ProductionMLPipeline:
    """
    Complete Production ML Pipeline
    Orchestrates all components of the ML system
    """
    
    def __init__(self, config: PipelineConfig):
        self.config = config
        self.data_ingestion = None
        self.feature_engineering = None
        self.model_training = None
        self.is_initialized = False
        
    def initialize(self, ingestion_config: Dict, 
                   feature_definitions: Dict[str, callable],
                   model_class, model_hyperparameters: Dict):
        """Initialize all pipeline components"""
        self.data_ingestion = DataIngestionComponent(ingestion_config)
        self.feature_engineering = FeatureEngineeringComponent(feature_definitions)
        self.model_training = ModelTrainingComponent(model_class, model_hyperparameters)
        self.is_initialized = True
        print(f"✅ Pipeline '{self.config.name}' v{self.config.version} initialized")
        
    def run_training_pipeline(self, raw_data: pd.DataFrame, 
                               target_col: str,
                               feature_cols: List[str],
                               test_size: float = 0.2) -> Dict[str, Any]:
        """Execute full training pipeline"""
        if not self.is_initialized:
            raise ValueError("Pipeline not initialized")
        
        # Step 1: Data Ingestion
        print("📥 Step 1: Data Ingestion...")
        processed_data = self.data_ingestion.ingest(raw_data)
        
        # Step 2: Feature Engineering
        print("🔧 Step 2: Feature Engineering...")
        features = self.feature_engineering.compute_features(processed_data)
        
        # Step 3: Prepare training data
        print("📊 Step 3: Preparing Training Data...")
        X = features[feature_cols].dropna()
        y = features.loc[X.index, target_col]
        
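        # Market data is time-ordered: split chronologically (no shuffling) so the
        # validation set never precedes the training set and leaks future information
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=test_size, shuffle=False
        )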
        
        # Step 4: Model Training
        print("🎯 Step 4: Model Training...")
        metrics = self.model_training.train(X_train.values, y_train.values,
                                            X_val.values, y_val.values)
        
        print("✅ Training Complete!")
        print(f"   Training Accuracy: {metrics.get('train_accuracy', 0):.4f}")
        print(f"   Validation Accuracy: {metrics.get('val_accuracy', 0):.4f}")
        
        return {
            'metrics': metrics,
            'model': self.model_training.model,
            'feature_cols': feature_cols,
            'target_col': target_col
        }

# Confirm that the component classes above are defined
print("Production ML Pipeline Architecture defined!")
print("Components: DataIngestion → FeatureEngineering → ModelTraining → Serving")

In [None]:
# ============================================================================
# 1.3 Demo: Running the Production Pipeline
# ============================================================================

# Generate synthetic trading data
np.random.seed(42)
n_samples = 1000

dates = pd.date_range(start='2020-01-01', periods=n_samples, freq='D')
synthetic_data = pd.DataFrame({
    'date': dates,
    'open': 100 + np.cumsum(np.random.randn(n_samples) * 0.5),
    'high': 100 + np.cumsum(np.random.randn(n_samples) * 0.5) + np.abs(np.random.randn(n_samples)),
    'low': 100 + np.cumsum(np.random.randn(n_samples) * 0.5) - np.abs(np.random.randn(n_samples)),
    'close': 100 + np.cumsum(np.random.randn(n_samples) * 0.5),
    'volume': np.random.randint(1000000, 10000000, n_samples)
})

# Ensure high >= max(open, close) and low <= min(open, close)
synthetic_data['high'] = synthetic_data[['open', 'high', 'close']].max(axis=1)
synthetic_data['low'] = synthetic_data[['open', 'low', 'close']].min(axis=1)

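# Create target: 1 if next day return > 0
synthetic_data['returns'] = synthetic_data['close'].pct_change()
synthetic_data['target'] = (synthetic_data['returns'].shift(-1) > 0).astype(int)
# The final row has no next-day return, so its label would silently default to 0; drop it
synthetic_data = synthetic_data.iloc[:-1].copy()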

print("Synthetic Trading Data:")
print(synthetic_data.head(10))
print(f"\nShape: {synthetic_data.shape}")
print(f"\nTarget Distribution:\n{synthetic_data['target'].value_counts(normalize=True)}")

In [None]:
# Define feature engineering functions
def compute_sma_5(df):
    return df['close'].rolling(window=5).mean()

def compute_sma_20(df):
    return df['close'].rolling(window=20).mean()

def compute_rsi(df, period=14):
    delta = df['close'].diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

def compute_volatility(df, period=20):
    return df['returns'].rolling(window=period).std()

def compute_volume_ma(df, period=10):
    return df['volume'].rolling(window=period).mean()

# Feature definitions
feature_definitions = {
    'sma_5': compute_sma_5,
    'sma_20': compute_sma_20,
    'rsi': compute_rsi,
    'volatility': compute_volatility,
    'volume_ma': compute_volume_ma,
}

# Ingestion config with validation rules
ingestion_config = {
    'validation_rules': {
        'required_columns': ['open', 'high', 'low', 'close', 'volume'],
        'no_null_columns': ['close', 'volume'],
        'column_types': {
            'close': 'numeric',
            'volume': 'numeric'
        },
        'value_ranges': {
            'close': (0, 10000),
            'volume': (0, 1e12)
        }
    },
    'transforms': []
}

# Initialize and run pipeline
config = PipelineConfig(
    name="Trading Signal Predictor",
    version="1.0.0",
    data_source="synthetic",
    feature_config={},
    model_config={},
    serving_config={},
    monitoring_config={}
)

pipeline = ProductionMLPipeline(config)
pipeline.initialize(
    ingestion_config=ingestion_config,
    feature_definitions=feature_definitions,
    model_class=RandomForestClassifier,
    model_hyperparameters={'n_estimators': 100, 'max_depth': 5, 'random_state': 42}
)

# Run training
feature_cols = ['sma_5', 'sma_20', 'rsi', 'volatility', 'volume_ma']
results = pipeline.run_training_pipeline(
    raw_data=synthetic_data,
    target_col='target',
    feature_cols=feature_cols,
    test_size=0.2
)
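
The pipeline above evaluates on a single held-out validation slice. For time-ordered market data, one common alternative is walk-forward validation, where each fold trains only on rows that precede its validation rows. The sketch below uses scikit-learn's `TimeSeriesSplit` on a placeholder array; `X_demo` and the choice of four splits are assumptions for illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical feature matrix, assumed to be sorted by time (oldest row first)
X_demo = np.arange(100).reshape(-1, 1)

splitter = TimeSeriesSplit(n_splits=4)
folds = []
for train_idx, val_idx in splitter.split(X_demo):
    # Record the last training index and the validation index range for each fold
    folds.append((int(train_idx.max()), int(val_idx.min()), int(val_idx.max())))
    print(f"train: 0..{train_idx.max()}  validate: {val_idx.min()}..{val_idx.max()}")
```

Each fold's training window ends before its validation window begins, so metrics averaged across folds are not inflated by information from the future.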

---
## 2. Model Serving Patterns

### 2.1 Overview of Serving Patterns

In production trading systems, there are three main model serving patterns:

| Pattern | Latency | Use Case | Example |
|---------|---------|----------|---------|
| **Batch Prediction** | Minutes-Hours | End-of-day signals, portfolio rebalancing | Daily alpha generation |
| **Real-time Inference** | Microseconds-Milliseconds | Live trading decisions, order routing | HFT signal generation |
| **Streaming Prediction** | Sub-second | Continuous monitoring, event-driven trading | News-based trading |

### Key Considerations for Trading:
- **Latency Requirements**: HFT typically needs microsecond-level latency, while swing trading can tolerate minutes
- **Throughput**: How many predictions per second must the system sustain?
- **Consistency**: The same features must always produce the same predictions
- **Fault Tolerance**: What happens when the model service fails?
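
The fault-tolerance question above can be made concrete with a small wrapper that returns a neutral signal when the model call fails. This is only a sketch: `predict_with_fallback`, the `HOLD` signal, and the `healthy_model` / `flaky_model` stand-ins are hypothetical names, and a production system would typically add timeouts, retries, and alerting on top.

```python
from typing import Any, Callable, Dict

# Hypothetical neutral response used whenever the model cannot be reached
NEUTRAL_RESPONSE: Dict[str, Any] = {"signal": "HOLD", "probability": None, "fallback": True}


def predict_with_fallback(predict_fn: Callable[[Dict[str, float]], Dict[str, Any]],
                          features: Dict[str, float]) -> Dict[str, Any]:
    """Call the model; on any exception return a neutral, clearly flagged response."""
    try:
        result = dict(predict_fn(features))
        result["fallback"] = False
        return result
    except Exception as exc:
        # Copy the template so callers cannot mutate the shared default
        response = dict(NEUTRAL_RESPONSE)
        response["error"] = str(exc)
        return response


def healthy_model(features: Dict[str, float]) -> Dict[str, Any]:
    """Stand-in for a working model service."""
    return {"signal": "BUY", "probability": 0.8}


def flaky_model(features: Dict[str, float]) -> Dict[str, Any]:
    """Stand-in for a model service that is down."""
    raise ConnectionError("model service unavailable")


ok = predict_with_fallback(healthy_model, {"rsi": 55.0})
degraded = predict_with_fallback(flaky_model, {"rsi": 55.0})
print(ok)
print(degraded)
```

Flagging the fallback explicitly lets downstream execution logic distinguish "the model says hold" from "the model was unavailable", which matters for both risk controls and audit trails.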

In [None]:
# ============================================================================
# 2.2 Model Serving Patterns Implementation
# ============================================================================

class BaseModelServer(ABC):
    """Abstract base class for model servers"""
    
    def __init__(self, model, feature_cols: List[str]):
        self.model = model
        self.feature_cols = feature_cols
        self.prediction_count = 0
        self.total_latency = 0
        
    @abstractmethod
    def predict(self, features: Dict[str, float]) -> Dict[str, Any]:
        pass
    
    def get_stats(self) -> Dict[str, float]:
        avg_latency = self.total_latency / self.prediction_count if self.prediction_count > 0 else 0
        return {
            'prediction_count': self.prediction_count,
            'avg_latency_ms': avg_latency * 1000
        }


class BatchPredictionServer(BaseModelServer):
    """
    Batch Prediction Server
    - Processes large volumes of data at once
    - Optimized for throughput over latency
    - Used for end-of-day predictions, portfolio construction
    """
    
    def __init__(self, model, feature_cols: List[str], batch_size: int = 1000):
        super().__init__(model, feature_cols)
        self.batch_size = batch_size
        
    def predict(self, features_df: pd.DataFrame) -> pd.DataFrame:
        """Batch predict on DataFrame"""
        import time
        start_time = time.time()
        
        # Ensure all required features present
        missing_cols = set(self.feature_cols) - set(features_df.columns)
        if missing_cols:
            raise ValueError(f"Missing features: {missing_cols}")
        
        # Process in batches
        all_predictions = []
        all_probabilities = []
        
        for i in range(0, len(features_df), self.batch_size):
            # The pipeline trains on NumPy arrays, so predict on arrays as well
            batch = features_df[self.feature_cols].iloc[i:i+self.batch_size].to_numpy()
            preds = self.model.predict(batch)
            probas = self.model.predict_proba(batch)[:, 1]
            all_predictions.extend(preds)
            all_probabilities.extend(probas)
        
        # Record stats
        end_time = time.time()
        self.prediction_count += len(features_df)
        self.total_latency += (end_time - start_time)
        
        result = features_df.copy()
        result['prediction'] = all_predictions
        result['probability'] = all_probabilities
        
        return result
    
    def predict_for_date(self, features_df: pd.DataFrame, date: str) -> pd.DataFrame:
        """Predict for specific date (typical batch use case)"""
        daily_features = features_df[features_df['date'] == date]
        return self.predict(daily_features)


class RealTimeInferenceServer(BaseModelServer):
    """
    Real-Time Inference Server
    - Single prediction at a time
    - Optimized for low latency
    - Used for live trading decisions
    """
    
    def __init__(self, model, feature_cols: List[str], cache_size: int = 100):
        super().__init__(model, feature_cols)
        self.prediction_cache = {}
        self.cache_size = cache_size
        
    def _get_cache_key(self, features: Dict[str, float]) -> str:
        """Generate cache key from features"""
        sorted_items = sorted(features.items())
        return hashlib.md5(str(sorted_items).encode()).hexdigest()
    
    def predict(self, features: Dict[str, float], use_cache: bool = True) -> Dict[str, Any]:
        """Real-time single prediction"""
        import time
        start_time = time.time()
        
        # Check cache
        if use_cache:
            cache_key = self._get_cache_key(features)
            if cache_key in self.prediction_cache:
                return self.prediction_cache[cache_key]
        
        # Prepare features
        feature_array = np.array([[features.get(col, np.nan) for col in self.feature_cols]])
        
        # Check for missing features
        if np.any(np.isnan(feature_array)):
            return {
                'prediction': None,
                'probability': None,
                'error': 'Missing or invalid features',
                'latency_ms': (time.time() - start_time) * 1000
            }
        
        # Predict
        prediction = int(self.model.predict(feature_array)[0])
        probability = float(self.model.predict_proba(feature_array)[0, 1])
        
        # Record stats
        end_time = time.time()
        latency = end_time - start_time
        self.prediction_count += 1
        self.total_latency += latency
        
        result = {
            'prediction': prediction,
            'probability': probability,
            'signal': 'BUY' if prediction == 1 else 'SELL',
            'confidence': abs(probability - 0.5) * 2,
            'latency_ms': latency * 1000,
            'timestamp': datetime.now().isoformat()
        }
        
        # Cache result
        if use_cache and len(self.prediction_cache) < self.cache_size:
            self.prediction_cache[cache_key] = result
        
        return result


class StreamingPredictionServer(BaseModelServer):
    """
    Streaming Prediction Server
    - Processes continuous stream of data
    - Maintains state across predictions
    - Used for event-driven trading, real-time monitoring
    """
    
    def __init__(self, model, feature_cols: List[str], window_size: int = 100):
        super().__init__(model, feature_cols)
        self.window_size = window_size
        self.feature_buffer = []
        self.prediction_history = []
        self.callbacks = []
        
    def add_callback(self, callback: callable):
        """Add callback for prediction events"""
        self.callbacks.append(callback)
    
    def _notify_callbacks(self, prediction_result: Dict[str, Any]):
        """Notify all registered callbacks"""
        for callback in self.callbacks:
            try:
                callback(prediction_result)
            except Exception as e:
                print(f"Callback error: {e}")
    
    def process_event(self, features: Dict[str, float]) -> Optional[Dict[str, Any]]:
        """Process incoming feature event"""
        import time
        start_time = time.time()
        
        # Add a timestamped copy to the buffer so the caller's dict is not mutated
        self.feature_buffer.append({**features, 'event_time': datetime.now()})
        
        # Keep buffer size manageable
        if len(self.feature_buffer) > self.window_size:
            self.feature_buffer = self.feature_buffer[-self.window_size:]
        
        # Make prediction
        feature_array = np.array([[features.get(col, np.nan) for col in self.feature_cols]])
        
        if np.any(np.isnan(feature_array)):
            return None
        
        prediction = int(self.model.predict(feature_array)[0])
        probability = float(self.model.predict_proba(feature_array)[0, 1])
        
        # Compute prediction momentum (trend in predictions)
        recent_preds = [p.get('prediction', 0) for p in self.prediction_history[-10:]]
        momentum = sum(recent_preds) / len(recent_preds) if recent_preds else 0.5
        
        end_time = time.time()
        self.prediction_count += 1
        self.total_latency += (end_time - start_time)
        
        result = {
            'prediction': prediction,
            'probability': probability,
            'signal': 'BUY' if prediction == 1 else 'SELL',
            'momentum': momentum,
            'buffer_size': len(self.feature_buffer),
            'latency_ms': (end_time - start_time) * 1000,
            'timestamp': datetime.now().isoformat()
        }
        
        self.prediction_history.append(result)
        
        # Notify callbacks
        self._notify_callbacks(result)
        
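        return result
    
    def predict(self, features: Dict[str, float]) -> Optional[Dict[str, Any]]:
        """Implement the abstract predict() so the class can be instantiated"""
        return self.process_event(features)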

print("Model Serving Patterns defined:")
print("  • BatchPredictionServer - For end-of-day processing")
print("  • RealTimeInferenceServer - For live trading")
print("  • StreamingPredictionServer - For event-driven trading")

In [None]:
# ============================================================================
# 2.3 Demo: Model Serving Patterns in Action
# ============================================================================

# Prepare features for serving demo
features_with_target = pipeline.feature_engineering.compute_features(synthetic_data)
features_with_target = features_with_target.dropna()

# Initialize servers with trained model
batch_server = BatchPredictionServer(results['model'], feature_cols, batch_size=100)
realtime_server = RealTimeInferenceServer(results['model'], feature_cols)
streaming_server = StreamingPredictionServer(results['model'], feature_cols)

# Demo 1: Batch Prediction
print("=" * 60)
print("BATCH PREDICTION DEMO")
print("=" * 60)
batch_results = batch_server.predict(features_with_target)
print(f"Processed {len(batch_results)} predictions")
print(f"Stats: {batch_server.get_stats()}")
print(f"\nSample predictions:")
print(batch_results[['close', 'sma_5', 'rsi', 'prediction', 'probability']].head())

# Demo 2: Real-time Prediction
print("\n" + "=" * 60)
print("REAL-TIME INFERENCE DEMO")
print("=" * 60)
sample_features = features_with_target[feature_cols].iloc[100].to_dict()
print(f"Input features: {sample_features}")
rt_result = realtime_server.predict(sample_features)
print(f"Prediction: {rt_result}")
print(f"Stats: {realtime_server.get_stats()}")

# Demo 3: Streaming Prediction
print("\n" + "=" * 60)
print("STREAMING PREDICTION DEMO")
print("=" * 60)

# Add callback for streaming
def trading_callback(result):
    if result['probability'] > 0.7:
        print(f"  🟢 Strong BUY signal: {result['probability']:.2%}")
    elif result['probability'] < 0.3:
        print(f"  🔴 Strong SELL signal: {result['probability']:.2%}")

streaming_server.add_callback(trading_callback)

# Simulate stream of events
print("Processing stream of events...")
for i in range(5):
    event_features = features_with_target[feature_cols].iloc[100+i].to_dict()
    stream_result = streaming_server.process_event(event_features)
    print(f"Event {i+1}: Signal={stream_result['signal']}, Prob={stream_result['probability']:.2%}")

print(f"\nStreaming Stats: {streaming_server.get_stats()}")

---
## 3. Feature Store Concepts

### 3.1 What is a Feature Store?

A **Feature Store** is a centralized repository for storing, managing, and serving machine learning features. In quantitative trading, it serves as the bridge between raw market data and model-ready features.

### Key Benefits:
1. **Feature Reuse**: Compute once, use across multiple models
2. **Consistency**: Same features for training and inference
3. **Point-in-Time Correctness**: Prevent look-ahead bias
4. **Versioning**: Track feature definitions and values over time
5. **Low Latency Serving**: Pre-computed features for real-time inference
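
Point-in-time correctness is the benefit that most directly protects a backtest, so a small example may help. The sketch below uses `pandas.merge_asof` to attach, to each request time, the latest feature value published at or before that time. The `feature_log` and `requests` frames, the symbol, and the values are made-up illustrations, not data from this notebook.

```python
import pandas as pd

# Hypothetical feature values, stamped with the time they became available
feature_log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 09:30", "2024-01-01 10:30", "2024-01-01 11:30"]),
    "symbol": ["AAPL", "AAPL", "AAPL"],
    "sma_5": [100.0, 101.0, 102.0],
})

# Times at which a model needs features (training labels or live requests)
requests = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 10:00", "2024-01-01 11:45"]),
    "symbol": ["AAPL", "AAPL"],
})

# direction="backward" picks the latest feature row at or before each request time,
# so a request can never see a value published after it
point_in_time = pd.merge_asof(
    requests.sort_values("timestamp"),
    feature_log.sort_values("timestamp"),
    on="timestamp",
    by="symbol",
    direction="backward",
)
print(point_in_time)
```

The 10:00 request receives the 09:30 value rather than the 10:30 one, which is exactly the guarantee that prevents look-ahead bias when building training sets.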

### Feature Store Architecture:
```
┌──────────────────────────────────────────────────────────────┐
│                        FEATURE STORE                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────┐          ┌──────────────────┐          │
│  │ Feature Registry │          │ Feature Catalog  │          │
│  │ - Definitions    │          │ - Search         │          │
│  │ - Metadata       │◀────────▶│ - Documentation  │          │
│  │ - Versions       │          │ - Lineage        │          │
│  └──────────────────┘          └──────────────────┘          │
│                                                              │
│  ┌──────────────────┐          ┌──────────────────┐          │
│  │ Offline Store    │          │ Online Store     │          │
│  │ - Historical     │          │ - Low Latency    │          │
│  │ - Training Data  │          │ - Real-time      │          │
│  │ - Batch Jobs     │◀────────▶│ - Serving        │          │
│  └──────────────────┘          └──────────────────┘          │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
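
The arrow between the offline and online stores in the diagram can be read as a materialization step: take the most recent row per entity from the historical table and push it into a key-value store. The following is a minimal sketch under that assumption. The `offline` table, its values, and the `"entity:feature"` key format are illustrative choices, and a plain dict stands in for a real low-latency store.

```python
import pandas as pd

# Hypothetical offline table: one row per (symbol, date) with feature columns
offline = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03"]),
    "sma_5": [100.0, 101.0, 200.0, 199.0],
    "rsi": [55.0, 60.0, 45.0, 40.0],
})

# Keep only the most recent row per symbol
latest = (
    offline.sort_values("date")
    .groupby("symbol")
    .tail(1)
    .set_index("symbol")
    .drop(columns="date")
)

# Online store sketched as a plain dict keyed by "entity:feature"
online = {
    f"{symbol}:{name}": float(value)
    for symbol, row in latest.iterrows()
    for name, value in row.items()
}
print(online["AAPL:rsi"])
```

Because both stores are filled from the same computed values, training and serving see identical feature definitions, which is the consistency benefit listed earlier.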

In [None]:
# ============================================================================
# 3.2 Feature Store Implementation
# ============================================================================

@dataclass
class FeatureDefinition:
    """Definition of a single feature"""
    name: str
    description: str
    dtype: str
    computation_func: callable
    dependencies: List[str] = field(default_factory=list)
    version: str = "1.0.0"
    created_at: datetime = field(default_factory=datetime.now)
    tags: List[str] = field(default_factory=list)
    
    def to_dict(self) -> Dict:
        return {
            'name': self.name,
            'description': self.description,
            'dtype': self.dtype,
            'dependencies': self.dependencies,
            'version': self.version,
            'created_at': self.created_at.isoformat(),
            'tags': self.tags
        }


class FeatureRegistry:
    """Registry for feature definitions"""
    
    def __init__(self):
        self.features: Dict[str, FeatureDefinition] = {}
        self.feature_groups: Dict[str, List[str]] = {}
        
    def register(self, feature: FeatureDefinition, group: Optional[str] = None):
        """Register a feature definition"""
        self.features[feature.name] = feature
        
        if group:
            if group not in self.feature_groups:
                self.feature_groups[group] = []
            self.feature_groups[group].append(feature.name)
        
        print(f"✅ Registered feature: {feature.name} (v{feature.version})")
    
    def get(self, name: str) -> Optional[FeatureDefinition]:
        """Get feature definition by name"""
        return self.features.get(name)
    
    def list_features(self, group: Optional[str] = None) -> List[str]:
        """List all features or features in a group"""
        if group:
            return self.feature_groups.get(group, [])
        return list(self.features.keys())
    
    def search(self, query: str) -> List[FeatureDefinition]:
        """Search features by name, description, or tag (case-insensitive)"""
        q = query.lower()
        results = []
        for feature in self.features.values():
            # Tags are included so that tag-only terms (e.g. 'momentum') are found
            haystack = [feature.name, feature.description, *feature.tags]
            if any(q in text.lower() for text in haystack):
                results.append(feature)
        return results


class OfflineFeatureStore:
    """
    Offline Feature Store for historical data
    - Used for training data preparation
    - Supports point-in-time queries (critical for avoiding look-ahead bias)
    """
    
    def __init__(self, registry: FeatureRegistry):
        self.registry = registry
        self.feature_data: Dict[str, pd.DataFrame] = {}
        self.computation_log = []
        
    def compute_and_store(self, feature_name: str, 
                          source_data: pd.DataFrame,
                          timestamp_col: str = 'date') -> pd.DataFrame:
        """Compute feature values and store them"""
        feature_def = self.registry.get(feature_name)
        if not feature_def:
            raise ValueError(f"Feature {feature_name} not registered")
        
        # Compute feature
        start_time = datetime.now()
        feature_values = feature_def.computation_func(source_data)
        end_time = datetime.now()
        
        # Create feature dataframe with timestamp
        feature_df = pd.DataFrame({
            timestamp_col: source_data[timestamp_col],
            feature_name: feature_values,
            '_computed_at': end_time,
            '_version': feature_def.version
        })
        
        self.feature_data[feature_name] = feature_df
        
        # Log computation
        self.computation_log.append({
            'feature': feature_name,
            'computed_at': end_time,
            'duration_seconds': (end_time - start_time).total_seconds(),
            'row_count': len(feature_df)
        })
        
        return feature_df
    
    def get_features_point_in_time(self, feature_names: List[str],
                                    entity_df: pd.DataFrame,
                                    timestamp_col: str = 'date') -> pd.DataFrame:
        """
        Get features at specific points in time
        Critical for avoiding look-ahead bias in training data!
        """
        result = entity_df.copy()
        
        for feature_name in feature_names:
            if feature_name not in self.feature_data:
                print(f"⚠️ Feature {feature_name} not computed")
                result[feature_name] = np.nan
                continue
            
            feature_df = self.feature_data[feature_name]
            
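            # Point-in-time (as-of) join: each entity row receives the most recent
            # feature value stamped at or before its own timestamp, never a later one
            result = pd.merge_asof(
                result.sort_values(timestamp_col),
                feature_df[[timestamp_col, feature_name]].sort_values(timestamp_col),
                on=timestamp_col,
                direction='backward'
            )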
        
        return result
    
    def get_training_data(self, feature_names: List[str],
                          start_date: str, end_date: str,
                          timestamp_col: str = 'date') -> pd.DataFrame:
        """Get training data for date range"""
        # Create entity dataframe for date range
        dates = pd.date_range(start=start_date, end=end_date, freq='D')
        entity_df = pd.DataFrame({timestamp_col: dates})
        
        return self.get_features_point_in_time(feature_names, entity_df, timestamp_col)


class OnlineFeatureStore:
    """
    Online Feature Store for real-time serving
    - Low latency feature retrieval
    - Caching for frequently accessed features
    """
    
    def __init__(self, registry: FeatureRegistry, cache_ttl_seconds: int = 60):
        self.registry = registry
        self.feature_cache: Dict[str, Dict[str, Any]] = {}
        self.cache_ttl = cache_ttl_seconds
        self.cache_hits = 0
        self.cache_misses = 0
        
    def set_feature(self, entity_id: str, feature_name: str, value: float):
        """Set feature value for an entity (e.g., a symbol)"""
        cache_key = f"{entity_id}:{feature_name}"
        self.feature_cache[cache_key] = {
            'value': value,
            'timestamp': datetime.now(),
            'ttl': self.cache_ttl
        }
    
    def get_feature(self, entity_id: str, feature_name: str) -> Optional[float]:
        """Get feature value for an entity"""
        cache_key = f"{entity_id}:{feature_name}"
        
        if cache_key in self.feature_cache:
            cached = self.feature_cache[cache_key]
            age = (datetime.now() - cached['timestamp']).total_seconds()
            
            if age < cached['ttl']:
                self.cache_hits += 1
                return cached['value']
        
        self.cache_misses += 1
        return None
    
    def get_features(self, entity_id: str, feature_names: List[str]) -> Dict[str, Optional[float]]:
        """Get multiple features for an entity (None for missing or expired values)"""
        return {
            name: self.get_feature(entity_id, name)
            for name in feature_names
        }
    
    def bulk_update(self, entity_id: str, features: Dict[str, float]):
        """Bulk update features for an entity"""
        for name, value in features.items():
            self.set_feature(entity_id, name, value)
    
    def get_cache_stats(self) -> Dict[str, Any]:
        """Get cache statistics"""
        total_requests = self.cache_hits + self.cache_misses
        hit_rate = self.cache_hits / total_requests if total_requests > 0 else 0
        
        return {
            'cache_hits': self.cache_hits,
            'cache_misses': self.cache_misses,
            'hit_rate': hit_rate,
            'cache_size': len(self.feature_cache)
        }


class TradingFeatureStore:
    """
    Unified Feature Store for Trading
    Combines offline and online stores with the registry
    """
    
    def __init__(self):
        self.registry = FeatureRegistry()
        self.offline_store = OfflineFeatureStore(self.registry)
        self.online_store = OnlineFeatureStore(self.registry)
        
    def register_feature(self, name: str, description: str, 
                         computation_func: callable,
                         dtype: str = 'float64',
                         group: Optional[str] = None,
                         tags: Optional[List[str]] = None):
        """Register a new feature"""
        feature = FeatureDefinition(
            name=name,
            description=description,
            dtype=dtype,
            computation_func=computation_func,
            tags=tags or []
        )
        self.registry.register(feature, group)
        
    def materialize_offline(self, feature_names: List[str],
                            source_data: pd.DataFrame,
                            timestamp_col: str = 'date'):
        """Compute and store features in offline store"""
        for name in feature_names:
            self.offline_store.compute_and_store(name, source_data, timestamp_col)
    
    def sync_to_online(self, entity_id: str, features: Dict[str, float]):
        """Sync features to online store"""
        self.online_store.bulk_update(entity_id, features)
    
    def get_training_features(self, feature_names: List[str],
                              start_date: str, end_date: str) -> pd.DataFrame:
        """Get features for model training"""
        return self.offline_store.get_training_data(feature_names, start_date, end_date)
    
    def get_serving_features(self, entity_id: str, 
                             feature_names: List[str]) -> Dict[str, Optional[float]]:
        """Get features for real-time serving"""
        return self.online_store.get_features(entity_id, feature_names)

print("Feature Store implementation complete!")
print("Components: FeatureRegistry, OfflineFeatureStore, OnlineFeatureStore, TradingFeatureStore")

In [None]:
# ============================================================================
# 3.3 Demo: Feature Store in Action
# ============================================================================

# Initialize feature store
feature_store = TradingFeatureStore()

# Register trading features
feature_store.register_feature(
    name='sma_5',
    description='5-day Simple Moving Average of close price',
    computation_func=lambda df: df['close'].rolling(window=5).mean(),
    group='technical_indicators',
    tags=['price', 'trend', 'momentum']
)

feature_store.register_feature(
    name='sma_20',
    description='20-day Simple Moving Average of close price',
    computation_func=lambda df: df['close'].rolling(window=20).mean(),
    group='technical_indicators',
    tags=['price', 'trend', 'momentum']
)

feature_store.register_feature(
    name='rsi',
    description='14-day Relative Strength Index',
    computation_func=compute_rsi,
    group='technical_indicators',
    tags=['momentum', 'oscillator']
)

feature_store.register_feature(
    name='volatility',
    description='20-day rolling volatility of returns',
    computation_func=compute_volatility,
    group='risk_metrics',
    tags=['risk', 'volatility']
)

feature_store.register_feature(
    name='volume_ma',
    description='10-day moving average of volume',
    computation_func=compute_volume_ma,
    group='volume_indicators',
    tags=['volume', 'liquidity']
)

# Materialize features to offline store
print("\n📦 Materializing features to offline store...")
feature_store.materialize_offline(
    feature_names=['sma_5', 'sma_20', 'rsi', 'volatility', 'volume_ma'],
    source_data=synthetic_data,
    timestamp_col='date'
)

# Get training data
print("\n📊 Getting training data...")
training_data = feature_store.get_training_features(
    feature_names=['sma_5', 'sma_20', 'rsi'],
    start_date='2020-06-01',
    end_date='2020-12-31'
)
print(f"Training data shape: {training_data.shape}")
print(training_data.head())

# Sync latest features to online store for a symbol
print("\n🔄 Syncing to online store...")
latest_features = {
    'sma_5': 105.23,
    'sma_20': 102.15,
    'rsi': 65.5,
    'volatility': 0.023,
    'volume_ma': 5000000
}
feature_store.sync_to_online('AAPL', latest_features)

# Get features for real-time serving
serving_features = feature_store.get_serving_features(
    entity_id='AAPL',
    feature_names=['sma_5', 'sma_20', 'rsi']
)
print(f"\nServing features for AAPL: {serving_features}")

# Search features
print("\n🔍 Searching for 'momentum' features:")
momentum_features = feature_store.registry.search('momentum')
for f in momentum_features:
    print(f"  - {f.name}: {f.description}")
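
The demo above reads features immediately after writing them, so the online store's time-to-live never comes into play. The sketch below isolates that expiry behavior in a toy cache. `TTLCache` is a hypothetical helper rather than part of the feature store classes, and it takes the current time as an argument (an assumption made here so that expiry can be checked deterministically).

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


class TTLCache:
    """Toy TTL cache; `now` is passed in so expiry is easy to test."""

    def __init__(self, ttl_seconds: float):
        self.ttl = timedelta(seconds=ttl_seconds)
        self._data: Dict[str, Tuple[float, datetime]] = {}

    def set(self, key: str, value: float, now: datetime) -> None:
        self._data[key] = (value, now)

    def get(self, key: str, now: datetime) -> Optional[float]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at >= self.ttl:
            del self._data[key]  # evict the stale entry
            return None
        return value


t0 = datetime(2020, 1, 1, 9, 30, 0)
cache = TTLCache(ttl_seconds=60)
cache.set("AAPL:rsi", 65.5, now=t0)

fresh = cache.get("AAPL:rsi", now=t0 + timedelta(seconds=30))
stale = cache.get("AAPL:rsi", now=t0 + timedelta(seconds=90))
print(fresh, stale)
```

In a trading context, returning `None` for a stale value is usually safer than silently serving an old one, since the caller can then decide whether to recompute the feature or skip the signal.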

---
## 4. Model Monitoring and Drift Detection

### 4.1 Types of Drift in Production ML

In production trading systems, **drift** refers to changes in the statistical properties of data over time that can degrade model performance.

#### Types of Drift:

| Type | Description | Example in Trading |
|------|-------------|-------------------|
| **Data Drift** | Change in feature distributions | Volume patterns shift due to market structure changes |
| **Concept Drift** | Change in relationship between features and target | Market regime change (bull to bear) |
| **Prediction Drift** | Change in model output distribution | Model becomes overly bullish/bearish |
| **Label Drift** | Change in target variable distribution | Volatility regime change |
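
To make the difference between the first two rows concrete, the sketch below simulates both on synthetic data: data drift shifts the feature distribution while leaving P(y | x) unchanged, and concept drift leaves the feature distribution alone but flips the relationship. The logistic label model, the sample size, and the fixed rule standing in for a trained model are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
n = 5000


def make_labels(x, slope):
    # P(y=1 | x) follows a logistic curve with the given slope
    p = 1.0 / (1.0 + np.exp(-slope * x))
    return (rng.random(len(x)) < p).astype(int)


# Reference period
x_ref = rng.normal(0.0, 1.0, n)
y_ref = make_labels(x_ref, slope=2.0)

# Data drift: the feature distribution shifts, P(y | x) is unchanged
x_data = rng.normal(1.0, 1.0, n)
y_data = make_labels(x_data, slope=2.0)

# Concept drift: same feature distribution, but the relationship flips sign
x_concept = rng.normal(0.0, 1.0, n)
y_concept = make_labels(x_concept, slope=-2.0)

ks_data = ks_2samp(x_ref, x_data).statistic
ks_concept = ks_2samp(x_ref, x_concept).statistic

# A fixed rule "predict 1 when x > 0" stands in for a trained model
acc_ref = np.mean((x_ref > 0).astype(int) == y_ref)
acc_concept = np.mean((x_concept > 0).astype(int) == y_concept)

print(f"KS under data drift:    {ks_data:.3f}")
print(f"KS under concept drift: {ks_concept:.3f}")
print(f"Accuracy: reference={acc_ref:.3f}, concept drift={acc_concept:.3f}")
```

The feature-level KS statistic reacts to data drift but stays near zero under concept drift, even though accuracy collapses. This is why feature monitoring alone is not enough and performance monitoring is needed alongside it.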

### 4.2 Statistical Tests for Drift Detection

Common statistical tests and distance measures used for drift detection:

1. **Kolmogorov-Smirnov (KS) Test**: Compares the empirical CDFs of two samples; suited to continuous features
2. **Population Stability Index (PSI)**: Measures distribution shift across bins defined on the reference data
3. **Chi-Square Test**: Compares category frequencies for categorical variables
4. **Wasserstein Distance**: Earth mover's distance between distributions, expressed in the feature's units
5. **Jensen-Shannon Divergence**: Symmetric, bounded divergence measure
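
As a brief illustration of two measures from this list, the sketch below applies SciPy's `wasserstein_distance` to a continuous feature and `chi2_contingency` to a categorical one. The samples and the order-type counts are made up for the example.

```python
import numpy as np
from scipy.stats import wasserstein_distance, chi2_contingency

rng = np.random.default_rng(1)

# Continuous feature: the Wasserstein distance is in the feature's own units,
# so a pure mean shift of 0.5 should give a distance close to 0.5
ref = rng.normal(0.0, 1.0, 2000)
cur = rng.normal(0.5, 1.0, 2000)
w_dist = wasserstein_distance(ref, cur)

# Categorical feature: hypothetical order-type counts per period
#                   market  limit  stop
counts = np.array([[500,    400,   100],    # reference period
                   [350,    450,   200]])   # current period
chi2, p_value, dof, expected = chi2_contingency(counts)

print(f"Wasserstein distance: {w_dist:.3f}")
print(f"Chi-square: {chi2:.2f} (dof={dof}), p-value: {p_value:.2e}")
```

With large samples a chi-square or KS p-value becomes significant for very small shifts, so effect-size style measures such as PSI or the Wasserstein distance are often easier to threshold in practice.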

In [None]:
# ============================================================================
# 4.3 Drift Detection Implementation
# ============================================================================

class DriftDetector:
    """
    Comprehensive drift detection for production ML
    Implements multiple statistical tests for monitoring
    """
    
    def __init__(self, reference_data: pd.DataFrame, feature_cols: List[str]):
        self.reference_data = reference_data
        self.feature_cols = feature_cols
        self.thresholds = {
            'ks_statistic': 0.1,      # KS test threshold
            'psi': 0.2,               # PSI threshold (0.1-0.2 = moderate, > 0.2 = significant)
            'js_divergence': 0.1      # Jensen-Shannon divergence threshold
        }
        self.drift_history = []
        
    def compute_psi(self, expected: np.ndarray, actual: np.ndarray, 
                    n_bins: int = 10) -> float:
        """
        Population Stability Index (PSI)
        PSI < 0.1: No significant shift
        PSI 0.1-0.2: Moderate shift, monitor closely
        PSI > 0.2: Significant shift, investigate
        """
        # Create bins based on expected distribution
        breakpoints = np.percentile(expected, np.linspace(0, 100, n_bins + 1))
        breakpoints[0] = -np.inf
        breakpoints[-1] = np.inf
        
        # Calculate proportions in each bin
        expected_counts = np.histogram(expected, bins=breakpoints)[0]
        actual_counts = np.histogram(actual, bins=breakpoints)[0]
        
        # Normalize to proportions
        expected_props = expected_counts / len(expected)
        actual_props = actual_counts / len(actual)
        
        # Avoid division by zero
        expected_props = np.where(expected_props == 0, 0.0001, expected_props)
        actual_props = np.where(actual_props == 0, 0.0001, actual_props)
        
        # Calculate PSI
        psi = np.sum((actual_props - expected_props) * np.log(actual_props / expected_props))
        
        return psi
    
    def compute_ks_statistic(self, expected: np.ndarray, actual: np.ndarray) -> Tuple[float, float]:
        """
        Kolmogorov-Smirnov test
        Returns (statistic, p-value)
        """
        statistic, p_value = ks_2samp(expected, actual)
        return statistic, p_value
    
    def compute_js_divergence(self, expected: np.ndarray, actual: np.ndarray,
                               n_bins: int = 50) -> float:
        """
        Jensen-Shannon Divergence
        Symmetric measure of distribution similarity
        """
        # Create histogram bins
        all_data = np.concatenate([expected, actual])
        bins = np.histogram_bin_edges(all_data, bins=n_bins)
        
        # Get distributions
        p = np.histogram(expected, bins=bins, density=True)[0]
        q = np.histogram(actual, bins=bins, density=True)[0]
        
        # Normalize
        p = p / p.sum() if p.sum() > 0 else p
        q = q / q.sum() if q.sum() > 0 else q
        
        # Add small epsilon to avoid log(0)
        epsilon = 1e-10
        p = p + epsilon
        q = q + epsilon
        
        # Compute JS divergence
        m = 0.5 * (p + q)
        js_div = 0.5 * (np.sum(p * np.log(p / m)) + np.sum(q * np.log(q / m)))
        
        return js_div
    
    def detect_feature_drift(self, current_data: pd.DataFrame) -> Dict[str, Dict[str, Any]]:
        """Detect drift for all features"""
        results = {}
        
        for col in self.feature_cols:
            if col not in current_data.columns or col not in self.reference_data.columns:
                continue
            
            reference = self.reference_data[col].dropna().values
            current = current_data[col].dropna().values
            
            if len(reference) < 10 or len(current) < 10:
                continue
            
            # Compute metrics
            psi = self.compute_psi(reference, current)
            ks_stat, ks_pvalue = self.compute_ks_statistic(reference, current)
            js_div = self.compute_js_divergence(reference, current)
            
            # Determine drift status
            drift_detected = (
                psi > self.thresholds['psi'] or 
                ks_stat > self.thresholds['ks_statistic'] or
                js_div > self.thresholds['js_divergence']
            )
            
            results[col] = {
                'psi': psi,
                'ks_statistic': ks_stat,
                'ks_pvalue': ks_pvalue,
                'js_divergence': js_div,
                'drift_detected': drift_detected,
                'reference_mean': np.mean(reference),
                'current_mean': np.mean(current),
                'reference_std': np.std(reference),
                'current_std': np.std(current)
            }
        
        # Log results
        self.drift_history.append({
            'timestamp': datetime.now(),
            'results': results
        })
        
        return results
    
    def get_drift_summary(self, results: Dict[str, Dict[str, Any]]) -> pd.DataFrame:
        """Create summary DataFrame of drift results"""
        summary_data = []
        
        for feature, metrics in results.items():
            summary_data.append({
                'feature': feature,
                'psi': metrics['psi'],
                'ks_statistic': metrics['ks_statistic'],
                'ks_pvalue': metrics['ks_pvalue'],
                'js_divergence': metrics['js_divergence'],
                'drift_detected': metrics['drift_detected'],
                'mean_shift': metrics['current_mean'] - metrics['reference_mean'],
                'std_ratio': metrics['current_std'] / metrics['reference_std'] if metrics['reference_std'] > 0 else 1
            })
        
        return pd.DataFrame(summary_data)


class ModelPerformanceMonitor:
    """
    Monitor model performance metrics over time
    Detect performance degradation
    """
    
    def __init__(self, baseline_metrics: Dict[str, float]):
        self.baseline_metrics = baseline_metrics
        self.performance_history = []
        self.alert_thresholds = {
            'accuracy_drop': 0.05,    # Alert if accuracy falls 0.05 (5 points) below baseline
            'auc_drop': 0.05,         # Alert if AUC falls 0.05 below baseline
            'precision_drop': 0.1,    # Alert if precision falls 0.10 below baseline
            'recall_drop': 0.1        # Alert if recall falls 0.10 below baseline
        }
        
    def compute_metrics(self, y_true: np.ndarray, y_pred: np.ndarray,
                        y_proba: Optional[np.ndarray] = None) -> Dict[str, float]:
        """Compute performance metrics"""
        metrics = {
            'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision_score(y_true, y_pred, zero_division=0),
            'recall': recall_score(y_true, y_pred, zero_division=0),
            'f1': f1_score(y_true, y_pred, zero_division=0)
        }
        
        if y_proba is not None:
            try:
                metrics['auc'] = roc_auc_score(y_true, y_proba)
            except ValueError:
                # AUC is undefined when y_true contains a single class; fall back to chance level
                metrics['auc'] = 0.5
        
        return metrics
    
    def log_performance(self, y_true: np.ndarray, y_pred: np.ndarray,
                        y_proba: Optional[np.ndarray] = None,
                        timestamp: Optional[datetime] = None) -> Dict[str, Any]:
        """Log and analyze performance"""
        timestamp = timestamp or datetime.now()
        metrics = self.compute_metrics(y_true, y_pred, y_proba)
        
        # Check for degradation
        alerts = []
        for metric_name, current_value in metrics.items():
            baseline_value = self.baseline_metrics.get(metric_name, current_value)
            drop = baseline_value - current_value
            
            threshold_key = f"{metric_name}_drop"
            threshold = self.alert_thresholds.get(threshold_key, 0.1)
            
            if drop > threshold:
                alerts.append({
                    'metric': metric_name,
                    'baseline': baseline_value,
                    'current': current_value,
                    'drop': drop,
                    'threshold': threshold
                })
        
        result = {
            'timestamp': timestamp,
            'metrics': metrics,
            'alerts': alerts,
            'degraded': len(alerts) > 0
        }
        
        self.performance_history.append(result)
        
        return result
    
    def get_performance_trend(self, metric_name: str = 'accuracy') -> pd.DataFrame:
        """Get historical trend for a metric"""
        data = []
        for record in self.performance_history:
            data.append({
                'timestamp': record['timestamp'],
                'value': record['metrics'].get(metric_name, np.nan),
                'baseline': self.baseline_metrics.get(metric_name, np.nan)
            })
        return pd.DataFrame(data)

print("Drift Detection and Performance Monitoring implemented!")
print("Classes: DriftDetector, ModelPerformanceMonitor")

In [None]:
# ============================================================================
# 4.4 Demo: Drift Detection in Action
# ============================================================================

# Split data into reference (training) and current (production) periods
reference_period = features_with_target[features_with_target['date'] < '2020-07-01']
current_period = features_with_target[features_with_target['date'] >= '2020-07-01']

# Simulate drift by modifying current data
drifted_current = current_period.copy()
drifted_current['sma_5'] = drifted_current['sma_5'] * 1.2  # 20% increase
drifted_current['volatility'] = drifted_current['volatility'] * 1.5  # 50% increase
drifted_current['rsi'] = drifted_current['rsi'] + np.random.randn(len(drifted_current)) * 10

# Initialize drift detector
detector = DriftDetector(
    reference_data=reference_period,
    feature_cols=feature_cols
)

# Detect drift
print("=" * 60)
print("DRIFT DETECTION RESULTS")
print("=" * 60)

# Test with non-drifted data
print("\n1. Testing with non-drifted production data:")
results_no_drift = detector.detect_feature_drift(current_period)
summary_no_drift = detector.get_drift_summary(results_no_drift)
print(summary_no_drift.to_string(index=False))

# Test with drifted data
print("\n2. Testing with artificially drifted data:")
results_drift = detector.detect_feature_drift(drifted_current)
summary_drift = detector.get_drift_summary(results_drift)
print(summary_drift.to_string(index=False))

# Visualize drift
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: PSI comparison
ax1 = axes[0, 0]
x = range(len(feature_cols))
psi_no_drift = [results_no_drift[col]['psi'] for col in feature_cols]
psi_drift = [results_drift[col]['psi'] for col in feature_cols]
width = 0.35
ax1.bar([i - width/2 for i in x], psi_no_drift, width, label='No Drift', alpha=0.8)
ax1.bar([i + width/2 for i in x], psi_drift, width, label='With Drift', alpha=0.8)
ax1.axhline(y=0.2, color='r', linestyle='--', label='PSI Threshold')
ax1.set_xlabel('Feature')
ax1.set_ylabel('PSI Score')
ax1.set_title('Population Stability Index (PSI)')
ax1.set_xticks(x)
ax1.set_xticklabels(feature_cols, rotation=45)
ax1.legend()

# Plot 2: KS Statistics
ax2 = axes[0, 1]
ks_no_drift = [results_no_drift[col]['ks_statistic'] for col in feature_cols]
ks_drift = [results_drift[col]['ks_statistic'] for col in feature_cols]
ax2.bar([i - width/2 for i in x], ks_no_drift, width, label='No Drift', alpha=0.8)
ax2.bar([i + width/2 for i in x], ks_drift, width, label='With Drift', alpha=0.8)
ax2.axhline(y=0.1, color='r', linestyle='--', label='KS Threshold')
ax2.set_xlabel('Feature')
ax2.set_ylabel('KS Statistic')
ax2.set_title('Kolmogorov-Smirnov Test')
ax2.set_xticks(x)
ax2.set_xticklabels(feature_cols, rotation=45)
ax2.legend()

# Plot 3: Distribution comparison for sma_5
ax3 = axes[1, 0]
ax3.hist(reference_period['sma_5'].dropna(), bins=30, alpha=0.5, label='Reference', density=True)
ax3.hist(current_period['sma_5'].dropna(), bins=30, alpha=0.5, label='Current (No Drift)', density=True)
ax3.hist(drifted_current['sma_5'].dropna(), bins=30, alpha=0.5, label='Current (With Drift)', density=True)
ax3.set_xlabel('SMA-5 Value')
ax3.set_ylabel('Density')
ax3.set_title('SMA-5 Distribution Comparison')
ax3.legend()

# Plot 4: Distribution comparison for volatility
ax4 = axes[1, 1]
ax4.hist(reference_period['volatility'].dropna(), bins=30, alpha=0.5, label='Reference', density=True)
ax4.hist(current_period['volatility'].dropna(), bins=30, alpha=0.5, label='Current (No Drift)', density=True)
ax4.hist(drifted_current['volatility'].dropna(), bins=30, alpha=0.5, label='Current (With Drift)', density=True)
ax4.set_xlabel('Volatility Value')
ax4.set_ylabel('Density')
ax4.set_title('Volatility Distribution Comparison')
ax4.legend()

plt.tight_layout()
plt.show()

# Alert summary
print("\n" + "=" * 60)
print("DRIFT ALERTS")
print("=" * 60)
for col in feature_cols:
    if results_drift[col]['drift_detected']:
        print(f"⚠️  DRIFT DETECTED in '{col}':")
        print(f"    PSI: {results_drift[col]['psi']:.4f}")
        print(f"    KS Statistic: {results_drift[col]['ks_statistic']:.4f}")

---
## 5. A/B Testing Framework

### 5.1 A/B Testing in Trading Systems

A/B testing in quantitative trading allows us to compare different models or strategies in a controlled manner. Unlike typical web A/B tests, trading A/B tests have unique challenges:

#### Key Considerations:
1. **Non-IID Data**: Market data is autocorrelated, so tests that assume independent samples can overstate significance
2. **Market Impact**: Running multiple strategies may affect outcomes
3. **Capital Allocation**: How to split capital between strategies
4. **Statistical Power**: Need sufficient trades for significance
5. **Time-Based Effects**: Market regimes shift during a test, so results from one period may not carry over to the next
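
The statistical power point can be made concrete with a standard sample-size approximation for a two-sided, two-sample z-test on means: n per arm ≈ 2 (z₁₋α/₂ + z_power)² / d², where d is the standardized effect size (Cohen's d). The sketch below implements that formula. `samples_per_arm` is a hypothetical helper name, and the calculation assumes independent observations, which trading returns generally are not.

```python
import math

from scipy.stats import norm


def samples_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate observations needed per variant for a two-sided z-test on means.

    effect_size is Cohen's d: (difference in means) / (common standard deviation).
    """
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the significance level
    z_power = norm.ppf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


for d in (0.5, 0.2, 0.1):
    print(f"d={d}: about {samples_per_arm(d)} observations per variant")
```

Differences between trading models are often small relative to return volatility, so the required sample size grows quickly. Autocorrelation lowers the effective sample size further, which means this figure should be treated as a lower bound.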

### A/B Testing Approaches for Trading:
| Approach | Description | Pros | Cons |
|----------|-------------|------|------|
| **Paper Trading** | Run new model in simulation | No real risk | May not capture market impact |
| **Shadow Mode** | Run alongside production, don't execute | Real data | No execution feedback |
| **Canary Deployment** | Small % of capital | Real validation | Lower statistical power |
| **Time-Split** | Alternate between models | Simple | Time confounds |
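
For canary-style deployments it is often useful for variant assignment to be reproducible, so the same order or symbol always lands in the same arm. One common way to achieve this is to hash a stable identifier into a bucket. The sketch below assumes that approach. The function name, the ID format, and the test name `rf_vs_gb` are hypothetical.

```python
import hashlib


def assign_variant(unit_id: str, test_name: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically map a unit (e.g. an order or symbol ID) to a variant."""
    digest = hashlib.sha256(f"{test_name}:{unit_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32), scaled to [0, 1)
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_fraction else "control"


# Route roughly 10% of hypothetical orders to the treatment model
ids = [f"order-{i}" for i in range(10_000)]
variants = [assign_variant(i, "rf_vs_gb", 0.1) for i in ids]
share = variants.count("treatment") / len(variants)
print(f"Treatment share: {share:.3f}")
```

Including the test name in the hash input keeps assignments independent across experiments, so a unit that falls in the treatment arm of one test is not systematically in the treatment arm of the next.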

In [None]:
# ============================================================================
# 5.2 A/B Testing Framework Implementation
# ============================================================================

@dataclass
class ABTestConfig:
    """Configuration for A/B test"""
    test_name: str
    control_model_name: str
    treatment_model_name: str
    traffic_split: float = 0.5  # Fraction to treatment
    min_samples: int = 100
    confidence_level: float = 0.95
    primary_metric: str = 'sharpe_ratio'
    secondary_metrics: List[str] = field(default_factory=lambda: ['accuracy', 'returns'])


@dataclass  
class ABTestResult:
    """Results from an A/B test"""
    test_name: str
    control_metrics: Dict[str, float]
    treatment_metrics: Dict[str, float]
    statistical_significance: Dict[str, bool]
    p_values: Dict[str, float]
    effect_sizes: Dict[str, float]
    recommendation: str
    sample_sizes: Dict[str, int]


class TradingABTestFramework:
    """
    A/B Testing Framework for Trading Strategies
    Supports statistical comparison of models with multiple metrics
    """
    
    def __init__(self, config: ABTestConfig):
        self.config = config
        self.control_results = []
        self.treatment_results = []
        self.is_active = False
        self.start_time = None
        
    def start_test(self):
        """Start the A/B test"""
        self.is_active = True
        self.start_time = datetime.now()
        self.control_results = []
        self.treatment_results = []
        print(f"🚀 A/B Test '{self.config.test_name}' started")
        print(f"   Control: {self.config.control_model_name}")
        print(f"   Treatment: {self.config.treatment_model_name}")
        print(f"   Traffic split: {(1-self.config.traffic_split)*100:.0f}% / {self.config.traffic_split*100:.0f}%")
        
    def assign_variant(self) -> str:
        """Randomly assign to control or treatment"""
        if np.random.random() < self.config.traffic_split:
            return 'treatment'
        return 'control'
    
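    def log_result(self, variant: str, result: Dict[str, float]):
        """Log result for a variant ('control' or 'treatment')"""
        if not self.is_active:
            print("⚠️ Test not active")
            return
        if variant not in ('control', 'treatment'):
            raise ValueError(f"Unknown variant: {variant!r}")
        
        # Copy so the caller's dict is not mutated
        record = {**result, 'timestamp': datetime.now()}
        if variant == 'control':
            self.control_results.append(record)
        else:
            self.treatment_results.append(record)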
    
    def compute_trading_metrics(self, results: List[Dict]) -> Dict[str, float]:
        """Compute trading-specific metrics"""
        if not results:
            return {}
        
        returns = [r.get('return', 0) for r in results]
        predictions_correct = [r.get('correct', 0) for r in results]
        
        metrics = {
            'n_samples': len(results),
            'mean_return': np.mean(returns),
            'total_return': np.sum(returns),
            'std_return': np.std(returns),
            'sharpe_ratio': np.mean(returns) / np.std(returns) * np.sqrt(252) if np.std(returns) > 0 else 0,
            'win_rate': np.mean(predictions_correct) if predictions_correct else 0,
            'max_drawdown': self._compute_max_drawdown(returns),
            'sortino_ratio': self._compute_sortino_ratio(returns)
        }
        
        return metrics
    
    def _compute_max_drawdown(self, returns: List[float]) -> float:
        """Compute maximum drawdown"""
        if not returns:
            return 0
        cumulative = np.cumprod(1 + np.array(returns))
        running_max = np.maximum.accumulate(cumulative)
        drawdowns = (cumulative - running_max) / running_max
        return abs(min(drawdowns)) if len(drawdowns) > 0 else 0
    
    def _compute_sortino_ratio(self, returns: List[float], target: float = 0) -> float:
        """Compute Sortino ratio (mean excess return over downside deviation)"""
        returns = np.array(returns)
        # Downside deviation: root-mean-square shortfall below the target,
        # averaged over all observations (returns above target contribute 0)
        shortfall = np.minimum(returns - target, 0)
        downside_dev = np.sqrt(np.mean(shortfall ** 2)) if len(returns) > 0 else 0
        return (np.mean(returns) - target) / downside_dev * np.sqrt(252) if downside_dev > 0 else 0
    
    def run_statistical_tests(self, metric: str) -> Tuple[float, bool, float]:
        """
        Run statistical test for a specific metric
        Returns: (p_value, is_significant, effect_size)
        """
        control_values = [r.get(metric, 0) for r in self.control_results if metric in r]
        treatment_values = [r.get(metric, 0) for r in self.treatment_results if metric in r]
        
        if len(control_values) < 5 or len(treatment_values) < 5:
            return 1.0, False, 0.0
        
        # Two-sample t-test
        t_stat, p_value = stats.ttest_ind(treatment_values, control_values)
        
        # Effect size (Cohen's d)
        pooled_std = np.sqrt((np.var(control_values) + np.var(treatment_values)) / 2)
        effect_size = (np.mean(treatment_values) - np.mean(control_values)) / pooled_std if pooled_std > 0 else 0
        
        # Significance
        alpha = 1 - self.config.confidence_level
        is_significant = p_value < alpha
        
        return p_value, is_significant, effect_size
    
    def analyze_results(self) -> ABTestResult:
        """Comprehensive analysis of A/B test results"""
        # Compute metrics for both variants
        control_metrics = self.compute_trading_metrics(self.control_results)
        treatment_metrics = self.compute_trading_metrics(self.treatment_results)
        
        # Statistical tests for all metrics
        p_values = {}
        significance = {}
        effect_sizes = {}
        
        metrics_to_test = ['return', 'correct']
        for metric in metrics_to_test:
            p_val, is_sig, effect = self.run_statistical_tests(metric)
            p_values[metric] = p_val
            significance[metric] = is_sig
            effect_sizes[metric] = effect
        
        # Generate recommendation
        recommendation = self._generate_recommendation(
            control_metrics, treatment_metrics, significance
        )
        
        return ABTestResult(
            test_name=self.config.test_name,
            control_metrics=control_metrics,
            treatment_metrics=treatment_metrics,
            statistical_significance=significance,
            p_values=p_values,
            effect_sizes=effect_sizes,
            recommendation=recommendation,
            sample_sizes={
                'control': len(self.control_results),
                'treatment': len(self.treatment_results)
            }
        )
    
    def _generate_recommendation(self, control: Dict, treatment: Dict, 
                                  significance: Dict) -> str:
        """Generate deployment recommendation"""
        primary_metric = self.config.primary_metric
        
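        # Both arms need enough samples, not only the control
        n_smallest = min(control.get('n_samples', 0), treatment.get('n_samples', 0))
        if n_smallest < self.config.min_samples:
            return "INSUFFICIENT_DATA: Need more samples for reliable conclusion"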
        
        control_value = control.get(primary_metric, 0)
        treatment_value = treatment.get(primary_metric, 0)
        
        improvement = (treatment_value - control_value) / abs(control_value) if control_value != 0 else 0
        
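        # Note: significance comes from the t-test on per-trade returns, while
        # `improvement` is measured on the configured primary metric
        if significance.get('return', False) and improvement > 0.05:
            return f"DEPLOY_TREATMENT: {improvement*100:.1f}% improvement in {primary_metric} (return difference statistically significant)"
        elif significance.get('return', False) and improvement < -0.05:
            return f"KEEP_CONTROL: Treatment shows {improvement*100:.1f}% degradation in {primary_metric} (return difference statistically significant)"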
        elif not any(significance.values()):
            return "NO_SIGNIFICANT_DIFFERENCE: Continue testing or consider other factors"
        else:
            return f"MIXED_RESULTS: Further analysis needed (improvement: {improvement*100:.1f}%)"
    
    def stop_test(self) -> ABTestResult:
        """Stop test and return final results"""
        self.is_active = False
        duration = datetime.now() - self.start_time if self.start_time else timedelta(0)
        print(f"\n🛑 A/B Test '{self.config.test_name}' stopped after {duration}")
        return self.analyze_results()

print("A/B Testing Framework implemented!")
print("Classes: ABTestConfig, ABTestResult, TradingABTestFramework")

In [None]:
# ============================================================================
# 5.3 Demo: A/B Testing Trading Models
# ============================================================================

# Create two models with different characteristics
# Control: Random Forest (baseline)
# Treatment: Gradient Boosting (challenger)

# Prepare data
X = features_with_target[feature_cols].dropna()
y = features_with_target.loc[X.index, 'target']

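# Chronological split (shuffle=False): shuffling time-ordered market data would
# leak future information into training. Assumes the rows are sorted by time.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)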

# Train models
control_model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
treatment_model = GradientBoostingClassifier(n_estimators=50, max_depth=5, random_state=42)

control_model.fit(X_train, y_train)
treatment_model.fit(X_train, y_train)

# Configure A/B test
ab_config = ABTestConfig(
    test_name="RF_vs_GBM_Trading_Model",
    control_model_name="RandomForest_v1",
    treatment_model_name="GradientBoosting_v1",
    traffic_split=0.5,
    min_samples=50,
    confidence_level=0.95,
    primary_metric='sharpe_ratio'
)

# Initialize and start test
ab_framework = TradingABTestFramework(ab_config)
ab_framework.start_test()

# Simulate trading decisions
print("\n📊 Simulating trading decisions...")
np.random.seed(42)

for i in range(len(X_test)):
    features = X_test.iloc[i:i+1]
    actual_direction = y_test.iloc[i]
    
    # Assign to variant
    variant = ab_framework.assign_variant()
    
    # Get prediction based on variant
    if variant == 'control':
        prediction = control_model.predict(features)[0]
        prob = control_model.predict_proba(features)[0, 1]
    else:
        prediction = treatment_model.predict(features)[0]
        prob = treatment_model.predict_proba(features)[0, 1]
    
    # Simulate return based on prediction correctness
    # If prediction correct, positive return; otherwise negative
    base_return = np.random.randn() * 0.02  # 2% daily volatility
    if prediction == actual_direction:
        trade_return = abs(base_return) * 0.5 + 0.002  # Small positive edge
        correct = 1
    else:
        trade_return = -abs(base_return) * 0.3 - 0.001  # Loss
        correct = 0
    
    # Log result
    ab_framework.log_result(variant, {
        'return': trade_return,
        'correct': correct,
        'probability': prob,
        'prediction': prediction,
        'actual': actual_direction
    })

# Analyze results
print("\n" + "=" * 60)
print("A/B TEST RESULTS")
print("=" * 60)

results = ab_framework.stop_test()

print(f"\n📈 Control Model ({ab_config.control_model_name}):")
for metric, value in results.control_metrics.items():
    print(f"   {metric}: {value:.4f}")

print(f"\n📈 Treatment Model ({ab_config.treatment_model_name}):")
for metric, value in results.treatment_metrics.items():
    print(f"   {metric}: {value:.4f}")

print("\n📊 Statistical Significance:")
for metric, is_sig in results.statistical_significance.items():
    p_val = results.p_values[metric]
    effect = results.effect_sizes[metric]
    sig_str = "✅ Significant" if is_sig else "❌ Not significant"
    print(f"   {metric}: {sig_str} (p={p_val:.4f}, effect size={effect:.3f})")

print(f"\n🎯 Recommendation: {results.recommendation}")

# Visualize results
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Sample sizes
ax1 = axes[0]
variants = ['Control', 'Treatment']
sizes = [results.sample_sizes['control'], results.sample_sizes['treatment']]
ax1.bar(variants, sizes, color=['blue', 'orange'], alpha=0.7)
ax1.set_ylabel('Number of Samples')
ax1.set_title('Sample Sizes by Variant')
for i, v in enumerate(sizes):
    ax1.text(i, v + 1, str(v), ha='center')

# Plot 2: Key metrics comparison
ax2 = axes[1]
metrics = ['mean_return', 'sharpe_ratio', 'win_rate']
x = np.arange(len(metrics))
width = 0.35
control_vals = [results.control_metrics.get(m, 0) for m in metrics]
treatment_vals = [results.treatment_metrics.get(m, 0) for m in metrics]
ax2.bar(x - width/2, control_vals, width, label='Control', alpha=0.7)
ax2.bar(x + width/2, treatment_vals, width, label='Treatment', alpha=0.7)
ax2.set_ylabel('Value')
ax2.set_title('Metrics Comparison')
ax2.set_xticks(x)
ax2.set_xticklabels(metrics, rotation=45)
ax2.legend()

# Plot 3: P-values
ax3 = axes[2]
p_metrics = list(results.p_values.keys())
p_vals = list(results.p_values.values())
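# Use the same significance threshold as the framework instead of a hard-coded 0.05
alpha = 1 - ab_config.confidence_level
colors = ['green' if p < alpha else 'red' for p in p_vals]
ax3.barh(p_metrics, p_vals, color=colors, alpha=0.7)
ax3.axvline(x=alpha, color='black', linestyle='--', label=f'α = {alpha:.2f}')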
ax3.set_xlabel('P-Value')
ax3.set_title('Statistical Significance')
ax3.legend()

plt.tight_layout()
plt.show()

---
## 6. Model Versioning and Registry

### 6.1 Importance of Model Versioning

In production trading systems, proper model versioning is critical for:

1. **Reproducibility**: Recreate exact model states
2. **Auditability**: Track what model made which decisions
3. **Rollback**: Quickly revert to previous versions
4. **Compliance**: Regulatory requirements for model governance
5. **Experimentation**: Track experiments and compare results
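
Reproducibility and auditability both depend on being able to say exactly which data a model was trained on. The sketch below shows one possible way to fingerprint a training set; the function name `fingerprint_frame` and the toy price data are illustrative assumptions, not part of any library.

```python
import hashlib

import pandas as pd


def fingerprint_frame(df: pd.DataFrame) -> str:
    """Return a short, deterministic SHA-256 fingerprint of a DataFrame."""
    # hash_pandas_object yields one uint64 per row, covering index and values
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    digest = hashlib.sha256(row_hashes.to_numpy().tobytes())
    # Fold in the column names so that renaming a column changes the fingerprint
    digest.update("|".join(map(str, df.columns)).encode("utf-8"))
    return digest.hexdigest()[:16]


# Hypothetical toy data: three bars, then one restated close price
prices = pd.DataFrame({"close": [100.0, 101.5, 99.8], "volume": [1200, 900, 1500]})
revised = prices.copy()
revised.loc[1, "close"] = 101.6

fp_original = fingerprint_frame(prices)
fp_copy = fingerprint_frame(prices.copy())
fp_revised = fingerprint_frame(revised)
print(fp_original, fp_copy, fp_revised)
```

An identical copy produces the same fingerprint, while a single restated price produces a different one, which is the property an audit trail needs. Truncating the digest to 16 hex characters is a readability choice; keep the full digest where collision resistance matters.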

### Model Registry Components:
```
┌──────────────────────────────────────────────────────────────┐
│                        MODEL REGISTRY                        │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────┐          ┌─────────────────┐            │
│  │ Model Metadata  │          │ Model Artifacts │            │
│  │ - Name          │          │ - Weights       │            │
│  │ - Version       │          │ - Preprocessors │            │
│  │ - Author        │          │ - Configs       │            │
│  │ - Created At    │          │ - Dependencies  │            │
│  └─────────────────┘          └─────────────────┘            │
│                                                              │
│  ┌─────────────────┐          ┌─────────────────┐            │
│  │ Training Info   │          │ Performance     │            │
│  │ - Parameters    │          │ - Metrics       │            │
│  │ - Data Version  │          │ - Validation    │            │
│  │ - Git Commit    │          │ - Monitoring    │            │
│  └─────────────────┘          └─────────────────┘            │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                    Lifecycle Stage                     │  │
│  │ [Development] → [Staging] → [Production] → [Archived]  │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
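
The lifecycle row of the diagram can be read as a small state machine. The sketch below encodes one possible transition policy; the names `LifecycleStage`, `ALLOWED_TRANSITIONS`, and `can_transition`, and the policy itself (no skipping staging, archived is terminal), are assumptions made for illustration and may differ from what a given team enforces.

```python
from enum import Enum


class LifecycleStage(Enum):  # hypothetical name, mirrors the diagram
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


# Assumed policy: promote one stage at a time, allow demotion from staging,
# allow archiving from any live stage, and treat ARCHIVED as terminal
ALLOWED_TRANSITIONS = {
    LifecycleStage.DEVELOPMENT: {LifecycleStage.STAGING, LifecycleStage.ARCHIVED},
    LifecycleStage.STAGING: {
        LifecycleStage.PRODUCTION,
        LifecycleStage.DEVELOPMENT,
        LifecycleStage.ARCHIVED,
    },
    LifecycleStage.PRODUCTION: {LifecycleStage.ARCHIVED},
    LifecycleStage.ARCHIVED: set(),
}


def can_transition(current: LifecycleStage, target: LifecycleStage) -> bool:
    """Return True if the assumed policy permits moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


print(can_transition(LifecycleStage.DEVELOPMENT, LifecycleStage.STAGING))
print(can_transition(LifecycleStage.DEVELOPMENT, LifecycleStage.PRODUCTION))
```

Keeping the policy in a plain dictionary makes it easy to review and to tighten, for example by requiring a passing validation report before the staging to production edge is allowed.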

In [None]:
# ============================================================================
# 6.2 Model Registry Implementation
# ============================================================================

from enum import Enum

class ModelStage(Enum):
    """Model lifecycle stages"""
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


@dataclass
class ModelVersion:
    """Represents a specific version of a model"""
    model_name: str
    version: str
    stage: ModelStage
    created_at: datetime
    author: str
    description: str
    
    # Training info
    hyperparameters: Dict[str, Any]
    training_data_version: str
    training_data_hash: str
    git_commit: Optional[str] = None
    
    # Performance metrics
    metrics: Dict[str, float] = field(default_factory=dict)
    
    # Artifacts
    model_artifact_path: Optional[str] = None
    preprocessor_path: Optional[str] = None
    
    # Tags and metadata
    tags: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    
    def to_dict(self) -> Dict:
        return {
            'model_name': self.model_name,
            'version': self.version,
            'stage': self.stage.value,
            'created_at': self.created_at.isoformat(),
            'author': self.author,
            'description': self.description,
            'hyperparameters': self.hyperparameters,
            'training_data_version': self.training_data_version,
            'training_data_hash': self.training_data_hash,
            'git_commit': self.git_commit,
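            'metrics': self.metrics,
            # Record artifact locations so the saved metadata is enough for an audit
            'model_artifact_path': self.model_artifact_path,
            'preprocessor_path': self.preprocessor_path,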
            'tags': self.tags,
            'metadata': self.metadata
        }


class ModelRegistry:
    """
    Comprehensive Model Registry
    Tracks model versions, artifacts, and lifecycle
    """
    
    def __init__(self, storage_path: str = "./model_registry"):
        self.storage_path = Path(storage_path)
        self.storage_path.mkdir(parents=True, exist_ok=True)
        
        self.models: Dict[str, Dict[str, ModelVersion]] = {}  # name -> version -> ModelVersion
        self.production_models: Dict[str, str] = {}  # name -> production version
        self.registry_log = []
        
    def _generate_version(self, model_name: str) -> str:
        """Generate next version number"""
        if model_name not in self.models:
            return "1.0.0"
        
        versions = list(self.models[model_name].keys())
        if not versions:
            return "1.0.0"
        
        # Parse latest version and increment
        latest = max(versions, key=lambda v: [int(x) for x in v.split('.')])
        parts = [int(x) for x in latest.split('.')]
        parts[2] += 1  # Increment patch version
        return '.'.join(map(str, parts))
    
    def _compute_data_hash(self, data: pd.DataFrame) -> str:
        """Compute hash of training data for versioning"""
        data_str = data.to_json()
        return hashlib.md5(data_str.encode()).hexdigest()[:12]
    
    def register_model(self, model_name: str, model: Any,
                       hyperparameters: Dict[str, Any],
                       metrics: Dict[str, float],
                       training_data: pd.DataFrame,
                       author: str = "system",
                       description: str = "",
                       tags: Optional[List[str]] = None,
                       preprocessor: Any = None) -> ModelVersion:
        """Register a new model version"""
        
        # Generate version
        version = self._generate_version(model_name)
        
        # Compute data hash
        data_hash = self._compute_data_hash(training_data)
        
        # Create version entry
        model_version = ModelVersion(
            model_name=model_name,
            version=version,
            stage=ModelStage.DEVELOPMENT,
            created_at=datetime.now(),
            author=author,
            description=description,
            hyperparameters=hyperparameters,
            training_data_version=f"v_{data_hash}",
            training_data_hash=data_hash,
            metrics=metrics,
            tags=tags or []
        )
        
        # Save model artifacts
        model_dir = self.storage_path / model_name / version
        model_dir.mkdir(parents=True, exist_ok=True)
        
        # Serialize model
        model_path = model_dir / "model.pkl"
        with open(model_path, 'wb') as f:
            pickle.dump(model, f)
        model_version.model_artifact_path = str(model_path)
        
        # Serialize preprocessor if provided
        if preprocessor is not None:
            preprocessor_path = model_dir / "preprocessor.pkl"
            with open(preprocessor_path, 'wb') as f:
                pickle.dump(preprocessor, f)
            model_version.preprocessor_path = str(preprocessor_path)
        
        # Save metadata
        metadata_path = model_dir / "metadata.json"
        with open(metadata_path, 'w') as f:
            json.dump(model_version.to_dict(), f, indent=2)
        
        # Store in registry
        if model_name not in self.models:
            self.models[model_name] = {}
        self.models[model_name][version] = model_version
        
        # Log
        self.registry_log.append({
            'action': 'register',
            'model_name': model_name,
            'version': version,
            'timestamp': datetime.now()
        })
        
        print(f"✅ Registered {model_name} v{version}")
        return model_version
    
    def transition_stage(self, model_name: str, version: str, 
                         new_stage: ModelStage) -> bool:
        """Transition model to a new lifecycle stage"""
        if model_name not in self.models or version not in self.models[model_name]:
            print(f"❌ Model {model_name} v{version} not found")
            return False
        
        model_version = self.models[model_name][version]
        old_stage = model_version.stage
        
        # If promoting to production, archive current production model
        if new_stage == ModelStage.PRODUCTION:
            if model_name in self.production_models:
                old_prod_version = self.production_models[model_name]
                if old_prod_version != version:
                    self.models[model_name][old_prod_version].stage = ModelStage.ARCHIVED
                    print(f"📦 Archived {model_name} v{old_prod_version}")
            
            self.production_models[model_name] = version
        elif self.production_models.get(model_name) == version:
            # Moving the live version out of production: clear the stale pointer
            del self.production_models[model_name]
        
        model_version.stage = new_stage
        
        # Log
        self.registry_log.append({
            'action': 'transition',
            'model_name': model_name,
            'version': version,
            'old_stage': old_stage.value,
            'new_stage': new_stage.value,
            'timestamp': datetime.now()
        })
        
        print(f"✅ Transitioned {model_name} v{version}: {old_stage.value} → {new_stage.value}")
        return True
    
    def get_model(self, model_name: str, version: Optional[str] = None,
                  stage: Optional[ModelStage] = None) -> Tuple[Any, ModelVersion]:
        """Load a model from registry"""
        if model_name not in self.models:
            raise ValueError(f"Model {model_name} not found")
        
        # Determine version to load
        if version:
            target_version = version
        elif stage == ModelStage.PRODUCTION and model_name in self.production_models:
            target_version = self.production_models[model_name]
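        elif stage:
            # Find latest version in requested stage
            versions = [v for v, mv in self.models[model_name].items() if mv.stage == stage]
            if not versions:
                raise ValueError(f"No model in stage {stage.value}")
            # Compare numerically: a plain string max() would rank "1.0.9" above "1.0.10"
            target_version = max(versions, key=lambda v: [int(x) for x in v.split('.')])
        else:
            # Get latest version (numeric comparison, as above)
            target_version = max(self.models[model_name].keys(),
                                 key=lambda v: [int(x) for x in v.split('.')])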
        
        model_version = self.models[model_name][target_version]
        
        # Load model artifact
        with open(model_version.model_artifact_path, 'rb') as f:
            model = pickle.load(f)
        
        return model, model_version
    
    def list_models(self, model_name: Optional[str] = None,
                    stage: Optional[ModelStage] = None) -> pd.DataFrame:
        """List registered models"""
        records = []
        
        for name, versions in self.models.items():
            if model_name and name != model_name:
                continue
            
            for version, mv in versions.items():
                if stage and mv.stage != stage:
                    continue
                
                records.append({
                    'model_name': name,
                    'version': version,
                    'stage': mv.stage.value,
                    'created_at': mv.created_at,
                    'author': mv.author,
                    **mv.metrics
                })
        
        return pd.DataFrame(records)
    
    def compare_versions(self, model_name: str, 
                         version_a: str, version_b: str) -> pd.DataFrame:
        """Compare two model versions"""
        if model_name not in self.models:
            raise ValueError(f"Model {model_name} not found")
        
        mv_a = self.models[model_name].get(version_a)
        mv_b = self.models[model_name].get(version_b)
        
        if not mv_a or not mv_b:
            raise ValueError("One or both versions not found")
        
        comparison = []
        
        # Compare metrics
        all_metrics = set(mv_a.metrics.keys()) | set(mv_b.metrics.keys())
        for metric in all_metrics:
            val_a = mv_a.metrics.get(metric, np.nan)
            val_b = mv_b.metrics.get(metric, np.nan)
            diff = val_b - val_a if not np.isnan(val_a) and not np.isnan(val_b) else np.nan
            comparison.append({
                'metric': metric,
                f'v{version_a}': val_a,
                f'v{version_b}': val_b,
                'difference': diff,
                'improvement': '✅' if diff > 0 else ('❌' if diff < 0 else '➡️')
            })
        
        return pd.DataFrame(comparison)
    
    def get_production_model(self, model_name: str) -> Tuple[Any, ModelVersion]:
        """Get the production model for a given name"""
        return self.get_model(model_name, stage=ModelStage.PRODUCTION)

print("Model Registry implementation complete!")
print("Features: Versioning, Stage Management, Artifact Storage, Comparison")

In [None]:
# ============================================================================
# 6.3 Demo: Model Registry in Action
# ============================================================================

# Initialize registry
registry = ModelRegistry(storage_path="./demo_model_registry")

# Prepare training data
X_demo = features_with_target[feature_cols].dropna()
y_demo = features_with_target.loc[X_demo.index, 'target']
# Chronological split (shuffle=False) so the test set follows the training data in time
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

# Train and register multiple model versions
print("=" * 60)
print("REGISTERING MODELS")
print("=" * 60)

# Version 1: Simple Random Forest
model_v1 = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=42)
model_v1.fit(X_train_demo, y_train_demo)
y_pred_v1 = model_v1.predict(X_test_demo)
y_proba_v1 = model_v1.predict_proba(X_test_demo)[:, 1]

metrics_v1 = {
    'accuracy': accuracy_score(y_test_demo, y_pred_v1),
    'precision': precision_score(y_test_demo, y_pred_v1, zero_division=0),
    'recall': recall_score(y_test_demo, y_pred_v1, zero_division=0),
    'auc': roc_auc_score(y_test_demo, y_proba_v1)
}

mv1 = registry.register_model(
    model_name="TradingSignalModel",
    model=model_v1,
    hyperparameters={'n_estimators': 50, 'max_depth': 3},
    metrics=metrics_v1,
    training_data=features_with_target,
    author="alice",
    description="Initial Random Forest model with basic features",
    tags=['baseline', 'random_forest']
)

# Version 2: Improved Random Forest
model_v2 = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model_v2.fit(X_train_demo, y_train_demo)
y_pred_v2 = model_v2.predict(X_test_demo)
y_proba_v2 = model_v2.predict_proba(X_test_demo)[:, 1]

metrics_v2 = {
    'accuracy': accuracy_score(y_test_demo, y_pred_v2),
    'precision': precision_score(y_test_demo, y_pred_v2, zero_division=0),
    'recall': recall_score(y_test_demo, y_pred_v2, zero_division=0),
    'auc': roc_auc_score(y_test_demo, y_proba_v2)
}

mv2 = registry.register_model(
    model_name="TradingSignalModel",
    model=model_v2,
    hyperparameters={'n_estimators': 100, 'max_depth': 5},
    metrics=metrics_v2,
    training_data=features_with_target,
    author="bob",
    description="Improved model with more trees and depth",
    tags=['improved', 'random_forest']
)

# Version 3: Gradient Boosting
model_v3 = GradientBoostingClassifier(n_estimators=100, max_depth=4, random_state=42)
model_v3.fit(X_train_demo, y_train_demo)
y_pred_v3 = model_v3.predict(X_test_demo)
y_proba_v3 = model_v3.predict_proba(X_test_demo)[:, 1]

metrics_v3 = {
    'accuracy': accuracy_score(y_test_demo, y_pred_v3),
    'precision': precision_score(y_test_demo, y_pred_v3, zero_division=0),
    'recall': recall_score(y_test_demo, y_pred_v3, zero_division=0),
    'auc': roc_auc_score(y_test_demo, y_proba_v3)
}

mv3 = registry.register_model(
    model_name="TradingSignalModel",
    model=model_v3,
    hyperparameters={'n_estimators': 100, 'max_depth': 4, 'algorithm': 'gradient_boosting'},
    metrics=metrics_v3,
    training_data=features_with_target,
    author="charlie",
    description="Gradient Boosting model for comparison",
    tags=['experiment', 'gradient_boosting']
)

# List all registered models
print("\n" + "=" * 60)
print("REGISTERED MODELS")
print("=" * 60)
models_df = registry.list_models()
print(models_df.to_string(index=False))

In [None]:
# Lifecycle management - promote models through stages
print("\n" + "=" * 60)
print("MODEL LIFECYCLE MANAGEMENT")
print("=" * 60)

# Promote v1.0.2 to staging
registry.transition_stage("TradingSignalModel", "1.0.2", ModelStage.STAGING)

# Promote v1.0.2 to production (assumed here to be the best performer; confirm against the metrics table)
registry.transition_stage("TradingSignalModel", "1.0.2", ModelStage.PRODUCTION)

# Compare model versions
print("\n" + "=" * 60)
print("VERSION COMPARISON: v1.0.0 vs v1.0.2")
print("=" * 60)
comparison = registry.compare_versions("TradingSignalModel", "1.0.0", "1.0.2")
print(comparison.to_string(index=False))

# Load production model
print("\n" + "=" * 60)
print("LOADING PRODUCTION MODEL")
print("=" * 60)
prod_model, prod_version = registry.get_production_model("TradingSignalModel")
print(f"Loaded: {prod_version.model_name} v{prod_version.version}")
print(f"Stage: {prod_version.stage.value}")
print(f"Created: {prod_version.created_at}")
print(f"Metrics: {prod_version.metrics}")

# Visualize model comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Metrics by version
ax1 = axes[0]
versions = ['1.0.0', '1.0.1', '1.0.2']
metrics_names = ['accuracy', 'precision', 'recall', 'auc']
x = np.arange(len(metrics_names))
width = 0.25

for i, v in enumerate(versions):
    mv = registry.models["TradingSignalModel"][v]
    values = [mv.metrics.get(m, 0) for m in metrics_names]
    ax1.bar(x + i*width, values, width, label=f'v{v}', alpha=0.8)

ax1.set_ylabel('Score')
ax1.set_title('Model Metrics by Version')
ax1.set_xticks(x + width)
ax1.set_xticklabels(metrics_names)
ax1.legend()
ax1.set_ylim(0, 1)

# Plot 2: Model lifecycle
ax2 = axes[1]
stages = ['development', 'staging', 'production', 'archived']
stage_colors = {'development': 'blue', 'staging': 'orange', 'production': 'green', 'archived': 'gray'}

for i, v in enumerate(versions):
    mv = registry.models["TradingSignalModel"][v]
    stage = mv.stage.value
    color = stage_colors[stage]
    # No per-bar label needed: the legend is built explicitly from stage_colors below
    ax2.barh(v, 1, color=color, alpha=0.7)
    ax2.text(0.5, i, f"{stage.upper()}", ha='center', va='center', fontweight='bold', color='white')

ax2.set_xlabel('Lifecycle Position')
ax2.set_title('Model Lifecycle Stages')
ax2.set_xlim(0, 1)
handles = [plt.Rectangle((0,0),1,1, color=c, alpha=0.7) for c in stage_colors.values()]
ax2.legend(handles, stage_colors.keys(), loc='upper right')

plt.tight_layout()
plt.show()

# Show registry log
print("\n" + "=" * 60)
print("REGISTRY ACTIVITY LOG")
print("=" * 60)
for entry in registry.registry_log[-5:]:
    print(f"  [{entry['timestamp'].strftime('%Y-%m-%d %H:%M:%S')}] "
          f"{entry['action'].upper()}: {entry['model_name']} "
          f"v{entry.get('version', 'N/A')}")

---
## 7. Summary and Key Takeaways

### Production ML Systems for Trading

We've covered the essential components of production ML systems:

### 1. Pipeline Architecture
- **Data Ingestion**: Validate and transform raw market data
- **Feature Engineering**: Compute reproducible features
- **Model Training**: Automated training with validation
- **Serving**: Batch, real-time, and streaming inference
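
As a reminder of what the data ingestion step checks in practice, here is a minimal sketch of a validation pass over OHLCV bars. The function name `validate_bars`, the column names, and the specific checks are assumptions chosen for illustration; a real pipeline would add venue-specific rules.

```python
import pandas as pd


def validate_bars(bars: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in OHLCV bars."""
    required = ["open", "high", "low", "close", "volume"]
    missing = [c for c in required if c not in bars.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if not bars.index.is_monotonic_increasing:
        problems.append("timestamps are not sorted")
    if bars.index.has_duplicates:
        problems.append("duplicate timestamps")
    if bars[required].isna().any().any():
        problems.append("missing values")
    # The high must be the largest price of the bar and the low the smallest
    if (bars["high"] < bars[["open", "close", "low"]].max(axis=1)).any():
        problems.append("high below open/close/low")
    if (bars["low"] > bars[["open", "close", "high"]].min(axis=1)).any():
        problems.append("low above open/close/high")
    if (bars["volume"] < 0).any():
        problems.append("negative volume")
    return problems


# Hypothetical daily bars
good_bars = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0],
        "high": [101.0, 102.0, 103.0],
        "low": [99.0, 100.0, 101.0],
        "close": [100.5, 101.5, 102.5],
        "volume": [10, 20, 30],
    },
    index=pd.date_range("2024-01-02", periods=3, freq="D"),
)
bad_bars = good_bars.copy()
bad_bars.loc[bad_bars.index[1], "volume"] = -5

print(validate_bars(good_bars))
print(validate_bars(bad_bars))
```

Returning a list of problems rather than raising on the first one lets the pipeline log every issue for a batch before deciding whether to reject it.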

### 2. Model Serving Patterns
| Pattern | Use Case | Latency |
|---------|----------|---------|
| Batch | End-of-day signals | Minutes |
| Real-time | Live trading | Milliseconds |
| Streaming | Event-driven | Sub-second |

### 3. Feature Store
- Centralized feature management
- Point-in-time correctness prevents look-ahead bias
- Online/offline serving for different use cases
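
Point-in-time correctness can be illustrated with a backward as-of join: each decision is matched with the most recent feature value that was already available at that moment. The sketch below uses `pandas.merge_asof`; the column names and the toy values are assumptions for illustration.

```python
import pandas as pd

# Feature values with the time at which each became available (hypothetical data)
features = pd.DataFrame({
    "available_at": pd.to_datetime(
        ["2024-01-02 16:00", "2024-01-03 16:00", "2024-01-04 16:00"]
    ),
    "symbol": ["AAA", "AAA", "AAA"],
    "momentum_5d": [0.010, 0.025, -0.004],
})

# Times at which the model has to make a decision
decisions = pd.DataFrame({
    "decision_time": pd.to_datetime(["2024-01-03 09:30", "2024-01-04 09:30"]),
    "symbol": ["AAA", "AAA"],
})

# Backward as-of join: each decision sees only the latest feature already published
point_in_time = pd.merge_asof(
    decisions.sort_values("decision_time"),
    features.sort_values("available_at"),
    left_on="decision_time",
    right_on="available_at",
    by="symbol",
    direction="backward",
)
print(point_in_time[["decision_time", "available_at", "momentum_5d"]])
```

The decision on the morning of 3 January receives the value published on the evening of 2 January, never the one published later that day. By default `merge_asof` also accepts a feature stamped at exactly the decision time; passing `allow_exact_matches=False` is the stricter choice when timestamps may be recorded before the value is truly usable.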

### 4. Drift Detection
- **PSI**: Population Stability Index for distribution shifts
- **KS Test**: Statistical comparison of distributions
- Monitor both data drift and model performance
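
The two drift measures can be sketched in a few lines. The PSI below uses quantile bins taken from the reference sample, which is one common convention rather than the only one; the function name, the bin count, and the floor applied to empty bins are assumptions. The KS test comes from `scipy.stats.ks_2samp`.

```python
import numpy as np
from scipy.stats import ks_2samp


def population_stability_index(reference, current, n_bins: int = 10) -> float:
    """PSI between two 1-D samples, using quantile bins of the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges from reference quantiles; the outer bins are open-ended
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_counts = np.bincount(
        np.searchsorted(edges, reference, side="right"), minlength=n_bins
    )
    cur_counts = np.bincount(
        np.searchsorted(edges, current, side="right"), minlength=n_bins
    )
    # Floor the proportions so that empty bins do not produce log(0)
    ref_pct = np.clip(ref_counts / ref_counts.sum(), 1e-6, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


# Synthetic data: one live sample from the training distribution, one shifted
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)
live_stable = rng.normal(0.0, 1.0, 5000)
live_shifted = rng.normal(1.0, 1.0, 5000)

psi_stable = population_stability_index(train_feature, live_stable)
psi_shifted = population_stability_index(train_feature, live_shifted)
ks_shifted = ks_2samp(train_feature, live_shifted)

print(f"PSI stable:  {psi_stable:.4f}")
print(f"PSI shifted: {psi_shifted:.4f}")
print(f"KS shifted:  statistic={ks_shifted.statistic:.3f}, p={ks_shifted.pvalue:.3g}")
```

Commonly quoted rules of thumb treat a PSI below 0.1 as stable and above 0.25 as a material shift, but these cut-offs are conventions and should be calibrated against each feature's normal variation.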

### 5. A/B Testing
- Compare model variants with statistical rigor
- Use trading-specific metrics (Sharpe, win rate)
- Shadow deployment before full rollout
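
A Sharpe ratio is a property of a whole return series, so a per-trade t-test does not apply to it directly. One possible way to attach uncertainty to a Sharpe difference is a bootstrap interval, sketched below. The function names are hypothetical, and the sketch assumes independent, identically distributed returns; autocorrelated returns would call for a block bootstrap instead.

```python
import numpy as np


def sharpe(returns: np.ndarray) -> float:
    """Per-period Sharpe ratio (not annualized, zero risk-free rate assumed)."""
    sd = returns.std(ddof=1)
    return float(returns.mean() / sd) if sd > 0 else 0.0


def bootstrap_sharpe_diff(control, treatment, n_boot: int = 2000, seed: int = 0):
    """Percentile 95% interval for Sharpe(treatment) - Sharpe(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each arm independently, with replacement
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treatment, size=treatment.size, replace=True)
        diffs[b] = sharpe(t) - sharpe(c)
    lower, upper = np.percentile(diffs, [2.5, 97.5])
    return float(lower), float(upper)


# Synthetic returns: the treatment arm has a higher mean, same volatility
rng = np.random.default_rng(42)
control_returns = rng.normal(0.000, 0.01, 500)
treatment_returns = rng.normal(0.005, 0.01, 500)

ci_low, ci_high = bootstrap_sharpe_diff(control_returns, treatment_returns)
print(f"95% interval for Sharpe difference: [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval excludes zero, the Sharpe difference is unlikely to be a resampling artifact under the stated assumptions. It says nothing about regime changes that neither arm experienced during the test.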

### 6. Model Registry
- Version control for models
- Lifecycle management (dev → staging → production)
- Audit trail for compliance

---

### Best Practices

1. **Always version everything**: Data, features, models, configs
2. **Monitor continuously**: Set up alerts for drift and performance
3. **Test before deploying**: A/B test with statistical significance
4. **Document thoroughly**: Maintain audit trails
5. **Plan for failure**: Implement rollback mechanisms
6. **Automate**: CI/CD pipelines for reproducibility
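
The "plan for failure" practice can be made concrete with a pointer that remembers earlier production versions. This is a minimal sketch; `ProductionPointer` is a hypothetical helper and not part of the registry implemented earlier in this notebook.

```python
from typing import List, Optional


class ProductionPointer:  # hypothetical helper for illustration
    """Tracks which version is live and remembers earlier ones for rollback."""

    def __init__(self) -> None:
        self._history: List[str] = []

    @property
    def current(self) -> Optional[str]:
        """The live version, or None if nothing has been promoted yet."""
        return self._history[-1] if self._history else None

    def promote(self, version: str) -> None:
        """Make the given version live, keeping the previous one in history."""
        self._history.append(version)

    def rollback(self) -> Optional[str]:
        """Drop the live version and return the one that is live afterwards."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self._history.pop()
        return self.current


pointer = ProductionPointer()
pointer.promote("1.0.0")
pointer.promote("1.0.2")
restored = pointer.rollback()
print(restored, pointer.current)
```

Because the rollback target is recorded at promotion time, reverting does not depend on anyone remembering which version was live before the incident.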

### Next Steps
- Study `Trading_Strategy.ipynb` for practical implementation
- Explore MLflow for production model tracking
- Learn about Kubernetes for scalable deployments
- Practice with real market data and backtesting