# 🔄 Model Performance Monitoring & Retraining Strategy

## 🎯 Objective
Implement a comprehensive **model monitoring and retraining system** to detect performance degradation and automatically trigger retraining when necessary.

## 🔧 Monitoring Framework
This notebook implements the following machine learning operations (MLOps) practices:

1. **Performance Degradation Detection** - Monitor RMSE, MAE, and R² over time
2. **Data Drift Detection** - Statistical tests for input feature distribution changes
3. **Concept Drift Detection** - Changes in the relationship between input features and the target
4. **Automated Retraining Triggers** - Threshold-based decision logic
5. **Real-World Simulation** - Replay historical data in sequential periods to test the monitoring system

## 🚨 Key Questions Addressed
- **When should we retrain?** Performance thresholds and drift detection
- **What triggers retraining?** Multiple monitoring signals and decision logic
- **How to prevent false alarms?** Statistical significance and confidence intervals
- **How should this run in production?** Monitoring dashboard and alerting system
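Testing 20 features at p < 0.05 means roughly one feature can be flagged as drifted by chance alone, even with no real shift. One way to reduce such false alarms is to correct the per-feature KS p-values for multiple testing before counting drifted features. The sketch below applies the Benjamini-Hochberg procedure with NumPy; the helper name `benjamini_hochberg` is hypothetical, and the notebook's `detect_data_drift` does not apply any correction as written.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected under Benjamini-Hochberg FDR control (illustrative helper)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m   # (k / m) * alpha for rank k
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.nonzero(below)[0].max()         # largest rank that satisfies its bound
        reject[order[:k_max + 1]] = True           # reject every hypothesis up to that rank
    return reject

# 20 hypothetical KS p-values, four of them below 0.05
p_vals = [0.001, 0.004, 0.03, 0.04] + [0.2 + 0.04 * i for i in range(16)]
mask = benjamini_hochberg(p_vals, alpha=0.05)
print(mask.sum(), "flagged after correction vs",
      sum(pv < 0.05 for pv in p_vals), "uncorrected")
```

Using `mask.sum()` in place of the raw count of `p < 0.05` hits gives a drift ratio that depends less on how many features are tested.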

## 📊 Monitoring Metrics
- **Performance Metrics**: RMSE, MAE, R², MAPE tracking
- **Drift Metrics**: KS-test, PSI, distribution comparisons
- **Business Metrics**: Prediction accuracy degradation rates
- **System Metrics**: Inference latency and throughput
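The Population Stability Index (PSI) listed above compares binned proportions of a feature between a reference and a current window. A minimal sketch, assuming quantile bins taken from the reference sample and a small epsilon to avoid `log(0)`; the function name and bin count are assumptions. Commonly cited rules of thumb treat PSI below 0.1 as stable and above 0.25 as a significant shift, but these cut-offs are conventions, not statistical tests.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI of one numeric feature: sum((cur% - ref%) * ln(cur% / ref%)) over shared bins."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges from reference quantiles; values outside fall into the end bins
    inner_edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1])
    n = inner_edges.size + 1
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference, side="right"), minlength=n)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current, side="right"), minlength=n)
    # Convert counts to proportions; eps avoids log(0) for empty bins
    ref_pct = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
reference = rng.normal(25, 5, 2000)  # synthetic temperature-like reference window
psi_same = population_stability_index(reference, rng.normal(25, 5, 500))
psi_shifted = population_stability_index(reference, rng.normal(29, 5, 500))  # +0.8 std shift
print(f"PSI same: {psi_same:.3f}, PSI shifted: {psi_shifted:.3f}")
```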

---

## 📋 Implementation Pipeline

In [1]:
# Essential imports for monitoring and retraining
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Statistical tests and monitoring
from scipy import stats
from scipy.stats import ks_2samp, chi2_contingency
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import TimeSeriesSplit

# Model loading and utilities
import joblib
import json
import os
import sys
from datetime import datetime, timedelta
from typing import Dict, List, Tuple, Optional

# Add project root to path
sys.path.append('..')
from src.data_utils import load_hanoi_weather_data

# Visualization and dashboard
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
plt.style.use('seaborn-v0_8')

print("🔄 MODEL MONITORING & RETRAINING SYSTEM")
print("=" * 50)
print("📊 Performance degradation detection")
print("📈 Data and concept drift monitoring")
print("🚨 Automated retraining triggers")
print("📋 Production monitoring dashboard")
print("🧪 Real-world simulation testing")
print("=" * 50)

🔄 MODEL MONITORING & RETRAINING SYSTEM
📊 Performance degradation detection
📈 Data and concept drift monitoring
🚨 Automated retraining triggers
📋 Production monitoring dashboard
🧪 Real-world simulation testing


## 1. Load Trained Model and Setup Monitoring Environment

In [2]:
# Load the best trained model from Step 6
models_dir = '../models/trained'

try:
    # Load model metadata
    with open(f'{models_dir}/model_metadata.json', 'r') as f:
        model_metadata = json.load(f)
    
    # Load the trained model
    model_files = sorted(f for f in os.listdir(models_dir) if f.startswith('best_model_') and f.endswith('.joblib'))  # sorted for a deterministic pick
    if model_files:
        model_path = os.path.join(models_dir, model_files[0])
        trained_model = joblib.load(model_path)
        print(f"✅ Loaded model: {model_path}")
    else:
        raise FileNotFoundError("No trained model found")
    
    # Load feature columns
    feature_columns = joblib.load(f'{models_dir}/feature_columns.joblib')
    print(f"📋 Loaded {len(feature_columns)} feature columns")
    
    # Display model info
    print(f"\n🤖 Model Information:")
    print(f"• Model: {model_metadata['model_name']}")
    print(f"• Training date: {model_metadata['training_date'][:10]}")
    print(f"• Validation RMSE: {model_metadata['validation_performance']['rmse']:.4f}°C")
    print(f"• Test RMSE: {model_metadata['test_performance']['rmse']:.4f}°C")
    print(f"• Features: {model_metadata['feature_count']}")
    print(f"• Forecast horizon: {model_metadata['forecast_horizon']} days")
    
except Exception as e:
    print(f"❌ Error loading model: {str(e)}")
    print("Please run the model training notebook (03_model_training_comprehensive.ipynb) first")
    raise

✅ Loaded model: ../models/trained\best_model_adaboost_optimized.joblib
📋 Loaded 79 feature columns

🤖 Model Information:
• Model: AdaBoost (Optimized)
• Training date: 2025-10-13
• Validation RMSE: 2.3268°C
• Test RMSE: 2.0843°C
• Features: 79
• Forecast horizon: 5 days


## 2. Model Performance Monitoring Class

In [3]:
class ModelPerformanceMonitor:
    """
    Comprehensive model performance monitoring and retraining decision system
    """
    
    def __init__(self, model, feature_columns, baseline_performance, alert_thresholds=None):
        self.model = model
        self.feature_columns = feature_columns
        self.baseline_performance = baseline_performance
        
        # Default alert thresholds
        self.alert_thresholds = alert_thresholds or {
            'rmse_degradation_pct': 15.0,  # 15% RMSE increase triggers warning
            'rmse_critical_pct': 25.0,     # 25% RMSE increase triggers retraining
            'r2_drop_threshold': 0.05,     # R² drop of 0.05 triggers warning
            'r2_critical_drop': 0.10,      # R² drop of 0.10 triggers retraining
            'drift_p_value': 0.05,         # p-value for statistical drift tests
            'consecutive_alerts': 3        # Consecutive alerts before retraining
        }
        
        # Monitoring history
        self.performance_history = []
        self.drift_history = []
        self.alert_count = 0
        
    def evaluate_performance(self, X, y, date_info=None):
        """
        Evaluate model performance on new data
        """
        predictions = self.model.predict(X)
        
        # Calculate metrics
        rmse = np.sqrt(mean_squared_error(y, predictions))
        mae = mean_absolute_error(y, predictions)
        r2 = r2_score(y, predictions)
        # MAPE is undefined if any target equals 0 and depends on the °C origin; interpret with care
        mape = np.mean(np.abs((y - predictions) / y)) * 100
        
        # Calculate performance degradation
        baseline_rmse = self.baseline_performance.get('rmse', rmse)
        baseline_r2 = self.baseline_performance.get('r2', r2)
        
        rmse_degradation = ((rmse - baseline_rmse) / baseline_rmse) * 100
        r2_drop = baseline_r2 - r2
        
        performance_data = {
            'timestamp': datetime.now(),
            'date_info': date_info,
            'rmse': rmse,
            'mae': mae,
            'r2': r2,
            'mape': mape,
            'rmse_degradation_pct': rmse_degradation,
            'r2_drop': r2_drop,
            'sample_count': len(X)
        }
        
        # Store in history
        self.performance_history.append(performance_data)
        
        return performance_data
    
    def detect_data_drift(self, X_reference, X_current, feature_subset=None):
        """
        Detect data drift using statistical tests
        """
        if feature_subset is None:
            features_to_test = self.feature_columns[:20]  # First 20 feature columns (column order, not importance) for speed
        else:
            features_to_test = feature_subset
        
        drift_results = {
            'timestamp': datetime.now(),
            'features_tested': len(features_to_test),
            'drift_detected': False,
            'drifted_features': [],
            'test_results': {}
        }
        
        for feature in features_to_test:
            if feature in X_reference.columns and feature in X_current.columns:
                # Kolmogorov-Smirnov test for distribution drift
                statistic, p_value = ks_2samp(
                    X_reference[feature].dropna(),
                    X_current[feature].dropna()
                )
                
                drift_results['test_results'][feature] = {
                    'ks_statistic': statistic,
                    'p_value': p_value,
                    'drift_detected': p_value < self.alert_thresholds['drift_p_value']
                }
                
                if p_value < self.alert_thresholds['drift_p_value']:
                    drift_results['drifted_features'].append(feature)
        
        # Overall drift detection
        drift_ratio = len(drift_results['drifted_features']) / max(len(features_to_test), 1)  # guard against an empty feature list
        drift_results['drift_detected'] = drift_ratio > 0.1  # 10% of features showing drift
        drift_results['drift_ratio'] = drift_ratio
        
        # Store in history
        self.drift_history.append(drift_results)
        
        return drift_results
    
    def should_retrain(self, performance_data=None, drift_data=None):
        """
        Decision logic for when to trigger model retraining
        """
        reasons = []
        severity = 'low'
        
        # Performance-based triggers
        if performance_data:
            # Critical RMSE degradation
            if performance_data['rmse_degradation_pct'] > self.alert_thresholds['rmse_critical_pct']:
                reasons.append(f"Critical RMSE degradation: {performance_data['rmse_degradation_pct']:.1f}%")
                severity = 'critical'
            
            # Critical R² drop
            elif performance_data['r2_drop'] > self.alert_thresholds['r2_critical_drop']:
                reasons.append(f"Critical R² drop: {performance_data['r2_drop']:.3f}")
                severity = 'critical'
            
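            # Warning-level degradation (RMSE increase or R² drop)
            elif (performance_data['rmse_degradation_pct'] > self.alert_thresholds['rmse_degradation_pct']
                  or performance_data['r2_drop'] > self.alert_thresholds['r2_drop_threshold']):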
                self.alert_count += 1
                if self.alert_count >= self.alert_thresholds['consecutive_alerts']:
                    reasons.append(f"Consecutive performance alerts: {self.alert_count}")
                    severity = 'high'
            else:
                self.alert_count = 0  # Reset counter
        
        # Drift-based triggers
        if drift_data and drift_data['drift_detected']:
            drift_ratio = drift_data['drift_ratio']
            if drift_ratio > 0.2:  # 20% of features drifted
                reasons.append(f"Significant data drift: {drift_ratio:.1%} of features")
                severity = 'high' if severity == 'low' else severity
        
        # Retraining decision
        should_retrain = len(reasons) > 0 and severity in ['high', 'critical']
        
        return {
            'should_retrain': should_retrain,
            'severity': severity,
            'reasons': reasons,
            'alert_count': self.alert_count,
            'timestamp': datetime.now()
        }
    
    def generate_monitoring_report(self):
        """
        Generate comprehensive monitoring report
        """
        if not self.performance_history:
            return "No monitoring data available"
        
        latest_performance = self.performance_history[-1]
        
        report = f"""
🔄 MODEL MONITORING REPORT
{'='*50}
📅 Report Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
🤖 Model: {self.model.__class__.__name__}
📊 Monitoring Period: {len(self.performance_history)} evaluations

📈 LATEST PERFORMANCE:
• RMSE: {latest_performance['rmse']:.4f}°C
• R²: {latest_performance['r2']:.4f}
• RMSE Degradation: {latest_performance['rmse_degradation_pct']:.1f}%
• R² Drop: {latest_performance['r2_drop']:.4f}

🚨 ALERT STATUS:
• Alert Count: {self.alert_count}
• Drift Evaluations: {len(self.drift_history)}

⚙️ THRESHOLDS:
• RMSE Warning: {self.alert_thresholds['rmse_degradation_pct']:.0f}%
• RMSE Critical: {self.alert_thresholds['rmse_critical_pct']:.0f}%
• R² Warning: {self.alert_thresholds['r2_drop_threshold']:.3f}
• R² Critical: {self.alert_thresholds['r2_critical_drop']:.3f}
        """
        
        return report

# Initialize monitoring system
baseline_perf = {
    'rmse': model_metadata['test_performance']['rmse'],
    'r2': model_metadata['test_performance']['r2']
}

monitor = ModelPerformanceMonitor(
    model=trained_model,
    feature_columns=feature_columns,
    baseline_performance=baseline_perf
)

print("✅ Model Performance Monitor initialized!")
print(f"📊 Baseline RMSE: {baseline_perf['rmse']:.4f}°C")
print(f"📊 Baseline R²: {baseline_perf['r2']:.4f}")
print("🚨 Alert thresholds configured for production monitoring")

✅ Model Performance Monitor initialized!
📊 Baseline RMSE: 2.0843°C
📊 Baseline R²: -0.3635
🚨 Alert thresholds configured for production monitoring
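The retraining rules can be exercised without a model by replaying (RMSE degradation %, R² drop) pairs. A standalone sketch that mirrors the default warning and critical thresholds and the consecutive-warning counter; the function names are hypothetical, drift signals are omitted, and unlike `should_retrain` it also resets the counter after a critical period.

```python
def classify_period(rmse_degradation_pct, r2_drop, thresholds):
    """Return 'critical', 'warning' or 'ok' for one monitoring period (illustrative sketch)."""
    if (rmse_degradation_pct > thresholds["rmse_critical_pct"]
            or r2_drop > thresholds["r2_critical_drop"]):
        return "critical"
    if (rmse_degradation_pct > thresholds["rmse_degradation_pct"]
            or r2_drop > thresholds["r2_drop_threshold"]):
        return "warning"
    return "ok"

def retrain_decisions(periods, thresholds):
    """Yield (status, should_retrain) per period, counting consecutive warnings."""
    consecutive = 0
    for rmse_deg, r2_drop in periods:
        status = classify_period(rmse_deg, r2_drop, thresholds)
        consecutive = consecutive + 1 if status == "warning" else 0
        should_retrain = status == "critical" or consecutive >= thresholds["consecutive_alerts"]
        yield status, should_retrain

defaults = {"rmse_degradation_pct": 15.0, "rmse_critical_pct": 25.0,
            "r2_drop_threshold": 0.05, "r2_critical_drop": 0.10,
            "consecutive_alerts": 3}
# The first three pairs echo the simulation in Section 4; the rest are hypothetical
history = [(22.0, -0.5638), (-3.8, -0.4908), (-1.6, 0.1543),
           (5.0, 0.07), (2.0, 0.06), (1.0, 0.08)]
for status, retrain in retrain_decisions(history, defaults):
    print(status, retrain)
```

Keeping the rule a pure function of the metrics makes it easy to unit-test threshold changes before they reach production.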


## 3. Load and Prepare Monitoring Data

In [4]:
# Load the complete dataset for monitoring simulation
data_path = '../data/raw/Hanoi-Daily-10-years.csv'

# Import feature engineering function from previous notebook
def create_forecasting_features(df, target_col='temp', forecast_horizon=5):
    """
    Create comprehensive features for temperature forecasting
    (Copied from previous notebook for consistency)
    """
    df = df.copy()
    
    # Sort by datetime
    df = df.sort_values('datetime').reset_index(drop=True)
    
    # Temporal features
    df['year'] = df['datetime'].dt.year
    df['month'] = df['datetime'].dt.month
    df['day'] = df['datetime'].dt.day
    df['dayofweek'] = df['datetime'].dt.dayofweek
    df['dayofyear'] = df['datetime'].dt.dayofyear
    df['quarter'] = df['datetime'].dt.quarter
    
    # Cyclical encoding
    df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
    df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
    df['dayofyear_sin'] = np.sin(2 * np.pi * df['dayofyear'] / 365)
    df['dayofyear_cos'] = np.cos(2 * np.pi * df['dayofyear'] / 365)
    df['dayofweek_sin'] = np.sin(2 * np.pi * df['dayofweek'] / 7)
    df['dayofweek_cos'] = np.cos(2 * np.pi * df['dayofweek'] / 7)
    
    # Lag features for forecasting
    for lag in [1, 2, 3, 5, 7, 14, 30]:
        df[f'{target_col}_lag_{lag}'] = df[target_col].shift(lag)
        
    # Rolling statistics
    for window in [3, 7, 14, 30]:
        df[f'{target_col}_rolling_mean_{window}'] = df[target_col].rolling(window=window).mean()
        df[f'{target_col}_rolling_std_{window}'] = df[target_col].rolling(window=window).std()
        df[f'{target_col}_rolling_min_{window}'] = df[target_col].rolling(window=window).min()
        df[f'{target_col}_rolling_max_{window}'] = df[target_col].rolling(window=window).max()
    
    # Temperature differences
    df['temp_diff_1d'] = df[target_col] - df[target_col].shift(1)
    df['temp_diff_7d'] = df[target_col] - df[target_col].shift(7)
    
    # Weather feature lags (short-term)
    weather_features = ['tempmax', 'tempmin', 'humidity', 'precip', 'windspeed']
    for feature in weather_features:
        if feature in df.columns:
            for lag in [1, 2, 3]:
                df[f'{feature}_lag_{lag}'] = df[feature].shift(lag)
    
    # Create target variable (forecast_horizon days ahead)
    df[f'{target_col}_target'] = df[target_col].shift(-forecast_horizon)
    
    # Season indicators
    def get_season(month):
        if month in [12, 1, 2]: return 'winter'
        elif month in [3, 4, 5]: return 'spring'
        elif month in [6, 7, 8]: return 'summer'
        else: return 'autumn'
    
    df['season'] = df['month'].apply(get_season)
    df['is_winter'] = (df['season'] == 'winter').astype(int)
    df['is_spring'] = (df['season'] == 'spring').astype(int)
    df['is_summer'] = (df['season'] == 'summer').astype(int)
    df['is_autumn'] = (df['season'] == 'autumn').astype(int)
    
    return df

# Load and process data for monitoring
print("📊 Loading data for monitoring simulation...")
df_raw = load_hanoi_weather_data(data_path)
df_features = create_forecasting_features(df_raw, forecast_horizon=5)
df_clean = df_features.dropna().copy()  # drops rows with a NaN in ANY column, not only the model features

print(f"✅ Loaded {len(df_clean)} samples for monitoring")
print(f"📅 Date range: {df_clean['datetime'].min()} to {df_clean['datetime'].max()}")

# Prepare features and target
X_monitoring = df_clean[feature_columns].copy()
y_monitoring = df_clean['temp_target'].copy()
dates_monitoring = df_clean['datetime'].copy()

print(f"🎯 Monitoring dataset: {X_monitoring.shape}")
print(f"📊 Features: {len(feature_columns)}")
print(f"🌡️ Temperature range: {y_monitoring.min():.1f}°C to {y_monitoring.max():.1f}°C")

INFO:src.data_utils:Successfully loaded 3660 records from ../data/raw/Hanoi-Daily-10-years.csv
INFO:src.data_utils:Date range: 2015-09-20 00:00:00 to 2025-09-26 00:00:00
INFO:src.data_utils:Temperature range: 7.0°C to 35.5°C


📊 Loading data for monitoring simulation...
✅ Loaded 737 samples for monitoring
📅 Date range: 2023-01-03 00:00:00 to 2025-09-21 00:00:00
🎯 Monitoring dataset: (737, 79)
📊 Features: 79
🌡️ Temperature range: 10.2°C to 34.6°C
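Of 3,660 loaded records, only 737 survive `dropna()`, and they all start in 2023. The lag and rolling windows explain only a few dozen lost rows, so it is worth checking whether a column outside `feature_columns` is responsible (for example with `df_features.isna().sum().sort_values(ascending=False).head()`); that cause is not verified here. A small pandas illustration of the difference, with a hypothetical `unused_col`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp_lag_1": [20.1, 21.3, 19.8, 22.0],
    "temp_target": [21.0, 20.5, 22.4, np.nan],  # last row has no future target
    "unused_col": [np.nan, np.nan, 1.0, 2.0],   # hypothetical column the model never uses
})
feature_cols = ["temp_lag_1"]

all_cols = df.dropna()                                         # considers every column
model_cols = df.dropna(subset=feature_cols + ["temp_target"])  # only what the model needs
print(len(all_cols), len(model_cols))
```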


## 4. Simulate Real-World Monitoring Scenario

In [5]:
# Simulate time-based monitoring with different data periods
def simulate_production_monitoring(X, y, dates, monitor, simulation_periods=12):
    """
    Simulate production monitoring over time periods
    """
    print("🚀 SIMULATING PRODUCTION MONITORING")
    print("=" * 50)
    
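    # Divide the post-reference data into fixed-size blocks of consecutive samples (one per period)
    total_samples = len(X)
    # Note: block size uses all samples, but blocks start after the reference window, so fewer
    # than `simulation_periods` full blocks may fit (3 of the 10 requested in the run below)
    samples_per_period = total_samples // simulation_periods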
    
    monitoring_results = []
    retraining_alerts = []
    
    # Reference period (training period equivalent)
    reference_end = int(total_samples * 0.7)  # First 70% as reference
    X_reference = X.iloc[:reference_end]
    
    print(f"📊 Reference period: {reference_end:,} samples")
    print(f"🔄 Monitoring {simulation_periods} time periods")
    print(f"📅 ~{samples_per_period} samples per period\n")
    
    for period in range(simulation_periods):
        # Define period boundaries
        start_idx = reference_end + (period * samples_per_period)
        end_idx = min(start_idx + samples_per_period, total_samples)
        
        if start_idx >= total_samples:
            break
            
        # Extract period data
        X_period = X.iloc[start_idx:end_idx]
        y_period = y.iloc[start_idx:end_idx]
        dates_period = dates.iloc[start_idx:end_idx]
        
        if len(X_period) < 10:  # Skip periods with too few samples
            continue
        
        period_info = {
            'period': period + 1,
            'start_date': dates_period.min(),
            'end_date': dates_period.max(),
            'sample_count': len(X_period)
        }
        
        print(f"📅 Period {period + 1}: {period_info['start_date'].strftime('%Y-%m-%d')} to {period_info['end_date'].strftime('%Y-%m-%d')} ({len(X_period)} samples)")
        
        # Evaluate performance
        performance_data = monitor.evaluate_performance(X_period, y_period, period_info)
        
        # Detect data drift (every few periods to reduce computation)
        drift_data = None
        if period % 3 == 0:  # Check drift every 3 periods
            drift_data = monitor.detect_data_drift(X_reference, X_period)
            print(f"   📊 Drift check: {drift_data['drift_ratio']:.1%} features drifted")
        
        # Check retraining decision
        retraining_decision = monitor.should_retrain(performance_data, drift_data)
        
        # Log results
        period_result = {
            'period': period + 1,
            'performance': performance_data,
            'drift': drift_data,
            'retraining_decision': retraining_decision
        }
        monitoring_results.append(period_result)
        
        # Print key metrics
        print(f"   🎯 RMSE: {performance_data['rmse']:.4f}°C (degradation: {performance_data['rmse_degradation_pct']:.1f}%)")
        print(f"   📈 R²: {performance_data['r2']:.4f} (drop: {performance_data['r2_drop']:.4f})")
        
        # Retraining alerts
        if retraining_decision['should_retrain']:
            retraining_alerts.append(period_result)
            print(f"   🚨 RETRAINING ALERT: {retraining_decision['severity'].upper()}")
            for reason in retraining_decision['reasons']:
                print(f"     • {reason}")
        
        print()  # Empty line for readability
    
    return monitoring_results, retraining_alerts

# Run monitoring simulation
monitoring_results, retraining_alerts = simulate_production_monitoring(
    X_monitoring, y_monitoring, dates_monitoring, monitor, simulation_periods=10
)

print(f"✅ Monitoring simulation completed!")
print(f"📊 Periods monitored: {len(monitoring_results)}")
print(f"🚨 Retraining alerts: {len(retraining_alerts)}")

🚀 SIMULATING PRODUCTION MONITORING
📊 Reference period: 515 samples
🔄 Monitoring 10 time periods
📅 ~73 samples per period

📅 Period 1: 2024-11-18 to 2025-04-05 (73 samples)
   📊 Drift check: 70.0% features drifted
   🎯 RMSE: 2.5432°C (degradation: 22.0%)
   📈 R²: 0.2003 (drop: -0.5638)
   🚨 RETRAINING ALERT: HIGH
     • Significant data drift: 70.0% of features

📅 Period 2: 2025-04-06 to 2025-06-28 (73 samples)
   🎯 RMSE: 2.0044°C (degradation: -3.8%)
   📈 R²: 0.1273 (drop: -0.4908)

📅 Period 3: 2025-06-29 to 2025-09-17 (73 samples)
   🎯 RMSE: 2.0499°C (degradation: -1.6%)
   📈 R²: -0.5178 (drop: 0.1543)
   🚨 RETRAINING ALERT: CRITICAL
     • Critical R² drop: 0.154

✅ Monitoring simulation completed!
📊 Periods monitored: 3
🚨 Retraining alerts: 2
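Only 3 of the 10 requested periods were monitored because `samples_per_period` is computed from all 737 samples (737 // 10 = 73), while periods are drawn only from the 222 samples after the reference cut-off at index 515; the fourth block would hold just 3 samples and is skipped. A sketch of a split over the post-reference window only; `monitoring_windows` is a hypothetical helper that returns index ranges rather than data.

```python
import numpy as np

def monitoring_windows(total_samples, reference_fraction=0.7, n_periods=10):
    """Split the post-reference index range into n_periods contiguous (start, end) windows."""
    reference_end = int(total_samples * reference_fraction)
    # array_split spreads any remainder over the first windows instead of dropping it
    chunks = np.array_split(np.arange(reference_end, total_samples), n_periods)
    return reference_end, [(int(c[0]), int(c[-1]) + 1) for c in chunks if len(c) > 0]

reference_end, windows = monitoring_windows(737)
print(reference_end, len(windows), windows[0], windows[-1])
```

With about 22 samples per window, each KS test and R² estimate becomes noisier than with 73, so this trades statistical power for temporal resolution.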


## 5. Performance Monitoring Dashboard

In [6]:
# Create comprehensive monitoring dashboard
def create_monitoring_dashboard(monitoring_results, baseline_performance):
    """
    Create interactive monitoring dashboard with Plotly
    """
    # Extract data for plotting
    periods = [r['period'] for r in monitoring_results]
    rmse_values = [r['performance']['rmse'] for r in monitoring_results]
    r2_values = [r['performance']['r2'] for r in monitoring_results]
    rmse_degradation = [r['performance']['rmse_degradation_pct'] for r in monitoring_results]
    r2_drop = [r['performance']['r2_drop'] for r in monitoring_results]
    
    # Create subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=(
            "RMSE Over Time",
            "R² Score Over Time", 
            "RMSE Degradation %",
            "R² Drop from Baseline"
        ),
        specs=[[{"secondary_y": False}, {"secondary_y": False}],
               [{"secondary_y": False}, {"secondary_y": False}]]
    )
    
    # RMSE over time
    fig.add_trace(
        go.Scatter(x=periods, y=rmse_values, mode='lines+markers', name='RMSE',
                  line=dict(color='red', width=2), marker=dict(size=6)),
        row=1, col=1
    )
    # Baseline RMSE
    fig.add_hline(y=baseline_performance['rmse'], line_dash="dash", 
                  line_color="blue", annotation_text="Baseline", row=1, col=1)
    
    # R² over time
    fig.add_trace(
        go.Scatter(x=periods, y=r2_values, mode='lines+markers', name='R²',
                  line=dict(color='green', width=2), marker=dict(size=6)),
        row=1, col=2
    )
    # Baseline R²
    fig.add_hline(y=baseline_performance['r2'], line_dash="dash", 
                  line_color="blue", annotation_text="Baseline", row=1, col=2)
    
    # RMSE degradation, colored by the critical (25%) and warning (15%) thresholds
    colors = ['red' if x > 25 else 'orange' if x > 15 else 'green' for x in rmse_degradation]
    fig.add_trace(
        go.Bar(x=periods, y=rmse_degradation, name='RMSE Degradation %',
               marker_color=colors),
        row=2, col=1
    )
    # Warning and critical thresholds
    fig.add_hline(y=15, line_dash="dash", line_color="orange", 
                  annotation_text="Warning (15%)", row=2, col=1)
    fig.add_hline(y=25, line_dash="dash", line_color="red", 
                  annotation_text="Critical (25%)", row=2, col=1)
    
    # R² drop
    colors = ['red' if x > 0.10 else 'orange' if x > 0.05 else 'green' for x in r2_drop]
    fig.add_trace(
        go.Bar(x=periods, y=r2_drop, name='R² Drop',
               marker_color=colors),
        row=2, col=2
    )
    # Warning and critical thresholds
    fig.add_hline(y=0.05, line_dash="dash", line_color="orange", 
                  annotation_text="Warning (0.05)", row=2, col=2)
    fig.add_hline(y=0.10, line_dash="dash", line_color="red", 
                  annotation_text="Critical (0.10)", row=2, col=2)
    
    # Update layout
    fig.update_layout(
        title_text="🔄 Model Performance Monitoring Dashboard",
        title_x=0.5,
        height=600,
        showlegend=True
    )
    
    # Update axis labels
    fig.update_xaxes(title_text="Monitoring Period", row=1, col=1)
    fig.update_xaxes(title_text="Monitoring Period", row=1, col=2)
    fig.update_xaxes(title_text="Monitoring Period", row=2, col=1)
    fig.update_xaxes(title_text="Monitoring Period", row=2, col=2)
    
    fig.update_yaxes(title_text="RMSE (°C)", row=1, col=1)
    fig.update_yaxes(title_text="R² Score", row=1, col=2)
    fig.update_yaxes(title_text="Degradation %", row=2, col=1)
    fig.update_yaxes(title_text="R² Drop", row=2, col=2)
    
    return fig

# Create and display dashboard
if monitoring_results:
    dashboard_fig = create_monitoring_dashboard(monitoring_results, baseline_perf)
    dashboard_fig.show()
    
    print("📊 Interactive monitoring dashboard created!")
    print("🎯 Key insights:")
    print(f"• Baseline RMSE: {baseline_perf['rmse']:.4f}°C")
    print(f"• Latest RMSE: {monitoring_results[-1]['performance']['rmse']:.4f}°C")
    print(f"• Performance trend: {'📈 Improving' if monitoring_results[-1]['performance']['rmse'] < baseline_perf['rmse'] else '📉 Degrading'}")
else:
    print("❌ No monitoring results available for dashboard creation")

📊 Interactive monitoring dashboard created!
🎯 Key insights:
• Baseline RMSE: 2.0843°C
• Latest RMSE: 2.0499°C
• Performance trend: 📈 Improving
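The "Performance trend" line compares only the latest period with the baseline, so a single period decides the label, even though that same period triggered a critical R² alert. A sketch of a trend estimate that fits a least-squares line to per-period RMSE with `np.polyfit`; the function name and the zero-slope tolerance `tol` are assumptions, and with only three periods any trend estimate remains weak.

```python
import numpy as np

def rmse_trend(rmse_values, tol=0.01):
    """Classify the RMSE trend from the slope of a least-squares line (°C per period)."""
    if len(rmse_values) < 2:
        return "insufficient data", 0.0
    periods = np.arange(1, len(rmse_values) + 1)
    slope, _intercept = np.polyfit(periods, rmse_values, deg=1)
    if slope > tol:
        label = "degrading"   # error rising over time
    elif slope < -tol:
        label = "improving"   # error falling over time
    else:
        label = "flat"
    return label, float(slope)

# RMSE values from the three simulated periods above
label, slope = rmse_trend([2.5432, 2.0044, 2.0499])
print(label, round(slope, 4))
```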


## 6. Retraining Decision Analysis

In [7]:
# Analyze retraining decisions and create summary
def analyze_retraining_decisions(monitoring_results, retraining_alerts):
    """
    Comprehensive analysis of retraining decisions
    """
    print("🔍 RETRAINING DECISION ANALYSIS")
    print("=" * 50)
    
    if not retraining_alerts:
        print("✅ No retraining alerts triggered during monitoring period")
        print("📊 Model performance remains stable")
        return
    
    print(f"🚨 Total retraining alerts: {len(retraining_alerts)}")
    print(f"📊 Alert rate: {len(retraining_alerts)/len(monitoring_results)*100:.1f}%")
    
    # Analyze alert patterns
    severity_counts = {}
    reason_counts = {}
    
    for alert in retraining_alerts:
        decision = alert['retraining_decision']
        severity = decision['severity']
        
        # Count severities
        severity_counts[severity] = severity_counts.get(severity, 0) + 1
        
        # Count reasons
        for reason in decision['reasons']:
            reason_type = reason.split(':')[0]  # Extract reason type
            reason_counts[reason_type] = reason_counts.get(reason_type, 0) + 1
    
    print(f"\n📊 Alert Severity Breakdown:")
    for severity, count in severity_counts.items():
        print(f"• {severity.title()}: {count} alerts")
    
    print(f"\n📋 Alert Reason Analysis:")
    for reason, count in reason_counts.items():
        print(f"• {reason}: {count} occurrences")
    
    # Detailed alert information
    print(f"\n🚨 DETAILED ALERT TIMELINE:")
    for i, alert in enumerate(retraining_alerts, 1):
        period = alert['period']
        decision = alert['retraining_decision']
        performance = alert['performance']
        
        print(f"\n{i}. Period {period} - {decision['severity'].upper()} Alert:")
        print(f"   📅 Date: {performance['date_info']['start_date'].strftime('%Y-%m-%d')}")
        print(f"   🎯 RMSE: {performance['rmse']:.4f}°C (degradation: {performance['rmse_degradation_pct']:.1f}%)")
        print(f"   📈 R²: {performance['r2']:.4f} (drop: {performance['r2_drop']:.4f})")
        print(f"   🚨 Reasons:")
        for reason in decision['reasons']:
            print(f"     • {reason}")
    
    return {
        'total_alerts': len(retraining_alerts),
        'alert_rate': len(retraining_alerts)/len(monitoring_results),
        'severity_breakdown': severity_counts,
        'reason_breakdown': reason_counts
    }

# Run retraining analysis
retraining_analysis = analyze_retraining_decisions(monitoring_results, retraining_alerts)

# Generate final monitoring report
final_report = monitor.generate_monitoring_report()
print("\n" + final_report)

🔍 RETRAINING DECISION ANALYSIS
🚨 Total retraining alerts: 2
📊 Alert rate: 66.7%

📊 Alert Severity Breakdown:
• High: 1 alerts
• Critical: 1 alerts

📋 Alert Reason Analysis:
• Significant data drift: 1 occurrences
• Critical R² drop: 1 occurrences

🚨 DETAILED ALERT TIMELINE:

1. Period 1 - HIGH Alert:
   📅 Date: 2024-11-18
   🎯 RMSE: 2.5432°C (degradation: 22.0%)
   📈 R²: 0.2003 (drop: -0.5638)
   🚨 Reasons:
     • Significant data drift: 70.0% of features

2. Period 3 - CRITICAL Alert:
   📅 Date: 2025-06-29
   🎯 RMSE: 2.0499°C (degradation: -1.6%)
   📈 R²: -0.5178 (drop: 0.1543)
   🚨 Reasons:
     • Critical R² drop: 0.154


🔄 MODEL MONITORING REPORT
📅 Report Date: 2025-10-13 17:25:58
🤖 Model: AdaBoostRegressor
📊 Monitoring Period: 3 evaluations

📈 LATEST PERFORMANCE:
• RMSE: 2.0499°C
• R²: -0.5178
• RMSE Degradation: -1.6%
• R² Drop: 0.1543

🚨 ALERT STATUS:
• Alert Count: 0
• Drift Evaluations: 1

⚙️ THRESHOLDS:
• RMSE Critical: 25%
• R² Critical: 0.100
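The in-memory `performance_history` list is lost when the kernel restarts, which matters for the production setting discussed next. A sketch of appending each evaluation to a JSON Lines file; `datetime` and pandas `Timestamp` values are not JSON-serialisable by default, so `default=str` converts them to strings. The file name `monitoring_log.jsonl` and both helper names are hypothetical.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def append_evaluation(log_path, record):
    """Append one monitoring record as a JSON line; non-JSON types are stringified."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, default=str) + "\n")

def load_evaluations(log_path):
    """Read all records back; timestamps come back as 'YYYY-MM-DD HH:MM:SS' strings."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp, "monitoring_log.jsonl")
    append_evaluation(log_file, {"timestamp": datetime(2025, 10, 13, 17, 25, 58),
                                 "rmse": 2.0499, "r2": -0.5178, "sample_count": 73})
    records = load_evaluations(log_file)
    print(records[0]["timestamp"], records[0]["rmse"])
```

An append-only log keeps every evaluation auditable and lets a dashboard or alerting job read the history without access to the notebook session.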
        


## 7. Production Deployment Recommendations

In [8]:
# Production deployment guidelines and recommendations
def generate_production_recommendations(monitoring_results, retraining_analysis):
    """
    Generate production deployment recommendations based on monitoring analysis
    """
    print("🚀 PRODUCTION DEPLOYMENT RECOMMENDATIONS")
    print("=" * 60)
    
    # Model stability assessment
    performance_variance = np.var([r['performance']['rmse'] for r in monitoring_results])
    avg_degradation = np.mean([r['performance']['rmse_degradation_pct'] for r in monitoring_results])
    
    print(f"📊 MODEL STABILITY ASSESSMENT:")
    print(f"• Performance variance: {performance_variance:.6f}")
    print(f"• Average degradation: {avg_degradation:.1f}%")
    
    if performance_variance < 0.1 and avg_degradation < 10:
        stability = "🟢 STABLE"
        monitoring_frequency = "Weekly"
    elif performance_variance < 0.5 and avg_degradation < 20:
        stability = "🟡 MODERATE"
        monitoring_frequency = "Daily"
    else:
        stability = "🔴 UNSTABLE"
        monitoring_frequency = "Real-time"
    
    print(f"• Stability status: {stability}")
    print(f"• Recommended monitoring: {monitoring_frequency}")
    
    # Retraining recommendations
    print(f"\n🔄 RETRAINING STRATEGY:")
    if retraining_analysis and retraining_analysis['alert_rate'] > 0.3:
        print(f"• High alert rate ({retraining_analysis['alert_rate']:.1%}) - Consider monthly retraining")
    elif retraining_analysis and retraining_analysis['alert_rate'] > 0.1:
        print(f"• Moderate alert rate ({retraining_analysis['alert_rate']:.1%}) - Quarterly retraining recommended")
    else:
        print(f"• Low alert rate - Semi-annual retraining sufficient")
    
    # Monitoring infrastructure
    print(f"\n🏗️ MONITORING INFRASTRUCTURE:")
    print(f"• Implement automated performance tracking")
    print(f"• Set up alert system for {monitoring_frequency.lower()} checks")
    print(f"• Deploy A/B testing framework for model updates")
    print(f"• Maintain model versioning and rollback capabilities")
    
    # Key metrics to track
    print(f"\n📊 KEY PRODUCTION METRICS:")
    print(f"• Primary: RMSE degradation > 15% (warning), > 25% (critical)")
    print(f"• Secondary: R² drop > 0.05 (warning), > 0.10 (critical)")
    print(f"• Drift: > 10% of features showing distribution changes")
    print(f"• Business: Temperature forecast accuracy within ±2°C")
    
    # Cost-benefit analysis
    print(f"\n💰 COST-BENEFIT CONSIDERATIONS:")
    print(f"• Monitoring cost: Low (automated scripts)")
    print(f"• Retraining cost: Medium (compute + validation time)")
    print(f"• Downtime cost: High (inaccurate forecasts)")
    print(f"• Recommendation: Invest in automated monitoring")
    
    # Implementation checklist
    checklist = [
        "✅ Model performance monitoring system",
        "✅ Statistical drift detection",
        "✅ Automated alert thresholds",
        "⏳ Real-time monitoring dashboard",
        "⏳ Automated retraining pipeline",
        "⏳ A/B testing framework",
        "⏳ Model versioning system",
        "⏳ Production deployment automation"
    ]
    
    print(f"\n📋 IMPLEMENTATION STATUS:")
    for item in checklist:
        print(f"  {item}")
    
    return {
        'stability': stability,
        'monitoring_frequency': monitoring_frequency,
        'performance_variance': performance_variance,
        'avg_degradation': avg_degradation
    }

# Generate recommendations
production_recommendations = generate_production_recommendations(monitoring_results, retraining_analysis)

print(f"\n🎯 NEXT STEPS FOR STEP 8 & 9:")
print(f"• Step 8: Implement hourly data forecasting")
print(f"• Step 9: ONNX model optimization for deployment")
print(f"• Production: Deploy monitoring system with alerting")
print(f"• Automation: Build CI/CD pipeline for model updates")

🚀 PRODUCTION DEPLOYMENT RECOMMENDATIONS
📊 MODEL STABILITY ASSESSMENT:
• Performance variance: 0.059518
• Average degradation: 5.5%
• Stability status: 🟢 STABLE
• Recommended monitoring: Weekly

🔄 RETRAINING STRATEGY:
• High alert rate (66.7%) - Consider monthly retraining

🏗️ MONITORING INFRASTRUCTURE:
• Implement automated performance tracking
• Set up alert system for weekly checks
• Deploy A/B testing framework for model updates
• Maintain model versioning and rollback capabilities

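📊 KEY PRODUCTION METRICS:
• Primary: RMSE degradation > 15% (warning), > 25% (critical)
• Secondary: R² drop > 0.05 (warning), > 0.10 (critical)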
• Drift: > 10% of features showing distribution changes
• Business: Temperature forecast accuracy within ±2°C

💰 COST-BENEFIT CONSIDERATIONS:
• Monitoring cost: Low (automated scripts)
• Retraining cost: Medium (compute + validation time)
• Downtime cost: High (inaccurate forecasts)
• Recommendation: Invest in automated monitoring

📋 IMPLEMENTATION STATUS:
  ✅ Model performance monitoring system
  ✅ Statistical drift detection
  ✅ Automated alert thresholds
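  ⏳ Real-time monitoring dashboard
  ⏳ Automated retraining pipeline
  ⏳ A/B testing framework
  ⏳ Model versioning system
  ⏳ Production deployment automation

🎯 NEXT STEPS FOR STEP 8 & 9:
• Step 8: Implement hourly data forecasting
• Step 9: ONNX model optimization for deployment
• Production: Deploy monitoring system with alerting
• Automation: Build CI/CD pipeline for model updates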
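
The stability and retraining-cadence rules in `generate_production_recommendations` are plain threshold lookups. That means they can be lifted into small, testable functions. A minimal sketch follows; the helper names are hypothetical. The code that computes `performance_variance` and `avg_degradation` lies outside this section. So `summarize_windows` simply *assumes* two definitions:

- `performance_variance` is the variance of per-window RMSE.
- `avg_degradation` is the mean percentage RMSE degradation against a baseline.

```python
from statistics import mean, pvariance
from typing import List, Optional, Tuple


def summarize_windows(window_rmse: List[float], baseline_rmse: float) -> Tuple[float, float]:
    """ASSUMED definitions: variance of per-window RMSE, and mean % RMSE degradation vs. baseline."""
    degradation_pct = [(rmse - baseline_rmse) / baseline_rmse * 100 for rmse in window_rmse]
    return pvariance(window_rmse), mean(degradation_pct)


def classify_stability(performance_variance: float, avg_degradation: float) -> Tuple[str, str]:
    """Return (stability, monitoring_frequency) using the notebook's thresholds."""
    if performance_variance < 0.1 and avg_degradation < 10:
        return "STABLE", "Weekly"
    if performance_variance < 0.5 and avg_degradation < 20:
        return "MODERATE", "Daily"
    return "UNSTABLE", "Real-time"


def retraining_cadence(alert_rate: Optional[float]) -> str:
    """Map the share of monitoring windows that raised an alert to a retraining cadence.

    None stands in for a missing retraining analysis and falls through to the
    low-alert branch, as in the notebook.
    """
    if alert_rate is not None and alert_rate > 0.3:
        return "Monthly"
    if alert_rate is not None and alert_rate > 0.1:
        return "Quarterly"
    return "Semi-annual"


# Illustrative inputs, not results from this notebook
variance, degradation = summarize_windows([2.0, 2.1, 1.9], baseline_rmse=2.0)
print(classify_stability(variance, degradation))  # ('STABLE', 'Weekly')
print(retraining_cadence(2 / 3))                  # Monthly
```

Stability and cadence are driven by separate signals, so they can disagree. The run above shows this. It reports 🟢 STABLE with weekly monitoring (variance 0.059518, average degradation 5.5%). Yet its 66.7% alert rate still selects the monthly-retraining branch.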

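The thresholds under KEY PRODUCTION METRICS can be applied to each monitoring window and then combined into a retraining decision. The sketch below is one possible wiring, not the notebook's implementation. The following pieces are all assumptions:

- the helper names and severity labels;
- the per-feature two-sample KS test (`scipy.stats.ks_2samp`, already imported at the top of the notebook) at `alpha=0.05`;
- treating any non-OK window as a warning when counting three consecutive warnings.

`max_drift_share` defaults to the 10% drift criterion listed above. The summary's decision framework uses 20% for proactive retraining instead, so set it to whichever threshold you adopt.

```python
from typing import Dict, List, Sequence

import numpy as np
from scipy.stats import ks_2samp


def window_severity(rmse_degradation_pct: float, r2_drop: float) -> str:
    """Apply the primary (RMSE %) and secondary (R² drop) thresholds from KEY PRODUCTION METRICS."""
    if rmse_degradation_pct > 25 or r2_drop > 0.10:
        return "critical"
    if rmse_degradation_pct > 15 or r2_drop > 0.05:
        return "warning"
    return "ok"


def drifted_feature_share(reference: Dict[str, Sequence[float]],
                          current: Dict[str, Sequence[float]],
                          alpha: float = 0.05) -> float:
    """Share of features whose two-sample KS test rejects equal distributions at level alpha."""
    drifted = sum(ks_2samp(reference[name], current[name]).pvalue < alpha for name in reference)
    return drifted / len(reference)


def within_tolerance_share(y_true: Sequence[float], y_pred: Sequence[float], tol: float = 2.0) -> float:
    """Business metric: share of forecasts within ±tol °C of the observed temperature."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(errors <= tol))


def should_retrain(severities: List[str], drift_share: float,
                   max_drift_share: float = 0.10, consecutive_warnings: int = 3) -> str:
    """HYPOTHETICAL decision rule over the window history (oldest first)."""
    if severities and severities[-1] == "critical":
        return "immediate"
    # Assumption: a run of non-"ok" windows counts as consecutive warnings
    recent = severities[-consecutive_warnings:]
    if len(recent) == consecutive_warnings and all(s != "ok" for s in recent):
        return "scheduled"
    if drift_share > max_drift_share:
        return "proactive"
    return "none"


# Synthetic example; feature names and values are made up for illustration
rng = np.random.default_rng(0)
reference = {"temp_lag_1": rng.normal(20, 3, 500), "humidity": rng.normal(60, 10, 500)}
current = {"temp_lag_1": rng.normal(23, 3, 500), "humidity": rng.normal(60, 10, 500)}
severities = [window_severity(d, r) for d, r in [(5.0, 0.01), (18.0, 0.02), (12.0, 0.06)]]
print(severities)  # ['ok', 'warning', 'warning']
print(should_retrain(severities, drifted_feature_share(reference, current)))  # proactive
print(within_tolerance_share([20.0, 21.5, 18.0], [21.0, 24.0, 18.5]))  # 2 of 3 within ±2 °C
```

The consecutive-warning rule only works if `should_retrain` receives the whole window history, not just the latest window. A slowly degrading model may never produce a critical window, but it will still accumulate warnings.
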
## 8. Summary and Key Insights

In [9]:
# Final summary of Step 7 implementation
print("🎯 STEP 7 IMPLEMENTATION SUMMARY")
print("=" * 60)

print("✅ COMPLETED COMPONENTS:")
print("• Model Performance Monitoring System")
print("• Data Drift Detection Framework")
print("• Automated Retraining Decision Logic")
print("• Real-world Monitoring Simulation")
print("• Interactive Performance Dashboard")
print("• Production Deployment Guidelines")

print("\n🔍 KEY INSIGHTS DISCOVERED:")
if monitoring_results:
    latest_perf = monitoring_results[-1]['performance']
    print(f"• Current model RMSE: {latest_perf['rmse']:.4f}°C")
    print(f"• Performance degradation: {latest_perf['rmse_degradation_pct']:.1f}%")
    print(f"• Model stability: {production_recommendations['stability']}")
    print(f"• Recommended monitoring: {production_recommendations['monitoring_frequency']}")

if retraining_alerts:
    print(f"• Retraining alerts triggered: {len(retraining_alerts)}")
    print(f"• Alert frequency: {len(retraining_alerts)/len(monitoring_results)*100:.1f}%")
else:
    print(f"• No retraining alerts during simulation")
    print(f"• Model remains stable over time")

print("\n🚨 WHEN TO RETRAIN (Decision Framework):")
print("• IMMEDIATE: RMSE degradation > 25% OR R² drop > 0.10")
print("• SCHEDULED: 3+ consecutive performance warnings")
print("• PROACTIVE: > 20% of features showing data drift")
print("• PERIODIC: Quarterly retraining regardless of alerts")

print("\n🔄 MONITORING BEST PRACTICES:")
print("• Track both performance and drift metrics")
print("• Use statistical significance tests")
print("• Implement sliding window analysis")
print("• Maintain baseline performance benchmarks")
print("• Set up automated alerting systems")

print("\n🎯 BUSINESS VALUE:")
print("• Prevents model degradation in production")
print("• Reduces manual monitoring overhead")
print("• Maintains forecast accuracy over time")
print("• Enables proactive model maintenance")
print("• Supports confident production deployment")

print("\n🚀 READY FOR STEP 8: Hourly Data Enhancement")
print("🔧 READY FOR STEP 9: ONNX Deployment Optimization")
print("\n📊 Step 7 SUCCESSFULLY COMPLETED! 🎉")

🎯 STEP 7 IMPLEMENTATION SUMMARY
============================================================
✅ COMPLETED COMPONENTS:
• Model Performance Monitoring System
• Data Drift Detection Framework
• Automated Retraining Decision Logic
• Real-world Monitoring Simulation
• Interactive Performance Dashboard
• Production Deployment Guidelines

🔍 KEY INSIGHTS DISCOVERED:
• Current model RMSE: 2.0499°C
• Performance degradation: -1.6%
• Model stability: 🟢 STABLE
• Recommended monitoring: Weekly
• Retraining alerts triggered: 2
• Alert frequency: 66.7%

🚨 WHEN TO RETRAIN (Decision Framework):
• IMMEDIATE: RMSE degradation > 25% OR R² drop > 0.10
• SCHEDULED: 3+ consecutive performance warnings
• PROACTIVE: > 20% of features showing data drift
• PERIODIC: Quarterly retraining regardless of alerts

🔄 MONITORING BEST PRACTICES:
• Track both performance and drift metrics
• Use statistical significance tests
• Implement sliding window analysis
• Maintain baseline performance benchmarks
• Set up automated alerting systems

🎯 BUSINESS VALUE:
• Prevents model degradation in production
• Reduces manual monitoring over