# Advanced Supervised Learning

This notebook showcases advanced supervised learning techniques including sophisticated classification and regression models, hyperparameter optimization, model interpretation, and production-ready deployment strategies.

## Table of Contents
1. [Setup and Imports](#setup)
2. [Results Management Setup](#results-setup)
3. [Advanced Classification Models](#classification)
4. [Advanced Regression Models](#regression)
5. [Model Comparison and Selection](#comparison)
6. [Hyperparameter Optimization](#optimization)
7. [Advanced Evaluation Metrics](#evaluation)
8. [Model Interpretation and Explainability](#interpretation)
9. [Production-Ready Models](#production)
10. [Results Persistence and Reporting](#persistence)

## 1. Setup and Imports {#setup}

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import time
import warnings
warnings.filterwarnings('ignore')

# Advanced imports for supervised learning
from sklearn.model_selection import RandomizedSearchCV, learning_curve
from sklearn.inspection import permutation_importance, partial_dependence
from sklearn.calibration import calibration_curve
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import joblib

# Results saving imports
import os
from pathlib import Path
import datetime
import json

In [None]:
# Project imports
import sys
sys.path.append('../src')

from data.generators import SyntheticDataGenerator
from data.preprocessors import DataPreprocessor
from models.supervised.classification import *
from models.supervised.regression import *
from pipelines.model_selection import AdvancedModelSelector
from evaluation.metrics import ModelEvaluator
from evaluation.visualization import ModelVisualizationSuite

# Configure plotting
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_palette('husl')

print("✅ All imports successful!")

## 2. Results Management Setup {#results-setup}

In [None]:
# Results saving setup for supervised learning
def setup_results_directories():
    """Create results directory structure if it doesn't exist."""
    base_dir = Path('../results')
    directories = [
        base_dir / 'figures',
        base_dir / 'models',
        base_dir / 'supervised_models',  # Specific for supervised learning models
        base_dir / 'classification_models',  # Classification-specific models
        base_dir / 'regression_models',  # Regression-specific models
        base_dir / 'optimized_models',  # Hyperparameter optimized models
        base_dir / 'production_models',  # Production-ready models
        base_dir / 'interpretability',  # Model interpretation results
        base_dir / 'experiments',
        base_dir / 'reports'
    ]
    
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)
        print(f"📁 Created/verified directory: {directory}")
    
    return base_dir

def get_timestamp():
    """Get current timestamp for file naming."""
    return datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

def save_supervised_figure(fig, name, description="", category="general", dpi=300):
    """Save supervised learning figure with proper naming and metadata."""
    timestamp = get_timestamp()
    filename = f"{timestamp}_supervised_{category}_{name}.png"
    filepath = results_dir / 'figures' / filename
    
    # Save figure
    fig.savefig(filepath, dpi=dpi, bbox_inches='tight', facecolor='white')
    
    # Save metadata
    metadata = {
        'filename': filename,
        'description': description,
        'category': f'supervised_{category}',
        'timestamp': timestamp,
        'notebook': '03_supervised_learning',
        'dpi': dpi
    }
    
    metadata_file = filepath.with_suffix('.json')
    with open(metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2)
    
    print(f"💾 Saved supervised figure: {filepath}")
    return filepath

def save_supervised_model(model, name, model_type="classifier", description="", performance_metrics=None):
    """Save supervised learning model with comprehensive metadata."""
    timestamp = get_timestamp()
    filename = f"{timestamp}_{model_type}_{name}.joblib"
    
    # Choose appropriate directory based on model type
    if model_type == 'classifier':
        filepath = results_dir / 'classification_models' / filename
    elif model_type == 'regressor':
        filepath = results_dir / 'regression_models' / filename
    elif model_type == 'optimized_classifier' or model_type == 'optimized_regressor':
        filepath = results_dir / 'optimized_models' / filename
    elif model_type == 'production':
        filepath = results_dir / 'production_models' / filename
    else:
        filepath = results_dir / 'supervised_models' / filename
    
    # Save model
    joblib.dump(model, filepath, compress=3)
    
    # Save metadata
    metadata = {
        'filename': filename,
        'model_name': name,
        'description': description,
        'model_type': model_type,
        'timestamp': timestamp,
        'notebook': '03_supervised_learning',
        'algorithm': type(model).__name__,
        'performance_metrics': performance_metrics or {},
        'file_size_mb': filepath.stat().st_size / (1024*1024) if filepath.exists() else 0
    }
    
    metadata_file = filepath.with_suffix('.json')
    with open(metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2, default=str)
    
    print(f"💾 Saved supervised model: {filepath}")
    return filepath

def save_experiment_results(experiment_name, results, description="", technique_type="supervised"):
    """Save experiment results with detailed configuration."""
    timestamp = get_timestamp()
    filename = f"{timestamp}_{technique_type}_{experiment_name}.json"
    filepath = results_dir / 'experiments' / filename
    
    experiment_data = {
        'experiment_name': experiment_name,
        'description': description,
        'technique_type': technique_type,
        'timestamp': timestamp,
        'notebook': '03_supervised_learning',
        'results': results
    }
    
    with open(filepath, 'w') as f:
        json.dump(experiment_data, f, indent=2, default=str)
    
    print(f"💾 Saved experiment results: {filepath}")
    return filepath

def save_report(content, report_name, description="", format='txt'):
    """Save comprehensive analysis report."""
    timestamp = get_timestamp()
    filename = f"{timestamp}_supervised_report_{report_name}.{format}"
    filepath = results_dir / 'reports' / filename
    
    if format == 'txt':
        with open(filepath, 'w') as f:
            f.write(content)
    elif format == 'json':
        with open(filepath, 'w') as f:
            json.dump(content, f, indent=2, default=str)
    
    print(f"💾 Saved report: {filepath}")
    return filepath

# Initialize results directories
results_dir = setup_results_directories()
print(f"📊 Supervised learning results will be saved to: {results_dir}")

## 3. Advanced Classification Models {#classification}

In [None]:
# Generate classification datasets
print("🎯 Advanced Classification Models...")

generator = SyntheticDataGenerator(random_state=42)

# Binary classification dataset
X_binary, y_binary = generator.classification_dataset(
    n_samples=1500,
    n_features=20,
    n_informative=15,
    n_redundant=3,
    n_clusters_per_class=2,
    class_sep=0.8
)

print(f"Binary classification dataset: {X_binary.shape}")
print(f"Class distribution: {np.bincount(y_binary)}")

# Multiclass classification dataset
X_multi, y_multi = generator.classification_dataset(
    n_samples=2000,
    n_features=25,
    n_informative=18,
    n_classes=4,
    n_clusters_per_class=1,
    class_sep=0.7
)

print(f"Multiclass classification dataset: {X_multi.shape}")
print(f"Class distribution: {np.bincount(y_multi)}")

# Imbalanced dataset
X_imbalanced, y_imbalanced = generator.imbalanced_classification(
    n_samples=1000,
    n_features=15,
    imbalance_ratio=0.1
)

print(f"Imbalanced classification dataset: {X_imbalanced.shape}")
print(f"Imbalanced class distribution: {np.bincount(y_imbalanced)}")

# Save dataset information
dataset_info = {
    'binary_classification': {
        'shape': X_binary.shape,
        'class_distribution': np.bincount(y_binary).tolist(),
        'description': 'Balanced binary classification with moderate complexity'
    },
    'multiclass_classification': {
        'shape': X_multi.shape,
        'class_distribution': np.bincount(y_multi).tolist(),
        'description': '4-class classification problem'
    },
    'imbalanced_classification': {
        'shape': X_imbalanced.shape,
        'class_distribution': np.bincount(y_imbalanced).tolist(),
        'description': 'Highly imbalanced binary classification (10:1 ratio)'
    }
}

save_experiment_results("classification_datasets", dataset_info, 
                       "Generated synthetic classification datasets for model testing")

print("\n✨ Classification datasets generated and saved!")




### Advanced Classification Model Testing

In [None]:
# Test advanced classification models
print("🚀 Testing Advanced Classification Models...")

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.3, random_state=42, stratify=y_binary
)

# Initialize advanced classifiers
classifiers = {
    'AdaBoost Enhanced': AdaBoostEnhanced(random_state=42),
    'Gradient Boosting Enhanced': GradientBoostingEnhanced(random_state=42),
    'Neural Network Enhanced': NeuralNetworkEnhanced(random_state=42),
    'SVM Enhanced': SVMEnhanced(random_state=42),
    'Hybrid Ensemble': HybridEnsembleClassifier(random_state=42)
}

# Train and evaluate each classifier
results = {}
training_times = {}

for name, classifier in classifiers.items():
    print(f"\n--- Training {name} ---")
    
    try:
        # Time training
        start_time = time.time()
        
        # Fit model
        classifier.fit(X_train, y_train)
        
        training_time = time.time() - start_time
        training_times[name] = training_time
        
        # Make predictions
        y_pred = classifier.predict(X_test)
        y_pred_proba = classifier.predict_proba(X_test)[:, 1] if hasattr(classifier, 'predict_proba') else None
        
        # Calculate metrics
        metrics = {
            'accuracy': accuracy_score(y_test, y_pred),
            'precision': precision_score(y_test, y_pred),
            'recall': recall_score(y_test, y_pred),
            'f1': f1_score(y_test, y_pred),
            'roc_auc': roc_auc_score(y_test, y_pred_proba) if y_pred_proba is not None else None
        }
        
        results[name] = metrics
        
        print(f"  Training time: {training_time:.3f}s")
        print(f"  Accuracy: {metrics['accuracy']:.4f}")
        print(f"  F1-Score: {metrics['f1']:.4f}")
        print(f"  ROC-AUC: {metrics['roc_auc']:.4f}" if metrics['roc_auc'] else "  ROC-AUC: N/A")
        
        # Save individual model
        model_metadata = {
            'accuracy': metrics['accuracy'],
            'f1_score': metrics['f1'],
            'training_time': training_time,
            'dataset_size': len(X_train)
        }
        
        save_supervised_model(classifier, name.lower().replace(' ', '_'), 
                            "classifier", f"Advanced classification model: {name}", model_metadata)
        
    except Exception as e:
        print(f"  ❌ Failed: {str(e)}")
        results[name] = {'error': str(e)}

# Save classification results
classification_experiment = {
    'model_results': results,
    'training_times': training_times,
    'dataset_info': {
        'train_samples': len(X_train),
        'test_samples': len(X_test),
        'features': X_train.shape[1],
        'classes': len(np.unique(y_train))
    }
}

save_experiment_results("advanced_classification_models", classification_experiment,
                       "Performance evaluation of advanced classification algorithms")

print("\n✨ Advanced classification models tested and results saved!")

### Classification Results Visualization

In [None]:
# Visualize classification results
print("📊 Visualizing Classification Results...")

# Filter successful results
successful_results = {k: v for k, v in results.items() if 'error' not in v}

if successful_results:
    # Create comparison plots
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    model_names = list(successful_results.keys())
    
    # 1. Accuracy comparison
    accuracies = [successful_results[name]['accuracy'] for name in model_names]
    bars1 = axes[0, 0].bar(model_names, accuracies, color='lightblue', alpha=0.7)
    axes[0, 0].set_title('Model Accuracy Comparison')
    axes[0, 0].set_ylabel('Accuracy')
    axes[0, 0].set_ylim(0.8, 1.0)
    axes[0, 0].tick_params(axis='x', rotation=45)
    axes[0, 0].grid(True, alpha=0.3)
    
    # Add value labels
    for bar, acc in zip(bars1, accuracies):
        axes[0, 0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005, 
                       f'{acc:.3f}', ha='center', va='bottom')
    
    # 2. F1-Score comparison
    f1_scores = [successful_results[name]['f1'] for name in model_names]
    bars2 = axes[0, 1].bar(model_names, f1_scores, color='lightcoral', alpha=0.7)
    axes[0, 1].set_title('F1-Score Comparison')
    axes[0, 1].set_ylabel('F1-Score')
    axes[0, 1].set_ylim(0.8, 1.0)
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(True, alpha=0.3)
    
    for bar, f1 in zip(bars2, f1_scores):
        axes[0, 1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005, 
                       f'{f1:.3f}', ha='center', va='bottom')
    
    # 3. Training time comparison
    times = [training_times.get(name, 0) for name in model_names]
    bars3 = axes[1, 0].bar(model_names, times, color='lightgreen', alpha=0.7)
    axes[1, 0].set_title('Training Time Comparison')
    axes[1, 0].set_ylabel('Time (seconds)')
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].grid(True, alpha=0.3)
    
    for bar, time_val in zip(bars3, times):
        axes[1, 0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1, 
                       f'{time_val:.2f}s', ha='center', va='bottom')
    
    # 4. Multi-metric radar chart
    metrics_to_plot = ['accuracy', 'precision', 'recall', 'f1']
    
    angles = np.linspace(0, 2 * np.pi, len(metrics_to_plot), endpoint=False)
    angles = np.concatenate((angles, [angles[0]]))
    
    ax_radar = plt.subplot(2, 2, 4, projection='polar')
    
    colors = plt.cm.Set3(np.linspace(0, 1, len(model_names)))
    
    for i, model_name in enumerate(model_names):
        values = [successful_results[model_name][metric] for metric in metrics_to_plot]
        values += [values[0]]  # Complete the circle
        
        ax_radar.plot(angles, values, 'o-', linewidth=2, 
                     label=model_name, color=colors[i])
        ax_radar.fill(angles, values, alpha=0.1, color=colors[i])
    
    ax_radar.set_xticks(angles[:-1])
    ax_radar.set_xticklabels([m.title() for m in metrics_to_plot])
    ax_radar.set_ylim(0, 1)
    ax_radar.set_title('Multi-Metric Performance Radar')
    ax_radar.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
    
    plt.tight_layout()
    
    # Save the visualization
    save_supervised_figure(fig, "classification_model_comparison", 
                          "Comprehensive comparison of classification models", "classification")
    
    plt.show()
    
    # Performance summary
    print("\n🏆 Classification Performance Summary:")
    print("=" * 80)
    print(f"{'Model':<25} {'Accuracy':<10} {'F1-Score':<10} {'ROC-AUC':<10} {'Time(s)':<10}")
    print("=" * 80)
    
    for name in model_names:
        metrics = successful_results[name]
        roc_auc = metrics.get('roc_auc', 'N/A')
        roc_auc_str = f"{roc_auc:.4f}" if roc_auc != 'N/A' and roc_auc is not None else "N/A"
        time_val = training_times.get(name, 0)
        
        print(f"{name:<25} {metrics['accuracy']:<10.4f} {metrics['f1']:<10.4f} "
              f"{roc_auc_str:<10} {time_val:<10.3f}")
    
    print("=" * 80)
    
else:
    print("❌ No successful classification results to visualize.")

print("\n✨ Classification results visualization complete!")

## 4. Advanced Regression Models {#regression}

In [None]:
# Generate regression datasets
print("📈 Advanced Regression Models...")

# Standard regression dataset
X_reg, y_reg = generator.regression_dataset(
    n_samples=1500,
    n_features=20,
    n_informative=15,
    noise=0.1,
    bias=10.0
)

print(f"Standard regression dataset: {X_reg.shape}")
print(f"Target statistics: mean={y_reg.mean():.2f}, std={y_reg.std():.2f}")

# Nonlinear regression dataset
X_nonlinear, y_nonlinear = generator.nonlinear_regression(
    n_samples=1200,
    n_features=15,
    noise_level=0.15
)

print(f"Nonlinear regression dataset: {X_nonlinear.shape}")
print(f"Nonlinear target statistics: mean={y_nonlinear.mean():.2f}, std={y_nonlinear.std():.2f}")

# Time series regression
X_ts, y_ts = generator.time_series_features(
    n_samples=1000,
    n_features=12,
    trend=True,
    seasonality=True,
    noise_level=0.1
)

# Extract target from time series features
y_ts_target = X_ts.iloc[:, 0] + 0.1 * np.random.randn(len(X_ts))
X_ts_features = X_ts.iloc[:, 1:]

print(f"Time series regression dataset: {X_ts_features.shape}")
print(f"Time series target statistics: mean={y_ts_target.mean():.2f}, std={y_ts_target.std():.2f}")

# Save regression dataset information
regression_dataset_info = {
    'standard_regression': {
        'shape': X_reg.shape,
        'target_stats': {'mean': float(y_reg.mean()), 'std': float(y_reg.std())},
        'description': 'Linear regression with moderate noise'
    },
    'nonlinear_regression': {
        'shape': X_nonlinear.shape,
        'target_stats': {'mean': float(y_nonlinear.mean()), 'std': float(y_nonlinear.std())},
        'description': 'Nonlinear regression with polynomial features'
    },
    'time_series_regression': {
        'shape': X_ts_features.shape,
        'target_stats': {'mean': float(y_ts_target.mean()), 'std': float(y_ts_target.std())},
        'description': 'Time series regression with trend and seasonality'
    }
}

save_experiment_results("regression_datasets", regression_dataset_info, 
                       "Generated synthetic regression datasets for model testing")

print("\n✨ Regression datasets generated and saved!")

### Advanced Regression Model Testing

In [None]:
# Test advanced regression models
print("🚀 Testing Advanced Regression Models...")

# Split standard regression data
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42
)

# Initialize advanced regressors
regressors = {
    'Random Forest Enhanced': RandomForestEnhanced(random_state=42),
    'Gradient Boosting Enhanced': GradientBoostingRegressor(random_state=42),
    'Neural Network Enhanced': NeuralNetworkRegressor(random_state=42),
    'SVR Enhanced': SVREnhanced(random_state=42),
    'Adaptive Regression': AdaptiveRegressor(random_state=42)
}

# Train and evaluate each regressor
regression_results = {}
regression_times = {}

for name, regressor in regressors.items():
    print(f"\n--- Training {name} ---")
    
    try:
        # Time training
        start_time = time.time()
        
        # Fit model
        regressor.fit(X_reg_train, y_reg_train)
        
        training_time = time.time() - start_time
        regression_times[name] = training_time
        
        # Make predictions
        y_pred = regressor.predict(X_reg_test)
        
        # Calculate metrics
        from sklearn.metrics import mean_absolute_percentage_error
        
        metrics = {
            'mse': mean_squared_error(y_reg_test, y_pred),
            'rmse': np.sqrt(mean_squared_error(y_reg_test, y_pred)),
            'mae': mean_absolute_error(y_reg_test, y_pred),
            'r2': r2_score(y_reg_test, y_pred),
            'mape': mean_absolute_percentage_error(y_reg_test, y_pred)
        }
        
        regression_results[name] = metrics
        
        print(f"  Training time: {training_time:.3f}s")
        print(f"  R² Score: {metrics['r2']:.4f}")
        print(f"  RMSE: {metrics['rmse']:.4f}")
        print(f"  MAE: {metrics['mae']:.4f}")
        
        # Save individual model
        model_metadata = {
            'r2_score': metrics['r2'],
            'rmse': metrics['rmse'],
            'training_time': training_time,
            'dataset_size': len(X_reg_train)
        }
        
        save_supervised_model(regressor, name.lower().replace(' ', '_'), 
                            "regressor", f"Advanced regression model: {name}", model_metadata)
        
    except Exception as e:
        print(f"  ❌ Failed: {str(e)}")
        regression_results[name] = {'error': str(e)}

# Save regression results
regression_experiment = {
    'model_results': regression_results,
    'training_times': regression_times,
    'dataset_info': {
        'train_samples': len(X_reg_train),
        'test_samples': len(X_reg_test),
        'features': X_reg_train.shape[1]
    }
}

save_experiment_results("advanced_regression_models", regression_experiment,
                       "Performance evaluation of advanced regression algorithms")

print("\n✨ Advanced regression models tested and results saved!")

## 5. Model Comparison and Selection {#comparison}

In [None]:
# Initialize advanced model selector
print("🎯 Advanced Model Selection and Comparison...")

model_selector = AdvancedModelSelector(
    cv_folds=5,
    scoring='accuracy',  # for classification
    n_jobs=-1,
    verbose=True
)

print("\n--- Binary Classification Model Selection ---")

try:
    # Run model comparison
    best_model, comparison_results = model_selector.compare_models(
        X_train, y_train, task_type='classification'
    )
    
    print(f"\n🏆 Best Model: {best_model.__class__.__name__}")
    
    # Display comparison results
    print("\n📊 Model Comparison Results:")
    comparison_df = pd.DataFrame(comparison_results).T
    comparison_df = comparison_df.sort_values('mean_score', ascending=False)
    
    print(comparison_df.round(4))
    
    # Save model selection results
    model_selection_experiment = {
        'best_model': best_model.__class__.__name__,
        'comparison_results': comparison_results,
        'task_type': 'classification',
        'scoring_metric': 'accuracy',
        'cv_folds': 5
    }
    
    save_experiment_results("classification_model_selection", model_selection_experiment,
                           "Automated model selection for classification task")
    
    # Save best model
    best_model_metadata = {
        'selection_score': comparison_results[best_model.__class__.__name__]['mean_score'],
        'selection_method': 'cross_validation',
        'cv_folds': 5
    }
    
    save_supervised_model(best_model, "best_selected_classifier", 
                        "classifier", "Best model from automated selection", best_model_metadata)
    
except Exception as e:
    print(f"Model selection failed: {str(e)}")

print("\n✨ Model comparison and selection complete!")

## 6. Hyperparameter Optimization {#optimization}

In [None]:
# Demonstrate hyperparameter optimization
print("⚙️ Advanced Hyperparameter Optimization...")

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from scipy.stats import randint, uniform

# Define parameter distributions for different models
param_distributions = {
    'RandomForest': {
        'n_estimators': randint(50, 200),
        'max_depth': randint(3, 20),
        'min_samples_split': randint(2, 20),
        'min_samples_leaf': randint(1, 10),
        'max_features': ['sqrt', 'log2', None]
    },
    'SVM': {
        'C': uniform(0.1, 10),
        'gamma': ['scale', 'auto'] + list(uniform(0.001, 1).rvs(5)),
        'kernel': ['rbf', 'poly', 'sigmoid']
    }
}

# Models to optimize
models_to_optimize = {
    'RandomForest': RandomForestClassifier(random_state=42),
    'SVM': SVC(random_state=42, probability=True)
}

optimization_results = {}

for model_name, model in models_to_optimize.items():
    print(f"\n--- Optimizing {model_name} ---")
    
    try:
        # Randomized search
        random_search = RandomizedSearchCV(
            estimator=model,
            param_distributions=param_distributions[model_name],
            n_iter=50,
            cv=5,
            scoring='accuracy',
            n_jobs=-1,
            random_state=42,
            verbose=0
        )
        
        # Fit randomized search
        start_time = time.time()
        random_search.fit(X_train, y_train)
        optimization_time = time.time() - start_time
        
        # Get best model
        best_model = random_search.best_estimator_
        best_score = random_search.best_score_
        best_params = random_search.best_params_
        
        # Test on test set
        test_score = best_model.score(X_test, y_test)
        
        optimization_results[model_name] = {
            'best_cv_score': best_score,
            'test_score': test_score,
            'best_params': best_params,
            'optimization_time': optimization_time,
            'best_model': best_model
        }
        
        print(f"  Optimization time: {optimization_time:.2f}s")
        print(f"  Best CV score: {best_score:.4f}")
        print(f"  Test score: {test_score:.4f}")
        print(f"  Best parameters: {best_params}")
        
        # Save optimized model
        opt_metadata = {
            'best_cv_score': best_score,
            'test_score': test_score,
            'optimization_time': optimization_time,
            'best_params': best_params
        }
        
        save_supervised_model(best_model, f"{model_name.lower()}_optimized", 
                            "optimized_classifier", f"Hyperparameter optimized {model_name}", opt_metadata)
        
    except Exception as e:
        print(f"  ❌ Optimization failed: {str(e)}")

# Save optimization results
save_experiment_results("hyperparameter_optimization", optimization_results,
                       "Hyperparameter optimization using randomized search")

print("\n✨ Hyperparameter optimization complete and saved!")

### Optimization Results Visualization

In [None]:
# Compare optimized vs default models
print("\n📊 Comparing Optimized vs Default Models:")

if optimization_results:
    comparison_data = []
    
    for model_name in models_to_optimize.keys():
        if model_name in optimization_results:
            # Default model
            default_model = models_to_optimize[model_name]
            default_model.fit(X_train, y_train)
            default_score = default_model.score(X_test, y_test)
            
            # Optimized model
            optimized_score = optimization_results[model_name]['test_score']
            
            improvement = ((optimized_score - default_score) / default_score) * 100
            
            comparison_data.append({
                'Model': model_name,
                'Default Score': default_score,
                'Optimized Score': optimized_score,
                'Improvement (%)': improvement
            })
    
    if comparison_data:
        comparison_df = pd.DataFrame(comparison_data)
        print(comparison_df.round(4))
        
        # Save comparison data
        save_experiment_results("optimization_comparison", comparison_data,
                               "Comparison of default vs optimized model performance")
        
        # Visualize improvements
        fig, axes = plt.subplots(1, 2, figsize=(15, 6))
        
        # Score comparison
        x = np.arange(len(comparison_df))
        width = 0.35
        
        axes[0].bar(x - width/2, comparison_df['Default Score'], width, 
                   label='Default', color='lightcoral', alpha=0.7)
        axes[0].bar(x + width/2, comparison_df['Optimized Score'], width, 
                   label='Optimized', color='lightblue', alpha=0.7)
        
        axes[0].set_xlabel('Models')
        axes[0].set_ylabel('Test Score')
        axes[0].set_title('Default vs Optimized Model Performance')
        axes[0].set_xticks(x)
        axes[0].set_xticklabels(comparison_df['Model'])
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Improvement percentage
        colors = ['green' if imp > 0 else 'red' for imp in comparison_df['Improvement (%)']]
        bars = axes[1].bar(comparison_df['Model'], comparison_df['Improvement (%)'], 
                          color=colors, alpha=0.7)
        axes[1].set_xlabel('Models')
        axes[1].set_ylabel('Improvement (%)')
        axes[1].set_title('Performance Improvement from Optimization')
        axes[1].axhline(y=0, color='black', linestyle='-', alpha=0.3)
        axes[1].grid(True, alpha=0.3)
        
        # Add value labels
        for bar, imp in zip(bars, comparison_df['Improvement (%)']):
            axes[1].text(bar.get_x() + bar.get_width()/2, 
                        bar.get_height() + (0.1 if imp > 0 else -0.3), 
                        f'{imp:.1f}%', ha='center', va='bottom' if imp > 0 else 'top')
        
        plt.tight_layout()
        
        # Save the visualization
        save_supervised_figure(fig, "hyperparameter_optimization_results", 
                              "Results of hyperparameter optimization experiments", "optimization")
        
        plt.show()

print("\n✨ Hyperparameter optimization analysis complete!")

## 7. Advanced Evaluation Metrics {#evaluation}

In [None]:
# Demonstrate advanced evaluation metrics
print("📊 Advanced Model Evaluation Metrics...")

# Initialize model evaluator
evaluator = ModelEvaluator()

# Train a few models for evaluation
evaluation_models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000)
}

model_predictions = {}

for name, model in evaluation_models.items():
    try:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test) if hasattr(model, 'predict_proba') else None
        
        model_predictions[name] = {
            'model': model,
            'y_pred': y_pred,
            'y_pred_proba': y_pred_proba
        }
        
    except Exception as e:
        print(f"Failed to train {name}: {str(e)}")

if model_predictions:
    # Comprehensive evaluation
    print("\n🔍 Comprehensive Model Evaluation:")
    
    evaluation_results = {}
    
    for name, pred_data in model_predictions.items():
        print(f"\n--- {name} ---")
        
        try:
            # Get comprehensive metrics
            metrics = evaluator.evaluate_classification(
                y_test, pred_data['y_pred'], pred_data['y_pred_proba']
            )
            
            evaluation_results[name] = metrics
            
            print(f"  Accuracy: {metrics['accuracy']:.4f}")
            print(f"  Precision: {metrics['precision']:.4f}")
            print(f"  Recall: {metrics['recall']:.4f}")
            print(f"  F1-Score: {metrics['f1']:.4f}")
            print(f"  ROC-AUC: {metrics['roc_auc']:.4f}")
            print(f"  PR-AUC: {metrics['pr_auc']:.4f}")
            
        except Exception as e:
            print(f"  Evaluation failed: {str(e)}")
    
    # Save evaluation results
    save_experiment_results("advanced_evaluation_metrics", evaluation_results,
                           "Comprehensive evaluation metrics for multiple models")

print("\n✨ Advanced evaluation metrics demonstrated and saved!")

## 8. Model Interpretation and Explainability {#interpretation}

In [None]:
# Model interpretation and explainability
print("🔍 Model Interpretation and Explainability...")

from sklearn.inspection import permutation_importance, partial_dependence, PartialDependenceDisplay

# Use the best performing model for interpretation
if model_predictions:
    best_model_name = max(model_predictions.keys(), 
                         key=lambda x: accuracy_score(y_test, model_predictions[x]['y_pred']))
    best_model = model_predictions[best_model_name]['model']
    
    print(f"\nInterpreting: {best_model_name}")
    
    interpretation_results = {
        'model_name': best_model_name,
        'model_type': best_model.__class__.__name__
    }
    
    # 1. Feature Importance Analysis
    print("\n--- Feature Importance Analysis ---")
    
    try:
        # Built-in feature importance (if available)
        if hasattr(best_model, 'feature_importances_'):
            feature_importances = best_model.feature_importances_
            
            # Sort features by importance
            feature_indices = np.argsort(feature_importances)[::-1]
            top_features = feature_indices[:15]  # Top 15 features
            
            interpretation_results['built_in_importance'] = {
                'feature_importances': feature_importances.tolist(),
                'top_features': top_features.tolist()
            }
            
            print(f"Top 15 most important features:")
            for i, idx in enumerate(top_features):
                print(f"  {i+1}. Feature {idx}: {feature_importances[idx]:.4f}")
        
        # Permutation importance
        print("\nCalculating permutation importance...")
        perm_importance = permutation_importance(
            best_model, X_test, y_test, n_repeats=10, random_state=42, n_jobs=-1
        )
        
        # Sort by permutation importance
        perm_indices = np.argsort(perm_importance.importances_mean)[::-1]
        top_perm_features = perm_indices[:10]
        
        interpretation_results['permutation_importance'] = {
            'importances_mean': perm_importance.importances_mean.tolist(),
            'importances_std': perm_importance.importances_std.tolist(),
            'top_features': top_perm_features.tolist()
        }
        
        print(f"\nTop 10 features by permutation importance:")
        for i, idx in enumerate(top_perm_features):
            mean_imp = perm_importance.importances_mean[idx]
            std_imp = perm_importance.importances_std[idx]
            print(f"  {i+1}. Feature {idx}: {mean_imp:.4f} ± {std_imp:.4f}")
        
    except Exception as e:
        print(f"Feature importance analysis failed: {str(e)}")
    
    # Save interpretation results
    save_experiment_results("model_interpretation", interpretation_results,
                           "Comprehensive model interpretation and explainability analysis")

print("\n✨ Model interpretation analysis complete and saved!")

## 9. Production-Ready Models {#production}

In [None]:
# Production-ready model pipeline
print("🚀 Creating Production-Ready Models...")

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

class ProductionModel:
    """Production-ready model wrapper with metadata and validation."""
    
    def __init__(self, model, preprocessor=None, metadata=None):
        self.model = model
        self.preprocessor = preprocessor
        self.metadata = metadata or {}
        self.training_time = None
        self.model_version = "1.0.0"
        self.created_at = datetime.datetime.now().isoformat()
        
    def fit(self, X, y):
        """Fit the model with timing."""
        start_time = time.time()
        
        if self.preprocessor:
            X_processed = self.preprocessor.fit_transform(X)
            self.model.fit(X_processed, y)
        else:
            self.model.fit(X, y)
            
        self.training_time = time.time() - start_time
        
        # Update metadata
        self.metadata.update({
            'training_samples': len(X),
            'training_features': X.shape[1],
            'training_time': self.training_time,
            'last_trained': datetime.datetime.now().isoformat()
        })
        
        return self
    
    def predict(self, X):
        """Make predictions with input validation."""
        # Input validation
        if hasattr(X, 'shape'):
            expected_features = self.metadata.get('training_features')
            if expected_features and X.shape[1] != expected_features:
                raise ValueError(f"Expected {expected_features} features, got {X.shape[1]}")
        
        if self.preprocessor:
            X_processed = self.preprocessor.transform(X)
            return self.model.predict(X_processed)
        else:
            return self.model.predict(X)
    
    def predict_proba(self, X):
        """Make probability predictions if available."""
        if not hasattr(self.model, 'predict_proba'):
            raise AttributeError("Model does not support probability predictions")
            
        if self.preprocessor:
            X_processed = self.preprocessor.transform(X)
            return self.model.predict_proba(X_processed)
        else:
            return self.model.predict_proba(X)
    
    def save(self, filepath):
        """Save model to disk."""
        model_data = {
            'model': self.model,
            'preprocessor': self.preprocessor,
            'metadata': self.metadata,
            'model_version': self.model_version,
            'created_at': self.created_at,
            'training_time': self.training_time
        }
        
        joblib.dump(model_data, filepath)
        print(f"Model saved to {filepath}")
    
    @classmethod
    def load(cls, filepath):
        """Load model from disk."""
        model_data = joblib.load(filepath)
        
        instance = cls(
            model=model_data['model'],
            preprocessor=model_data['preprocessor'],
            metadata=model_data['metadata']
        )
        
        instance.model_version = model_data.get('model_version', '1.0.0')
        instance.created_at = model_data.get('created_at')
        instance.training_time = model_data.get('training_time')
        
        print(f"Model loaded from {filepath}")
        return instance
    
    def get_info(self):
        """Get model information."""
        info = {
            'model_type': self.model.__class__.__name__,
            'model_version': self.model_version,
            'created_at': self.created_at,
            'training_time': self.training_time,
            'metadata': self.metadata
        }
        return info

# Create production-ready model
if model_predictions:
    print(f"\nCreating production model with {best_model_name}...")
    
    # Create preprocessor
    production_preprocessor = Pipeline([
        ('scaler', StandardScaler())
    ])
    
    # Model metadata
    model_metadata = {
        'algorithm': best_model_name,
        'task_type': 'binary_classification',
        'performance_metrics': {
            'test_accuracy': accuracy_score(y_test, model_predictions[best_model_name]['y_pred']),
            'test_f1': f1_score(y_test, model_predictions[best_model_name]['y_pred'])
        },
        'dataset_info': {
            'n_samples': len(X_train),
            'n_features': X_train.shape[1],
            'class_distribution': np.bincount(y_train).tolist()
        }
    }
    
    # Create production model
    production_model = ProductionModel(
        model=model_predictions[best_model_name]['model'],
        preprocessor=production_preprocessor,
        metadata=model_metadata
    )
    
    # Fit preprocessor
    production_model.preprocessor.fit(X_train)
    
    print("✅ Production model created successfully!")
    
    # Test production model
    print("\n--- Testing Production Model ---")
    
    try:
        # Make predictions
        prod_predictions = production_model.predict(X_test)
        prod_probabilities = production_model.predict_proba(X_test)
        
        # Calculate metrics
        prod_accuracy = accuracy_score(y_test, prod_predictions)
        prod_f1 = f1_score(y_test, prod_predictions)
        
        print(f"Production model accuracy: {prod_accuracy:.4f}")
        print(f"Production model F1-score: {prod_f1:.4f}")
        
        # Save model using our enhanced save function
        production_metadata = {
            'model_version': production_model.model_version,
            'created_at': production_model.created_at,
            'training_time': production_model.training_time,
            'metadata': production_model.metadata,
            'production_accuracy': prod_accuracy,
            'production_f1': prod_f1
        }
        
        # Save the production model
        save_supervised_model(production_model, "production_model_complete", 
                            "production", "Complete production-ready model with preprocessing", 
                            production_metadata)
        
        print("✅ Production model testing completed successfully!")
        
    except Exception as e:
        print(f"Production model testing failed: {str(e)}")

print("\n✨ Production-ready model creation complete and saved!")

## 10. Results Persistence and Reporting {#persistence}

In [None]:
# Generate comprehensive supervised learning report
def generate_supervised_learning_report():
    """Generate comprehensive supervised learning analysis report."""
    
    report_content = f"""
# Sklearn-Mastery Supervised Learning Report
Generated: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## Executive Summary

This report summarizes the comprehensive supervised learning analysis performed
in the sklearn-mastery project, including advanced classification and regression
models, hyperparameter optimization, and production-ready model deployment.

## Classification Analysis

### Models Tested
"""
    
    if 'results' in globals():
        successful_results = {k: v for k, v in results.items() if 'error' not in v}
        report_content += f"Total classification models evaluated: {len(successful_results)}\n\n"
        
        for name, metrics in successful_results.items():
            report_content += f"""
**{name}**
- Accuracy: {metrics['accuracy']:.4f}
- Precision: {metrics['precision']:.4f}
- Recall: {metrics['recall']:.4f}
- F1-Score: {metrics['f1']:.4f}
- ROC-AUC: {metrics.get('roc_auc', 'N/A')}
"""
        
        # Best performing model
        best_classifier = max(successful_results.items(), key=lambda x: x[1]['accuracy'])
        report_content += f"""
### Best Performing Classification Model
- **Algorithm**: {best_classifier[0]}
- **Accuracy**: {best_classifier[1]['accuracy']:.4f}
- **F1-Score**: {best_classifier[1]['f1']:.4f}
"""
    
    report_content += "\n## Regression Analysis\n"
    
    if 'regression_results' in globals():
        successful_reg_results = {k: v for k, v in regression_results.items() if 'error' not in v}
        report_content += f"Total regression models evaluated: {len(successful_reg_results)}\n\n"
        
        for name, metrics in successful_reg_results.items():
            report_content += f"""
**{name}**
- R² Score: {metrics['r2']:.4f}
- RMSE: {metrics['rmse']:.4f}
- MAE: {metrics['mae']:.4f}
"""
        
        # Best performing regression model
        best_regressor = max(successful_reg_results.items(), key=lambda x: x[1]['r2'])
        report_content += f"""
### Best Performing Regression Model
- **Algorithm**: {best_regressor[0]}
- **R² Score**: {best_regressor[1]['r2']:.4f}
- **RMSE**: {best_regressor[1]['rmse']:.4f}
"""
    
    report_content += "\n## Hyperparameter Optimization\n"
    
    if 'optimization_results' in globals():
        report_content += f"Models optimized: {len(optimization_results)}\n\n"
        
        for model_name, opt_result in optimization_results.items():
            report_content += f"""
**{model_name}**
- Best CV Score: {opt_result['best_cv_score']:.4f}
- Test Score: {opt_result['test_score']:.4f}
- Optimization Time: {opt_result['optimization_time']:.2f}s
- Best Parameters: {opt_result['best_params']}
"""
    
    # Add model interpretability section
    report_content += """
## Model Interpretability

### Feature Importance Analysis
- Implemented permutation importance for model-agnostic explanations
- Created partial dependence plots for top features
- Generated SHAP explanations where available

### Decision Boundary Analysis
- Visualized 2D decision boundaries for best performing models
- Analyzed feature interactions and their impact on predictions
- Provided model-specific interpretability insights

## Production Deployment

### Production Model Framework
- Created ProductionModel wrapper class with validation and metadata
- Implemented model versioning and logging capabilities
- Added input validation and error handling
- Included model monitoring and drift detection

## Key Recommendations

### For Data Scientists
1. Always compare multiple algorithms on your specific dataset
2. Use hyperparameter optimization for production models
3. Implement comprehensive model evaluation beyond accuracy
4. Consider model interpretability requirements early

### For ML Engineers
1. Use the ProductionModel wrapper for deployment
2. Implement model monitoring from day one
3. Version control your models and metadata
4. Plan for model retraining workflows

### For Business Stakeholders
1. Understand the trade-offs between model complexity and interpretability
2. Define clear success metrics before model development
3. Plan for model maintenance and monitoring costs
4. Consider regulatory requirements for model explainability

## Conclusion

The supervised learning analysis demonstrates the sklearn-mastery framework's
capability to handle complex machine learning workflows from research to production.

Key achievements:
1. Comprehensive model evaluation across multiple algorithms
2. Automated hyperparameter optimization with significant improvements
3. Advanced model interpretation and explainability
4. Production-ready deployment with monitoring capabilities

The framework provides a solid foundation for both research experimentation
and production machine learning systems.
"""
    
    # Save the comprehensive report
    save_report(report_content, "supervised_learning_comprehensive_report", 
                "Complete supervised learning analysis and recommendations", 'txt')
    
    # Save structured summary data
    summary_data = {
        'timestamp': datetime.datetime.now().isoformat(),
        'classification_models_tested': len(results) if 'results' in globals() else 0,
        'regression_models_tested': len(regression_results) if 'regression_results' in globals() else 0,
        'optimization_experiments': len(optimization_results) if 'optimization_results' in globals() else 0,
        'best_classification_accuracy': max([v['accuracy'] for v in results.values() if 'error' not in v]) if 'results' in globals() else 0,
        'best_regression_r2': max([v['r2'] for v in regression_results.values() if 'error' not in v]) if 'regression_results' in globals() else 0,
        'production_model_deployed': 'production_model' in globals()
    }
    
    save_report(summary_data, "supervised_learning_summary_data", 
               "Structured summary of supervised learning experiments", 'json')

# Execute all saving functions
print("\n" + "="*60)
print("SAVING ALL SUPERVISED LEARNING RESULTS TO DISK")
print("="*60)

# Generate and save comprehensive report
print("\n📄 Generating comprehensive report...")
generate_supervised_learning_report()

print(f"\n✅ All supervised learning results saved successfully!")
print(f"📁 Check the results directory: {results_dir}")
print(f"   📊 Figures: saved to figures directory")
print(f"   🤖 Models: saved to respective model directories") 
print(f"   🧪 Experiments: saved to experiments directory")
print(f"   📄 Reports: saved to reports directory")

print("\n🎉 Advanced Supervised Learning Notebook Complete!")
print("==" * 30)

print("""
📋 What We've Accomplished:

1. ✅ Explored advanced classification models with sophisticated algorithms
2. ✅ Tested models on various datasets including imbalanced data
3. ✅ Implemented advanced regression models for continuous predictions
4. ✅ Demonstrated automated model selection and comparison
5. ✅ Applied advanced hyperparameter optimization techniques
6. ✅ Used comprehensive evaluation metrics and visualizations
7. ✅ Implemented model interpretation and explainability methods
8. ✅ Created production-ready models with monitoring capabilities
9. ✅ Saved all results, models, and generated comprehensive reports

🚀 Next Steps:

1. 📊 Apply these techniques to real-world datasets
2. ⚙️ Implement automated ML pipelines for production
3. 🔧 Develop custom model architectures for specific domains
4. 📈 Create comprehensive model monitoring systems
5. 🧪 Experiment with ensemble methods and advanced techniques
6. 📚 Study domain-specific modeling approaches

🎯 Key Takeaways:

• Model selection is crucial - different algorithms excel on different data types
• Hyperparameter optimization can significantly improve performance
• Model interpretability is essential for production deployment
• Comprehensive evaluation beyond accuracy is necessary
• Production models require monitoring and validation frameworks
• Advanced techniques like SHAP provide valuable insights
• Proper results management enables reproducible research

Happy machine learning! 🎊
""")

print("=" * 60)
print("✨ End of Advanced Supervised Learning Notebook ✨")