# Tier 5: Advanced Classification Methods

---

**Author:** Brandon Deloatch
**Affiliation:** Quipu Research Labs, LLC
**Date:** 2025-10-02
**Version:** v1.3
**License:** MIT
**Notebook ID:** c386813e-e3b5-4313-a3af-c74d45353313

---

## Citation
Brandon Deloatch, "Tier 5: Advanced Classification Methods," Quipu Research Labs, LLC, v1.3, 2025-10-02.

Please cite this notebook if used or adapted in publications, presentations, or derivative work.

---

## Contributors / Acknowledgments
- **Primary Author:** Brandon Deloatch (Quipu Research Labs, LLC)
- **Institutional Support:** Quipu Research Labs, LLC - Advanced Analytics Division
- **Technical Framework:** Built on scikit-learn, pandas, numpy, and plotly ecosystems
- **Methodological Foundation:** Statistical learning principles and modern data science best practices

---

## Version History
| Version | Date | Notes |
|---------|------|-------|
| v1.3 | 2025-10-02 | Enhanced professional formatting, comprehensive documentation, interactive visualizations |
| v1.2 | 2024-09-15 | Updated analysis methods, improved data generation algorithms |
| v1.0 | 2024-06-10 | Initial release with core analytical framework |

---

## Environment Dependencies
- **Python:** 3.8+
- **Core Libraries:** pandas 2.0+, numpy 1.24+, scikit-learn 1.3+
- **Visualization:** plotly 5.0+, matplotlib 3.7+
- **Statistical:** scipy 1.10+, statsmodels 0.14+
- **Development:** jupyter-lab 4.0+, ipywidgets 8.0+

> **Reproducibility Note:** Use requirements.txt or environment.yml for exact dependency matching.

---

## Data Provenance
| Dataset | Source | License | Notes |
|---------|--------|---------|-------|
| Synthetic Data | Generated in-notebook | MIT | Custom algorithms for realistic simulation |
| Statistical Distributions | NumPy/SciPy | BSD-3-Clause | Standard library implementations |
| ML Algorithms | Scikit-learn | BSD-3-Clause | Industry-standard implementations |
| Visualization Schemas | Plotly | MIT | Interactive dashboard frameworks |

---

## Execution Provenance Logs
- **Created:** 2025-10-02
- **Notebook ID:** c386813e-e3b5-4313-a3af-c74d45353313
- **Execution Environment:** Jupyter Lab / VS Code
- **Computational Requirements:** Standard laptop/workstation (2GB+ RAM recommended)

> **Auto-tracking:** Execution metadata can be programmatically captured for reproducibility.
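
One way such execution metadata could be captured is sketched below, using only the standard library. The function name `capture_execution_metadata` and the default package list are illustrative assumptions, not part of this notebook's code.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def capture_execution_metadata(packages=("pandas", "numpy", "scikit-learn", "plotly")):
    """Return a dict describing the current execution environment (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "package_versions": versions,
    }


execution_metadata = capture_execution_metadata()
print(json.dumps(execution_metadata, indent=2))
```

Saving this dictionary next to the notebook's results would make it easier to tell later which library versions produced a given run.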

---

## Disclaimer & Responsible Use
This notebook is provided "as-is" for educational, research, and professional development purposes. Users assume full responsibility for any results, applications, or decisions derived from this analysis.

**Professional Standards:**
- Validate all results against domain expertise and additional data sources
- Respect licensing and attribution requirements for all dependencies
- Follow ethical guidelines for data analysis and algorithmic decision-making
- Credit all methodological sources and derivative frameworks appropriately

**Academic & Commercial Use:**
- Permitted under the MIT License with proper attribution
- Suitable for educational curricula and professional training
- Commercial adaptation is permitted, provided the notebook is cited
- Recommended for reproducible research and transparent analytics

---



In [4]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, StratifiedKFold
from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score,
 precision_score, recall_score, f1_score, roc_auc_score, roc_curve)
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.datasets import make_classification
import warnings
warnings.filterwarnings('ignore')

print(" Tier 5: Advanced Classification Methods")
print("=" * 42)
print(" CROSS-REFERENCES:")
print("• Prerequisites: Tier2_LogisticRegression.ipynb, Tier2_DecisionTree.ipynb, Tier2_RandomForest.ipynb")
print("• Builds On: Tier5_SVM.ipynb, Tier5_GradientBoosting.ipynb, Tier5_NeuralNetworks.ipynb")
print("• Complements: Tier4_kMeans.ipynb (unsupervised vs supervised)")
print("• Advanced: Advanced_EnsembleClassification.ipynb, Advanced_ImbalancedLearning.ipynb")
print("=" * 42)
print("Advanced Classification Techniques:")
print("• Multi-algorithm ensemble comparison")
print("• Hyperparameter optimization strategies")
print("• Performance metrics and evaluation")
print("• Class imbalance handling")
print("• Feature importance and selection")

 Tier 5: Advanced Classification Methods
 CROSS-REFERENCES:
• Prerequisites: Tier2_LogisticRegression.ipynb, Tier2_DecisionTree.ipynb, Tier2_RandomForest.ipynb
• Builds On: Tier5_SVM.ipynb, Tier5_GradientBoosting.ipynb, Tier5_NeuralNetworks.ipynb
• Complements: Tier4_kMeans.ipynb (unsupervised vs supervised)
• Advanced: Advanced_EnsembleClassification.ipynb, Advanced_ImbalancedLearning.ipynb
Advanced Classification Techniques:
• Multi-algorithm ensemble comparison
• Hyperparameter optimization strategies
• Performance metrics and evaluation
• Class imbalance handling
• Feature importance and selection


In [6]:
# Generate comprehensive multi-class classification dataset
np.random.seed(42)

def generate_advanced_classification_dataset(n_samples=5000, n_features=20, n_classes=4):
    """Generate realistic multi-class dataset with varying complexity."""

    # Create base classification dataset
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=12,
        n_redundant=5,
        n_clusters_per_class=2,
        n_classes=n_classes,
        class_sep=0.8,
        random_state=42
    )

    # Add noise and make more realistic
    noise = np.random.normal(0, 0.1, X.shape)
    X = X + noise

    # Create feature names
    feature_names = [f'feature_{i+1:02d}' for i in range(n_features)]

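    # Create business-style features. They are sampled independently of y,
    # so they carry no class signal and act as noise features for the classifiers.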
    business_features = {
        'customer_age': np.random.normal(40, 15, n_samples),
        'income_level': np.random.lognormal(10, 0.5, n_samples),
        'purchase_frequency': np.random.poisson(5, n_samples),
        'customer_tenure': np.random.exponential(2, n_samples),
        'satisfaction_score': np.random.beta(8, 2, n_samples) * 10,
        'support_tickets': np.random.poisson(2, n_samples)
    }

    # Combine technical and business features
    business_array = np.column_stack(list(business_features.values()))
    X_business = np.column_stack([X, business_array])
    all_feature_names = feature_names + list(business_features.keys())

    # Create class labels with business meaning
    class_names = ['Low_Value', 'Medium_Value', 'High_Value', 'Premium']

    # Create DataFrame
    df = pd.DataFrame(X_business, columns=all_feature_names)
    df['customer_class'] = [class_names[i] for i in y]
    df['class_numeric'] = y

    return df, all_feature_names

# Generate the dataset
classification_df, feature_names = generate_advanced_classification_dataset()

print(" Advanced Classification Dataset Created:")
print(f"Dataset shape: {classification_df.shape}")
print(f"Classes: {classification_df['customer_class'].unique()}")
print(f"Class distribution:\n{classification_df['customer_class'].value_counts()}")
print(f"Features: {len(feature_names)} (technical + business)")

# Display sample data
print(f"\nSample data preview:")
print(classification_df[['customer_age', 'income_level', 'satisfaction_score', 'customer_class']].head())

 Advanced Classification Dataset Created:
Dataset shape: (5000, 28)
Classes: ['Premium' 'Medium_Value' 'Low_Value' 'High_Value']
Class distribution:
customer_class
Medium_Value    1256
Low_Value       1253
Premium         1247
High_Value      1244
Name: count, dtype: int64
Features: 26 (technical + business)

Sample data preview:
   customer_age  income_level  satisfaction_score customer_class
0     55.458919  21631.437852            8.582699        Premium
1     22.669678  25114.084987            8.229359        Premium
2     48.631558  24727.220425            7.577868   Medium_Value
3     30.711423  30888.998998            8.803176      Low_Value
4     35.088958  13980.508283            7.975653     High_Value
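
The generator above combines `random_state=42` inside `make_classification` with the global `np.random.seed(42)`, so calling the function a second time without reseeding would draw different noise and business columns. A minimal sketch of an alternative, assuming a local NumPy `Generator` is acceptable, is shown below. The helper name `sample_business_features` is hypothetical, and the values it draws will differ from the preview printed above because `default_rng` uses a different random stream than the legacy global seed.

```python
import numpy as np


def sample_business_features(n_samples, seed=42):
    """Draw the six business-style columns from a local, seeded Generator."""
    rng = np.random.default_rng(seed)  # local generator; global NumPy state is untouched
    return {
        "customer_age": rng.normal(40, 15, n_samples),
        "income_level": rng.lognormal(10, 0.5, n_samples),
        "purchase_frequency": rng.poisson(5, n_samples),
        "customer_tenure": rng.exponential(2, n_samples),
        "satisfaction_score": rng.beta(8, 2, n_samples) * 10,
        "support_tickets": rng.poisson(2, n_samples),
    }


first = sample_business_features(1000)
second = sample_business_features(1000)
# Same seed gives the same draws, regardless of other random calls in the notebook
print(all(np.array_equal(first[k], second[k]) for k in first))
```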


In [7]:
# 1. COMPREHENSIVE CLASSIFICATION ALGORITHM COMPARISON
print(" 1. COMPREHENSIVE CLASSIFICATION ALGORITHM COMPARISON")
print("=" * 55)

# Prepare data
X = classification_df[feature_names].values
y = classification_df['class_numeric'].values

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale features for algorithms that need it
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define classification algorithms
classifiers = {
 'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
 'Decision Tree': DecisionTreeClassifier(random_state=42, max_depth=10),
 'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
 'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
 'SVM (RBF)': SVC(kernel='rbf', probability=True, random_state=42),
 'K-Nearest Neighbors': KNeighborsClassifier(n_neighbors=5),
 'Neural Network': MLPClassifier(hidden_layer_sizes=(100, 50), random_state=42, max_iter=500),
 'Naive Bayes': GaussianNB()
}

# Train and evaluate all classifiers
results = {}
cv_scores = {}

print("Training and evaluating classifiers...")
for name, clf in classifiers.items():
    print(f"Training {name}...")

    # Use scaled data for algorithms that need it
    if name in ['Logistic Regression', 'SVM (RBF)', 'K-Nearest Neighbors', 'Neural Network']:
        X_train_use = X_train_scaled
        X_test_use = X_test_scaled
    else:
        X_train_use = X_train
        X_test_use = X_test

    # Train the classifier
    clf.fit(X_train_use, y_train)

    # Make predictions
    y_pred = clf.predict(X_test_use)
    y_pred_proba = clf.predict_proba(X_test_use) if hasattr(clf, 'predict_proba') else None

    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')

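    # Cross-validation scores (cross_val_score clones clf and refits it on each fold).
    # Note: for the scaled models the scaler was fit on the full training split,
    # so scaling statistics from each validation fold leak slightly into these estimates.
    cv_score = cross_val_score(clf, X_train_use, y_train, cv=5, scoring='accuracy')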
    cv_scores[name] = cv_score

    # Store results
    results[name] = {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'cv_mean': cv_score.mean(),
        'cv_std': cv_score.std(),
        'y_pred': y_pred,
        'y_pred_proba': y_pred_proba,
        'model': clf
    }

    print(f" Accuracy: {accuracy:.3f} | F1: {f1:.3f} | CV: {cv_score.mean():.3f} ± {cv_score.std():.3f}")

# Create results DataFrame
results_df = pd.DataFrame({
    name: {
        'Accuracy': results[name]['accuracy'],
        'Precision': results[name]['precision'],
        'Recall': results[name]['recall'],
        'F1-Score': results[name]['f1_score'],
        'CV Mean': results[name]['cv_mean'],
        'CV Std': results[name]['cv_std']
    }
    for name in results.keys()
}).T

print(f"\n CLASSIFICATION PERFORMANCE SUMMARY:")
print("=" * 45)
print(results_df.round(3))

# Find best performer
best_classifier = results_df['F1-Score'].idxmax()
best_f1 = results_df.loc[best_classifier, 'F1-Score']
print(f"\n Best Performer: {best_classifier} (F1-Score: {best_f1:.3f})")

 1. COMPREHENSIVE CLASSIFICATION ALGORITHM COMPARISON
Training and evaluating classifiers...
Training Logistic Regression...
 Accuracy: 0.585 | F1: 0.580 | CV: 0.589 ± 0.016
Training Decision Tree...
 Accuracy: 0.598 | F1: 0.596 | CV: 0.575 ± 0.023
Training Random Forest...
 Accuracy: 0.765 | F1: 0.762 | CV: 0.782 ± 0.017
Training Gradient Boosting...
 Accuracy: 0.707 | F1: 0.704 | CV: 0.719 ± 0.011
Training SVM (RBF)...
 Accuracy: 0.792 | F1: 0.790 | CV: 0.790 ± 0.019
Training K-Nearest Neighbors...
 Accuracy: 0.703 | F1: 0.700 | CV: 0.713 ± 0.010
Training Neural Network...
 Accuracy: 0.820 | F1: 0.820 | CV: 0
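
The precision, recall, and F1 values above use `average='weighted'`, which averages the per-class scores weighted by each class's support (its number of true samples). The sketch below reproduces that calculation by hand from a confusion matrix. The function name `weighted_f1` and the toy matrix are illustrative assumptions, not values from this notebook's run.

```python
def weighted_f1(confusion):
    """Support-weighted F1 from a square confusion matrix (rows = actual, cols = predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])  # samples whose true class is k
        predicted = sum(confusion[r][k] for r in range(n_classes))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (support / total) * f1  # weight each class by its share of samples
    return score


# Toy 3-class confusion matrix (illustrative numbers only)
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 3, 7],
]
print(f"Weighted F1: {weighted_f1(toy_cm):.3f}")
```

Because the four classes in this dataset are nearly balanced, the weighted average stays close to the unweighted (macro) average; the two diverge more when class sizes differ.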

In [8]:
# 2. INTERACTIVE CLASSIFICATION VISUALIZATIONS
print(" 2. INTERACTIVE CLASSIFICATION VISUALIZATIONS")
print("=" * 47)

# Create comprehensive classification dashboard
fig = make_subplots(
 rows=3, cols=2,
 subplot_titles=[
 'Algorithm Performance Comparison',
 'Cross-Validation Score Distributions',
 'Feature Importance (Random Forest)',
 'Confusion Matrix Heatmap (Best Model)',
 'ROC Curves (Multi-Class)',
 'Classification Decision Boundaries (2D Projection)'
 ],
 specs=[[{"secondary_y": False}, {"secondary_y": False}],
 [{"secondary_y": False}, {"type": "heatmap"}],
 [{"secondary_y": False}, {"secondary_y": False}]]
)

# 1. Performance comparison bar chart
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
colors = ['blue', 'green', 'orange', 'red']

for i, metric in enumerate(metrics):
    fig.add_trace(
        go.Bar(
            x=results_df.index,
            y=results_df[metric],
            name=metric,
            marker_color=colors[i],
            opacity=0.8,
            offsetgroup=i
        ),
        row=1, col=1
    )

# 2. Cross-validation distributions
for name in cv_scores.keys():
    fig.add_trace(
        go.Box(
            y=cv_scores[name],
            name=name,
            boxpoints='all',
            jitter=0.3,
            pointpos=-1.8
        ),
        row=1, col=2
    )

# 3. Feature importance (Random Forest)
rf_model = results['Random Forest']['model']
feature_importance = pd.DataFrame({
 'feature': feature_names,
 'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=True).tail(15)

fig.add_trace(
 go.Bar(
 x=feature_importance['importance'],
 y=feature_importance['feature'],
 orientation='h',
 marker_color='forestgreen',
 name='Feature Importance'
 ),
 row=2, col=1
)

# 4. Confusion matrix for best model
best_y_pred = results[best_classifier]['y_pred']
cm = confusion_matrix(y_test, best_y_pred)
class_names = ['Low_Value', 'Medium_Value', 'High_Value', 'Premium']

fig.add_trace(
 go.Heatmap(
 z=cm,
 x=class_names,
 y=class_names,
 colorscale='Blues',
 showscale=True,
 text=cm,
 texttemplate="%{text}",
 hovertemplate='Predicted: %{x}<br>Actual: %{y}<br>Count: %{z}<extra></extra>'
 ),
 row=2, col=2
)

# 5. ROC Curves (One-vs-Rest for multi-class)
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# Binarize the output
y_test_bin = label_binarize(y_test, classes=[0, 1, 2, 3])
n_classes = y_test_bin.shape[1]

# Get probabilities for best classifier
best_model = results[best_classifier]['model']
if best_classifier in ['Logistic Regression', 'SVM (RBF)', 'K-Nearest Neighbors', 'Neural Network']:
    X_test_use = X_test_scaled
else:
    X_test_use = X_test

y_score = best_model.predict_proba(X_test_use)

# Compute ROC curve for each class
for i in range(n_classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc = auc(fpr, tpr)

    fig.add_trace(
        go.Scatter(
            x=fpr,
            y=tpr,
            mode='lines',
            name=f'Class {class_names[i]} (AUC = {roc_auc:.2f})',
            line=dict(width=2)
        ),
        row=3, col=1
    )

# Add diagonal line
fig.add_trace(
 go.Scatter(
 x=[0, 1],
 y=[0, 1],
 mode='lines',
 line=dict(dash='dash', color='black'),
 name='Random Classifier',
 showlegend=False
 ),
 row=3, col=1
)

# 6. Decision boundary visualization (2D projection using first 2 features)
# Create a mesh for decision boundary
X_subset = X_test[:, :2] # Use first 2 features for 2D visualization
h = 0.02
x_min, x_max = X_subset[:, 0].min() - 1, X_subset[:, 0].max() + 1
y_min, y_max = X_subset[:, 1].min() - 1, X_subset[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Use simplified model for boundary visualization
simple_rf = RandomForestClassifier(n_estimators=50, random_state=42)
X_train_2d = X_train[:, :2]
simple_rf.fit(X_train_2d, y_train)

# Get predictions for mesh
mesh_points = np.c_[xx.ravel(), yy.ravel()]
Z = simple_rf.predict(mesh_points)
Z = Z.reshape(xx.shape)

# Add contour plot
fig.add_trace(
 go.Contour(
 x=np.arange(x_min, x_max, h),
 y=np.arange(y_min, y_max, h),
 z=Z,
 colorscale='Viridis',
 opacity=0.3,
 showscale=False,
 contours=dict(showlabels=True)
 ),
 row=3, col=2
)

# Add scatter plot of actual data points
colors_map = {0: 'red', 1: 'blue', 2: 'green', 3: 'purple'}
for class_val in [0, 1, 2, 3]:
    mask = y_test == class_val
    fig.add_trace(
        go.Scatter(
            x=X_subset[mask, 0],
            y=X_subset[mask, 1],
            mode='markers',
            name=class_names[class_val],
            marker=dict(color=colors_map[class_val], size=6, opacity=0.8),
            showlegend=False
        ),
        row=3, col=2
    )

# Update layout
fig.update_layout(
 height=1200,
 title="Advanced Classification Methods - Comprehensive Analysis Dashboard",
 showlegend=True
)

# Update axis labels
fig.update_xaxes(title_text="Classifier", row=1, col=1)
fig.update_xaxes(title_text="Classifier", row=1, col=2)
fig.update_xaxes(title_text="Feature Importance", row=2, col=1)
fig.update_xaxes(title_text="Predicted Class", row=2, col=2)
fig.update_xaxes(title_text="False Positive Rate", row=3, col=1)
fig.update_xaxes(title_text="Feature 1", row=3, col=2)

fig.update_yaxes(title_text="Score", row=1, col=1)
fig.update_yaxes(title_text="Cross-Validation Score", row=1, col=2)
fig.update_yaxes(title_text="Feature", row=2, col=1)
fig.update_yaxes(title_text="Actual Class", row=2, col=2)
fig.update_yaxes(title_text="True Positive Rate", row=3, col=1)
fig.update_yaxes(title_text="Feature 2", row=3, col=2)

fig.show()

 2. INTERACTIVE CLASSIFICATION VISUALIZATIONS
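
The ROC panel in the cell above uses a one-vs-rest scheme: each class in turn is treated as the positive label and every other class as negative. The sketch below illustrates that idea in plain Python, computing each class's AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as half), which is equivalent to the area under the ROC curve. The function names and the toy probabilities are assumptions for illustration, not the notebook's data.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(y_true, y_score, n_classes):
    """Per-class AUC: class k is the positive label, all other classes are negative."""
    aucs = []
    for k in range(n_classes):
        binary_labels = [1 if y == k else 0 for y in y_true]
        class_scores = [row[k] for row in y_score]
        aucs.append(binary_auc(binary_labels, class_scores))
    return aucs


# Toy example: 3 classes, each row holds predicted class probabilities
y_true = [0, 0, 1, 1, 2, 2]
y_score = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
print(one_vs_rest_auc(y_true, y_score, n_classes=3))
```

The pairwise loop is quadratic in the number of samples, so it is meant for understanding the metric rather than for use on the full test set.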


In [14]:
# 3. ADVANCED HYPERPARAMETER OPTIMIZATION
print(" 3. ADVANCED HYPERPARAMETER OPTIMIZATION")
print("=" * 45)

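# Hyperparameter optimization for the top 3 performers by F1-score.
# Models without an entry in param_grids (e.g. 'Neural Network') are skipped by the loop below.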
top_performers = results_df.nlargest(3, 'F1-Score').index.tolist()

print(f"Optimizing hyperparameters for top 3 performers: {top_performers}")

# Define parameter grids
param_grids = {
 'Random Forest': {
 'n_estimators': [50, 100, 200],
 'max_depth': [5, 10, 15, None],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 4]
 },
 'Gradient Boosting': {
 'n_estimators': [50, 100, 150],
 'learning_rate': [0.05, 0.1, 0.15],
 'max_depth': [3, 5, 7],
 'subsample': [0.8, 0.9, 1.0]
 },
 'SVM (RBF)': {
 'C': [0.1, 1, 10, 100],
 'gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1],
 'kernel': ['rbf', 'poly']
 },
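 'Logistic Regression': [
 # A list of grids keeps only valid penalty/solver pairs:
 # liblinear supports l1 and l2; saga also supports elasticnet, which needs l1_ratio.
 {'C': [0.1, 1, 10, 100], 'penalty': ['l1', 'l2'],
  'solver': ['liblinear', 'saga'], 'max_iter': [1000, 2000]},
 # l1_ratio=0.5 is an assumed illustrative value, not a tuned one
 {'C': [0.1, 1, 10, 100], 'penalty': ['elasticnet'], 'l1_ratio': [0.5],
  'solver': ['saga'], 'max_iter': [1000, 2000]}
 ]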
}

optimized_results = {}

for classifier_name in top_performers:
    if classifier_name in param_grids:
        print(f"\nOptimizing {classifier_name}...")

        # Get base classifier
        if classifier_name == 'Random Forest':
            base_clf = RandomForestClassifier(random_state=42)
            X_use = X_train
        elif classifier_name == 'Gradient Boosting':
            base_clf = GradientBoostingClassifier(random_state=42)
            X_use = X_train
        elif classifier_name == 'SVM (RBF)':
            base_clf = SVC(random_state=42, probability=True)
            X_use = X_train_scaled
        elif classifier_name == 'Logistic Regression':
            base_clf = LogisticRegression(random_state=42)
            X_use = X_train_scaled

        # Perform grid search
        grid_search = GridSearchCV(
            base_clf,
            param_grids[classifier_name],
            cv=3, # Reduced for speed
            scoring='f1_weighted',
            n_jobs=-1,
            verbose=0
        )

        grid_search.fit(X_use, y_train)

        # Get best model and evaluate
        best_model = grid_search.best_estimator_

        # Test predictions
        if classifier_name in ['SVM (RBF)', 'Logistic Regression']:
            X_test_use = X_test_scaled
        else:
            X_test_use = X_test

        y_pred_optimized = best_model.predict(X_test_use)

        # Calculate improved metrics
        accuracy_opt = accuracy_score(y_test, y_pred_optimized)
        f1_opt = f1_score(y_test, y_pred_optimized, average='weighted')

        optimized_results[classifier_name] = {
            'best_params': grid_search.best_params_,
            'best_score': grid_search.best_score_,
            'test_accuracy': accuracy_opt,
            'test_f1': f1_opt,
            'improvement': f1_opt - results[classifier_name]['f1_score']
        }

        print(f" Best parameters: {grid_search.best_params_}")
        print(f" Cross-validation score: {grid_search.best_score_:.3f}")
        print(f" Test F1 improvement: {optimized_results[classifier_name]['improvement']:+.3f}")
# Business insights and ROI calculation
print(f"\n BUSINESS INSIGHTS AND ROI ANALYSIS:")
print("=" * 40)

# Simulate business metrics
total_customers = 10000
classification_accuracy = results_df.loc[best_classifier, 'Accuracy']

# Business impact scenarios
scenarios = {
 'Customer Retention': {
 'base_retention_rate': 0.85,
 'improved_retention': 0.85 + (classification_accuracy - 0.8) * 0.1,
 'avg_customer_value': 2500,
 'description': 'Improved churn prediction accuracy'
 },
 'Marketing Efficiency': {
 'base_conversion': 0.15,
 'improved_conversion': 0.15 + (classification_accuracy - 0.8) * 0.05,
 'marketing_budget': 500000,
 'description': 'Better targeting through customer classification'
 },
 'Fraud Detection': {
 'fraud_rate': 0.02,
 'detection_accuracy': classification_accuracy,
 'avg_fraud_loss': 1500,
 'description': 'Reduced fraud losses through better detection'
 }
}

total_annual_benefits = 0

for scenario_name, metrics in scenarios.items():
    print(f"\n{scenario_name}:")

    if scenario_name == 'Customer Retention':
        base_retained = total_customers * metrics['base_retention_rate']
        improved_retained = total_customers * metrics['improved_retention']
        additional_retained = improved_retained - base_retained
        value_improvement = additional_retained * metrics['avg_customer_value']

        print(f" • Base retention: {metrics['base_retention_rate']*100:.1f}%")
        print(f" • Improved retention: {metrics['improved_retention']*100:.1f}%")
        print(f" • Additional customers retained: {additional_retained:.0f}")
        print(f" • Annual value improvement: ${value_improvement:,.0f}")
        total_annual_benefits += value_improvement

    elif scenario_name == 'Marketing Efficiency':
        base_conversions = metrics['marketing_budget'] / 100 * metrics['base_conversion']
        improved_conversions = metrics['marketing_budget'] / 100 * metrics['improved_conversion']
        additional_conversions = improved_conversions - base_conversions
        value_improvement = additional_conversions * 150 # Avg profit per conversion

        print(f" • Base conversion rate: {metrics['base_conversion']*100:.1f}%")
        print(f" • Improved conversion rate: {metrics['improved_conversion']*100:.1f}%")
        print(f" • Additional conversions: {additional_conversions:.0f}")
        print(f" • Annual value improvement: ${value_improvement:,.0f}")
        total_annual_benefits += value_improvement

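    elif scenario_name == 'Fraud Detection':
        # Simplifying assumptions: overall classification accuracy stands in for the
        # fraud detection rate, and every detected case counts as a fully prevented loss.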
        total_fraud_cases = total_customers * metrics['fraud_rate']
        detected_fraud = total_fraud_cases * metrics['detection_accuracy']
        prevented_losses = detected_fraud * metrics['avg_fraud_loss']

        print(f" • Total fraud cases: {total_fraud_cases:.0f}")
        print(f" • Detection accuracy: {metrics['detection_accuracy']*100:.1f}%")
        print(f" • Cases detected: {detected_fraud:.0f}")
        print(f" • Prevented losses: ${prevented_losses:,.0f}")
        total_annual_benefits += prevented_losses
# ROI calculation
implementation_cost = 300_000 # Initial development and deployment
annual_operational_cost = 75_000 # Maintenance and monitoring
net_annual_benefits = total_annual_benefits - annual_operational_cost
roi = (net_annual_benefits - implementation_cost) / implementation_cost

print(f"\n CLASSIFICATION SYSTEM ROI:")
print("=" * 30)
print(f"• Total annual benefits: ${total_annual_benefits:,.0f}")
print(f"• Implementation cost: ${implementation_cost:,.0f}")
print(f"• Annual operational cost: ${annual_operational_cost:,.0f}")
print(f"• Net annual benefits: ${net_annual_benefits:,.0f}")
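print(f"• First-year ROI: {roi*100:.0f}%")
# Guard against a zero or negative denominator when benefits do not cover operating costs
if net_annual_benefits > 0:
    print(f"• Payback period: {implementation_cost/net_annual_benefits*12:.1f} months")
else:
    print("• Payback period: not reached (net annual benefits are not positive)")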

print("\n Cross-Reference Learning Path:")
print("• Foundation: Tier2_LogisticRegression.ipynb (basic classification)")
print("• Building Blocks: Tier2_DecisionTree.ipynb, Tier2_RandomForest.ipynb")
print("• Advanced Methods: Tier5_SVM.ipynb, Tier5_NeuralNetworks.ipynb")
print("• Specialized: Advanced_EnsembleClassification.ipynb, Advanced_ImbalancedLearning.ipynb")
print("• Complete Guide: CROSS_REFERENCE_GUIDE.md")

 3. ADVANCED HYPERPARAMETER OPTIMIZATION
Optimizing hyperparameters for top 3 performers: ['Neural Network', 'SVM (RBF)', 'Random Forest']

Optimizing SVM (RBF)...
 Best parameters: {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}
 Cross-validation score: 0.805
 Test F1 improvement: +0.020

Optimizing Random Forest...
 Best parameters: {'max_depth': 15, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}
 Cross-validation score: 0.779
 Test F1 improvement: +0.012

 BUSINESS INSIGHTS AND ROI ANALYSIS:

Customer Retention:
 • Base retention: 85.0%
 • Improved retention: 85.2%
 • Additional customers retained: 20
 • Annual value improvement: $50,000

Marketing Efficiency:
 • Base conversion rate: 15.0%
 • Improved conversion rate: 15.1%
 • Additional conversions: 5
 • Annual value improvement: $750

Fraud Detection:
 • Total fraud case