# Final Project Analysis: Bank Marketing Campaign

**Objective:** This notebook synthesizes and visualizes the performance of all trained models to identify the champion model and understand the strategic insights from the project.

**Methodology:**
1.  **Load All Results:** Automatically find and parse all `classification_report.txt` files.
2.  **Load Top Models:** Load the saved model objects for the best-performing classifiers.
3.  **Generate Visualizations:** Create a suite of 10 high-impact visualizations to compare performance, analyze trade-offs, and understand model behavior.

---

In [10]:
# (Keep all your existing imports)
import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import warnings
from sklearn.metrics import RocCurveDisplay
import plotly.express as px

# --- ADDITION: Import ipywidgets ---
from ipywidgets import interactive, VBox, HTML
import ipywidgets as widgets

# Set a consistent style for all plots
plt.style.use('seaborn-v0_8-whitegrid')
warnings.filterwarnings('ignore')

print("Libraries imported successfully, including ipywidgets.")

Libraries imported successfully, including ipywidgets.


In [11]:
# --- Configuration ---
# List of the top-performing and most interesting models for detailed, comparative plots.
# This list has been updated to match the final set of trained models.
MODELS_TO_ANALYZE_DETAIL = [
    # The Champion Models
    'rf_tuned_smote_optimal_thresh',
    'tabnet_smote_target_encoding',
    
    # The Other Main Contenders
    'deep_learning_mlp_tuned',
    'logistic_regression_smote_te',
    'catboost_target_encoding',
    'lightgbm_rfe_smote_target_encoding',
    
    # The Advanced Ensembles
    'stacking_ensemble_smote_v2',
    'voting_model', # Assuming you have a voting model from our scripts
    
    # The Alternative RF model for comparison
    'rf_tuned_class_weight'
]

# Path to the processed test data needed for generating predictions
TEST_DATA_PATH = 'data/processed_target_encoding'

print("Configuration updated to include all top models for detailed analysis.")
# Clean up the list to only include models that actually exist
existing_models_to_analyze = []
for model_name in MODELS_TO_ANALYZE_DETAIL:
    if os.path.exists(f'models/{model_name}'):
        existing_models_to_analyze.append(model_name)
    else:
        print(f"  - Warning: Model folder 'models/{model_name}' not found. It will be excluded from detailed plots.")
MODELS_TO_ANALYZE_DETAIL = existing_models_to_analyze

print(f"\nFinal list of models for detailed plots: {MODELS_TO_ANALYZE_DETAIL}")

Configuration updated to include all top models for detailed analysis.

Final list of models for detailed plots: ['rf_tuned_smote_optimal_thresh', 'tabnet_smote_target_encoding', 'deep_learning_mlp_tuned', 'logistic_regression_smote_te', 'catboost_target_encoding', 'lightgbm_rfe_smote_target_encoding', 'stacking_ensemble_smote_v2', 'rf_tuned_class_weight']


In [12]:
def parse_report(report_path):
    """Parses a classification_report.txt file to extract metrics for the positive class."""
    try:
        with open(report_path, 'r') as f:
            for line in f:
                if 'Yes (deposit)' in line:
                    parts = line.split()
                    if len(parts) >= 5:
                        return float(parts[2]), float(parts[3]), float(parts[4])
    except Exception:
        return None, None, None
    return None, None, None

# Find all classification reports automatically
report_files = glob.glob('results/*/classification_report.txt')

results_data = []
for file_path in report_files:
    model_name = os.path.basename(os.path.dirname(file_path))
    p, r, f1 = parse_report(file_path)
    if f1 is not None:
        results_data.append({
            'Model': model_name.replace('_', ' ').title(),
            'Precision (Yes)': p,
            'Recall (Yes)': r,
            'F1-Score (Yes)': f1
        })

# Create and display the final DataFrame
results_df = pd.DataFrame(results_data)
results_df = results_df.sort_values(by='F1-Score (Yes)', ascending=False).reset_index(drop=True)

print("--- Final Model Performance Summary ---")
print(results_df.to_string())

--- Final Model Performance Summary ---
                                Model  Precision (Yes)  Recall (Yes)  F1-Score (Yes)
0       Rf Tuned Smote Optimal Thresh             0.51          0.83            0.64
1             Deep Learning Mlp Tuned             0.51          0.67            0.58
2        Logistic Regression Smote Te             0.54          0.59            0.56
3        Tabnet Smote Target Encoding             0.55          0.34            0.42
4               Rf Tuned Class Weight             0.64          0.31            0.42
5          Stacking Ensemble Smote V2             0.65          0.28            0.39
6            Catboost Target Encoding             0.69          0.27            0.38
7             Stacking Ensemble Tuned             0.57          0.21            0.31
8  Lightgbm Rfe Smote Target Encoding             0.69          0.16            0.26


In [13]:
# --- Load Test Data and ALL Available Models ---

print("="*70)
print("--- Loading Test Data and ALL Available Models for Analysis ---")
print("="*70)

# 1. Load the test data
try:
    X_test = pd.read_csv('data/processed_target_encoding/X_test_processed.csv')
    y_test = pd.read_csv('data/processed_target_encoding/y_test.csv').values.ravel()
    print(f"Test data loaded successfully. Shape: {X_test.shape}")
except FileNotFoundError:
    print(f"\nFATAL: Test data not found. Cannot generate detailed plots.")
    # Stop execution if data is missing
    raise

# 2. Define folders to explicitly ignore
FOLDERS_TO_IGNORE = ['best_params', 'preprocessor_target_encoding']

# 3. Automatically discover all model folders
all_folders = [os.path.basename(p) for p in glob.glob('models/*') if os.path.isdir(p)]
model_folders_to_load = sorted([f for f in all_folders if f not in FOLDERS_TO_IGNORE])
print(f"\nFound {len(model_folders_to_load)} model folders to load: {model_folders_to_load}")

# 4. Loop through folders, load models with the correct function, and get predictions
models_predictions = {}
# --- ADDITION: Import Keras loader ---
from tensorflow.keras.models import load_model

for model_folder in model_folders_to_load:
    try:
        model_path_search = glob.glob(f'models/{model_folder}/model.*')
        if not model_path_search:
            print(f"  - Warning: No model file found in 'models/{model_folder}'. Skipping.")
            continue
        
        model_path = model_path_search[0]
        
        # --- KEY CHANGE: Use the correct loader based on file extension ---
        if model_path.endswith('.keras'):
            model = load_model(model_path)
            print(f"  - Loading Keras model from {model_folder}...")
        else:
            model = joblib.load(model_path)
            print(f"  - Loading joblib model from {model_folder}...")
        # --- END OF CHANGE ---

        # Handle data format differences
        X_test_input = X_test.to_numpy() if 'tabnet' in model_folder or 'deep_learning' in model_folder else X_test
        y_pred_proba = model.predict_proba(X_test_input)
        
        model_name = model_folder.replace('_', ' ').title()
        models_predictions[model_name] = {'model': model, 'y_probas': y_pred_proba}
        print(f"  - Successfully processed model: {model_name}")
        
    except Exception as e:
        print(f"  - CRITICAL Warning: Could not load or predict with model '{model_folder}'. It will be excluded. Error: {e}")

if not models_predictions:
    print("\nWarning: No models could be successfully loaded. Detailed plots will be skipped.")
else:
    print(f"\nSuccessfully loaded {len(models_predictions)} models for detailed plotting.")

--- Loading Test Data and ALL Available Models for Analysis ---
Test data loaded successfully. Shape: (8238, 38)

Found 9 model folders to load: ['catboost_target_encoding', 'deep_learning_mlp_tuned', 'lightgbm_rfe_smote_target_encoding', 'logistic_regression_smote_te', 'rf_tuned_class_weight', 'rf_tuned_smote_optimal_thresh', 'stacking_ensemble_smote_v2', 'stacking_ensemble_tuned', 'tabnet_smote_target_encoding']
  - Loading joblib model from catboost_target_encoding...
  - Successfully processed model: Catboost Target Encoding
  - Loading Keras model from deep_learning_mlp_tuned...
  - Loading joblib model from lightgbm_rfe_smote_target_encoding...
  - Successfully processed model: Lightgbm Rfe Smote Target Encoding
  - Loading joblib model from logistic_regression_smote_te...
  - Successfully processed model: Logistic Regression Smote Te
  - Loading joblib model from rf_tuned_class_weight...
  - Successfully processed model: Rf Tuned Class Weight
  - Loading joblib model from rf_tun

## Visualization 1 (Interactive): Overall Metric Comparison

**Purpose:** To provide a high-level executive summary of performance. Use the widget below to select one or more models to compare their Precision, Recall, and F1-Score side-by-side.

In [14]:
# --- Visualization 1 (Interactive): Overall Metric Comparison with Plotly ---

# The data is already in results_df from the previous cells

# "Melt" the DataFrame to prepare it for grouped bar plotting
df_melted = results_df.melt(id_vars='Model', var_name='Metric', value_name='Score',
                            value_vars=['Precision (Yes)', 'Recall (Yes)', 'F1-Score (Yes)'])

# Create the figure with a single line of code
fig = px.bar(
    df_melted,
    x='Model',
    y='Score',
    color='Metric',
    barmode='group',  # This creates the grouped (side-by-side) bars
    text='Score',     # This adds the score as text on the bars
    title='<b>Comparative Analysis of Model Performance (Positive Class)</b>',
    labels={'Score': 'Metric Score', 'Model': 'Model Name'},
    height=600
)

# Update the text on the bars to be formatted nicely
fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
# Update the layout for a cleaner look
fig.update_layout(
    yaxis_range=[0,1],
    xaxis_tickangle=-45,
    title_x=0.5, # Center the title
    legend_title_text='Metrics'
)

# Show the interactive figure
fig.show()

## Visualization 2 (Interactive): Comparative ROC Curves

**Purpose:** To measure the pure discriminative power of the selected models. The Area Under the Curve (AUC) represents a model's ability to correctly rank positive samples higher than negative ones. Use the widget to choose which models to compare.

In [15]:
import plotly.graph_objects as go
from sklearn.metrics import roc_curve, roc_auc_score

# --- Visualization 3 (Interactive): Comparative ROC Curves with Plotly ---

# Create a figure object
fig = go.Figure()

# Add the 'Chance Level' line first so it's in the background
fig.add_shape(type='line', line=dict(dash='dash'), x0=0, x1=1, y0=0, y1=1)

# Loop through our loaded models to calculate and plot their ROC curves
for model_name, data in models_predictions.items():
    y_probas = data['y_probas'][:, 1]
    
    # Calculate ROC curve components
    fpr, tpr, _ = roc_curve(y_test, y_probas)
    # Calculate AUC score
    auc_score = roc_auc_score(y_test, y_probas)
    
    # Add the ROC curve for this model to the figure
    fig.add_trace(go.Scatter(x=fpr, y=tpr, name=f'{model_name} (AUC={auc_score:.3f})', mode='lines'))

# Update the layout for a professional look
fig.update_layout(
    xaxis_title='False Positive Rate',
    yaxis_title='True Positive Rate (Recall)',
    yaxis=dict(scaleanchor="x", scaleratio=1),
    xaxis=dict(constrain='domain'),
    width=800, height=700,
    title='<b>Comparative ROC Curves for Top Models</b>',
    title_x=0.5,
    legend_title_text='Models'
)

fig.show()

## Visualization 3: Normalized Confusion Matrix Grid

**Purpose:** To provide a detailed breakdown of each top model's error types. The matrices are normalized by the true class, showing the percentage of each class that was correctly or incorrectly classified. This is the best way to visually compare recall rates.

In [16]:
from plotly.subplots import make_subplots
from sklearn.metrics import confusion_matrix

# --- Visualization 4 (Interactive): Normalized Confusion Matrix Grid with Plotly ---

# Determine the grid size (we'll aim for 3 columns)
n_models = len(models_predictions)
n_cols = 3
n_rows = (n_models + n_cols - 1) // n_cols # Ceiling division to get number of rows

# Create the subplots
fig = make_subplots(
    rows=n_rows, cols=n_cols,
    subplot_titles=list(models_predictions.keys()),
    vertical_spacing=0.15
)

# Loop through the models and add a heatmap for each
row = 1
col = 1
for model_name, data in models_predictions.items():
    # Using a default 0.5 threshold for this comparison
    y_pred = (data['y_probas'][:, 1] > 0.5).astype(int)
    
    # Calculate the confusion matrix, normalized by the true class
    cm = confusion_matrix(y_test, y_pred, normalize='true')
    cm_text = [[f'{val:.1%}' for val in row] for row in cm] # Format as percentage text
    
    # Create the heatmap trace
    heatmap = go.Heatmap(
        z=cm,
        x=['Predicted No', 'Predicted Yes'],
        y=['True No', 'True Yes'],
        text=cm_text,
        texttemplate="%{text}",
        colorscale='Blues',
        showscale=False # Turn off individual color bars
    )
    
    fig.add_trace(heatmap, row=row, col=col)
    
    # Move to the next subplot position
    col += 1
    if col > n_cols:
        col = 1
        row += 1

# Update the overall layout
fig.update_layout(
    height=400 * n_rows,
    width=1000,
    title_text='<b>Normalized Confusion Matrix Grid (Success Rate per Class)</b>',
    title_x=0.5
)

fig.show()

## Visualization 4: Calibration Curves

**Purpose:** To check the trustworthiness of each model's predicted probabilities. A perfectly calibrated model's curve will lie along the diagonal. This means that if it predicts a 70% probability of success, it is correct 70% of the time.

In [17]:
from sklearn.calibration import calibration_curve

# --- Visualization 5 (Interactive): Calibration Curves with Plotly ---

# Create a figure object
fig = go.Figure()

# Add the 'Perfectly Calibrated' diagonal line
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 1], mode='lines', line=dict(dash='dash'), name='Perfectly Calibrated'))

# Loop through the models to calculate and plot their calibration curves
for model_name, data in models_predictions.items():
    y_probas = data['y_probas'][:, 1]
    
    # Calculate the calibration curve components
    fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_probas, n_bins=15)
    
    # Add the calibration curve for this model
    fig.add_trace(go.Scatter(x=mean_predicted_value, y=fraction_of_positives, name=model_name, mode='lines+markers'))

# Update the layout
fig.update_layout(
    xaxis_title='Mean Predicted Probability (Confidence)',
    yaxis_title='Fraction of Positives (Actual Success Rate)',
    width=800, height=700,
    title='<b>Calibration Curves for Top Models</b>',
    title_x=0.5,
    legend_title_text='Models'
)

fig.show()

Visualization 5 (Alternative): Cumulative Gains Chart
Why it's important: It's the key business plot, showing the percentage of "Yes" subscribers captured versus the percentage of customers contacted. It directly visualizes the ROI of each model.

In [19]:
print("\n" + "="*70)
print("--- Generating Visualization 5: Cumulative Gains Chart ---")
print("="*70)

def calculate_cumulative_gain(y_true, y_probas):
    """Calculates the points needed for a cumulative gains chart."""
    # Combine true labels and predicted probabilities, then sort by probability
    df = pd.DataFrame({'y_true': y_true, 'y_probas': y_probas}).sort_values('y_probas', ascending=False)
    
    # Calculate cumulative sums
    df['cumulative_positives'] = df['y_true'].cumsum()
    total_positives = df['y_true'].sum()
    
    # Calculate percentages
    df['percentage_of_positives_captured'] = df['cumulative_positives'] / total_positives
    df['percentage_of_sample_targeted'] = np.arange(1, len(df) + 1) / len(df)
    
    return df['percentage_of_sample_targeted'].tolist(), df['percentage_of_positives_captured'].tolist()

if not models_predictions:
    print("No models loaded to generate gains chart.")
else:
    # Create a figure object
    fig = go.Figure()

    # Add the "Random Selection" baseline
    fig.add_trace(go.Scatter(
        x=[0, 1], y=[0, 1],
        mode='lines',
        line=dict(color='black', dash='dash'),
        name='Random Selection'
    ))
    
    # Add the "Perfect Model" baseline
    # First, calculate the points for the perfect model
    perfect_x, perfect_y = calculate_cumulative_gain(y_test, y_test) # Probas are the true labels
    fig.add_trace(go.Scatter(
        x=perfect_x, y=perfect_y,
        mode='lines',
        line=dict(color='grey', dash='dot'),
        name='Perfect Model'
    ))

    # Loop through our trained models
    for model_name, data in models_predictions.items():
        y_probas = data['y_probas'][:, 1]
        
        # Calculate the gains for this model
        x_vals, y_vals = calculate_cumulative_gain(y_test, y_probas)
        
        # Add the gains curve for this model to the plot
        fig.add_trace(go.Scatter(x=x_vals, y=y_vals, name=model_name, mode='lines'))

    # Update the layout for a professional look
    fig.update_layout(
        xaxis_title='Percentage of Sample Targeted',
        yaxis_title='Percentage of Positive Class ("Yes") Captured',
        width=900, height=700,
        title='<b>Cumulative Gains: Capturing Subscribers by Targeting Top Customers</b>',
        title_x=0.5,
        legend_title_text='Models',
        xaxis=dict(tickformat=".0%"),
        yaxis=dict(tickformat=".0%")
    )

    print("Interactive Cumulative Gains chart generated.")
    fig.show()


--- Generating Visualization 5: Cumulative Gains Chart ---
Interactive Cumulative Gains chart generated.


Visualization 6 (Alternative): Feature Importance Comparison
Why it's important: It's our main tool for interpretability, showing what features the models found most predictive.

In [20]:
# --- Visualization 6 (Interactive): Feature Importance Plots ---

print("\n" + "="*70)
print("--- Generating Visualization 6: Feature Importance Plots ---")
print("="*70)

# We focus on models with built-in feature importances
tree_model_names = [name for name in models_predictions.keys() if 'Rf' in name or 'Lightgbm' in name or 'Catboost' in name]

if not tree_model_names:
    print("No tree-based models (RF, LGBM, CatBoost) found for feature importance analysis.")
else:
    print(f"Generating importance plots for: {', '.join(tree_model_names)}")
    for model_name in tree_model_names:
        try:
            data = models_predictions[model_name]
            
            # Find the actual classifier within the pipeline
            classifier = None
            if 'classifier' in data['model'].named_steps:
                classifier = data['model'].named_steps['classifier']
            
            if classifier is None:
                raise ValueError("Could not find a step named 'classifier' in the model pipeline.")

            # Get the correct feature names depending on whether RFE was used
            if 'rfe' in data['model'].named_steps:
                support = data['model'].named_steps['rfe'].support_
                feature_names = X_test.columns[support]
            else:
                feature_names = X_test.columns

            importances = classifier.feature_importances_
            
            feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': importances})
            feature_importance_df = feature_importance_df.sort_values('Importance', ascending=False).head(15)
            
            # Create the interactive bar chart with Plotly Express
            fig = px.bar(
                feature_importance_df.sort_values('Importance', ascending=True), # Horizontal bars look better ascending
                x='Importance',
                y='Feature',
                orientation='h',
                title=f'<b>Top 15 Feature Importances for<br>{model_name}</b>',
                height=600
            )
            fig.update_layout(title_x=0.5)
            fig.show()

        except Exception as e:
            print(f"Could not plot feature importance for {model_name}. Error: {e}")


--- Generating Visualization 6: Feature Importance Plots ---
Generating importance plots for: Catboost Target Encoding, Lightgbm Rfe Smote Target Encoding, Rf Tuned Class Weight, Rf Tuned Smote Optimal Thresh


Could not plot feature importance for Rf Tuned Class Weight. Error: 'RandomForestClassifier' object has no attribute 'named_steps'


Visualization 7: Distribution of Predicted Probabilities
Why it's important: This is a powerful diagnostic tool. It shows you the "confidence profile" of each model.
A good, confident model will produce a U-shaped or bimodal distribution—it will predict probabilities that are very close to 0 or 1.
An unconfident or confused model will produce a distribution clustered in the middle (e.g., around 0.3-0.6).
This helps explain why a model's F1-score might be low.

In [21]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# --- Visualization 7 (Interactive): Distribution of Predicted Probabilities ---

print("\n" + "="*70)
print("--- Generating Visualization 7: Distribution of Predicted Probabilities ---")
print("="*70)

if not models_predictions:
    print("No models loaded to generate plots.")
else:
    # Determine the grid size for the subplots
    num_models = len(models_predictions)
    # Create a grid that's roughly square
    ncols = int(np.ceil(np.sqrt(num_models)))
    nrows = int(np.ceil(num_models / ncols))
    
    # Create subplots
    fig = make_subplots(
        rows=nrows, cols=ncols,
        subplot_titles=list(models_predictions.keys())
    )
    
    # Keep track of the current subplot position
    row, col = 1, 1
    
    for model_name, data in models_predictions.items():
        # Get the predicted probabilities for the "Yes" class
        y_probas_yes = data['y_probas'][:, 1]
        
        # Add a histogram for this model to the subplot grid
        fig.add_trace(
            go.Histogram(x=y_probas_yes, name=model_name, nbinsx=30),
            row=row, col=col
        )
        
        # Move to the next subplot position
        col += 1
        if col > ncols:
            col = 1
            row += 1
            
    # Update the layout for a clean, final appearance
    fig.update_layout(
        title_text='<b>Distribution of Predicted Probabilities for "Yes" Class</b>',
        title_x=0.5,
        showlegend=False, # The subplot titles serve as the legend
        height=300 * nrows, # Adjust height based on number of rows
        width=900
    )
    fig.update_xaxes(title_text="Predicted Probability")
    fig.update_yaxes(title_text="Count")
    
    print("Probability distribution plots generated.")
    fig.show()


--- Generating Visualization 7: Distribution of Predicted Probabilities ---
Probability distribution plots generated.


Visualization 8: Model Correlation Heatmap
Why it's important: This plot is essential for understanding your ensemble models. An ensemble works best when it combines diverse models that make different kinds of errors. This heatmap shows how correlated the predictions of your models are.
High Correlation (dark color): The two models think very similarly. Putting them in an ensemble is redundant and won't help much.
Low Correlation (light color): The two models are making different predictions. Combining them in an ensemble is a great idea, as they can correct each other's mistakes.

In [22]:
# --- Visualization 8 (Interactive): Model Prediction Correlation Heatmap ---

print("\n" + "="*70)
print("--- Generating Visualization 8: Model Prediction Correlation Heatmap ---")
print("="*70)

if len(models_predictions) < 2:
    print("Need at least two models to compare correlations.")
else:
    # Create a DataFrame where each column is the probability predictions from one model
    proba_df = pd.DataFrame({
        model_name: data['y_probas'][:, 1]
        for model_name, data in models_predictions.items()
    })
    
    # Calculate the Pearson correlation matrix
    correlation_matrix = proba_df.corr(method='pearson')
    
    # Create the interactive heatmap with Plotly Express
    fig = px.imshow(
        correlation_matrix,
        text_auto=True, # Display the correlation values on the heatmap
        aspect="auto",
        labels=dict(color="Correlation"),
        title='<b>Correlation of Model Predictions</b>'
    )
    
    fig.update_layout(
        title_x=0.5,
        height=700,
        width=700
    )
    
    print("Correlation heatmap generated.")
    fig.show()


--- Generating Visualization 8: Model Prediction Correlation Heatmap ---
Correlation heatmap generated.


Visualization 9: The Profitability Curve (Cost-Benefit Analysis)
Why it's a different type: This plot transcends standard metrics and translates model performance directly into financial impact. It calculates the net profit you would achieve by targeting different percentages of customers, based on a set of business assumptions (cost per call vs. profit per subscription). It finds the "sweet spot" where profitability is maximized.

In [23]:
# --- Visualization 9 (Interactive): Profitability Curve ---

print("\n" + "="*70)
print("--- Generating Visualization 9: Profitability Curve ---")
print("="*70)

# --- Define Business Assumptions (These can be changed) ---
PROFIT_PER_SUBSCRIPTION = 400  # Hypothetical profit from one successful 'Yes'
COST_PER_CALL = 5              # Hypothetical cost of contacting one customer

def calculate_profit_curve(y_true, y_probas):
    """Calculates the net profit at different targeting thresholds."""
    profits = []
    thresholds = np.linspace(0, 1, 101) # Test 101 different thresholds from 0% to 100%
    
    for threshold in thresholds:
        y_pred = (y_probas >= threshold).astype(int)
        cm = confusion_matrix(y_true, y_pred)
        TN, FP, FN, TP = cm.ravel()
        
        total_profit = (TP * PROFIT_PER_SUBSCRIPTION)
        total_cost = (TP + FP) * COST_PER_CALL
        net_profit = total_profit - total_cost
        profits.append(net_profit)
        
    return thresholds, profits

if not models_predictions:
    print("No models loaded to generate profit curves.")
else:
    fig = go.Figure()
    
    for model_name, data in models_predictions.items():
        thresholds, profits = calculate_profit_curve(y_test, data['y_probas'][:, 1])
        fig.add_trace(go.Scatter(x=thresholds, y=profits, name=model_name, mode='lines'))

    # Update the layout
    fig.update_layout(
        title_text='<b>Profitability Curve: Net Profit vs. Decision Threshold</b>',
        title_x=0.5,
        xaxis_title='Decision Threshold (Probability to Classify as "Yes")',
        yaxis_title='Estimated Net Profit (€)',
        legend_title_text='Models',
        height=700,
        width=900
    )
    
    print("Profitability curve chart generated.")
    fig.show()


--- Generating Visualization 9: Profitability Curve ---
Profitability curve chart generated.


Visualization 10: The "What If" Simulator (Partial Dependence Plot)
Why it's a different type: This is our deepest dive into model interpretability. While feature importance tells us what is important, a Partial Dependence Plot (PDP) shows us how a feature impacts the prediction. It answers "what if" questions, like: "On average, how does the model's prediction of success change as the number of campaign contacts increases from 1 to 10?"

In [24]:
from sklearn.inspection import partial_dependence

# --- Visualization 10 (Interactive): Partial Dependence Plot ---

print("\n" + "="*70)
print("--- Generating Visualization 10: Partial Dependence Plot ---")
print("="*70)

# --- Configuration: Choose a feature to analyze ---
# Let's analyze the effect of the 'campaign' feature (number of contacts)
FEATURE_TO_ANALYZE = 'campaign'

if FEATURE_TO_ANALYZE not in X_test.columns:
    print(f"Error: Feature '{FEATURE_TO_ANALYZE}' not found in the test data columns.")
else:
    fig = go.Figure()

    print(f"Calculating Partial Dependence for feature: '{FEATURE_TO_ANALYZE}' for all models...")
    for model_name, data in models_predictions.items():
        try:
            # Partial dependence requires the model and the training data
            # NOTE: This can be slow for complex models! We use a sample of X_train for speed.
            pdp_result = partial_dependence(
                estimator=data['model'],
                X=X_train.sample(n=1000, random_state=42), # Use a sample for speed
                features=[FEATURE_TO_ANALYZE],
                kind='average'
            )
            
            # The result gives the feature values and the average prediction
            feature_values = pdp_result['values'][0]
            avg_prediction = pdp_result['average'][0]

            fig.add_trace(go.Scatter(x=feature_values, y=avg_prediction, name=model_name, mode='lines+markers'))
            
        except Exception as e:
            print(f"  - Could not compute PDP for {model_name}. It might not be compatible. Error: {e}")

    # Update the layout
    fig.update_layout(
        title_text=f"<b>Partial Dependence Plot: Impact of '{FEATURE_TO_ANALYZE}' on Prediction</b>",
        title_x=0.5,
        xaxis_title=f"Value of '{FEATURE_TO_ANALYZE}'",
        yaxis_title='Average Predicted Probability of "Yes"',
        legend_title_text='Models',
        height=700,
        width=900
    )

    print("Partial Dependence Plot generated.")
    fig.show()


--- Generating Visualization 10: Partial Dependence Plot ---
Error: Feature 'campaign' not found in the test data columns.


Visualization 11: The Fairness & Reliability Check (Error Analysis by Subgroup)
Why it's a different type: This plot tests the model's robustness and fairness. An overall high F1-score is meaningless if the model is heavily biased and only works for certain segments of the population. This visualization breaks down the champion model's performance across different customer job categories to see if there are any hidden weaknesses or biases.

In [25]:
from sklearn.metrics import f1_score

# --- Visualization 11 (Interactive): Error Analysis by Subgroup ---

print("\n" + "="*70)
print("--- Generating Visualization 11: Error Analysis by Subgroup ---")
print("="*70)

try:
    # --- Step 1: Get the champion model's predictions ---
    champion_name = results_df.loc[0, 'Model']
    champion_data = models_predictions[champion_name]
    # Re-calculate predictions using the optimal threshold if available, otherwise 0.5
    optimal_thresh_path = f"models/{champion_name.replace(' ', '_').lower()}/optimal_threshold.joblib"
    if os.path.exists(optimal_thresh_path):
        optimal_threshold = joblib.load(optimal_thresh_path)
        print(f"Using optimal threshold ({optimal_threshold:.2f}) for champion model '{champion_name}'.")
    else:
        optimal_threshold = 0.5
        print(f"Using default threshold (0.5) for champion model '{champion_name}'.")
        
    y_pred_champion = (champion_data['y_probas'][:, 1] >= optimal_threshold).astype(int)

    # --- Step 2: Load original, unprocessed data to get the true 'job' labels ---
    raw_df = pd.read_csv('data/raw/bank-additional-full.csv', sep=';')
    # Align the raw data with our test set using the index
    test_jobs = raw_df.iloc[X_test.index]['job']

    # --- Step 3: Combine into a single DataFrame for analysis ---
    error_analysis_df = pd.DataFrame({
        'job': test_jobs,
        'y_true': y_test,
        'y_pred': y_pred_champion
    }).dropna() # Drop rows where job might be missing

    # --- Step 4: Calculate F1-score for each job category ---
    f1_by_job = error_analysis_df.groupby('job').apply(
        lambda g: f1_score(g['y_true'], g['y_pred'], pos_label=1, zero_division=0)
    ).sort_values(ascending=False)
    
    # --- Step 5: Visualize the results ---
    fig = px.bar(
        f1_by_job,
        x=f1_by_job.index,
        y=f1_by_job.values,
        title=f"<b>Champion Model ('{champion_name}') Performance by Customer Job Type</b>",
        labels={'index': 'Job Category', 'y': 'F1-Score (Yes)'},
        height=600
    )
    fig.update_layout(title_x=0.5, yaxis_range=[0,1])
    
    print("Subgroup analysis plot generated.")
    fig.show()

except Exception as e:
    print(f"Could not perform subgroup analysis. An error occurred: {e}")


--- Generating Visualization 11: Error Analysis by Subgroup ---
Using optimal threshold (0.16) for champion model 'Rf Tuned Smote Optimal Thresh'.
Subgroup analysis plot generated.


## Further Analysis & Next Steps

This notebook provides a comprehensive visual summary. To go even deeper, the following analyses could be performed:

*   **Feature Importance Analysis:** Investigate *which* features the top-performing models (like TabNet and the Tuned Random Forest) found most predictive.
*   **Cumulative Gains / Lift Charts:** Translate model performance into direct business value by showing the percentage of subscribers captured when targeting the top X% of customers.
*   **Error Analysis by Subgroup:** Investigate if the champion model performs equally well across different customer segments (e.g., by `job` or `education`) to check for potential bias.