# Model Training & Evaluation

**Notebook 4 of 4: Customer Churn Prediction Project**

## Objectives
1. Train baseline models (5 algorithms)
2. Compare model performance
3. Handle class imbalance with SMOTE
4. Perform hyperparameter tuning
5. Evaluate final model
6. Analyze business impact

## Prerequisites
This notebook assumes the data has been prepared in `03_data_preparation_ml.ipynb`.

---


In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix, classification_report,
    roc_curve, auc
)

from imblearn.over_sampling import SMOTE

import warnings
warnings.filterwarnings('ignore')

# Display settings
pd.set_option('display.max_columns', None)
%matplotlib inline

In [2]:
# Load preprocessed data from previous notebook
# Make sure you've run 03_data_preparation_ml.ipynb first

X_train = pd.read_csv('../data/processed/X_train.csv')
X_val = pd.read_csv('../data/processed/X_val.csv')
X_test = pd.read_csv('../data/processed/X_test.csv')

y_train = pd.read_csv('../data/processed/y_train.csv')['Churn']
y_val = pd.read_csv('../data/processed/y_val.csv')['Churn']
y_test = pd.read_csv('../data/processed/y_test.csv')['Churn']

print(f"  X_train shape: {X_train.shape}")
print(f"  X_val shape: {X_val.shape}")
print(f"  X_test shape: {X_test.shape}")
print(f"  Class distribution (train): {y_train.value_counts().to_dict()}")

  X_train shape: (4225, 32)
  X_val shape: (1409, 32)
  X_test shape: (1409, 32)
  Class distribution (train): {0: 3104, 1: 1121}


# 5. Machine Learning: Initial Exploration

## Objective
Conduct initial screening of multiple algorithms to identify best candidates for deeper analysis.

## Approach
Test 5 different algorithms in 2 phases:
- **Phase 1:** Baseline models without class balancing
- **Phase 2:** Same models with SMOTE class balancing

## Models Tested
1. Logistic Regression (linear classifier)
2. Decision Tree (non-linear, interpretable)
3. Random Forest (ensemble method)
4. Gradient Boosting (advanced ensemble)
5. K-Nearest Neighbors (instance-based)

## Evaluation Metrics
- **Primary:** Recall (minimize missed churners)
- **Secondary:** F1-Score (balance precision/recall)
- **Additional:** Precision, Accuracy, ROC-AUC
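
As a quick reference for how these metrics relate to each other, the sketch below derives them from raw confusion-matrix counts. The function name `metrics_from_counts` is illustrative and not part of the notebook; the counts are the ones reported for the Logistic Regression baseline on the validation set.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "Accuracy": (tp + tn) / total,
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1,
    }

# Counts for the Logistic Regression baseline (validation set).
lr_baseline = metrics_from_counts(tn=936, fp=99, fn=175, tp=199)
for name, value in lr_baseline.items():
    print(f"{name:<10} {value:.4f}")
```

Recall is the only one of these that ignores false positives, which is why it is used here as the primary metric for missed churners.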

## Goal
Identify top 2 models for systematic comparison and optimization.

In [3]:
# MACHINE LEARNING - PHASE 1: BASELINE MODELS

# Libraries

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier  
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    confusion_matrix,
    classification_report
)

In [4]:
#CLASS IMBALANCE CHECK

#Training set distribution
print(y_train.value_counts())
print(f"\nNo Churn: {(y_train==0).sum()} ({(y_train==0).mean():.1%})")
print(f"Churn:    {(y_train==1).sum()} ({(y_train==1).mean():.1%})")

#Validation set distribution:
print(y_val.value_counts())
print(f"\nNo Churn: {(y_val==0).sum()} ({(y_val==0).mean():.1%})")
print(f"Churn:    {(y_val==1).sum()} ({(y_val==1).mean():.1%})")


Churn
0    3104
1    1121
Name: count, dtype: int64

No Churn: 3104 (73.5%)
Churn:    1121 (26.5%)
Churn
0    1035
1     374
Name: count, dtype: int64

No Churn: 1035 (73.5%)
Churn:    374 (26.5%)
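
To see why accuracy alone is a weak yardstick at this class ratio, the short sketch below scores a trivial model that always predicts "No Churn". It only uses the validation counts printed above; the variable names are illustrative.

```python
# Validation-set class counts, copied from the output above.
n_no_churn, n_churn = 1035, 374

# A model that always predicts "No Churn" gets every non-churner right
# and every churner wrong.
majority_accuracy = n_no_churn / (n_no_churn + n_churn)
majority_recall = 0 / n_churn

print(f"Always-'No Churn' accuracy: {majority_accuracy:.1%}")
print(f"Always-'No Churn' recall:   {majority_recall:.1%}")
```

Any model in this notebook should therefore be judged against roughly 73.5% accuracy as the floor, and against recall rather than accuracy for catching churners.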


### PHASE 1: BASELINE MODELS (NO BALANCING)

In [5]:
print("\nX_train shape:", X_train.shape)
print("\nData types:")
print(X_train.dtypes.value_counts())

print("\nNon-numeric columns:")
non_numeric = X_train.select_dtypes(include=['object', 'category']).columns.tolist()

if len(non_numeric) > 0:
    print(f"Found {len(non_numeric)} non-numeric columns:")
    for col in non_numeric:
        print(f"   - {col}: {X_train[col].dtype}")
        print(f"     Sample values: {X_train[col].head(3).tolist()}")
else:
    print("All columns are numeric!")


X_train shape: (4225, 32)

Data types:
bool       21
int64       6
float64     3
object      2
Name: count, dtype: int64

Non-numeric columns:
Found 2 non-numeric columns:
   - tenure_group: object
     Sample values: ['Long (>=12)', 'Short (<12)', 'Long (>=12)']
   - charges_group: object
     Sample values: ['Low (0-35)', 'High (70+)', 'High (70+)']


In [6]:
# Drop categorical bins (keep only numerical originals)
columns_to_drop = ['tenure_group', 'charges_group']  # Add others if needed

X_train = X_train.drop(columns=columns_to_drop, errors='ignore')
X_val = X_val.drop(columns=columns_to_drop, errors='ignore')
X_test = X_test.drop(columns=columns_to_drop, errors='ignore')

print(f"Dropped {len(columns_to_drop)} columns")
print(f"New X_train shape: {X_train.shape}")

Dropped 2 columns
New X_train shape: (4225, 30)


In [7]:
# MODEL 1/4: LOGISTIC REGRESSION

# Create and train model
lr = LogisticRegression(random_state=42, max_iter=1000)

print(f"  Training samples: {len(X_train)}")
print(f"  No Churn: {(y_train==0).sum()} ({(y_train==0).mean():.1%})")
print(f"  Churn: {(y_train==1).sum()} ({(y_train==1).mean():.1%})")

lr.fit(X_train, y_train)

  Training samples: 4225
  No Churn: 3104 (73.5%)
  Churn: 1121 (26.5%)


In [8]:
# Predictions on validation set
y_pred_lr = lr.predict(X_val)
y_pred_proba_lr = lr.predict_proba(X_val)[:, 1]

In [9]:
# Calculate metrics

acc_lr = accuracy_score(y_val, y_pred_lr)
prec_lr = precision_score(y_val, y_pred_lr)
rec_lr = recall_score(y_val, y_pred_lr)
f1_lr = f1_score(y_val, y_pred_lr)
roc_lr = roc_auc_score(y_val, y_pred_proba_lr)

print(f"\nAccuracy:  {acc_lr:.4f}")
print(f"Precision: {prec_lr:.4f}")
print(f"Recall:    {rec_lr:.4f} (Primary)")
print(f"F1-Score:  {f1_lr:.4f} (Selection)")
print(f"ROC-AUC:   {roc_lr:.4f}")


Accuracy:  0.8055
Precision: 0.6678
Recall:    0.5321 (Primary)
F1-Score:  0.5923 (Selection)
ROC-AUC:   0.8349
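
ROC-AUC is the one metric above computed from predicted probabilities rather than hard 0/1 predictions. One way to read it is as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below illustrates that reading on toy data; `pairwise_auc` is a hypothetical helper written for this illustration, not the routine scikit-learn uses internally.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy labels and scores, made up for illustration.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]

toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 (churner, non-churner) pairs are ordered correctly
```

Because it depends only on ranking, ROC-AUC does not change when the decision threshold moves, unlike recall and precision.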


In [10]:
# Confusion Matrix
print("\nConfusion Matrix:")
cm_lr = confusion_matrix(y_val, y_pred_lr)
print(cm_lr)

TN, FP, FN, TP = cm_lr.ravel()
print(f"\n                Predicted")
print(f"              No      Yes")
print(f"Actual  No   {TN:4d}    {FP:4d}")
print(f"        Yes  {FN:4d}    {TP:4d}")

print(f"\n Business Impact:")
print(f"   Correctly caught churners: {TP}/{TP+FN} ({TP/(TP+FN):.1%})")
print(f"   MISSED churners: {FN} (lost revenue!)")
print(f"   Unnecessary campaigns: {FP}")
print(f"   Correctly kept customers: {TN}")


Confusion Matrix:
[[936  99]
 [175 199]]

                Predicted
              No      Yes
Actual  No    936      99
        Yes   175     199

 Business Impact:
   Correctly caught churners: 199/374 (53.2%)
   MISSED churners: 175 (lost revenue!)
   Unnecessary campaigns: 99
   Correctly kept customers: 936


In [11]:
# Store results for comparison later
results_baseline = []

metrics_lr = {
    'Model': 'Logistic Regression',
    'Accuracy': acc_lr,
    'Precision': prec_lr,
    'Recall': rec_lr,
    'F1-Score': f1_lr,
    'ROC-AUC': roc_lr
}

results_baseline.append(metrics_lr)

In [12]:

# MODEL 2/4: DECISION TREE
# Create and train
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

In [13]:
# Predictions

y_pred_dt = dt.predict(X_val)
y_pred_proba_dt = dt.predict_proba(X_val)[:, 1]


In [14]:
# Metrics

acc_dt = accuracy_score(y_val, y_pred_dt)
prec_dt = precision_score(y_val, y_pred_dt)
rec_dt = recall_score(y_val, y_pred_dt)
f1_dt = f1_score(y_val, y_pred_dt)
roc_dt = roc_auc_score(y_val, y_pred_proba_dt)

print(f"\nAccuracy:  {acc_dt:.4f}")
print(f"Precision: {prec_dt:.4f}")
print(f"Recall:    {rec_dt:.4f}")
print(f"F1-Score:  {f1_dt:.4f}")
print(f"ROC-AUC:   {roc_dt:.4f}")


Accuracy:  0.7140
Precision: 0.4640
Recall:    0.5000
F1-Score:  0.4813
ROC-AUC:   0.6447


In [15]:
# Confusion Matrix

cm_dt = confusion_matrix(y_val, y_pred_dt)
print(cm_dt)

TN, FP, FN, TP = cm_dt.ravel()
print(f"\n                Predicted")
print(f"              No      Yes")
print(f"Actual  No   {TN:4d}    {FP:4d}")
print(f"        Yes  {FN:4d}    {TP:4d}")

print(f"\nBusiness Impact:")
print(f"  Correctly caught: {TP}/{TP+FN} ({TP/(TP+FN):.1%})")
print(f"  MISSED churners: {FN}")


[[819 216]
 [187 187]]

                Predicted
              No      Yes
Actual  No    819     216
        Yes   187     187

Business Impact:
  Correctly caught: 187/374 (50.0%)
  MISSED churners: 187


In [16]:
# Store results
metrics_dt = {
    'Model': 'Decision Tree',
    'Accuracy': acc_dt,
    'Precision': prec_dt,
    'Recall': rec_dt,
    'F1-Score': f1_dt,
    'ROC-AUC': roc_dt
}
results_baseline.append(metrics_dt)

In [17]:
# MODEL 3/4: RANDOM FOREST

# Create and train
rf = RandomForestClassifier(random_state=42, n_estimators=100)
rf.fit(X_train, y_train)

In [18]:
# Predictions

y_pred_rf = rf.predict(X_val)
y_pred_proba_rf = rf.predict_proba(X_val)[:, 1]

In [19]:
# Metrics

acc_rf = accuracy_score(y_val, y_pred_rf)
prec_rf = precision_score(y_val, y_pred_rf)
rec_rf = recall_score(y_val, y_pred_rf)
f1_rf = f1_score(y_val, y_pred_rf)
roc_rf = roc_auc_score(y_val, y_pred_proba_rf)

print(f"\nAccuracy:  {acc_rf:.4f}")
print(f"Precision: {prec_rf:.4f}")
print(f"Recall:    {rec_rf:.4f}")
print(f"F1-Score:  {f1_rf:.4f}")
print(f"ROC-AUC:   {roc_rf:.4f}")


Accuracy:  0.7828
Precision: 0.6241
Recall:    0.4572
F1-Score:  0.5278
ROC-AUC:   0.8097


In [20]:
# Confusion Matrix

cm_rf = confusion_matrix(y_val, y_pred_rf)
print(cm_rf)

TN, FP, FN, TP = cm_rf.ravel()
print(f"\n                Predicted")
print(f"              No      Yes")
print(f"Actual  No   {TN:4d}    {FP:4d}")
print(f"        Yes  {FN:4d}    {TP:4d}")

print(f"\nBusiness Impact:")
print(f"  Correctly caught: {TP}/{TP+FN} ({TP/(TP+FN):.1%})")
print(f"  MISSED churners: {FN}")

[[932 103]
 [203 171]]

                Predicted
              No      Yes
Actual  No    932     103
        Yes   203     171

Business Impact:
  Correctly caught: 171/374 (45.7%)
  MISSED churners: 203


In [21]:
# Store results
metrics_rf = {
    'Model': 'Random Forest',
    'Accuracy': acc_rf,
    'Precision': prec_rf,
    'Recall': rec_rf,
    'F1-Score': f1_rf,
    'ROC-AUC': roc_rf
}
results_baseline.append(metrics_rf)

In [22]:
# MODEL 4/4: K-NEAREST NEIGHBORS

# Create and train
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

In [23]:
# Predictions
y_pred_knn = knn.predict(X_val)
y_pred_proba_knn = knn.predict_proba(X_val)[:, 1]

In [24]:
# Metrics

acc_knn = accuracy_score(y_val, y_pred_knn)
prec_knn = precision_score(y_val, y_pred_knn)
rec_knn = recall_score(y_val, y_pred_knn)
f1_knn = f1_score(y_val, y_pred_knn)
roc_knn = roc_auc_score(y_val, y_pred_proba_knn)

print(f"\nAccuracy:  {acc_knn:.4f}")
print(f"Precision: {prec_knn:.4f}")
print(f"Recall:    {rec_knn:.4f}")
print(f"F1-Score:  {f1_knn:.4f}")
print(f"ROC-AUC:   {roc_knn:.4f}")


Accuracy:  0.7601
Precision: 0.5503
Recall:    0.5267
F1-Score:  0.5383
ROC-AUC:   0.7668


In [25]:
# Confusion Matrix
cm_knn = confusion_matrix(y_val, y_pred_knn)
print(cm_knn)

TN, FP, FN, TP = cm_knn.ravel()
print(f"\n                Predicted")
print(f"              No      Yes")
print(f"Actual  No   {TN:4d}    {FP:4d}")
print(f"        Yes  {FN:4d}    {TP:4d}")

print(f"\nBusiness Impact:")
print(f"  Correctly caught: {TP}/{TP+FN} ({TP/(TP+FN):.1%})")
print(f"  MISSED churners: {FN}")

[[874 161]
 [177 197]]

                Predicted
              No      Yes
Actual  No    874     161
        Yes   177     197

Business Impact:
  Correctly caught: 197/374 (52.7%)
  MISSED churners: 177


In [26]:
# Store results
metrics_knn = {
    'Model': 'KNN',
    'Accuracy': acc_knn,
    'Precision': prec_knn,
    'Recall': rec_knn,
    'F1-Score': f1_knn,
    'ROC-AUC': roc_knn
}
results_baseline.append(metrics_knn)

In [27]:
# PHASE 1: BASELINE MODELS COMPARISON

import pandas as pd

# Create comparison DataFrame
results_df = pd.DataFrame(results_baseline)
results_df = results_df.sort_values('F1-Score', ascending=False)

print(results_df.to_string(index=False))

              Model  Accuracy  Precision   Recall  F1-Score  ROC-AUC
Logistic Regression  0.805536   0.667785 0.532086  0.592262 0.834854
                KNN  0.760114   0.550279 0.526738  0.538251 0.766805
      Random Forest  0.782825   0.624088 0.457219  0.527778 0.809670
      Decision Tree  0.713982   0.464020 0.500000  0.481338 0.644686


In [28]:
# Best model

best = results_df.iloc[0]
print(f"Model:     {best['Model']}")
print(f"F1-Score:  {best['F1-Score']:.4f}")
print(f"Recall:    {best['Recall']:.4f}")
print(f"Precision: {best['Precision']:.4f}")
print(f"ROC-AUC:   {best['ROC-AUC']:.4f}")

Model:     Logistic Regression
F1-Score:  0.5923
Recall:    0.5321
Precision: 0.6678
ROC-AUC:   0.8349


### PHASE 2: CLASS BALANCING WITH SMOTE

In [29]:
# Apply SMOTE
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

print(f"   No Churn: {(y_train_balanced==0).sum()} ({(y_train_balanced==0).mean():.1%})")
print(f"   Churn:    {(y_train_balanced==1).sum()} ({(y_train_balanced==1).mean():.1%})")
print(f"   Ratio: {(y_train_balanced==0).sum()/(y_train_balanced==1).sum():.2f}:1")

   No Churn: 3104 (50.0%)
   Churn:    3104 (50.0%)
   Ratio: 1.00:1
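
For intuition about what the resampling step adds: SMOTE creates each synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The sketch below shows only that interpolation step on two hypothetical samples. It is a simplified illustration, not imbalanced-learn's implementation, and `smote_like_point` is an assumed name.

```python
import random

def smote_like_point(sample, neighbor, rng):
    """Return a synthetic point on the segment between two minority samples."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [s + lam * (n - s) for s, n in zip(sample, neighbor)]

# Hypothetical two-feature churners (values made up for illustration).
churner_a = [2.0, 80.0]
churner_b = [6.0, 95.0]

rng = random.Random(42)
synthetic = smote_like_point(churner_a, churner_b, rng)
print(synthetic)

# Number of synthetic churners needed to reach the 1:1 ratio shown above.
n_synthetic = 3104 - 1121
print(n_synthetic)
```

Only the training set is resampled; the validation and test sets keep their original class distribution so that the reported metrics reflect real-world proportions.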


In [30]:
# PHASE 2 - MODEL 1: LOGISTIC REGRESSION + SMOTE

# Train on BALANCED data
lr_smote = LogisticRegression(random_state=42, max_iter=1000)
lr_smote.fit(X_train_balanced, y_train_balanced)

In [31]:
# Predict on ORIGINAL imbalanced validation
y_pred_lr_smote = lr_smote.predict(X_val)
y_pred_proba_lr_smote = lr_smote.predict_proba(X_val)[:, 1]

In [32]:
# Metrics

acc_lr_smote = accuracy_score(y_val, y_pred_lr_smote)
prec_lr_smote = precision_score(y_val, y_pred_lr_smote)
rec_lr_smote = recall_score(y_val, y_pred_lr_smote)
f1_lr_smote = f1_score(y_val, y_pred_lr_smote)
roc_lr_smote = roc_auc_score(y_val, y_pred_proba_lr_smote)

print(f"\nAccuracy:  {acc_lr_smote:.4f}")
print(f"Precision: {prec_lr_smote:.4f}")
print(f"Recall:    {rec_lr_smote:.4f} ")
print(f"F1-Score:  {f1_lr_smote:.4f} ")
print(f"ROC-AUC:   {roc_lr_smote:.4f}")


Accuracy:  0.7715
Precision: 0.5563
Recall:    0.6872 
F1-Score:  0.6148 
ROC-AUC:   0.8274


In [33]:
# Confusion Matrix
print("\nConfusion Matrix:")
cm_lr_smote = confusion_matrix(y_val, y_pred_lr_smote)
print(cm_lr_smote)

TN, FP, FN, TP = cm_lr_smote.ravel()
print(f"\n                Predicted")
print(f"              No      Yes")
print(f"Actual  No   {TN:4d}    {FP:4d}")
print(f"        Yes  {FN:4d}    {TP:4d}")

print(f"\nBusiness Impact:")
print(f"  Correctly caught: {TP}/{TP+FN} ({TP/(TP+FN):.1%})")
print(f"  MISSED churners: {FN}")


Confusion Matrix:
[[830 205]
 [117 257]]

                Predicted
              No      Yes
Actual  No    830     205
        Yes   117     257

Business Impact:
  Correctly caught: 257/374 (68.7%)
  MISSED churners: 117


In [34]:
print("COMPARISON: Phase 1 vs Phase 2")

# Read missed churners from each confusion matrix instead of hardcoding them
FN_lr = cm_lr.ravel()[2]
FN_lr_smote = cm_lr_smote.ravel()[2]

print("\nLogistic Regression - WITHOUT SMOTE:")
print(f"  Recall:    {rec_lr:.4f} ({rec_lr:.2%})")
print(f"  F1-Score:  {f1_lr:.4f}")
print(f"  Missed:    {FN_lr} churners")

print("\nLogistic Regression - WITH SMOTE:")
print(f"  Recall:    {rec_lr_smote:.4f}")
print(f"  F1-Score:  {f1_lr_smote:.4f}")
print(f"  Missed:    {FN_lr_smote} churners")

print("\n Improvement:")
print(f"  Recall:    {rec_lr_smote-rec_lr:+.4f} ({(rec_lr_smote-rec_lr)*100:+.1f} points)")
print(f"  F1-Score:  {f1_lr_smote-f1_lr:+.4f}")
print(f"  Missed:    {FN_lr_smote-FN_lr:+d} churners")

COMPARISON: Phase 1 vs Phase 2

Logistic Regression - WITHOUT SMOTE:
  Recall:    0.5321 (53.21%)
  F1-Score:  0.5923
  Missed:    175 churners

Logistic Regression - WITH SMOTE:
  Recall:    0.6872
  F1-Score:  0.6148
  Missed:    117 churners

 Improvement:
  Recall:    +0.1551 (+15.5 points)
  F1-Score:  +0.0226
  Missed:    -58 churners
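
The recall gain comes with more false alarms, which the comparison above does not show. The sketch below works through that trade-off using only the two Logistic Regression confusion matrices printed earlier; the dictionary and variable names are illustrative.

```python
# Confusion-matrix counts for Logistic Regression, copied from the outputs above.
baseline = {"TN": 936, "FP": 99, "FN": 175, "TP": 199}
with_smote = {"TN": 830, "FP": 205, "FN": 117, "TP": 257}

extra_caught = with_smote["TP"] - baseline["TP"]
extra_false_alarms = with_smote["FP"] - baseline["FP"]

# Customers flagged as churners = everyone a retention campaign would target.
flagged_before = baseline["TP"] + baseline["FP"]
flagged_after = with_smote["TP"] + with_smote["FP"]

print(f"Additional churners caught: {extra_caught}")
print(f"Additional false alarms:    {extra_false_alarms}")
print(f"Customers flagged:          {flagged_before} -> {flagged_after}")
print(f"False alarms per extra churner caught: {extra_false_alarms / extra_caught:.2f}")
```

Whether that ratio is acceptable depends on the cost of a retention offer relative to the revenue lost per churner, which this notebook has not quantified at this point.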


In [35]:
# Store results for Phase 2
results_smote = []

metrics_lr_smote = {
    'Model': 'Logistic Regression',
    'Accuracy': acc_lr_smote,
    'Precision': prec_lr_smote,
    'Recall': rec_lr_smote,
    'F1-Score': f1_lr_smote,
    'ROC-AUC': roc_lr_smote
}
results_smote.append(metrics_lr_smote)

In [36]:
# PHASE 2 - MODEL 2: DECISION TREE + SMOTE

# Train on BALANCED data
dt_smote = DecisionTreeClassifier(random_state=42)
dt_smote.fit(X_train_balanced, y_train_balanced)


In [37]:
# Predict
y_pred_dt_smote = dt_smote.predict(X_val)
y_pred_proba_dt_smote = dt_smote.predict_proba(X_val)[:, 1]

In [38]:
# Metrics
acc_dt_smote = accuracy_score(y_val, y_pred_dt_smote)
prec_dt_smote = precision_score(y_val, y_pred_dt_smote)
rec_dt_smote = recall_score(y_val, y_pred_dt_smote)
f1_dt_smote = f1_score(y_val, y_pred_dt_smote)
roc_dt_smote = roc_auc_score(y_val, y_pred_proba_dt_smote)

print(f"\nAccuracy:  {acc_dt_smote:.4f}")
print(f"Precision: {prec_dt_smote:.4f}")
print(f"Recall:    {rec_dt_smote:.4f} ")
print(f"F1-Score:  {f1_dt_smote:.4f} ")
print(f"ROC-AUC:   {roc_dt_smote:.4f}")


Accuracy:  0.7175
Precision: 0.4733
Recall:    0.5695 
F1-Score:  0.5170 
ROC-AUC:   0.6699


In [39]:
# Comparison
print(f"  Phase 1 Recall: {rec_dt:.4f}")
print(f"  Phase 2 Recall: {rec_dt_smote:.4f}")
print(f"  Improvement:    {rec_dt_smote-rec_dt:+.4f} ({(rec_dt_smote-rec_dt)*100:+.1f} points)")

# Store
metrics_dt_smote = {
    'Model': 'Decision Tree',
    'Accuracy': acc_dt_smote,
    'Precision': prec_dt_smote,
    'Recall': rec_dt_smote,
    'F1-Score': f1_dt_smote,
    'ROC-AUC': roc_dt_smote
}
results_smote.append(metrics_dt_smote)

  Phase 1 Recall: 0.5000
  Phase 2 Recall: 0.5695
  Improvement:    +0.0695 (+7.0 points)


In [40]:
# PHASE 2 - MODEL 3: RANDOM FOREST + SMOTE

# Train on BALANCED data
rf_smote = RandomForestClassifier(random_state=42, n_estimators=100)
rf_smote.fit(X_train_balanced, y_train_balanced)

In [41]:
# Predict
y_pred_rf_smote = rf_smote.predict(X_val)
y_pred_proba_rf_smote = rf_smote.predict_proba(X_val)[:, 1]

# Metrics
acc_rf_smote = accuracy_score(y_val, y_pred_rf_smote)
prec_rf_smote = precision_score(y_val, y_pred_rf_smote)
rec_rf_smote = recall_score(y_val, y_pred_rf_smote)
f1_rf_smote = f1_score(y_val, y_pred_rf_smote)
roc_rf_smote = roc_auc_score(y_val, y_pred_proba_rf_smote)

print(f"\nAccuracy:  {acc_rf_smote:.4f}")
print(f"Precision: {prec_rf_smote:.4f}")
print(f"Recall:    {rec_rf_smote:.4f}")
print(f"F1-Score:  {f1_rf_smote:.4f}")
print(f"ROC-AUC:   {roc_rf_smote:.4f}")


Accuracy:  0.7601
Precision: 0.5446
Recall:    0.5882
F1-Score:  0.5656
ROC-AUC:   0.8007


In [42]:
# Comparison
print("\nComparison:")
print(f"  Phase 1 Recall: {rec_rf:.4f}")
print(f"  Phase 2 Recall: {rec_rf_smote:.4f}")
print(f"  Improvement:    {rec_rf_smote-rec_rf:+.4f} ({(rec_rf_smote-rec_rf)*100:+.1f} points)")

# Store
metrics_rf_smote = {
    'Model': 'Random Forest',
    'Accuracy': acc_rf_smote,
    'Precision': prec_rf_smote,
    'Recall': rec_rf_smote,
    'F1-Score': f1_rf_smote,
    'ROC-AUC': roc_rf_smote
}
results_smote.append(metrics_rf_smote)


Comparison:
  Phase 1 Recall: 0.4572
  Phase 2 Recall: 0.5882
  Improvement:    +0.1310 (+13.1 points)


In [43]:
# PHASE 2 - MODEL 4: KNN + SMOTE

# Train on BALANCED data
knn_smote = KNeighborsClassifier(n_neighbors=5)
knn_smote.fit(X_train_balanced, y_train_balanced)

In [44]:
# Predict
y_pred_knn_smote = knn_smote.predict(X_val)
y_pred_proba_knn_smote = knn_smote.predict_proba(X_val)[:, 1]

# Metrics
acc_knn_smote = accuracy_score(y_val, y_pred_knn_smote)
prec_knn_smote = precision_score(y_val, y_pred_knn_smote)
rec_knn_smote = recall_score(y_val, y_pred_knn_smote)
f1_knn_smote = f1_score(y_val, y_pred_knn_smote)
roc_knn_smote = roc_auc_score(y_val, y_pred_proba_knn_smote)

print(f"\nAccuracy:  {acc_knn_smote:.4f}")
print(f"Precision: {prec_knn_smote:.4f}")
print(f"Recall:    {rec_knn_smote:.4f}")
print(f"F1-Score:  {f1_knn_smote:.4f}")
print(f"ROC-AUC:   {roc_knn_smote:.4f}")


Accuracy:  0.7069
Precision: 0.4651
Recall:    0.6952
F1-Score:  0.5573
ROC-AUC:   0.7683


In [45]:
# Comparison
print(f"  Phase 1 Recall: {rec_knn:.4f}")
print(f"  Phase 2 Recall: {rec_knn_smote:.4f}")
print(f"  Improvement:    {rec_knn_smote-rec_knn:+.4f} ({(rec_knn_smote-rec_knn)*100:+.1f} points)")

# Store
metrics_knn_smote = {
    'Model': 'KNN',
    'Accuracy': acc_knn_smote,
    'Precision': prec_knn_smote,
    'Recall': rec_knn_smote,
    'F1-Score': f1_knn_smote,
    'ROC-AUC': roc_knn_smote
}
results_smote.append(metrics_knn_smote)

  Phase 1 Recall: 0.5267
  Phase 2 Recall: 0.6952
  Improvement:    +0.1684 (+16.8 points)


In [46]:
# FINAL COMPARISON: PHASE 1 vs PHASE 2


import pandas as pd

# Phase 1 results
results_df_phase1 = pd.DataFrame(results_baseline)
results_df_phase1 = results_df_phase1.sort_values('Recall', ascending=False)
print(results_df_phase1.to_string(index=False))

# Phase 2 results
results_df_phase2 = pd.DataFrame(results_smote)
results_df_phase2 = results_df_phase2.sort_values('Recall', ascending=False)
print(results_df_phase2.to_string(index=False))

# Improvement summary

improvements = [
    ('Logistic Regression', rec_lr, rec_lr_smote, rec_lr_smote - rec_lr),
    ('KNN', rec_knn, rec_knn_smote, rec_knn_smote - rec_knn),
    ('Random Forest', rec_rf, rec_rf_smote, rec_rf_smote - rec_rf),
    ('Decision Tree', rec_dt, rec_dt_smote, rec_dt_smote - rec_dt)
]

improvements.sort(key=lambda x: x[3], reverse=True)

print(f"\n{'Model':<20} {'Phase 1':>10} {'Phase 2':>10} {'Improvement':>12}")
print("-" * 60)
for model, p1, p2, imp in improvements:
    print(f"{model:<20} {p1:>9.2%} {p2:>9.2%} {imp:>11.1%} ({imp*100:+.1f})")

# Best model overall
best = results_df_phase2.iloc[0]
print(f"Model:     {best['Model']}")
print(f"Recall:    {best['Recall']:.4f}")
print(f"F1-Score:  {best['F1-Score']:.4f}")
print(f"Precision: {best['Precision']:.4f}")
print(f"ROC-AUC:   {best['ROC-AUC']:.4f}")

              Model  Accuracy  Precision   Recall  F1-Score  ROC-AUC
Logistic Regression  0.805536   0.667785 0.532086  0.592262 0.834854
                KNN  0.760114   0.550279 0.526738  0.538251 0.766805
      Decision Tree  0.713982   0.464020 0.500000  0.481338 0.644686
      Random Forest  0.782825   0.624088 0.457219  0.527778 0.809670
              Model  Accuracy  Precision   Recall  F1-Score  ROC-AUC
                KNN  0.706884   0.465116 0.695187  0.557342 0.768316
Logistic Regression  0.771469   0.556277 0.687166  0.614833 0.827429
      Random Forest  0.760114   0.544554 0.588235  0.565553 0.800698
      Decision Tree  0.717530   0.473333 0.569519  0.516990 0.669851

Model                   Phase 1    Phase 2  Improvement
------------------------------------------------------------
KNN                     52.67%    69.52%       16.8% (+16.8)
Logistic Regression     53.21%    68.72%       15.5% (+15.5)
Random Forest           45.72%    58.82%       13.1% (+13.1)
Decision 

In [47]:
from sklearn.ensemble import GradientBoostingClassifier

In [48]:
# MODEL 5: GRADIENT BOOSTING (PHASE 1)

# Train
gb = GradientBoostingClassifier(random_state=42, n_estimators=100)
gb.fit(X_train, y_train)

# Predict
y_pred_gb = gb.predict(X_val)
y_pred_proba_gb = gb.predict_proba(X_val)[:, 1]

# Metrics
acc_gb = accuracy_score(y_val, y_pred_gb)
prec_gb = precision_score(y_val, y_pred_gb)
rec_gb = recall_score(y_val, y_pred_gb)
f1_gb = f1_score(y_val, y_pred_gb)
roc_gb = roc_auc_score(y_val, y_pred_proba_gb)

print(f"\nAccuracy:  {acc_gb:.4f}")
print(f"Precision: {prec_gb:.4f}")
print(f"Recall:    {rec_gb:.4f}")
print(f"F1-Score:  {f1_gb:.4f}")
print(f"ROC-AUC:   {roc_gb:.4f}")

# Confusion Matrix
cm_gb = confusion_matrix(y_val, y_pred_gb)
print(cm_gb)
TN, FP, FN, TP = cm_gb.ravel()
print(f"  Correctly caught: {TP}/{TP+FN} ({TP/(TP+FN):.1%})")
print(f"  MISSED churners: {FN}")

# Store
metrics_gb = {
    'Model': 'Gradient Boosting',
    'Accuracy': acc_gb,
    'Precision': prec_gb,
    'Recall': rec_gb,
    'F1-Score': f1_gb,
    'ROC-AUC': roc_gb
}
results_baseline.append(metrics_gb)



Accuracy:  0.8034
Precision: 0.6803
Recall:    0.4893
F1-Score:  0.5692
ROC-AUC:   0.8387
[[949  86]
 [191 183]]
  Correctly caught: 183/374 (48.9%)
  MISSED churners: 191


In [49]:
# GRADIENT BOOSTING + SMOTE (PHASE 2)

# Train on BALANCED data
gb_smote = GradientBoostingClassifier(random_state=42, n_estimators=100)
gb_smote.fit(X_train_balanced, y_train_balanced)

# Predict on IMBALANCED validation
y_pred_gb_smote = gb_smote.predict(X_val)
y_pred_proba_gb_smote = gb_smote.predict_proba(X_val)[:, 1]

# Metrics
acc_gb_smote = accuracy_score(y_val, y_pred_gb_smote)
prec_gb_smote = precision_score(y_val, y_pred_gb_smote)
rec_gb_smote = recall_score(y_val, y_pred_gb_smote)
f1_gb_smote = f1_score(y_val, y_pred_gb_smote)
roc_gb_smote = roc_auc_score(y_val, y_pred_proba_gb_smote)

print(f"\nAccuracy:  {acc_gb_smote:.4f}")
print(f"Precision: {prec_gb_smote:.4f}")
print(f"Recall:    {rec_gb_smote:.4f} ")
print(f"F1-Score:  {f1_gb_smote:.4f} ")
print(f"ROC-AUC:   {roc_gb_smote:.4f}")

# Comparison
print("\n Comparison:")
print(f"  Phase 1 Recall: {rec_gb:.4f}")
print(f"  Phase 2 Recall: {rec_gb_smote:.4f}")
print(f"  Improvement:    {rec_gb_smote-rec_gb:+.4f} ({(rec_gb_smote-rec_gb)*100:+.1f} points)")

# Store
metrics_gb_smote = {
    'Model': 'Gradient Boosting',
    'Accuracy': acc_gb_smote,
    'Precision': prec_gb_smote,
    'Recall': rec_gb_smote,
    'F1-Score': f1_gb_smote,
    'ROC-AUC': roc_gb_smote
}
results_smote.append(metrics_gb_smote)


Accuracy:  0.7566
Precision: 0.5308
Recall:    0.7139 
F1-Score:  0.6089 
ROC-AUC:   0.8316

 Comparison:
  Phase 1 Recall: 0.4893
  Phase 2 Recall: 0.7139
  Improvement:    +0.2246 (+22.5 points)


In [50]:
# FINAL RANKING WITH GRADIENT BOOSTING

# Update results
results_df_phase2_updated = pd.DataFrame(results_smote)
results_df_phase2_updated = results_df_phase2_updated.sort_values('F1-Score', ascending=False)

print(results_df_phase2_updated.to_string(index=False))

# Recall ranking
results_df_recall = results_df_phase2_updated.sort_values('Recall', ascending=False)
print(results_df_recall[['Model', 'Recall', 'F1-Score', 'Precision']].to_string(index=False))

# All improvements

improvements_all = [
    ('Gradient Boosting', rec_gb, rec_gb_smote, rec_gb_smote - rec_gb),
    ('KNN', rec_knn, rec_knn_smote, rec_knn_smote - rec_knn),
    ('Logistic Regression', rec_lr, rec_lr_smote, rec_lr_smote - rec_lr),
    ('Random Forest', rec_rf, rec_rf_smote, rec_rf_smote - rec_rf),
    ('Decision Tree', rec_dt, rec_dt_smote, rec_dt_smote - rec_dt)
]

improvements_all.sort(key=lambda x: x[3], reverse=True)

print(f"\n{'Model':<22} {'Phase 1':>10} {'Phase 2':>10} {'Improvement':>12}")
for model, p1, p2, imp in improvements_all:
    print(f"{model:<22} {p1:>9.2%} {p2:>9.2%} {imp:>11.1%} ({imp*100:+.1f})")

# Best model
best_f1 = results_df_phase2_updated.iloc[0]
best_recall = results_df_recall.iloc[0]

print(f"\n Best F1-Score: {best_f1['Model']}")
print(f"   F1-Score:  {best_f1['F1-Score']:.4f}")
print(f"   Recall:    {best_f1['Recall']:.4f}")
print(f"   Precision: {best_f1['Precision']:.4f}")

print(f"\n Best Recall: {best_recall['Model']}")
print(f"   Recall:    {best_recall['Recall']:.4f}")
print(f"   F1-Score:  {best_recall['F1-Score']:.4f}")
print(f"   Precision: {best_recall['Precision']:.4f}")

              Model  Accuracy  Precision   Recall  F1-Score  ROC-AUC
Logistic Regression  0.771469   0.556277 0.687166  0.614833 0.827429
  Gradient Boosting  0.756565   0.530815 0.713904  0.608894 0.831581
      Random Forest  0.760114   0.544554 0.588235  0.565553 0.800698
                KNN  0.706884   0.465116 0.695187  0.557342 0.768316
      Decision Tree  0.717530   0.473333 0.569519  0.516990 0.669851
              Model   Recall  F1-Score  Precision
  Gradient Boosting 0.713904  0.608894   0.530815
                KNN 0.695187  0.557342   0.465116
Logistic Regression 0.687166  0.614833   0.556277
      Random Forest 0.588235  0.565553   0.544554
      Decision Tree 0.569519  0.516990   0.473333

Model                     Phase 1    Phase 2  Improvement
Gradient Boosting         48.93%    71.39%       22.5% (+22.5)
KNN                       52.67%    69.52%       16.8% (+16.8)
Logistic Regression       53.21%    68.72%       15.5% (+15.5)
Random Forest             45.72%    58

## Initial Exploration - Key Findings

### Phase 1: Baseline Performance (No Balancing)
**Best Model:** Logistic Regression
- Recall: 53.21%
- F1-Score: 59.23%

**Challenge Identified:** No baseline model exceeds 53.2% recall on the validation set, which is consistent with the class imbalance (73.5% No Churn vs 26.5% Churn).

### Phase 2: With SMOTE Class Balancing
**Recall improves for every model, by 7.0 to 22.5 points, at the cost of lower precision.**

| Model | Phase 1 Recall | Phase 2 Recall | Improvement |
|-------|----------------|----------------|-------------|
| Gradient Boosting | 48.93% | 71.39% | +22.5 points |
| Logistic Regression | 53.21% | 68.72% | +15.5 points |
| KNN | 52.67% | 69.52% | +16.8 points |
| Random Forest | 45.72% | 58.82% | +13.1 points |
| Decision Tree | 50.00% | 56.95% | +7.0 points |

### Champion Candidates Identified
1. **Gradient Boosting + SMOTE:** F1: 60.89%, Recall: 71.39% (highest recall)
2. **Logistic Regression + SMOTE:** F1: 61.48%, Recall: 68.72% (highest F1)

### Critical Insight
**Class balancing (SMOTE) is the key driver of performance improvement**, not model choice alone.

**Next Step:** Systematic comparison of top 2 models with rigorous optimization.

# 6. Systematic Model Comparison

## Objective
Rigorously compare Gradient Boosting vs Logistic Regression to understand what truly drives performance improvement.

## Methodology
Test 4 configurations for each model to isolate the impact of:
1. **Hyperparameter tuning**
2. **Class balancing (SMOTE)**
3. **Decision threshold optimization**

## Configuration Testing

### Configuration 1: Baseline
- Hyperparameters: Default
- Class balancing: No
- Threshold: 0.5

### Configuration 2: Hyperparameter Tuning
- Hyperparameters: Optimized via GridSearchCV
- Class balancing: No
- Threshold: 0.5
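
To make the cost of an exhaustive grid search concrete, the sketch below enumerates a grid by hand. It assumes the parameter grid and the 3-fold cross-validation that this notebook uses for Gradient Boosting; `candidates` and `total_fits` are illustrative names.

```python
from itertools import product

# Grid copied from the Gradient Boosting search in this notebook.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 4],
}
cv_folds = 3

# Every combination of values is one candidate model.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
total_fits = len(candidates) * cv_folds

print(f"{len(candidates)} candidates x {cv_folds} folds = {total_fits} fits")
```

With `refit=True` (the scikit-learn default), `GridSearchCV` additionally refits the best candidate once on the full training set.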

### Configuration 3: Tuned + SMOTE
- Hyperparameters: Optimized
- Class balancing: Yes (SMOTE on train set only)
- Threshold: 0.5

### Configuration 4: All Optimized
- Hyperparameters: Optimized
- Class balancing: Yes (SMOTE)
- Threshold: Optimized via TunedThresholdClassifierCV
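
As a rough picture of what threshold optimization does, the sketch below sweeps a few candidate thresholds over predicted probabilities and keeps the one with the best F1. It is a hand-rolled illustration on made-up toy data, not the `TunedThresholdClassifierCV` procedure itself, and the helper names are hypothetical.

```python
def f1_at_threshold(y_true, y_proba, threshold):
    """F1 for the positive class when predicting 1 if proba >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_proba) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_proba) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_proba) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, y_proba, candidates):
    """Return the candidate threshold with the highest F1 (first one wins ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_proba, t))

# Toy labels and predicted churn probabilities, made up for illustration.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_proba = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.65, 0.90]

candidates = [round(0.1 * k, 1) for k in range(1, 10)]
chosen = best_threshold(y_true, y_proba, candidates)

print(f"Chosen threshold: {chosen}")
print(f"F1 at chosen threshold: {f1_at_threshold(y_true, y_proba, chosen):.4f}")
print(f"F1 at default 0.5:      {f1_at_threshold(y_true, y_proba, 0.5):.4f}")
```

In practice the threshold should be selected on cross-validation folds or the validation set, never on the test set, so that the final evaluation stays unbiased.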

## Models Compared
1. **Gradient Boosting** (complex, ensemble)
2. **Logistic Regression** (simple, interpretable)


In [51]:
# CONFIG 1: BASELINE (Default + No Balancing)
print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_gb:.4f}")
print(f"   Precision: {prec_gb:.4f}")
print(f"   Recall:    {rec_gb:.4f}")
print(f"   F1-Score:  {f1_gb:.4f}")
print(f"   ROC-AUC:   {roc_gb:.4f}")

cm = confusion_matrix(y_val, y_pred_gb)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")

# Store for comparison
results_systematic = []
results_systematic.append({
    'Config': '1. Baseline',
    'Hyperparams': 'Default',
    'Balancing': 'No',
    'Threshold': '0.5',
    'Recall': rec_gb,
    'F1-Score': f1_gb,
    'Precision': prec_gb
})


   Validation Results:
   Accuracy:  0.8034
   Precision: 0.6803
   Recall:    0.4893
   F1-Score:  0.5692
   ROC-AUC:   0.8387

   Business Impact:
   Detected: 183/374 churners (48.9%)
   Missed: 191 churners


In [52]:
# CONFIG 2: TUNED HYPERPARAMETERS (No Balancing)

from sklearn.model_selection import GridSearchCV

# Small parameter grid to limit overfitting risk (professor's advice)
param_grid = {
    'n_estimators': [50, 100],       
    'learning_rate': [0.05, 0.1],
    'max_depth': [3, 4]              
}

print(f"\n   Parameter Grid:")
print(f"   n_estimators: {param_grid['n_estimators']}")
print(f"   learning_rate: {param_grid['learning_rate']}")
print(f"   max_depth: {param_grid['max_depth']}")

# Grid Search
grid_gb_c2 = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='f1',
    verbose=1
)

grid_gb_c2.fit(X_train, y_train)

print(f"\n   Best Parameters:")
for param, value in grid_gb_c2.best_params_.items():
    print(f"   {param}: {value}")

print(f"\n   Best CV F1-Score: {grid_gb_c2.best_score_:.4f}")

# Get best model
gb_c2 = grid_gb_c2.best_estimator_

# Predict on validation
y_pred_gb_c2 = gb_c2.predict(X_val)
y_proba_gb_c2 = gb_c2.predict_proba(X_val)[:, 1]

# Metrics
acc_gb_c2 = accuracy_score(y_val, y_pred_gb_c2)
prec_gb_c2 = precision_score(y_val, y_pred_gb_c2)
rec_gb_c2 = recall_score(y_val, y_pred_gb_c2)
f1_gb_c2 = f1_score(y_val, y_pred_gb_c2)
roc_gb_c2 = roc_auc_score(y_val, y_proba_gb_c2)

print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_gb_c2:.4f}")
print(f"   Precision: {prec_gb_c2:.4f}")
print(f"   Recall:    {rec_gb_c2:.4f}")
print(f"   F1-Score:  {f1_gb_c2:.4f}")
print(f"   ROC-AUC:   {roc_gb_c2:.4f}")

# Overfitting check
print(f"\n   Overfitting Check:")
print(f"   CV F1:  {grid_gb_c2.best_score_:.4f}")
print(f"   Val F1: {f1_gb_c2:.4f}")
print(f"   Gap:    {grid_gb_c2.best_score_ - f1_gb_c2:.4f}")
if grid_gb_c2.best_score_ - f1_gb_c2 > 0.15:
    print("Possible overfitting!")
else:
    print("Good generalization")

cm = confusion_matrix(y_val, y_pred_gb_c2)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")

# Compare with Config 1
print(f"\n   Impact of Hyperparameter Tuning:")
print(f"   Recall: {rec_gb:.4f} → {rec_gb_c2:.4f} ({rec_gb_c2-rec_gb:+.4f})")
print(f"   F1: {f1_gb:.4f} → {f1_gb_c2:.4f} ({f1_gb_c2-f1_gb:+.4f})")

# Store
results_systematic.append({
    'Config': '2. Tuned',
    'Hyperparams': 'Tuned',
    'Balancing': 'No',
    'Threshold': '0.5',
    'Recall': rec_gb_c2,
    'F1-Score': f1_gb_c2,
    'Precision': prec_gb_c2
})


   Parameter Grid:
   n_estimators: [50, 100]
   learning_rate: [0.05, 0.1]
   max_depth: [3, 4]
Fitting 3 folds for each of 8 candidates, totalling 24 fits

   Best Parameters:
   learning_rate: 0.1
   max_depth: 3
   n_estimators: 100

   Best CV F1-Score: 0.5836

   Validation Results:
   Accuracy:  0.8034
   Precision: 0.6803
   Recall:    0.4893
   F1-Score:  0.5692
   ROC-AUC:   0.8387

   Overfitting Check:
   CV F1:  0.5836
   Val F1: 0.5692
   Gap:    0.0144
Good generalization

   Business Impact:
   Detected: 183/374 churners (48.9%)
   Missed: 191 churners

   Impact of Hyperparameter Tuning:
   Recall: 0.4893 → 0.4893 (+0.0000)
   F1: 0.5692 → 0.5692 (+0.0000)


In [53]:
# CONFIG 3: TUNED + SMOTE

# Train with best hyperparameters on BALANCED data
gb_c3 = GradientBoostingClassifier(**grid_gb_c2.best_params_, random_state=42)
gb_c3.fit(X_train_balanced, y_train_balanced)

# Predict on IMBALANCED validation
y_pred_gb_c3 = gb_c3.predict(X_val)
y_proba_gb_c3 = gb_c3.predict_proba(X_val)[:, 1]

# Metrics
acc_gb_c3 = accuracy_score(y_val, y_pred_gb_c3)
prec_gb_c3 = precision_score(y_val, y_pred_gb_c3)
rec_gb_c3 = recall_score(y_val, y_pred_gb_c3)
f1_gb_c3 = f1_score(y_val, y_pred_gb_c3)
roc_gb_c3 = roc_auc_score(y_val, y_proba_gb_c3)

print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_gb_c3:.4f}")
print(f"   Precision: {prec_gb_c3:.4f}")
print(f"   Recall:    {rec_gb_c3:.4f}")
print(f"   F1-Score:  {f1_gb_c3:.4f}")
print(f"   ROC-AUC:   {roc_gb_c3:.4f}")

cm = confusion_matrix(y_val, y_pred_gb_c3)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")

# Compare with Config 2
print(f"\n   Impact of Class Balancing (SMOTE):")
print(f"   Recall: {rec_gb_c2:.4f} → {rec_gb_c3:.4f} ({rec_gb_c3-rec_gb_c2:+.4f})")
print(f"   F1: {f1_gb_c2:.4f} → {f1_gb_c3:.4f} ({f1_gb_c3-f1_gb_c2:+.4f})")

# Store
results_systematic.append({
    'Config': '3. Tuned+SMOTE',
    'Hyperparams': 'Tuned',
    'Balancing': 'Yes',
    'Threshold': '0.5',
    'Recall': rec_gb_c3,
    'F1-Score': f1_gb_c3,
    'Precision': prec_gb_c3
})


   Validation Results:
   Accuracy:  0.7566
   Precision: 0.5308
   Recall:    0.7139
   F1-Score:  0.6089
   ROC-AUC:   0.8316

   Business Impact:
   Detected: 267/374 churners (71.4%)
   Missed: 107 churners

   Impact of Class Balancing (SMOTE):
   Recall: 0.4893 → 0.7139 (+0.2246)
   F1: 0.5692 → 0.6089 (+0.0397)


In [55]:
# CONFIG 4: TUNED + SMOTE + THRESHOLD TUNING

from sklearn.model_selection import TunedThresholdClassifierCV

# Tune threshold to maximize F1-Score (balance Recall/Precision).
# Note: with cv='prefit', the threshold is tuned on the same SMOTE-balanced
# data that gb_c3 was trained on, so the selected cut-off may be optimistic.
tuned_threshold_model = TunedThresholdClassifierCV(
    estimator=gb_c3,
    scoring='f1',
    cv='prefit',
    refit=False
)

tuned_threshold_model.fit(X_train_balanced, y_train_balanced)

print(f"\n   Optimal Threshold Found: {tuned_threshold_model.best_threshold_:.4f}")
print(f"   (Default threshold was: 0.5)")

# Predict with optimal threshold
y_pred_gb_c4 = tuned_threshold_model.predict(X_val)
y_proba_gb_c4 = tuned_threshold_model.predict_proba(X_val)[:, 1]

# Metrics
acc_gb_c4 = accuracy_score(y_val, y_pred_gb_c4)
prec_gb_c4 = precision_score(y_val, y_pred_gb_c4)
rec_gb_c4 = recall_score(y_val, y_pred_gb_c4)
f1_gb_c4 = f1_score(y_val, y_pred_gb_c4)
roc_gb_c4 = roc_auc_score(y_val, y_proba_gb_c4)

print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_gb_c4:.4f}")
print(f"   Precision: {prec_gb_c4:.4f}")
print(f"   Recall:    {rec_gb_c4:.4f}")
print(f"   F1-Score:  {f1_gb_c4:.4f}")
print(f"   ROC-AUC:   {roc_gb_c4:.4f}")

cm = confusion_matrix(y_val, y_pred_gb_c4)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")
print(f"   False Alarms: {FP} (unnecessary campaigns)")

# Compare with Config 3
print(f"\n   Impact of Threshold Tuning (F1-optimized):")
print(f"   Recall: {rec_gb_c3:.4f} → {rec_gb_c4:.4f} ({rec_gb_c4-rec_gb_c3:+.4f})")
print(f"   F1: {f1_gb_c3:.4f} → {f1_gb_c4:.4f} ({f1_gb_c4-f1_gb_c3:+.4f})")
print(f"   Precision: {prec_gb_c3:.4f} → {prec_gb_c4:.4f} ({prec_gb_c4-prec_gb_c3:+.4f})")

# Store Config 4 as a new entry (appending keeps the Config 3 entry intact)
results_systematic.append({
    'Config': '4. All Optimized',
    'Hyperparams': 'Tuned',
    'Balancing': 'Yes',
    'Threshold': f'{tuned_threshold_model.best_threshold_:.2f}',
    'Recall': rec_gb_c4,
    'F1-Score': f1_gb_c4,
    'Precision': prec_gb_c4
})


   Optimal Threshold Found: 0.4735
   (Default threshold was: 0.5)

   Validation Results:
   Accuracy:  0.7509
   Precision: 0.5217
   Recall:    0.7380
   F1-Score:  0.6113
   ROC-AUC:   0.8316

   Business Impact:
   Detected: 276/374 churners (73.8%)
   Missed: 98 churners
   False Alarms: 253 (unnecessary campaigns)

   Impact of Threshold Tuning (F1-optimized):
   Recall: 0.7139 → 0.7380 (+0.0241)
   F1: 0.6089 → 0.6113 (+0.0024)
   Precision: 0.5308 → 0.5217 (-0.0091)


In [56]:
# FINAL COMPARISON - ALL 4 CONFIGURATIONS


import pandas as pd

# Remove duplicates (keep only unique configs)
configs_unique = []
seen = set()
for config in results_systematic:
    config_name = config['Config']
    if config_name not in seen:
        configs_unique.append(config)
        seen.add(config_name)

# Create comparison DataFrame
df_comparison = pd.DataFrame(configs_unique)

print("\n Complete Results (4 Configurations):")
print(df_comparison.to_string(index=False))

# Sort by F1-Score
df_sorted = df_comparison.sort_values('F1-Score', ascending=False)

print("\n Ranked by F1-Score:")
print(df_sorted[['Config', 'Recall', 'F1-Score', 'Precision']].to_string(index=False))

# Impact Analysis

c1 = df_comparison.iloc[0]  # Baseline
c2 = df_comparison.iloc[1]  # Tuned
c3 = df_comparison.iloc[2]  # Tuned+SMOTE
c4 = df_comparison.iloc[3]  # All Optimized

print("\n 1. Impact of Hyperparameter Tuning (Config 1 → 2):")
print(f"   Recall: {c1['Recall']:.4f} → {c2['Recall']:.4f} ({c2['Recall']-c1['Recall']:+.4f})")
print(f"   F1: {c1['F1-Score']:.4f} → {c2['F1-Score']:.4f} ({c2['F1-Score']-c1['F1-Score']:+.4f})")

print("\n 2. Impact of Class Balancing - SMOTE (Config 2 → 3):")
print(f"   Recall: {c2['Recall']:.4f} → {c3['Recall']:.4f} ({c3['Recall']-c2['Recall']:+.4f})")
print(f"   Improvement: +{(c3['Recall']-c2['Recall'])*100:.1f} percentage points")
print(f"   F1: {c2['F1-Score']:.4f} → {c3['F1-Score']:.4f} ({c3['F1-Score']-c2['F1-Score']:+.4f})")

print("\n 3. Impact of Threshold Tuning (Config 3 → 4):")
print(f"   Recall: {c3['Recall']:.4f} → {c4['Recall']:.4f} ({c4['Recall']-c3['Recall']:+.4f})")
print(f"   F1: {c3['F1-Score']:.4f} → {c4['F1-Score']:.4f} ({c4['F1-Score']-c3['F1-Score']:+.4f})")
print(f"   Precision: {c3['Precision']:.4f} → {c4['Precision']:.4f} ({c4['Precision']-c3['Precision']:+.4f})")

print("\n Total Improvement (Baseline → Best):")
best_config = df_sorted.iloc[0]
print(f"   Best Configuration: {best_config['Config']}")
print(f"   Recall: {c1['Recall']:.4f} → {best_config['Recall']:.4f} ({best_config['Recall']-c1['Recall']:+.4f})")
print(f"   F1: {c1['F1-Score']:.4f} → {best_config['F1-Score']:.4f} ({best_config['F1-Score']-c1['F1-Score']:+.4f})")



 Complete Results (4 Configurations):
          Config Hyperparams Balancing Threshold   Recall  F1-Score  Precision
     1. Baseline     Default        No       0.5 0.489305  0.569207   0.680297
        2. Tuned       Tuned        No       0.5 0.489305  0.569207   0.680297
  3. Tuned+SMOTE       Tuned       Yes       0.5 0.713904  0.608894   0.530815
4. All Optimized       Tuned       Yes      0.47 0.737968  0.611296   0.521739

 Ranked by F1-Score:
          Config   Recall  F1-Score  Precision
4. All Optimized 0.737968  0.611296   0.521739
  3. Tuned+SMOTE 0.713904  0.608894   0.530815
     1. Baseline 0.489305  0.569207   0.680297
        2. Tuned 0.489305  0.569207   0.680297

 1. Impact of Hyperparameter Tuning (Config 1 → 2):
   Recall: 0.4893 → 0.4893 (+0.0000)
   F1: 0.5692 → 0.5692 (+0.0000)

 2. Impact of Class Balancing - SMOTE (Config 2 → 3):
   Recall: 0.4893 → 0.7139 (+0.2246)
   Improvement: +22.5 percentage points
   F1: 0.5692 → 0.6089 (+0.0397)

 3. Impact of Thresh

1. Hyperparameter Tuning: NO IMPACT
   → Grid Search selected lr=0.1, depth=3, n=100, which are the sklearn defaults
   → No candidate in the searched grid beat the defaults; values outside the grid were not tested

2. Class Balancing (SMOTE)
   → +22.5 percentage points Recall improvement
   → This is THE main driver of performance
   → Without SMOTE: 48.9% Recall (191 churners missed)
   → With SMOTE: 71.4% Recall (107 churners missed)
   → Result: 84 additional churners detected!

3. Threshold Tuning: MARGINAL BENEFIT
   → +2.4 points Recall, +0.2 points F1-Score, -0.9 points Precision
   → Detects 9 more churners but adds 17 false alarms
   → Trade-off depends on business cost/benefit
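
The trade-off between Config 3 and Config 4 can be checked directly from confusion-matrix counts. The sketch below uses the validation counts printed for the two Gradient Boosting configurations; the false-positive count for Config 3 is not printed by its cell, so it is derived here from the reported precision (267 / 0.5308 ≈ 503 predicted positives, hence 236 false positives). The helper name `metrics_from_counts` is introduced only for illustration.

```python
def metrics_from_counts(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1

# Validation counts for Gradient Boosting (374 churners in total)
config3 = {"tp": 267, "fp": 236, "fn": 107}  # fp derived from precision 0.5308
config4 = {"tp": 276, "fp": 253, "fn": 98}   # counts printed by the Config 4 cell

p3, r3, f3 = metrics_from_counts(**config3)
p4, r4, f4 = metrics_from_counts(**config4)

extra_detected = config4["tp"] - config3["tp"]
extra_false_alarms = config4["fp"] - config3["fp"]

print(f"Config 3: precision={p3:.4f} recall={r3:.4f} f1={f3:.4f}")
print(f"Config 4: precision={p4:.4f} recall={r4:.4f} f1={f4:.4f}")
print(f"Extra churners detected: {extra_detected}, extra false alarms: {extra_false_alarms}")
```

Whether the extra detections justify the extra false alarms depends on the cost of a retention campaign relative to the value of a retained customer, which this notebook does not quantify.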

In [57]:
# LOGISTIC REGRESSION - CONFIG 1: BASELINE


from sklearn.linear_model import LogisticRegression

# Initialize results storage
results_lr_systematic = []

# Train with default parameters
lr_c1 = LogisticRegression(random_state=42, max_iter=1000)
lr_c1.fit(X_train, y_train)

# Predict
y_pred_lr_c1 = lr_c1.predict(X_val)
y_proba_lr_c1 = lr_c1.predict_proba(X_val)[:, 1]

# Metrics
acc_lr_c1 = accuracy_score(y_val, y_pred_lr_c1)
prec_lr_c1 = precision_score(y_val, y_pred_lr_c1)
rec_lr_c1 = recall_score(y_val, y_pred_lr_c1)
f1_lr_c1 = f1_score(y_val, y_pred_lr_c1)
roc_lr_c1 = roc_auc_score(y_val, y_proba_lr_c1)

print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_lr_c1:.4f}")
print(f"   Precision: {prec_lr_c1:.4f}")
print(f"   Recall:    {rec_lr_c1:.4f}")
print(f"   F1-Score:  {f1_lr_c1:.4f}")
print(f"   ROC-AUC:   {roc_lr_c1:.4f}")

cm = confusion_matrix(y_val, y_pred_lr_c1)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")

# Store
results_lr_systematic.append({
    'Config': '1. Baseline',
    'Hyperparams': 'Default',
    'Balancing': 'No',
    'Threshold': '0.5',
    'Recall': rec_lr_c1,
    'F1-Score': f1_lr_c1,
    'Precision': prec_lr_c1
})



   Validation Results:
   Accuracy:  0.8055
   Precision: 0.6678
   Recall:    0.5321
   F1-Score:  0.5923
   ROC-AUC:   0.8349

   Business Impact:
   Detected: 199/374 churners (53.2%)
   Missed: 175 churners


In [58]:

# LOGISTIC REGRESSION - CONFIG 2: TUNED

# Parameter grid for Logistic Regression
param_grid_lr = {
    'C': [0.01, 0.1, 1.0, 10.0],           # Regularization strength
    'penalty': ['l2'],                      # L2 regularization
    'solver': ['lbfgs'],                    # Solver
    'max_iter': [1000]
}

print(f"\n   Parameter Grid:")
print(f"   C (regularization): {param_grid_lr['C']}")
print(f"   Total combinations: 4")

# Grid Search
grid_lr_c2 = GridSearchCV(
    LogisticRegression(random_state=42),
    param_grid_lr,
    cv=3,
    scoring='f1',
    verbose=1
)

grid_lr_c2.fit(X_train, y_train)

print(f"\n   Best Parameters:")
for param, value in grid_lr_c2.best_params_.items():
    print(f"   {param}: {value}")

print(f"\n   Best CV F1-Score: {grid_lr_c2.best_score_:.4f}")

# Best model
lr_c2 = grid_lr_c2.best_estimator_

# Predict
y_pred_lr_c2 = lr_c2.predict(X_val)
y_proba_lr_c2 = lr_c2.predict_proba(X_val)[:, 1]

# Metrics
acc_lr_c2 = accuracy_score(y_val, y_pred_lr_c2)
prec_lr_c2 = precision_score(y_val, y_pred_lr_c2)
rec_lr_c2 = recall_score(y_val, y_pred_lr_c2)
f1_lr_c2 = f1_score(y_val, y_pred_lr_c2)
roc_lr_c2 = roc_auc_score(y_val, y_proba_lr_c2)

print(f"\n   Validation Results:")
print(f"   Recall:    {rec_lr_c2:.4f}")
print(f"   F1-Score:  {f1_lr_c2:.4f}")
print(f"   Precision: {prec_lr_c2:.4f}")

# Overfitting check
print(f"\n   Overfitting Check:")
print(f"   CV F1:  {grid_lr_c2.best_score_:.4f}")
print(f"   Val F1: {f1_lr_c2:.4f}")
print(f"   Gap:    {grid_lr_c2.best_score_ - f1_lr_c2:.4f}")

cm = confusion_matrix(y_val, y_pred_lr_c2)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")

# Compare with Config 1
print(f"\n   Impact of Hyperparameter Tuning:")
print(f"   Recall: {rec_lr_c1:.4f} → {rec_lr_c2:.4f} ({rec_lr_c2-rec_lr_c1:+.4f})")
print(f"   F1: {f1_lr_c1:.4f} → {f1_lr_c2:.4f} ({f1_lr_c2-f1_lr_c1:+.4f})")

# Store
results_lr_systematic.append({
    'Config': '2. Tuned',
    'Hyperparams': 'Tuned',
    'Balancing': 'No',
    'Threshold': '0.5',
    'Recall': rec_lr_c2,
    'F1-Score': f1_lr_c2,
    'Precision': prec_lr_c2
})


   Parameter Grid:
   C (regularization): [0.01, 0.1, 1.0, 10.0]
   Total combinations: 4
Fitting 3 folds for each of 4 candidates, totalling 12 fits

   Best Parameters:
   C: 10.0
   max_iter: 1000
   penalty: l2
   solver: lbfgs

   Best CV F1-Score: 0.5988

   Validation Results:
   Recall:    0.5348
   F1-Score:  0.5961
   Precision: 0.6734

   Overfitting Check:
   CV F1:  0.5988
   Val F1: 0.5961
   Gap:    0.0027

   Business Impact:
   Detected: 200/374 churners (53.5%)
   Missed: 174 churners

   Impact of Hyperparameter Tuning:
   Recall: 0.5321 → 0.5348 (+0.0027)
   F1: 0.5923 → 0.5961 (+0.0039)


In [59]:
# LOGISTIC REGRESSION - CONFIG 3: TUNED + SMOTE

# Train with best hyperparameters on BALANCED data
lr_c3 = LogisticRegression(**grid_lr_c2.best_params_, random_state=42)
lr_c3.fit(X_train_balanced, y_train_balanced)

# Predict on IMBALANCED validation
y_pred_lr_c3 = lr_c3.predict(X_val)
y_proba_lr_c3 = lr_c3.predict_proba(X_val)[:, 1]

# Metrics
acc_lr_c3 = accuracy_score(y_val, y_pred_lr_c3)
prec_lr_c3 = precision_score(y_val, y_pred_lr_c3)
rec_lr_c3 = recall_score(y_val, y_pred_lr_c3)
f1_lr_c3 = f1_score(y_val, y_pred_lr_c3)
roc_lr_c3 = roc_auc_score(y_val, y_proba_lr_c3)

print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_lr_c3:.4f}")
print(f"   Precision: {prec_lr_c3:.4f}")
print(f"   Recall:    {rec_lr_c3:.4f}")
print(f"   F1-Score:  {f1_lr_c3:.4f}")
print(f"   ROC-AUC:   {roc_lr_c3:.4f}")

cm = confusion_matrix(y_val, y_pred_lr_c3)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")

# Compare with Config 2
print(f"\n   Impact of Class Balancing (SMOTE):")
print(f"   Recall: {rec_lr_c2:.4f} → {rec_lr_c3:.4f} ({rec_lr_c3-rec_lr_c2:+.4f})")
print(f"   F1: {f1_lr_c2:.4f} → {f1_lr_c3:.4f} ({f1_lr_c3-f1_lr_c2:+.4f})")

# Store
results_lr_systematic.append({
    'Config': '3. Tuned+SMOTE',
    'Hyperparams': 'Tuned',
    'Balancing': 'Yes',
    'Threshold': '0.5',
    'Recall': rec_lr_c3,
    'F1-Score': f1_lr_c3,
    'Precision': prec_lr_c3
})


   Validation Results:
   Accuracy:  0.7779
   Precision: 0.5688
   Recall:    0.6738
   F1-Score:  0.6169
   ROC-AUC:   0.8276

   Business Impact:
   Detected: 252/374 churners (67.4%)
   Missed: 122 churners

   Impact of Class Balancing (SMOTE):
   Recall: 0.5348 → 0.6738 (+0.1390)
   F1: 0.5961 → 0.6169 (+0.0208)


In [60]:
# LOGISTIC REGRESSION - CONFIG 4: ALL OPTIMIZED

from sklearn.model_selection import TunedThresholdClassifierCV

# Tune threshold to maximize F1-Score
tuned_threshold_lr = TunedThresholdClassifierCV(
    estimator=lr_c3,
    scoring='f1',      # Optimize for F1 (balanced)
    cv='prefit',
    refit=False
)

tuned_threshold_lr.fit(X_train_balanced, y_train_balanced)

print(f"\n   Optimal Threshold: {tuned_threshold_lr.best_threshold_:.4f}")
print(f"   (Default was: 0.5)")

# Predict with optimal threshold
y_pred_lr_c4 = tuned_threshold_lr.predict(X_val)
y_proba_lr_c4 = tuned_threshold_lr.predict_proba(X_val)[:, 1]

# Metrics
acc_lr_c4 = accuracy_score(y_val, y_pred_lr_c4)
prec_lr_c4 = precision_score(y_val, y_pred_lr_c4)
rec_lr_c4 = recall_score(y_val, y_pred_lr_c4)
f1_lr_c4 = f1_score(y_val, y_pred_lr_c4)
roc_lr_c4 = roc_auc_score(y_val, y_proba_lr_c4)

print(f"\n   Validation Results:")
print(f"   Accuracy:  {acc_lr_c4:.4f}")
print(f"   Precision: {prec_lr_c4:.4f}")
print(f"   Recall:    {rec_lr_c4:.4f}")
print(f"   F1-Score:  {f1_lr_c4:.4f}")
print(f"   ROC-AUC:   {roc_lr_c4:.4f}")

cm = confusion_matrix(y_val, y_pred_lr_c4)
TN, FP, FN, TP = cm.ravel()
print(f"\n   Business Impact:")
print(f"   Detected: {TP}/{TP+FN} churners ({TP/(TP+FN):.1%})")
print(f"   Missed: {FN} churners")
print(f"   False Alarms: {FP}")

# Compare with Config 3
print(f"\n   Impact of Threshold Tuning:")
print(f"   Recall: {rec_lr_c3:.4f} → {rec_lr_c4:.4f} ({rec_lr_c4-rec_lr_c3:+.4f})")
print(f"   F1: {f1_lr_c3:.4f} → {f1_lr_c4:.4f} ({f1_lr_c4-f1_lr_c3:+.4f})")
print(f"   Precision: {prec_lr_c3:.4f} → {prec_lr_c4:.4f} ({prec_lr_c4-prec_lr_c3:+.4f})")

# Store
results_lr_systematic.append({
    'Config': '4. All Optimized',
    'Hyperparams': 'Tuned',
    'Balancing': 'Yes',
    'Threshold': f'{tuned_threshold_lr.best_threshold_:.2f}',
    'Recall': rec_lr_c4,
    'F1-Score': f1_lr_c4,
    'Precision': prec_lr_c4
})



   Optimal Threshold: 0.4445
   (Default was: 0.5)

   Validation Results:
   Accuracy:  0.7516
   Precision: 0.5233
   Recall:    0.7193
   F1-Score:  0.6059
   ROC-AUC:   0.8276

   Business Impact:
   Detected: 269/374 churners (71.9%)
   Missed: 105 churners
   False Alarms: 245

   Impact of Threshold Tuning:
   Recall: 0.6738 → 0.7193 (+0.0455)
   F1: 0.6169 → 0.6059 (-0.0110)
   Precision: 0.5688 → 0.5233 (-0.0455)


## Systematic Comparison - Results Summary

### Gradient Boosting Performance

| Config | Hyperparams | Balancing | Threshold | Recall | F1-Score |
|--------|-------------|-----------|-----------|--------|----------|
| 1. Baseline | Default | No | 0.5 | 48.93% | 56.92% |
| 2. Tuned | Tuned | No | 0.5 | 48.93% | 56.92% |
| 3. Tuned+SMOTE | Tuned | Yes | 0.5 | 71.39% | 60.89% |
| 4. All Optimized | Tuned | Yes | 0.47 | 73.80% | 61.13% |

### Logistic Regression Performance

| Config | Hyperparams | Balancing | Threshold | Recall | F1-Score |
|--------|-------------|-----------|-----------|--------|----------|
| 1. Baseline | Default | No | 0.5 | 53.21% | 59.23% |
| 2. Tuned | Tuned | No | 0.5 | 53.48% | 59.61% |
| 3. Tuned+SMOTE | Tuned | Yes | 0.5 | 67.38% | 61.69% |
| 4. All Optimized | Tuned | Yes | 0.44 | 71.93% | 60.59% |

### Impact Analysis

**1. Hyperparameter Tuning Impact:**
- Gradient Boosting: 0% improvement (grid search selected the default values)
- Logistic Regression: +0.4 F1 points (minimal)
- **Conclusion:** sklearn defaults were already a good choice on this dataset, within the grids searched

**2. Class Balancing (SMOTE) Impact:**
- Gradient Boosting: +22.5 percentage points Recall
- Logistic Regression: +13.9 percentage points Recall
- **Conclusion:** SMOTE is THE critical factor

**3. Threshold Tuning Impact:**
- Gradient Boosting: +2.4 points Recall, +0.2 points F1
- Logistic Regression: +4.5 points Recall, -1.1 points F1
- **Conclusion:** Marginal benefit with precision trade-off
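
Conceptually, threshold tuning sweeps candidate cut-offs over the predicted probabilities and keeps the one with the best score. The sketch below shows that idea in plain Python on a tiny synthetic example; the labels and probabilities are made up for illustration, and this is not how `TunedThresholdClassifierCV` is implemented internally.

```python
def f1_at_threshold(y_true, y_proba, threshold):
    """F1-score obtained when predicting churn for probabilities >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_proba):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, y_proba, candidates):
    """Return the candidate threshold with the highest F1-score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_proba, t))

# Synthetic labels and predicted churn probabilities (illustrative only)
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_proba = [0.04, 0.12, 0.22, 0.31, 0.37, 0.41, 0.47, 0.62, 0.56, 0.91]

candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
best = best_threshold(y_true, y_proba, candidates)

print(f"Best threshold: {best:.2f}")
print(f"F1 at best threshold: {f1_at_threshold(y_true, y_proba, best):.3f}")
print(f"F1 at default 0.5:    {f1_at_threshold(y_true, y_proba, 0.5):.3f}")
```

A threshold chosen this way is only as trustworthy as the data it was tuned on. Tuning it on the same rows the model was fitted on tends to give an optimistic cut-off, so a held-out split or cross-validation is generally the safer choice.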

### Champion Model Selected

**Winner: Gradient Boosting + SMOTE (Config 3)**
- F1-Score: 60.89% (within 0.8 points of the best validation F1, 61.69% for Logistic Regression Config 3)
- Recall: 71.39% (+4.01 points vs Logistic Regression Config 3)
- Better at detecting churners while maintaining acceptable precision

**Next Step:** Final evaluation on test set (never seen before)

# 7-final-evaluation-on-test-set

## Objective
Evaluate the champion model on completely unseen data to assess true generalization performance.

## Champion Model
**Gradient Boosting + SMOTE**

In [61]:
# FINAL COMPARISON: GRADIENT BOOSTING vs LOGISTIC REGRESSION

import pandas as pd

# Clean duplicates
configs_gb_unique = []
seen = set()
for config in results_systematic:
    if config['Config'] not in seen:
        configs_gb_unique.append(config)
        seen.add(config['Config'])

# Create DataFrames
df_gb = pd.DataFrame(configs_gb_unique)
df_gb['Model'] = 'Gradient Boosting'

df_lr = pd.DataFrame(results_lr_systematic)
df_lr['Model'] = 'Logistic Regression'

# Combine
df_all = pd.concat([df_gb, df_lr], ignore_index=True)

print(df_all[['Model', 'Config', 'Recall', 'F1-Score', 'Precision']].to_string(index=False))

# Best of each model

df_gb_sorted = df_gb.sort_values('F1-Score', ascending=False)
df_lr_sorted = df_lr.sort_values('F1-Score', ascending=False)

best_gb = df_gb_sorted.iloc[0]
best_lr = df_lr_sorted.iloc[0]

print("\nGRADIENT BOOSTING - Best F1-Score:")
print(f"   Configuration: {best_gb['Config']}")
print(f"   Recall:    {best_gb['Recall']:.4f}")
print(f"   F1-Score:  {best_gb['F1-Score']:.4f}")
print(f"   Precision: {best_gb['Precision']:.4f}")

print("\nLOGISTIC REGRESSION - Best F1-Score:")
print(f"   Configuration: {best_lr['Config']}")
print(f"   Recall:    {best_lr['Recall']:.4f}")
print(f"   F1-Score:  {best_lr['F1-Score']:.4f}")
print(f"   Precision: {best_lr['Precision']:.4f}")

# Direct comparison

print(f"\nF1-Score:")
print(f"   Gradient Boosting: {best_gb['F1-Score']:.4f}")
print(f"   Logistic Regression: {best_lr['F1-Score']:.4f}")
print(f"   Winner: {'Gradient Boosting' if best_gb['F1-Score'] > best_lr['F1-Score'] else 'Logistic Regression'} (+{abs(best_gb['F1-Score']-best_lr['F1-Score']):.4f})")

print(f"\nRecall:")
print(f"   Gradient Boosting: {best_gb['Recall']:.4f}")
print(f"   Logistic Regression: {best_lr['Recall']:.4f}")
print(f"   Winner: {'Gradient Boosting' if best_gb['Recall'] > best_lr['Recall'] else 'Logistic Regression'} (+{abs(best_gb['Recall']-best_lr['Recall']):.4f})")

print(f"\nPrecision:")
print(f"   Gradient Boosting: {best_gb['Precision']:.4f}")
print(f"   Logistic Regression: {best_lr['Precision']:.4f}")
print(f"   Winner: {'Gradient Boosting' if best_gb['Precision'] > best_lr['Precision'] else 'Logistic Regression'} (+{abs(best_gb['Precision']-best_lr['Precision']):.4f})")

# Overall winner

if best_gb['F1-Score'] > best_lr['F1-Score']:
    winner_model = 'Gradient Boosting'
    winner_config = best_gb
    print(f"\n CHAMPION: GRADIENT BOOSTING")
    print(f"   Configuration: {best_gb['Config']}")
    print(f"   F1-Score: {best_gb['F1-Score']:.4f}")
    print(f"   Recall: {best_gb['Recall']:.4f}")
    print(f"   Precision: {best_gb['Precision']:.4f}")
else:
    winner_model = 'Logistic Regression'
    winner_config = best_lr
    print(f"\n CHAMPION: LOGISTIC REGRESSION")
    print(f"   Configuration: {best_lr['Config']}")
    print(f"   F1-Score: {best_lr['F1-Score']:.4f}")
    print(f"   Recall: {best_lr['Recall']:.4f}")
    print(f"   Precision: {best_lr['Precision']:.4f}")

              Model           Config   Recall  F1-Score  Precision
  Gradient Boosting      1. Baseline 0.489305  0.569207   0.680297
  Gradient Boosting         2. Tuned 0.489305  0.569207   0.680297
  Gradient Boosting   3. Tuned+SMOTE 0.713904  0.608894   0.530815
  Gradient Boosting 4. All Optimized 0.737968  0.611296   0.521739
Logistic Regression      1. Baseline 0.532086  0.592262   0.667785
Logistic Regression         2. Tuned 0.534759  0.596125   0.673401
Logistic Regression   3. Tuned+SMOTE 0.673797  0.616891   0.568849
Logistic Regression 4. All Optimized 0.719251  0.605856   0.523346

GRADIENT BOOSTING - Best F1-Score:
   Configuration: 4. All Optimized
   Recall:    0.7380
   F1-Score:  0.6113
   Precision: 0.5217

LOGISTIC REGRESSION - Best F1-Score:
   Configuration: 3. Tuned+SMOTE
   Recall:    0.6738
   F1-Score:  0.6169
   Precision: 0.5688

F1-Score:
   Gradient Boosting: 0.6113
   Logistic Regression: 0.6169
   Winner: Logistic Regression (+0.0056)

Recall:
   Gradi

In [62]:
# FINAL EVALUATION ON TEST SET

# Predict on TEST set
y_pred_test = gb_c3.predict(X_test)
y_proba_test = gb_c3.predict_proba(X_test)[:, 1]

# Calculate metrics
acc_test = accuracy_score(y_test, y_pred_test)
prec_test = precision_score(y_test, y_pred_test)
rec_test = recall_score(y_test, y_pred_test)
f1_test = f1_score(y_test, y_pred_test)
roc_test = roc_auc_score(y_test, y_proba_test)

print(f"\nTest Set Performance:")
print(f"  Accuracy:  {acc_test:.4f}")
print(f"  Precision: {prec_test:.4f}")
print(f"  Recall:    {rec_test:.4f} ")
print(f"  F1-Score:  {f1_test:.4f} ")
print(f"  ROC-AUC:   {roc_test:.4f}")

# Confusion Matrix
cm_test = confusion_matrix(y_test, y_pred_test)
print(cm_test)

TN, FP, FN, TP = cm_test.ravel()
print(f"\n                Predicted")
print(f"              No      Yes")
print(f"Actual  No   {TN:4d}    {FP:4d}")
print(f"        Yes  {FN:4d}    {TP:4d}")

print(f"\n Business Impact (Test Set):")
print(f"   Total churners: {TP+FN}")
print(f"   Correctly detected: {TP} ({TP/(TP+FN):.1%})")
print(f"   MISSED: {FN} ({FN/(TP+FN):.1%})")
print(f"   False alarms: {FP}")

# Compare Validation vs Test

print(f"\n                Validation    Test       Difference")
print(f"Recall:         {rec_gb_c3:.4f}      {rec_test:.4f}    {rec_test-rec_gb_c3:+.4f}")
print(f"F1-Score:       {f1_gb_c3:.4f}      {f1_test:.4f}    {f1_test-f1_gb_c3:+.4f}")
print(f"Precision:      {prec_gb_c3:.4f}      {prec_test:.4f}    {prec_test-prec_gb_c3:+.4f}")
print(f"ROC-AUC:        {roc_gb_c3:.4f}      {roc_test:.4f}    {roc_test-roc_gb_c3:+.4f}")

# Generalization check
diff = abs(f1_test - f1_gb_c3)
if diff < 0.03:
    print("\n EXCELLENT GENERALIZATION!")
    print(f"   Test performance very close to validation (F1 diff: {diff:.4f})")
elif diff < 0.05:
    print("\n GOOD GENERALIZATION")
    print(f"   Test performance acceptable (F1 diff: {diff:.4f})")
else:
    print("\nPerformance gap detected")
    print(f"   Test vs Validation F1 difference: {diff:.4f}")


Test Set Performance:
  Accuracy:  0.7530
  Precision: 0.5255
  Recall:    0.7166 
  F1-Score:  0.6063 
  ROC-AUC:   0.8358
[[793 242]
 [106 268]]

                Predicted
              No      Yes
Actual  No    793     242
        Yes   106     268

 Business Impact (Test Set):
   Total churners: 374
   Correctly detected: 268 (71.7%)
   MISSED: 106 (28.3%)
   False alarms: 242

                Validation    Test       Difference
Recall:         0.7139      0.7166    +0.0027
F1-Score:       0.6089      0.6063    -0.0026
Precision:      0.5308      0.5255    -0.0053
ROC-AUC:        0.8316      0.8358    +0.0042

 EXCELLENT GENERALIZATION!
   Test performance very close to validation (F1 diff: 0.0026)


In [63]:
print(f"""
Model: Gradient Boosting + SMOTE
Hyperparameters: learning_rate=0.1, max_depth=3, n_estimators=100
Training: SMOTE-balanced data (50/50 split)

FINAL TEST PERFORMANCE:
  F1-Score:  {f1_test:.2%}
  Recall:    {rec_test:.2%}
  Precision: {prec_test:.2%}
  ROC-AUC:   {roc_test:.2%}

BUSINESS IMPACT:
  Churners detected: {TP}/{TP+FN} ({TP/(TP+FN):.1%})
  Churners missed: {FN}
  Campaign efficiency: {TP/(TP+FP):.1%} true positives
""")



Model: Gradient Boosting + SMOTE
Hyperparameters: learning_rate=0.1, max_depth=3, n_estimators=100
Training: SMOTE-balanced data (50/50 split)

FINAL TEST PERFORMANCE:
  F1-Score:  60.63%
  Recall:    71.66%
  Precision: 52.55%
  ROC-AUC:   83.58%

BUSINESS IMPACT:
  Churners detected: 268/374 (71.7%)
  Churners missed: 106
  Campaign efficiency: 52.5% true positives



## Final Evaluation - Summary

### Test Set Performance
**Champion Model: Gradient Boosting + SMOTE**

| Metric | Score |
|--------|-------|
| F1-Score | 60.63% |
| Recall | 71.66% |
| Precision | 52.55% |
| ROC-AUC | 83.58% |
| Accuracy | 75.30% |

### Confusion Matrix Analysis
```
              Predicted
              No    Yes
Actual  No    793   242
        Yes   106   268
```

**Business Impact:**
- Churners detected: 268/374 (71.7%)
- Churners missed: 106 (28.3%)
- False alarms: 242 campaigns
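
As a quick sanity check, the headline metrics can be recomputed by hand from the four confusion-matrix counts. The sketch below hard-codes the counts printed by the test-set evaluation cell (`[[793 242] [106 268]]`); the variable names are illustrative and not taken from the notebook.

```python
# Counts from the test-set confusion matrix printed by the evaluation cell:
# rows = actual (No, Yes), columns = predicted (No, Yes)
tn, fp, fn, tp = 793, 242, 106, 268

total = tn + fp + fn + tp
accuracy = (tp + tn) / total          # share of all customers classified correctly
precision = tp / (tp + fp)            # share of flagged customers who really churn
recall = tp / (tp + fn)               # share of churners the model catches
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")   # 0.7530
print(f"Precision: {precision:.4f}")  # 0.5255
print(f"Recall:    {recall:.4f}")     # 0.7166
print(f"F1-Score:  {f1:.4f}")         # 0.6063
```

Precision here is the same quantity the notebook reports as "campaign efficiency": roughly one in two contacted customers is an actual churner.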

### Generalization Check

| Metric | Validation | Test | Difference |
|--------|------------|------|------------|
| Recall | 71.39% | 71.66% | +0.27 pp |
| F1-Score | 60.89% | 60.63% | -0.26 pp |
| Precision | 53.08% | 52.55% | -0.53 pp |
| ROC-AUC | 83.16% | 83.58% | +0.42 pp |

**Excellent generalization**
- Test performance is within about 0.5 percentage points of validation on every metric
- No sign of overfitting to the validation set
- Model performance is stable across both held-out sets

### Final Performance vs Baseline

| Model | Churners Detected | Improvement |
|-------|-------------------|-------------|
| Baseline (no optimization) | 183/374 (48.9%) | - |
| **Final Model (GB+SMOTE)** | **268/374 (71.7%)** | **+85 customers** |

**Model is validated and ready for deployment.**


# 8-feature-importance-analysis

## Objective
Identify which features have the most influence on churn predictions to guide business recommendations.

## Method
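scikit-learn's Gradient Boosting models expose built-in, impurity-based feature importance scores (`feature_importances_`). Each feature is credited with the impurity reduction achieved by the splits that use it, and the scores are normalized to sum to 1.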

## Interpretation
- **Higher importance:** Feature is critical for predictions
- **Lower importance:** Feature has minimal impact
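
Impurity-based scores are computed from the training data and say nothing about the direction of an effect, so it can be worth cross-checking them against permutation importance on held-out data. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` and hypothetical feature names instead of the notebook's `gb_c3`, `X_val`, and `y_val`, and it reuses the champion model's hyperparameters.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); in the notebook this would be the churn features
X, y = make_classification(n_samples=600, n_features=8, n_informative=3, random_state=42)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])  # hypothetical names
X_tr, X_ho, y_tr, y_ho = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = GradientBoostingClassifier(
    learning_rate=0.1, max_depth=3, n_estimators=100, random_state=42
)
model.fit(X_tr, y_tr)

# Shuffle one column at a time on the hold-out set and measure the drop in F1
perm = permutation_importance(
    model, X_ho, y_ho, scoring="f1", n_repeats=10, random_state=42
)

comparison = pd.DataFrame({
    "Feature": X.columns,
    "Impurity": model.feature_importances_,   # built-in, training-set based
    "Permutation": perm.importances_mean,     # mean F1 drop on held-out data
}).sort_values("Impurity", ascending=False)

print(comparison.to_string(index=False))
```

If both rankings broadly agree, the built-in scores are probably a fair summary; large disagreements would suggest looking more closely before drawing business conclusions.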

## Business Value
Understanding feature importance allows us to:
1. Focus retention efforts on key risk factors
2. Prioritize operational improvements
3. Validate model decisions with business logic

---

In [64]:

# FEATURE IMPORTANCE ANALYSIS

# Get feature importance from the champion model (gb_c3)
feature_importance = gb_c3.feature_importances_
feature_names = X_train.columns

# Create DataFrame
importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': feature_importance
}).sort_values('Importance', ascending=False)

print("\n Top 10 Most Important Features:")
print(importance_df.head(10).to_string(index=False))

print("\n Top 5 Features (Detailed):")
# enumerate gives the rank (1-5); the DataFrame index is the original column position
for rank, (_, row) in enumerate(importance_df.head(5).iterrows(), start=1):
    print(f"\n{rank}. {row['Feature']}")
    print(f"   Importance: {row['Importance']:.4f} ({row['Importance']/importance_df['Importance'].sum():.1%} of total)")

# Visualize (text-based)
print("\n Feature Importance Distribution:")
print("\nTop 10 Features:")
for _, row in importance_df.head(10).iterrows():
    bar_length = int(row['Importance'] * 100)
    bar = '█' * bar_length
    print(f"{row['Feature']:<30} {bar} {row['Importance']:.4f}")

# Categorize features
contract_features = [f for f in feature_names if 'Contract' in f or 'tenure' in f]
service_features = [f for f in feature_names if any(x in f for x in ['Internet', 'Phone', 'Online', 'Tech', 'Streaming', 'total_services'])]
billing_features = [f for f in feature_names if any(x in f for x in ['Charges', 'Payment', 'Billing', 'Paperless'])]
demographic_features = [f for f in feature_names if any(x in f for x in ['gender', 'Senior', 'Partner', 'Dependents'])]

categories = {
    'Contract & Tenure': contract_features,
    'Services': service_features,
    'Billing & Charges': billing_features,
    'Demographics': demographic_features
}

for category, features in categories.items():
    cat_importance = importance_df[importance_df['Feature'].isin(features)]['Importance'].sum()
    print(f"\n{category}:")
    print(f"   Total Importance: {cat_importance:.4f} ({cat_importance/importance_df['Importance'].sum():.1%})")

    # Top features in category
    cat_df = importance_df[importance_df['Feature'].isin(features)].head(3)
    for _, row in cat_df.iterrows():
        print(f"   - {row['Feature']}: {row['Importance']:.4f}")



 Top 10 Most Important Features:
                       Feature  Importance
                        tenure    0.278356
             Contract_Two year    0.153759
   InternetService_Fiber optic    0.137482
PaymentMethod_Electronic check    0.124101
             Contract_One year    0.078911
                total_services    0.057017
                MonthlyCharges    0.037940
           StreamingMovies_Yes    0.020183
                    Dependents    0.017540
               StreamingTV_Yes    0.014091

 Top 5 Features (Detailed):

1. tenure
   Importance: 0.2784 (27.8% of total)

2. Contract_Two year
   Importance: 0.1538 (15.4% of total)

3. InternetService_Fiber optic
   Importance: 0.1375 (13.7% of total)

4. PaymentMethod_Electronic check
   Importance: 0.1241 (12.4% of total)

5. Contract_One year
   Importance: 0.0789 (7.9% of total)

 Feature Importance Distribution:

Top 10 Features:
tenure                         ███████████████████████████ 0.2784
Contract_Two year        

---

## Feature Importance - Key Insights

### Top 5 Most Predictive Features

1. **tenure (27.8%)** - Customer loyalty duration
   - Short tenure = high churn risk
   - First 12 months are critical

2. **Contract_Two year (15.4%)** - Long-term contract commitment
   - Month-to-month contracts have ~43% churn rate
   - Contract type is the strongest retention lever

3. **InternetService_Fiber optic (13.7%)** - Service type
   - Fiber customers churn more than DSL
   - May indicate service quality or pricing issues

4. **PaymentMethod_Electronic check (12.4%)** - Payment method
   - Manual payment associated with higher churn
   - Friction in the payment process may increase risk

5. **Contract_One year (7.9%)** - Medium-term contract commitment
   - Together with Contract_Two year, contract type accounts for 23.3% of total importance

Just outside the top five, **total_services (5.7%)** reflects service engagement: more services mean higher switching costs, so cross-selling may reduce churn.

### Feature Categories

| Category | Importance | Key Insight |
|----------|------------|-------------|
| Contract & Tenure | 51.1% | Contract commitment is paramount |
| Services | 25.9% | Service bundle matters |
| Billing & Charges | 18.1% | Payment experience impacts retention |
| Demographics | 2.6% | Age/gender have minimal impact |
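
The four category shares add up to 97.7%, which suggests that some features match none of the substring rules in the categorization cell. A minimal sketch for listing unassigned or double-counted features follows. The substring rules are copied from the notebook, but the feature names and importance values in `importance_df` here are made up for illustration.

```python
import pandas as pd

# Hypothetical importances for illustration; in the notebook, reuse importance_df
importance_df = pd.DataFrame({
    "Feature": [
        "tenure", "Contract_Two year", "InternetService_Fiber optic",
        "PaymentMethod_Electronic check", "MonthlyCharges", "gender",
        "MultipleLines_Yes",
    ],
    "Importance": [0.40, 0.20, 0.15, 0.10, 0.08, 0.02, 0.05],
})

# Same substring rules as the categorization cell
rules = {
    "Contract & Tenure": ["Contract", "tenure"],
    "Services": ["Internet", "Phone", "Online", "Tech", "Streaming", "total_services"],
    "Billing & Charges": ["Charges", "Payment", "Billing", "Paperless"],
    "Demographics": ["gender", "Senior", "Partner", "Dependents"],
}

def matching_categories(feature):
    """Return every category whose substring rules match the feature name."""
    return [cat for cat, keys in rules.items() if any(k in feature for k in keys)]

importance_df["Categories"] = importance_df["Feature"].apply(matching_categories)
n_matches = importance_df["Categories"].map(len)

unassigned = importance_df[n_matches == 0]   # counted in no category
overlapping = importance_df[n_matches > 1]   # counted in more than one category

print("Unassigned features:")
print(unassigned[["Feature", "Importance"]].to_string(index=False))
print(f"\nUnassigned importance share: {unassigned['Importance'].sum():.1%}")
print(f"Features in several categories: {len(overlapping)}")
```

Running the same check on the real `importance_df` would show which columns make up the remaining share and whether any rule needs widening.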

### Business Implications

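- **Priority 1:** Contract conversion programs (Contract & Tenure: 51% of total feature importance, of which tenure alone is 27.8%)
- **Priority 2:** Service bundle optimization (Services: 26% of total feature importance)
- **Priority 3:** Payment experience improvements (Billing & Charges: 18% of total feature importance)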

**Demographics are NOT key drivers** - focus retention on behavioral factors, not customer profiles.

---

In [65]:
import pickle

with open('churn_model_final_gb_c3.pkl', 'wb') as f:
    pickle.dump(gb_c3, f)

In [66]:
import os

# Confirm the pickle was written and report its size
size = os.path.getsize('churn_model_final_gb_c3.pkl')
print(f"File size: {size} bytes")

File size: 139935 bytes


In [67]:
import os
import shutil

# Copy the model straight into the flask_api folder (no download step)
shutil.copy('churn_model_final_gb_c3.pkl', '/Users/axmbe/Desktop/flask_api/churn_model_final_gb_c3.pkl')

print("✓ Model copied directly!")

# Verify the copy inside flask_api
size = os.path.getsize('/Users/axmbe/Desktop/flask_api/churn_model_final_gb_c3.pkl')
print(f"Size in flask_api: {size} bytes")

✓ Model copied directly!
Size in flask_api: 139935 bytes
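
Matching file sizes only show that the copy succeeded. Before serving the model it may also be worth confirming that the pickle reloads and gives the same predictions. The sketch below shows such a round-trip check on a small synthetic model written to a temporary directory. The data, the model, and the file name `demo_model.pkl` are stand-ins, not the notebook's `gb_c3` or its real path.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic model as a stand-in for gb_c3 (assumption)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical file name

    # Save, then reload from disk exactly as a serving process would
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The reloaded model should reproduce the original probabilities exactly
same_predictions = np.array_equal(model.predict_proba(X), restored.predict_proba(X))
print(f"Predictions identical after reload: {same_predictions}")
```

Pickled scikit-learn models are generally expected to be loaded with the same scikit-learn version that created them, so the serving environment should pin that version. Pickle files should also only be loaded from trusted sources.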
