<hr style="border-width:2px;border-color:#75DFC1">
<h1 style="text-align:center">Churn Prediction — Comparing Logistic Regression, XGBoost, and a Small MLP</h1>
<hr style="border-width:2px;border-color:#75DFC1">

This notebook is designed for a **zero-setup Google Colab classroom** demo:

- Train a **Logistic Regression** baseline
- Train an **XGBoost** classifier
- Train a small **MLP (Neural Network)** with **class imbalance handling** (`class_weight`)
- **Tune the decision threshold** on predicted probabilities to maximize:
  - **Recall (Churn)** or
  - **F1-score (Churn)**
- Compare models in a single metrics table

> **Target convention:** `0 = Non-churned`, `1 = Churned`.


In [None]:
# ===== Install required libraries (Colab) =====
!pip -q install xgboost openpyxl

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    classification_report, confusion_matrix,
    accuracy_score, precision_score, recall_score, f1_score
)
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

import xgboost as xgb
import tensorflow as tf
from tensorflow import keras

import sklearn  # top-level package, needed only to report its version

print("Versions:")
print("  numpy:", np.__version__)
print("  pandas:", pd.__version__)
print("  sklearn:", sklearn.__version__)
print("  xgboost:", xgb.__version__)
print("  tensorflow:", tf.__version__)


## 1) Load the dataset

### Option A (recommended for teaching): dataset stored in the GitHub repo

If your repo contains:

```
Evolution_of_AI/
  data/Telco_customer_churn.xlsx
  notebooks/...
```

Then the cell below will clone the repo and load the Excel file automatically.

### Option B: upload file manually
If you do not store the file in GitHub, upload it via the Colab left panel (**Files → Upload**),
then set `DATA_PATH` to the uploaded filename.
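
To support both options without editing the loading cell, one possibility is to try several locations in order. The sketch below is only an illustration: `first_existing` is a hypothetical helper, and the second candidate assumes the file was uploaded under its original name into the working directory.

```python
from pathlib import Path

def first_existing(candidates):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None

# Hypothetical candidates: the repo copy first, then a manually uploaded copy
candidates = [
    "Evolution_of_AI/data/Telco_customer_churn.xlsx",
    "Telco_customer_churn.xlsx",
]
DATA_PATH = first_existing(candidates)
print("Using dataset at:", DATA_PATH)
```

If `DATA_PATH` comes back as `None`, neither location holds the file and it still needs to be cloned or uploaded.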


In [None]:
# ===== Option A: clone your GitHub repo and load the file =====
REPO_URL = "https://github.com/ddribes/Evolution_of_AI.git"
REPO_DIR = "Evolution_of_AI"
DATA_PATH = f"{REPO_DIR}/data/Telco_customer_churn.xlsx"

import os

if not os.path.exists(REPO_DIR):
    !git clone -q {REPO_URL}

if not os.path.exists(DATA_PATH):
    raise FileNotFoundError(
        f"Could not find dataset at: {DATA_PATH}\n"
        "Either add the file to your repo under data/, or upload it manually and update DATA_PATH."
    )

df = pd.read_excel(DATA_PATH)
print("Shape:", df.shape)
df.head()


## 2) Choose the target and build features (X)

For churn classification, the usual target column is:

- `Churn Value` (0/1)

If your dataset uses a different target column, update `TARGET_COL`.
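
Columns derived from the outcome itself (for example a text version of the label, or a reason recorded only after the customer left) leak the answer into `X` and make every model look unrealistically good. A rough screening sketch, on a hypothetical toy frame with made-up column names, flags any column where each distinct value maps to a single target value. It is a heuristic rather than a proof of leakage, and high-cardinality columns such as IDs will also trip it.

```python
import pandas as pd

def perfectly_predictive_columns(frame, target_col):
    """Return columns whose every distinct value maps to exactly one target value."""
    flagged = []
    for col in frame.columns:
        if col == target_col:
            continue
        # dropna=False keeps missing values as their own group
        targets_per_value = frame.groupby(col, dropna=False)[target_col].nunique()
        if (targets_per_value == 1).all():
            flagged.append(col)
    return flagged

# Hypothetical toy data: "label_text" is just the target spelled out
toy = pd.DataFrame({
    "contract": ["monthly", "yearly", "monthly", "yearly", "monthly"],
    "label_text": ["Yes", "No", "No", "No", "Yes"],
    "churn": [1, 0, 0, 0, 1],
})
print(perfectly_predictive_columns(toy, "churn"))
```

Any flagged column deserves a manual look before it is allowed into the feature matrix.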


In [None]:
TARGET_COL = "Churn Value"

if TARGET_COL not in df.columns:
    raise KeyError(f"TARGET_COL='{TARGET_COL}' not found. Available columns include: {list(df.columns)[:20]} ...")

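# Drop identifiers and columns derived from the target, which would leak the answer into X.
# Names follow the IBM Telco sample; any that are absent are simply ignored (adjust if needed).
ID_COLS = ["CustomerID", "Customer ID", "customerID"]
LEAK_COLS = ["Churn Label", "Churn Score", "Churn Reason"]
DROP_COLS = [c for c in ID_COLS + LEAK_COLS if c in df.columns]

y = df[TARGET_COL].astype(int).values
X = df.drop(columns=[TARGET_COL] + DROP_COLS)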

print("X shape:", X.shape)
print("y distribution:", np.bincount(y))


## 3) Train/test split + preprocessing

This step:
- splits the data into train and test sets (stratified on the target, so both keep the same churn rate)
- one-hot encodes the categorical columns
- standardizes the numeric columns

The same `preprocess` transformer is then reused across all models.
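
Real exports often contain numeric columns stored as text or with blank entries, which the plain scaler and encoder do not repair. The sketch below shows one way to handle that, under stated assumptions: the toy frame and its column names (`tenure`, `total`, `plan`) are hypothetical, and median imputation is just one reasonable default.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame: "total" arrives as text with one blank entry
toy = pd.DataFrame({
    "tenure": [1.0, 12.0, np.nan, 40.0],
    "total": ["20.5", " ", "300.0", "1500.0"],
    "plan": ["basic", "plus", "basic", "plus"],
})

# Blank or malformed strings become NaN instead of staying as text
toy["total"] = pd.to_numeric(toy["total"], errors="coerce")

toy_numeric_cols = ["tenure", "total"]
toy_categorical_cols = ["plan"]

# Fill missing numeric values with the column median, then standardize
numeric_steps = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

toy_preprocess = ColumnTransformer(
    transformers=[
        ("num", numeric_steps, toy_numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), toy_categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array for this small demo
)

features = toy_preprocess.fit_transform(toy)
print(features.shape)
```

Because the imputer lives inside the transformer, its medians are learned from the training rows only and then applied unchanged to the test rows.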


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Identify numeric vs categorical columns
numeric_cols = X_train.select_dtypes(include=["number"]).columns.tolist()
categorical_cols = [c for c in X_train.columns if c not in numeric_cols]

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)

print("Numeric columns:", len(numeric_cols))
print("Categorical columns:", len(categorical_cols))


## 4) Model 1 — Logistic Regression (baseline)

In [None]:
logreg = Pipeline(steps=[
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=2000))
])

logreg.fit(X_train, y_train)

ypred_logreg = logreg.predict(X_test)

print("Logistic Regression — classification report")
print(classification_report(y_test, ypred_logreg))


## 5) Model 2 — XGBoost

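We train a standard `XGBClassifier` and set `scale_pos_weight` to the ratio of non-churners to churners in the training set, so that errors on churners weigh more.
(We keep it simple for teaching; you can later add GridSearchCV.)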
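
As a rough idea of what adding a grid search could look like, here is a minimal sketch. It makes two loud assumptions: the data is synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example runs without extra packages. Since `XGBClassifier` follows the scikit-learn estimator interface, the same `GridSearchCV` call should accept it, but the grid values here are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (roughly 75% negatives)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.75, 0.25], random_state=42
)

# Arbitrary small grid, chosen only to keep the demo fast
param_grid = {
    "max_depth": [2, 4],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_grid=param_grid,
    scoring="f1",  # F1 of the positive class, matching the churn focus
    cv=3,
)
search.fit(X_demo, y_demo)

print("Best params:", search.best_params_)
print("Best CV F1:", round(search.best_score_, 3))
```

Scoring on F1 rather than accuracy keeps the search aligned with the minority class.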


In [None]:
# Preprocess to numeric matrix for XGBoost and MLP
X_train_mat = preprocess.fit_transform(X_train)
X_test_mat = preprocess.transform(X_test)

# Optional: handle imbalance via scale_pos_weight
neg, pos = np.bincount(y_train)
scale_pos_weight = neg / pos if pos > 0 else 1.0

xgb_clf = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_lambda=1.0,
    objective="binary:logistic",
    eval_metric="logloss",
    scale_pos_weight=scale_pos_weight,
    random_state=42
)

xgb_clf.fit(X_train_mat, y_train)

ypred_xgb = xgb_clf.predict(X_test_mat)

print("XGBoost — classification report")
print(classification_report(y_test, ypred_xgb))


## 6) Model 3 — Small MLP (Neural Network) with class imbalance handling

We use:
- `class_weight` (balanced)
- early stopping on validation AUC (with `validation_split=0.2`, Keras holds out the last 20% of the training rows)
- sigmoid output for churn probability
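
The "balanced" heuristic weights each class by `n_samples / (n_classes * class_count)`, so the rarer class gets the larger weight. A minimal NumPy sketch of that formula, using hypothetical toy labels (`balanced_class_weights` is an illustrative helper, not a library function):

```python
import numpy as np

def balanced_class_weights(y):
    """Weights per class as n_samples / (n_classes * class_count)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical labels: 3 non-churners, 1 churner
toy_y = [0, 0, 0, 1]
print(balanced_class_weights(toy_y))
```

With three non-churners and one churner, the churner's weight is three times the non-churner's, which is what makes a missed churner cost more during training.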


In [None]:
# Compute class_weight from y_train
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
class_weight


In [None]:
tf.random.set_seed(42)

# Ensure dense numeric arrays (some transformers return sparse matrices)
# Convert to float32 dense for TensorFlow
if hasattr(X_train_mat, "toarray"):
    X_train_nn = X_train_mat.toarray().astype("float32")
    X_test_nn = X_test_mat.toarray().astype("float32")
else:
    X_train_nn = np.asarray(X_train_mat).astype("float32")
    X_test_nn = np.asarray(X_test_mat).astype("float32")

mlp = keras.Sequential([
    keras.layers.Input(shape=(X_train_nn.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.30),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dropout(0.20),
    keras.layers.Dense(1, activation="sigmoid")
])

mlp.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=[keras.metrics.AUC(name="auc")]
)

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_auc", mode="max", patience=5, restore_best_weights=True
)

history = mlp.fit(
    X_train_nn, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=64,
    class_weight=class_weight,
    callbacks=[early_stop],
    verbose=1
)


In [None]:
# Plot training curve (AUC)
plt.figure(figsize=(7,4))
plt.plot(history.history["auc"], label="train AUC")
plt.plot(history.history["val_auc"], label="val AUC")
plt.xlabel("Epoch")
plt.ylabel("AUC")
plt.title("MLP training curve")
plt.legend()
plt.show()


## 7) Threshold tuning (maximize Recall or F1 for churn)

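Instead of using the default threshold of 0.5, we tune the threshold on the predicted test probabilities. This keeps the demo short, but the tuned scores are optimistic because the threshold is chosen on the same data used to report them. In a real project, choose the threshold on a validation split and report on the untouched test set.

- **Lower threshold** → higher **recall** (catch more churners) but more false alarms
- **F1-optimal threshold** → the point where the harmonic mean of precision and recall peaks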
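
A cleaner protocol picks the threshold on a validation split and only then scores the test set. The sketch below illustrates that idea with synthetic labels and scores standing in for real model output. All names (`f1_at`, `t_star`, the `_val` and `_hold` arrays) are hypothetical.

```python
import numpy as np

def f1_at(y_true, proba, t):
    """F1 for the positive class at threshold t (0.0 when undefined)."""
    y_pred = proba >= t
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

rng = np.random.default_rng(42)

# Synthetic labels (about 25% positives) and noisy scores standing in for model output
y_all = (rng.random(2000) < 0.25).astype(int)
proba_all = np.clip(0.3 + 0.3 * y_all + rng.normal(0.0, 0.2, size=2000), 0.0, 1.0)

# First half plays the validation set, second half the untouched test set
y_val, proba_val = y_all[:1000], proba_all[:1000]
y_hold, proba_hold = y_all[1000:], proba_all[1000:]

# Choose the threshold on validation data only
thresholds = np.linspace(0.05, 0.95, 91)
val_scores = [f1_at(y_val, proba_val, t) for t in thresholds]
t_star = float(thresholds[int(np.argmax(val_scores))])

print("Threshold chosen on validation:", round(t_star, 2))
print("F1 on validation:", round(max(val_scores), 3))
print("F1 on held-out test:", round(f1_at(y_hold, proba_hold, t_star), 3))
```

The held-out F1 is the number to report, since the threshold never saw those rows.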


In [None]:
# Predicted probabilities for churn = 1
proba_mlp = mlp.predict(X_test_nn).ravel()

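def best_threshold(y_true, proba, objective="f1"):
    """Scan a grid of thresholds and return (threshold, score) for the best score.

    Ties keep the lowest threshold. Recall can only fall as the threshold rises,
    so objective="recall" returns the bottom of the grid (0.05).
    """
    thresholds = np.linspace(0.05, 0.95, 91)
    best_t, best_val = 0.5, -1.0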

    for t in thresholds:
        y_pred = (proba >= t).astype(int)
        if objective == "recall":
            val = recall_score(y_true, y_pred, pos_label=1, zero_division=0)
        elif objective == "f1":
            val = f1_score(y_true, y_pred, pos_label=1, zero_division=0)
        else:
            raise ValueError("objective must be 'f1' or 'recall'")

        if val > best_val:
            best_val, best_t = val, t

    return float(best_t), float(best_val)

t_f1, best_f1 = best_threshold(y_test, proba_mlp, objective="f1")
t_rec, best_rec = best_threshold(y_test, proba_mlp, objective="recall")

print("Best threshold for F1:", t_f1, "=> F1:", best_f1)
print("Best threshold for Recall:", t_rec, "=> Recall:", best_rec)


In [None]:
def evaluate_at_threshold(y_true, proba, t):
    """Return hard 0/1 predictions at threshold t (y_true is unused; kept for call symmetry)."""
    y_pred = (proba >= t).astype(int)
    return y_pred

ypred_mlp_05 = evaluate_at_threshold(y_test, proba_mlp, 0.5)
ypred_mlp_f1 = evaluate_at_threshold(y_test, proba_mlp, t_f1)
ypred_mlp_rec = evaluate_at_threshold(y_test, proba_mlp, t_rec)

print("MLP (threshold=0.50)")
print(classification_report(y_test, ypred_mlp_05))

print("\nMLP (best F1 threshold)")
print(classification_report(y_test, ypred_mlp_f1))

print("\nMLP (best Recall threshold)")
print(classification_report(y_test, ypred_mlp_rec))


## 8) Comparison table (LogReg vs XGBoost vs MLP)

We compare **Accuracy**, plus class-1 (**Churn**) Precision/Recall/F1.
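
To make the four metrics concrete, the sketch below computes them by hand from confusion-matrix counts. The counts are hypothetical and `churn_metrics` is an illustrative helper. In the notebook itself the scikit-learn functions do this work.

```python
def churn_metrics(tp, fp, fn, tn):
    """Accuracy plus class-1 precision, recall and F1 from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: 200 customers, 50 of them churners
print(churn_metrics(tp=30, fp=10, fn=20, tn=140))
```

In this toy case accuracy is 0.85 even though 20 of the 50 churners are missed, which is why the churn-specific columns matter.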


In [None]:
def metrics_row(model_name, y_true, y_pred):
    return {
        "Model": model_name,
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision (Churn)": precision_score(y_true, y_pred, pos_label=1, zero_division=0),
        "Recall (Churn)": recall_score(y_true, y_pred, pos_label=1, zero_division=0),
        "F1-score (Churn)": f1_score(y_true, y_pred, pos_label=1, zero_division=0),
    }

comparison = pd.DataFrame([
    metrics_row("Logistic Regression", y_test, ypred_logreg),
    metrics_row("XGBoost", y_test, ypred_xgb),
    metrics_row("MLP (t=0.50)", y_test, ypred_mlp_05),
    metrics_row(f"MLP (best F1 t={t_f1:.2f})", y_test, ypred_mlp_f1),
    metrics_row(f"MLP (best Recall t={t_rec:.2f})", y_test, ypred_mlp_rec),
]).sort_values(by="F1-score (Churn)", ascending=False)

comparison.reset_index(drop=True)


In [None]:
# Optional: visualize comparison
tmp = comparison.set_index("Model")[["Precision (Churn)", "Recall (Churn)", "F1-score (Churn)"]]
# pandas creates its own figure, so the size is passed here rather than via plt.figure
tmp.plot(kind="bar", figsize=(9, 4))
plt.ylabel("Score")
plt.title("Churn metrics comparison (class = 1)")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()


## 9) Confusion matrices (optional)

This helps students see *what kinds of errors* each model makes.
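
Raw counts are hard to compare when the classes differ in size, so it can help to divide each row by its total. The sketch below does that with NumPy on hypothetical counts. `row_normalize` is an illustrative helper, and `confusion_matrix(..., normalize="true")` in scikit-learn offers the same normalization directly.

```python
import numpy as np

def row_normalize(cm):
    """Divide each row by its total so cells become rates per true class."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows with no samples stay at zero instead of dividing by zero
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Hypothetical counts laid out as [[TN, FP], [FN, TP]]
toy_cm = np.array([[140, 10],
                   [20, 30]])
rates = row_normalize(toy_cm)
print(np.round(rates, 3))
```

After normalization the bottom-right cell is the churn recall and the bottom-left cell is the share of churners the model missed.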


In [None]:
def plot_confusion(cm, title):
    plt.figure(figsize=(4,4))
    plt.imshow(cm, interpolation="nearest")
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(2)
    plt.xticks(tick_marks, ["0 (No churn)", "1 (Churn)"], rotation=25)
    plt.yticks(tick_marks, ["0 (No churn)", "1 (Churn)"])
    thresh = cm.max() / 2.0
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, format(cm[i, j], "d"),
                     ha="center", va="center",
                     color="white" if cm[i, j] > thresh else "black")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
    plt.tight_layout()
    plt.show()

plot_confusion(confusion_matrix(y_test, ypred_logreg), "Logistic Regression")
plot_confusion(confusion_matrix(y_test, ypred_xgb), "XGBoost")
plot_confusion(confusion_matrix(y_test, ypred_mlp_f1), f"MLP (best F1, t={t_f1:.2f})")
plot_confusion(confusion_matrix(y_test, ypred_mlp_rec), f"MLP (best Recall, t={t_rec:.2f})")


<hr style="border-width:2px;border-color:#75DFC1">

## Takeaway (teaching summary)

- **Class imbalance** matters: churners are the minority class.
- `class_weight` (MLP) and `scale_pos_weight` (XGBoost) help the model pay more attention to churners.
- **Threshold tuning** lets you choose what you optimize:
  - maximize **Recall** → catch more churners (more false alarms)
  - maximize **F1** → best trade-off between precision and recall
- **Accuracy alone is not enough** for churn problems.
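
Maximizing recall with no constraint simply pushes the threshold as low as the grid allows. A more useful variant keeps precision above a floor and maximizes recall within that limit. The sketch below is one way to express it. The helper name, the toy scores and the 0.75 floor are all hypothetical.

```python
import numpy as np

def best_recall_with_precision_floor(y_true, proba, min_precision=0.5):
    """Return (threshold, recall, precision) with the highest recall whose
    precision is at least min_precision, or None if no threshold qualifies."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    best = None
    for t in np.linspace(0.05, 0.95, 91):
        y_pred = proba >= t
        tp = np.sum(y_pred & (y_true == 1))
        fp = np.sum(y_pred & (y_true == 0))
        fn = np.sum(~y_pred & (y_true == 1))
        if tp + fp == 0 or tp + fn == 0:
            continue  # precision or recall undefined at this threshold
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        # Strict comparison keeps the lowest threshold among equal recalls
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (float(t), float(recall), float(precision))
    return best

# Hypothetical scores for 8 customers (1 = churned)
toy_y = [1, 1, 1, 0, 0, 0, 0, 1]
toy_p = [0.905, 0.805, 0.355, 0.405, 0.305, 0.205, 0.105, 0.605]
print(best_recall_with_precision_floor(toy_y, toy_p, min_precision=0.75))
```

The floor turns "more false alarms" into an explicit budget that can be agreed with whoever acts on the predictions.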

<hr style="border-width:2px;border-color:#75DFC1">
