<a href="https://colab.research.google.com/github/c-marq/cap4767-data-mining/blob/main/demos/week04_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 4 Demo - Customer Churn: EDA → Logistic Regression → Neural Networks
**CAP4767 Data Mining with Python** | Miami Dade College, Kendall Campus

**Chapters 4 & 5** | Competencies: 1.3, 1.4, 1.5, 1.6, 6 (partial)

**What we're building today:**

| Session | Content | Chapter |
|---------|---------|---------|
| **Session 1** | Statistical EDA + Logistic Regression Baseline | Ch. 4 |
| **Session 2** | Neural Networks + Model Comparison + Risk Scoring | Ch. 5 |

**The business question:** Of 7,032 telecom customers, which ones are about to cancel, and what's the dollar cost of getting it wrong?

**Pipeline position:** Week 3 taught you regression on continuous targets. This week we shift to **binary classification** (churn/stay) and introduce **neural networks** as an alternative to logistic regression.

---
## Setup

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Run this cell to load all libraries and suppress TensorFlow warnings. Do not modify.
</div>

In [None]:
# ============================================================
# Setup: run this cell. Do not modify.
# ============================================================
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

from scipy.stats import chi2_contingency, mannwhitneyu, pointbiserialr
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (classification_report, confusion_matrix,
                             ConfusionMatrixDisplay, roc_curve, roc_auc_score,
                             accuracy_score, precision_score, recall_score, f1_score)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(42)
tf.random.set_seed(42)

plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["figure.dpi"] = 100
sns.set_style("whitegrid")

print(f"TensorFlow version: {tf.__version__}")
print("✅ All libraries loaded")

---
## Load the Telco Churn Dataset

In [None]:
# Load from GitHub
url = "https://raw.githubusercontent.com/c-marq/cap4767-data-mining/refs/heads/main/data/WA_Fn-UseC_-Telco-Customer-Churn.csv"
df = pd.read_csv(url)

print(f"Shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"\nChurn distribution:")
print(df["Churn"].value_counts())
print(f"\nChurn rate: {df['Churn'].value_counts(normalize=True)['Yes']:.1%}")
df.head()

In [None]:
# Data quality: TotalCharges has blanks (new customers with tenure=0)
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
print(f"Blank TotalCharges: {df['TotalCharges'].isna().sum()} rows (tenure=0 new customers)")
df = df.dropna(subset=["TotalCharges"])
df = df.drop(columns=["customerID"])
print(f"After cleanup: {df.shape[0]:,} rows × {df.shape[1]} columns")

---
# SESSION 1 - Chapter 4: EDA + Logistic Regression

---
# Example 1 - Statistical EDA for Classification

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
  In regression (Week 3), we used correlation to find predictors. With a <strong>binary target</strong> (Yes/No), correlation doesn't work for categorical features. We need different tools: <strong>Cramér's V</strong> for categorical features and <strong>Mann-Whitney U + Cohen's d</strong> for continuous features. These tell us which features are worth putting in the model.
</div>

### Cramér's V - Measuring Association Between Categories

In [None]:
# Helper function: Cramér's V
def cramers_v(x, y):
    """Calculate Cramér's V between two categorical Series."""
    ct = pd.crosstab(x, y)
    chi2 = chi2_contingency(ct)[0]
    n = ct.sum().sum()
    r, k = ct.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

# Identify categorical columns (object type + SeniorCitizen, which is stored as 0/1)
cat_cols = df.select_dtypes(include=["object"]).columns.tolist()
cat_cols = [c for c in cat_cols if c != "Churn"] + ["SeniorCitizen"]

# Compute Cramér's V for each categorical feature vs Churn
cramers_results = pd.DataFrame({
    "Feature": cat_cols,
    "Cramér's V": [cramers_v(df[col], df["Churn"]) for col in cat_cols]
}).sort_values("Cramér's V", ascending=False)

plt.figure(figsize=(10, 6))
bars = plt.barh(cramers_results["Feature"], cramers_results["Cramér's V"], color="steelblue")
plt.xlabel("Cramér's V (0 = no association, 1 = perfect association)")
plt.title("Categorical Features vs Churn - Cramér's V")
plt.axvline(x=0.1, color="orange", linestyle="--", alpha=0.7, label="Weak threshold (0.1)")
plt.axvline(x=0.3, color="red", linestyle="--", alpha=0.7, label="Moderate threshold (0.3)")
plt.legend()
plt.tight_layout()
plt.show()

print("Top 5 categorical churn drivers:")
print(cramers_results.head(5).to_string(index=False))

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 READING CRAMÉR'S V</strong><br>
  <ul>
    <li><strong>&lt; 0.1:</strong> Negligible. This feature probably doesn't help predict churn.</li>
    <li><strong>0.1–0.3:</strong> Weak to moderate. Worth including in the model.</li>
    <li><strong>&gt; 0.3:</strong> Strong. This feature is a major churn driver.</li>
  </ul>
  <code>Contract</code> and <code>InternetService</code> should be near the top. Month-to-month contracts and fiber optic internet are the biggest churn signals.
</div>
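To make the statistic concrete, here is a minimal sketch that computes Cramér's V by hand for a small 2×2 table. The counts are hypothetical (not taken from the Telco data), and the function name `cramers_v_from_table` is introduced only for this illustration. Note that for 2×2 tables `scipy.stats.chi2_contingency` applies a continuity correction by default, so the notebook helper would report a slightly smaller value on the same table.

```python
import math

def cramers_v_from_table(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(table), len(table[0])
    return math.sqrt(chi2 / (n * (min(r, k) - 1)))

# Hypothetical counts: rows = contract type, columns = [stayed, churned]
toy_table = [
    [60, 40],  # month-to-month
    [90, 10],  # long-term
]

v = cramers_v_from_table(toy_table)
print(f"Cramér's V = {v:.3f}")
```

With these made-up counts the uncorrected chi-square is 24 on 200 customers, so V = sqrt(24 / 200) ≈ 0.346, which would land just above the 0.3 line in the chart.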

### Mann-Whitney U + Cohen's d - Continuous Features vs Churn

In [None]:
# Helper: Cohen's d
def cohens_d(group1, group2):
    """Effect size: how far apart are the two groups in standard deviations?"""
    n1, n2 = len(group1), len(group2)
    pooled_std = np.sqrt(((n1 - 1) * group1.std()**2 + (n2 - 1) * group2.std()**2) / (n1 + n2 - 2))
    return (group1.mean() - group2.mean()) / pooled_std if pooled_std > 0 else 0

# Continuous features
num_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
churn_yes = df[df["Churn"] == "Yes"]
churn_no = df[df["Churn"] == "No"]

mw_results = []
for col in num_cols:
    u_stat, p_val = mannwhitneyu(churn_yes[col], churn_no[col], alternative="two-sided")
    d = cohens_d(churn_yes[col], churn_no[col])
    mw_results.append({"Feature": col, "U Statistic": f"{u_stat:,.0f}",
                        "p-value": f"{p_val:.2e}", "Cohen's d": f"{d:.3f}",
                        "Effect": "Large" if abs(d) > 0.8 else "Medium" if abs(d) > 0.5 else "Small"})

mw_df = pd.DataFrame(mw_results)
print("Mann-Whitney U Test + Cohen's d (Churned vs Stayed):")
print(mw_df.to_string(index=False))

In [None]:
# Visualize: distribution of continuous features by churn status
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, col in zip(axes, num_cols):
    for label, color in [("No", "steelblue"), ("Yes", "salmon")]:
        subset = df[df["Churn"] == label][col]
        ax.hist(subset, bins=30, alpha=0.6, color=color, label=f"Churn={label}")
    ax.set_title(f"{col} by Churn Status")
    ax.set_xlabel(col)
    ax.legend()
plt.tight_layout()
plt.show()

In [None]:
# Quick check: point-biserial correlation
churn_binary = (df["Churn"] == "Yes").astype(int)
print("Point-Biserial Correlation with Churn:")
for col in num_cols:
    r, p = pointbiserialr(churn_binary, df[col])
    print(f"  {col:20s} r = {r:+.3f}  (p = {p:.2e})")

<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #922B21;">🛑 STOP AND CHECK - Checkpoint 1</strong><br>
  <ul>
    <li><strong>Cramér's V:</strong> Contract and InternetService should rank near the top of the categorical drivers</li>
    <li><strong>Cohen's d:</strong> tenure should show a large negative effect (churners have shorter tenure)</li>
    <li><strong>Point-biserial:</strong> tenure should have a negative correlation with churn</li>
  </ul>
  The story is forming: <em>new customers on month-to-month contracts with fiber optic internet are the highest risk.</em>
</div>
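The sign of Cohen's d depends on which group is passed first. The sketch below uses made-up tenure values (not the Telco data) and only the standard library to show why "churned minus stayed" comes out negative when churners have shorter tenure.

```python
import statistics

def cohens_d(group1, group2):
    """Effect size: (mean1 - mean2) divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group2)
    pooled_std = (((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_std

# Hypothetical tenure values in months
churned_tenure = [2, 4, 6, 8, 10]      # mean 6
stayed_tenure = [20, 22, 24, 26, 28]   # mean 24

d = cohens_d(churned_tenure, stayed_tenure)
print(f"Cohen's d (churned - stayed) = {d:.2f}")
```

Swapping the argument order flips the sign but not the magnitude, so always state which group is subtracted from which when reporting d.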

---
# Example 2 - Logistic Regression Baseline + Business Cost of Churn

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
  A model is only useful if leadership understands <strong>what it costs to get it wrong</strong>. Before building anything, we put a dollar figure on churn. Then we build logistic regression, the interpretable baseline that can tell leadership <em>why</em> customers are leaving, not just <em>which</em> ones.
</div>

In [None]:
# Business cost of churn
avg_monthly = df["MonthlyCharges"].mean()
avg_tenure = df[df["Churn"] == "No"]["tenure"].mean()
churned_count = (df["Churn"] == "Yes").sum()
acquisition_cost = 300  # Industry benchmark for telecom

# Annual revenue lost from churned customers
annual_revenue_lost = churned_count * avg_monthly * 12
# Lifetime value lost (remaining months they would have stayed)
avg_remaining = avg_tenure - df[df["Churn"] == "Yes"]["tenure"].mean()
lifetime_lost = churned_count * avg_monthly * avg_remaining
# Replacement cost
replacement_cost = churned_count * acquisition_cost

print(f"{'='*50}")
print(f"  BUSINESS COST OF CHURN - Telco Dataset")
print(f"{'='*50}")
print(f"  Churned customers:        {churned_count:,}")
print(f"  Avg monthly charge:       ${avg_monthly:,.2f}")
print(f"  Avg tenure (stayed):      {avg_tenure:.0f} months")
print(f"  Acquisition cost/customer: ${acquisition_cost}")
print(f"{'='*50}")
print(f"  Annual revenue at risk:   ${annual_revenue_lost:,.0f}")
print(f"  Lifetime value lost:      ${lifetime_lost:,.0f}")
print(f"  Replacement cost:         ${replacement_cost:,.0f}")
print(f"  TOTAL ESTIMATED IMPACT:   ${lifetime_lost + replacement_cost:,.0f}")
print(f"{'='*50}")

### Preprocessing Pipeline

In [None]:
# Full preprocessing pipeline
df_model = df.copy()

# Simplify "No internet service" / "No phone service" → "No"
replace_cols = ["OnlineSecurity", "OnlineBackup", "DeviceProtection",
                "TechSupport", "StreamingTV", "StreamingMovies", "MultipleLines"]
for col in replace_cols:
    df_model[col] = df_model[col].replace({"No internet service": "No", "No phone service": "No"})

# Binary encode Yes/No columns
binary_cols = ["Partner", "Dependents", "PhoneService", "PaperlessBilling", "Churn"]
for col in binary_cols:
    df_model[col] = df_model[col].map({"Yes": 1, "No": 0})

# Encode gender
df_model["gender"] = df_model["gender"].map({"Male": 1, "Female": 0})

# Encode remaining binary Yes/No columns
for col in replace_cols:
    df_model[col] = df_model[col].map({"Yes": 1, "No": 0})

# One-hot encode multi-category columns
df_model = pd.get_dummies(df_model, columns=["InternetService", "Contract", "PaymentMethod"],
                           drop_first=True, dtype=int)

# Separate features and target
X = df_model.drop(columns=["Churn"])
y = df_model["Churn"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale continuous features
scaler = StandardScaler()
continuous = ["tenure", "MonthlyCharges", "TotalCharges"]
X_train[continuous] = scaler.fit_transform(X_train[continuous])
X_test[continuous] = scaler.transform(X_test[continuous])

feature_names = X_train.columns.tolist()
n_features = len(feature_names)

print(f"Features: {n_features}")
print(f"Train: {X_train.shape[0]:,} | Test: {X_test.shape[0]:,}")
print(f"Churn rate - Train: {y_train.mean():.1%} | Test: {y_test.mean():.1%}")

### Logistic Regression Baseline

In [None]:
# Logistic Regression
lr_model = LogisticRegression(max_iter=1000, random_state=42)
lr_model.fit(X_train, y_train)

lr_predictions = lr_model.predict(X_test)
lr_probabilities = lr_model.predict_proba(X_test)[:, 1]

print("Logistic Regression - Classification Report:")
print(classification_report(y_test, lr_predictions, target_names=["Stayed", "Churned"]))

lr_accuracy = accuracy_score(y_test, lr_predictions)
lr_auc = roc_auc_score(y_test, lr_probabilities)
print(f"Accuracy: {lr_accuracy:.4f}")
print(f"AUC:      {lr_auc:.4f}")

In [None]:
# Confusion matrix
cm_lr = confusion_matrix(y_test, lr_predictions)
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(cm_lr, annot=True, fmt="d", cmap="Blues",
            xticklabels=["Stayed", "Churned"],
            yticklabels=["Stayed", "Churned"], ax=ax)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
ax.set_title("Confusion Matrix - Logistic Regression")
plt.tight_layout()
plt.show()

tn, fp, fn, tp = cm_lr.ravel()
print(f"True Negatives  (correctly predicted stayed):  {tn}")
print(f"False Positives (predicted churn, actually stayed): {fp}")
print(f"False Negatives (predicted stayed, actually churned): {fn}  ← COSTLY")
print(f"True Positives  (correctly predicted churned):  {tp}")
print(f"\n💰 Each False Negative = a churner we MISSED: they leave without intervention")

In [None]:
# Top churn drivers: coefficient interpretation
coef_df = pd.DataFrame({
    "Feature": feature_names,
    "Coefficient": lr_model.coef_[0]
}).sort_values("Coefficient", ascending=False)

fig, ax = plt.subplots(figsize=(10, 8))
top_pos = coef_df.head(5)
top_neg = coef_df.tail(5)
display_df = pd.concat([top_pos, top_neg])

colors = ["salmon" if c > 0 else "steelblue" for c in display_df["Coefficient"]]
ax.barh(display_df["Feature"], display_df["Coefficient"], color=colors)
ax.set_xlabel("Coefficient (positive = increases churn probability)")
ax.set_title("Top 5 Positive & Negative Churn Drivers")
ax.axvline(x=0, color="black", linewidth=0.5)
plt.tight_layout()
plt.show()

print("Top 5 features INCREASING churn risk:")
for _, row in top_pos.iterrows():
    print(f"  {row['Feature']:45s} {row['Coefficient']:+.4f}")
print("\nTop 5 features DECREASING churn risk:")
for _, row in top_neg.iterrows():
    print(f"  {row['Feature']:45s} {row['Coefficient']:+.4f}")

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 READING THE COEFFICIENTS</strong><br>
  <ul>
    <li><strong>Month-to-month contract</strong> (reference category): <code>drop_first=True</code> drops this level, so it has no coefficient of its own. Its higher churn risk shows up as the negative coefficients on the one-year and two-year contract columns.</li>
    <li><strong>Fiber optic internet</strong> (positive): Higher churn than DSL (the dropped baseline), possibly due to price sensitivity or service issues.</li>
    <li><strong>Electronic check payment</strong> (positive): Less "sticky" than auto-pay, with no friction to stop paying.</li>
    <li><strong>Tenure</strong> (negative, large): Longer tenure = less likely to churn. Loyalty builds over time.</li>
    <li><strong>Two-year contract</strong> (negative): Lock-in reduces churn. The business implication is clear: <em>get customers onto contracts.</em></li>
  </ul>
</div>
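A logistic regression coefficient is a change in log-odds per one-unit increase in the feature, so exponentiating it gives an odds ratio that is often easier to explain. The sketch below uses hypothetical coefficient values chosen for illustration; they are not the fitted values from `lr_model`. Because `tenure` was standardized, "one unit" for that feature means one standard deviation.

```python
import math

# Hypothetical coefficients (illustrative only, not output from the notebook)
hypothetical_coefs = {
    "tenure": -0.80,
    "InternetService_Fiber optic": 0.70,
    "Contract_Two year": -1.20,
}

# exp(coefficient) = multiplicative change in the odds of churn
odds_ratios = {name: math.exp(coef) for name, coef in hypothetical_coefs.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name:30s} odds ratio = {ratio:.2f} ({direction} the odds of churn)")
```

An odds ratio of about 2.0 means the odds of churn roughly double when the feature goes from 0 to 1; a ratio of about 0.30 means the odds drop to roughly 30% of the baseline.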

In [None]:
# Risk scoring: rank customers by churn probability
risk_df = X_test.copy()
risk_df["churn_probability"] = lr_probabilities
risk_df["actual_churn"] = y_test.values
risk_df = risk_df.sort_values("churn_probability", ascending=False)

high_risk = risk_df[risk_df["churn_probability"] >= 0.5]
print(f"High-risk customers (prob ≥ 0.5): {len(high_risk):,}")
print(f"Of those, actually churned: {high_risk['actual_churn'].sum():,} ({high_risk['actual_churn'].mean():.1%})")

plt.figure(figsize=(10, 4))
plt.hist(lr_probabilities[y_test == 0], bins=30, alpha=0.6, color="steelblue", label="Stayed")
plt.hist(lr_probabilities[y_test == 1], bins=30, alpha=0.6, color="salmon", label="Churned")
plt.xlabel("Predicted Churn Probability")
plt.ylabel("Count")
plt.title("Distribution of Churn Probabilities by Actual Outcome")
plt.axvline(x=0.5, color="red", linestyle="--", alpha=0.7, label="Threshold (0.5)")
plt.legend()  # call after axvline so the threshold line appears in the legend
plt.tight_layout()
plt.show()

<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #922B21;">🛑 STOP AND CHECK - Checkpoint 2 (End of Session 1)</strong><br>
  <ul>
    <li>Logistic regression accuracy ≈ 80%, recall on churners ≈ 54%</li>
    <li>AUC ≈ 0.84, good but not great</li>
    <li>The model catches about half of actual churners; the other half slip through</li>
    <li>Coefficients tell a clear story: month-to-month + fiber optic + electronic check = highest risk</li>
  </ul>
  <strong>Session 1 ends here. Session 2 picks up with neural networks.</strong>
</div>
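Recall of about one half is tied to the default 0.5 cutoff, not fixed by the model. As a minimal sketch, the snippet below scores twelve hypothetical customers (made-up probabilities and labels, not notebook output) and shows how lowering the threshold trades precision for recall. The helper name `precision_recall_at` is introduced only for this illustration.

```python
def precision_recall_at(threshold, probabilities, labels):
    """Precision and recall for the positive class at a given probability cutoff."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred == 1 and y == 1)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred == 1 and y == 0)
    fn = sum(1 for pred, y in zip(predicted, labels) if pred == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical churn probabilities and true outcomes (1 = churned)
probs = [0.92, 0.81, 0.64, 0.55, 0.47, 0.41, 0.35, 0.28, 0.22, 0.15, 0.09, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

for threshold in (0.5, 0.3):
    precision, recall = precision_recall_at(threshold, probs, labels)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data, moving the cutoff from 0.5 to 0.3 raises recall from 0.50 to 0.83 while precision slips from 0.75 to 0.71. Whether that trade is worth it depends on the cost of a missed churner relative to the cost of an unnecessary retention offer.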

---
# SESSION 2 - Chapter 5: Neural Networks

---
# Example 3 - Single-Neuron Neural Network (The Bridge)

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
  Before adding layers and complexity, we show something powerful: a neural network with <strong>one neuron and sigmoid activation</strong> has the same functional form as logistic regression. Same inputs, same weighted sum, same sigmoid output. The fitted results should be nearly identical rather than exact, because scikit-learn's <code>LogisticRegression</code> applies L2 regularization by default and uses a different optimizer than Adam. That near-match is the point.
</div>

In [None]:
# Single-neuron neural network = logistic regression
model_single = Sequential([
    Dense(1, activation="sigmoid", input_shape=(n_features,))
])

model_single.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model_single.summary()
print(f"\nTotal parameters: {n_features + 1} (one weight per feature + 1 bias)")

In [None]:
# Train the single neuron
history_single = model_single.fit(
    X_train, y_train,
    epochs=100, batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
single_loss, single_acc = model_single.evaluate(X_test, y_test, verbose=0)
single_pred = (model_single.predict(X_test, verbose=0) > 0.5).astype(int).ravel()

print(f"\nSingle-Neuron ANN:")
print(f"  Accuracy: {single_acc:.4f}")
print(f"  LR Accuracy: {lr_accuracy:.4f}")
print(f"  Difference: {abs(single_acc - lr_accuracy):.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, single_pred, target_names=["Stayed", "Churned"]))

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ TEACHING MOMENT</strong><br>
  The numbers are nearly identical. A neural network with one neuron <strong>IS</strong> logistic regression. The power of neural networks comes from <strong>adding hidden layers</strong>; that's what lets them learn nonlinear patterns that logistic regression cannot.
</div>
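To see the equivalence without any framework, here is a minimal sketch of a single sigmoid neuron trained by plain gradient descent on binary cross-entropy, which is the same objective as unregularized logistic regression. The data is a made-up, one-feature toy set, and the learning rate and step count are arbitrary choices for illustration rather than anything Keras or scikit-learn uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: one standardized feature (think tenure), label 1 = churned
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ys = [1, 1, 1, 0, 1, 0, 0]

w, b = 0.0, 0.0          # one weight + one bias = the whole "network"
learning_rate = 0.5

for _ in range(2000):
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)       # forward pass of the single neuron
        grad_w += (p - y) * x        # gradient of binary cross-entropy w.r.t. w
        grad_b += (p - y)            # gradient w.r.t. b
    w -= learning_rate * grad_w / len(xs)
    b -= learning_rate * grad_b / len(xs)

print(f"weight = {w:.3f}, bias = {b:.3f}")
print(f"P(churn | x = -1.5) = {sigmoid(w * -1.5 + b):.2f}")
print(f"P(churn | x = +1.5) = {sigmoid(w * 1.5 + b):.2f}")
```

The learned weight is negative, so the predicted churn probability falls as the feature grows. That is the same reading you would give a negative logistic regression coefficient.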

---
# Example 4 - Three-Layer ANN (Overfitting Demo)

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
  More neurons = more power, but also more risk of <strong>overfitting</strong> (memorizing the training data instead of learning patterns). This example deliberately shows what overfitting looks like in the loss curves so you can diagnose it in your own models.
</div>

In [None]:
# Three-layer ANN, no regularization
model_overfit = Sequential([
    Dense(n_features, activation="relu", input_shape=(n_features,)),
    Dense(15, activation="relu"),
    Dense(1, activation="sigmoid")
])

model_overfit.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model_overfit.summary()
print(f"\nParameters: {model_overfit.count_params():,} (vs {n_features + 1} in single neuron)")

In [None]:
# Train WITHOUT regularization: watch for overfitting
history_overfit = model_overfit.fit(
    X_train, y_train,
    epochs=100, batch_size=32,
    validation_split=0.2,
    verbose=0
)

print(f"Training complete: 100 epochs (no early stopping)")

In [None]:
# Plot loss curves to diagnose overfitting
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss
axes[0].plot(history_overfit.history["loss"], label="Training Loss", color="steelblue")
axes[0].plot(history_overfit.history["val_loss"], label="Validation Loss", color="salmon")
axes[0].set_title("Loss Curves - 3-Layer ANN (No Regularization)")
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss")
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy
axes[1].plot(history_overfit.history["accuracy"], label="Training Accuracy", color="steelblue")
axes[1].plot(history_overfit.history["val_accuracy"], label="Validation Accuracy", color="salmon")
axes[1].set_title("Accuracy Curves - 3-Layer ANN (No Regularization)")
axes[1].set_xlabel("Epoch")
axes[1].set_ylabel("Accuracy")
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Evaluate: likely WORSE than logistic regression
overfit_loss, overfit_acc = model_overfit.evaluate(X_test, y_test, verbose=0)
overfit_pred = (model_overfit.predict(X_test, verbose=0) > 0.5).astype(int).ravel()

print(f"3-Layer ANN (no regularization):")
print(f"  Accuracy: {overfit_acc:.4f} (LR was {lr_accuracy:.4f})")
print(f"\nClassification Report:")
print(classification_report(y_test, overfit_pred, target_names=["Stayed", "Churned"]))

<div style="background-color: #FEF9E7; border-left: 5px solid #F1C40F; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #7D6608;">⚠️ THE OVERFITTING DIAGNOSIS</strong><br>
  Look at the loss curves: training loss drops smoothly, but <strong>validation loss rises after ~30 epochs</strong>. The model is memorizing the training data instead of learning generalizable patterns. More parameters didn't help; they made things worse. We need two tools: <strong>Dropout</strong> (randomly disable neurons during training) and <strong>Early Stopping</strong> (stop training when validation loss starts rising).
</div>
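"Randomly disable neurons" can be sketched in a few lines. The snippet below is a simplified illustration of inverted dropout, not the Keras source: each activation is zeroed with probability `rate` during training, and the survivors are scaled up by 1 / (1 - rate) so the expected total stays the same. At inference time nothing is dropped. The activation values and the seed are arbitrary.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training:
        return list(activations)  # inference: pass everything through unchanged
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(42)                      # fixed seed so the run is repeatable
hidden = [0.5, 1.2, 0.0, 2.0, 0.8, 1.5]      # hypothetical hidden-layer activations

train_out = dropout(hidden, rate=0.3, rng=rng)
infer_out = dropout(hidden, rate=0.3, rng=rng, training=False)

print("training: ", [round(v, 3) for v in train_out])
print("inference:", infer_out)
```

Because a different random subset is silenced on every batch, no single neuron can be relied on to memorize a training example, which is what pushes the network toward more general patterns.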

---
# Example 5 - Full Pipeline: Tuned ANN + Model Comparison

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
  This is the closing case study. We fix the overfitting with <strong>Dropout + Early Stopping</strong>, then compare the tuned ANN against logistic regression head-to-head. The first half runs pre-filled. The second half is your turn: you'll evaluate the model and build the comparison.
</div>

In [None]:
# Tuned ANN: Dropout + Early Stopping
model_tuned = Sequential([
    Dense(n_features, activation="relu", input_shape=(n_features,)),
    Dropout(0.3),
    Dense(15, activation="relu"),
    Dropout(0.2),
    Dense(1, activation="sigmoid")
])

model_tuned.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model_tuned.summary()

In [None]:
# Early Stopping callback
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
    verbose=1
)

# Train with early stopping
history_tuned = model_tuned.fit(
    X_train, y_train,
    epochs=200,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=0
)

actual_epochs = len(history_tuned.history["loss"])
print(f"\n✅ Training stopped at epoch {actual_epochs} (max was 200)")

In [None]:
# Improved loss curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(history_tuned.history["loss"], label="Training Loss", color="steelblue")
axes[0].plot(history_tuned.history["val_loss"], label="Validation Loss", color="salmon")
axes[0].axvline(x=actual_epochs - 1, color="green", linestyle="--", alpha=0.5, label=f"Early Stop (epoch {actual_epochs})")
axes[0].set_title("Loss Curves - Tuned ANN (Dropout + Early Stopping)")
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss")
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(history_tuned.history["accuracy"], label="Training Accuracy", color="steelblue")
axes[1].plot(history_tuned.history["val_accuracy"], label="Validation Accuracy", color="salmon")
axes[1].axvline(x=actual_epochs - 1, color="green", linestyle="--", alpha=0.5, label=f"Early Stop (epoch {actual_epochs})")
axes[1].set_title("Accuracy Curves - Tuned ANN (Dropout + Early Stopping)")
axes[1].set_xlabel("Epoch")
axes[1].set_ylabel("Accuracy")
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #922B21;">🛑 STOP AND CHECK - Checkpoint 3</strong><br>
  <ul>
    <li>Training should stop between epochs 30–60</li>
    <li>The gap between training and validation loss should be much smaller than Example 4</li>
    <li>If it ran all 200 epochs, EarlyStopping isn't configured correctly</li>
  </ul>
</div>
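The patience rule behind early stopping is easy to trace by hand. The following is a simplified sketch of the idea (not the Keras implementation): keep the best validation loss seen so far, count consecutive epochs without improvement, and stop once that count reaches `patience`. The loss values and the function name `early_stopping_epoch` are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # never triggered: ran every epoch

# Hypothetical validation losses, one per epoch
val_losses = [0.60, 0.52, 0.47, 0.45, 0.46, 0.44, 0.45, 0.46, 0.47, 0.48]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"Stopped at epoch {stop_epoch}; best weights came from epoch {best_epoch}")
```

Here training halts at epoch 9, three epochs after the best loss at epoch 6. With `restore_best_weights=True`, the model that gets evaluated is the one from the best epoch, not the last one, so the stopping epoch and the best epoch differ by the patience value.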

---
## Your Turn - Evaluate and Compare

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS - Live Class Participation</strong><br>
  Complete the cells below to evaluate the tuned ANN and compare it to logistic regression.
</div>

In [None]:
# YOUR CODE HERE - Evaluate the tuned ANN
# 1. Generate predictions (threshold 0.5) → store in: ann_predictions
# 2. Generate probabilities → store in: ann_probabilities
# 3. Print the classification report



In [None]:
# YOUR CODE HERE - ROC Curve Comparison
# 1. Calculate ROC curve for LR (lr_probabilities already exists)
# 2. Calculate ROC curve for ANN (ann_probabilities from above)
# 3. Plot BOTH on a single figure
# 4. Include AUC in the legend
# Colors: LR = "#0f3460" (navy), ANN = "#e94560" (coral)



In [None]:
# YOUR CODE HERE - Side-by-side comparison table
# Build a DataFrame comparing:
#   Accuracy, Precision (Churned), Recall (Churned), F1 (Churned), AUC
# for both Logistic Regression and Tuned ANN



In [None]:
# YOUR CODE HERE - Customers flagged by ANN but missed by LR
# Find customers where ANN predicted churn but LR did not
# How many additional customers does the ANN catch?



---
## Takeaway

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ WHAT WE BUILT TODAY</strong><br>
  A complete churn prediction pipeline, from raw data through statistical analysis, logistic regression, and neural networks. The logistic regression gave us an interpretable baseline; the neural network can capture additional nonlinear patterns at the cost of explainability. In practice, many teams deploy both.
</div>

**Regression vs Classification comparison:**

| | Week 3 (Regression) | Week 4 (Classification) |
|---|---|---|
| Target | Continuous (dollars) | Binary (churn yes/no) |
| EDA tools | Correlation, scatterplots | Cramér's V, Mann-Whitney U, Cohen's d |
| Baseline model | Linear Regression | Logistic Regression |
| Advanced model | Multiple Regression | Neural Network |
| Evaluation | R², RMSE | Accuracy, Precision, Recall, F1, AUC |

**Next chapter preview:** We shift from predicting *individual* outcomes to discovering *population-level* patterns, moving from supervised to unsupervised learning.

---
<p style="color:#7F8C8D; font-size:0.85em;">
<em>CAP4767 Data Mining with Python | Miami Dade College | Spring 2026</em><br>
Week 4 Demo - Customer Churn: EDA → Logistic Regression → Neural Networks
</p>