<a href="https://colab.research.google.com/github/c-marq/cap4767-data-mining/blob/main/solutions/labs/lab03_churn_neural_networks_solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3: SOLUTION KEY 🔑
## Churn Prediction: Full Pipeline
**CAP4767 Data Mining with Python** | Miami Dade College, Kendall Campus

**Points:** 20 | **Format:** Individual | **Due:** End of Week 4

| Part | Skills (Chapter) | Points |
|------|-----------------|--------|
| A: EDA | Cramér's V, Mann-Whitney U, business cost (Ch. 4) | 4 |
| B: Logistic Regression | Baseline model + coefficient interpretation (Ch. 4) | 3 |
| C: Neural Network | Keras ANN + dropout + early stopping (Ch. 5) | 4 |
| D: Model Comparison | ROC curves + metrics table (Ch. 5) | 3 |
| E: Written Analysis | Business recommendation (300+ words) | 4 |
| F: Preprocessing | Pipeline runs correctly | 2 |
| Bonus | Third model variant | +3 |

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 GRADING PHILOSOPHY</strong><br>
  This lab rewards <strong>process over perfection</strong>. If your ANN performs <em>worse</em> than logistic regression, that's a valid result; your written analysis should explain why.
</div>

<div style="background-color: #FEF9E7; border-left: 5px solid #F1C40F; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #7D6608;">⚠️ IMPORTANT</strong><br>
  Do NOT use the Telco dataset from class. You must use one of the two options below. Using the Telco dataset results in a <strong>5-point deduction</strong>.
</div>

### Student Information
- **Name:** SOLUTION KEY
- **Date:** Spring 2026
- **Dataset Chosen:** A (Bank Churn)

---
## Setup

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Run this cell. Do not modify.
</div>

In [None]:
# ============================================================
# Setup - Run this cell. Do not modify.
# ============================================================
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (classification_report, confusion_matrix,
                             ConfusionMatrixDisplay, roc_curve, roc_auc_score,
                             accuracy_score, precision_score, recall_score, f1_score)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(42)
tf.random.set_seed(42)

plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["figure.dpi"] = 100
sns.set_style("whitegrid")

# Helper functions (pre-built; use these in your EDA)
def cramers_v(x, y):
    """Cramér's V: association between two categorical variables (0–1)."""
    ct = pd.crosstab(x, y)
    chi2 = chi2_contingency(ct)[0]
    n = ct.sum().sum()
    r, k = ct.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

def cohens_d(group1, group2):
    """Cohen's d: effect size between two groups."""
    n1, n2 = len(group1), len(group2)
    pooled = np.sqrt(((n1-1)*group1.std()**2 + (n2-1)*group2.std()**2) / (n1+n2-2))
    return (group1.mean() - group2.mean()) / pooled if pooled > 0 else 0

print(f"TensorFlow: {tf.__version__}")
print("✅ Setup complete - helper functions loaded: cramers_v(), cohens_d()")

---
## Choose Your Dataset + Run Preprocessing

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Uncomment <strong>ONE</strong> option below and run the cell. This handles all preprocessing and gives you clean train/test splits.
</div>

In [None]:
# ============================================================
# OPTION A - Bank Customer Churn (~10,000 rows)
# Uncomment the lines below if choosing Option A
# ============================================================
url = "https://raw.githubusercontent.com/c-marq/cap4767-data-mining/refs/heads/main/data/Churn_Modelling.csv"
df_raw = pd.read_csv(url)
TARGET = "Exited"
DOMAIN = "Banking"

# Preprocessing
df = df_raw.drop(columns=["RowNumber", "CustomerId", "Surname"])
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})
df = pd.get_dummies(df, columns=["Geography"], drop_first=True, dtype=int)

# Feature lists for EDA
cat_features = ["Gender", "HasCrCard", "IsActiveMember", "NumOfProducts",
                "Geography_Germany", "Geography_Spain"]
num_features = ["CreditScore", "Age", "Tenure", "Balance", "EstimatedSalary"]

# ============================================================
# OPTION B - Credit Card Customer Attrition (~10,000 rows)
# Uncomment the lines below if choosing Option B
# ============================================================
# url = "https://raw.githubusercontent.com/c-marq/cap4767-data-mining/refs/heads/main/data/BankChurners.csv"
# df_raw = pd.read_csv(url)
# TARGET = "Attrition_Flag"
# DOMAIN = "Credit Card Services"
#
# # Preprocessing
# # Drop ID and the two Naive Bayes leakage columns
# leak_cols = [c for c in df_raw.columns if c.startswith("Naive_Bayes")]
# df = df_raw.drop(columns=["CLIENTNUM"] + leak_cols)
#
# # Encode target: Attrited Customer = 1, Existing Customer = 0
# df[TARGET] = df[TARGET].map({"Attrited Customer": 1, "Existing Customer": 0})
#
# # Encode categoricals
# df["Gender"] = df["Gender"].map({"M": 1, "F": 0})
# df = pd.get_dummies(df, columns=["Education_Level", "Marital_Status",
#                                    "Income_Category", "Card_Category"],
#                      drop_first=True, dtype=int)
#
# # Feature lists for EDA
# cat_features = ["Gender"] + [c for c in df.columns if any(
#     c.startswith(p) for p in ["Education_Level_", "Marital_Status_",
#                                "Income_Category_", "Card_Category_"])]
# num_features = ["Customer_Age", "Dependent_count", "Months_on_book",
#                 "Total_Relationship_Count", "Months_Inactive_12_mon",
#                 "Contacts_Count_12_mon", "Credit_Limit", "Total_Revolving_Bal",
#                 "Avg_Open_To_Buy", "Total_Amt_Chng_Q4_Q1", "Total_Trans_Amt",
#                 "Total_Trans_Ct", "Total_Ct_Chng_Q4_Q1", "Avg_Utilization_Ratio"]

# ============================================================
# Common pipeline (runs for whichever option you chose)
# ============================================================
X = df.drop(columns=[TARGET])
y = df[TARGET]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

feature_names = X_train.columns.tolist()
n_features = len(feature_names)

print(f"Dataset: {DOMAIN}")
print(f"Shape: {df.shape[0]:,} rows × {df.shape[1]} columns → {n_features} features")
print(f"Train: {X_train.shape[0]:,} | Test: {X_test.shape[0]:,}")
print(f"Churn rate: {y.mean():.1%}")
print(f"\n✅ Preprocessing complete - X_train_scaled, X_test_scaled, y_train, y_test ready")

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 WHAT THE PREPROCESSING DID</strong><br>
  <ul>
    <li>Dropped non-predictive ID columns</li>
    <li>Encoded the target as binary (1 = churned, 0 = stayed)</li>
    <li>Converted categorical features to dummy variables with <code>drop_first=True</code></li>
    <li>Scaled all features with <code>StandardScaler</code> (fit on train, transform on test)</li>
    <li><strong>Option B only:</strong> Removed two columns that contained pre-computed model outputs. Using them would be <strong>data leakage</strong> (the model would "cheat" by seeing answers derived from the target)</li>
  </ul>
  <code>cat_features</code> and <code>num_features</code> lists are ready for your EDA.
</div>
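
The "fit on train, transform on test" rule from the list above can be illustrated with a short sketch. The data is synthetic and the `*_demo` names are hypothetical stand-ins, not the lab dataset; the point is only that the scaler's mean and standard deviation come from the training rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix (assumption: not the lab data)
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50, scale=10, size=(1000, 3))
y_demo = rng.integers(0, 2, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Fit on the training rows only, then reuse those statistics on the test rows
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print("Train means:", X_tr_scaled.mean(axis=0).round(3))  # zero by construction
print("Test means: ", X_te_scaled.mean(axis=0).round(3))  # near zero, but not exactly
```

Calling `fit_transform` on the full dataset before splitting would let test-set statistics influence the training features, which is a mild form of leakage.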

---
# Part A: Exploratory Data Analysis (4 points)

### Task 1: Data Inspection (1 pt)

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Print the shape, <code>.info()</code>, churn rate, and first 5 rows. Describe the dataset in 2–3 sentences.
</div>

In [None]:
# Task 1: Data inspection
print(f"Shape: {df.shape}")
print(f"\nChurn rate: {y.mean():.1%}")
print(f"\nChurned: {y.sum():,} | Stayed: {(y==0).sum():,}")
df.info()
print()
df.head()

**Dataset description (2–3 sentences):**

**Sample:** The Bank Customer Churn dataset contains 10,000 customers with 11 features covering demographics (age, gender, geography), banking relationship (tenure, balance, products), and activity status. The churn rate is approximately 20%, meaning roughly 1 in 5 customers left the bank. The dataset is moderately imbalanced: the majority class (stayed) is about 4x the minority class (churned).

### Task 2: Cramér's V Analysis (1 pt)

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Compute Cramér's V between each feature in <code>cat_features</code> and the target. Display as a sorted bar chart.
</div>

In [None]:
# Task 2: Cramér's V
cv_results = pd.DataFrame({
    "Feature": cat_features,
    "Cramér's V": [cramers_v(df[col], df[TARGET]) for col in cat_features]
}).sort_values("Cramér's V", ascending=False)

plt.figure(figsize=(8, 5))
plt.barh(cv_results["Feature"], cv_results["Cramér's V"], color="steelblue")
plt.xlabel("Cramér's V")
plt.title(f"Categorical Features vs {TARGET} - Cramér's V")
plt.axvline(x=0.1, color="orange", linestyle="--", alpha=0.7, label="Weak (0.1)")
plt.legend()
plt.tight_layout()
plt.show()
print(cv_results.to_string(index=False))

**Interpretation (2–3 sentences):** Which categorical features have the strongest association with churn?

**Sample:** IsActiveMember and Geography_Germany show the strongest associations with churn. The Germany geography effect is notable: German customers churn at higher rates than French or Spanish customers, possibly reflecting different competitive dynamics or service levels in that market. Gender and HasCrCard show weak associations, suggesting they are not strong churn predictors.
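
To make the statistic concrete, here is a small worked example of Cramér's V on a hypothetical 2×2 table (the counts are invented for illustration). It also shows one detail worth knowing: `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so a helper that calls it with default arguments reports a slightly smaller V for binary features than the uncorrected formula.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = inactive/active, columns = stayed/churned
table = np.array([[30, 10],
                  [10, 30]])

chi2_raw = chi2_contingency(table, correction=False)[0]
chi2_yates = chi2_contingency(table)[0]   # default: Yates correction on 2x2 tables
n = table.sum()
k = min(table.shape) - 1

v_raw = np.sqrt(chi2_raw / (n * k))
v_yates = np.sqrt(chi2_yates / (n * k))
print(f"V without correction:    {v_raw:.3f}")
print(f"V with Yates correction: {v_yates:.3f}")
```

With every expected cell count equal to 20, the uncorrected chi-square is 4 × (10² / 20) = 20, so V = √(20 / 80) = 0.5.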

### Task 3: Mann-Whitney U + Cohen's d (1 pt)

In [None]:
# Task 3: Mann-Whitney U + Cohen's d
churned = df[df[TARGET] == 1]
stayed = df[df[TARGET] == 0]

mw_results = []
for col in num_features:
    u_stat, p_val = mannwhitneyu(churned[col], stayed[col], alternative="two-sided")
    d = cohens_d(churned[col], stayed[col])
    mw_results.append({"Feature": col, "U Statistic": f"{u_stat:,.0f}",
                        "p-value": f"{p_val:.2e}", "Cohen's d": d,
                        "Effect": "Large" if abs(d)>0.8 else "Medium" if abs(d)>0.5 else "Small"})

mw_df = pd.DataFrame(mw_results).sort_values("Cohen's d", key=abs, ascending=False)

# Optional: Format as string for display AFTER sorting
# mw_df["Cohen's d"] = mw_df["Cohen's d"].map("{:.3f}".format)

print("Mann-Whitney U + Cohen's d:")
print(mw_df.to_string(index=False))

**Interpretation (2–3 sentences):** Which numerical features show the largest effect sizes?

**Sample:** Age shows the largest Cohen's d: churners are significantly older on average, suggesting the bank may be losing its more established customers. Balance also shows a meaningful effect size, with churners having higher balances on average, which is counterintuitive because these are valuable customers the bank should be working hardest to retain. Tenure shows a smaller effect than expected.
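
As a sanity check on how Cohen's d is computed, the following sketch works through a tiny example by hand. The ages are invented, and `cohens_d_np` is a hypothetical NumPy re-implementation that mirrors the setup helper's pooled formula. Note that pandas `.std()` uses the sample standard deviation (`ddof=1`), while NumPy defaults to `ddof=0`, so the NumPy version has to set it explicitly.

```python
import numpy as np

def cohens_d_np(a, b):
    """Cohen's d with a pooled sample standard deviation (ddof=1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Invented ages: both groups have the same spread, means differ by 10 years
ages_churned = [50, 55, 60, 65, 70]
ages_stayed = [40, 45, 50, 55, 60]

d = cohens_d_np(ages_churned, ages_stayed)
print(f"Cohen's d = {d:.3f}")
```

Each group has sample variance 62.5, so the pooled standard deviation is √62.5 ≈ 7.91 and d = 10 / 7.91 ≈ 1.26, which the lab's labeling would call a large effect.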

### Task 4: Business Cost Estimate (1 pt)

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Estimate the annual cost of churn. State your assumptions clearly in comments.<br>
  Use reasonable estimates for your domain (banking or credit card services).
</div>

In [None]:
# Task 4: Business cost estimate
# Assumptions for banking:
# - Average annual revenue per customer: ~$1,200 (fees, interest, products)
# - Customer acquisition cost: ~$400 (marketing, onboarding)
# - Average remaining lifetime: 5 years for retained customers

churned_count = y.sum()
avg_annual_revenue = 1200   # Conservative banking estimate
acquisition_cost = 400      # Industry benchmark
remaining_years = 5

lifetime_lost = churned_count * avg_annual_revenue * remaining_years
replacement = churned_count * acquisition_cost

print(f"Churned customers: {churned_count:,}")
print(f"Lifetime value lost: ${lifetime_lost:,.0f}")
print(f"Replacement cost:    ${replacement:,.0f}")
print(f"TOTAL IMPACT:        ${lifetime_lost + replacement:,.0f}")

---
# Part B: Logistic Regression (3 points)

### Task 5: Build and Evaluate (1.5 pts)

In [None]:
# Task 5: Logistic regression
lr_model = LogisticRegression(max_iter=1000, random_state=42)
lr_model.fit(X_train_scaled, y_train)
lr_predictions = lr_model.predict(X_test_scaled)
lr_probabilities = lr_model.predict_proba(X_test_scaled)[:, 1]

lr_auc = roc_auc_score(y_test, lr_probabilities)
print("Logistic Regression:")
print(classification_report(y_test, lr_predictions, target_names=["Stayed", "Churned"]))
print(f"AUC: {lr_auc:.4f}")

### Task 6: Coefficient Interpretation (1.5 pts)

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Display top 5 positive and top 5 negative coefficients. Explain the top 3 churn drivers in business terms.
</div>

In [None]:
# Task 6: Coefficient interpretation
coef_df = pd.DataFrame({
    "Feature": feature_names,
    "Coefficient": lr_model.coef_[0]
}).sort_values("Coefficient", ascending=False)

print("Top 5 INCREASING churn risk:")
print(coef_df.head(5).to_string(index=False))
print("\nTop 5 DECREASING churn risk:")
print(coef_df.tail(5).to_string(index=False))

# Visualization
display_df = pd.concat([coef_df.head(5), coef_df.tail(5)])
colors = ["salmon" if c > 0 else "steelblue" for c in display_df["Coefficient"]]
plt.figure(figsize=(10, 6))
plt.barh(display_df["Feature"], display_df["Coefficient"], color=colors)
plt.xlabel("Coefficient")
plt.title("Top Churn Drivers (Logistic Regression)")
plt.axvline(x=0, color="black", linewidth=0.5)
plt.tight_layout()
plt.show()

**Interpretation (3–4 sentences):** What does the model say drives churn in this business? Would these findings surprise company leadership?

**Sample:** The model reveals that Age and Geography_Germany are among the strongest churn drivers: older customers and German-market customers are most at risk. IsActiveMember has a strong negative coefficient, confirming that engaged customers stay. The Balance finding might surprise leadership: customers with higher balances are more likely to churn, suggesting these high-value customers may be getting better offers from competitors. This isn't just a service problem; it's a competitive positioning problem.
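
One way to put coefficients into business terms is to convert them to odds ratios with `exp(beta)`. The sketch below uses hypothetical coefficient values, not results from this lab. Because the features were standardized, each coefficient describes the change in log-odds per one standard deviation increase in that feature.

```python
import numpy as np

# Hypothetical standardized coefficients, for illustration only
demo_coefs = {"Age": 0.75, "IsActiveMember": -0.50, "Balance": 0.15}

for name, beta in demo_coefs.items():
    odds_ratio = np.exp(beta)
    pct_change = (odds_ratio - 1) * 100
    print(f"{name:>15}: beta={beta:+.2f}  odds ratio={odds_ratio:.2f}  "
          f"({pct_change:+.0f}% odds of churn per +1 SD)")
```

For binary features such as IsActiveMember, "one standard deviation" is not a natural unit, so the sign and relative size of the coefficient are usually easier to communicate than the exact odds ratio.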

<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #922B21;">🛑 CHECKPOINT</strong><br>
  LR should show reasonable accuracy (70–85%) and an AUC above 0.70. If accuracy equals the majority class rate exactly, the model may be predicting all one class.
</div>
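
The warning sign in the checkpoint can be tested directly. This is a minimal sketch with invented labels and a deliberately degenerate "model" that predicts class 0 for everyone; in the lab the same two checks would be applied to `y_test` and the model's predictions.

```python
import numpy as np

# Invented example: 20% churners, and a model that predicts "stayed" for every row
y_true = np.array([0] * 80 + [1] * 20)
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
majority_rate = max(y_true.mean(), 1 - y_true.mean())
n_predicted_classes = np.unique(y_pred).size

print(f"Accuracy: {accuracy:.2f} | Majority-class rate: {majority_rate:.2f}")
if n_predicted_classes == 1:
    print("Warning: the model predicts a single class for every row.")
```

Counting the distinct predicted classes is the more reliable of the two checks, since a model can match the majority-class accuracy by coincidence while still predicting both classes.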

---
# Part C: Neural Network (4 points)

### Task 7: Build and Train a Keras ANN (2 pts)

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  Build a Sequential model with at least 2 hidden layers, dropout, and early stopping. Train and capture history.
</div>

In [None]:
# Task 7: Build and train ANN
model = Sequential([
    Dense(n_features, activation="relu", input_shape=(n_features,)),
    Dropout(0.3),
    Dense(15, activation="relu"),
    Dropout(0.2),
    Dense(1, activation="sigmoid")
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_loss", patience=10,
                            restore_best_weights=True, verbose=1)

history = model.fit(
    X_train_scaled, y_train,
    epochs=200, batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=0
)
print(f"Training stopped at epoch {len(history.history['loss'])}")
model.summary()

### Task 8: Training Curves (1 pt)

In [None]:
# Task 8: Training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
epochs_run = len(history.history["loss"])

axes[0].plot(history.history["loss"], label="Training Loss", color="steelblue")
axes[0].plot(history.history["val_loss"], label="Validation Loss", color="salmon")
axes[0].set_title("Loss Curves")
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss")
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(history.history["accuracy"], label="Training Accuracy", color="steelblue")
axes[1].plot(history.history["val_accuracy"], label="Validation Accuracy", color="salmon")
axes[1].set_title("Accuracy Curves")
axes[1].set_xlabel("Epoch")
axes[1].set_ylabel("Accuracy")
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

**Interpretation (2–3 sentences):** What epoch did early stopping trigger? Is there evidence of overfitting?

**Sample:** Early stopping triggered around epoch 35-50, well before the maximum 200 epochs. The training and validation loss curves track fairly close together with only a small gap, suggesting that dropout is effectively preventing severe overfitting. The gap is smaller than what we'd see without regularization.
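
The stopping epoch and the best epoch are not the same thing. The sketch below uses a hypothetical list of validation losses in place of `history.history["val_loss"]`, with a small `patience` chosen for the toy example, to show how the best epoch can be read from the history.

```python
import numpy as np

# Hypothetical validation losses, standing in for history.history["val_loss"]
val_loss = [0.52, 0.46, 0.43, 0.41, 0.40, 0.41, 0.42, 0.41]
patience = 3  # assumption for this toy example; the lab cell uses patience=10

best_epoch = int(np.argmin(val_loss)) + 1   # Keras reports epochs starting at 1
epochs_run = len(val_loss)

print(f"Best epoch: {best_epoch}")
print(f"Epochs run: {epochs_run}")
print(f"Epochs waited after the best one: {epochs_run - best_epoch}")
```

When early stopping triggers with the default settings, training runs for `patience` more epochs after the best one, and `restore_best_weights=True` rolls the model back to the best epoch's weights.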

### Task 9: Evaluate the ANN (1 pt)

In [None]:
# Task 9: Evaluate ANN
ann_probabilities = model.predict(X_test_scaled, verbose=0).ravel()
ann_predictions = (ann_probabilities > 0.5).astype(int)
ann_auc = roc_auc_score(y_test, ann_probabilities)

print("Neural Network:")
print(classification_report(y_test, ann_predictions, target_names=["Stayed", "Churned"]))
print(f"AUC: {ann_auc:.4f}")

---
# Part D: Model Comparison (3 points)

### Task 10: ROC + Metrics Table (3 pts)

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1E8449;">✅ DO THIS</strong><br>
  <ol>
    <li>Plot ROC curves for both models on a single figure (LR = navy, ANN = coral)</li>
    <li>Build a comparison table: accuracy, precision, recall, F1, AUC for both</li>
    <li>Count customers flagged by ANN but missed by LR</li>
  </ol>
</div>

In [None]:
# Task 10: ROC curves + comparison
# ROC curves
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probabilities)
ann_fpr, ann_tpr, _ = roc_curve(y_test, ann_probabilities)

plt.figure(figsize=(8, 6))
plt.plot(lr_fpr, lr_tpr, color="#0f3460", linewidth=2, label=f"Logistic Regression (AUC={lr_auc:.3f})")
plt.plot(ann_fpr, ann_tpr, color="#e94560", linewidth=2, label=f"Neural Network (AUC={ann_auc:.3f})")
plt.plot([0, 1], [0, 1], "k--", alpha=0.5, label="Random Guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve Comparison")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Comparison table
comparison = pd.DataFrame({
    "Metric": ["Accuracy", "Precision (Churned)", "Recall (Churned)", "F1 (Churned)", "AUC"],
    "Logistic Regression": [
        f"{accuracy_score(y_test, lr_predictions):.4f}",
        f"{precision_score(y_test, lr_predictions):.4f}",
        f"{recall_score(y_test, lr_predictions):.4f}",
        f"{f1_score(y_test, lr_predictions):.4f}",
        f"{lr_auc:.4f}"
    ],
    "Neural Network": [
        f"{accuracy_score(y_test, ann_predictions):.4f}",
        f"{precision_score(y_test, ann_predictions):.4f}",
        f"{recall_score(y_test, ann_predictions):.4f}",
        f"{f1_score(y_test, ann_predictions):.4f}",
        f"{ann_auc:.4f}"
    ]
})
print(comparison.to_string(index=False))

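# Additional catches: everyone the ANN flags that LR does not (includes false positives)
ann_only_mask = (ann_predictions == 1) & (lr_predictions == 0)
ann_only = ann_only_mask.sum()
print(f"\nCustomers flagged by ANN but missed by LR: {ann_only}")
# Of those, how many actually churned (true positives that LR missed)
print(f"  ...of which actual churners: {(ann_only_mask & (y_test.values == 1)).sum()}")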

---
# Part E: Written Analysis (4 points)

### Task 11: Model Recommendation (minimum 300 words)

Write a recommendation addressed to the business leadership of your chosen domain. Address ALL five points:

1. Which model should they deploy for their retention campaign, and why?
2. What are the top 3 features driving churn, and what can the business do about each one?
3. How many high-risk customers did your models identify? What's the estimated value of retaining them?
4. What are the tradeoffs between the two models (accuracy vs interpretability)?
5. Is there a scenario where deploying both models makes sense?

**To: Senior Leadership, Retail Banking Division**

**Recommendation: Deploy logistic regression as the primary model, with the neural network as a secondary screening tool.**

After analyzing 10,000 customer records, both models successfully identify customers at risk of leaving the bank. The logistic regression achieves approximately 80% accuracy with an AUC of ~0.77, while the neural network shows a marginal improvement in AUC but sacrifices interpretability. For a banking environment where regulatory compliance and explainability are critical, we recommend logistic regression as the primary deployment.

**Top 3 churn drivers and recommended actions:**
1. **Age**: Older customers churn more, possibly due to changing needs or competitor targeting. Action: develop a premium service tier for customers over 45 with dedicated relationship managers.
2. **Geography (Germany)**: The German market shows significantly higher churn. Action: conduct a competitive analysis of German banking offerings and consider market-specific retention programs.
3. **IsActiveMember status**: Inactive customers are far more likely to leave. Action: implement an engagement program that flags customers whose activity drops below baseline and triggers proactive outreach within 30 days.

**Risk assessment:** Our models identified approximately 400 high-risk customers in the test set alone. Extrapolating to the full customer base, we estimate 2,000 customers are at elevated risk. At an estimated lifetime value of $6,000 per customer, retaining even 25% of these through targeted intervention represents $3 million in preserved revenue.

**Model tradeoffs:** The neural network catches approximately 20-30 additional at-risk customers that logistic regression misses, but it cannot explain *why* they're flagged. In banking, regulators and compliance teams need to understand model decisions. Logistic regression provides that transparency.

**Dual deployment scenario:** Use logistic regression to generate the primary target list with clear explanations for each flagged customer. Then run the neural network as a secondary screen to catch additional at-risk customers who didn't score high enough in the primary model. Relationship managers can use the LR coefficients to personalize their retention conversations, while the ANN ensures fewer customers fall through the cracks.
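
For reference, the dual deployment described above could be sketched as follows. The 0/1 flags are invented for eight hypothetical customers and are not lab results; in the notebook they would correspond to `lr_predictions` and `ann_predictions`.

```python
import numpy as np

# Invented 0/1 predictions for eight customers (not lab results)
lr_flags = np.array([1, 0, 0, 1, 0, 0, 1, 0])
ann_flags = np.array([1, 1, 0, 0, 0, 1, 1, 0])

# Primary list: flagged by LR, explainable through its coefficients
primary_list = np.flatnonzero(lr_flags == 1)
# Secondary list: flagged by the ANN only
secondary_list = np.flatnonzero((ann_flags == 1) & (lr_flags == 0))
combined = np.union1d(primary_list, secondary_list)

print("Primary (LR):        ", primary_list)
print("Secondary (ANN only):", secondary_list)
print("Total outreach list: ", combined)
```

Keeping the two lists separate preserves the distinction the memo relies on: primary-list customers come with an explanation, while secondary-list customers are flagged without one.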

---
# Bonus Challenge (+3 points)

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
  <strong style="color: #1A5276;">💡 OPTIONAL</strong><br>
  Train a <strong>third model</strong> with a meaningfully different architecture. Change at least TWO of: number of layers, neurons per layer, dropout rate, optimizer. Add it to your ROC plot and comparison table.
</div>

In [None]:
# Bonus: Third model - wider, deeper architecture with SGD optimizer
model_v2 = Sequential([
    Dense(64, activation="relu", input_shape=(n_features,)),
    Dropout(0.4),
    Dense(32, activation="relu"),
    Dropout(0.3),
    Dense(16, activation="relu"),
    Dropout(0.2),
    Dense(1, activation="sigmoid")
])
model_v2.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])

early_stop_v2 = EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True, verbose=1)
history_v2 = model_v2.fit(X_train_scaled, y_train, epochs=200, batch_size=32,
                           validation_split=0.2, callbacks=[early_stop_v2], verbose=0)

v2_probs = model_v2.predict(X_test_scaled, verbose=0).ravel()
v2_preds = (v2_probs > 0.5).astype(int)
v2_auc = roc_auc_score(y_test, v2_probs)

v2_fpr, v2_tpr, _ = roc_curve(y_test, v2_probs)

plt.figure(figsize=(8, 6))
plt.plot(lr_fpr, lr_tpr, color="#0f3460", linewidth=2, label=f"LR (AUC={lr_auc:.3f})")
plt.plot(ann_fpr, ann_tpr, color="#e94560", linewidth=2, label=f"ANN v1 (AUC={ann_auc:.3f})")
plt.plot(v2_fpr, v2_tpr, color="#2ecc71", linewidth=2, label=f"ANN v2 - SGD (AUC={v2_auc:.3f})")
plt.plot([0, 1], [0, 1], "k--", alpha=0.5)
plt.title("Three-Model ROC Comparison")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"ANN v2 Accuracy: {accuracy_score(y_test, v2_preds):.4f}")
print(f"ANN v2 AUC: {v2_auc:.4f}")

**Bonus interpretation (3–4 sentences):**

**Sample:** The SGD-optimized model with a wider architecture (64→32→16) and higher dropout rates performed comparably to the Adam-optimized model, though it may have converged more slowly. The difference in AUC between the two ANN variants is likely within random variation, suggesting this dataset doesn't have enough complexity to reward a deeper architecture. This tells us that for this particular churn problem, the bottleneck is the features, not the model; more sophisticated architectures can't extract signal that isn't in the data.
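
One way to check a claim like "within random variation" is a paired bootstrap on the test set. The sketch below uses synthetic labels and scores as stand-ins for `y_test`, `ann_probabilities`, and `v2_probs`; all names and the number of resamples are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Synthetic labels and two sets of scores with similar quality (not lab results)
y_demo = rng.integers(0, 2, size=500)
scores_a = np.clip(0.3 * y_demo + rng.normal(0.4, 0.2, size=500), 0, 1)
scores_b = np.clip(0.3 * y_demo + rng.normal(0.4, 0.2, size=500), 0, 1)

diffs = []
for _ in range(500):
    idx = rng.integers(0, len(y_demo), size=len(y_demo))  # resample rows with replacement
    if y_demo[idx].min() == y_demo[idx].max():
        continue                                           # AUC needs both classes present
    auc_a = roc_auc_score(y_demo[idx], scores_a[idx])
    auc_b = roc_auc_score(y_demo[idx], scores_b[idx])
    diffs.append(auc_a - auc_b)

low, high = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap interval for the AUC difference: [{low:.3f}, {high:.3f}]")
```

If the interval contains zero, the observed difference is compatible with test-set sampling noise. This only covers variation from the test sample; variation from random weight initialization would require retraining with several seeds.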

---
## Troubleshooting

| Problem | Fix |
|---------|-----|
| ANN predicts all one class (accuracy = majority-class rate, i.e. 1 − churn rate) | Check architecture: may need more neurons or a different learning rate |
| `ValueError: shapes not aligned` | Verify `input_shape=(n_features,)` matches your feature count |
| Option B accuracy is suspiciously high (>95%) | Check that Naive Bayes columns were dropped |
| ROC curve is a straight line | Using predictions (0/1) instead of probabilities |
| Training runs all 200 epochs | EarlyStopping not in callbacks list |
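
The "ROC curve is a straight line" row can be reproduced on a toy example. The labels and scores below are invented; the sketch only counts how many points `roc_curve` returns when given probabilities versus thresholded 0/1 predictions.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Invented labels and scores for ten customers
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
probs = np.array([0.05, 0.20, 0.35, 0.40, 0.45, 0.60, 0.70, 0.30, 0.85, 0.95])
hard = (probs > 0.5).astype(int)   # thresholded predictions lose the ranking

fpr_p, tpr_p, _ = roc_curve(y_true, probs)
fpr_h, tpr_h, _ = roc_curve(y_true, hard)

print("Points on ROC from probabilities:", len(fpr_p))
print("Points on ROC from 0/1 labels:   ", len(fpr_h))
```

With only two distinct score values, the curve from hard labels has a single interior point, so the plot is two straight segments. Passing the output of `predict_proba(...)[:, 1]` or `model.predict(...)` restores the full curve.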

---
<p style="color:#7F8C8D; font-size:0.85em;">
<em>CAP4767 Data Mining with Python | Miami Dade College | Spring 2026</em><br>
Lab 3: Churn Prediction: Full Pipeline | 20 Points (+3 Bonus)
</p>