# Supervised Churn Prediction Modeling

**Author:** Harpinder Singh  
**Dataset:** UK E-Commerce Customer Features  
**Objective:** Build and compare supervised models for churn prediction

---

## Table of Contents
1. [Environment Setup](#1.-Environment-Setup)
2. [Load Features & Prepare Data](#2.-Load-Features-&-Prepare-Data)
3. [Train-Test Split](#3.-Train-Test-Split)
4. [Baseline Models](#4.-Baseline-Models)
5. [Advanced Models](#5.-Advanced-Models)
6. [Model Comparison](#6.-Model-Comparison)
7. [Hyperparameter Tuning](#7.-Hyperparameter-Tuning)
8. [Final Model Selection](#8.-Final-Model-Selection)
9. [Export Results](#9.-Export-Results)

---

## 1. Environment Setup

╔════════════════════════════════════════════════════════════════╗
║                    ENVIRONMENT CONFIGURATION                    ║
╚════════════════════════════════════════════════════════════════╝

In [1]:
# Standard imports
import warnings
from pathlib import Path
import time

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# ML imports
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
    confusion_matrix, classification_report, precision_recall_curve, roc_curve
)

# Models
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

# Model persistence
import pickle

# Configuration
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:.4f}')

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

print("✅ Environment configured")

✅ Environment configured


In [2]:
# ┌────────────────────────────────────────────────────────────┐
# │ Directory Setup                                            │
# └────────────────────────────────────────────────────────────┘

PROJECT_ROOT = Path.cwd()
if PROJECT_ROOT.name == 'notebooks':
    PROJECT_ROOT = PROJECT_ROOT.parent

DIR_DATA_PROCESSED = PROJECT_ROOT / 'data' / 'processed'
DIR_MODELS = PROJECT_ROOT / 'models'
DIR_RESULTS = PROJECT_ROOT / 'results'
DIR_FIGURES = PROJECT_ROOT / 'results' / 'figures'

for directory in [DIR_MODELS, DIR_RESULTS, DIR_FIGURES]:
    directory.mkdir(parents=True, exist_ok=True)

print("✅ Directories ready")

✅ Directories ready


---

## 2. Load Features & Prepare Data

╔════════════════════════════════════════════════════════════════╗
║                      DATA LOADING                               ║
╚════════════════════════════════════════════════════════════════╝

In [3]:
# ┌────────────────────────────────────────────────────────────┐
# │ Load Engineered Features                                   │
# └────────────────────────────────────────────────────────────┘

# Load features from Phase 3
data = pd.read_csv(DIR_DATA_PROCESSED / 'churn_features.csv')

print("Dataset Overview:")
print("="*80)
print(f"Shape: {data.shape}")
print(f"Churn rate: {data['churned'].mean()*100:.1f}%")
print("\nClass distribution:")
print(data['churned'].value_counts())

# Separate features and target
X = data.drop(['CustomerID', 'churned'], axis=1)
y = data['churned']

print(f"\nFeatures shape: {X.shape}")
print(f"Target shape: {y.shape}")
print("\nFeature columns:")
for i, col in enumerate(X.columns, 1):
    print(f"  {i:2d}. {col}")

Dataset Overview:
Shape: (815, 32)
Churn rate: 26.9%

Class distribution:
churned
0    596
1    219
Name: count, dtype: int64

Features shape: (815, 30)
Target shape: (815,)

Feature columns:
   1. Recency
   2. Frequency
   3. Monetary
   4. Tenure
   5. AvgOrderValue
   6. AvgBasketSize
   7. prob_alive
   8. predicted_purchases_30d
   9. predicted_purchases_90d
  10. predicted_purchases_180d
  11. predicted_avg_value
  12. CLV_90d
  13. CLV_180d
  14. CLV_365d
  15. revenue_velocity
  16. quantity_velocity
  17. purchase_gap_velocity
  18. early_period_revenue
  19. late_period_revenue
  20. revenue_trend
  21. day_of_week_diversity
  22. weekend_purchase_ratio
  23. purchase_gap_mean
  24. purchase_gap_std
  25. purchase_gap_cv
  26. purchase_regularity
  27. unique_products
  28. avg_items_per_order
  29. product_diversity_ratio
  30. product_exploration_rate


---

## 3. Train-Test Split

╔════════════════════════════════════════════════════════════════╗
║                   TRAIN-TEST SPLIT                              ║
╚════════════════════════════════════════════════════════════════╝

In [4]:
# ┌────────────────────────────────────────────────────────────┐
# │ Split Data                                                 │
# └────────────────────────────────────────────────────────────┘

# 70/30 split, stratified on the target so both sets keep the overall churn rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=RANDOM_STATE,
    stratify=y
)

print("Data Split:")
print("="*80)
print(f"Training set: {X_train.shape[0]} samples")
print(f"  Active: {(y_train == 0).sum()} ({(y_train == 0).sum()/len(y_train)*100:.1f}%)")
print(f"  Churned: {(y_train == 1).sum()} ({(y_train == 1).sum()/len(y_train)*100:.1f}%)")

print(f"\nTest set: {X_test.shape[0]} samples")
print(f"  Active: {(y_test == 0).sum()} ({(y_test == 0).sum()/len(y_test)*100:.1f}%)")
print(f"  Churned: {(y_test == 1).sum()} ({(y_test == 1).sum()/len(y_test)*100:.1f}%)")

Data Split:
Training set: 570 samples
  Active: 417 (73.2%)
  Churned: 153 (26.8%)

Test set: 245 samples
  Active: 179 (73.1%)
  Churned: 66 (26.9%)
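
With roughly 73% of customers active, a classifier that never predicts churn already scores well on accuracy. The sketch below makes that reference point explicit. It rebuilds label vectors from the class counts printed above and fits scikit-learn's `DummyClassifier`; the variable names and the all-zero feature matrix are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# Labels rebuilt from the class counts printed above (0 = active, 1 = churned).
y_train_counts = np.array([0] * 417 + [1] * 153)
y_test_counts = np.array([0] * 179 + [1] * 66)

# A most-frequent baseline ignores the features, so placeholder zeros suffice.
X_train_dummy = np.zeros((len(y_train_counts), 1))
X_test_dummy = np.zeros((len(y_test_counts), 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_dummy, y_train_counts)

baseline_pred = baseline.predict(X_test_dummy)
baseline_proba = baseline.predict_proba(X_test_dummy)[:, 1]

baseline_accuracy = accuracy_score(y_test_counts, baseline_pred)
baseline_recall = recall_score(y_test_counts, baseline_pred)
baseline_auc = roc_auc_score(y_test_counts, baseline_proba)

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Baseline recall:   {baseline_recall:.4f}")
print(f"Baseline ROC-AUC:  {baseline_auc:.4f}")
```

The baseline reaches about 0.73 accuracy while catching no churners at all, so recall, F1, and ROC-AUC are the more informative metrics for this split.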


In [5]:
# ┌────────────────────────────────────────────────────────────┐
# │ Feature Scaling                                            │
# └────────────────────────────────────────────────────────────┘

# Standardize features (fit on train only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert back to DataFrame for easier handling
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)

print("✅ Features scaled (StandardScaler)")
print(f"   Mean: {X_train_scaled.mean().mean():.4f}")
print(f"   Std: {X_train_scaled.std().mean():.4f}")

✅ Features scaled (StandardScaler)
   Mean: 0.0000
   Std: 1.0009
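
The reported standard deviation of 1.0009 rather than exactly 1 is expected. `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas' `.std()` defaults to the sample standard deviation (`ddof=1`). The sketch below reproduces the effect on a synthetic column; the random data is an assumption standing in for a real feature.

```python
import numpy as np

n_train = 570  # training rows reported in the split above

# Synthetic stand-in for one feature column (assumption, not notebook data).
rng = np.random.default_rng(42)
feature = rng.normal(loc=50.0, scale=12.0, size=n_train)

# StandardScaler divides by the population standard deviation (ddof=0).
scaled = (feature - feature.mean()) / feature.std(ddof=0)

# pandas' .std() defaults to the sample standard deviation (ddof=1).
sample_std = scaled.std(ddof=1)
expected_ratio = np.sqrt(n_train / (n_train - 1))

print(f"Sample std of scaled column: {sample_std:.4f}")
print(f"sqrt(n / (n - 1)):           {expected_ratio:.4f}")
```

Both values round to 1.0009 for n = 570, which matches the figure printed by the notebook.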


---

## 4. Baseline Models

╔════════════════════════════════════════════════════════════════╗
║                    BASELINE MODELS                              ║
╚════════════════════════════════════════════════════════════════╝

### Models:
1. **Logistic Regression** - Linear baseline
2. **Decision Tree** - Simple non-linear baseline

In [6]:
# ┌────────────────────────────────────────────────────────────┐
# │ Model Evaluation Helper Function                          │
# └────────────────────────────────────────────────────────────┘

def evaluate_model(model, X_train, X_test, y_train, y_test, model_name):
    """
    Train and evaluate a classification model.
    
    Parameters
    ----------
    model : estimator
        scikit-learn-compatible classifier that implements ``predict_proba``
    X_train : array-like
        Training features
    X_test : array-like
        Test features
    y_train : array-like
        Training labels
    y_test : array-like
        Test labels
    model_name : str
        Name of the model
    
    Returns
    -------
    dict
        Fitted model, training time, test-set metrics, and test-set predictions
    """
    # Train (perf_counter is a monotonic clock suited to timing short intervals)
    start_time = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start_time
    
    # Predict
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    
    # Metrics
    results = {
        'model_name': model_name,
        'model': model,
        'train_time': train_time,
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred),
        'recall': recall_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred_proba),
        'y_pred': y_pred,
        'y_pred_proba': y_pred_proba
    }
    
    return results

print("✅ Evaluation function defined")

✅ Evaluation function defined


In [7]:
# ┌────────────────────────────────────────────────────────────┐
# │ Model 1: Logistic Regression                              │
# └────────────────────────────────────────────────────────────┘

print("Training Logistic Regression...")

lr_model = LogisticRegression(
    random_state=RANDOM_STATE,
    max_iter=1000,
    class_weight='balanced'  # Handle class imbalance
)

lr_results = evaluate_model(
    lr_model, X_train_scaled, X_test_scaled, y_train, y_test, 
    'Logistic Regression'
)

print("\n✅ Logistic Regression Results:")
print(f"   Accuracy: {lr_results['accuracy']:.4f}")
print(f"   Precision: {lr_results['precision']:.4f}")
print(f"   Recall: {lr_results['recall']:.4f}")
print(f"   F1-Score: {lr_results['f1']:.4f}")
print(f"   ROC-AUC: {lr_results['roc_auc']:.4f}")
print(f"   Train time: {lr_results['train_time']:.2f}s")

Training Logistic Regression...

✅ Logistic Regression Results:
   Accuracy: 0.5633
   Precision: 0.3306
   Recall: 0.6061
   F1-Score: 0.4278
   ROC-AUC: 0.6332
   Train time: 0.06s
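
To see what `class_weight='balanced'` does with this training set, the sketch below computes the weights with scikit-learn's `compute_class_weight` and compares them with the documented formula `n_samples / (n_classes * bincount(y))`. The label vector is rebuilt from the training counts printed earlier; its name is an illustrative assumption.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Training labels rebuilt from the split output (417 active, 153 churned).
y_train_counts = np.array([0] * 417 + [1] * 153)
classes = np.array([0, 1])

# Weights that class_weight='balanced' assigns to each class.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train_counts)

# The same weights from the formula n_samples / (n_classes * bincount(y)).
manual = len(y_train_counts) / (len(classes) * np.bincount(y_train_counts))

print(f"Weight for active (0):  {weights[0]:.4f}")
print(f"Weight for churned (1): {weights[1]:.4f}")
print(f"Churned/active ratio:   {weights[1] / weights[0]:.4f}")
```

Each churned customer counts about 2.7 times as much as an active one in the loss, which pushes the model toward higher recall at the cost of precision.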


In [8]:
# ┌────────────────────────────────────────────────────────────┐
# │ Model 2: Decision Tree                                     │
# └────────────────────────────────────────────────────────────┘

print("Training Decision Tree...")

dt_model = DecisionTreeClassifier(
    random_state=RANDOM_STATE,
    max_depth=5,
    min_samples_split=20,
    min_samples_leaf=10,
    class_weight='balanced'
)

dt_results = evaluate_model(
    dt_model, X_train, X_test, y_train, y_test,  # No scaling needed for trees
    'Decision Tree'
)

print("\n✅ Decision Tree Results:")
print(f"   Accuracy: {dt_results['accuracy']:.4f}")
print(f"   Precision: {dt_results['precision']:.4f}")
print(f"   Recall: {dt_results['recall']:.4f}")
print(f"   F1-Score: {dt_results['f1']:.4f}")
print(f"   ROC-AUC: {dt_results['roc_auc']:.4f}")
print(f"   Train time: {dt_results['train_time']:.2f}s")

Training Decision Tree...

✅ Decision Tree Results:
   Accuracy: 0.4735
   Precision: 0.3179
   Recall: 0.8333
   F1-Score: 0.4603
   ROC-AUC: 0.5747
   Train time: 0.05s
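
The high recall and low accuracy are easier to interpret as confusion-matrix counts. The sketch below backs those counts out of the metrics printed above and the test-set class sizes from the split. The counts are a reconstruction from rounded figures, not notebook output, so the code also recomputes every metric as a consistency check.

```python
# Test-set class counts from the split output.
n_churned, n_active = 66, 179

# Metrics reported for the Decision Tree above.
reported = {"accuracy": 0.4735, "precision": 0.3179, "recall": 0.8333, "f1": 0.4603}

# Back out integer confusion-matrix counts (a reconstruction, not notebook output).
tp = round(reported["recall"] * n_churned)
predicted_positive = round(tp / reported["precision"])
fp = predicted_positive - tp
fn = n_churned - tp
tn = n_active - fp

# Recompute each metric from the counts to confirm they match the report.
recomputed = {
    "accuracy": (tp + tn) / (n_churned + n_active),
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
    "f1": 2 * tp / (2 * tp + fp + fn),
}

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
for name, value in recomputed.items():
    print(f"{name:9s} reported={reported[name]:.4f} recomputed={value:.4f}")
```

Under this reconstruction the tree catches 55 of 66 churners but also flags 118 of the 179 active customers, which explains why accuracy falls below 0.5.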


---

## 5. Advanced Models

╔════════════════════════════════════════════════════════════════╗
║                    ADVANCED MODELS                              ║
╚════════════════════════════════════════════════════════════════╝

### Ensemble Models:
3. **Random Forest** - Bagging ensemble
4. **LightGBM** - Gradient boosting (primary)
5. **CatBoost** - Gradient boosting
6. **XGBoost** - Gradient boosting

In [9]:
# ┌────────────────────────────────────────────────────────────┐
# │ Model 3: Random Forest                                     │
# └────────────────────────────────────────────────────────────┘

print("Training Random Forest...")

rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=RANDOM_STATE,
    class_weight='balanced',
    n_jobs=-1
)

rf_results = evaluate_model(
    rf_model, X_train, X_test, y_train, y_test,
    'Random Forest'
)

print("\n✅ Random Forest Results:")
print(f"   Accuracy: {rf_results['accuracy']:.4f}")
print(f"   Precision: {rf_results['precision']:.4f}")
print(f"   Recall: {rf_results['recall']:.4f}")
print(f"   F1-Score: {rf_results['f1']:.4f}")
print(f"   ROC-AUC: {rf_results['roc_auc']:.4f}")
print(f"   Train time: {rf_results['train_time']:.2f}s")

Training Random Forest...

✅ Random Forest Results:
   Accuracy: 0.6816
   Precision: 0.4000
   Recall: 0.3636
   F1-Score: 0.3810
   ROC-AUC: 0.6423
   Train time: 0.33s
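
All metrics above except ROC-AUC use the default 0.5 decision threshold, so part of the recall gap between models may reflect where each model's scores sit rather than how well they rank customers. The sketch below shows one way to scan thresholds with `precision_recall_curve` and pick the one that maximizes F1. The scores are synthetic (an assumption), and in practice the threshold would be chosen on validation data rather than the test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(42)

# Labels sized like the test set; scores are hypothetical, not model output.
y_true = np.array([0] * 179 + [1] * 66)
scores = np.clip(rng.normal(loc=0.35 + 0.15 * y_true, scale=0.15), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# The final (precision=1, recall=0) point has no threshold, so drop it.
p, r = precision[:-1], recall[:-1]
denom = p + r
f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)

best = int(np.argmax(f1))
best_threshold = thresholds[best]

print(f"Best threshold: {best_threshold:.3f}")
print(f"Precision: {p[best]:.3f}  Recall: {r[best]:.3f}  F1: {f1[best]:.3f}")
```

Predictions at the chosen threshold are then `scores >= best_threshold`, which replaces the implicit 0.5 cut-off used by `predict`.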


In [10]:
# ┌────────────────────────────────────────────────────────────┐
# │ Model 4: LightGBM                                          │
# └────────────────────────────────────────────────────────────┘

print("Training LightGBM...")

# Weight the positive (churned) class by the negative-to-positive ratio in the training set
scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

lgbm_model = LGBMClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    num_leaves=31,
    scale_pos_weight=scale_pos_weight,
    random_state=RANDOM_STATE,
    verbose=-1
)

lgbm_results = evaluate_model(
    lgbm_model, X_train, X_test, y_train, y_test,
    'LightGBM'
)

print("\n✅ LightGBM Results:")
print(f"   Accuracy: {lgbm_results['accuracy']:.4f}")
print(f"   Precision: {lgbm_results['precision']:.4f}")
print(f"   Recall: {lgbm_results['recall']:.4f}")
print(f"   F1-Score: {lgbm_results['f1']:.4f}")
print(f"   ROC-AUC: {lgbm_results['roc_auc']:.4f}")
print(f"   Train time: {lgbm_results['train_time']:.2f}s")

Training LightGBM...

✅ LightGBM Results:
   Accuracy: 0.6408
   Precision: 0.3036
   Recall: 0.2576
   F1-Score: 0.2787
   ROC-AUC: 0.5856
   Train time: 5.63s
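
With only 66 churners in the test set, a single hold-out score is a noisy estimate, and differences of a few ROC-AUC points between models may not be meaningful. A minimal sketch of stratified cross-validation follows, using the already imported `cross_val_score`. It assumes synthetic data from `make_classification` and uses logistic regression as a stand-in model; wrapping the scaler in a pipeline keeps it fitted on each training fold only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in sized like the churn dataset (assumption, not real data).
X_demo, y_demo = make_classification(
    n_samples=815, n_features=30, n_informative=5,
    weights=[0.73, 0.27], random_state=42,
)

# The pipeline refits the scaler inside every fold, avoiding leakage.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, class_weight="balanced", random_state=42),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc_scores = cross_val_score(pipeline, X_demo, y_demo, cv=cv, scoring="roc_auc")

print(f"ROC-AUC per fold: {np.round(auc_scores, 4)}")
print(f"Mean: {auc_scores.mean():.4f}  Std: {auc_scores.std():.4f}")
```

The spread across folds gives a rough sense of how much a single split's score can move, which helps when ranking models whose hold-out scores are close.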
