# CatBoost Baseline for Bank Customer Churn (English Version)

This notebook implements a **CatBoost baseline model** for bank customer churn prediction.
It follows the experimental requirements from the slides and reports the following metrics:

- ROC-AUC
- PR-AUC (Average Precision)
- F1-score
- Balanced Accuracy
- Brier Score
- Top-K metrics: Precision@K / Recall@K (default: top 10% highest-risk customers)
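
Two of the metrics listed above are easy to verify by hand. The sketch below uses made-up labels and probabilities (not data from this notebook) to compute the Brier score as the mean squared difference between predicted probability and outcome, and Balanced Accuracy as the average of the true-positive and true-negative rates at a 0.5 threshold.

```python
import numpy as np

# Toy values for illustration only; not taken from the churn dataset.
y_true = np.array([1, 0, 0, 1])
y_proba = np.array([0.9, 0.2, 0.6, 0.4])

# Brier score: mean squared error between probability and 0/1 outcome.
brier = float(np.mean((y_proba - y_true) ** 2))

# Balanced accuracy at a 0.5 threshold: average of the recall on each class.
y_pred = (y_proba >= 0.5).astype(int)
tpr = float(np.mean(y_pred[y_true == 1] == 1))  # recall on churners
tnr = float(np.mean(y_pred[y_true == 0] == 0))  # recall on non-churners
balanced_accuracy = (tpr + tnr) / 2

print(f"Brier = {brier:.4f}, Balanced Accuracy = {balanced_accuracy:.2f}")
```

Lower is better for the Brier score, while the other metrics in the list are higher-is-better.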

Adjust `DATA_PATH` to your actual dataset location. The example assumes a Kaggle-style bank churn dataset with a binary churn label.

In [25]:
# 0. Environment & dependencies
#!pip install catboost

import numpy as np
import pandas as pd
from pathlib import Path

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    roc_auc_score,
    average_precision_score,
    f1_score,
    balanced_accuracy_score,
    brier_score_loss,
)

from catboost import CatBoostClassifier

import joblib

RANDOM_STATE = 2025
np.random.seed(RANDOM_STATE)

# Default path; change if needed
DATA_PATH = Path('churn_clean.csv')

In [17]:
# 1. Load data
assert DATA_PATH.exists(), f"Data file not found: {DATA_PATH}. Please check the path."
df = pd.read_csv(DATA_PATH)
print(df.shape)
df.head()

(10000, 11)


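|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |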


## 2. Define features and target

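- Assumed target column: `Exited` (0 = retained, 1 = churned). The code also accepts `Churn` or `Attrition_Flag`, matched case-insensitively, and requires exactly one match.
- Drop ID-like columns such as `RowNumber`, `CustomerId`, `Customer_Id`, and `Surname` if they exist (matched case-insensitively).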
- Use CatBoost native categorical handling: pass categorical feature indices via `cat_features`.
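
As a minimal sketch of how categorical column positions can be derived, the example below treats every non-numeric column as categorical. This is an assumed alternative to a plain `object`-dtype check, which may be useful when string columns are stored with pandas' dedicated string dtype. The frame `X_demo` and its values are made up for illustration.

```python
import pandas as pd

# Toy frame mimicking a few columns of the churn data (values are made up).
X_demo = pd.DataFrame({
    "CreditScore": [619, 608],
    "Geography": ["France", "Spain"],
    "Gender": ["Female", "Male"],
    "Age": [42, 41],
})

# Treat every non-numeric column as categorical and record its position.
cat_idx_demo = [
    i for i, col in enumerate(X_demo.columns)
    if not pd.api.types.is_numeric_dtype(X_demo[col])
]
cat_names_demo = [X_demo.columns[i] for i in cat_idx_demo]

print(cat_idx_demo, cat_names_demo)
```

The resulting index list has the same form as the `cat_features` argument used when fitting the model.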

In [18]:
# 2.1 Identify target column
target_col_candidates = [c for c in df.columns if c.lower() in ['exited', 'churn', 'attrition_flag']]
assert len(target_col_candidates) == 1, (
    'Unable to uniquely identify target column. '
    f'Candidates: {target_col_candidates}. Columns: {df.columns.tolist()}'
)
TARGET_COL = target_col_candidates[0]
print('Target column:', TARGET_COL)

# 2.2 Drop ID-like columns (if present)
drop_cols = [c for c in df.columns if c.lower() in ['rownumber', 'customerid', 'customer_id', 'surname']]
print('Dropping ID-like columns:', drop_cols)

df_model = df.drop(columns=drop_cols)

# 2.3 Split into features and label
X = df_model.drop(columns=[TARGET_COL])
y = df_model[TARGET_COL].astype(int)

# 2.4 Detect categorical features by dtype
cat_features = [i for i, c in enumerate(X.columns)
                if X[c].dtype == 'object' or str(X[c].dtype).startswith('category')]
print('Categorical feature indices:', cat_features)
print('Categorical feature names:', [X.columns[i] for i in cat_features])

Target column: Exited
Dropping ID-like columns: []
Categorical feature indices: [1, 2]
Categorical feature names: ['Geography', 'Gender']


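## 3. Train / validation / test split

- Stratified splits preserve the churn rate in every subset.
- 60/20/20 train/validation/test split: hold out 40% of the data, then split that holdout in half.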
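
The 60/20/20 proportions follow from applying two splits in sequence. The arithmetic below is a small sketch for the 10,000-row dataset loaded earlier; the variable names are illustrative only.

```python
# Two-stage split arithmetic for a 10,000-row dataset.
n_total = 10_000
holdout_frac = 0.4          # first split: 40% held out of training
test_frac_of_holdout = 0.5  # second split: half of the holdout becomes test

n_holdout = round(n_total * holdout_frac)
n_train = n_total - n_holdout
n_test = round(n_holdout * test_frac_of_holdout)
n_val = n_holdout - n_test

print(n_train, n_val, n_test)
print(n_train / n_total, n_val / n_total, n_test / n_total)
```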

In [19]:
X_train, X_temp, y_train, y_temp = train_test_split(
    X,
    y,
    test_size=0.4,
    random_state=RANDOM_STATE,
    stratify=y,
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp,
    y_temp,
    test_size=0.5,
    random_state=RANDOM_STATE,
    stratify=y_temp,
)

print('Train shape:', X_train.shape, 'Target ratio:', y_train.mean().round(4))
print('Validation shape:', X_val.shape, 'Target ratio:', y_val.mean().round(4))
print('Test shape:', X_test.shape, 'Target ratio:', y_test.mean().round(4))


Train shape: (6000, 10) Target ratio: 0.2037
Validation shape: (2000, 10) Target ratio: 0.204
Test shape: (2000, 10) Target ratio: 0.2035


## 4. Evaluation utilities (including Top-K metrics)

Top-K metrics definition:
- Let **K** be the number of customers in the top fraction (e.g., 10%) of the evaluated set when ranked by predicted churn probability, i.e. K = max(1, floor(k_ratio × N)).
- **Precision@K**: fraction of true churners among the top-K ranked customers.
- **Recall@K**: fraction of all churners covered by the top-K ranked customers.
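
A worked example may make these definitions concrete. The sketch below uses ten made-up customers and a 30% cut-off (not values from this notebook): the top 3 customers contain 2 of the 4 churners, so Precision@K is 2/3 and Recall@K is 2/4.

```python
import numpy as np

# Toy labels and scores for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 0])
y_proba = np.array([0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05])
k_ratio = 0.3

# K is the number of customers kept, never fewer than one.
k = max(1, int(np.floor(len(y_true) * k_ratio)))

# Rank by descending probability and keep the K highest-risk customers.
top_idx = np.argsort(-y_proba)[:k]
tp_at_k = int(y_true[top_idx].sum())

precision_at_k = tp_at_k / k
recall_at_k = tp_at_k / int(y_true.sum())

print(k, tp_at_k, round(precision_at_k, 4), recall_at_k)
```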

In [20]:
def topk_precision_recall(y_true, y_proba, k_ratio: float = 0.1):
    """Compute Precision@K and Recall@K, where K = k_ratio * N (at least 1).

    Args:
        y_true: array-like of shape (n_samples,), binary ground truth (0/1).
        y_proba: array-like of shape (n_samples,), predicted probabilities for the positive class.
        k_ratio: float in (0, 1], fraction of samples to select as top-K.
    """
    assert 0 < k_ratio <= 1, "k_ratio must be in (0, 1]."
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)

    n = len(y_true)
    k = max(1, int(np.floor(n * k_ratio)))

    # sort indices by predicted probability (descending); stable sort keeps tie order deterministic
    order = np.argsort(-y_proba, kind='stable')
    topk_idx = order[:k]

    y_topk = y_true[topk_idx]
    tp_at_k = y_topk.sum()
    total_positives = y_true.sum()

    precision_at_k = tp_at_k / k
    recall_at_k = tp_at_k / total_positives if total_positives > 0 else 0.0

    return {
        'K': k,
        'Precision@K': precision_at_k,
        'Recall@K': recall_at_k,
    }


def evaluate_classifier(y_true, y_proba, threshold: float = 0.5, k_ratio: float = 0.1):
    """Compute all required metrics for a binary classifier.

    Returns a dictionary containing:
    ROC-AUC, PR-AUC, F1, Balanced Accuracy, Brier Score, K, Precision@K, Recall@K.
    """
    y_true_arr = y_true.values if isinstance(y_true, pd.Series) else np.asarray(y_true)
    y_proba_arr = np.asarray(y_proba)

    # Threshold-based predictions
    y_pred = (y_proba_arr >= threshold).astype(int)

    # Core metrics
    roc = roc_auc_score(y_true_arr, y_proba_arr)
    pr_auc = average_precision_score(y_true_arr, y_proba_arr)
    f1 = f1_score(y_true_arr, y_pred)
    bal_acc = balanced_accuracy_score(y_true_arr, y_pred)
    brier = brier_score_loss(y_true_arr, y_proba_arr)

    # Top-K metrics
    topk = topk_precision_recall(y_true_arr, y_proba_arr, k_ratio=k_ratio)

    metrics = {
        'ROC-AUC': roc,
        'PR-AUC': pr_auc,
        'F1': f1,
        'BalancedAccuracy': bal_acc,
        'BrierScore': brier,
        'K': topk['K'],
        'Precision@K': topk['Precision@K'],
        'Recall@K': topk['Recall@K'],
    }

    return metrics

## 5. Train CatBoost baseline model

Design choices:
- Use class weights to handle label imbalance (computed from training set).
- Moderate tree depth (6) and up to 2,000 trees at a learning rate of 0.05.
- Enable early stopping based on validation AUC (stop after 50 iterations without improvement).
- Keep it **strong but still a baseline**, without heavy tuning.
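
The class-weight idea can be sketched on a toy label vector: weighting the positive class by the negative-to-positive ratio makes both classes contribute the same total weight to the loss. The array `y_demo` is made up for illustration.

```python
import numpy as np

# Toy labels: 8 retained customers, 2 churners (illustrative only).
y_demo = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

n_pos = int(y_demo.sum())
n_neg = int(len(y_demo) - n_pos)

# Positive-class weight = negatives / positives; the negative class keeps weight 1.
scale_pos_weight_demo = n_neg / n_pos
class_weights_demo = [1.0, scale_pos_weight_demo]  # [negative, positive]

# After weighting, both classes carry the same total weight.
total_neg_weight = n_neg * class_weights_demo[0]
total_pos_weight = n_pos * class_weights_demo[1]

print(class_weights_demo, total_neg_weight, total_pos_weight)
```

One consequence of this weighting is that predicted probabilities tend to shift upward for the minority class, which is worth keeping in mind when reading calibration-oriented metrics such as the Brier score.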

In [21]:
# 5.1 Compute class weight for positive class
pos_frac = y_train.mean()
neg_frac = 1 - pos_frac
scale_pos_weight = float(neg_frac / pos_frac)
print(f"Positive ratio = {pos_frac:.4f}, scale_pos_weight = {scale_pos_weight:.3f}")

# 5.2 Define CatBoostClassifier
cat_model = CatBoostClassifier(
    loss_function='Logloss',
    eval_metric='AUC',
    learning_rate=0.05,
    depth=6,
    n_estimators=2000,
    random_seed=RANDOM_STATE,
    class_weights=[1.0, scale_pos_weight],  # [negative, positive]
    od_type='Iter',         # early stopping
    od_wait=50,
    use_best_model=True,
    verbose=False,
)

# 5.3 Fit model using native categorical handling and validation set
cat_model.fit(
    X_train,
    y_train,
    cat_features=cat_features,
    eval_set=(X_val, y_val),
    verbose=False,
)

best_iter = getattr(cat_model, 'get_best_iteration', lambda: -1)()
if best_iter is not None and best_iter >= 0:
    print(f'CatBoost baseline model trained. Best iteration: {best_iter}')
else:
    print('CatBoost baseline model trained.')


Positive ratio = 0.2037, scale_pos_weight = 3.910
CatBoost baseline model trained. Best iteration: 127


## 6. Evaluate on the validation and test sets

- Use predicted probabilities from `predict_proba`.
- Default decision threshold: 0.5 (can be optimized later using cost/benefit analysis).
- Default Top-K ratio: 10% of each evaluated set (K = 200 for the 2,000-row validation and test sets).
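
One possible form of the cost/benefit threshold optimization mentioned above is a simple sweep over candidate thresholds. The sketch below is an assumption, not part of this notebook: the helper `best_threshold_by_cost`, the cost values `cost_fp` and `cost_fn`, and the toy data are all hypothetical. In practice the sweep would be run on the validation split, leaving the test split untouched.

```python
import numpy as np

def best_threshold_by_cost(y_true, y_proba, cost_fp=1.0, cost_fn=5.0, grid=None):
    """Return (threshold, cost) with the lowest total misclassification cost.

    cost_fp and cost_fn are hypothetical business costs of a false alarm
    and a missed churner; replace them with real estimates.
    """
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # thresholds 0.05, 0.10, ..., 0.95

    costs = []
    for t in grid:
        y_pred = (y_proba >= t).astype(int)
        fp = int(((y_pred == 1) & (y_true == 0)).sum())
        fn = int(((y_pred == 0) & (y_true == 1)).sum())
        costs.append(cost_fp * fp + cost_fn * fn)

    best = int(np.argmin(costs))  # first threshold reaching the minimum cost
    return float(grid[best]), float(costs[best])

# Toy data for illustration only.
y_demo = np.array([0, 0, 1, 0, 1, 1])
p_demo = np.array([0.12, 0.31, 0.37, 0.62, 0.71, 0.93])
threshold_demo, cost_demo = best_threshold_by_cost(y_demo, p_demo)
print(threshold_demo, cost_demo)
```

When missed churners cost more than false alarms, as assumed here, the selected threshold typically falls below 0.5.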

In [22]:
y_val_proba = cat_model.predict_proba(X_val)[:, 1]
y_test_proba = cat_model.predict_proba(X_test)[:, 1]

metrics_val = evaluate_classifier(y_val, y_val_proba, threshold=0.5, k_ratio=0.1)
metrics_test = evaluate_classifier(y_test, y_test_proba, threshold=0.5, k_ratio=0.1)

metrics_df = pd.DataFrame([
    {'Dataset': 'Validation', **metrics_val},
    {'Dataset': 'Test', **metrics_test},
])
metrics_df


|   | Dataset | ROC-AUC | PR-AUC | F1 | BalancedAccuracy | BrierScore | K | Precision@K | Recall@K |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Validation | 0.876752 | 0.715782 | 0.62173 | 0.791679 | 0.137171 | 200 | 0.83 | 0.406863 |
| 1 | Test | 0.869734 | 0.718839 | 0.611222 | 0.784925 | 0.137454 | 200 | 0.855 | 0.420147 |


The CatBoost baseline performs consistently across the validation and test sets; because the validation set also drove early stopping, the test-set figures are the less biased estimate.
ROC-AUC (0.877 validation / 0.870 test) and PR-AUC (0.716 / 0.719) indicate strong ranking ability for a churn rate of about 20%, while F1 (0.62 / 0.61) and Balanced Accuracy (0.79 / 0.78) at the default 0.5 threshold are close across the two splits.
The Top-K analysis on the test set (Precision@10% = 0.855, Recall@10% = 0.420) shows practical value for targeted retention: the 10% of customers with the highest predicted risk contain over 42% of all churners.
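
The Top-K figures in the results table can be cross-checked with simple arithmetic. The sketch below takes the test-set size, churn rate, K, and Precision@K reported in the outputs above, recovers Recall@K from them, and derives the lift over random targeting (a quantity not reported in the table).

```python
# Values copied from the notebook outputs for the test split.
n_test = 2000
base_rate = 0.2035       # test-set churn rate
k = 200                  # top 10% of the test set
precision_at_k = 0.855

# Churners captured in the top K and churners in the whole test set.
churners_in_top_k = round(precision_at_k * k)
total_churners = round(base_rate * n_test)

# Recall@K follows from the two counts; lift compares against random targeting.
recall_at_k = churners_in_top_k / total_churners
lift_at_k = precision_at_k / base_rate

print(churners_in_top_k, total_churners)
print(f"Recall@K = {recall_at_k:.6f}, Lift@K = {lift_at_k:.2f}")
```

The recovered recall agrees with the reported 0.420147, and the lift of roughly 4.2 means the top decile is about four times as dense in churners as a random sample.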

## 7. Export results

To facilitate comparison with other baseline models (Logistic Regression, Random Forest, XGBoost, LightGBM, SVM):

- Save the trained model with `joblib`.
- Save the metric summary as CSV.
- Save validation- and test-set probabilities for calibration plots, lift curves, and cost-sensitive analysis.

In [26]:
MODEL_DIR = Path('models')
MODEL_DIR.mkdir(parents=True, exist_ok=True)
MODEL_PATH = MODEL_DIR / 'catboost_baseline.joblib'
joblib.dump(cat_model, MODEL_PATH)
MODEL_PATH


PosixPath('models/catboost_baseline.joblib')

In [27]:
OUTPUT_DIR = Path('reports')
OUTPUT_DIR.mkdir(exist_ok=True, parents=True)

# 7.1 Save metric summary for validation and test splits
metrics_df.to_csv(OUTPUT_DIR / 'catboost_baseline_metrics_en.csv', index=False)

# 7.2 Save predictions (dataset label, y_true and probabilities)
pred_val = pd.DataFrame({
    'dataset': ['validation'] * len(y_val),
    'y_true': y_val.values,
    'y_proba': y_val_proba,
})
pred_test = pd.DataFrame({
    'dataset': ['test'] * len(y_test),
    'y_true': y_test.values,
    'y_proba': y_test_proba,
})
pred_df = pd.concat([pred_val, pred_test], ignore_index=True)
pred_df.to_csv(OUTPUT_DIR / 'catboost_baseline_predictions_en.csv', index=False)

print('Exported metrics and predictions to:', OUTPUT_DIR)


Exported metrics and predictions to: reports
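
As a sketch of the calibration analysis the exported predictions are meant to support, the helper below bins predicted probabilities and compares the mean prediction with the observed churn rate in each bin. The function `calibration_table` and the toy inputs are hypothetical and not part of this notebook.

```python
import numpy as np

def calibration_table(y_true, y_proba, n_bins=10):
    """Return rows of (bin, count, mean predicted probability, observed rate).

    Bins are equal-width intervals on [0, 1]; empty bins are skipped.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_proba = np.asarray(y_proba, dtype=float)

    # Interior edges 1/n_bins, ..., (n_bins-1)/n_bins map values to bins 0..n_bins-1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.digitize(y_proba, edges[1:-1])

    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            rows.append((
                b,
                int(mask.sum()),
                float(y_proba[mask].mean()),
                float(y_true[mask].mean()),
            ))
    return rows

# Toy inputs for illustration only.
rows_demo = calibration_table([0, 0, 1, 1], [0.05, 0.15, 0.95, 0.92])
for row in rows_demo:
    print(row)
```

With the real export, the inputs would be the `y_true` and `y_proba` columns of `reports/catboost_baseline_predictions_en.csv`, filtered to rows where `dataset` is `test`.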
