# Bank Churn Prediction:  
## Logistic Regression, K-Nearest Neighbors, and Random Forest Comparison

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
os.chdir(r'C:\Users\aarei\OneDrive - University of California, Davis\MSBA Work\Spring 2025\Application Domains\Individual Assignments\Individual Assignment 1')
os.getcwd()

'C:\\Users\\aarei\\OneDrive - University of California, Davis\\MSBA Work\\Spring 2025\\Application Domains\\Individual Assignments\\Individual Assignment 1'

In [6]:
bank_churn_data = pd.read_csv('Churn_Modelling_Dataset.csv')
bank_churn_data.head()

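| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |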


### Data Pre-processing and Feature Engineering

In [None]:
# Columns to drop
drop_cols = ['RowNumber', 'CustomerId', 'Surname']

# Binary columns passed through without scaling (Gender is mapped to 0/1 below)
binary_cols = ['HasCrCard', 'IsActiveMember', 'Gender']

# Numeric columns to scale
numeric_cols_to_scale = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

# Categorical column to encode
cat_cols_to_encode = ['Geography']

In [None]:
proc_data = bank_churn_data.drop(columns=drop_cols).copy()
proc_data['Gender'] =  proc_data['Gender'].map({'Male': 1, 'Female': 0})

In [9]:
preprocessor = ColumnTransformer(transformers=[
    ('scale', StandardScaler(), numeric_cols_to_scale),
    ('onehot', OneHotEncoder(drop='first', sparse_output=False), cat_cols_to_encode),
    ('pass', 'passthrough', binary_cols)
])

In [None]:
print(proc_data.head())

   CreditScore Geography  Gender  Age  Tenure    Balance  NumOfProducts  \
0          619    France       0   42       2       0.00              1   
1          608     Spain       0   41       1   83807.86              1   
2          502    France       0   42       8  159660.80              3   
3          699    France       0   39       1       0.00              2   
4          850     Spain       0   43       2  125510.82              1   

   HasCrCard  IsActiveMember  EstimatedSalary  Exited  
0          1               1        101348.88       1  
1          0               1        112542.58       0  
2          1               0        113931.57       1  
3          0               0         93826.63       0  
4          1               1         79084.10       0  


### Logistic Regression  
 - with balanced class weights: $w_i = \frac{n_{\text{samples}}}{n_{\text{classes}} \cdot n_i}$, where $n_i$ is the number of samples in class $i$
 - stratified sampling to preserve class proportions in the train and test sets
 - L2 (Ridge) regularization
 - Solver: Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS, the scikit-learn default)
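
As a quick check of the balanced class weight formula, the sketch below computes the weights by hand and compares them with scikit-learn's `compute_class_weight`. The labels in `y_toy` are made up (8 non-churners, 2 churners) and do not reflect the real class counts.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (made up): 8 non-churners and 2 churners
y_toy = np.array([0] * 8 + [1] * 2)

classes = np.unique(y_toy)
counts = np.bincount(y_toy)

# n_samples / (n_classes * n_i) for each class i
manual_weights = len(y_toy) / (len(classes) * counts)

# scikit-learn's implementation of the same heuristic
sklearn_weights = compute_class_weight('balanced', classes=classes, y=y_toy)

print(manual_weights)   # weight per class, minority class gets the larger weight
print(sklearn_weights)
```

The minority class receives the larger weight, so misclassifying a churner costs more in the loss than misclassifying a non-churner.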

In [None]:
logreg_pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression(class_weight='balanced', max_iter=1000)) 
])

In [None]:
X = proc_data.drop(columns='Exited')
y = proc_data['Exited']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [13]:
logreg_pipe.fit(X_train, y_train)

y_pred = logreg_pipe.predict(X_test)

In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, digits=3))

Confusion Matrix:
 [[1142  451]
 [ 122  285]]

Classification Report:
               precision    recall  f1-score   support

           0      0.903     0.717     0.799      1593
           1      0.387     0.700     0.499       407

    accuracy                          0.714      2000
   macro avg      0.645     0.709     0.649      2000
weighted avg      0.798     0.714     0.738      2000



For the grid search we try:
- $C=\frac{1}{\lambda} \in \{0.01, 0.1, 1, 10, 100\}$ (smaller $C$ means stronger regularization)
- 5 cross-validation folds
- Area under the ROC curve (ROC AUC) as the scoring metric; we report the best cross-validated AUC for each model
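
ROC AUC is computed from predicted scores rather than from hard 0/1 labels. The sketch below illustrates the difference on synthetic data from `make_classification`; the data, the variable names, and the choice of `C=0.01` are assumptions for illustration only, not results from the churn dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for the churn data (assumption)
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0, stratify=y_syn
)

model = LogisticRegression(C=0.01, class_weight='balanced', max_iter=1000)
model.fit(X_tr, y_tr)

# AUC from the probability of the positive class (what 'roc_auc' scoring uses)
positive_scores = model.predict_proba(X_te)[:, 1]
auc_from_scores = roc_auc_score(y_te, positive_scores)

# AUC from hard labels discards the ranking information and is usually lower
auc_from_labels = roc_auc_score(y_te, model.predict(X_te))

print(f"AUC from probabilities: {auc_from_scores:.3f}")
print(f"AUC from hard labels:   {auc_from_labels:.3f}")
```

The same pattern, applied to `grid_search.best_estimator_` and `X_test`, would give a test-set AUC to set beside the cross-validated one.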

In [15]:
grid_search = GridSearchCV(
    estimator=logreg_pipe,
    param_grid={'classifier__C': [0.01, 0.1, 1, 10, 100]},
    cv=5, 
    scoring='roc_auc',
    n_jobs=-1,
    verbose=2
)

grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 5 candidates, totalling 25 fits


In [16]:
print("Best parameters:", grid_search.best_params_)
print("Best cross-validated AUC:", grid_search.best_score_)

# Predict on test set with best estimator
y_pred_best = grid_search.best_estimator_.predict(X_test)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_best))
print("\nClassification Report:\n", classification_report(y_test, y_pred_best, digits=3))

Best parameters: {'classifier__C': 0.01}
Best cross-validated AUC: 0.7674899596459632
Confusion Matrix:
 [[1146  447]
 [ 118  289]]

Classification Report:
               precision    recall  f1-score   support

           0      0.907     0.719     0.802      1593
           1      0.393     0.710     0.506       407

    accuracy                          0.718      2000
   macro avg      0.650     0.715     0.654      2000
weighted avg      0.802     0.718     0.742      2000



**Logistic Regression**
- Precision = 0.393:
    - About 61% of predicted churners were actually non-churners (447 false positives vs. 289 true positives) → many false positives

- Recall = 0.710:
    - Caught most of the true churners → strong for intervention

Interpretation:
Wide net — good at not missing churners, but not very selective
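
The precision and recall quoted here can be recovered directly from the confusion matrix printed for the tuned logistic regression. The following is a small worked check; the variable names are chosen for illustration.

```python
import numpy as np

# Confusion matrix reported for the tuned logistic regression
# rows = actual class (0, 1), columns = predicted class (0, 1)
cm = np.array([[1146, 447],
               [118, 289]])

tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)            # share of predicted churners who churned
recall = tp / (tp + fn)               # share of actual churners who were caught
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # share of non-churners flagged as churners

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"false positive rate={false_positive_rate:.3f}")
```

Note that precision and the false positive rate use different denominators: precision divides by all predicted churners, while the false positive rate divides by all actual non-churners.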

### K-Nearest Neighbors

For the grid search cross-validation, we search over the following hyperparameter settings:
- 3, 5, 7, or 9 neighbors
- Neighbors weighted by inverse distance, or weighted uniformly
- The hyperparameter `p` sets the order of the Minkowski distance
    - `p = 1` gives the Manhattan distance
    - `p = 2` gives the Euclidean distance
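
For reference, the Minkowski distance of order `p` between points `a` and `b` is `(sum(|a_i - b_i|^p))^(1/p)`. A minimal sketch with a hypothetical helper `minkowski_distance` and two made-up points shows how the two settings differ:

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two points."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

# Two made-up points in a 2-D feature space
point_a = [0.0, 0.0]
point_b = [3.0, 4.0]

manhattan = minkowski_distance(point_a, point_b, p=1)  # |3| + |4| = 7
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(9 + 16) = 5

print(manhattan, euclidean)
```

Because both distances sum over raw feature differences, K-NN is sensitive to feature scale, which is why the numeric columns are standardized in the pipeline.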

In [None]:
knn_pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('classifier', KNeighborsClassifier())
])

In [18]:
grid_search_knn = GridSearchCV(
    estimator=knn_pipe,
    param_grid={
        'classifier__n_neighbors': [3, 5, 7, 9],
        'classifier__weights': ['uniform', 'distance'],
        'classifier__p': [1, 2]},
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1    
)

grid_search_knn.fit(X_train, y_train)

Fitting 5 folds for each of 16 candidates, totalling 80 fits


In [19]:
y_pred_knn = grid_search_knn.best_estimator_.predict(X_test)

print("Best KNN params:", grid_search_knn.best_params_)
print("Best cross-validated AUC:", grid_search_knn.best_score_)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_knn))
print("\nClassification Report:\n", classification_report(y_test, y_pred_knn, digits=3))

Best KNN params: {'classifier__n_neighbors': 9, 'classifier__p': 2, 'classifier__weights': 'distance'}
Best cross-validated AUC: 0.8188301181727999
Confusion Matrix:
 [[1533   60]
 [ 250  157]]

Classification Report:
               precision    recall  f1-score   support

           0      0.860     0.962     0.908      1593
           1      0.724     0.386     0.503       407

    accuracy                          0.845      2000
   macro avg      0.792     0.674     0.706      2000
weighted avg      0.832     0.845     0.826      2000



**K-Nearest Neighbors**
- Precision = 0.724:
    - When it predicts churn, it’s usually correct → low false positives

- Recall = 0.386:
    - Missed many actual churners → high false negatives

Interpretation:
Very selective — only flags churn when very confident, but misses a lot

### Random Forest

For the grid search cross-validation, we search over the following hyperparameter settings:
- 100 or 200 trees
- Maximum tree depth of 10, 20, or unrestricted
- A minimum of 2 or 5 samples in a node before it can be split
- Balanced class weights to correct for class imbalance, or no weights
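
The size of this search can be checked before running it. The sketch below uses scikit-learn's `ParameterGrid` to count the candidate combinations; `rf_param_grid` is a name chosen here for illustration and mirrors the settings listed above.

```python
from sklearn.model_selection import ParameterGrid

# Same settings as listed above, keyed for a pipeline step named 'classifier'
rf_param_grid = {
    'classifier__n_estimators': [100, 200],
    'classifier__max_depth': [None, 10, 20],
    'classifier__min_samples_split': [2, 5],
    'classifier__class_weight': [None, 'balanced'],
}

n_candidates = len(ParameterGrid(rf_param_grid))  # 2 * 3 * 2 * 2
n_folds = 5
n_fits = n_candidates * n_folds

print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")
```

These counts agree with the "Fitting 5 folds for each of 24 candidates, totalling 120 fits" message that the grid search prints.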

In [20]:
rf_pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

In [21]:
grid_search_rf = GridSearchCV(
    estimator=rf_pipe,
    param_grid={
    'classifier__n_estimators': [100, 200],
    'classifier__max_depth': [None, 10, 20],
    'classifier__min_samples_split': [2, 5],
    'classifier__class_weight': [None, 'balanced']  # optional: test if balancing helps
},
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)

grid_search_rf.fit(X_train, y_train)

Fitting 5 folds for each of 24 candidates, totalling 120 fits


In [22]:
y_pred_rf = grid_search_rf.best_estimator_.predict(X_test)

print("Best RF params:", grid_search_rf.best_params_)
print("Best cross-validated AUC:", grid_search_rf.best_score_)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_rf))
print("\nClassification Report:\n", classification_report(y_test, y_pred_rf, digits=3))

Best RF params: {'classifier__class_weight': None, 'classifier__max_depth': 10, 'classifier__min_samples_split': 5, 'classifier__n_estimators': 200}
Best cross-validated AUC: 0.863091947491597
Confusion Matrix:
 [[1553   40]
 [ 226  181]]

Classification Report:
               precision    recall  f1-score   support

           0      0.873     0.975     0.921      1593
           1      0.819     0.445     0.576       407

    accuracy                          0.867      2000
   macro avg      0.846     0.710     0.749      2000
weighted avg      0.862     0.867     0.851      2000



**Random Forest**
- Precision = 0.819:
    - Very few false positives

- Recall = 0.445:
    - Missed more than half of the actual churners (226 of 407), but still better than K-NN

- F1 = 0.576 (best):
    - Best balance between precision and recall

Interpretation:
Best overall — confident, precise, and reasonably good at catching churners

### Conclusion  

In this scenario, a bank wants to predict and reduce customer churn, since improving retention is known to increase net profit significantly. For a customer classified as likely to churn, the bank would make some intervention effort, e.g. targeted marketing, special deals, or loyalty rewards. A type II error (false negative, a missed churner) would therefore likely be more costly and less desirable than a type I error (false positive, an unnecessary intervention).

Under this framework we can safely choose the logistic regression over the K-nearest neighbors model, as the latter is far more reluctant to flag a customer as likely to churn and consequently misses many actual churners (recall 0.386 vs. 0.710).

When choosing between the logistic regression and the random forest, we have to consider how critical interpretability is, since logistic regression is arguably the most interpretable of the three. Taking all other factors into account, however, the random forest is most likely the best choice. It performs best overall, with the highest churn-class F1 score (0.576) and the highest cross-validated AUC (0.863). At the default threshold its recall (0.445) is lower than the logistic regression's (0.710), so if missed churners are the main concern, its decision threshold would likely need to be lowered, trading some precision for recall. A boosted tree ensemble such as gradient boosting, which was not tested here, might improve on these results further. Finally, while the random forest is less interpretable than the logistic regression, interpretability is not out of reach if the SHapley Additive exPlanations (SHAP) method is applied.
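
Because false negatives are treated as the costlier error here, one option not explored in this notebook is to lower the classification threshold of a probabilistic model below the default of 0.5. The sketch below illustrates the idea on synthetic data; the data, the threshold of 0.3, and all variable names are assumptions for illustration and say nothing about the results on the churn dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for the churn data (assumption)
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

forest = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
forest.fit(X_tr, y_tr)

# Predicted probability of the positive (churn) class
churn_prob = forest.predict_proba(X_te)[:, 1]

# Compare the default threshold with a lower, illustrative one
results = {}
for threshold in (0.5, 0.3):
    y_hat = (churn_prob >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_te, y_hat, zero_division=0),
        recall_score(y_te, y_hat),
    )
    print(f"threshold={threshold}: precision={results[threshold][0]:.3f}, "
          f"recall={results[threshold][1]:.3f}")
```

Lowering the threshold can only keep or increase recall, since every case flagged at 0.5 is also flagged at 0.3; the cost is typically lower precision. In practice the threshold would be chosen on validation data using the relative cost of a missed churner versus an unnecessary intervention.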