# Bank Churn Classifier (Neural Network)

This notebook trains a scikit-learn multilayer perceptron (MLP) to predict bank customer churn. It covers:

1.  Data loading and preprocessing.
2.  Handling class imbalance by upsampling the minority class.
3.  Training an `MLPClassifier` with early stopping.
4.  Evaluating the model with accuracy and a confusion matrix.

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.utils import resample
import warnings

# Suppress all warnings to keep the output clean (this also hides scikit-learn's ConvergenceWarning)
warnings.filterwarnings("ignore")

In [13]:
# Random seed reused by the split, the resampling step, and the model
RND = 42

## 1. Load and Pre-process Data

In [14]:
# 1. Read dataset
path = "7_bank_churn.csv"
df = pd.read_csv(path)

df.head()

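|   | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|-----------|------------|---------|-------------|-----------|--------|-----|--------|---------|---------------|-----------|----------------|-----------------|--------|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |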


In [15]:
# 2. Feature/target split and basic preprocessing
if 'Exited' not in df.columns:
    raise RuntimeError("Dataset must contain 'Exited' column (target).")

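df = df.copy()
# Encode Gender as integers (LabelEncoder assigns codes in sorted class order)
le = LabelEncoder()
df['Gender'] = le.fit_transform(df['Gender'])
# One-hot encode Geography; drop_first=True leaves France as the baseline category
df = pd.get_dummies(df, columns=['Geography'], drop_first=True)
# Drop identifier columns, which should not be used as features
df = df.drop(columns=[c for c in ['CustomerId', 'Surname', 'RowNumber'] if c in df.columns])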

X = df.drop(columns=['Exited'])
y = df['Exited'].astype(int)

print("Feature columns:", X.columns.tolist())

Feature columns: ['CreditScore', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Geography_Germany', 'Geography_Spain']
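
To make the two encoding steps concrete, here is a minimal sketch on a hypothetical four-row frame. The `toy` data and the helper names `gender_codes` and `dummy_cols` are invented for illustration and are not part of the notebook; only the `LabelEncoder` and `pd.get_dummies` calls mirror the cell above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows, invented for illustration only
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female"],
    "Geography": ["France", "Spain", "Germany", "France"],
})

le = LabelEncoder()
toy["Gender"] = le.fit_transform(toy["Gender"])
# classes_ is sorted, so the position of each label is its integer code
gender_codes = {label: int(code) for code, label in enumerate(le.classes_)}

encoded = pd.get_dummies(toy, columns=["Geography"], drop_first=True)
dummy_cols = [c for c in encoded.columns if c.startswith("Geography_")]

print(gender_codes)
print(dummy_cols)
# Row 0 is a France row: every remaining dummy is 0, so France is the baseline
print(encoded.loc[0, dummy_cols].astype(int).tolist())
```

Dropping the first dummy avoids a redundant column, since a row with zeros in both `Geography_Germany` and `Geography_Spain` can only be France.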


## 2. Train-Test Split and Normalization

In [16]:
# Stratified train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=RND, stratify=y
)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")

X_train shape: (8000, 11)
X_test shape: (2000, 11)
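
The `stratify=y` argument is what keeps the churn rate the same in both splits. The sketch below illustrates this on synthetic labels with a positive rate of roughly 20%; `X_demo` and `y_demo` are placeholder arrays assumed for the example, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with a positive rate of roughly 20% (an assumption for this demo)
rng = np.random.default_rng(42)
y_demo = (rng.random(10_000) < 0.2).astype(int)
X_demo = np.arange(10_000).reshape(-1, 1)  # placeholder features

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)

print(f"overall positive rate: {y_demo.mean():.4f}")
print(f"train positive rate:   {y_tr.mean():.4f}")
print(f"test positive rate:    {y_te.mean():.4f}")
```

Without stratification the two rates can drift apart by chance, which matters more as the minority class gets rarer.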


In [17]:
# 3. Standardize features: fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
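
Fitting the scaler on the training split alone means the test rows are transformed with the training mean and standard deviation, so no test-set statistics leak into preprocessing. A minimal sketch with hypothetical one-feature arrays (`train` and `test` are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature splits, invented for illustration
train = np.array([[1.0], [2.0], [3.0], [4.0]])
test = np.array([[5.0]])

scaler = StandardScaler().fit(train)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for std()
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(scaler.transform(test))
print(manual)
```

Both lines print the same value, confirming that the test row is scaled purely with training statistics.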

## 3. Handle Class Imbalance (Upsampling)

In [18]:
# 4. Handle class imbalance by upsampling the minority class, in the training set only
# Recombine X_train and y_train so that features and labels are resampled together
train_df = pd.DataFrame(X_train, columns=X.columns)
train_df['Exited'] = y_train.values

majority = train_df[train_df['Exited'] == 0]
minority = train_df[train_df['Exited'] == 1]

print(f"Original training set shape: {train_df.shape}")
print(f"Original majority count: {len(majority)}")
print(f"Original minority count: {len(minority)}")

if len(minority) == 0:
    # if no minority present (unexpected), skip resampling
    X_train_res = X_train
    y_train_res = y_train
    print("No minority class found, skipping resampling.")
else:
    minority_upsampled = resample(minority,
                                  replace=True,
                                  n_samples=len(majority),
                                  random_state=RND)
    train_balanced = pd.concat([majority, minority_upsampled])
    train_balanced = train_balanced.sample(frac=1, random_state=RND).reset_index(drop=True)
    y_train_res = train_balanced['Exited'].astype(int)
    X_train_res = train_balanced.drop(columns=['Exited']).values
    print("\nAfter upsampling:")
    print(f"Balanced training set shape: {train_balanced.shape}")
    print(f"Balanced 'Exited' value counts:\n{pd.Series(y_train_res).value_counts()}")

Original training set shape: (8000, 12)
Original majority count: 6370
Original minority count: 1630

After upsampling:
Balanced training set shape: (12740, 12)
Balanced 'Exited' value counts:
Exited
1    6370
0    6370
Name: count, dtype: int64
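
One side effect of upsampling with replacement is worth keeping in mind: the balanced set contains many exact copies of minority rows. Because `MLPClassifier` with `early_stopping=True` holds out part of the data passed to `fit` as a validation set (10% by default), held-out minority rows will often have duplicates in the portion used for training, which can make the early-stopping score optimistic. The sketch below simulates this with index arrays only. It assumes a plain random 10% holdout as a simplification of scikit-learn's stratified split, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_minority, n_target = 1630, 6370  # counts printed by the cell above

# Draw source-row indices with replacement, as resample(replace=True) does
drawn = rng.choice(n_minority, size=n_target, replace=True)
n_unique = np.unique(drawn).size

# Simulate a random 10% holdout taken after upsampling (simplifying assumption)
perm = rng.permutation(n_target)
held_out, kept = drawn[perm[:637]], drawn[perm[637:]]
leak_fraction = np.isin(held_out, kept).mean()

print(f"unique source rows used: {n_unique} of {n_minority}")
print(f"held-out minority rows with a copy still in training: {leak_fraction:.1%}")
```

The test set is unaffected, because it is split off before resampling; only the internal early-stopping signal is influenced.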


## 4. Build and Train the Model

In [19]:
# Configure the model: an MLP with two hidden layers (64 and 32 units) and early stopping
mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),
    activation='relu',
    solver='adam',
    max_iter=500,
    early_stopping=True,
    n_iter_no_change=20,
    tol=1e-4,
    random_state=RND
)

print(mlp)

MLPClassifier(early_stopping=True, hidden_layer_sizes=(64, 32), max_iter=500,
              n_iter_no_change=20, random_state=42)


In [20]:
# Train on the (possibly) resampled training set
mlp.fit(X_train_res, y_train_res)

print("Model training complete.")

Model training complete.
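
After fitting, it can be useful to check how long training actually ran and what validation score triggered the stop. The following is a sketch on synthetic stand-in data from `make_classification`, not the notebook's arrays; it relies on the fitted attributes `n_iter_`, `best_validation_score_`, and `validation_scores_`, which `MLPClassifier` populates when `early_stopping=True`.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the notebook's own arrays are not reused here
X_demo, y_demo = make_classification(n_samples=600, n_features=11, random_state=42)

clf = MLPClassifier(
    hidden_layer_sizes=(64, 32),
    max_iter=500,
    early_stopping=True,
    n_iter_no_change=20,
    tol=1e-4,
    random_state=42,
)
clf.fit(X_demo, y_demo)

print(f"epochs run:            {clf.n_iter_}")
print(f"best validation score: {clf.best_validation_score_:.4f}")
print(f"last validation score: {clf.validation_scores_[-1]:.4f}")
```

If `n_iter_` equals `max_iter`, training hit the iteration cap rather than stopping early, which the blanket warning filter at the top of the notebook would otherwise hide.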


## 5. Evaluate the Model

In [21]:
# Evaluate with accuracy and a confusion matrix
# Predictions use the original X_test, which was scaled but never resampled
y_pred = mlp.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

In [22]:
print(f"Accuracy: {acc:.4f}")
print("\nConfusion matrix:")
print(cm)

Accuracy: 0.7820

Confusion matrix:
[[1301  292]
 [ 144  263]]
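
Accuracy alone is hard to interpret on imbalanced data, so it helps to derive per-class metrics from the confusion matrix. The sketch below copies the four counts from the output above and assumes scikit-learn's layout (rows are true classes, columns are predicted classes, class 0 first); the variable names are illustrative.

```python
# Confusion matrix copied from the output above: rows = true class, columns = predicted
tn, fp = 1301, 292
fn, tp = 144, 263
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
baseline = (tn + fp) / total      # accuracy of always predicting "did not churn"
precision = tp / (tp + fp)        # share of predicted churners who actually churned
recall = tp / (tp + fn)           # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.4f}")
print(f"baseline:  {baseline:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1:        {f1:.4f}")
```

On these counts the model's accuracy (0.7820) is slightly below the majority-class baseline (0.7965), while it recovers about 65% of churners at roughly 47% precision. That trade-off is the expected effect of upsampling, and it is why recall and precision for the churn class are more informative here than accuracy.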
