With Beta Bank's customers leaving, I've been tasked with creating a model that can predict whether a customer will leave the bank soon. I will build a model with the highest F1 score I can achieve, and also measure the ROC-AUC metric to compare against the F1 score.

First, I will import the necessary libraries.

In [19]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.utils import shuffle

I will load the dataset and then check for duplicates.

In [20]:
df = pd.read_csv('/datasets/Churn.csv')
display(df)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5.0,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10.0,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7.0,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3.0,75075.31,2,1,0,92888.52,1


In [21]:
duplicates = df.duplicated()
print(duplicates.sum())

0


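Since there are no duplicates within the dataframe, I'll define the features and target variables. I drop the identifier columns (RowNumber, CustomerId, Surname) and also the categorical columns Geography and Gender, which would need encoding before the model could use them.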

In [22]:
features = df.drop(columns=['Exited', 'RowNumber', 'CustomerId', 'Surname', 'Geography', 'Gender'])
target = df['Exited']

Since I have some NaNs and infinite values, I will convert the NaNs into the column means and drop the rows with infinite values.

In [23]:
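# Drop rows with infinite values first, so they cannot distort the column means.
features = features[~np.isinf(features).any(axis=1)]
# Fill the remaining NaNs with the column means.
features = features.fillna(features.mean())
# Keep the target aligned with the features if any rows were dropped.
target = target.loc[features.index]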

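I'll split the data into training (80%) and validation (20%) sets, then take half of the validation set as a test set. Note that the test set is therefore a subset of the validation set rather than an independent holdout.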

In [24]:
features_train, features_valid, target_train, target_valid = train_test_split(
    features, target, test_size=0.2, random_state=54321)

features_remain, features_test, target_remain, target_test = train_test_split(
    features_valid, target_valid, test_size=0.5, random_state=54321)
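
For comparison, here is a minimal sketch of a split in which the three sets are disjoint. It keeps the same two-step `train_test_split` pattern and the same sizes, but treats both halves of the second split as separate sets (validation and test). The stand-in data, the `idx_*` names, and the use of `stratify` are assumptions for illustration and are not part of the notebook above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in labels: 1000 rows, roughly 20% positives (not the Churn data).
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.2).astype(int)
idx = np.arange(len(y))  # row identifiers, so overlap between sets can be checked

# Step 1: hold out 20% of the rows.
idx_train, idx_hold = train_test_split(
    idx, test_size=0.2, random_state=54321, stratify=y)

# Step 2: halve the holdout into validation and test; both halves are used.
idx_valid, idx_test = train_test_split(
    idx_hold, test_size=0.5, random_state=54321, stratify=y[idx_hold])

print(len(idx_train), len(idx_valid), len(idx_test))
print("valid/test overlap:", len(set(idx_valid) & set(idx_test)))
```

With this layout the validation set would be used for comparing models and the test set only for the final check.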

I'll standardize the numerical features next.

In [25]:
scaler = StandardScaler()
features_train_scaled = scaler.fit_transform(features_train)
features_valid_scaled = scaler.transform(features_valid)
features_test_scaled = scaler.transform(features_test)

I'll now investigate the balance of classes.

In [26]:
classes_balance = df['Exited'].value_counts()
print("Balance of Classes: ", classes_balance)

Balance of Classes:  0    7963
1    2037
Name: Exited, dtype: int64


This shows that there are nearly 4x as many customers who have not exited (Class 0) as those who have exited (Class 1). This could cause the model to develop a bias towards Class 0, and it may even get a high accuracy score by predicting the majority class for every instance.

I will first train the model without correcting the class imbalance.
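
To make the accuracy concern concrete, the short calculation below scores a hypothetical baseline that always predicts Class 0, using the class counts printed above. The baseline itself is an illustration and is not a model from this notebook.

```python
# Class counts taken from the value_counts() output above.
n_stayed, n_exited = 7963, 2037
total = n_stayed + n_exited

# Hypothetical baseline: always predict class 0 ("stays").
accuracy = n_stayed / total         # every class-0 row is counted as correct
true_positives = 0                  # the baseline never predicts class 1
recall = true_positives / n_exited  # so it finds none of the customers who left

# With no positive predictions, precision is 0/0; F1 is conventionally taken as 0.
f1 = 0.0

print(f"accuracy = {accuracy:.4f}, recall = {recall:.1f}, F1 = {f1:.1f}")
```

The baseline reaches about 80% accuracy while identifying no churners, which is why F1 is the more informative metric here.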

In [27]:
model = RandomForestClassifier(random_state=54321)
model.fit(features_train_scaled, target_train)

RandomForestClassifier(random_state=54321)

We'll calculate the F1 and ROC-AUC values on the validation set before handling the class imbalance.

In [28]:
target_pred_val = model.predict(features_valid_scaled)
f1_valid = f1_score(target_valid, target_pred_val)
prob_valid = model.predict_proba(features_valid_scaled)
prob_valid_one = prob_valid[:, 1]
roc_auc_valid = roc_auc_score(target_valid, prob_valid_one)

In [29]:
print("F1 Score on validation set before fixing imbalances:", f1_valid)
print("ROC-AUC Score on validation set before fixing imbalances:", roc_auc_valid)

F1 Score on validation set before fixing imbalances: 0.5737439222042139
ROC-AUC Score on validation set before fixing imbalances: 0.8501051122790255
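
The two metrics answer different questions: F1 scores the hard 0/1 predictions, which for a binary classifier correspond roughly to a 0.5 cutoff on the class-1 probability, while ROC-AUC scores how well the probabilities rank the two classes across all cutoffs. The sketch below illustrates this on a small hand-made example; the labels and probabilities are hypothetical and are not taken from the Churn data.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical labels and predicted class-1 probabilities (illustration only).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.35, 0.40, 0.70, 0.90])

# ROC-AUC uses the probabilities directly, so it does not depend on a cutoff.
auc = roc_auc_score(y_true, proba)

# F1 needs hard predictions, so it changes with the cutoff applied.
f1_by_threshold = {
    t: f1_score(y_true, (proba >= t).astype(int)) for t in (0.3, 0.5, 0.7)
}

print("ROC-AUC:", round(auc, 3))
for t, f1 in f1_by_threshold.items():
    print(f"F1 at cutoff {t}: {f1:.3f}")
```

A model can therefore have a fairly high ROC-AUC and a modest F1 at the default cutoff, which matches the pattern in the scores above.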


Even before adjusting for the imbalance, the F1 score is about 0.57. Now I'll deal with the imbalance of classes.
First, I'll make an upsample function.

In [30]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]
    
    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)
    
    features_upsampled, target_upsampled = shuffle(features_upsampled, target_upsampled, random_state=12345)
    return features_upsampled, target_upsampled
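
The `repeat` argument controls the class ratio after upsampling. The sketch below shows how that ratio could be checked and how a roughly balancing value could be chosen. Both helper functions are hypothetical, and the counts are the full-dataset counts printed earlier, used here as a stand-in because the exact training-set counts are not shown.

```python
def class_ratio_after_upsampling(n_zeros, n_ones, repeat):
    """Ratio of class-1 rows to class-0 rows once class 1 is repeated."""
    return (n_ones * repeat) / n_zeros


def balancing_repeat(n_zeros, n_ones):
    """Integer repeat that brings the two classes closest to 1:1."""
    return max(1, round(n_zeros / n_ones))


# Full-dataset counts from value_counts(), used as a stand-in for the training set.
n_zeros, n_ones = 7963, 2037

repeat = balancing_repeat(n_zeros, n_ones)
print("balancing repeat:", repeat)
print("ratio at that repeat:", round(class_ratio_after_upsampling(n_zeros, n_ones, repeat), 3))
print("ratio at repeat=10:", round(class_ratio_after_upsampling(n_zeros, n_ones, 10), 3))
```

Under these assumed counts, a repeat of 4 gives close to a 1:1 ratio, whereas a repeat of 10 makes Class 1 the majority by roughly 2.6 to 1.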

Now, I'll use the upsample function on the training set to get a more balanced class distribution, then retrain the model and score it on the validation set.

In [31]:
features_upsampled, target_upsampled = upsample(features_train, target_train, 10)

model = RandomForestClassifier(random_state=54321)
model.fit(features_upsampled, target_upsampled)

predicted_valid_upsampled = model.predict(features_valid)
f1_upsampled = f1_score(target_valid, predicted_valid_upsampled)
roc_auc_upsampled = roc_auc_score(target_valid, model.predict_proba(features_valid)[:, 1])
print("F1 Score on validation set after fixing imbalances:", f1_upsampled)
print("ROC-AUC Score on validation set after fixing imbalances:", roc_auc_upsampled)

F1 Score on validation set after fixing imbalances: 0.6022727272727273
ROC-AUC Score on validation set after fixing imbalances: 0.8522973403408186


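I'll now evaluate this model on the test set, which I also upsample here. Because upsampling changes the test set's class ratio, this F1 score is not directly comparable to the validation score above.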

In [32]:
features_test_up, target_test_up = upsample(features_test, target_test, 10)

predicted_test = model.predict(features_test_up)

f1_test = f1_score(target_test_up, predicted_test)
roc_auc_test = roc_auc_score(target_test_up, model.predict_proba(features_test_up)[:, 1])

print("F1 Score on the test set:", f1_test)
print("ROC-AUC Score on the test set:", roc_auc_test)

F1 Score on the test set: 0.695994747209455
ROC-AUC Score on the test set: 0.864057583587015


Now, I will make a function to downsample the data and see if this provides better F1 and ROC-AUC scores.

In [33]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]
    
    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345)] + [features_ones])
    
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345)] + [target_ones])
    
    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=12345)
    
    return features_downsampled, target_downsampled

In [34]:
features_downsampled, target_downsampled = downsample(features_train, target_train, 0.2)

model = RandomForestClassifier(random_state=54321)
model.fit(features_downsampled, target_downsampled)

predicted_valid_downsampled = model.predict(features_valid)
f1_downsampled = f1_score(target_valid, predicted_valid_downsampled)
roc_auc_downsampled = roc_auc_score(target_valid, model.predict_proba(features_valid)[:, 1])
print("F1 Score on validation set after fixing imbalances:", f1_downsampled)
print("ROC-AUC Score on validation set after fixing imbalances:", roc_auc_downsampled)

F1 Score on validation set after fixing imbalances: 0.542608695652174
ROC-AUC Score on validation set after fixing imbalances: 0.8502341137123746


Now I will evaluate this model on a downsampled test set to see whether the F1 score improves.

In [35]:
features_test_down, target_test_down = downsample(features_test, target_test, 0.2)

predicted_test = model.predict(features_test_down)

f1_test = f1_score(target_test_down, predicted_test)
roc_auc_test = roc_auc_score(target_test_down, model.predict_proba(features_test_down)[:, 1])

print("F1 Score on the test set:", f1_test)
print("ROC-AUC Score on the test set:", roc_auc_test)

F1 Score on the test set: 0.7857142857142858
ROC-AUC Score on the test set: 0.8592399308445924
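
One caveat when reading the test scores: resampling an evaluation set changes F1 even if the model's predictions for each row stay the same. The sketch below works this through with hypothetical confusion-matrix counts (not taken from the results above), using the identity F1 = 2TP / (2TP + FP + FN).

```python
def f1_from_counts(tp, fp, fn):
    """F1 from confusion-matrix counts: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts on an untouched test set (illustration only).
tp, fp, fn = 60, 40, 40
f1_original = f1_from_counts(tp, fp, fn)

# Repeating every class-1 row 10 times multiplies TP and FN by 10; FP is unchanged.
f1_upsampled = f1_from_counts(10 * tp, fp, 10 * fn)

# Keeping about 20% of the class-0 rows cuts FP to about a fifth; TP and FN are unchanged.
f1_downsampled = f1_from_counts(tp, fp // 5, fn)

print(round(f1_original, 3), round(f1_upsampled, 3), round(f1_downsampled, 3))
```

In this illustration F1 rises from 0.6 to about 0.73 and 0.71 with no change to the model, so scoring on the unmodified test set is the safer basis for comparing approaches.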


So there are a few things we can comment on with these models. On the validation set, the model trained on the upsampled training data scored higher than the model trained on the downsampled training data on both F1 (0.602 vs. 0.543) and ROC-AUC (0.852 vs. 0.850). On the test sets, the ROC-AUC scores were again very close (0.864 vs. 0.859), but the F1 score was much higher on the downsampled test set (0.786 vs. 0.696). Each test set was resampled with the same function used for training, so its class ratio differs from the original data. F1 depends on that ratio, which likely accounts for much of the gap between the validation and test F1 scores.

Based on the F1 scores, we can conclude that each model works reasonably well at identifying customers at risk of leaving, with the highest F1 obtained on the downsampled test set. The ROC-AUC score is around 0.85 to 0.86 for both models, showing that each has a good chance of ranking a customer who will leave above one who will not.