**Bank Customer Churn Model**

Objective: To predict which customers are likely to leave the bank (churn).

Data Source: https://github.com/YBI-Foundation/Dataset/blob/main/Bank%20Churn%20Modelling.csv

Import Libraries:

In [160]:
import pandas as pd

In [161]:
import numpy as np

In [162]:
import matplotlib.pyplot as plt

In [163]:
import seaborn as sns

Import Data:

In [164]:
df=pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Bank%20Churn%20Modelling.csv')

Describe Data:

In [165]:
df.head()

,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,Num Of Products,Has Credit Card,Is Active Member,Estimated Salary,Churn
0,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [166]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   CustomerId        10000 non-null  int64  
 1   Surname           10000 non-null  object 
 2   CreditScore       10000 non-null  int64  
 3   Geography         10000 non-null  object 
 4   Gender            10000 non-null  object 
 5   Age               10000 non-null  int64  
 6   Tenure            10000 non-null  int64  
 7   Balance           10000 non-null  float64
 8   Num Of Products   10000 non-null  int64  
 9   Has Credit Card   10000 non-null  int64  
 10  Is Active Member  10000 non-null  int64  
 11  Estimated Salary  10000 non-null  float64
 12  Churn             10000 non-null  int64  
dtypes: float64(2), int64(8), object(3)
memory usage: 1015.8+ KB


Data Preprocessing:

In [167]:
df.isna().sum()

CustomerId          0
Surname             0
CreditScore         0
Geography           0
Gender              0
Age                 0
Tenure              0
Balance             0
Num Of Products     0
Has Credit Card     0
Is Active Member    0
Estimated Salary    0
Churn               0
dtype: int64

In [168]:
df.columns

Index(['CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age',
       'Tenure', 'Balance', 'Num Of Products', 'Has Credit Card',
       'Is Active Member', 'Estimated Salary', 'Churn'],
      dtype='object')

In [169]:
df=df.drop(['CustomerId','Surname'], axis=1)  # drop identifier columns that should not be used as features

In [170]:
df

,CreditScore,Geography,Gender,Age,Tenure,Balance,Num Of Products,Has Credit Card,Is Active Member,Estimated Salary,Churn
0,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...
9995,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


In [171]:
df1=pd.get_dummies(df)  # one-hot encode Geography and Gender into True/False indicator columns

In [172]:
df1.head()

,CreditScore,Age,Tenure,Balance,Num Of Products,Has Credit Card,Is Active Member,Estimated Salary,Churn,Geography_France,Geography_Germany,Geography_Spain,Gender_Female,Gender_Male
0,619,42,2,0.0,1,1,1,101348.88,1,True,False,False,True,False
1,608,41,1,83807.86,1,0,1,112542.58,0,False,False,True,True,False
2,502,42,8,159660.8,3,1,0,113931.57,1,True,False,False,True,False
3,699,39,1,0.0,2,0,0,93826.63,0,True,False,False,True,False
4,850,43,2,125510.82,1,1,1,79084.1,0,False,False,True,True,False
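`pd.get_dummies` turns each text category into its own indicator column, which is why `df1` has one `Geography_*` column per country and one `Gender_*` column per gender. The sketch below reruns it on a few rows copied from the `df` preview above (`sample` is an illustrative name) and shows two real options of `pd.get_dummies`: `dtype=int` for 0/1 values instead of True/False, and `drop_first=True`, which drops the first level of each categorical column (for example `Gender_Female`, which always equals 1 minus `Gender_Male`). The random forest used here does not need `drop_first`; it is mostly relevant for linear models.

```python
import pandas as pd

# Rows copied from the df preview (index 0-4 and 9995-9998),
# reduced to CreditScore plus the two categorical columns.
sample = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850, 771, 516, 709, 772],
    "Geography": ["France", "Spain", "France", "France", "Spain",
                  "France", "France", "France", "Germany"],
    "Gender": ["Female", "Female", "Female", "Female", "Female",
               "Male", "Male", "Female", "Male"],
})

# Full one-hot encoding, as in df1, but with 0/1 integers instead of booleans.
full = pd.get_dummies(sample, dtype=int)
print(list(full.columns))
# ['CreditScore', 'Geography_France', 'Geography_Germany', 'Geography_Spain',
#  'Gender_Female', 'Gender_Male']

# drop_first=True removes the first (alphabetically sorted) level of each column.
reduced = pd.get_dummies(sample, dtype=int, drop_first=True)
print(list(reduced.columns))
# ['CreditScore', 'Geography_Germany', 'Geography_Spain', 'Gender_Male']
```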


In [173]:
df1.Churn.value_counts()

Churn
0    7963
1    2037
Name: count, dtype: int64
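Churners make up 2037 of the 10000 rows, about 20%. The short sketch below rebuilds only the `Churn` column from the counts above, expresses the split as proportions with `value_counts(normalize=True)`, and shows why the imbalance matters: a trivial model that always predicts "no churn" would already reach 79.63% accuracy, which is presumably why the notebook undersamples the majority class next.

```python
import pandas as pd

# Rebuild a Churn column with the same counts as df1.Churn.value_counts().
churn = pd.Series([0] * 7963 + [1] * 2037, name="Churn")

# Class proportions instead of raw counts.
shares = churn.value_counts(normalize=True)
print(shares)

# Accuracy of always predicting the majority class (0 = no churn).
baseline_accuracy = (churn == 0).mean()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2%}")
# Majority-class baseline accuracy: 79.63%
```

`shares` holds 0.7963 for class 0 and 0.2037 for class 1.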

In [174]:
df1.columns

Index(['CreditScore', 'Age', 'Tenure', 'Balance', 'Num Of Products',
       'Has Credit Card', 'Is Active Member', 'Estimated Salary', 'Churn',
       'Geography_France', 'Geography_Germany', 'Geography_Spain',
       'Gender_Female', 'Gender_Male'],
      dtype='object')

Define Target Variable (y) and Feature Variables (x):

In [175]:
y=df1['Churn']  # target: 1 = customer churned, 0 = stayed
x=df1[['CreditScore', 'Age', 'Tenure', 'Balance', 'Num Of Products',
       'Has Credit Card', 'Is Active Member', 'Estimated Salary',
       'Geography_France', 'Geography_Germany', 'Geography_Spain',
       'Gender_Female', 'Gender_Male']]  # every column except Churn
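Listing every feature by hand works, but the list has to be updated whenever the columns change. An equivalent, shorter idiom is to drop the target column from `df1`. The sketch below uses a tiny stand-in for `df1` (two rows and a subset of the columns shown above) to illustrate it.

```python
import pandas as pd

# Tiny stand-in for df1: first two rows, a subset of the columns.
df1 = pd.DataFrame({
    "CreditScore": [619, 608],
    "Age": [42, 41],
    "Churn": [1, 0],
    "Gender_Female": [True, True],
})

y = df1["Churn"]
# All remaining columns become features; new dummy columns are picked up automatically.
x = df1.drop(columns="Churn")
print(list(x.columns))  # ['CreditScore', 'Age', 'Gender_Female']
```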

In [176]:
from imblearn.under_sampling import RandomUnderSampler

In [177]:
rus=RandomUnderSampler()

In [178]:
x_rus,y_rus=rus.fit_resample(x,y)  # keep all 2037 churners and a random 2037 of the 7963 non-churners
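With its default `sampling_strategy`, `RandomUnderSampler` keeps every minority-class row and randomly discards majority-class rows until both classes are the same size, so the resampled data has 2 × 2037 = 4074 rows. The sketch below reimplements that idea in plain NumPy to make the mechanics visible; `undersample` is a hypothetical helper, not part of imbalanced-learn, and the single toy feature is just the row number. Because the notebook creates `RandomUnderSampler()` without a `random_state`, each rerun draws a different subset; passing `random_state=...` to the constructor makes the resampling repeatable.

```python
import numpy as np

def undersample(x, y, seed=0):
    """Hypothetical helper: shrink every class to the size of the smallest one
    by sampling rows without replacement (the minority class is kept whole)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return x[keep], y[keep]

# Toy data with the same 7963 / 2037 class counts as the Churn column.
y = np.array([0] * 7963 + [1] * 2037)
x = np.arange(len(y)).reshape(-1, 1)  # placeholder feature: the row number

x_rus, y_rus = undersample(x, y, seed=42)
print(np.bincount(y_rus))  # [2037 2037]
print(len(y_rus))          # 4074
```

With the default `test_size` of 0.25, `train_test_split` then sends 1019 of these 4074 rows to the test set, which matches the support total in the classification report below.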

Train Test Split:

In [179]:
from sklearn.model_selection import train_test_split

In [180]:
x_train, x_test, y_train, y_test = train_test_split(x_rus,y_rus, random_state=2529)  # default test_size=0.25: 1019 of the 4074 rows go to the test set
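Because undersampling runs before the split here, the test set is balanced too (523 non-churners vs. 496 churners in the report below), while the original data churns at roughly 20%. A common alternative, sketched below on hypothetical toy data (100 rows with an 80/20 class split), is to split first with `stratify=y` so both parts keep the original class mix, and then undersample only the training part. The undersampling step uses pandas `groupby(...).sample(...)` in place of `RandomUnderSampler` only to avoid the extra imbalanced-learn dependency; it is not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 80 "stay" rows and 20 "churn" rows.
df = pd.DataFrame({"feature": np.arange(100), "Churn": [0] * 80 + [1] * 20})
x = df[["feature"]]
y = df["Churn"]

# 1) Split first; stratify=y keeps the 80/20 class mix in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, stratify=y, random_state=2529
)

# 2) Undersample only the training part, so the test set keeps the real class mix.
train = pd.concat([x_train, y_train], axis=1)
n_min = int(train["Churn"].value_counts().min())
train_bal = train.groupby("Churn").sample(n=n_min, random_state=2529)
x_train_bal, y_train_bal = train_bal[["feature"]], train_bal["Churn"]

print(y_test.value_counts().sort_index())
print(y_train_bal.value_counts().sort_index())
```

Here the test set keeps 20 non-churn and 5 churn rows, while the training set is balanced at 15 rows per class.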

Modeling:

In [181]:
from sklearn.ensemble import RandomForestClassifier

In [182]:
rfc=RandomForestClassifier()

In [183]:
rfc.fit(x_train,y_train)
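`RandomForestClassifier()` is created here without a `random_state`, so each run builds slightly different trees and the scores below can shift a little between runs. The sketch below fixes `random_state` and reads `feature_importances_`, the impurity-based share of each column in the forest's splits (the values sum to 1). It uses hypothetical toy columns `signal`, which fully determines the label, and `noise`; on the real model the same Series could be built with `index=x_train.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: "signal" decides the label, "noise" is random.
rng = np.random.default_rng(0)
signal = rng.integers(0, 2, size=200)
X = pd.DataFrame({"signal": signal, "noise": rng.normal(size=200)})
y = signal

# A fixed random_state makes bootstrap samples and feature choices repeatable.
rfc = RandomForestClassifier(n_estimators=100, random_state=2529)
rfc.fit(X, y)

# Impurity-based importances, largest first.
importances = pd.Series(rfc.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```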

In [184]:
y_pred= rfc.predict(x_test)

Model Evaluation:

In [185]:
from sklearn.metrics import classification_report, confusion_matrix

In [186]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.76      0.77      0.76       523
           1       0.75      0.74      0.75       496

    accuracy                           0.76      1019
   macro avg       0.76      0.76      0.76      1019
weighted avg       0.76      0.76      0.76      1019



In [187]:
confusion_matrix(y_test,y_pred)

array([[403, 120],
       [129, 367]])
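The scores in the classification report follow directly from this confusion matrix, whose rows are the actual classes and columns the predicted classes: 403 true negatives, 120 false positives, 129 false negatives and 367 true positives. The sketch below recomputes accuracy, precision, recall and F1 for the churn class by hand, then cross-checks them with `classification_report`. The label vectors `y_true` and `y_hat` are reconstructed only to reproduce the same counts; they are not the notebook's actual test data.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Confusion matrix from the output above: rows = actual, columns = predicted.
cm = np.array([[403, 120],
               [129, 367]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()     # 770 / 1019
precision_churn = tp / (tp + fp)    # 367 / 487
recall_churn = tp / (tp + fn)       # 367 / 496
f1_churn = 2 * precision_churn * recall_churn / (precision_churn + recall_churn)

print(f"accuracy={accuracy:.2f} precision={precision_churn:.2f} "
      f"recall={recall_churn:.2f} f1={f1_churn:.2f}")
# accuracy=0.76 precision=0.75 recall=0.74 f1=0.75

# Cross-check: label vectors that reproduce the same matrix give the same report.
y_true = np.array([0] * 523 + [1] * 496)
y_hat = np.array([0] * 403 + [1] * 120 + [0] * 129 + [1] * 367)
assert (confusion_matrix(y_true, y_hat) == cm).all()
report = classification_report(y_true, y_hat, output_dict=True)
print(f"{report['1']['recall']:.2f}")  # 0.74
```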

Prediction:


In [188]:
y_pred

array([0, 0, 1, ..., 0, 0, 0])
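`y_pred` holds hard 0/1 labels. A random forest can also return an estimated churn probability per customer through `predict_proba`, and applying a cutoff of your own, instead of simply taking the more probable class as `predict` does, lets you trade precision for recall, for example to flag more likely churners. The sketch below shows this on hypothetical toy data; the 0.3 `threshold` is an arbitrary illustrative value, not a tuned one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy stand-in for x_train / y_train.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

rfc = RandomForestClassifier(random_state=0).fit(X, y)

# Column 1 of predict_proba is class 1 (churn) because rfc.classes_ is [0, 1].
churn_prob = rfc.predict_proba(X[:5])[:, 1]

# predict() returns the class with the highest averaged probability.
default_labels = rfc.predict(X[:5])

# A lower, illustrative threshold flags more customers as likely churners.
threshold = 0.3
flagged = (churn_prob >= threshold).astype(int)
print(churn_prob, default_labels, flagged)
```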

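Explanation: Overall, the model is 76% accurate on the test set (770 of 1019 customers classified correctly). For the churn class (1), recall is 0.74: of the 496 test customers who actually churned, the model correctly identified 367. Because undersampling was applied before the train-test split, this test set is roughly balanced (523 vs. 496) rather than reflecting the original 7963 vs. 2037 class mix, so these figures may differ on real, imbalanced data.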
