# Bank Churn Prediction


# Objective:
Given a Bank customer, build a neural network based classifier that can determine whether they will leave or not in the next 6 months.

Context:
Businesses like banks which provide service have to worry about problem of 'Churn' i.e. customers leaving and joining another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard. Management can concentrate efforts on improvement of service, keeping in mind these priorities.

Data Description:
The case study is from an open-source dataset from Kaggle.
The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance etc.
Link to the Kaggle project site: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Points Distribution:
The points distribution for this case is as follows:
1. Read the dataset
2. Drop the columns which are unique for all users like IDs (5 points)
3. Distinguish the features and target variable (5 points)
4. Divide the data set into training and test sets (5 points)
5. Normalize the train and test data (10 points)
6. Initialize & build the model. Identify the points of improvement and implement the same.(20)
7. Predict the results using 0.5 as a threshold (10 points)
8. Print the Accuracy score and confusion matrix (5 points)

In [1]:
!pip install tensorflow==2.0

Collecting tensorflow==2.0
  Downloading https://files.pythonhosted.org/packages/54/5f/e1b2d83b808f978f51b7ce109315154da3a3d4151aa59686002681f2e109/tensorflow-2.0.0-cp37-cp37m-win_amd64.whl (48.1MB)
Collecting gast==0.2.2 (from tensorflow==2.0)
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Collecting keras-applications>=1.0.8 (from tensorflow==2.0)
  Downloading https://files.pythonhosted.org/packages/71/e3/19762fdfc62877ae9102edf6342d71b28fbfd9dea3d2f96a882ce099b03f/Keras_Applications-1.0.8-py3-none-any.whl (50kB)
Collecting protobuf>=3.6.1 (from tensorflow==2.0)
  Downloading https://files.pythonhosted.org/packages/d2/e8/2510f78a3759e8e8ac5f433fa10a0f582ee13932e3a5e07b9a5067b00dfd/protobuf-3.12.4-cp37-cp37m-win_amd64.whl (1.0MB)
Collecting grpcio>=1.8.6 (from tensorflow==2.0)
  Downloading https://files.pythonhosted.org/packages/e4/23/15fe2dff7163f3191d4d74eaddbd3e241dea185e96447b192068523556

In [2]:
#import key libraries

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt # this is for visualization
import seaborn as sns # for visualization
%matplotlib inline
import statsmodels.api as sm

In [3]:
import tensorflow as tf

# Step 1
Read the dataset & drop the columns which are unique for all users like IDs

In [5]:
df = pd.read_csv('bank.csv')

In [6]:
df.head(10)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0
5,6,15574012,Chu,645,Spain,Male,44,8,113755.78,2,1,0,149756.71,1
6,7,15592531,Bartlett,822,France,Male,50,7,0.0,2,1,1,10062.8,0
7,8,15656148,Obinna,376,Germany,Female,29,4,115046.74,4,1,0,119346.88,1
8,9,15792365,He,501,France,Male,44,4,142051.07,2,0,1,74940.5,0
9,10,15592389,H?,684,France,Male,27,2,134603.88,1,1,1,71725.73,0


In [8]:
df.shape

(10000, 14)

In [9]:
df.describe()

Unnamed: 0,RowNumber,CustomerId,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,15690940.0,650.5288,38.9218,5.0128,76485.889288,1.5302,0.7055,0.5151,100090.239881,0.2037
std,2886.89568,71936.19,96.653299,10.487806,2.892174,62397.405202,0.581654,0.45584,0.499797,57510.492818,0.402769
min,1.0,15565700.0,350.0,18.0,0.0,0.0,1.0,0.0,0.0,11.58,0.0
25%,2500.75,15628530.0,584.0,32.0,3.0,0.0,1.0,0.0,0.0,51002.11,0.0
50%,5000.5,15690740.0,652.0,37.0,5.0,97198.54,1.0,1.0,1.0,100193.915,0.0
75%,7500.25,15753230.0,718.0,44.0,7.0,127644.24,2.0,1.0,1.0,149388.2475,0.0
max,10000.0,15815690.0,850.0,92.0,10.0,250898.09,4.0,1.0,1.0,199992.48,1.0


In [10]:
#Number of unique in each column?
df.nunique()

RowNumber          10000
CustomerId         10000
Surname             2932
CreditScore          460
Geography              3
Gender                 2
Age                   70
Tenure                11
Balance             6382
NumOfProducts          4
HasCrCard              2
IsActiveMember         2
EstimatedSalary     9999
Exited                 2
dtype: int64

In [11]:
df.drop(columns = {'RowNumber', 'CustomerId', 'Surname'}, inplace=True)

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB


In [18]:
#one hot encoding for the Geography and Gender columns

df1 = pd.get_dummies(df, columns=['Geography', 'Gender'])

In [20]:
# no need to keep both male and female columns as both are highly correlated. dropping male column.
df1.drop(columns = {'Gender_Male'}, inplace=True)

In [23]:
df1.head(10)

Unnamed: 0,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_France,Geography_Germany,Geography_Spain,Gender_Female
0,619,42,2,0.0,1,1,1,101348.88,1,1,0,0,1
1,608,41,1,83807.86,1,0,1,112542.58,0,0,0,1,1
2,502,42,8,159660.8,3,1,0,113931.57,1,1,0,0,1
3,699,39,1,0.0,2,0,0,93826.63,0,1,0,0,1
4,850,43,2,125510.82,1,1,1,79084.1,0,0,0,1,1
5,645,44,8,113755.78,2,1,0,149756.71,1,0,0,1,0
6,822,50,7,0.0,2,1,1,10062.8,0,1,0,0,0
7,376,29,4,115046.74,4,1,0,119346.88,1,0,1,0,1
8,501,44,4,142051.07,2,0,1,74940.5,0,1,0,0,0
9,684,27,2,134603.88,1,1,1,71725.73,0,1,0,0,0


In [26]:
df1 = df1 [['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Geography_France', 'Geography_Germany', 'Geography_Spain', 'Gender_Female', 'Exited']]

In [27]:
df1.head(10)

Unnamed: 0,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Geography_France,Geography_Germany,Geography_Spain,Gender_Female,Exited
0,619,42,2,0.0,1,1,1,101348.88,1,0,0,1,1
1,608,41,1,83807.86,1,0,1,112542.58,0,0,1,1,0
2,502,42,8,159660.8,3,1,0,113931.57,1,0,0,1,1
3,699,39,1,0.0,2,0,0,93826.63,1,0,0,1,0
4,850,43,2,125510.82,1,1,1,79084.1,0,0,1,1,0
5,645,44,8,113755.78,2,1,0,149756.71,0,0,1,0,1
6,822,50,7,0.0,2,1,1,10062.8,1,0,0,0,0
7,376,29,4,115046.74,4,1,0,119346.88,0,1,0,1,1
8,501,44,4,142051.07,2,0,1,74940.5,1,0,0,0,0
9,684,27,2,134603.88,1,1,1,71725.73,1,0,0,0,0


# Step 2
Distinguish the features and target variable.

In [28]:
# Import `train_test_split` from `sklearn.model_selection`
from sklearn.model_selection import train_test_split


In [30]:
# Specify the features and target variables data 
X=df1.iloc[:,0:11]

# Specify the target labels and flatten array
y= df1.Exited


# Step 3
Divide the data set into training and test sets.

In [323]:
# Split the data up in train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

y_train =  np.array(y_train)
y_test =  np.array(y_test)

# Step 4
Normalize the train and test data.

In [324]:
from sklearn.preprocessing import StandardScaler

# Define the scaler 
scaler = StandardScaler().fit(X_train)

# Scale the train set
X_train = scaler.transform(X_train)

# Scale the test set
X_test = scaler.transform(X_test)

# Step 5
Initialize & build the model. Identify the points of improvement and implement the same.

In [325]:
# Using Tensorflow Keras instead of the original Keras

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras import optimizers


In [326]:
import matplotlib.pyplot as plt

In [327]:

# Initialize the constructor
model = Sequential()


In [332]:
# Add an first hidden layer 
model.add(Dense(100, activation='relu', kernel_initializer='normal'))

model.add(Dense(1))


In [333]:
sgd = optimizers.Adam(lr = 0.001)

In [334]:
model.compile(loss='binary_crossentropy',optimizer= 'RMSprop' ,metrics=['accuracy'])

In [335]:
model.fit(X_train, y_train, epochs=20, batch_size=200, verbose=1) 

Train on 7000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x2be42f75748>

In [336]:
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Accuracy: %.3f'  % acc)
print('Loss: %.3f' % loss)

Accuracy: 0.870
Loss: 0.360


In [337]:
model.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_97 (Dense)             multiple                  9408      
_________________________________________________________________
dense_98 (Dense)             multiple                  78500     
_________________________________________________________________
dense_99 (Dense)             multiple                  1010      
_________________________________________________________________
dense_100 (Dense)            multiple                  1100      
_________________________________________________________________
dense_101 (Dense)            multiple                  101       
Total params: 90,119
Trainable params: 90,119
Non-trainable params: 0
_________________________________________________________________


# Step 6
Predict the results using 0.5 as a threshold

In [367]:
y_predict = model.predict(X_test)

In [368]:
print(y_predict)

[[ 0.05713705]
 [-0.01217546]
 [ 0.03573955]
 ...
 [ 0.05584715]
 [-0.02578628]
 [ 0.1059306 ]]


In [364]:
y_predict[0]

array([0.05713705], dtype=float32)

In [354]:
np.argmax(y_predict[0])

0

# Step 7
Print the Accuracy score and confusion matrix.

In [341]:
from sklearn import metrics

In [365]:
y_pred = []
for val in y_predict:
    y_pred.append(np.argmax(val))
cm = metrics.confusion_matrix(y_test,y_pred)
print(cm)

[[2416    0]
 [ 584    0]]


In [366]:
cr=metrics.classification_report(y_test,y_pred)
print(cr)

              precision    recall  f1-score   support

           0       0.81      1.00      0.89      2416
           1       0.00      0.00      0.00       584

    accuracy                           0.81      3000
   macro avg       0.40      0.50      0.45      3000
weighted avg       0.65      0.81      0.72      3000

