Github link: https://github.com/angels21/churn

## Bank Churn Prediction
Businesses like banks which provide service have to worry about problem of 'Churn' i.e. customers
leaving and joining another service provider. It is important to understand which aspects of the service
influence a customer's decision in this regard. Management can concentrate efforts on improvement of
service, keeping in mind these priorities.

**Objective:**Given a Bank customer, build a neural network based classifier that can determine whether they will leave or not in the next 6 months.

###Data Description:
The case study is from an open-source dataset from Kaggle.
The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore,
Geography, Gender, Age, Tenure, Balance etc.
Link to the Kaggle project site:
https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Install tensorflow

In [None]:
!pip install tensorflow==2.0



Import tensorflow library

In [None]:
import tensorflow as tf
print(tf.__version__)

2.0.0


Import libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential#To initialize neural network
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization #Used to create neural network layers
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, auc
import matplotlib.pyplot as plt
from tensorflow.keras import optimizers

1. Read the dataset

In [None]:
#Import google drive library
from google.colab import drive

In [None]:
#Mount the drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [None]:
#Point drive to project directory
project_path = '/content/drive/My Drive/Colab Notebooks/'

In [None]:
#Set path to the project directory
dataset_file = project_path + 'Churn_Modelling.csv'

In [None]:
#Read file from the project directory
data = pd.read_csv(dataset_file)

In [None]:
#Display first five rows of the dataset
data.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


Majority of features in the dataset are numeric and only five are categorical. Two(Gender and Geography) out of the five need to be encoded. Also, the features "RowNumber","CustomerId" and "Surname" are unique and adds little value to this analysis hence will be dropped in the next step. 

In [None]:
#Check for non-null objects and data types
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


From the above table,it can be seen that there are no non-number values hence we can continue with this analysis.

2. Drop the columns which are unique for all users like IDs (5 points)

In [None]:
#Dropping unique features and display first five rows
data = data.drop(["RowNumber","CustomerId","Surname"], axis = 1)
data.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


3. Distinguish the features and target variable (5 points)

In [None]:
#Assign predictor columns to predictor variable X_data
X_data = data.iloc[:, :-1]

In [None]:
#Display number of rows and columns of predictor variables
X_data.shape

(10000, 10)

In [None]:
#Display first five rows of predictor variables 
X_data.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
0,619,France,Female,42,2,0.0,1,1,1,101348.88
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58
2,502,France,Female,42,8,159660.8,3,1,0,113931.57
3,699,France,Female,39,1,0.0,2,0,0,93826.63
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1


In [None]:
#Assign target column to target variable y_data
y_data = data.iloc[:, -1]

In [None]:
#Display number of rows and columns of target variable
y_data.shape

(10000,)

In [None]:
#Display first 5 rows of predictor and target variables in form of an array
X_data1=X_data.values
y_data1=y_data.values
print(X_data1[:10,:], '\n')
print(y_data1[:10])

[[619 'France' 'Female' 42 2 0.0 1 1 1 101348.88]
 [608 'Spain' 'Female' 41 1 83807.86 1 0 1 112542.58]
 [502 'France' 'Female' 42 8 159660.8 3 1 0 113931.57]
 [699 'France' 'Female' 39 1 0.0 2 0 0 93826.63]
 [850 'Spain' 'Female' 43 2 125510.82 1 1 1 79084.1]
 [645 'Spain' 'Male' 44 8 113755.78 2 1 0 149756.71]
 [822 'France' 'Male' 50 7 0.0 2 1 1 10062.8]
 [376 'Germany' 'Female' 29 4 115046.74 4 1 0 119346.88]
 [501 'France' 'Male' 44 4 142051.07 2 0 1 74940.5]
 [684 'France' 'Male' 27 2 134603.88 1 1 1 71725.73]] 

[1 0 1 0 0 1 0 1 0 0]


In step 1, we mentioned Geography and Gender need to be encoded. That will be done in the next cell

In [None]:
#One hot encoding for gender and geography
X_data = pd.get_dummies(X_data)
#Dropping one of each encoded feature will improve computation time and stability without affecting the result
X_data = X_data.drop(['Geography_Spain','Gender_Male'], axis=1)
#Convert dataset to float
X_data = X_data.astype('float32')
X_data

Unnamed: 0,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Geography_France,Geography_Germany,Gender_Female
0,619.0,42.0,2.0,0.000000,1.0,1.0,1.0,101348.882812,1.0,0.0,1.0
1,608.0,41.0,1.0,83807.859375,1.0,0.0,1.0,112542.578125,0.0,0.0,1.0
2,502.0,42.0,8.0,159660.796875,3.0,1.0,0.0,113931.570312,1.0,0.0,1.0
3,699.0,39.0,1.0,0.000000,2.0,0.0,0.0,93826.632812,1.0,0.0,1.0
4,850.0,43.0,2.0,125510.820312,1.0,1.0,1.0,79084.101562,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...
9995,771.0,39.0,5.0,0.000000,2.0,1.0,0.0,96270.640625,1.0,0.0,0.0
9996,516.0,35.0,10.0,57369.609375,1.0,1.0,1.0,101699.773438,1.0,0.0,0.0
9997,709.0,36.0,7.0,0.000000,1.0,0.0,1.0,42085.578125,1.0,0.0,1.0
9998,772.0,42.0,3.0,75075.312500,2.0,1.0,0.0,92888.523438,0.0,1.0,0.0


4. Divide the data set into training and test sets (5 points)


In [None]:
#Splitting the dataset into train and test in the ratio 80:20 percent
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size = 0.2, random_state = 0)

5. Normalize the train and test data (10 points)

In [None]:
#Normalizing train and test data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

6. Initialize & build the model. Identify the points of improvement and implement the same the same.(20)

**Creating a model:** Keras model object can be created with Sequential class. At the outset, the model is empty per se. It is completed by adding additional layers and compilation

In [None]:
#Selecting the model to be used
model = Sequential()

**Adding layers [layers and activations]:** Keras layers can be added to the model. Adding layers are like stacking lego blocks one by one. It should be noted that as this is a classification problem, sigmoid layer (softmax for multi-class problems) should be added

In [None]:
model.add(Dense(16, input_dim = 11, activation = 'relu')) #First Hidden Layer With 16 nodes and 11 nodes from input layer
model.add(Dense(8, activation = 'relu')) #Second Hidden Layer with 8 nodes
model.add(Dense(1, activation = 'sigmoid')) #Output Layer with 1 node for classification problem

**Model compile [optimizers and loss functions]:** Keras model should be "compiled" prior to training. Types of loss (function) and optimizer should be designated

In [None]:
#Defining the optimizer and learning rate 
sgd = optimizers.Adam(lr = 0.001)

In [None]:
#Setting optimizer and loss function
model.compile(optimizer = sgd, loss = 'binary_crossentropy', metrics=['accuracy'])

In [None]:
#Display model summary of layers and parameters
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 16)                192       
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 9         
Total params: 337
Trainable params: 337
Non-trainable params: 0
_________________________________________________________________


In [None]:
#Fitting the model using forward and bacward propagation
model.fit(X_train, y_train.values, batch_size = 700, epochs = 100, verbose = 1)

Train on 8000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100

<tensorflow.python.keras.callbacks.History at 0x7f8a31c33ac8>

In [None]:
#Evaluating the results on test data
results = model.evaluate(X_test, y_test.values)



In [None]:
#Print the loss and accuracy
print(model.metrics_names)
print(results)

['loss', 'accuracy']
[0.35095438599586487, 0.8625]


The above model was fitted using hyper parameter that were chosen radomly. In the below steps, the hyper parameters will be tweaked to improve the accuracy score of the model. The following would be carried out below: 

1.   The activation function in the hidden layers would be changed to 'elu' since it usually produces the most accurate result
2.   Optimizers such Adamax, Nadam, Ftrl would be set in turn and the one with the highest accuracy would be used for the next step.
3.   The learning rates would be adjusted above and below 0.001 with activation function still set as 'elu' and the optimizer obtained in step 2.
4.   Finally, with the above settings, The batch size and epoch were also adjusted to obtain the best accuracy score





In [None]:
model.add(Dense(16, input_dim = 11, activation = 'elu')) #First Hidden Layer With 16 nodes and 11 nodes from input layer
model.add(Dense(8, activation = 'elu')) #Second Hidden Layer with 8 nodes
model.add(Dense(1, activation = 'sigmoid')) #Output Layer with 1 node for classification problem

In [None]:
#Defining the optimizer and learning rate 
sgd = optimizers.Adamax(lr = 0.0005)

In [None]:
#Setting optimizer and loss function
model.compile(optimizer = sgd, loss = 'binary_crossentropy', metrics=['accuracy'])


In [None]:
#Fitting the model using forward and bacward propagation
model.fit(X_train, y_train.values, batch_size = 700, epochs = 200, verbose = 1)

Train on 8000 samples
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200

<tensorflow.python.keras.callbacks.History at 0x7f8a30893da0>

In [None]:
results = model.evaluate(X_test, y_test.values)
print(model.metrics_names)
print(results)

['loss', 'accuracy']
[0.3419598941802979, 0.8655]


After following the steps in cell 28(shown in cells [29]-[33], the accuracy of the model increased from 86.25 to 86.55 and the loss reduced from 0.351 to 0.341

7. Predict the results using 0.5 as a threshold (10 points)

In [None]:
#Predicting the Test set results 
y_pred = model.predict(X_test)
#For a Threshold of 50%
y_pred = (y_pred > 0.5)
print(y_pred[:10])

[[False]
 [False]
 [False]
 [False]
 [False]
 [ True]
 [False]
 [False]
 [False]
 [ True]]


From the array above, the model predicts that the sixth and tenth customer on the data set have a high likelyhood of leaving the bank

8. Print the Accuracy score and confusion matrix (5 points)


In [None]:
#Making the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
cm

array([[1520,   75],
       [ 194,  211]])

From the confusion matrix, out of 2000 observation the model accurately predicted 1520 plus 211 correct predictions and 194 plus 75 incorrect predictions.

In [37]:
#Printing the Accuracy
accuracy = (1520+211)/(2000)
accuracy

0.8655

The model was able to predict the probability of customers leaving the bank with an accuracy of 86.55%. This means that the model can be deployed.