# Deep Learning - ANN - Churn Modelling

In this project we consider customer data from a bank. The bank has seen unusually high churn rates and wants to understand the factors driving them. Churn occurs when customers leave the bank. Our objective is to identify the customers who are most likely to churn and to provide those insights to the business team.

In [1]:
#Import the required libraries
import pandas as pd
import numpy as np
import tensorflow as tf

In [2]:
#Loading the data
data=pd.read_csv('Churn_Modelling.csv',index_col='RowNumber')

In [3]:
data.head()

RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


## Data Preprocessing

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 1 to 10000
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CustomerId       10000 non-null  int64  
 1   Surname          10000 non-null  object 
 2   CreditScore      10000 non-null  int64  
 3   Geography        10000 non-null  object 
 4   Gender           10000 non-null  object 
 5   Age              10000 non-null  int64  
 6   Tenure           10000 non-null  int64  
 7   Balance          10000 non-null  float64
 8   NumOfProducts    10000 non-null  int64  
 9   HasCrCard        10000 non-null  int64  
 10  IsActiveMember   10000 non-null  int64  
 11  EstimatedSalary  10000 non-null  float64
 12  Exited           10000 non-null  int64  
dtypes: float64(2), int64(8), object(3)
memory usage: 1.1+ MB


In [5]:
data.isnull().sum()

CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

The dataset contains the following customer attributes:

| Column          | Description                                                                     |
|-----------------|---------------------------------------------------------------------------------|
| CustomerId      | Unique Identification Number of the Customer                                    |
| Surname         | Surname of the customer                                                         |
| CreditScore     | Credit Score of the Customer                                                    |
| Geography       | Resident country of the customer                                                |
| Gender          | Gender of the customer                                                          |
| Age             | Age of the customer                                                             |
| Tenure          | Length of their time with the bank in years                                     |
| Balance         | Bank Balance of the customers                                                   |
| NumOfProducts   | Number of products owned by the customer                                        |
| HasCrCard       | If the Customer has a Credit Card                                               |
| IsActiveMember  | If the Customer is an Active member (had a transaction in the past two months)  |
| EstimatedSalary | Estimated Salary of the customer                                                |
| Exited          | If the Customer has exited the bank (churned)                                   |

From the above we can see that the CustomerId column adds no predictive value, since it is just an identification number.
The same goes for Surname, which only gives the customer's name.
The Exited column will be our dependent variable, since it tells us whether the customer has churned.

In [6]:
#Split the data into X and Y (independent and dependent variables)
X=data.iloc[:,2:-1]  # Skip the first two columns (CustomerId, Surname)
                     # and the last column (Exited), which is the dependent variable
Y=data.iloc[:,-1]
X.head()

RowNumber,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
1,619,France,Female,42,2,0.0,1,1,1,101348.88
2,608,Spain,Female,41,1,83807.86,1,0,1,112542.58
3,502,France,Female,42,8,159660.8,3,1,0,113931.57
4,699,France,Female,39,1,0.0,2,0,0,93826.63
5,850,Spain,Female,43,2,125510.82,1,1,1,79084.1
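
Positional slicing works here, but it depends on the column order in the file. As a minimal sketch, the same split can be written with column names. The toy frame below is an assumption for illustration only: it reuses a few values from `data.head()` and keeps just a subset of the columns.

```python
import pandas as pd

# Toy frame with a subset of the real columns (illustrative only).
toy = pd.DataFrame(
    {
        "CustomerId": [15634602, 15647311, 15619304],
        "Surname": ["Hargrave", "Hill", "Onio"],
        "CreditScore": [619, 608, 502],
        "Geography": ["France", "Spain", "France"],
        "Exited": [1, 0, 1],
    }
)

# Name-based selection: drop the identifier columns and the target explicitly.
features = toy.drop(columns=["CustomerId", "Surname", "Exited"])
target = toy["Exited"]

# Position-based selection, equivalent to the iloc call in the cell above.
features_by_position = toy.iloc[:, 2:-1]

print(list(features.columns))
print(features.equals(features_by_position))
```

Both approaches select the same columns; the name-based version simply makes it visible which columns are being discarded.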


In [7]:
# Dealing with the categorical variables

#Identifying the categorical variables

X.select_dtypes(include='object').columns


Index(['Geography', 'Gender'], dtype='object')

In [8]:
# Distribution of Geography variable
X['Geography'].value_counts()

France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64

In [9]:
# Distribution of Gender variable
X['Gender'].value_counts()

Male      5457
Female    4543
Name: Gender, dtype: int64

In [10]:
# Converting the above to dummy variables 

# get_dummies encodes all categorical (object) columns by default,
# so they do not need to be passed separately.
# drop_first drops the first category of each categorical variable
X_dummies=pd.get_dummies(X,drop_first=True)
X_dummies.head()


RowNumber,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Geography_Germany,Geography_Spain,Gender_Male
1,619,42,2,0.0,1,1,1,101348.88,0,0,0
2,608,41,1,83807.86,1,0,1,112542.58,0,1,0
3,502,42,8,159660.8,3,1,0,113931.57,0,0,0
4,699,39,1,0.0,2,0,0,93826.63,0,0,0
5,850,43,2,125510.82,1,1,1,79084.1,0,1,0
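
To see what `drop_first=True` does, here is a small sketch on a made-up frame (the rows are hypothetical, not taken from the dataset). `dtype=int` is passed only so the dummies print as 0/1 regardless of the pandas version.

```python
import pandas as pd

# Hypothetical rows covering every category once.
toy = pd.DataFrame(
    {
        "CreditScore": [600, 610, 620, 630],
        "Geography": ["France", "Spain", "France", "Germany"],
        "Gender": ["Female", "Female", "Male", "Female"],
    }
)

# Categories are ordered alphabetically, so the dropped "first" categories
# are France (for Geography) and Female (for Gender).
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(encoded)
```

A customer from France who is Female is represented by zeros in all three dummy columns, which is why no `Geography_France` or `Gender_Female` column is needed.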


In [11]:
# Splitting the dataset into training and testing
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X_dummies,Y,test_size=0.2,random_state=0)

In [12]:
#Feature Scaling

from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train_scaled=sc.fit_transform(X_train)
# Only transform the test data, using the parameters fitted on the training data
X_test_scaled=sc.transform(X_test)
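
The reason for fitting on the training data only can be sketched in plain NumPy. This is not scikit-learn's code, just the same arithmetic under the assumption of standardization with the population standard deviation; the two synthetic features are hypothetical stand-ins for columns on very different scales.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two features on very different scales.
X_train = rng.normal(loc=[650.0, 75000.0], scale=[100.0, 60000.0], size=(800, 2))
X_test = rng.normal(loc=[650.0, 75000.0], scale=[100.0, 60000.0], size=(200, 2))

# "fit": learn the per-column mean and standard deviation from the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the training statistics to both sets.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled.mean(axis=0), X_test_scaled.std(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1 by construction. The test columns are only close to that, which is expected: no information from the test set leaks into the scaling parameters.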

## Building the ANN

In [13]:
# Initializing the ANN
ann=tf.keras.models.Sequential()

# Add the input layer and the first hidden layer
# The input layer is implicit: it has one neuron per feature (11 here), and
# Keras infers that size from the data on the first call, so input_dim is not passed.
# units sets the number of neurons in the hidden layer. A common rule of thumb is
# the average of the input and output nodes: (11 + 1) / 2 = 6.
# The weights are initialized randomly by the layer's default initializer.
ann.add(tf.keras.layers.Dense(units=6,activation='relu')) # rectifier (ReLU) activation function

# Add the second hidden layer
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))

# Add the output layer
# Only the units and the activation function change here.
# units depends on the number of classes in the dependent variable.
# Since our dependent variable is binary, one neuron is enough to predict 0 or 1.
# With three classes A, B, C we would need three neurons, one per class,
# with one-hot encoded targets: A=100, B=010, C=001.
# The activation is sigmoid because we predict the probability of churn,
# and sigmoid outputs lie between 0 and 1.
# For a non-binary target the activation function would be softmax.
ann.add(tf.keras.layers.Dense(units=1,activation='sigmoid'))
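
To make the architecture concrete, the following NumPy sketch runs a forward pass through a network of the same shape (11 inputs, two hidden layers of 6 ReLU units, 1 sigmoid output). It is an illustration of the arithmetic, not Keras code: the weights are random placeholders rather than trained values, and the helper names are made up for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [11, 6, 6, 1]  # inputs, hidden 1, hidden 2, output

# One (weights, bias) pair per dense layer; random values stand in for trained weights.
params = [
    (rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x, params):
    """Apply ReLU in the hidden layers and sigmoid in the output layer."""
    *hidden, output = params
    for w, b in hidden:
        x = relu(x @ w + b)
    w, b = output
    return sigmoid(x @ w + b)

batch = rng.normal(size=(4, 11))  # four hypothetical scaled customers
probs = forward(batch, params)
n_params = sum(w.size + b.size for w, b in params)

print(probs.shape, n_params)
```

The parameter count works out to (11·6 + 6) + (6·6 + 6) + (6·1 + 1) = 121, and each output is a single value between 0 and 1 that can be read as a churn probability.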


## Training the ANN

In [14]:
# Compiling the ann

# The adam optimizer is an adaptive variant of stochastic gradient descent.
# loss is binary_crossentropy because the dependent variable is binary.
# For a multi-class target with one-hot labels it would be categorical_crossentropy.
# The model is evaluated on accuracy.

ann.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
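
As a rough sketch of what the binary cross-entropy loss measures, it can be written in a few lines of NumPy. The function below is an illustrative re-implementation, not the Keras one; the `eps` clipping is an assumption added here to avoid taking `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Same two customers (one churned, one stayed), two different sets of predictions.
confident_and_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_and_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident_and_right, confident_and_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is the signal the optimizer uses to adjust the weights.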

In [15]:
# Train the ann on the training set
# batch_size is the number of samples processed before each weight update
ann.fit(X_train_scaled,Y_train,batch_size=32,epochs=100)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1d6b00774c8>
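
The effect of `batch_size` and `epochs` on the amount of training can be worked out with simple arithmetic. The sketch below assumes the 10,000 rows reported by `data.info()` and the 80/20 split used earlier.

```python
import math

n_rows = 10000          # rows reported by data.info()
n_test = n_rows // 5    # test_size=0.2
n_train = n_rows - n_test
batch_size = 32
epochs = 100

# Each batch triggers one weight update; the last batch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, steps_per_epoch, total_updates)
```

With these settings the network sees 8,000 training rows per epoch in 250 batches, so 100 epochs amount to 25,000 weight updates.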

## Making Predictions

In [25]:
Y_pred=ann.predict(X_test_scaled)
# The above gives probabilities because the output layer uses the sigmoid activation function
# Convert the probabilities to labels: >0.5 means the customer will churn, otherwise not
Y_pred=(Y_pred>0.5)
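
The thresholding step can be illustrated on a handful of made-up probabilities. The array below is hypothetical; it only mimics the `(n_samples, 1)` shape that a one-unit output layer produces.

```python
import numpy as np

# Hypothetical sigmoid outputs, one row per customer.
probs = np.array([[0.08], [0.51], [0.50], [0.93]])

# Strictly greater than 0.5 counts as churn, so exactly 0.50 is labelled 0.
labels = (probs > 0.5).astype(int).ravel()

print(labels)
```

Converting to a flat integer array is optional, but it makes the predictions directly comparable with a 1-D vector of 0/1 labels.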

In [26]:
# Confusion matrix
from sklearn.metrics import confusion_matrix,accuracy_score
cm=confusion_matrix(Y_test,Y_pred)
print(cm)

[[1517   78]
 [ 210  195]]


In [27]:
accuracy_score(Y_test,Y_pred)

0.856
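
Accuracy is only one way to read the confusion matrix. As a sketch, the other common metrics can be derived directly from the four counts printed above, assuming the scikit-learn layout (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix printed above: rows = actual, columns = predicted.
cm = np.array([[1517, 78],
               [210, 195]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)        # of the predicted churners, how many really churned
recall = tp / (tp + fn)           # of the actual churners, how many were caught
baseline = (tn + fp) / cm.sum()   # accuracy of always predicting "no churn"

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} baseline={baseline:.3f}")
```

Read this way, the model is right about roughly 71% of the customers it flags, catches about 48% of the customers who actually churned, and improves on the always-"no churn" baseline of about 80% accuracy.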

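## Conclusion
As we have seen above, our model reaches an accuracy of ~86% on the test set, which is a good result. The confusion matrix shows, however, that it misses 210 of the 405 customers who actually churned, so accuracy alone overstates how well churners are identified. We can still attempt to improve the model by experimenting with the number of layers or by tuning the hyperparameters.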