# Customer Churn Prediction using Deep Learning

In [1]:
import pandas as pd
import numpy as np
import keras 
import matplotlib.pyplot as plt

%matplotlib inline

Using TensorFlow backend.


#### Read in the data

In [3]:
# Read in the data
df = pd.read_csv(r'C:\Users\amitr\OneDrive\Desktop\Deep Learning\Data\Churn_Modelling.csv')
df.drop(['RowNumber','CustomerId','Surname'], axis=1, inplace=True)
df.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


#### Data Preprocessing

In [20]:
# Create Training and Test sets

from sklearn.model_selection import train_test_split

y = df['Exited']
X = df.drop('Exited',axis=1)

X = pd.get_dummies(X,drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
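
As a rough illustration of what `drop_first=True` does to a categorical column, the sketch below mimics the encoding in plain Python. The helper `dummy_encode` is hypothetical and is not part of pandas. It assumes the categories are sorted alphabetically and the first one is dropped as the baseline.

```python
# Hypothetical helper that mimics pd.get_dummies(..., drop_first=True) for one column.
def dummy_encode(value, categories, prefix):
    """Return one 0/1 flag per category, skipping the first (baseline) category."""
    ordered = sorted(categories)
    return {f"{prefix}_{c}": int(value == c) for c in ordered[1:]}

geographies = ["France", "Spain", "Germany"]

for country in geographies:
    # France is the baseline, so it is encoded as all zeros
    print(country, dummy_encode(country, geographies, "Geography"))
```

Under this scheme the three-valued Geography column becomes two inputs and Gender becomes one. Together with the eight numeric columns, that gives the 11 inputs the network receives.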


#### Feature Scaling

In [21]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
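
To make the fit/transform split concrete, here is a minimal plain-Python sketch of standardization. The helpers `fit_scaler` and `transform` are hypothetical stand-ins for what `StandardScaler` does per column. The training values are the five ages shown by `df.head()`, and the single "test" value is chosen for illustration.

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return mean(train_values), pstdev(train_values)

def transform(values, mu, sigma):
    """Apply the statistics learned on the training data to any values."""
    return [(v - mu) / sigma for v in values]

train_ages = [42, 41, 42, 39, 43]   # Age column of the rows shown by df.head()
test_ages = [40]                    # illustrative unseen value

mu, sigma = fit_scaler(train_ages)
scaled_train = transform(train_ages, mu, sigma)
scaled_test = transform(test_ages, mu, sigma)   # reuses the training mean and std

print(scaled_train)
print(scaled_test)
```

Fitting on the training set alone and reusing those statistics on the test set keeps information from the test data out of the preprocessing step.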

### Create the Artificial Neural Network

In [24]:
import keras
from keras.models import Sequential
from keras.layers import Dense

##### How to choose the nodes in the hidden layer

Choosing the number of nodes is more art than science, and there is no specific rule. However, a good starting point is the average of the number of input and output nodes.

In this case we have 11 input variables and 1 output (yes/no), so we can start with (11 + 1) / 2 = 6 nodes.
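
The rule of thumb can be written as a one-line function. The name `suggested_hidden_units` is made up for this sketch, and the result is only a starting point to tune from.

```python
def suggested_hidden_units(n_inputs, n_outputs):
    """Average of input and output node counts, rounded down."""
    return (n_inputs + n_outputs) // 2

print(suggested_hidden_units(11, 1))  # 6
```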

##### Parameters to specify for the hidden layers we add
1) units --> Dimensionality of the layer's output, i.e. the number of nodes in the layer. Same as output_dim in earlier versions of Keras. Set this to 6 as above

2) kernel_initializer --> use 'uniform'. That assigns each weight a small random starting value drawn from a uniform distribution

3) activation --> We'll use the ReLU (Rectified Linear Unit) function for the hidden layers, plus a sigmoid for the final output layer

4) input_dim --> needed for the first layer only, so Keras knows the shape of the input. Set it to the number of columns in the encoded input data (use X_train.shape[1], which is 11 here, rather than df.shape, because df still contains the target and the un-encoded categorical columns)
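
For intuition about the two activation functions named above, here is a small plain-Python sketch. The input values are arbitrary and chosen only to show the shape of each function.

```python
import math

def relu(z):
    """Rectified Linear Unit: passes positive values through, clips negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.1f}  sigmoid={sigmoid(z):.4f}")
```

Because the sigmoid output lies between 0 and 1, it can be read as the predicted probability that a customer exits.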

In [40]:
# Create the layers
classifier = Sequential()

# Adding the first hidden layer
classifier.add(Dense(units=6,
                     kernel_initializer='uniform',
                     activation='relu',
                     input_dim=11))

# Adding the second hidden layer
classifier.add(Dense(units=6,
                     kernel_initializer='uniform',
                     activation='relu'))  # don't need to specify the input_dim for subsequent layers

# Add the final output layer
classifier.add(Dense(units=1,   # we have a binary outcome Yes/No
                     kernel_initializer = 'uniform',
                     activation = 'sigmoid'))  # for more than 2 categories use 'softmax'

##### Compiling the network

Compiling configures how the network will be trained (the optimizer, the loss and the metrics) and makes it ready to fit

1) optimizer --> There are various gradient descent algorithms, such as stochastic gradient descent. We'll use a common one called 'adam'

2) loss function --> since we are using the sigmoid function in the last layer this should be the logarithmic loss (same as used for logistic regression). Set it as binary_crossentropy

3) metrics --> measures reported during training and evaluation so we can monitor the model. They are not used to update the weights. 'accuracy' is the most common
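
To show what the logarithmic loss measures, here is a minimal plain-Python sketch of binary cross-entropy. The labels and probabilities are made up for illustration. The clipping constant `eps` is an assumption added to avoid taking log(0), not a value taken from Keras.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all observations."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]                    # illustrative true outcomes
confident_right = [0.9, 0.1, 0.8, 0.2]   # probabilities close to the labels
confident_wrong = [0.1, 0.9, 0.2, 0.8]   # probabilities far from the labels

print(binary_crossentropy(labels, confident_right))
print(binary_crossentropy(labels, confident_wrong))
```

The loss is small when predicted probabilities agree with the labels and grows quickly for confident mistakes.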

In [41]:
# Compile the ANN
classifier.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

##### Now we have configured the ANN. Next we'll fit it to our data

1) batch_size --> Number of observations processed before the weights are updated. A value of 1 updates after each observation (online or stochastic learning); a larger value updates after several observations (batch learning)

2) epochs --> how many times to loop through the entire training data
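
The two settings together determine how many weight updates take place. The sketch below assumes 8,000 training rows, which is what a 2,000-row test set at test_size=0.2 implies. The helper name `updates_per_epoch` is made up for this example.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches (and so weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_train = 8000      # assumed size of the training set
batch_size = 10
epochs = 100

per_epoch = updates_per_epoch(n_train, batch_size)
print(per_epoch)            # 800
print(per_epoch * epochs)   # 80000
```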

In [42]:
# Fit the ANN to the data
classifier.fit(X_train, y_train, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
...

<keras.callbacks.History at 0x19fe2be7f28>

##### Now time to predict the outcomes

This works the same way as with classical models. The one difference is that the sigmoid output is a probability, so we threshold it at 0.5 to get a class label

In [52]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_pred = classifier.predict(X_test)
y_pred = y_pred > 0.5

print ("Accuracy on test set : ", accuracy_score(y_test, y_pred))

print ("\nConfusion Matrix : \n", confusion_matrix(y_test, y_pred))

Accuracy on test set :  0.8585

Confusion Matrix : 
 [[1528   58]
 [ 225  189]]
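
Accuracy alone can hide how well the churners themselves are identified. The sketch below derives precision and recall from the confusion matrix printed above. It assumes the usual scikit-learn layout, where rows are true classes and columns are predicted classes.

```python
# Counts copied from the confusion matrix above: [[TN, FP], [FN, TP]]
tn, fp = 1528, 58
fn, tp = 225, 189

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of customers flagged as churners, how many left
recall = tp / (tp + fn)      # of customers who left, how many were flagged

print("Accuracy :", round(accuracy, 4))
print("Precision:", round(precision, 4))
print("Recall   :", round(recall, 4))
```

With these counts recall is below 0.5, so the model misses more than half of the customers who actually exited even though overall accuracy is about 86%.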


#### Homework assignment : Predicting for a specific customer

Geography: France

Credit Score: 600

Gender: Male

Age: 40 years old

Tenure: 3 years

Balance: $60000

Number of Products: 2

Does this customer have a credit card ? Yes

Is this customer an Active Member: Yes

Estimated Salary: $50000

So should we say goodbye to that customer ?

In [74]:
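# Copy one row so the column layout matches X, then set every field explicitly

X_cust = X[:1].copy()

X_cust['CreditScore'] = 600
X_cust['Age'] = 40
X_cust['Tenure'] = 3
X_cust['Balance'] = 60000
X_cust['NumOfProducts'] = 2
X_cust['HasCrCard'] = 1
X_cust['IsActiveMember'] = 1
X_cust['EstimatedSalary'] = 50000
X_cust['Geography_Germany'] = 0   # France is the dropped baseline category
X_cust['Geography_Spain'] = 0
X_cust['Gender_Male'] = 1

X_cust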

Unnamed: 0,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Geography_Germany,Geography_Spain,Gender_Male
0,600,40,3,60000,2,1,1,50000,0,0,1


In [75]:
# Scale the data
X_cust = sc.transform(X_cust)
X_cust

array([[-0.51795799,  0.10698342, -0.69673114, -0.26352876,  0.79286681,
         0.64298333,  0.97067965, -0.86629511, -0.57946723, -0.57677292,
         0.90682052]])

In [84]:
# Predict whether the customer is likely to leave

pred = classifier.predict(X_cust)   # predicted probability that Exited = 1

decision = np.where(pred > 0.5, "Yes", "No")

print ("Is the customer predicted to leave? : ", decision)

Is the customer predicted to leave? :  [['No']]
