<h1>Predict Customer Churn using Artificial Neural Networks</h1>
<p class='lead'>This demonstration is part of the tutorial for the Neural Networks lesson. Here we are going to predict customer churn for a telecom operator. <b>Customer Churn</b> is defined as the likelihood of a customer moving over to a competitor. </p> 

<p>Treating this as a Supervised Learning problem, we will use Deep Learning to design a neural network that predicts whether a customer is likely to churn. </p>

In [1]:
# First let us mount our google drive so that we can access the training and test set from google drive
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
# We will import the libraries that are needed to read and manipulate the datasets
import pandas as pd
import numpy as np

In [3]:
# We read the training dataset and look at the first five rows of the dataset.
train_df = pd.read_csv('/content/gdrive/MyDrive/datasets/Customer Churn/TrainingData.csv')
train_df.head()

Unnamed: 0,Id,state,account_length,area_code,international_plan,voice_mail_plan,number_vmail_messages,total_day_minutes,total_day_calls,total_day_charge,total_eve_minutes,total_eve_calls,total_eve_charge,total_night_minutes,total_night_calls,total_night_charge,total_intl_minutes,total_intl_calls,total_intl_charge,number_customer_service_calls,churn
0,C00000,OH,107,area_code_415,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,no
1,C00001,NJ,137,area_code_415,no,no,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,no
2,C00002,OH,84,area_code_408,yes,no,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,no
3,C00003,OK,75,area_code_415,yes,no,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,no
4,C00004,MA,121,area_code_510,no,yes,24,218.2,88,37.09,348.5,108,29.62,212.6,118,9.57,7.5,7,2.03,3,no


In [4]:
# We do the same for the testing dataset
test_df = pd.read_csv('/content/gdrive/MyDrive/datasets/Customer Churn/TestingData.csv')
test_df.head()

Unnamed: 0,Id,state,account_length,area_code,international_plan,voice_mail_plan,number_vmail_messages,total_day_minutes,total_day_calls,total_day_charge,total_eve_minutes,total_eve_calls,total_eve_charge,total_night_minutes,total_night_calls,total_night_charge,total_intl_minutes,total_intl_calls,total_intl_charge,number_customer_service_calls,churn
0,C000015,TX,73,area_code_415,no,no,0,224.4,90,38.15,159.5,88,13.56,192.8,74,8.68,13.0,2,3.51,1,no
1,C000016,FL,147,area_code_415,no,no,0,155.1,117,26.37,239.7,93,20.37,208.8,133,9.4,10.6,4,2.86,0,no
2,C000017,CO,77,area_code_408,no,no,0,62.4,89,10.61,169.9,121,14.44,209.6,64,9.43,5.7,6,1.54,5,yes
3,C000018,AZ,130,area_code_415,no,no,0,183.0,112,31.11,72.9,99,6.2,181.8,78,8.18,9.5,19,2.57,0,no
4,C000030,AK,136,area_code_415,yes,yes,33,203.9,106,34.66,187.6,99,15.95,101.7,107,4.58,10.5,6,2.84,3,no


In [5]:
train_df.isna().sum()

Id                               0
state                            0
account_length                   0
area_code                        0
international_plan               0
voice_mail_plan                  0
number_vmail_messages            0
total_day_minutes                0
total_day_calls                  0
total_day_charge                 0
total_eve_minutes                0
total_eve_calls                  0
total_eve_charge                 0
total_night_minutes              0
total_night_calls                0
total_night_charge               0
total_intl_minutes               0
total_intl_calls                 0
total_intl_charge                0
number_customer_service_calls    0
churn                            0
dtype: int64

In [6]:
test_df.isna().sum()

Id                               0
state                            0
account_length                   0
area_code                        0
international_plan               0
voice_mail_plan                  0
number_vmail_messages            0
total_day_minutes                0
total_day_calls                  0
total_day_charge                 0
total_eve_minutes                0
total_eve_calls                  0
total_eve_charge                 0
total_night_minutes              0
total_night_calls                0
total_night_charge               0
total_intl_minutes               0
total_intl_calls                 0
total_intl_charge                0
number_customer_service_calls    0
churn                            0
dtype: int64

In [7]:
# This is the first step of feature engineering where we use existing columns and create new features for average length of a call
# during the day time, evenings, night time and international calls.
train_df['total_day_calls'] = train_df['total_day_calls'].apply(lambda x: x if x>0 else x+1)
train_df['total_eve_calls'] = train_df['total_eve_calls'].apply(lambda x: x if x>0 else x+1)
train_df['total_night_calls'] = train_df['total_night_calls'].apply(lambda x: x if x>0 else x+1)
train_df['total_intl_calls'] = train_df['total_intl_calls'].apply(lambda x: x if x>0 else x+1)
train_df['AvgDayCallLength'] = train_df['total_day_minutes']/train_df['total_day_calls'] # Number of minutes per day call
train_df['AvgEveCallLength'] = train_df['total_eve_minutes']/train_df['total_eve_calls'] # Number of minutes per eve call
train_df['AvgNightCallLength'] = train_df['total_night_minutes']/train_df['total_night_calls'] # Number of minutes per night call
train_df['AvgIntlCallLength'] = train_df['total_intl_minutes']/train_df['total_intl_calls'] # Number of minutes per intl call

In [8]:
# We also compute the average charge in $ per minute (zero minutes are replaced with 1 to avoid division by zero)
train_df['total_day_minutes'] = train_df['total_day_minutes'].apply(lambda x: x if x>0 else x+1) 
train_df['total_eve_minutes'] = train_df['total_eve_minutes'].apply(lambda x: x if x>0 else x+1) 
train_df['total_night_minutes'] = train_df['total_night_minutes'].apply(lambda x: x if x>0 else x+1) 
train_df['total_intl_minutes'] = train_df['total_intl_minutes'].apply(lambda x: x if x>0 else x+1) 
train_df['AvgDayCallChargePerMin'] = train_df['total_day_charge']/train_df['total_day_minutes'] # Average Charge per min
train_df['AvgEveCallChargePerMin'] = train_df['total_eve_charge']/train_df['total_eve_minutes'] # Average Charge per min
train_df['AvgNightCallChargePerMin'] = train_df['total_night_charge']/train_df['total_night_minutes'] # Average Charge per min
train_df['AvgIntlCallChargePerMin'] = train_df['total_intl_charge']/train_df['total_intl_minutes'] # Average Charge per min


In [9]:
# We do the same for each record in the test set
test_df['total_day_calls'] = test_df['total_day_calls'].apply(lambda x: x if x>0 else x+1)
test_df['total_eve_calls'] = test_df['total_eve_calls'].apply(lambda x: x if x>0 else x+1)
test_df['total_night_calls'] = test_df['total_night_calls'].apply(lambda x: x if x>0 else x+1)
test_df['total_intl_calls'] = test_df['total_intl_calls'].apply(lambda x: x if x>0 else x+1)
test_df['AvgDayCallLength'] = test_df['total_day_minutes']/test_df['total_day_calls'] # Number of minutes per day call
test_df['AvgEveCallLength'] = test_df['total_eve_minutes']/test_df['total_eve_calls'] # Number of minutes per eve call
test_df['AvgNightCallLength'] = test_df['total_night_minutes']/test_df['total_night_calls'] # Number of minutes per night call
test_df['AvgIntlCallLength'] = test_df['total_intl_minutes']/test_df['total_intl_calls'] # Number of minutes per intl call
test_df['total_day_minutes'] = test_df['total_day_minutes'].apply(lambda x: x if x>0 else x+1) 
test_df['total_eve_minutes'] = test_df['total_eve_minutes'].apply(lambda x: x if x>0 else x+1) 
test_df['total_night_minutes'] = test_df['total_night_minutes'].apply(lambda x: x if x>0 else x+1) 
test_df['total_intl_minutes'] = test_df['total_intl_minutes'].apply(lambda x: x if x>0 else x+1) 
test_df['AvgDayCallChargePerMin'] = test_df['total_day_charge']/test_df['total_day_minutes'] # Average Charge per min
test_df['AvgEveCallChargePerMin'] = test_df['total_eve_charge']/test_df['total_eve_minutes'] # Average Charge per min
test_df['AvgNightCallChargePerMin'] = test_df['total_night_charge']/test_df['total_night_minutes'] # Average Charge per min
test_df['AvgIntlCallChargePerMin'] = test_df['total_intl_charge']/test_df['total_intl_minutes'] # Average Charge per min
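
<p>The feature-engineering steps above are written out twice, once for the training set and once for the test set. One way to keep the two in sync is to wrap them in a single function. The sketch below is an illustration and not part of the original notebook: the name <code>add_call_features</code> and the small demo frame are hypothetical, and it assumes the same column names as the dataset above.</p>

```python
import pandas as pd

# Maps the period used in the raw column names to the label used in the new feature names
PERIODS = {"day": "Day", "eve": "Eve", "night": "Night", "intl": "Intl"}

def add_call_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with average call length and charge-per-minute features."""
    out = df.copy()
    for period, label in PERIODS.items():
        calls = f"total_{period}_calls"
        minutes = f"total_{period}_minutes"
        charge = f"total_{period}_charge"
        # Replace zero call counts with 1 so the division never divides by zero
        out[calls] = out[calls].where(out[calls] > 0, out[calls] + 1)
        out[f"Avg{label}CallLength"] = out[minutes] / out[calls]
        # Same guard for the minutes before computing the charge per minute
        out[minutes] = out[minutes].where(out[minutes] > 0, out[minutes] + 1)
        out[f"Avg{label}CallChargePerMin"] = out[charge] / out[minutes]
    return out

# Hypothetical two-row frame: one ordinary customer and one with no usage at all
demo = pd.DataFrame({
    f"total_{period}_{kind}": values
    for period in PERIODS
    for kind, values in (("calls", [100, 0]), ("minutes", [250.0, 0.0]), ("charge", [42.5, 0.0]))
})
features = add_call_features(demo)
print(features[["AvgDayCallLength", "AvgDayCallChargePerMin"]])
```

<p>Calling the same function on both <code>train_df</code> and <code>test_df</code> would guarantee that the two sets receive identical columns in the same order, although the column order differs from the cells above.</p>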

In [10]:
# Once we have created our features, we drop the base columns they were derived from, so that the derived features
# do not introduce multicollinearity. We also remove the area code and state columns, as they are not statistically
# significant here and would only add unnecessary complexity
train_df.drop(['Id','state', 
               'area_code', 
               'total_day_minutes', 
               'total_day_calls', 
               'total_day_charge', 
               'total_eve_minutes', 
               'total_eve_calls', 
               'total_eve_charge', 
               'total_night_minutes', 
               'total_night_calls', 
               'total_night_charge', 
               'total_intl_minutes', 
               'total_intl_calls', 
               'total_intl_charge'], axis=1, inplace=True)


test_df.drop(['Id','state', 
               'area_code', 
               'total_day_minutes', 
               'total_day_calls', 
               'total_day_charge', 
               'total_eve_minutes', 
               'total_eve_calls', 
               'total_eve_charge', 
               'total_night_minutes', 
               'total_night_calls', 
               'total_night_charge', 
               'total_intl_minutes', 
               'total_intl_calls', 
               'total_intl_charge'], axis=1, inplace=True)

In [11]:
train_df.head()

Unnamed: 0,account_length,international_plan,voice_mail_plan,number_vmail_messages,number_customer_service_calls,churn,AvgDayCallLength,AvgEveCallLength,AvgNightCallLength,AvgIntlCallLength,AvgDayCallChargePerMin,AvgEveCallChargePerMin,AvgNightCallChargePerMin,AvgIntlCallChargePerMin
0,107,no,yes,26,1,no,1.313821,1.898058,2.469903,4.566667,0.169988,0.085013,0.045008,0.270073
1,137,no,no,0,0,no,2.135088,1.101818,1.563462,2.44,0.170008,0.084983,0.045018,0.269672
2,84,yes,no,0,2,no,4.216901,0.703409,2.21236,0.942857,0.170007,0.084976,0.044997,0.269697
3,75,yes,no,0,3,no,1.475221,1.215574,1.544628,3.366667,0.170006,0.08503,0.044997,0.270297
4,121,no,yes,24,3,no,2.479545,3.226852,1.801695,1.071429,0.169982,0.084993,0.045014,0.270667
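
<p>To see why the raw charge columns add little once the minutes are known, the ratio of charge to minutes can be checked on the five training rows displayed earlier. This is a small illustrative check that uses only those displayed values, not a computation over the full dataset.</p>

```python
# (total_day_minutes, total_day_charge) for the first five training rows shown earlier
rows = [(161.6, 27.47), (243.4, 41.38), (299.4, 50.9), (166.7, 28.34), (218.2, 37.09)]

rates = [charge / minutes for minutes, charge in rows]
for (minutes, charge), rate in zip(rows, rates):
    print(f"{minutes:6.1f} min -> {charge:5.2f} charged, {rate:.6f} per minute")

# The spread is tiny: the charge is essentially the minutes multiplied by a fixed rate
spread = max(rates) - min(rates)
print(f"spread of the per-minute rate: {spread:.6f}")
```

<p>The same near-constant values appear in the <code>AvgDayCallChargePerMin</code> column of the table above.</p>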


In [12]:
# Now we one-hot encode the categorical variables, and map the target variable from yes/no to the 1/0 format the model needs.
train_df = pd.get_dummies( data=train_df, columns = ['international_plan','voice_mail_plan'],drop_first=True)
train_df['churn'] = train_df['churn'].apply(lambda x: 0 if x=='no' else 1)
test_df = pd.get_dummies( data=test_df, columns = ['international_plan','voice_mail_plan'],drop_first=True)
test_df['churn'] = test_df['churn'].apply(lambda x: 0 if x=='no' else 1)
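
<p>Because both plan columns only hold yes/no values, an explicit mapping is a possible alternative to <code>pd.get_dummies</code>. It does not depend on which categories happen to be present in each file, whereas encoding the training and test sets separately can give different columns if a category is missing from one of them. The helper name <code>encode_yes_no</code> and the demo frame below are hypothetical.</p>

```python
import pandas as pd

YES_NO = {"no": 0, "yes": 1}

def encode_yes_no(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the given yes/no columns mapped to 0/1 integers."""
    out = df.copy()
    for col in columns:
        # Values outside the mapping become NaN, so the integer cast fails loudly
        out[col] = out[col].map(YES_NO).astype("int64")
    return out

# Hypothetical three-row frame with the same column names as the dataset
demo = pd.DataFrame({
    "international_plan": ["no", "yes", "no"],
    "voice_mail_plan": ["yes", "no", "no"],
    "churn": ["no", "no", "yes"],
})
encoded = encode_yes_no(demo, ["international_plan", "voice_mail_plan", "churn"])
print(encoded)
```

<p>Note that this variant keeps the original column names instead of producing <code>international_plan_yes</code> and <code>voice_mail_plan_yes</code>.</p>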

In [13]:
# Now let's examine our new datasets after feature engineering
train_df.head()

Unnamed: 0,account_length,number_vmail_messages,number_customer_service_calls,churn,AvgDayCallLength,AvgEveCallLength,AvgNightCallLength,AvgIntlCallLength,AvgDayCallChargePerMin,AvgEveCallChargePerMin,AvgNightCallChargePerMin,AvgIntlCallChargePerMin,international_plan_yes,voice_mail_plan_yes
0,107,26,1,0,1.313821,1.898058,2.469903,4.566667,0.169988,0.085013,0.045008,0.270073,0,1
1,137,0,0,0,2.135088,1.101818,1.563462,2.44,0.170008,0.084983,0.045018,0.269672,0,0
2,84,0,2,0,4.216901,0.703409,2.21236,0.942857,0.170007,0.084976,0.044997,0.269697,1,0
3,75,0,3,0,1.475221,1.215574,1.544628,3.366667,0.170006,0.08503,0.044997,0.270297,1,0
4,121,24,3,0,2.479545,3.226852,1.801695,1.071429,0.169982,0.084993,0.045014,0.270667,0,1


In [14]:
test_df.head()

Unnamed: 0,account_length,number_vmail_messages,number_customer_service_calls,churn,AvgDayCallLength,AvgEveCallLength,AvgNightCallLength,AvgIntlCallLength,AvgDayCallChargePerMin,AvgEveCallChargePerMin,AvgNightCallChargePerMin,AvgIntlCallChargePerMin,international_plan_yes,voice_mail_plan_yes
0,73,0,1,0,2.493333,1.8125,2.605405,6.5,0.170009,0.085016,0.045021,0.27,0,0
1,147,0,0,0,1.325641,2.577419,1.569925,2.65,0.170019,0.084981,0.045019,0.269811,0,0
2,77,0,5,1,0.701124,1.404132,3.275,0.95,0.170032,0.084991,0.04499,0.270175,0,0
3,130,0,0,0,1.633929,0.736364,2.330769,0.5,0.17,0.085048,0.044994,0.270526,0,0
4,136,33,3,0,1.923585,1.894949,0.950467,1.75,0.169985,0.085021,0.045034,0.270476,1,1


In [15]:
# We now create our variables for training 
X_train = train_df.drop(['churn'],axis=1)
y_train = train_df[['churn']]
X_test =  test_df.drop(['churn'],axis=1)
y_test = test_df[['churn']]

In [16]:
# Before we build a model, we need to scale the input data, as Neural Networks are highly sensitive to the scale of the variables.
# We use the standard scaler from the sklearn library, which computes the z-value of each variable in the dataset.
# The scaler is fitted on the training set only and then applied unchanged to the test set.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)
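
<p>The arithmetic behind <code>fit_transform</code> and <code>transform</code> can be sketched with NumPy alone. The numbers below are made up for illustration. The point is that the mean and standard deviation come from the training data only and are then reused for the test data.</p>

```python
import numpy as np

# Hypothetical single-feature example: account lengths for train and test customers
train = np.array([[100.0], [120.0], [80.0], [140.0], [60.0]])
test = np.array([[100.0], [160.0]])

# "fit": learn the mean and population standard deviation from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # ddof=0, the same convention StandardScaler uses

# "transform": apply the training statistics to both sets
train_sc = (train - mean) / std
test_sc = (test - mean) / std

print(train_sc.ravel())  # has mean 0 and standard deviation 1
print(test_sc.ravel())   # scaled with the training statistics, so not necessarily mean 0
```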


In [49]:
# Now comes the most interesting part, which is designing the Neural Network
# We use the Sequential Model from the Tensorflow library. If you remember, our model is sequential - Input layer followed by Hidden Layers and then Output Layer
from tensorflow.keras.models import Sequential
model = Sequential()

In [50]:
# Now we add the first Hidden Layer. In Tensorflow, it is not necessary to define the input layer separately; the library infers it.
# The Hidden layer is also called a Dense Layer in Tensorflow, as every neuron in the layer is connected to every neuron in the preceding layer.
# Our first hidden layer has 512 neurons, uses RELU as the activation function and initializes the weights using the orthogonal method.
# We also have to provide the number of features in the dataset (basically the number of neurons in the Input layer) as the input_dim parameter.
from tensorflow.keras.layers import Dense, Dropout
model.add(Dense(units=512, activation='relu', input_dim=X_train_sc.shape[1],kernel_initializer='orthogonal', bias_initializer='zeros'))


In [51]:
# We will add subsequent layers 
# We just have to define the number of neurons in each layer and the activation function. Tensorflow is intelligent enough 
# to get the neurons in the prior layer
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=8, activation='relu'))

In [52]:
# Finally we add the Output layer. Remember that this is a Binary problem (predict 0 or 1). 
# So the number of neurons is 1 and the activation is Sigmoid.
model.add(Dense(units=1, activation='sigmoid'))

In [53]:
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_39 (Dense)             (None, 512)               7168      
_________________________________________________________________
dense_40 (Dense)             (None, 256)               131328    
_________________________________________________________________
dense_41 (Dense)             (None, 256)               65792     
_________________________________________________________________
dense_42 (Dense)             (None, 128)               32896     
_________________________________________________________________
dense_43 (Dense)             (None, 64)                8256      
_________________________________________________________________
dense_44 (Dense)             (None, 32)                2080      
_________________________________________________________________
dense_45 (Dense)             (None, 8)                
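
<p>The parameter counts in the summary follow from a simple rule: a Dense layer has one weight per (input, neuron) pair plus one bias per neuron. The sketch below recomputes them, assuming 13 input features (the number of columns left in <code>X_train</code> after the preprocessing above). The counts for the last two layers are computed from the rule, not read from the summary output.</p>

```python
# Layer widths: 13 input features, then the Dense layers defined above
layer_sizes = [13, 512, 256, 256, 128, 64, 32, 8, 1]

def dense_params(n_inputs: int, n_units: int) -> int:
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

pairs = list(zip(layer_sizes, layer_sizes[1:]))
counts = [dense_params(n_in, n_out) for n_in, n_out in pairs]

for (n_in, n_out), count in zip(pairs, counts):
    print(f"Dense({n_out:>3}) fed by {n_in:>3} inputs: {count:>7} parameters")
print("total:", sum(counts))
```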

In [54]:
# We now define the optimizer, the loss function and the metric used to monitor the performance of the neural network.
# I am using RMSprop as it gave me the best accuracy. We set the learning rate to 0.01 (the Keras default for RMSprop is 0.001)
# and use binary cross entropy as the loss function (remember the cost function from the lesson). We use accuracy to monitor performance.
from tensorflow.keras.optimizers import RMSprop
# metrics is passed as a list, the form used in the Keras documentation
model.compile(optimizer=RMSprop(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
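
<p>As a reminder of what the binary cross entropy loss measures, here is a minimal plain-Python version of the formula. It is a sketch for intuition with made-up probabilities, not the Keras implementation.</p>

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [0, 0, 1, 1]
confident_right = [0.1, 0.1, 0.9, 0.9]  # probabilities close to the true labels
confident_wrong = [0.9, 0.9, 0.1, 0.1]  # probabilities far from the true labels

print(binary_cross_entropy(y_true, confident_right))  # small loss
print(binary_cross_entropy(y_true, confident_wrong))  # much larger loss
```

<p>The loss grows quickly when the network is confident and wrong, which is what pushes the sigmoid output towards the correct class during training.</p>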

In [55]:
# Now as with any other model, we fit it to the dataset.
# We define the X, y variables, we define the batch_size, number of epochs and what the model should validate against, which is the test set
model.fit(X_train_sc, y_train, batch_size=32, epochs=100, validation_data = (X_test_sc, y_test))

[Training output: per-epoch log starting at Epoch 1/100; the loss and accuracy values are not preserved in this export]

<keras.callbacks.History at 0x7f14aecbd450>

In [56]:
# So our model is ready and has trained on the data set available. We now have to do the predictions
train_pred = model.predict(X_train_sc)
test_pred = model.predict(X_test_sc)

In [57]:
# We will use Scikit-learn's accuracy and confusion matrix functions to evaluate the model.
from sklearn.metrics import confusion_matrix, accuracy_score

In [58]:
# Remember that the predictions are probabilities, so when we measure accuracy, we have to convert to binary value using threshold as 0.5
train_acc = accuracy_score(y_pred = train_pred>0.5, y_true=y_train)*100
print(f'Training Set Accuracy : {train_acc} %')

Training Set Accuracy : 90.95 %
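
<p>The thresholding step can be seen in isolation on a handful of made-up probabilities. The sketch below assumes the predictions have shape (n_samples, 1), as a single sigmoid output neuron produces. The helper name <code>to_labels</code> is hypothetical.</p>

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1)
probs = np.array([[0.03], [0.48], [0.50], [0.51], [0.97]])

def to_labels(probabilities: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Flatten (n, 1) probabilities and mark values above the threshold as churn (1)."""
    return (probabilities.ravel() > threshold).astype(int)

print(to_labels(probs))       # default threshold of 0.5
print(to_labels(probs, 0.3))  # a lower threshold flags more customers as churners
```

<p>The threshold of 0.5 is a convention and not a requirement. Lowering it trades more false alarms for fewer missed churners.</p>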


In [59]:
print(confusion_matrix(y_pred = train_pred>0.5, y_true=y_train))

[[3428   65]
 [ 297  210]]


In [60]:
# We do the same for test set
test_acc = accuracy_score(y_pred = test_pred>0.5, y_true=y_test)*100
print(f'Test Set Accuracy : {test_acc} %')

Test Set Accuracy : 88.4 %


In [61]:
print(confusion_matrix(y_pred = test_pred>0.5, y_true=y_test))

[[840  27]
 [ 89  44]]
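
<p>Accuracy is only one way to read a confusion matrix. The sketch below derives precision, recall and F1 for the churn class from the test-set matrix printed above, together with the accuracy of a trivial model that always predicts "no churn". It relies on scikit-learn's layout, with true classes in rows and predicted classes in columns.</p>

```python
# Test-set confusion matrix from above, laid out as [[TN, FP], [FN, TP]]
tn, fp = 840, 27
fn, tp = 89, 44

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)    # of the customers flagged as churners, how many really churned
recall = tp / (tp + fn)       # of the customers who churned, how many were flagged
f1 = 2 * precision * recall / (precision + recall)
baseline = (tn + fp) / total  # accuracy of always predicting "no churn"

print(f"accuracy  : {accuracy:.3f}")
print(f"precision : {precision:.3f}")
print(f"recall    : {recall:.3f}")
print(f"f1        : {f1:.3f}")
print(f"baseline  : {baseline:.3f}")
```

<p>On an imbalanced dataset like this one, recall on the churn class is worth tracking alongside accuracy, since the always-"no" baseline already scores close to the model's accuracy.</p>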


<h1> Conclusion </h1>
<p class='lead' align='justified'>We have seen that a basic Neural Network with a few layers, trained over multiple epochs, achieves an accuracy of 91% on the training set and 88% on the test set, which is not bad given that we have not tuned any hyperparameters. Keras provides <b> Keras Tuner </b>, which can be used to tune the hyperparameters, but that is a topic for another tutorial. </p>
<p class='lead' justify> It is possible that classical algorithms such as Logistic Regression or Decision Trees would give better accuracy on this dataset, but if the dataset is huge (millions of customers with millions of records), a Neural Network is likely to give a better outcome.</p>

<p class='lead'> With this we come to the end of the demonstration. This demo aims to reinforce the concepts taught in the class through the video. </p>