# Churn Modelling using Keras

In [2]:
import pandas as pd
#setting pandas to show all columns
pd.set_option('display.max_columns', None)

In [3]:
#Loading the CSV into a Data Frame Object
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

### Let's take a look at our Data

In [4]:
df.head(3)

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |


customerID is just a unique identifier and tells us nothing about churn, so we can drop it from the Data Frame.

In [5]:
df.drop('customerID', axis=1, inplace=True)

With the exception of <b>tenure</b>, <b>TotalCharges</b>, <b>MonthlyCharges</b> and <b>Churn</b>, all the other variables are <b>Categorical</b>, so we should get their dummy variables.<br>
Remember to drop the first dummy column of each variable, for two basic reasons:<br>
1 - It is always possible to <b>infer</b> the values you dropped from the values you kept.<br>
2 - To avoid the <b>Dummy Variable Trap</b>, where the dummy columns are perfectly collinear.<br>
If you don't know what that is, take a look at this webpage:<br>
<a href="http://www.algosome.com/articles/dummy-variable-trap-regression.html">http://www.algosome.com/articles/dummy-variable-trap-regression.html</a>
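
As a small illustration of both points, the sketch below builds dummies for a made-up three-level `Contract` column (the `toy` frame is hypothetical, not the Telco data) and shows that the dropped column can be rebuilt from the ones that were kept.

```python
import pandas as pd

# Hypothetical toy column; the values mimic the Telco 'Contract' levels
toy = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year", "One year"]})

# All three dummy columns: in every row they sum to 1 (the source of the "trap")
full = pd.get_dummies(toy, dtype=int)

# drop_first=True removes the first level in sorted order ('Month-to-month')
reduced = pd.get_dummies(toy, drop_first=True, dtype=int)

# The dropped column is 1 exactly when every kept column is 0
recovered = 1 - reduced.sum(axis=1)

print(list(reduced.columns))
print(recovered.tolist())
```

Because `recovered` matches the dropped column row by row, keeping it would add no new information to the model.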

In [6]:
X = pd.get_dummies(df.drop(['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn'], axis=1), drop_first=True)
#Putting together our independent variables
X = pd.concat([X, df[['tenure', 'MonthlyCharges', 'TotalCharges']]], axis=1)

In [7]:
#Our dependent variable is the 'Churn' column, encoded as 0 (No) / 1 (Yes)
y = pd.get_dummies(df['Churn'], drop_first=True).values.ravel()
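
To make the encoding of the target explicit, here is a minimal sketch on a made-up three-row `Churn` series (the `churn` values are hypothetical). It suggests that the `get_dummies` call above is equivalent to a direct comparison against `'Yes'`.

```python
import pandas as pd

# Hypothetical stand-in for df['Churn']
churn = pd.Series(["No", "No", "Yes"], name="Churn")

# Same pattern as the notebook: the 'No' column is dropped, 'Yes' is kept
y_dummies = pd.get_dummies(churn, drop_first=True).values.ravel().astype(int)

# A more direct way to get the same 0/1 labels
y_direct = (churn == "Yes").astype(int).to_numpy()

print(y_dummies.tolist(), y_direct.tolist())
```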

### Taking a deeper look

In [8]:
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 30 columns):
SeniorCitizen                            7043 non-null int64
gender_Male                              7043 non-null uint8
Partner_Yes                              7043 non-null uint8
Dependents_Yes                           7043 non-null uint8
PhoneService_Yes                         7043 non-null uint8
MultipleLines_No phone service           7043 non-null uint8
MultipleLines_Yes                        7043 non-null uint8
InternetService_Fiber optic              7043 non-null uint8
InternetService_No                       7043 non-null uint8
OnlineSecurity_No internet service       7043 non-null uint8
OnlineSecurity_Yes                       7043 non-null uint8
OnlineBackup_No internet service         7043 non-null uint8
OnlineBackup_Yes                         7043 non-null uint8
DeviceProtection_No internet service     7043 non-null uint8
DeviceProtection_Yes                   

Looking carefully, we can see that <b>TotalCharges</b> is being treated as an <b>Object</b>, and we need it to be a <b>Float</b>.<br>
So, let us take care of that.

In [9]:
#Values that cannot be parsed as numbers become NaN, which we then replace with 0
X['TotalCharges'] = pd.to_numeric(X['TotalCharges'], errors='coerce').fillna(0)
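
A quick way to see what `errors='coerce'` does is to run it on a tiny made-up series. The sketch below assumes the column is stored as text because a few entries are blank strings; the `raw` values are hypothetical.

```python
import pandas as pd

# Hypothetical raw values: numbers stored as text, plus one blank entry
raw = pd.Series(["29.85", " ", "1889.5"])

# Unparseable entries become NaN instead of raising an error
converted = pd.to_numeric(raw, errors="coerce")
n_missing = int(converted.isna().sum())

# Replace the NaN values with 0, as the notebook does
cleaned = converted.fillna(0)

print(n_missing, cleaned.tolist(), cleaned.dtype)
```

Counting the NaN values before filling them is a cheap check on how many rows the conversion actually touched.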

In [10]:
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 30 columns):
SeniorCitizen                            7043 non-null int64
gender_Male                              7043 non-null uint8
Partner_Yes                              7043 non-null uint8
Dependents_Yes                           7043 non-null uint8
PhoneService_Yes                         7043 non-null uint8
MultipleLines_No phone service           7043 non-null uint8
MultipleLines_Yes                        7043 non-null uint8
InternetService_Fiber optic              7043 non-null uint8
InternetService_No                       7043 non-null uint8
OnlineSecurity_No internet service       7043 non-null uint8
OnlineSecurity_Yes                       7043 non-null uint8
OnlineBackup_No internet service         7043 non-null uint8
OnlineBackup_Yes                         7043 non-null uint8
DeviceProtection_No internet service     7043 non-null uint8
DeviceProtection_Yes                   

<b>Done!</b><br>Now we can move on to the next step.

### Scaling the independent variables

It is common practice to scale the data, especially when some features have much larger values than others and would otherwise make the smaller ones seem insignificant.

In [11]:
from sklearn.preprocessing import StandardScaler
scl = StandardScaler()
X = scl.fit_transform(X)

It is, however, unnecessary to scale the dependent variable: it is already a binary 0/1 label, so scaling it wouldn't add any value to our model.

### Splitting into Train and Test sets

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=101)
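
One caveat with the order used above: fitting the scaler on the full matrix lets statistics from the test rows influence the training features. A common alternative, sketched below on synthetic data (the `X_demo` and `y_demo` arrays are made up for illustration), is to split first and fit the scaler on the training portion only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

# Split first, using the same test_size and random_state as the notebook
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=101)

# Learn the mean and standard deviation from the training rows only
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)

# Reuse those training statistics on the test rows
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
```

The training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately standardized, which is expected.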

## Now what?

Now that we have our <b>train</b> and <b>test</b> data, we must decide which way to go.<br>
We could simply <b>fit</b> the model to the train data and make some <b>predictions</b> with it, and it could even achieve a <b>high accuracy</b>. But who can guarantee that its accuracy wasn't <b>just a coincidence</b>?<br><br>
We'll get to that later on.

### The simplest way

In [16]:
#import keras
import keras
from keras.models import Sequential
from keras.layers import Dense

Why use <b>Keras</b>?<br>
<b>Keras</b> is a <b>High Level API</b> written in <b>Python</b> and it is, above all, <b>very easy</b> to learn and implement. <br><br>
So... let us proceed.

In [20]:
#creating the Model
model = Sequential()
model.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=30)) # first hidden layer
model.add(Dense(units=16, kernel_initializer='uniform', activation='relu')) # second hidden layer
model.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid')) # output layer
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) #compiling the model

### What the heck happened here?!
<br>
The <b>Sequential</b> model is a linear stack of layers.<br>
<b>Dense</b> is a densely connected (fully connected) Neural Network Layer. It makes it incredibly easy to add layers to our NN.<br>
The <b>units</b> parameter is the number of neurons in the current layer. A common rule of thumb is to set it to the sum of the number of inputs and outputs divided by two. But feel free to mess around with it and see what happens.<br>
The <b>activation</b> parameter is the activation function to use in that particular layer.<br>
The <b>input_dim</b> parameter is the number of input features (our 30 columns). It is only needed on the first layer.<br>
<b>Compile</b> is the method that configures the learning process of our model. It does so by defining an optimizer, a loss function (what the NN tries to minimize) and a list of metrics.<br><br>
For more info, take a look at Keras documentation:<br>
<a href="https://keras.io/getting-started/sequential-model-guide/">https://keras.io/getting-started/sequential-model-guide/</a>
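
To make the numbers in the model concrete, the short calculation below applies the rule of thumb to our 30 inputs and 1 output, then counts the trainable parameters of each Dense layer. Rounding 15.5 up to 16 is my reading of the rule, and the `dense_params` helper is hypothetical, written only for this illustration.

```python
# Layer sizes taken from the model above: 30 inputs -> 16 -> 16 -> 1
n_inputs, n_outputs = 30, 1

# Rule of thumb from the text: (inputs + outputs) / 2, rounded up
units = -(-(n_inputs + n_outputs) // 2)  # ceiling division

# A Dense layer has one weight per (input, unit) pair plus one bias per unit
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layer_sizes = [n_inputs, units, units, n_outputs]
params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(params)

print(units, params, total)
```

If Keras is installed, `model.summary()` should report the same per-layer counts for the model built above.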

Now that our model is <b>built</b>, we should <b>fit</b> it.

In [21]:
model.fit(X_train, y_train, batch_size=20, epochs=300, verbose=1)

Epoch 1/300
Epoch 2/300
Epoch 3/300
...

<keras.callbacks.History at 0x2ac2cbead68>

And now let's make some <b>predictions</b>.

In [22]:
y_pred = model.predict(X_test)

### Is that it?
<br>
Hold your horses, cowboy!<br>
If you are at all familiar with the math of a <b>Sigmoid Function</b>, you know that it outputs a value between 0 and 1, which we can read as the likelihood of some event.<br>
If we take a look at our <b>y_pred</b> variable, we see that it holds the predicted probability of churn for each customer.<br>
To see how our model performed, however, we need a <b>True</b>-<b>False</b> variable. To get one, we must define a threshold that decides when a value becomes 0 and when it becomes 1.<br><br>
Our threshold here will be <b>.5</b>

In [23]:
y_pred = (y_pred > 0.5)
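
The thresholding step can be tried in isolation with NumPy. In this minimal sketch the `logits` array is made up and `sigmoid` is written by hand, so it only mirrors what the output layer and the comparison above do.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation values of the output neuron
logits = np.array([-2.0, 0.0, 0.3, 3.0])
probs = sigmoid(logits)

# Same rule as the notebook: strictly above 0.5 means "churn"
labels = probs > 0.5

print(probs.round(3), labels)
```

Note that a probability of exactly 0.5 is classified as False, because the comparison is strict.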

### To the performance!

In [24]:
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test, y_pred))
print('\n')
print(classification_report(y_test, y_pred))

[[896 130]
 [200 183]]


             precision    recall  f1-score   support

          0       0.82      0.87      0.84      1026
          1       0.58      0.48      0.53       383

avg / total       0.75      0.77      0.76      1409
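
The figures in the report can be checked by hand from the confusion matrix. The sketch below assumes scikit-learn's layout, where rows are the true classes and columns are the predicted ones; the variable names are mine.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class
tn, fp = 896, 130
fn, tp = 200, 183

total = tn + fp + fn + tp

# Share of all customers that were classified correctly
accuracy = (tn + tp) / total

# Of the customers predicted to churn, how many actually did
precision_churn = tp / (tp + fp)

# Of the customers who actually churned, how many we caught
recall_churn = tp / (tp + fn)

print(round(accuracy, 2), round(precision_churn, 2), round(recall_churn, 2))
```

The low recall for class 1 is worth noticing: the model misses roughly half of the customers who actually churn, even though the overall accuracy looks decent.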



### Wait a second...
<br>
We got a relatively <b>lower accuracy</b> in the test set than in the train set. But <b>why</b>?<br>
It could mean two things:<br>
<b>Overfitting</b>, or...<br>
Remember when I said earlier that it <b>wasn't a good idea</b> to simply train the model? Well, I didn't. But I never said that it was a good one, either.<br>
The <b>accuracy</b> obtained while training the model could have been just a <b>coincidence</b>, just as I said.<br><br>
One way to check for that is to use scikit-learn's <b>Cross Validation</b>, which evaluates the model on several different train/test splits instead of a single one.<br><br>
So let's do just that.

In [26]:
from sklearn.model_selection import cross_val_score
from keras.wrappers.scikit_learn import KerasClassifier
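
To see what `cross_val_score` returns without depending on Keras, here is a minimal sketch that uses a logistic regression as a stand-in classifier on synthetic data. The stand-in model and the `X_demo` and `y_demo` arrays are my assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary problem: the label depends mostly on the first feature
rng = np.random.default_rng(101)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# 10-fold cross-validation: one accuracy value per held-out fold
scores = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=10, scoring="accuracy")

# The mean estimates performance; the spread hints at how stable it is
mean_acc, std_acc = scores.mean(), scores.std()

print(scores.round(2), round(mean_acc, 3), round(std_acc, 3))
```

A single train/test split yields one number. Here we get ten, so a lucky or unlucky split shows up as a large standard deviation.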

In order to use the <b>Cross Validation</b>, we should follow some simple steps:<br>
1 - We should