## Let's Do Churn Modelling Using an Artificial Neural Network

In [211]:
## Set the working directory
import os
os.chdir(r'D:\Learning\deeplearning\Neural_Nets')

# Classification template

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
# Include columns 3 through 12 (the upper bound of the slice is exclusive, so it must be 13 to include the 12th column)
# We believe all these features have an impact on customer churn.
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

In [215]:
X.shape

(10000, 10)

In [216]:
y.shape

(10000,)
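
The column selection above relies on Python's half-open slicing, which iloc follows: the start position is included and the stop position is not. A minimal sketch, with a plain list standing in for one row of the CSV (the list is illustrative, not real data):

```python
# One illustrative row with positions 0..13, standing in for the CSV columns.
row = list(range(14))

features = row[3:13]   # positions 3..12; the stop index 13 is excluded
target = row[13]       # position 13, the churn label column

print(features)        # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(features))   # 10, matching X.shape[1]
print(target)          # 13
```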

Now that we have the dataset, we need to make sure the data is in good shape before applying a neural network. Our data has categorical variables stored as strings, as we can see in the CSV file, so they need to be encoded as numbers before we feed them into a neural net.

Our dependent variable (churn) is also categorical, but it is binary and takes only 1s and 0s, so it is already in numerical form and needs no encoding. Right now, we only need to encode the independent variables that are categorical strings.

We'll use LabelEncoder and OneHotEncoder from the scikit-learn library.

#### Let's see what X contains before we go ahead and encode.

In [217]:
## Convert X into a pandas dataframe for excel sheet like view
import pandas as pd

In [218]:
df = pd.DataFrame(X)

In [219]:
# Let's create a function out of this as it might come in handy later on.

def create_dataframe(numpy_array):
    return(pd.DataFrame(numpy_array))

In [220]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,619,France,Female,42,2,0.0,1,1,1,101349.0
1,608,Spain,Female,41,1,83807.9,1,0,1,112543.0
2,502,France,Female,42,8,159661.0,3,1,0,113932.0
3,699,France,Female,39,1,0.0,2,0,0,93826.6
4,850,Spain,Female,43,2,125511.0,1,1,1,79084.1
5,645,Spain,Male,44,8,113756.0,2,1,0,149757.0
6,822,France,Male,50,7,0.0,2,1,1,10062.8
7,376,Germany,Female,29,4,115047.0,4,1,0,119347.0
8,501,France,Male,44,4,142051.0,2,0,1,74940.5
9,684,France,Male,27,2,134604.0,1,1,1,71725.7


We can see that only the Country and Gender fields (columns 1 and 2) are categorical.

In [221]:
X.shape

(10000, 10)

In [222]:
# We'll define a function to take care of encoding for us. Before that, we'll import the packages.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def categorical_encoder(data, index):
    label_encoder = LabelEncoder()
    data[:, index] = label_encoder.fit_transform(data[:, index])
    return(data)

The function takes in the numpy array to be processed and the index of the field to be encoded.

In [223]:
# Let's try that
# The index of the field Country is 1
X = categorical_encoder(X,1)

In [224]:
# Let's see what X has in store for us now.

df = create_dataframe(X)

In [225]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,619,0,Female,42,2,0.0,1,1,1,101349.0
1,608,2,Female,41,1,83807.9,1,0,1,112543.0
2,502,0,Female,42,8,159661.0,3,1,0,113932.0
3,699,0,Female,39,1,0.0,2,0,0,93826.6
4,850,2,Female,43,2,125511.0,1,1,1,79084.1
5,645,2,Male,44,8,113756.0,2,1,0,149757.0
6,822,0,Male,50,7,0.0,2,1,1,10062.8
7,376,1,Female,29,4,115047.0,4,1,0,119347.0
8,501,0,Male,44,4,142051.0,2,0,1,74940.5
9,684,0,Male,27,2,134604.0,1,1,1,71725.7


Looks like the countries have been encoded with France as 0, Germany as 1 and Spain as 2. Wait a sec. The countries are not ordinal, so there is no relational ordering between them. The numbers were simply assigned in alphabetical order, yet a model could read them as saying that Spain is somehow greater than Germany or France. No country is better or worse than another as the numbers suggest. We need to fix this.

We can create dummy variables to fix this issue.
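
To make the difference concrete, here is a minimal pure-Python sketch of both encodings. It mimics, rather than calls, the scikit-learn encoders, and the helper names label_encode and one_hot are made up for this illustration. The integer codes follow the sorted order of the distinct values, while the one-hot rows carry no ordering at all.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in sorted order of the values."""
    classes = sorted(set(values))
    codes = {name: i for i, name in enumerate(classes)}
    return [codes[v] for v in values], classes

def one_hot(codes, n_classes):
    """Turn integer codes into rows with a single 1 in the code's position."""
    return [[1 if c == k else 0 for k in range(n_classes)] for c in codes]

countries = ['France', 'Spain', 'France', 'Germany']
codes, classes = label_encode(countries)
dummies = one_hot(codes, len(classes))

print(classes)  # ['France', 'Germany', 'Spain']
print(codes)    # [0, 2, 0, 1]
print(dummies)  # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```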

In [226]:
# Let's create another function to do this.
# Excuse the clumsy function name, but it's descriptive!!

# Once again, the numpy array and the index of the field to be encoded are the inputs to this function.
# Note: categorical_features is an argument of older scikit-learn releases; it was later removed from OneHotEncoder.
def dummy_variable_maker(data, index):
    onehotencoder = OneHotEncoder(categorical_features = [index])
    data = onehotencoder.fit_transform(data).toarray()
    return(data)

In [227]:
X = dummy_variable_maker(X,1)

ValueError: could not convert string to float: 'Female'

## Oops!! Looks like we can't encode Countries unless we fix Gender first!

In [228]:
# Let's do that.
# Index is 2 for gender
X = categorical_encoder(X, 2)

In [229]:
df = create_dataframe(X)

In [230]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,619,0,0,42,2,0.0,1,1,1,101349.0
1,608,2,0,41,1,83807.9,1,0,1,112543.0
2,502,0,0,42,8,159661.0,3,1,0,113932.0
3,699,0,0,39,1,0.0,2,0,0,93826.6
4,850,2,0,43,2,125511.0,1,1,1,79084.1
5,645,2,1,44,8,113756.0,2,1,0,149757.0
6,822,0,1,50,7,0.0,2,1,1,10062.8
7,376,1,0,29,4,115047.0,4,1,0,119347.0
8,501,0,1,44,4,142051.0,2,0,1,74940.5
9,684,0,1,27,2,134604.0,1,1,1,71725.7


In [231]:
## Alright, now that we have encoded Gender, let's move on to Countries

X = dummy_variable_maker(X,1)

In [232]:
df = create_dataframe(X)

In [233]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,1.0,0.0,0.0,619.0,0.0,42.0,2.0,0.0,1.0,1.0,1.0,101348.88
1,0.0,0.0,1.0,608.0,0.0,41.0,1.0,83807.86,1.0,0.0,1.0,112542.58
2,1.0,0.0,0.0,502.0,0.0,42.0,8.0,159660.8,3.0,1.0,0.0,113931.57
3,1.0,0.0,0.0,699.0,0.0,39.0,1.0,0.0,2.0,0.0,0.0,93826.63
4,0.0,0.0,1.0,850.0,0.0,43.0,2.0,125510.82,1.0,1.0,1.0,79084.1
5,0.0,0.0,1.0,645.0,1.0,44.0,8.0,113755.78,2.0,1.0,0.0,149756.71
6,1.0,0.0,0.0,822.0,1.0,50.0,7.0,0.0,2.0,1.0,1.0,10062.8
7,0.0,1.0,0.0,376.0,0.0,29.0,4.0,115046.74,4.0,1.0,0.0,119346.88
8,1.0,0.0,0.0,501.0,1.0,44.0,4.0,142051.07,2.0,0.0,1.0,74940.5
9,1.0,0.0,0.0,684.0,1.0,27.0,2.0,134603.88,1.0,1.0,1.0,71725.73


As we can see, we now have 12 features instead of the 10 we had before. That's because we have created dummy variables for the country variable, and since there are 3 countries, one column has been replaced by three, which adds 2 fields. The first three columns are the dummy variables for France, Germany and Spain in that order: rows for France have 1.0 in the first column, and the same logic applies to the other countries.

But before we go ahead with modelling, let's think again. We don't need 3 dummy variables for 3 countries. We need only 2, as the 3rd country is implied whenever the other 2 dummy variables are both 0.

So, let's remove one of the dummy variable fields and avoid falling into the dummy variable trap.

In [234]:
# Remove the first dummy column (index 0)
# We'll take all columns except the first one.
X = X[:, 1:]
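
The encoding steps above can also be done in one pass. The sketch below assumes a newer scikit-learn release that provides ColumnTransformer and the drop argument of OneHotEncoder (the categorical_features argument used earlier was removed from later releases). The three-row array is made up for illustration, with Gender already label-encoded.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Illustrative rows: credit score, country, gender (already 0/1), age.
rows = np.array([[619, 'France', 0, 42],
                 [608, 'Spain', 0, 41],
                 [376, 'Germany', 1, 29]], dtype=object)

encoder = ColumnTransformer(
    # One-hot encode column 1 and drop its first category to avoid the dummy variable trap.
    transformers=[('country', OneHotEncoder(drop='first'), [1])],
    remainder='passthrough',   # keep the other columns unchanged, after the dummies
    sparse_threshold=0,        # always return a dense array
)

encoded = encoder.fit_transform(rows).astype(float)
print(encoded)
# Columns: Germany, Spain, credit score, gender, age
```

Here drop='first' removes the France column, which is the same column that X[:, 1:] removes above.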

## Hurray!! Now we are ready to split the data into train and test sets.

In [235]:
# Note that cross_validation has been replaced by model_selection in newer versions of scikit-learn
from sklearn.model_selection import train_test_split
def split_data(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
    return(X_train, X_test, y_train, y_test)

In [236]:
X_train, X_test, y_train, y_test = split_data(X,y)

In [237]:
df = create_dataframe(X_train)

In [238]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1.0,0.0,579.0,0.0,39.0,5.0,117833.3,3.0,0.0,0.0,5831.0
1,0.0,0.0,750.0,0.0,32.0,5.0,0.0,2.0,1.0,0.0,95611.47
2,0.0,1.0,729.0,0.0,34.0,9.0,53299.96,2.0,1.0,1.0,42855.97
3,0.0,1.0,689.0,1.0,38.0,5.0,75075.14,1.0,1.0,1.0,8651.92
4,0.0,0.0,605.0,1.0,52.0,7.0,0.0,2.0,1.0,1.0,173952.5
5,0.0,0.0,667.0,0.0,37.0,9.0,71786.9,2.0,1.0,1.0,67734.79
6,0.0,0.0,673.0,1.0,65.0,0.0,0.0,1.0,1.0,1.0,85733.33
7,0.0,0.0,724.0,1.0,31.0,5.0,0.0,1.0,1.0,0.0,134889.95
8,0.0,0.0,731.0,0.0,38.0,10.0,123711.73,2.0,1.0,0.0,171340.68
9,0.0,1.0,484.0,0.0,39.0,5.0,0.0,2.0,1.0,1.0,175224.12


### Let's Apply Feature Scaling.

Feature scaling is strongly recommended here. Training a neural network involves a large number of computations, and they behave much better when all features are on a comparable scale. We also don't want one independent variable dominating another just because it is measured in larger units (compare Balance, in the tens of thousands, with Tenure, between 0 and 10).

In [239]:
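from sklearn.preprocessing import StandardScaler
# Keep one scaler object so the test set can reuse the statistics learned from the training set.
sc = StandardScaler()

In [240]:
# Fit on the training set only, then scale it.
X_train = sc.fit_transform(X_train)

In [241]:
# Apply the training-set mean and standard deviation to the test set (no refitting).
X_test = sc.transform(X_test)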
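
What StandardScaler computes can be sketched with NumPy: subtract each column's mean and divide by its standard deviation. In this sketch (toy numbers, not the churn data) the statistics are learned from the training rows only and then reused for the test rows, so both sets end up on the same scale.

```python
import numpy as np

train = np.array([[600.0, 30.0], [700.0, 40.0], [800.0, 50.0]])
test = np.array([[650.0, 35.0]])

# Learn the per-column mean and standard deviation from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # reuse the training statistics

print(train_scaled.mean(axis=0))    # approximately [0, 0]
print(train_scaled.std(axis=0))     # approximately [1, 1]
print(test_scaled)                  # approximately [[-0.612, -0.612]]
```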

In [242]:
# Let's see if that worked.
df = create_dataframe(X_train)

In [243]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1.760216,-0.574682,-0.735507,-1.087261,0.015266,0.00886,0.67316,2.535034,-1.553624,-1.03446,-1.64081
1,-0.568112,-0.574682,1.024427,-1.087261,-0.652609,0.00886,-1.207724,0.804242,0.643657,-1.03446,-0.079272
2,-0.568112,1.740094,0.808295,-1.087261,-0.461788,1.393293,-0.356937,0.804242,0.643657,0.966688,-0.99684
3,-0.568112,1.740094,0.396614,0.919743,-0.080145,0.00886,-0.009356,-0.926551,0.643657,0.966688,-1.591746
4,-0.568112,-0.574682,-0.467915,0.919743,1.255605,0.701077,-1.207724,0.804242,0.643657,0.966688,1.283302
5,-0.568112,-0.574682,0.17019,-1.087261,-0.175556,1.393293,-0.061844,0.804242,0.643657,0.966688,-0.564126
6,-0.568112,-0.574682,0.231942,0.919743,2.495944,-1.721681,-1.207724,-0.926551,0.643657,0.966688,-0.251081
7,-0.568112,-0.574682,0.756835,0.919743,-0.74802,0.00886,-1.207724,-0.926551,0.643657,-1.03446,0.603893
8,-0.568112,-0.574682,0.828879,-1.087261,-0.080145,1.739402,0.766993,0.804242,0.643657,-1.03446,1.237875
9,-0.568112,1.740094,-1.713248,-1.087261,0.015266,0.00886,-1.207724,0.804242,0.643657,0.966688,1.30542


## And that finishes the Preprocessing Stage Completely!!

In [244]:
## Let's go ahead and save this preprocessing stage.

In [245]:
y_train.shape

(7500,)

In [246]:
y_test.shape

(2500,)

In [247]:
temp = y_train
temp1 = y_test

## Let's get to creating the ANN

We'll start with the imports.

In [248]:
import keras

In [249]:
keras.backend.backend()

'tensorflow'

If Keras uses Theano as its backend, we need to change it to TensorFlow. To do that, we restart Jupyter with TensorFlow as the Keras backend by first running the command set "KERAS_BACKEND=tensorflow" (Windows syntax).

In [250]:
## Now that we have done that we need to check one more thing.

keras.backend.image_dim_ordering()

'tf'

That's great, Keras is using tf as its image_dim_ordering. Well now we could continue.

In [251]:
# In the event that it needs to be changed, we'll use this custom function.
# This works only when the backend is already TensorFlow.
def change_image_dim_ordering():
    K = keras.backend.backend()
    if K=='tensorflow':
        keras.backend.set_image_dim_ordering('tf')

Alright, moving on..

In [252]:
# The sequential module is used to initialize our ANN
from keras.models import Sequential
# The dense module is required to build the layers of our ANN
from keras.layers import Dense

In [253]:
# Initialize the ANN (i.e. define it as a sequence of layers)
# There are 2 ways of defining an ANN: 1 -- defining the sequence of layers, or
# 2 -- defining a graph.

# We are going to use the first method. We will create an object of the Sequential class.
# Our problem is a classification problem.
classifier = Sequential()

## A Little Refresher

Remember, 

Step 1 : The first step in creating an ANN is to randomly initialize the weights to small numbers close to 0 (but not 0). This will be done by the Dense layer.

Step 2 : The first observation in the dataset (the first row) is given to the input layer, with each feature going into one input node. In our case, we have 11 features in our feature matrix, i.e. the 11 independent variables. Therefore, our input layer will have 11 input nodes.

Step 3 : Forward-Propagation - From left to right, the neurons are activated by the activation function in such a way that the higher the value of the activation function is for a neuron, the more impact that neuron has on the network. There are several activation functions to choose from. A widely used and well-proven choice for hidden layers is <b>The Rectifier function</b>.

<b>The Sigmoid Function</b> is a good fit for the output layer of a binary classifier, as it gives us a probability. We'll be able to see the probability that the output is 1 for each observation (the probability of 0 is one minus that), including new observations when we make predictions on the test set.

Step 4 : Compare the predicted result to the actual result (the churn class). This generates an error.

Step 5 : Back-propagate the error from right to left and update the weights to reduce the error, adjusting each weight according to how much it is responsible for the error. There are several ways of updating the weights. The learning rate decides by how much we update the weights.

Step 6 : Repeat steps 2 to 5, updating the weights either after each observation or after every batch of observations.

Step 7 : When the whole training set has passed through the ANN once, that completes an epoch. Repeat for many more epochs.
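
The steps above can be sketched in a few lines of plain Python for the smallest possible network: a single sigmoid neuron with one input. This is only an illustration under simplifying assumptions (toy data, no hidden layer, a hand-picked learning rate of 0.5), not what Keras does internally.

```python
import math
import random

random.seed(0)

# Toy data: one feature and a binary label (illustrative, not the churn data).
X = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(w, b):
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = sigmoid(w * x_i + b)
        total += -(y_i * math.log(p) + (1 - y_i) * math.log(1 - p))
    return total / len(X)

# Step 1: small random initial weights close to zero.
w = random.uniform(-0.05, 0.05)
b = random.uniform(-0.05, 0.05)
learning_rate = 0.5

loss_before = mean_log_loss(w, b)

for epoch in range(50):                    # Step 7: repeat for several epochs
    for x_i, y_i in zip(X, y):             # Step 2: feed one observation at a time
        p = sigmoid(w * x_i + b)           # Step 3: forward propagation
        error = p - y_i                    # Step 4: compare prediction with the label
        w -= learning_rate * error * x_i   # Step 5: gradient step on the weight
        b -= learning_rate * error         #         and on the bias
        # Step 6: here the update happens after every single observation

loss_after = mean_log_loss(w, b)
print(loss_before, loss_after)             # the loss shrinks as the weights are trained
```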

<b>Note:</b> We are going to be using Stochastic Gradient Descent.

In [254]:
# Add the input layer and the first hidden layer.
# The add method of our Sequential object is used to add layers to the ANN. We only ever add
# hidden layers (and later the output layer) explicitly. By creating the first hidden layer, we
# indirectly specify the number of nodes in the previous layer, which is the input layer.

# The first argument is output_dim, which takes the number of nodes in the hidden layer
# we are adding.

# There is no rule of thumb for the number of nodes to add to a hidden layer. But practitioners
# often start with the average of the number of nodes in the input and output layers.
# We could also experiment and use parameter tuning, for example with k-fold cross-validation,
# to come up with better numbers for a specific problem.

# We have 11 input nodes and 1 output node (because the output is binary).
# So 12 in total, and therefore 6 is the average.

# Now that we have decided on the number of nodes in the hidden layer, the next step is to initialize
# the weights. For our stochastic gradient descent, we have to randomly initialize the weights to
# small numbers close to zero. Keras offers several initializers, glorot_uniform among them. Here we
# use the simple "uniform" initializer, which draws the weights from a uniform distribution with
# values close to zero.

# The third argument is the activation function. We will choose the rectifier function for the hidden
# layer. The corresponding parameter value is 'relu'.

# And finally, we need one more mandatory parameter -- input_dim. It defines the
# number of nodes in the input layer. It is mandatory because the ANN is still empty, so we
# have to tell the first hidden layer how many inputs to expect.
# For the subsequent layers, this parameter is not needed as each layer already knows
# the size of the layer before it. So now, it is 11 for input_dim.

classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu', input_dim = 11))



In [255]:
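## Updating the Dense call to the Keras 2 API as per the warning.
## Note that running this cell after the previous one adds a second identical layer instead of replacing the first,
## which is why classifier.layers below lists one more Dense layer than expected.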
classifier.add(Dense(kernel_initializer="uniform", units=6, activation="relu", input_dim=11))
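
To see what such a layer computes, here is a rough NumPy sketch of a dense layer with 11 inputs and 6 rectifier units. The weight range and the random input are assumptions chosen for illustration; Keras manages the real weights itself.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 11, 6

# Small random weights around zero (the range is illustrative) and zero biases.
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_units))
b = np.zeros(n_units)

x = rng.normal(size=n_inputs)          # one scaled observation with 11 features

hidden = np.maximum(0.0, x @ W + b)    # rectifier (ReLU) applied to the weighted sums

n_params = W.size + b.size
print(hidden.shape)                    # (6,)
print(n_params)                        # 72 trainable parameters: 11 * 6 weights + 6 biases
```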

Now that we have added the first hidden layer, let's go ahead and add one more hidden layer. The problem we are dealing with doesn't necessarily require another one, but we will add one additional layer just to learn how to do it.

In [256]:
# Second layer
classifier.add(Dense(kernel_initializer="uniform", units=6, activation="relu"))

In [257]:
classifier.layers

[<keras.layers.core.Dense at 0x1b34f486860>,
 <keras.layers.core.Dense at 0x1b34f551860>,
 <keras.layers.core.Dense at 0x1b35215f4e0>]

In [258]:
# Now that we have added the hidden layers, let's add the output layer.
# Everything is the same except the number of nodes and the activation function.
# We are going to use the sigmoid function.

# Imagine a situation where the dependent variable has more than 2 categories, unlike the example here.
# In that case, we need to set the units parameter to the number of categories and change the activation to softmax.
# Softmax is the generalisation of the sigmoid function to a dependent variable that has more than 2 categories.
classifier.add(Dense(kernel_initializer="uniform", units=1, activation="sigmoid"))
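
The relationship between the two activations can be checked numerically. In this small pure-Python sketch (the helper functions are written here for illustration), a softmax over the two scores [z, 0] gives the same probability as sigmoid(z).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.5
two_class = softmax([z, 0.0])
print(sigmoid(z), two_class[0])   # the two values agree

three_class = softmax([2.0, 1.0, 0.1])
print(sum(three_class))           # the class probabilities sum to 1 (up to rounding)
```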

In [259]:
classifier.layers

[<keras.layers.core.Dense at 0x1b34f486860>,
 <keras.layers.core.Dense at 0x1b34f551860>,
 <keras.layers.core.Dense at 0x1b35215f4e0>,
 <keras.layers.core.Dense at 0x1b352202ba8>]

## We are done creating our Artificial Neural Network.

Let's compile our ANN by applying Stochastic Gradient Descent.

In [260]:
# Compiling

# It needs some parameters.

# 1. Optimizer -- The algorithm we want to use to find the optimal set of weights. We have initialized
# the weights, but we need to optimize them to find the best possible solution for the ANN.
# "adam" is a popular and efficient variant of stochastic gradient descent.

# 2. Loss - The loss function that is minimized by the stochastic gradient descent algorithm,
# that is, by the adam algorithm. Stochastic gradient descent is driven by a loss function.
# (Remember that its noisy updates help it avoid getting stuck in poor local minima, although
# reaching the global minimum is not guaranteed.)
# We minimize the loss to find the optimal weights.

# In simple linear regression, the loss function is the sum of squared errors, i.e. the sum of squared
# differences between the real value and the predicted value. The weights are the parameters that
# stochastic gradient descent adjusts to minimize that loss.

# But in our ANN, the output layer uses the sigmoid function to estimate the probability of a class,
# so the output neuron behaves like a logistic regression model. In logistic regression, the loss
# function is not SSE but the logarithmic loss.

# The loss function we are going to use for the ANN, which the stochastic gradient descent variant
# Adam will minimize, is therefore the logarithmic loss. For a binary dependent variable, it is called
# "binary_crossentropy", and for a dependent variable with more than 2 categories, "categorical_crossentropy".

# 3. metrics -- the criterion that we choose to evaluate the model. Typically we use the accuracy
# criterion. It is only reported, not optimized: we can watch the accuracy change after each epoch as we fit the model.
# The metrics parameter takes a list. Here there is only going to be one element
# -- the accuracy.
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics = ['accuracy'])
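
As a rough illustration of what the logarithmic loss measures, the sketch below computes the mean binary cross-entropy by hand for two made-up sets of predictions. The helper binary_crossentropy is written here for illustration; it is not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean logarithmic loss over 0/1 labels and predicted probabilities of class 1."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

confident_right = binary_crossentropy([1, 0], [0.9, 0.2])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.8])

print(round(confident_right, 4))  # 0.1643
print(round(confident_wrong, 4))  # 1.956
```

Predictions that put high probability on the wrong class are penalised far more heavily than good predictions are rewarded, which is what pushes the weights in the right direction.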

## Compilation is done. Now we could fit the ANN to the Dataset.

In [263]:
# Let's fit it.

# We need two more arguments besides the X and y train sets.
# 1. Again, recalling the steps involved in training the ANN with SGD, we see that we could
# update the weights either after each observation or after a batch of observations.

# So the argument batch_size helps us choose this.

# 2. Choose the number of epochs.

# There is no rule for choosing these values; they are found by experimentation. For now, we are going to choose batch size 10 and epochs 100
classifier.fit(X_train, y_train, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 99/100
Epoch 100/100


<keras.callbacks.History at 0x1b3536bb198>

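As a quick piece of arithmetic on what these two settings imply: with 7500 training rows and a batch size of 10, the weights are updated 750 times per epoch. A tiny sketch of that calculation:

```python
import math

n_train = 7500      # rows in X_train (75% of the 10,000 observations)
batch_size = 10
epochs = 100

updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 750
print(total_updates)      # 75000
```
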
## The Big Shape Problem in ANN

Keras can raise a shape mismatch error when the shape of the labels does not match the shape of the output layer. Why does this happen?

For a final output layer of 2 or more nodes (one node per category), the labels need to be of a categorical type, i.e. a binary vector for each observation. For example, a 3-class label vector [0,2,1,0,1,0] becomes [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[0,1,0],[1,0,0]].

Source -- https://stackoverflow.com/questions/31997366/python-keras-shape-mismatch-error

In our problem, the label vector [0,1,0,0,1] would become [[1,0],[0,1],[1,0],[1,0],[0,1]]. In the event that the error happens, we can make this change as follows:

In [103]:
from keras.utils import np_utils, generic_utils

y_train, y_test = [np_utils.to_categorical(x) for x in (y_train, y_test)]

In [119]:
y_train

array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       ..., 
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

In [269]:
y_train.shape

(7500,)

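With a single sigmoid output node, as in our network, the final y_train and y_test sets should stay 1D, which they are here. If they ever end up 2D, we can reshape the arrays as per the solution suggested at https://stackoverflow.com/questions/13730468/from-nd-to-1d-arrays

For example, ravel() or flatten() turns a column of shape (n, 1) into a 1D array; the flat attribute gives a flatiter over the same elements.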
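
A small NumPy sketch of the label shapes discussed above, using the five-label example. np.eye is used here as a stand-in for a to_categorical-style helper, and argmax and ravel show two ways back to a 1D array.

```python
import numpy as np

y = np.array([0, 1, 0, 0, 1])

# One row per observation with a 1 in the column of its class.
y_one_hot = np.eye(2)[y]
print(y_one_hot.shape)        # (5, 2)

# argmax along each row recovers the original 1D labels.
y_back = y_one_hot.argmax(axis=1)
print(y_back.shape)           # (5,)

# ravel() flattens a column vector of shape (5, 1) into shape (5,).
column = y.reshape(-1, 1)
flat = column.ravel()
print(flat.shape)             # (5,)
```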

## Let's Test our Model

In [272]:
y_pred = classifier.predict(X_test)

y_pred contains the probability that the customers leave the bank.

In [273]:
y_pred

array([[ 0.1475133 ],
       [ 0.29130149],
       [ 0.13750894],
       ..., 
       [ 0.23227979],
       [ 0.21217541],
       [ 0.08198183]], dtype=float32)

As we can see, the first value is approximately 14.8%, which says that this particular customer is only about 15% likely to leave the bank. At the bottom, we see 8.2%, which says that this customer is very unlikely to leave the bank. But we need to compare these predictions with the actual observations to see whether our model is right.

Let's do it using the confusion matrix.

In [274]:
from sklearn.metrics import confusion_matrix

In [275]:
cm = confusion_matrix(y_test, y_pred)

ValueError: Can't handle mix of binary and continuous

We can't mix binary values (our actual labels in y_test) with continuous values (the probabilities in y_pred). We need to convert the probabilities into a binary form too. The simplest way is to classify them based on a threshold value. We'll choose the standard 0.50 threshold: anything over this will be considered a 1 and everything else a 0.

In [276]:
y_pred = (y_pred > 0.5)

In [282]:
# This little operation will create a Boolean array as follows:
y_pred

array([[False],
       [False],
       [False],
       ..., 
       [False],
       [False],
       [False]], dtype=bool)

In [283]:
cm = confusion_matrix(y_test, y_pred)

In [284]:
print(cm)

[[1942   49]
 [ 337  172]]


The confusion matrix says that 1942 + 172 of the 2500 data points were predicted correctly. Let's work out the accuracy.

In [280]:
y_pred.shape

(2500, 1)

In [281]:
(1942+172)/2500

0.8456

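That's an accuracy of about 84.6%. This is a good accuracy level considering that we didn't do any parameter tuning, although it should be read against the fact that 1991 of the 2500 test customers (79.6%) did not churn, so always predicting 0 would already score 79.6%. There is plenty of room for improvement, and tuning should increase the accuracy of our neural net.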
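
Accuracy alone hides how the errors are distributed, so here is a short sketch that derives a few more figures from the same confusion matrix. It assumes scikit-learn's layout, where rows are actual classes and columns are predicted classes.

```python
# Confusion matrix printed above: [[1942, 49], [337, 172]]
tn, fp = 1942, 49     # actual 0: predicted 0, predicted 1
fn, tp = 337, 172     # actual 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
baseline = (tn + fp) / total     # accuracy of always predicting class 0 (customer stays)
precision = tp / (tp + fp)       # share of predicted churners who really left
recall = tp / (tp + fn)          # share of actual churners the model caught

print(accuracy)              # 0.8456
print(baseline)              # 0.7964
print(round(precision, 3))   # 0.778
print(round(recall, 3))      # 0.338
```

The recall figure shows that, at the 0.5 threshold, the model catches roughly a third of the customers who actually left, which is one place where tuning could help.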