## Artificial Neural Network

This project deals with building an Artificial Neural Network (ANN) and understanding why neural networks often outperform simpler machine learning algorithms.

The goal of an ANN is to build a network of neurons with one or more hidden layers and to train it using backpropagation. During training, the model repeatedly evaluates a loss function, which measures the difference between the predicted y_hat and the actual Y, and adjusts the weights on the connections (synapses) feeding the hidden and output nodes to reduce that loss and improve the accuracy of the model.

Let's start by importing the necessary modules.

First we need to install the Keras library, TensorFlow and Theano, using the commands below.

In [1]:
!pip install --upgrade keras
!pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
!pip install tensorflow

Requirement already up-to-date: keras in c:\users\rames\anaconda3\lib\site-packages
Requirement already up-to-date: pyyaml in c:\users\rames\anaconda3\lib\site-packages (from keras)
Requirement already up-to-date: scipy>=0.14 in c:\users\rames\anaconda3\lib\site-packages (from keras)
Requirement already up-to-date: six>=1.9.0 in c:\users\rames\anaconda3\lib\site-packages (from keras)
Requirement already up-to-date: numpy>=1.9.1 in c:\users\rames\anaconda3\lib\site-packages (from keras)
Collecting git+git://github.com/Theano/Theano.git
  Cloning git://github.com/Theano/Theano.git to c:\users\rames\appdata\local\temp\pip-11cdfxm6-build


  Error [WinError 2] The system cannot find the file specified while executing command git clone -q git://github.com/Theano/Theano.git C:\Users\rames\AppData\Local\Temp\pip-11cdfxm6-build
Cannot find command 'git'




## Importing the data and performing some preprocessing to clean the data

The data we will be using is a sample of a fictional dataset. It fictitiously represents a bank's customer data, with geo-demographic information about the customers such as
1. CustomerId
2. Last Name
3. CreditScore
4. Geography (here France, Spain and Germany)
5. Gender
6. Age
7. Tenure
8. Current Balance in accounts
9. Number of Products the customer currently has
10. Whether the customer has a credit card with the bank
11. Whether the customer is an active member (last 2 months)
12. Estimated Salary

Finally, the dependent variable
Exited - 1 denotes that the customer has left the bank and 0 that they are still a customer

A copy of the csv is available in here:(Add link here)

In [2]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
dataset.head()

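|   | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |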


Let's split the data into input nodes and output nodes.
X represents the input nodes (columns CreditScore through EstimatedSalary).
Y represents the output node (the Exited column).


In [3]:
X = dataset.iloc[:, 3:13].values
Y = dataset.iloc[:, 13].values

### Dealing with Categorical data

Since our input data has two categorical features, Geography and Gender, we need to convert them into numerical fields. We first label-encode both columns, then use One Hot Encoder on Geography and remove one of the encoded Geography columns to avoid the Dummy Variable Trap.

In [4]:
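# Encoding categorical data
# Encoding the Independent Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Label-encode Geography (column 1) into integer codes
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])

# Label-encode Gender (column 2) into 0/1
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

# One-hot encode Geography; the dummy columns are placed first in the output.
# Note: categorical_features was removed in scikit-learn 0.22; newer versions need ColumnTransformer instead.
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()

# Drop the first dummy column to avoid the Dummy Variable Trap
X=X[:, 1:]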
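
For readers on a recent scikit-learn release, the same encoding idea can be sketched with `ColumnTransformer`. This is a minimal illustration on a made-up four-row array (`X_demo` and its values are assumptions, not the real dataset). Unlike the cell above, it one-hot encodes Gender as well, so the column order of the result differs from the original.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows mimicking three feature columns: CreditScore, Geography, Gender.
X_demo = np.array([
    [619, "France", "Female"],
    [608, "Spain", "Female"],
    [502, "Germany", "Male"],
    [699, "France", "Male"],
], dtype=object)

# drop="first" removes one dummy column per feature (avoids the dummy variable trap).
# sparse_threshold=0 forces a dense array as the result.
encoder = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(drop="first"), [1, 2])],
    remainder="passthrough",
    sparse_threshold=0,
)

# Resulting columns: Germany, Spain, Male, CreditScore
X_encoded = encoder.fit_transform(X_demo).astype(float)
print(X_encoded)
```

France and Female become the baseline categories: a row with zeros in all the dummy columns is a female customer from France.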

### Split the available data into Test and Train

We can use train_test_split from sklearn directly to hold out 20% of the data as test data for our evaluations later on.

In [5]:
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,random_state=0)

### Feature Scaling

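Since our output node is a binary outcome, we will be using the sigmoid/logistic activation function in our output node. The input features are on very different scales (balances and salaries run into the tens of thousands, while the encoded flags are 0 or 1), so it is essential to standardize the independent variables so that no single feature dominates the weight updates. Here we do that using StandardScaler. The scaler is fitted on the training data only, and the same fitted parameters are then applied to the test data.


In [6]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
# Fit on the training set, then reuse the training mean and standard deviation for the test set
X_train=sc.fit_transform(X_train)
X_test=sc.transform(X_test)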
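
To make the standardization step concrete, here is a small NumPy sketch of scaling with statistics learned from the training split only. The arrays `train` and `test` are random placeholders (an assumption for illustration), not the bank data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 3 features with a mean near 50 and a spread near 10.
train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

# Learn the scaling parameters from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# Apply the same parameters to both splits.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled.mean(axis=0), test_scaled.std(axis=0))
```

The training columns come out with mean 0 and standard deviation 1. The test columns are only approximately standardized, which is expected because they are measured against the training statistics.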

In [7]:
Y_train.shape

(8000,)

In [8]:
X_train.shape

(8000, 11)

## Building an Artificial Neural Network

Now we can start building the artificial neural network.

Step 1 is to import the necessary modules from the Keras library. We are importing Sequential and Dense.

Step 2 is to initialize the neural network, which is done using the Sequential class. This signifies that our layers are arranged sequentially, each one feeding the next.

Step 3 is to add our first hidden layer
* We use the add function to add our first hidden layer

* Here we need to specify the number of units, which is the number of nodes in our first hidden layer. As a general rule of thumb, it is acceptable to use the average of the number of input and output nodes. Since we have 11 input nodes and 1 output node, this gives (11 + 1) / 2 = 6 nodes in our first hidden layer

* The next thing we need to specify is the weight initialization function. Uniform assigns small random values close to zero to the initial weights

* I am using the rectifier (ReLU) function as the activation function for our first hidden layer

* Finally, the input dimension, which in our case is 11

Step 4 is to add the second hidden layer
* Using the same number of nodes as the first layer
* Using the same rectifier activation function


Step 5 is to define the output layer
* Our output layer has 1 node since this is a binary output of 1 or 0
* Our activation function is the sigmoid function, since it outputs a probability that can be converted to 0 or 1 using a threshold
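
The architecture described in the steps above can be sketched in plain NumPy to show what a single forward pass computes. This is an illustration only, not the Keras implementation: the weights are untrained, the ±0.05 initialization range is an assumption, and the helper names (`uniform_weights`, `forward`) are hypothetical.

```python
import numpy as np

def relu(z):
    # Rectifier: passes positive values through, clamps negatives to 0.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

n_inputs, n_outputs = 11, 1
n_hidden = (n_inputs + n_outputs) // 2  # rule of thumb from the text: 6

def uniform_weights(n_in, n_out, limit=0.05):
    # Small uniform weights around zero; the +/-0.05 range is an assumption.
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W1, b1 = uniform_weights(n_inputs, n_hidden), np.zeros(n_hidden)
W2, b2 = uniform_weights(n_hidden, n_hidden), np.zeros(n_hidden)
W3, b3 = uniform_weights(n_hidden, n_outputs), np.zeros(n_outputs)

def forward(X):
    h1 = relu(X @ W1 + b1)         # first hidden layer
    h2 = relu(h1 @ W2 + b2)        # second hidden layer
    return sigmoid(h2 @ W3 + b3)   # output layer: one probability per row

X_batch = rng.normal(size=(4, n_inputs))  # 4 made-up, already-scaled customers
probs = forward(X_batch)
labels = (probs > 0.5).astype(int)        # threshold the probabilities at 0.5
n_params = sum(a.size for a in (W1, b1, W2, b2, W3, b3))
print(probs.ravel(), labels.ravel(), n_params)
```

Counting weights and biases gives 11×6 + 6 for the first hidden layer, 6×6 + 6 for the second, and 6×1 + 1 for the output, 121 trainable parameters in total.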

In [9]:
# Importing the Keras libraries to create our neural network

import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [10]:
# Defining the model as a sequential stack of layers
clf=Sequential()

# Adding the first hidden layer
# units is the average of input dims=11 and output dims=1, i.e. 6.
# kernel_initializer is the function used to initialize the weights. uniform is a simple way to give the weights small values close to 0
# Choosing the activation function to be relu, which corresponds to the rectifier function
clf.add(Dense(units=6,kernel_initializer="uniform",activation='relu',input_dim=11))

#Adding the second hidden layer
clf.add(Dense(units=6,kernel_initializer="uniform",activation='relu'))

# Adding the final output layer
clf.add(Dense(units=1,kernel_initializer="uniform",activation='sigmoid'))

## Compiling the ANN

* We use stochastic gradient descent as our optimizer, which is the optimization algorithm used to calculate the weights
* So far the weights have only been initialized, so the optimizer is what estimates them during training. Here we use Adam, a variant of stochastic gradient descent
* Loss is the loss function that the optimizer minimizes to find the optimized solution. binary_crossentropy fits a binary outcome with a sigmoid output
* Metrics are the criteria that are used to evaluate the model
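
As a rough illustration of what the loss measures, binary cross-entropy can be written in a few lines of NumPy. The function name and the clipping constant `eps` are assumptions made for this sketch. Keras computes the same quantity internally, though its exact numerical details may differ.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) when a prediction is exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(losses.mean())

y_true = [1, 0, 1, 0]
good_loss = binary_crossentropy(y_true, [0.9, 0.1, 0.8, 0.2])  # confident and right
coin_loss = binary_crossentropy(y_true, [0.5, 0.5, 0.5, 0.5])  # uninformative
poor_loss = binary_crossentropy(y_true, [0.4, 0.6, 0.3, 0.7])  # leaning the wrong way
print(good_loss, coin_loss, poor_loss)
```

The loss shrinks as the predicted probabilities move toward the true labels, which is the signal the optimizer follows when it updates the weights.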

In [11]:
clf.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

In [12]:
# Training the model on the training data

clf.fit(X_train,Y_train,batch_size=10,epochs=100)


Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1f6c71cd710>

In [13]:
# Applying the model on the split test data
Y_pred=clf.predict(X_test)
Y_pred=(Y_pred>0.5)

# Calculating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(Y_test,Y_pred)
cm

array([[1548,   47],
       [ 272,  133]], dtype=int64)

Looking at the above results, we can infer that the model predicts with an accuracy of about 83% on the training dataset. From the confusion matrix we can calculate the accuracy on the test data as (1548+133)/2000 = 84%.

Since the accuracy is consistent between the training and test data, the model appears reasonably robust, with no sign of overfitting.
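
Accuracy is only one view of the confusion matrix. The short sketch below derives a few more figures from the same four counts, assuming sklearn's layout of rows as actual classes and columns as predicted classes. The variable names are chosen for illustration.

```python
# Counts copied from the confusion matrix above: [[tn, fp], [fn, tp]]
tn, fp = 1548, 47
fn, tp = 272, 133

total = tn + fp + fn + tp
accuracy = (tn + tp) / total    # share of all customers classified correctly
precision = tp / (tp + fp)      # of the predicted churners, how many actually left
recall = tp / (tp + fn)         # of the actual churners, how many were caught
baseline = (tn + fp) / total    # accuracy of always predicting "stays"

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} baseline={baseline:.4f}")
```

With these counts, always predicting that a customer stays would already be right for 1595 of the 2000 test customers, and the model catches 133 of the 405 customers who actually left. It is therefore worth reading the accuracy figure alongside recall when the goal is to identify churners.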