# Artificial Neural Network

### Introduction

## Importing the Libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

## Importing the Dataset

In [2]:
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:-1].values
Y = dataset.iloc[:, -1].values

In [3]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [4]:
print(Y)

[1 0 1 ... 1 1 0]


## Encoding the Categorical Data

To encode a categorical column that has only two values, such as Gender, we use the LabelEncoder class from the preprocessing module of the sklearn library.

1. First we create an instance of the LabelEncoder class, which expects no arguments.

2. Then we fit the LabelEncoder object to the Gender column (index 2 of the feature matrix) and transform its categorical values into integers.

3. No conversion to a NumPy array is needed here, so we simply overwrite the column with its encoded values.
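As a rough illustration of what label encoding does, the sketch below reproduces the idea with plain NumPy instead of LabelEncoder: the distinct labels are sorted and each row is replaced by the index of its label. The `gender` array is a made-up stand-in for the real column.

```python
import numpy as np

# Toy stand-in for the Gender column (values are illustrative only).
gender = np.array(['Female', 'Male', 'Female', 'Male'])

# np.unique returns the sorted distinct labels and, for every row,
# the index of its label in that sorted list.
classes, codes = np.unique(gender, return_inverse=True)

print(classes)  # ['Female' 'Male']
print(codes)    # [0 1 0 1]
```

Because the labels are sorted alphabetically, 'Female' becomes 0 and 'Male' becomes 1, which matches the encoded matrix printed in this notebook.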

#### Label Encoding the Gender Column

In [5]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded_target_column = le.fit_transform(X[:, 2])
X[:, 2] = encoded_target_column

In [6]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


#### One Hot Encoding the Geography Column

To encode a nominal column with more than two categories, such as Geography, we use the ColumnTransformer class from the compose module of the sklearn library together with the OneHotEncoder class from its preprocessing module.

1. First we create an instance of the ColumnTransformer class, which here takes two arguments:
   A. transformers: a list of tuples, each giving a name, the encoder to use, and the columns to transform
   B. remainder: whether to keep the columns that are not being transformed ('passthrough') or drop them

2. After that we fit our ColumnTransformer object to the feature matrix and encode the specified categorical column.

3. The model we build later requires a NumPy array, so we explicitly convert the encoded feature matrix to a NumPy array and replace the feature matrix with it.
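To make the one-hot idea concrete, here is a minimal NumPy sketch (not the OneHotEncoder implementation itself): each category gets its own 0/1 column, with the categories in sorted order. The `geography` array is an illustrative sample, not the real column.

```python
import numpy as np

# Illustrative sample of the Geography column.
geography = np.array(['France', 'Spain', 'Germany', 'France'])

# Sorted distinct categories plus the category index of every row.
categories, codes = np.unique(geography, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i.
one_hot = np.eye(len(categories))[codes]

print(categories)  # ['France' 'Germany' 'Spain']
print(one_hot)
```

With this ordering France maps to `1, 0, 0`, Germany to `0, 1, 0` and Spain to `0, 0, 1`, the same pattern that appears in the first three columns of the encoded feature matrix.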

In [7]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(),[1])], remainder='passthrough')
encoded_feature_matrix = ct.fit_transform(X)
X = np.array(encoded_feature_matrix)

In [8]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]


#### Splitting the Dataset into Training and Test Set

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

In [10]:
print(X_train)

[[0.0 1.0 0.0 ... 0 1 124749.08]
 [1.0 0.0 0.0 ... 0 0 41104.82]
 [0.0 1.0 0.0 ... 1 1 45750.21]
 ...
 [1.0 0.0 0.0 ... 1 1 92027.69]
 [1.0 0.0 0.0 ... 1 1 101168.9]
 [0.0 1.0 0.0 ... 1 0 33462.94]]


In [11]:
print(X_test)

[[1.0 0.0 0.0 ... 1 1 97057.28]
 [1.0 0.0 0.0 ... 1 0 66526.01]
 [1.0 0.0 0.0 ... 0 1 90537.47]
 ...
 [0.0 0.0 1.0 ... 0 1 161571.79]
 [0.0 1.0 0.0 ... 1 1 165257.31]
 [0.0 1.0 0.0 ... 1 1 49025.79]]


In [12]:
print(Y_train)

[0 0 1 ... 1 0 1]


In [13]:
print(Y_test)

[0 0 0 ... 0 0 0]


#### Feature Scaling

To scale the feature matrix we use the StandardScaler class of the preprocessing module of the sklearn library.

1. First we create the object or instance of the StandardScaler class.

2. Then we fit the scaler on X_train, the training feature matrix, and scale it using the fit_transform method. 
   Here every column is scaled, including the dummy features.

3. Now we update the X_train feature matrix with the scaled values.

4. Now we scale X_test, the test feature matrix, with the same scaler object, using only the transform method. 
   Calling fit_transform again would refit the scaler on the test data instead of reusing the mean and standard deviation learned from the training data.
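The difference between fit_transform and transform can be sketched with NumPy alone. This is an illustration of standardization under assumed toy data, not the StandardScaler source: the mean and standard deviation come from the training rows only and are then reused for the test rows.

```python
import numpy as np

# Toy data: 3 training rows and 1 test row with 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# "fit": learn per-column statistics from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same statistics to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately [0. 0.]
print(test_scaled)
```

The scaled training columns have mean 0 and standard deviation 1, while the test row is expressed relative to the training distribution, so it need not be centered.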

In [14]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
scaled_feature_train = sc.fit_transform(X_train)
X_train = scaled_feature_train

scaled_feature_test = sc.transform(X_test)
X_test = scaled_feature_test

In [15]:
print(X_train)

[[-0.99850112  1.71490137 -0.57273139 ... -1.55337352  0.97725852
   0.42739449]
 [ 1.00150113 -0.58312392 -0.57273139 ... -1.55337352 -1.02327069
  -1.02548708]
 [-0.99850112  1.71490137 -0.57273139 ...  0.64376017  0.97725852
  -0.94479772]
 ...
 [ 1.00150113 -0.58312392 -0.57273139 ...  0.64376017  0.97725852
  -0.14096853]
 [ 1.00150113 -0.58312392 -0.57273139 ...  0.64376017  0.97725852
   0.01781218]
 [-0.99850112  1.71490137 -0.57273139 ...  0.64376017 -1.02327069
  -1.15822478]]


In [16]:
print(X_test)

[[ 1.00150113 -0.58312392 -0.57273139 ...  0.64376017  0.97725852
  -0.05360571]
 [ 1.00150113 -0.58312392 -0.57273139 ...  0.64376017 -1.02327069
  -0.58392685]
 [ 1.00150113 -0.58312392 -0.57273139 ... -1.55337352  0.97725852
  -0.16685331]
 ...
 [-0.99850112 -0.58312392  1.74601919 ... -1.55337352  0.97725852
   1.0669965 ]
 [-0.99850112  1.71490137 -0.57273139 ...  0.64376017  0.97725852
   1.13101314]
 [-0.99850112  1.71490137 -0.57273139 ...  0.64376017  0.97725852
  -0.88790165]]


## Building the ANN

#### Initializing the ANN

To create our Artificial Neural Network we create an instance of the Sequential class, which belongs to the models submodule of the keras module of the tensorflow library.

In [18]:
ann =  tf.keras.models.Sequential()

#### Adding input layer and first hidden layer

To add a layer to our ANN we use the add method of the Sequential model.

The add method takes as its argument an instance of the Dense class, which belongs to the layers submodule of the keras module of the tensorflow library.

This Dense object takes as arguments:

    number of neurons (units)
    
    activation function (relu)
    
The number of neurons is chosen experimentally; there is no rule for deciding it in advance.

The activation function is selected based on the operation we want the layer to perform.
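What a Dense layer with a relu activation computes can be sketched in a few lines of NumPy. The weights below are random placeholders (a trained network would have learned values), and the 12 input features are an assumption based on the 12 values passed to predict later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, units = 12, 6                 # 12 inputs, 6 neurons in the layer
x = rng.normal(size=(1, n_features))      # one observation (placeholder values)
W = rng.normal(size=(n_features, units))  # placeholder weights, one column per neuron
b = np.zeros(units)                       # one bias per neuron

def relu(z):
    # Rectified linear unit: negative values become 0.
    return np.maximum(z, 0.0)

# Dense layer: weighted sum of the inputs plus bias, then the activation.
hidden = relu(x @ W + b)

print(hidden.shape)  # (1, 6)
```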

In [19]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

#### Adding the Second hidden layer

In [21]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

#### Adding the Output layer

The output layer is added in the same way as the other layers.

Here we use a single neuron (units=1) in the output layer because our dependent variable (Exited) is binary: it is either 1 or 0. For the activation function we have chosen sigmoid because it returns a probability between 0 and 1, which we can then threshold to obtain the binary prediction.
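A small sketch of the sigmoid function and the thresholding step, using made-up pre-activation values `z`:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])  # illustrative pre-activation values
prob = sigmoid(z)               # probability of class 1
label = prob > 0.5              # binary prediction

print(prob)   # approximately [0.119 0.5 0.881]
print(label)  # [False False  True]
```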

In [22]:
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

## Training the ANN

#### Compiling the ANN

To compile our ANN we use the compile method of the model, which takes as arguments:
    
    optimizer: the algorithm that updates the weights on the edges between the neurons of adjacent layers with the aim of reducing the loss
    
    loss: the function that measures how far the predicted values are from the actual values
    
    metrics: a list of the metrics used to evaluate the performance of the ANN
    
In this case we use the 'adam' optimizer, which is a variant of stochastic gradient descent, 'binary_crossentropy' as our loss function, and accuracy as the only metric.

We are classifying whether a customer exited or not, which is a binary classification problem, hence the binary_crossentropy loss function.

If we had more than two classes to classify, we would use 'categorical_crossentropy' instead.
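For intuition, binary cross-entropy can be written out directly. The function below is a plain NumPy sketch of the formula, not the Keras implementation, and the labels and probabilities are invented for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample log loss.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])  # close to the true labels
unsure = np.array([0.6, 0.4, 0.5, 0.5])     # hovering near 0.5

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)

print(loss_confident < loss_unsure)  # True
```

Predictions that put high probability on the correct class give a lower loss, which is what the optimizer tries to achieve.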

In [25]:
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

#### Training on the Training Set

To train our ANN we use the fit method of the model, which takes as arguments the training feature matrix and the training target vector, as well as the batch size and the number of epochs, i.e. the number of passes over the training set.

With an ANN we don't train the model on the whole dataset at once; instead we feed the model batches of training data. The batch size is chosen experimentally, but in most cases 32 works well.
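The batching arithmetic can be sketched as follows. The figure of 8000 training rows is an assumption inferred from this notebook (the confusion matrix covers 2000 test rows and test_size is 0.2).

```python
import math
import numpy as np

n_train = 8000   # assumed training-set size (inferred, see above)
batch_size = 32

# Number of weight updates performed in one epoch.
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 250

# Splitting row indices into consecutive batches of 32.
indices = np.arange(n_train)
batches = [indices[i:i + batch_size] for i in range(0, n_train, batch_size)]
print(len(batches), len(batches[0]))  # 250 32
```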

In [26]:
ann.fit(X_train, Y_train, batch_size = 32, epochs = 100)

Epoch 1/100
...
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x11175c8fa30>

## Making Prediction and Evaluating the Model

#### Predicting the result of Single Observation

We are going to predict that whether a customer will exit the bank or not based on the input given below.

*Geography: France*

*Credit Score: 600*

*Gender: Male*

*Age: 40 years old*

*Tenure: 3 years*

*Balance: \$ 60000*

*Number of Products: 2*

*Does this customer have a credit card ? Yes*

*Is this customer an Active Member: Yes*

*Estimated Salary: \$ 50000*

So, should we say goodbye to that customer ?

To predict that, we use the predict method. The input first goes through the sc.transform method so that it is scaled with the same scaler that was fitted on the training set in the feature scaling step. The predicted probability is then compared with 0.5: if it is greater than 0.5 the result is 'True', otherwise 'False'.

**Important note 1:** Notice that the values of the features were all input in a double pair of square brackets. That's because the "predict" method always expects a 2D array as the format of its inputs. And putting our values into a double pair of square brackets makes the input exactly a 2D array.

**Important note 2:** Notice also that the "France" country was not input as a string but as "1, 0, 0" in the first three columns. That's because the predict method expects the one-hot-encoded values of the country, and as we see in the first row of the matrix of features X, "France" was encoded as "1, 0, 0". Be careful to include these values in the first three columns, because the ColumnTransformer places the dummy variables in the first columns.
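The two notes above can be sketched as a small helper that assembles the input row. `build_row` and the two lookup dictionaries are hypothetical names introduced here for illustration; the encodings are read off the matrices printed earlier in this notebook.

```python
import numpy as np

# Encodings as they appear in the printed feature matrix.
geography_one_hot = {'France': [1, 0, 0], 'Germany': [0, 1, 0], 'Spain': [0, 0, 1]}
gender_code = {'Female': 0, 'Male': 1}

def build_row(geography, credit_score, gender, age, tenure,
              balance, n_products, has_card, is_active, salary):
    # Dummy variables first, then the remaining features in column order.
    features = geography_one_hot[geography] + [
        credit_score, gender_code[gender], age, tenure,
        balance, n_products, has_card, is_active, salary,
    ]
    # The extra pair of brackets makes this a 2D array with one row.
    return np.array([features])

row = build_row('France', 600, 'Male', 40, 3, 60000, 2, 1, 1, 50000)
print(row.shape)  # (1, 12)
```

A row built this way could be passed through sc.transform and then to the predict method, as in the cell below.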

In [28]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

[[False]]


#### Predicting the Test Results

To predict the test results we call the predict method on our trained model. It takes the test feature matrix and returns a vector of predicted probabilities for the test data.

We store this vector in a variable and compare it with 0.5 to turn the probabilities into binary predictions of the target for the supplied test feature matrix.

Now we concatenate the predicted target vector (Y_pred) and the real target vector (Y_test) using the concatenate function of numpy.

The concatenate function takes a tuple containing the arrays to be merged and the axis along which to join them (1 joins them side by side as columns).

We also apply the reshape method to each array to make it vertical instead of horizontal, which makes the comparison easier to read.

The reshape method takes the new shape as its arguments: here the number of rows (the length of the array) and the number of columns (1).
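A toy version of the reshape and concatenate steps, using two made-up three-element vectors in place of Y_pred and Y_test:

```python
import numpy as np

pred = np.array([0, 1, 0])  # stand-in for the predicted labels
true = np.array([0, 0, 1])  # stand-in for the real labels

# reshape(len, 1) turns each flat vector into a single column;
# axis=1 then places the two columns side by side.
side_by_side = np.concatenate(
    (pred.reshape(len(pred), 1), true.reshape(len(true), 1)), axis=1
)

print(side_by_side)
# [[0 0]
#  [1 0]
#  [0 1]]
```

Each row then shows a prediction on the left and the corresponding real value on the right.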

In [31]:
Y_pred = ann.predict(X_test)
Y_pred = (Y_pred > 0.5)  # threshold the predicted probabilities at 0.5
print(np.concatenate((Y_pred.reshape(len(Y_pred),1), Y_test.reshape(len(Y_test),1)),1))

[[0 0]
 [0 0]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


#### Making the Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm.

A confusion matrix is also known as an error matrix.

To make the confusion matrix we use the confusion_matrix function of the metrics module of the sklearn library. The confusion_matrix function takes the real target vector (Y_test) and the predicted target vector (Y_pred) as arguments.

To get the accuracy of our model we use the accuracy_score function of the metrics module of the sklearn library. It returns the fraction of correct predictions, which can be multiplied by 100 to get a percentage.

The accuracy_score function takes the real target vector (Y_test) and the predicted target vector (Y_pred) as arguments.
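The accuracy can also be recovered by hand from the confusion matrix. The sketch below uses the matrix values reported in this notebook: rows are the real classes, columns are the predicted classes, and the diagonal holds the correct predictions.

```python
import numpy as np

# Confusion matrix as printed in this notebook.
cm = np.array([[1526, 59],
               [272, 143]])

correct = np.trace(cm)  # sum of the diagonal: 1526 + 143
total = cm.sum()        # all test observations: 2000
accuracy = correct / total

print(accuracy)  # 0.8345
```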

In [34]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(Y_test, Y_pred)
print(cm)
accuracy_score(Y_test, Y_pred)

[[1526   59]
 [ 272  143]]


0.8345
