# Artificial Neural Network Simplified (Churn Dataset)

### **Goals of the project -** 
* To understand the basic implementation of an ANN
* To build the ANN layer by layer and understand the significance of each layer and the arguments used
* To understand how to cross-validate the results of the ANN
* To learn to fine-tune the ANN using grid search


In [62]:
import pandas as pd
import warnings  
warnings.filterwarnings('ignore') # to ignore the warnings

In [63]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [64]:
data = pd.read_csv('/content/gdrive/My Drive/Data/Churn_Modelling.csv')
data.head()

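| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |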


## **Step 1** : Pre-processing

In [65]:
# encoding the categorical columns and getting rid of the redundant columns
geog = pd.get_dummies(data['Geography'], drop_first=True)
gend = pd.get_dummies(data["Gender"], drop_first=True)

In [66]:
# converting these columns to 'int'
geog = geog.astype(int)
gend = gend.astype(int)

In [67]:
# concatenating these encoded variables to the original dataset
data1 = pd.concat([data, gend, geog], axis=1)
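To see what the encoding with `drop_first=True` produces, here is a minimal sketch in plain Python. The helper `one_hot_drop_first` and the four sample values are hypothetical, and the categories are assumed to be taken in sorted order, which matches the `Germany` and `Spain` columns this notebook ends up with.

```python
def one_hot_drop_first(values):
    """Dummy-encode a list of category labels, dropping the first category."""
    categories = sorted(set(values))   # assumed ordering: alphabetical
    kept = categories[1:]              # the dropped category becomes the all-zero baseline
    rows = [[int(v == c) for c in kept] for v in values]
    return kept, rows

geography = ["France", "Spain", "France", "Germany"]   # hypothetical sample
columns, encoded = one_hot_drop_first(geography)
print(columns)   # ['Germany', 'Spain']
print(encoded)   # [[0, 0], [0, 1], [0, 0], [1, 0]]
```

A row of all zeros stands for the dropped category (`France` here), so keeping a third column would only repeat information the other two already carry.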

In [68]:
# separating the independent and dependent variables
feature_cols = ['CreditScore', 'Age', 'Tenure', 'Balance',
                'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Male', 'Germany', 'Spain']

x = data1[feature_cols]
y = data1['Exited']

In [69]:
# splitting the data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)

In [70]:
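# scaling the data: fit the scaler on the training set only,
# then reuse the same mean and standard deviation on the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)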
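Standardization is usually fitted on the training data and then reused unchanged on the test data, so that no information from the test set leaks into preprocessing. The sketch below shows the arithmetic in plain Python; the helper names and the toy numbers are assumptions for illustration, and the population standard deviation is used, as `StandardScaler` does.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return mean(column), pstdev(column)

def apply_scaler(column, mu, sigma):
    """Standardize values with statistics learned elsewhere."""
    return [(v - mu) / sigma for v in column]

train_col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical training feature
test_col = [5.0, 9.0]                                   # hypothetical test feature

mu, sigma = fit_scaler(train_col)                 # statistics come from the training set only
scaled_train = apply_scaler(train_col, mu, sigma)
scaled_test = apply_scaler(test_col, mu, sigma)   # same mu and sigma reused on the test set
print(mu, sigma)      # 5.0 2.0
print(scaled_test)    # [0.0, 2.0]
```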

## **Step 2** : Building the Artificial Neural Network

In [71]:
# importing the required libraries to form an Artificial Neural Network
from keras.models import Sequential     # required to initialize the neural network, because an ANN is a sequence of layers
from keras.layers import Dense          # to build the layers in ANN

In [72]:
# initializing the ANN
ann_classifier = Sequential()

**Step 2.1 :** Adding the input layer and the 1st hidden layer

In [73]:
ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))

**Arguments used -**
* `units` (called `output_dim` in older Keras releases) = number of nodes in the hidden layer, generally half of the total number of variables, here (11 features + 1 target) / 2 = 6
* `kernel_initializer='uniform'` (formerly `init`) means the initial weights are drawn uniformly from small values close to 0
* `activation='relu'` means the rectifier function is applied to the outputs of the hidden layer
* `input_dim=11` is the size of the input layer (the number of feature columns in the training set)
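The computation performed by such a layer can be sketched in plain Python: every node takes a weighted sum of the 11 inputs, adds a bias, and passes the result through the rectifier. The function names, the random input, and the initial weight range of -0.05 to 0.05 are assumptions made for illustration only.

```python
import random

def relu(z):
    """Rectifier: negative values become 0, positive values pass through."""
    return max(0.0, z)

def dense_forward(inputs, weights, biases, activation):
    """weights[j] holds the incoming weights of node j, one per input."""
    return [activation(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

random.seed(0)
n_inputs, n_units = 11, 6
# small uniform weights close to 0 (the range is an assumption)
weights = [[random.uniform(-0.05, 0.05) for _ in range(n_inputs)]
           for _ in range(n_units)]
biases = [0.0] * n_units
x = [random.gauss(0, 1) for _ in range(n_inputs)]   # one hypothetical scaled customer

hidden = dense_forward(x, weights, biases, relu)
n_params = n_inputs * n_units + n_units             # weights plus biases
print(len(hidden), n_params)                        # 6 72
```

The layer therefore has 11 x 6 weights plus 6 biases, 72 trainable parameters in total.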

**Step 2.2 :** Adding the 2nd hidden layer

This time there is no need to specify `input_dim`, because the layer added above already tells this layer what input size to expect.

In [74]:
ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))

**Step 2.3 :** Adding the output layer

In [75]:
ann_classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

**Step 2.4 :** Compiling the ANN

In [76]:
ann_classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

**Arguments used -** 
* `optimizer` = name of the algorithm used to update the weights; `'adam'` is a widely used variant of stochastic gradient descent (SGD)
* `loss` = the function the optimizer minimizes to find the optimal weights; it is chosen to match the activation of the output layer and the type of dependent variable, so a sigmoid output with a binary target uses `binary_crossentropy`
* `metrics` takes a list (hence the `[]`) of metrics to report; `accuracy` is computed after each batch of observations to monitor the model, and unlike the loss it is not what the optimizer minimizes
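To make the difference between the loss and the metric concrete, here is a small sketch that computes binary cross-entropy and accuracy for four predictions. The labels and probabilities are made up, and the clipping constant `eps` is an assumption used only to avoid `log(0)`.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all observations."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)          # keep p away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of observations whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0]            # hypothetical targets
probs = [0.9, 0.2, 0.4, 0.1]     # hypothetical sigmoid outputs

loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
print(round(loss, 4), acc)       # 0.3375 0.75
```

The loss changes smoothly with every probability, which is what gradient-based optimizers need, while accuracy only changes when a prediction crosses the threshold.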

**Step 2.5 :** Fitting the ANN to the training set

**Arguments used -**
* `batch_size` is the number of observations processed before the weights are updated
* `epochs` (called `nb_epoch` in older Keras releases) is the number of times the training data is run through the network
* `1` epoch means that the whole training set has been passed through the network once

In [77]:
ann_classifier.fit(x_train, y_train, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.callbacks.History at 0x7ff4f54b4748>
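The batch size and the number of epochs together determine how many weight updates a training run performs. The sketch below works this out; the training-set size of 8000 is inferred from the 2000 test rows in the confusion matrix and the 80/20 split, and the helper name is hypothetical.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """One weight update per batch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

n_train = 8000   # assumption: 80% of the data, given 2000 test rows

for batch_size, epochs in [(10, 100), (25, 100)]:
    per_epoch = updates_per_epoch(n_train, batch_size)
    print(batch_size, per_epoch, per_epoch * epochs)
# 10 800 80000
# 25 320 32000
```

A smaller batch size means more, noisier updates per epoch, while a larger one means fewer updates that each average over more observations.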

## **Step 3 :** Predicting the results on test set

In [78]:
y_pred = ann_classifier.predict(x_test)
y_pred = y_pred > 0.5       

* Here we set a threshold of 0.5 on the predicted probabilities
* A score greater than 0.5 is read as a prediction that the customer will leave the bank
* The comparison `y_pred > 0.5` therefore returns True for values greater than 0.5 and False otherwise
* We then compute and print the confusion matrix for these predictions
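The thresholding and counting steps can be sketched without any library. The probabilities and labels below are made up, and `confusion_counts` is a hypothetical helper that follows the layout scikit-learn uses: rows are actual classes, columns are predicted classes.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels and boolean predictions."""
    tn = fp = fn = tp = 0
    for actual, pred in zip(y_true, y_pred):
        if actual == 1 and pred:
            tp += 1
        elif actual == 1:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]

probs = [0.10, 0.70, 0.45, 0.90, 0.30]   # hypothetical model outputs
actual = [0, 1, 1, 1, 0]                 # hypothetical true labels

predicted = [p > 0.5 for p in probs]     # True means "predicted to leave"
print(predicted)                         # [False, True, False, True, False]
print(confusion_counts(actual, predicted))   # [[2, 0], [1, 2]]
```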

In [95]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("The accuracy obtained on testing set is", round((accuracy_score(y_test, y_pred) * 100), 2), '%')

[[1534   83]
 [ 193  190]]
The accuracy obtained on testing set is 86.2 %
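The reported accuracy can be recomputed from the printed matrix, and the same four counts also give precision and recall for the churn class. This is a small sketch using only the numbers shown above; the variable names are arbitrary.

```python
cm = [[1534, 83],
      [193, 190]]             # rows: actual 0/1, columns: predicted 0/1

(tn, fp), (fn, tp) = cm
total = tn + fp + fn + tp

accuracy = (tn + tp) / total   # share of all customers classified correctly
recall = tp / (tp + fn)        # share of actual churners that were caught
precision = tp / (tp + fp)     # share of predicted churners that really left

print(total)                   # 2000
print(round(accuracy * 100, 2))    # 86.2
print(round(recall, 3), round(precision, 3))   # 0.496 0.696
```

Accuracy alone hides that only about half of the customers who actually left were identified, so recall is worth checking on an imbalanced target like this one.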


## **Step 4 :** Evaluating the ANN (Cross Validation)

**Step 4.1 :** Wrapping the Keras model so that scikit-learn's k-fold cross validation can use it

In [80]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

**Step 4.2 :** Building a function to initialize the ANN and its respective layers

In [81]:
def build_classifier():
    from keras.models import Sequential
    from keras.layers import Dense
    ann_classifier = Sequential()
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
    ann_classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    ann_classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return ann_classifier

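**Step 4.3 :** Performing the cross validation

In [84]:
# note: the wrapper only forwards arguments that `Sequential.fit` accepts, so the legacy
# name `nb_epoch` appears to be dropped here and each fold trains for the default single epoch;
# pass `epochs=100` instead to train every fold for 100 epochs
ann_classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, nb_epoch = 100)
accuracies = cross_val_score(estimator=ann_classifier, X=x_train, y=y_train, cv=7, n_jobs=-1)
# will contain the 7 accuracies returned by the 7-fold cv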
print("The average of the accuracies is", round((accuracies.mean() * 100), 2), '%')
print("The standard deviation of the accuracies is ", accuracies.std())

The average of the accuracies is 79.69 %
The standard deviation of the accuracies is  0.021439048083949606
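The idea behind k-fold cross validation is that every observation is used for validation exactly once. The sketch below splits 8000 row indices into 7 contiguous folds and summarizes some fold scores. It is a simplification: scikit-learn stratifies the folds for classifiers, the helper name is hypothetical, and the three scores are made-up numbers.

```python
from statistics import mean, pstdev

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-shuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(8000, 7))
print([len(val) for _, val in folds])
# [1143, 1143, 1143, 1143, 1143, 1143, 1142]

# each fold would train a fresh model on train_idx and score it on val_idx;
# the scores below are placeholders, not results from this notebook
scores = [0.80, 0.78, 0.82]
summary = (mean(scores), pstdev(scores))
```

The mean estimates how well the model generalizes, and the standard deviation shows how much that estimate moves from fold to fold.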


## **Step 5 :** Tuning the ANN
* Tuning searches for the best parameters of the ANN systematically instead of entering candidate values by hand
* This also saves the time otherwise spent on manual trial and error
* We use the Grid Search method for this task
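Grid search simply tries every combination of the candidate values and scores each one with cross validation. The sketch below enumerates the combinations for the grid used later in this step; `expand_grid` is a hypothetical helper, not part of scikit-learn.

```python
from itertools import product

def expand_grid(param_grid):
    """Return one dict per combination of the candidate values."""
    names = sorted(param_grid)
    return [dict(zip(names, combo))
            for combo in product(*(param_grid[name] for name in names))]

params = {'batch_size': [25, 32],
          'nb_epoch': [100, 200, 300],
          'optimizer': ['adam', 'rmsprop']}

candidates = expand_grid(params)
n_folds = 10
print(len(candidates))             # 12
print(len(candidates) * n_folds)   # 120 model fits, before the final refit
print(candidates[0])
# {'batch_size': 25, 'nb_epoch': 100, 'optimizer': 'adam'}
```

Because the number of fits is the product of all list lengths times the number of folds, each extra candidate value makes the search noticeably more expensive.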

In [85]:
from sklearn.model_selection import GridSearchCV
def build_classifier(optimizer):
    from keras.models import Sequential
    from keras.layers import Dense
    ann_classifier = Sequential()
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
    ann_classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    ann_classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return ann_classifier

A small change in this version is that `optimizer` is an argument of the build function, so the grid search can try each of the optimizers provided in the list below.
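One way to picture how a wrapper can route each tuned parameter is to match parameter names against function signatures: names the build function accepts go to model construction, names the training function accepts go to training. The sketch below is a simplified illustration of that idea, not the actual Keras wrapper code; `fit_model`, `split_params`, and the dictionary standing in for a model are all hypothetical.

```python
import inspect

def build_classifier(optimizer):
    """Stand-in for the real build function; returns a description, not a model."""
    return {"optimizer": optimizer}

def fit_model(model, batch_size=32, epochs=1):
    """Stand-in for training; only records the settings it received."""
    return dict(model, batch_size=batch_size, epochs=epochs)

def split_params(params, build_fn, fit_fn):
    """Send each parameter to the function whose signature names it."""
    build_names = inspect.signature(build_fn).parameters
    fit_names = inspect.signature(fit_fn).parameters
    build_args = {k: v for k, v in params.items() if k in build_names}
    fit_args = {k: v for k, v in params.items() if k in fit_names}
    return build_args, fit_args

candidate = {'batch_size': 25, 'epochs': 100, 'optimizer': 'adam'}
build_args, fit_args = split_params(candidate, build_classifier, fit_model)
model = fit_model(build_classifier(**build_args), **fit_args)
print(build_args)   # {'optimizer': 'adam'}
print(fit_args)     # {'batch_size': 25, 'epochs': 100}
```

With this kind of routing, a parameter whose name matches neither signature is simply left out, so the spelling of each key in the grid matters.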

In [86]:
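# here we set which parameters the grid search should try
ann_classifier = KerasClassifier(build_fn = build_classifier)

# the candidate values for each parameter are passed as lists
# note: the wrapper does not appear to forward the legacy name `nb_epoch` to `fit`
# (the log below shows `Epoch 1/1` for every fit); use the key 'epochs' to vary the epoch count
params = {'batch_size': [25, 32], 'nb_epoch': [100, 200, 300], 'optimizer': ['adam', 'rmsprop']}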

grid_search = GridSearchCV(estimator=ann_classifier, param_grid=params, cv=10, scoring='accuracy')
grid_search = grid_search.fit(x_train, y_train)
best_parameters = grid_search.best_params_      # will give the best parameters
best_accuracy = grid_search.best_score_         # will give the best accuracy score

Epoch 1/1
Epoch 1/1
Epoch 1/1
...


In [87]:
# checking the parameters suggested by the grid search
print(best_parameters)
print(best_accuracy)

{'batch_size': 25, 'nb_epoch': 100, 'optimizer': 'adam'}
0.79325


## **Step 6 :** Running the ANN again with the parameters obtained above

In [88]:
# defining the layers, then compiling and fitting with the tuned batch size and epoch count
ann_classifier2 = Sequential()
ann_classifier2.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
ann_classifier2.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
ann_classifier2.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
ann_classifier2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
ann_classifier2.fit(x_train, y_train, batch_size=25, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.callbacks.History at 0x7ff4f1d5fef0>

In [89]:
# predicting the result
y_pred2 = ann_classifier2.predict(x_test)
y_pred2 = y_pred2 > 0.5

In [92]:
print('The accuracy obtained after tuning the ANN is', round((accuracy_score(y_test, y_pred2) * 100), 2), '%')

The accuracy obtained after tuning the ANN is 84.05 %
