In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Author : Karthik Vikram

# <center>Data Preprocessing</center>

In [0]:
# Classification template

# Importing the libraries
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Deep_Learning_A_Z/Part 1/Artificial_Neural_Networks/Churn_Modelling.csv')
X = dataset.iloc[:,3:-1].values
y = dataset.iloc[:, -1].values

In [3]:
dataset.head(n=8)

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
| 5 | 6 | 15574012 | Chu | 645 | Spain | Male | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 1 |
| 6 | 7 | 15592531 | Bartlett | 822 | France | Male | 50 | 7 | 0.0 | 2 | 1 | 1 | 10062.8 | 0 |
| 7 | 8 | 15656148 | Obinna | 376 | Germany | Female | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 |


Examining the column names, we can see that the columns {RowNumber, CustomerId, Surname} carry no information about whether the customer leaves the bank, so we omit them from the feature matrix. The last column, Exited, is the target and is stored separately in y. With these columns removed, the shape of the feature matrix X is **10000 x 10**.
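
As a quick check of the slicing `iloc[:, 3:-1]`, here is a minimal sketch on two rows copied from the preview above. The names `sample`, `features` and `target` are only illustrative and are not part of the notebook.

```python
import pandas as pd

# Two rows copied from the dataset preview; column names as in the CSV.
columns = ["RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
           "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
           "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited"]
rows = [
    [1, 15634602, "Hargrave", 619, "France", "Female", 42, 2, 0.0, 1, 1, 1, 101348.88, 1],
    [2, 15647311, "Hill", 608, "Spain", "Female", 41, 1, 83807.86, 1, 0, 1, 112542.58, 0],
]
sample = pd.DataFrame(rows, columns=columns)

# Skip the three identifier columns and stop before the target column.
features = sample.iloc[:, 3:-1]
target = sample.iloc[:, -1]

print(list(features.columns))
print(features.shape)  # (2, 10): ten feature columns remain
```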


<center> <h3>Encoding the Data</h3></center>


The Geography and Gender columns contain strings, while an ANN needs numeric inputs. Closer examination shows that the data is collected from only three countries: France, Germany and Spain (i.e. 3 categories). We can <i>OneHotEncode</i> Geography into 3 binary columns.


---


The Gender column has only two possible values, Male or Female, so we convert it directly into a single binary column.

In [4]:
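# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Geography (column 1 of X): strings -> integer codes
labelencoder_X1 = LabelEncoder()
X[:, 1] = labelencoder_X1.fit_transform(X[:, 1])

# Gender (column 2 of X): strings -> 0/1
labelencoder_X2 = LabelEncoder()
X[:, 2] = labelencoder_X2.fit_transform(X[:, 2])

# One-hot encode Geography into three binary columns, placed first in X.
# Note: the categorical_features argument was removed in scikit-learn 0.22;
# newer versions use sklearn.compose.ColumnTransformer instead.
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()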

Note: in newer scikit-learn versions, OneHotEncoder accepts string categories directly, so a LabelEncoder is no longer needed before it to convert the categories to integers.
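
For readers on a recent scikit-learn release, the same encoding can be sketched with `ColumnTransformer`. This is an assumption about how one might port the step, not the code that produced the outputs in this notebook; `X_demo` is a hypothetical toy matrix with Gender already encoded as 0/1.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy feature matrix: [CreditScore, Geography, Gender (0/1), Age]
X_demo = np.array([
    [619, "France", 0, 42],
    [608, "Spain", 0, 41],
    [376, "Germany", 0, 29],
    [645, "Spain", 1, 44],
], dtype=object)

# One-hot encode column 1 and pass the other columns through unchanged.
# sparse_threshold=0 forces a dense result; the encoded columns come first.
ct = ColumnTransformer(
    transformers=[("geography", OneHotEncoder(), [1])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X_demo).astype(float)

# Columns: France, Germany, Spain, CreditScore, Gender, Age
print(X_encoded)
```

The encoded columns are placed at the front, so dropping the first column afterwards still removes one Geography dummy.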


<h3><center>Removing the Dummy Variable</center></h3>

Dummy variables: Consider a column with 3 categorical values A, B and C. One-hot encoding turns it into 3 columns (an A column, a B column and a C column): an entry with value A gets a 1 in the A column and a 0 in the B and C columns. The same applies for values B and C.

We can represent the same information using only 2 columns, i.e.

B | B column 1 ; C column 0

C | B column 0 ; C column 1

Note that

A | B column 0 ; C column 0 - so A has a representation without the third column. The A column is therefore redundant (a dummy), and we can remove it.


---

We have converted the Geography column, which contains 3 categories, into 3 binary columns. As explained above, we can remove any one of the 3 columns and the representation remains the same.

In [0]:
# Removing the dummy variable column
X = X[:,1:] 
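
The idea can be checked on a small example. This is only an illustrative sketch using `pandas.get_dummies` on a made-up Geography series; the notebook itself slices the NumPy array as shown above.

```python
import pandas as pd

geography = pd.Series(["France", "Germany", "Spain", "France"], name="Geography")

# Full one-hot encoding: one column per category (France, Germany, Spain).
full = pd.get_dummies(geography, dtype=int)

# Drop the first column, the same idea as X[:, 1:].
reduced = full.iloc[:, 1:]
print(reduced)

# France is still identifiable: it is the row where both remaining columns are 0.
recovered_france = 1 - reduced.sum(axis=1)
print(recovered_france.tolist() == full["France"].tolist())
```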

In [0]:

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

<center><h4>Feature Scaling</h4></center>

From the table above we can see that the numeric columns are not in the same range. While training the model, such large differences let the columns with large values dominate and make the others insignificant when the weights of the synapses are set. To overcome this we bring the inputs onto a common scale: StandardScaler standardizes each column to zero mean and unit variance, using statistics computed on the training set only.


In [0]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
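
To make the transformation concrete, here is a small sketch showing that `StandardScaler` computes `(x - mean) / std` per column, with the mean and standard deviation taken from the training data. The arrays `train` and `test` are made-up values in the range of CreditScore and EstimatedSalary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[619.0, 101348.88],
                  [608.0, 112542.58],
                  [502.0, 113931.57],
                  [699.0, 93826.63]])
test = np.array([[850.0, 79084.10]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# Manual computation of the same standardization.
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_train = (train - mean) / std
manual_test = (test - mean) / std

print(np.allclose(train_scaled, manual_train))
print(np.allclose(test_scaled, manual_test))
```

Fitting the scaler on the training set only keeps information from the test set out of the training process.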


# Building the Artificial Neural Network

Both hidden layers after the input layer use the ReLU activation function. The weights of the synapses are initialized with small random numbers close to zero (kernel_initializer='uniform').

I have used Dropout regularization to prevent overfitting. The parameter rate=0.1 specifies that 10% of the neurons in a layer are randomly disabled at each training step, so that the neurons do not become overly dependent on one another and each learns a more independent relationship with the features in the dataset.
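
The effect of rate=0.1 can be sketched in plain NumPy. This is an illustration of the usual "inverted dropout" scheme (drop, then rescale the survivors by 1 / (1 - rate)), not the Keras implementation itself; the array `activations` is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.1

# Pretend output of a hidden layer with 6 neurons for 1000 samples.
activations = np.ones((1000, 6))

# Keep each neuron with probability 1 - rate, drop it otherwise.
keep_mask = rng.random(activations.shape) >= rate

# Rescale the kept activations so the expected value is unchanged.
dropped = activations * keep_mask / (1.0 - rate)

print(keep_mask.mean())  # fraction of neurons kept, close to 0.9
print(dropped.mean())    # close to 1.0, the original mean
```

Dropout is only active during training; at prediction time all neurons are used.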

The model is trained with the 'adam' optimizer, a variant of stochastic gradient descent (SGD). Since the prediction of the model has to be Yes or No, I have used a sigmoid activation in the last layer, which returns the probability that the customer leaves the bank.

The loss function I have used is binary_crossentropy, which is well suited to binary predictions that are expressed as probabilities.
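
As a worked illustration of how the sigmoid output and this loss fit together, here is a small NumPy sketch. The helper names `sigmoid` and `binary_crossentropy` and the sample values are chosen for this example; the clipping constant `eps` is an assumption to avoid log(0).

```python
import numpy as np

def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

raw_scores = np.array([2.0, -1.0, 0.0])  # outputs before the sigmoid
y_true = np.array([1.0, 0.0, 1.0])       # 1 = customer left the bank

probs = sigmoid(raw_scores)
loss = binary_crossentropy(y_true, probs)
print(probs)
print(loss)

# A confident wrong prediction is penalized far more than a confident right one.
loss_right = binary_crossentropy(np.array([1.0]), np.array([0.95]))
loss_wrong = binary_crossentropy(np.array([1.0]), np.array([0.05]))
print(loss_right < loss_wrong)
```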

Accuracy is the metric reported during training. The synapses' weights themselves are adjusted by minimizing the loss.


In [8]:

# Fitting classifier to the Training set
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout

# Create your classifier here
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu',
                     input_dim=11))
classifier.add(Dropout(rate=0.1))

# Second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
classifier.add(Dropout(rate=0.1))

# Output layer: we use sigmoid so that we get the probability of the customer
# leaving the bank (and, by complement, of not leaving the bank).
# No Dropout here: dropping the single output unit would distort the
# predicted probability during training.
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training the model; the weights are updated after every batch of 10 samples
classifier.fit(X_train, y_train, batch_size=10, epochs=100)


Using TensorFlow backend.


Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7f10bd3e0710>

<center><h4>Running the model on the Test Set</h4></center>

The model predicts the probability that a given customer leaves the bank. I have converted the probabilities into a binary output using a threshold of 0.5, i.e. if the probability is greater than 0.5 the customer is predicted to leave the bank.


In [0]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred>0.5)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
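
The thresholding step and the layout of the confusion matrix can be seen on a tiny made-up example. The probabilities and labels below are hypothetical and only illustrate the mechanics.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predicted probabilities, shaped (n_samples, 1) like a Keras output.
probs = np.array([[0.91], [0.12], [0.55], [0.40], [0.05]])
y_true = np.array([1, 0, 0, 1, 0])

# Apply the 0.5 threshold and flatten to a 1-D array of 0/1 labels.
y_hat = (probs > 0.5).ravel().astype(int)
print(y_hat)  # [1 0 1 0 0]

# Rows are the true classes, columns are the predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm_demo = confusion_matrix(y_true, y_hat)
print(cm_demo)  # [[2 1]
                #  [1 1]]
```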


In [10]:
cm

array([[1557,   38],
       [ 286,  119]])
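
The usual summary metrics can be read off this matrix. The sketch below simply re-enters the four counts printed above and computes accuracy, precision and recall for the "customer leaves" class.

```python
import numpy as np

# Counts copied from the confusion matrix above.
cm = np.array([[1557, 38],
               [286, 119]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
precision = tp / (tp + fp)  # of the customers predicted to leave, how many did
recall = tp / (tp + fn)     # of the customers who left, how many were caught

print(f"accuracy  = {accuracy:.3f}")   # accuracy  = 0.838
print(f"precision = {precision:.3f}")  # precision = 0.758
print(f"recall    = {recall:.3f}")     # recall    = 0.294
```

The recall shows that accuracy alone hides a weakness: most of the customers who actually left are not flagged at the 0.5 threshold.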

# Artificial Neural Network Evaluation

The accuracy obtained above changes with each run. I have used K-fold cross validation (10 folds) to get a better insight into the model's performance: the model is trained 10 times, each time holding out a different tenth of the training set for validation. The mean of the 10 accuracies is a more reliable estimate than the result of a single run. A high mean accuracy together with a small standard deviation across the folds suggests that our model has low bias and low variance, which means it can produce consistent and accurate predictions.

In [11]:
# Importing KerasClassifier, the Keras wrapper class for the scikit-learn API
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense

def build_classifier():
    classifier = Sequential()
    classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu',
                         input_dim=11))
    classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
    classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    return classifier

classifier = KerasClassifier(build_fn=build_classifier, batch_size=10, epochs=100)
# 10-fold cross validation on the training set: one accuracy value per fold
accuracies = cross_val_score(classifier, X_train, y=y_train, cv=10, n_jobs=-1, verbose=True)

mean = accuracies.mean()
# Despite its name, this variable holds the standard deviation of the accuracies
variance = accuracies.std()

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:  8.7min finished


In [12]:
print('Mean of Accuracies',mean)
print('Standard Deviation in the accuracies',variance)

Mean of Accuracies 0.8423749952390789
Standard Deviation in the accuracies 0.02076242000605802
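
The mechanics of `cross_val_score` can be tried quickly without training a neural network. The sketch below swaps in a `LogisticRegression` on synthetic data from `make_classification` as a stand-in; neither is used in this notebook, and the scores it prints say nothing about the churn model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data with 11 features, like X_train.
X_demo, y_demo = make_classification(n_samples=500, n_features=11, random_state=0)

# Stand-in estimator; any scikit-learn compatible classifier works here.
model = LogisticRegression(max_iter=1000)

# cv=10 splits the data into 10 folds; each fold is held out once for scoring.
scores = cross_val_score(model, X_demo, y_demo, cv=10, scoring="accuracy")

print(len(scores))    # one accuracy per fold
print(scores.mean())  # average accuracy across the folds
print(scores.std())   # spread of the accuracies across the folds
```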


# Tuning the Parameters

We use the GridSearchCV class from the sklearn.model_selection module to select the best hyperparameters for the model.

In [0]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense

# The optimizer is a parameter so that GridSearchCV can vary it
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu',
                         input_dim=11))
    classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
    classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    return classifier

classifier = KerasClassifier(build_fn=build_classifier)


In [14]:
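# Hyperparameter grid: 2 x 2 x 2 = 8 candidate combinations.
# The number of epochs is passed as `epochs` (the Keras 2 name). With the
# legacy name `nb_epoch`, the scikit-learn wrapper does not forward the value
# to fit(), and every fit runs for the default single epoch.
parameters = {'batch_size': [25,32],
              'epochs': [50,100],
              'optimizer' : ['adam', 'rmsprop']}
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,verbose=1)
grid_search = grid_search.fit(X_train, y_train)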

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 10 folds for each of 8 candidates, totalling 80 fits
Epoch 1/1
...


[Parallel(n_jobs=1)]: Done  80 out of  80 | elapsed:  5.7min finished


Epoch 1/1


In [0]:
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_

In [16]:
print(grid_search.best_params_)
print(best_accuracy)

{'batch_size': 25, 'nb_epoch': 50, 'optimizer': 'adam'}
0.796
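
The "8 candidates, totalling 80 fits" line in the search log follows directly from the size of the grid. A small sketch with scikit-learn's `ParameterGrid` makes the count explicit; the key `epochs` is assumed here as the Keras 2 name for the number of epochs.

```python
from sklearn.model_selection import ParameterGrid

parameters = {"batch_size": [25, 32],
              "epochs": [50, 100],
              "optimizer": ["adam", "rmsprop"]}

# ParameterGrid enumerates every combination of the listed values.
candidates = list(ParameterGrid(parameters))
n_folds = 10

print(len(candidates))            # 8
print(len(candidates) * n_folds)  # 80 cross-validation fits
for params in candidates[:2]:
    print(params)
```

After the 80 cross-validation fits, GridSearchCV refits the best candidate once on the whole training set, which is why the grid should stay small when each fit is a full neural network training run.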
