## Implementing different neural net optimizers

As the name suggests, an optimizer optimizes a process. In neural networks, the optimizer works together with the loss function: it updates the weights of the model so that the value of the loss function decreases. More specifically, the optimizer helps the cost function converge toward a minimum, ideally the global minimum.
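To make the idea of minimizing a loss concrete, here is a minimal sketch of plain gradient descent on a toy one-parameter loss. The function name `gradient_descent` and the toy loss are illustrative assumptions, not part of the notebook.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy loss f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_min, 4))  # a value very close to 3
```

Every optimizer compared in this notebook is a variation on this loop; they differ in how the step is computed from the gradient.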

In this notebook, we examine the behaviour of some popular neural network optimizers by evaluating a standard ANN model trained with each of them. The task of the model is to classify gender from a set of physical attributes of the human face. For each optimizer, we initialize the model, train it, and record the resulting loss and accuracy.

The following steps are taken to evaluate the optimizers.

#### Reading the dataset

The dataset used to train the model is stored in a CSV file, so we first load it with the Pandas library and preview it by checking its first 5 rows.

In [1]:
# Reading the dataset
import pandas as pd
data = pd.read_csv('gender_classification.csv')
data.head()

Unnamed: 0,long_hair,forehead_width_cm,forehead_height_cm,nose_wide,nose_long,lips_thin,distance_nose_to_lip_long,gender
0,1,11.8,6.1,1,0,1,1,Male
1,0,14.0,5.4,0,0,1,0,Female
2,0,11.8,6.3,1,1,1,1,Male
3,0,14.4,6.1,0,1,1,1,Male
4,1,13.5,5.9,0,0,0,0,Female


Let’s check the shape of the data.

In [2]:
# Shape of the data
data.shape

(5001, 8)

There are a total of 5001 rows and 8 columns available to train and evaluate the model.

#### Data Preprocessing

As we are dealing with a classification problem it is necessary to check the class distribution of the outcome variable. 

In [3]:
# Counting labels
data['gender'].value_counts()

Female    2501
Male      2500
Name: gender, dtype: int64

We can see that the classes are almost perfectly balanced, so there is no need for class balancing. However, the classes are in textual form, so we have to convert them into numbers before they are fed to the model. Below we replace the textual categories with numbers.

In [4]:
# Labeling with numerical values
data['gender'] = data['gender'].replace(to_replace=['Female', 'Male'], value=[0, 1])
data['gender'].value_counts()

0    2501
1    2500
Name: gender, dtype: int64
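The same encoding can be written as an explicit lookup table, which makes the mapping easy to audit and fails loudly on unexpected labels. This is a small sketch in plain Python; `LABEL_TO_ID` and `encode_labels` are hypothetical names introduced here. With Pandas, the same dictionary could be passed to `data['gender'].map(...)`.

```python
# Explicit mapping from text label to integer id (same values as in the cell above).
LABEL_TO_ID = {"Female": 0, "Male": 1}

def encode_labels(labels):
    """Map each text label to its integer id; an unknown label raises KeyError."""
    return [LABEL_TO_ID[label] for label in labels]

encoded = encode_labels(["Male", "Female", "Male"])
print(encoded)  # [1, 0, 1]
```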

We have now converted the categories into numbers. Next, let’s define the input and output features: of the 8 columns, the first 7 are the input features and the last (8th) column is the output feature.

In [5]:
# Defining input and output features
X = data.iloc[:,:-1].values
y = data.iloc[:,-1].values

Now we create the training and test sets from the features defined above. With test_size = 0.2, 4000 of the 5001 samples are used for training and 1001 for testing the model.

In [6]:
# Creating training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

We can also check the shapes of the training and test sets.

In [7]:
# Shape of train-test sets
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(4000, 7)
(4000,)
(1001, 7)
(1001,)


We have successfully created the training and testing patterns as we desired. 
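The 4000/1001 split of 5001 rows can be reproduced with a little arithmetic. The sketch below assumes the test count is the fractional size rounded up and the training set gets the remainder, which is consistent with the shapes printed above; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), assuming the test count is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

print(split_sizes(5001, 0.2))  # (4000, 1001)
```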

#### Defining a Neural Network Classifier

To build the neural network, we use the Keras library, which provides all the building blocks of the model. Below we first import the Sequential model and the Dense layer. The Sequential model holds the layers of the network in order.

In [8]:
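# Libraries for neural networks
# Import everything from tensorflow.keras so layers and optimizers come from the same package
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense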

Now we start building the model. We first initialize the Sequential model and then add a 12-unit layer that receives the 7 input features, an 8-unit hidden layer, and a single-unit output layer. The first two layers use the ReLU activation function; because we are dealing with a binary classification problem, the output layer uses the sigmoid activation function.

In [9]:
# Defining the neural network model
model = Sequential()
model.add(Dense(12, input_dim=7, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Summary of the neural network model
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 12)                96        
                                                                 
 dense_1 (Dense)             (None, 8)                 104       
                                                                 
 dense_2 (Dense)             (None, 1)                 9         
                                                                 
Total params: 209
Trainable params: 209
Non-trainable params: 0
_________________________________________________________________


From the model summary above, we can see that the model has been initialized with the expected layers and neurons.
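The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below recomputes them; `dense_params` is a hypothetical helper name.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [7, 12, 8, 1]  # 7 input features, then layers of 12, 8 and 1 units
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [96, 104, 9] 209
```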

#### Training with different Optimizers

When training with the different optimizers, the same model is initialized afresh for each optimizer. To avoid redundancy, the initialization code is shown only once, above. The loss function being optimized throughout is binary_crossentropy.
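For intuition about the quantity being minimized, here is a minimal plain-Python sketch of binary cross-entropy applied to sigmoid outputs. The clipping constant `eps` and the function names are assumptions made for this illustration; the Keras implementation may differ in details.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Three raw scores turned into probabilities, compared with labels 1, 0, 1.
probs = [sigmoid(z) for z in (2.0, -1.0, 0.0)]
loss = binary_crossentropy([1, 0, 1], probs)
print(round(loss, 4))
```

Confident correct predictions contribute almost nothing to the loss, while confident wrong predictions are penalized heavily.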

So let’s start with the first optimizer, stochastic gradient descent (SGD).

#### Stochastic Gradient Descent

Below we first create the SGD optimizer from the keras.optimizers module with the learning rate set to 0.01. In the compile method, we then pass this optimizer together with the loss function and the accuracy metric.

In [10]:
# Defining Stochastic Gradient Descent as optimizer
opt = keras.optimizers.SGD(learning_rate=0.01)

# Compiling the classifier
model.compile(loss='binary_crossentropy', optimizer = opt, metrics=['accuracy'])
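As a reference for what SGD does with these settings, the following sketch fits a one-parameter linear model with mini-batch updates. It is a simplified illustration under our own assumptions (a toy dataset, a squared-error loss, and the hypothetical function `sgd_fit`), not the Keras implementation.

```python
import random

def sgd_fit(xs, ys, learning_rate=0.01, epochs=20, batch_size=10, seed=0):
    """Fit y ~ w * x with mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch-mean squared error with respect to w.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad  # SGD update: w <- w - lr * grad
    return w

xs = [i / 20.0 for i in range(1, 101)]  # 0.05, 0.10, ..., 5.0
ys = [2.0 * x for x in xs]              # noise-free targets with true slope 2
w_sgd = sgd_fit(xs, ys)
print(round(w_sgd, 4))  # close to the true slope 2
```

Each update uses only one batch of samples, so one epoch over 100 samples with a batch size of 10 performs 10 weight updates.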

Next, we train the network on the training set.

In [11]:
# Training the classifier
model.fit(X_train, y_train, epochs=20, batch_size=10)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x2062e244e20>

Let’s check the accuracy on the training and test sets.

In [12]:
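# Checking training and test accuracies
# evaluate() returns [loss, accuracy] because the model was compiled with metrics=['accuracy']
sgd_acc = model.evaluate(X_train, y_train)   # metrics on the training set
print ("Training Accuracy: %.2f%%\n" % (sgd_acc[1]*100))
sgd_loss = model.evaluate(X_test, y_test)    # metrics on the test set, despite the name
print ("Testing Accuracy: %.2f%%\n" % (sgd_loss[1]*100))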

Training Accuracy: 95.60%

Testing Accuracy: 94.81%



As we can see above, SGD brings the model to roughly 95% accuracy: 95.60% on the training data and 94.81% on the test data.

#### Adam (Adaptive Moment Estimation)

The Adam optimizer is used in the same way as SGD. Below we compile, train and evaluate the model.

In [13]:
# Defining Adam as optimizer
opt = keras.optimizers.Adam(learning_rate=0.01)

In [14]:
# Compiling the classifier
model.compile(loss='binary_crossentropy', optimizer = opt, metrics=['accuracy'])

# Training the classifier
model.fit(X_train, y_train, epochs=20, batch_size=10)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x20634e0edf0>

In [15]:
# Checking training and test accuracies
adam_acc = model.evaluate(X_train, y_train)
print ("Training Accuracy: %.2f%%\n" % (adam_acc[1]*100))
adam_loss = model.evaluate(X_test, y_test)
print ("Testing Accuracy: %.2f%%\n" % (adam_loss[1]*100))

Training Accuracy: 96.60%

Testing Accuracy: 96.00%



Using the Adam optimizer, we get 96.60% accuracy on the training set and 96.00% on the test set.
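For readers who want to see what Adam adds on top of plain gradient descent, here is a textbook-style sketch of its update for a single parameter. The hyperparameter values `beta1`, `beta2` and `eps` are commonly quoted defaults that we assume here, and `adam_minimize` is a hypothetical name; the library implementation may differ in details.

```python
import math

def adam_minimize(grad, w0, learning_rate=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-7, steps=2000):
    """Minimize a one-parameter loss with the Adam update rule."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3).
w_adam = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_adam, 2))
```

The step is scaled by the ratio of the two running averages, so its size stays on the order of the learning rate even when the raw gradient is large.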

#### AdaGrad (Adaptive Gradient)

Next, we will compile, train and evaluate the model using the AdaGrad optimizer.

In [16]:
# Defining AdaGrad as optimizer
opt = keras.optimizers.Adagrad(learning_rate=0.01)

# Compiling the classifier
model.compile(loss='binary_crossentropy', optimizer = opt, metrics=['accuracy'])

In [17]:
# Training the classifier
model.fit(X_train, y_train, epochs=20, batch_size=10)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x20634e17220>

In [18]:
# Checking training and test accuracies
agd_acc = model.evaluate(X_train, y_train)
print ("Training Accuracy: %.2f%%\n" % (agd_acc[1]*100))
agd_loss = model.evaluate(X_test, y_test)
print ("Testing Accuracy: %.2f%%\n" % (agd_loss[1]*100))

Training Accuracy: 96.88%

Testing Accuracy: 95.80%



Using the AdaGrad optimizer, we get a training accuracy of 96.88% and a test accuracy of 95.80%.
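The defining feature of AdaGrad is that it divides the learning rate by the root of the accumulated squared gradients, so the effective step size only shrinks over time. The sketch below illustrates this for one parameter; it is a simplified textbook form under our own assumptions (accumulator starting at zero, hypothetical name `adagrad_minimize`), and library implementations may differ, for example in the initial accumulator value.

```python
import math

def adagrad_minimize(grad, w0, learning_rate=0.01, eps=1e-7, steps=1000):
    """Return the final parameter and the effective step size used at each step."""
    w, accum = w0, 0.0
    effective_rates = []
    for _ in range(steps):
        g = grad(w)
        accum += g * g  # running sum of squared gradients, never decreases
        rate = learning_rate / (math.sqrt(accum) + eps)
        effective_rates.append(rate)
        w -= rate * g
    return w, effective_rates

# Toy loss (w - 3)^2 with gradient 2 * (w - 3).
w_ada, rates = adagrad_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(rates[0] > rates[-1])  # True: the effective step size has shrunk
```

On this toy problem the parameter moves toward the minimum but slows down steadily, which is the behaviour RMSProp was designed to soften.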

#### RMSProp (Root Mean Squared Propagation)

Finally, we will compile, train and evaluate the model using the RMSProp optimizer.

In [19]:
# Defining RMSProp as optimizer
opt = keras.optimizers.RMSprop(learning_rate=0.01)

In [20]:
# Compiling the classifier
model.compile(loss='binary_crossentropy', optimizer = opt, metrics=['accuracy'])

In [21]:
# Training the classifier
model.fit(X_train, y_train, epochs=20, batch_size=10)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x2064f3fa0d0>

In [22]:
# Checking training and test accuracies
rms_acc = model.evaluate(X_train, y_train)
print ("Training Accuracy: %.2f%%\n" % (rms_acc[1]*100))
rms_loss = model.evaluate(X_test, y_test)
print ("Testing Accuracy: %.2f%%\n" % (rms_loss[1]*100))

Training Accuracy: 96.93%

Testing Accuracy: 95.80%



Using the RMSProp optimizer, we get a training accuracy of 96.93% and a test accuracy of 95.80%.
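RMSProp replaces the ever-growing sum used by AdaGrad with an exponentially decaying average of squared gradients, so old gradients are gradually forgotten. Here is a minimal one-parameter sketch; the decay factor `rho`, the constant `eps` and the name `rmsprop_minimize` are assumptions made for this illustration rather than the library's exact implementation.

```python
import math

def rmsprop_minimize(grad, w0, learning_rate=0.01, rho=0.9, eps=1e-7, steps=1000):
    """Minimize a one-parameter loss with the RMSProp update rule."""
    w, avg_sq = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g  # decaying mean of squared gradients
        w -= learning_rate * g / (math.sqrt(avg_sq) + eps)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3).
w_rms = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_rms, 1))
```

Because the step is normalized by the recent gradient magnitude, the parameter ends up hovering within a few learning-rate-sized steps of the minimum rather than settling exactly on it.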

Now that we have checked the accuracy for all the optimizers, let’s summarize the results.

In [23]:
# Summarizing all
# Note: 'Accuracy' is the training accuracy (*_acc[1]) and 'Loss' is the test loss (*_loss[0])
opt_summary = pd.DataFrame(data={
    'Optimizers': ['SGD', 'Adam', 'AdaGrad', 'RMSProp'],
    'Accuracy': [sgd_acc[1], adam_acc[1], agd_acc[1], rms_acc[1]],
    'Loss': [sgd_loss[0], adam_loss[0], agd_loss[0], rms_loss[0]],
})
opt_summary

Unnamed: 0,Optimizers,Accuracy,Loss
0,SGD,0.956,0.134855
1,Adam,0.966,0.107713
2,AdaGrad,0.96875,0.105492
3,RMSProp,0.96925,0.105703


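As we can see above, all four optimizers perform comparably on this binary classification problem: the training accuracy (the Accuracy column) ranges from 95.6% to about 96.9%, and the test loss (the Loss column) from about 0.105 to 0.135.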

This is how we can compare optimizers for a neural network.
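One way to keep the comparison fair is to rebuild the model before each run, so that no optimizer starts from weights already trained by another. The helper below is a hedged sketch of such a loop: `compare_optimizers`, `build_model` and `optimizer_factories` are hypothetical names, and the model is only assumed to expose Keras-style `compile`, `fit` and `evaluate` methods.

```python
def compare_optimizers(build_model, optimizer_factories, train_data, test_data,
                       epochs=20, batch_size=10):
    """Train a freshly built model per optimizer and collect train/test metrics."""
    X_train, y_train = train_data
    X_test, y_test = test_data
    rows = []
    for name, make_optimizer in optimizer_factories.items():
        model = build_model()  # fresh weights, so runs do not build on each other
        model.compile(loss="binary_crossentropy", optimizer=make_optimizer(),
                      metrics=["accuracy"])
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
        train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
        test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
        rows.append({"optimizer": name,
                     "train_loss": train_loss, "train_accuracy": train_acc,
                     "test_loss": test_loss, "test_accuracy": test_acc})
    return rows

# Hypothetical wiring with the objects from this notebook (not executed here):
# rows = compare_optimizers(
#     build_model=make_model,  # a function returning the Sequential model defined earlier
#     optimizer_factories={
#         "SGD": lambda: keras.optimizers.SGD(learning_rate=0.01),
#         "Adam": lambda: keras.optimizers.Adam(learning_rate=0.01),
#     },
#     train_data=(X_train, y_train),
#     test_data=(X_test, y_test),
# )
```

Passing factories instead of optimizer instances means each run also gets a new optimizer with no leftover internal state. The returned list of dictionaries can be handed directly to `pd.DataFrame` for a summary table that reports train and test metrics side by side.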