<a href="https://colab.research.google.com/github/cagBRT/IntroToDNNwKeras/blob/master/ML_Optimizers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A Keras optimizer updates a network's weights during training so that the loss, which measures the difference between the predicted and actual values, is minimized. Keras provides several optimizers to choose from.

Optimizers are central to training neural networks: the weights are initialized randomly and then adjusted in every epoch to improve the model's accuracy. In each epoch, the output the network produces for the training data is compared with the actual labels. This comparison yields the error, expressed through the loss function, which in turn determines how the corresponding weights are updated.

There needs to be some way to decide how the weights should be changed to get the best accuracy, and this is where Keras optimizers come into the picture. An optimizer moves the weights toward values that minimize the loss function. One of the most popular optimizers is gradient descent. Various other Keras optimizers are available and widely used for different practical purposes.
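
To make the idea of gradient descent concrete, here is a minimal sketch in plain Python. It fits a single weight `w` in the model `y = w * x` by repeatedly stepping against the gradient of the mean squared error. The data, the learning rate, and the function names are illustrative assumptions and are not part of Keras.

```python
# Minimal gradient-descent sketch on a one-parameter least-squares problem.
# Data, learning rate, and function names are illustrative assumptions.

def loss(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, w0=0.0, learning_rate=0.05, epochs=100):
    """Run plain gradient descent and record the loss before each update."""
    w = w0
    history = []
    for _ in range(epochs):
        history.append(loss(w, xs, ys))
        w -= learning_rate * grad(w, xs, ys)  # step against the gradient
    return w, history

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x
w_final, losses = gradient_descent(xs, ys)
print(w_final, losses[0], losses[-1])
```

The loss shrinks at every step and `w_final` approaches 2, the weight that generated the data. The optimizers compared in this notebook differ mainly in how they turn the gradient into a weight update.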

In [None]:
!pip install plot-model

In [None]:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import plot_model

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

In [None]:
def build_compile(optimizer_name='SGD'):

    # Use the same network topology as last week
    model = keras.Sequential([ keras.layers.Flatten(input_shape=(28, 28)),
                          keras.layers.Dense(128, activation='relu'),
                          keras.layers.Dense(10, activation='softmax')])

    # compile the model with a cross-entropy loss and specify the given optimizer
    model.compile(optimizer=optimizer_name, loss=keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])
    return model

In [None]:
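# Build a model first so that there is something to summarize and plot
model = build_compile()
model.summary()

In [None]:
from plot_model import plot_model
plot_model(model)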

There are various types of Keras optimizers, listed below:

Adagrad: This optimizer uses parameter-specific learning rates, adapted according to how frequently each parameter is updated during training. The more updates a parameter receives, the smaller its learning rate becomes, so different weights can end up with different learning rates.
Adam: The name stands for Adaptive Moment Estimation. Adam extends stochastic gradient descent with adaptive estimates of the first and second moments of the gradients. It has modest memory requirements and is computationally efficient, which makes it well suited to problems with large amounts of data or many parameters. It is among the most popular optimizers for neural networks.
Nadam: The name stands for Nesterov-accelerated Adam. It is Adam with Nesterov momentum: the Nesterov component is applied when the gradient-based update is computed.
Adamax: This is a variant of the Adam algorithm, hence the name. It is based on the infinity norm. In models that have embeddings, it is sometimes considered superior to Adam.
RMSprop: The name stands for Root Mean Square Propagation. RMSprop maintains a moving average of the squared gradients and divides each gradient by the square root of that average.
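
The update rules behind three of these optimizers can be sketched for a single scalar weight as follows. This is a simplified illustration of the textbook formulas, not the Keras implementation: the function names, the `state` dictionary, and the hyperparameter values are assumptions chosen for readability and do not match the Keras defaults. Nadam and Adamax are left out to keep the sketch short.

```python
import math

# Simplified scalar update rules (illustrative, not the Keras implementation).
# `state` is a plain dict that carries each optimizer's running statistics.

def adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    # Accumulate all squared gradients; the effective step shrinks over time.
    state["sum_sq"] = state.get("sum_sq", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["sum_sq"]) + eps)

def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    # Moving average of squared gradients instead of a growing sum.
    state["avg_sq"] = rho * state.get("avg_sq", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["avg_sq"]) + eps)

def adam_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving averages of the gradient (m) and squared gradient (v),
    # each corrected for its bias toward zero in early steps.
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def minimize(step, w=5.0, steps=200):
    """Minimize f(w) = w**2 with the given update rule."""
    state = {}
    for _ in range(steps):
        g = 2 * w  # gradient of w**2
        w = step(w, g, state)
    return w

for name, step in [("Adagrad", adagrad_step),
                   ("RMSprop", rmsprop_step),
                   ("Adam", adam_step)]:
    print(name, minimize(step))
```

All three rules divide the gradient by a running measure of its magnitude, so the size of the first step depends on the learning rate rather than on the raw gradient.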

In [None]:
optimizer_names = ['SGD', 'Momentum', 'Nesterov', 'RMSprop', 'Adagrad', 'Adam', 'NAdam']
optimizer_list = [
    'SGD',
    # SGD with classical momentum
    keras.optimizers.SGD(learning_rate=0.01, momentum=0.5, nesterov=False),
    # SGD with Nesterov momentum
    keras.optimizers.SGD(learning_rate=0.01, momentum=0.5, nesterov=True),
    'RMSprop',
    'Adagrad',
    'Adam',
    'NAdam',
]

In [None]:
# Two arrays for training and validation performance
hist_acc = []
hist_val_acc = []

# Iterate over optimizers and train the network, using x_test and y_test as a validation set in each epoch
for item,name in zip(optimizer_list, optimizer_names):
    print("-----------------------------")
    print("Doing %s optimizer" %str(name))
    print("-----------------------------")

    # Get the model from our function above
    model = build_compile(item)

    # Train the model
    history = model.fit(x_train, y_train, epochs=50, batch_size=32, validation_data=(x_test, y_test))

    # Store the performance
    hist_acc.append(history.history['accuracy'])
    hist_val_acc.append(history.history['val_accuracy'])
    print("-----------------------------")

In [None]:
# summarize history for accuracy on training set
for i in range(len(optimizer_list)):
    plt.plot(hist_acc[i],'-o',label=str(optimizer_names[i]))
plt.title('model accuracy on train')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='upper left')
plt.show()

In [None]:
# summarize history for accuracy on test set
for i in range(len(optimizer_list)):
    plt.plot(hist_val_acc[i],'-o', label=str(optimizer_names[i]))
plt.title('model accuracy on test')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='upper left')
plt.show()


In [None]:
nepochs = 50

In [None]:
# Implement formula (15): time-based decay, lr = lr0 / (1 + decay * epoch)
initial_learning_rate = 0.01
epochs = nepochs
decay = initial_learning_rate / epochs

def lr_time_based_decay(epoch, lr):
    return initial_learning_rate * 1 / (1 + decay * epoch)


In [None]:
# Plot the learning rate as a function of the number of epochs
plt.plot(lr_time_based_decay(np.arange(0,nepochs),0.01))
plt.ylabel('learning rate')
plt.xlabel('epoch')
plt.show()

In [None]:
# Train the network with the learning rate schedule
model = build_compile()
history_time_based_decay = model.fit(
    x_train,
    y_train,
    epochs=nepochs,
    batch_size=32,
    callbacks=[keras.callbacks.LearningRateScheduler(lr_time_based_decay, verbose=1)], validation_data=(x_test, y_test))


In [None]:
# Implement formula (16): step decay, lr = lr0 * drop_rate ** floor(epoch / epochs_drop)
initial_learning_rate = 0.01
def lr_step_decay(epoch, lr):
    drop_rate = 0.5
    epochs_drop = 10.0
    return initial_learning_rate * np.power(drop_rate, np.floor(epoch/epochs_drop))

In [None]:
# Plot the learning rate as a function of the number of epochs
plt.plot(lr_step_decay(np.arange(0,nepochs),0.01))
plt.ylabel('learning rate')
plt.xlabel('epoch')
plt.show()


In [None]:
# Train the network with the learning rate schedule
model = build_compile()
history_step_decay = model.fit(
    x_train,
    y_train,
    epochs=nepochs,
    batch_size=32,
    callbacks=[keras.callbacks.LearningRateScheduler(lr_step_decay, verbose=1)], validation_data=(x_test, y_test))

In [None]:
# Implement formula (17): exponential decay, lr = lr0 * exp(-k * epoch)
initial_learning_rate = 0.01
def lr_exp_decay(epoch, lr):
    k = 0.1
    return initial_learning_rate * np.exp(-k*epoch)


In [None]:
# Plot the learning rate as a function of the number of epochs
plt.plot(lr_exp_decay(np.arange(0,nepochs),0.01))
plt.ylabel('learning rate')
plt.xlabel('epoch')
plt.show()

In [None]:
# Train the network with the learning rate schedule
model = build_compile()
history_exp_decay = model.fit(
    x_train,
    y_train,
    epochs=nepochs,
    batch_size=32,
    callbacks=[keras.callbacks.LearningRateScheduler(lr_exp_decay, verbose=1)], validation_data=(x_test, y_test))
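
Before comparing the training curves, it can help to look at the numbers the three schedules produce. The sketch below re-implements them in plain Python with the same constants used in this notebook (initial rate 0.01, 50 epochs, drop rate 0.5 every 10 epochs, k = 0.1). The function names are illustrative and are not the ones passed to `LearningRateScheduler`.

```python
import math

# Plain-Python versions of the three schedules, using this notebook's constants.
# Function names are illustrative assumptions.
initial_learning_rate = 0.01
nepochs = 50

def time_based_decay(epoch, decay=initial_learning_rate / nepochs):
    return initial_learning_rate / (1 + decay * epoch)

def step_decay(epoch, drop_rate=0.5, epochs_drop=10):
    # Halve the rate after every block of `epochs_drop` epochs.
    return initial_learning_rate * drop_rate ** (epoch // epochs_drop)

def exp_decay(epoch, k=0.1):
    return initial_learning_rate * math.exp(-k * epoch)

# Print the learning rate at a few epochs for each schedule.
for epoch in (0, 10, 20, 49):
    print(epoch, time_based_decay(epoch), step_decay(epoch), exp_decay(epoch))
```

With these constants the time-based schedule lowers the rate by less than 1% over all 50 epochs, because `decay` is only 0.0002. The step schedule reaches 0.000625 in the last ten epochs, and the exponential schedule falls to roughly 7.4e-5 by epoch 49. This is worth keeping in mind when reading the accuracy plots that follow.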

In [None]:
# summarize history for accuracy on training set
plt.plot(hist_acc[0],'-o',label='Constant')
plt.plot(history_exp_decay.history['accuracy'],'-o', label="Exp. Decay")
plt.plot(history_step_decay.history['accuracy'],'-o', label="Step Decay")
plt.plot(history_time_based_decay.history['accuracy'],'-o', label="Time Decay")
plt.title('model accuracy on train')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='upper left')
plt.show()

In [None]:
plt.plot(hist_val_acc[0],'-o',label='Constant')
plt.plot(history_exp_decay.history['val_accuracy'],'-o', label="Exp. Decay")
plt.plot(history_step_decay.history['val_accuracy'],'-o', label="Step Decay")
plt.plot(history_time_based_decay.history['val_accuracy'],'-o', label="Time Decay")
plt.title('model accuracy on test')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='upper left')
plt.show()

In [None]:
# Build the model with an L2 regularization added to all weights

model_l2 = keras.Sequential([keras.layers.Flatten(input_shape=(28, 28)),
                      keras.layers.Dense(128, activation='relu',kernel_regularizer=keras.regularizers.l2(0.001)),
                      keras.layers.Dense(10, activation='softmax',kernel_regularizer=keras.regularizers.l2(0.001))])

# Compile the model and optimize with adam
model_l2.compile(optimizer='Adam', loss=keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])

In [None]:
# Fit the model to the data while providing a validation set for each epoch
history_l2 = model_l2.fit(x_train, y_train, epochs=50, batch_size=32, validation_data=(x_test, y_test))

In [None]:
# Build the model with early stopping
model_es = keras.Sequential([keras.layers.Flatten(input_shape=(28, 28)),
                      keras.layers.Dense(128, activation='relu'),
                      keras.layers.Dense(10, activation='softmax')])

# Early-stopping callback: halt training once the validation loss stops improving
es = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1)

# Compile the model and optimize with adam
model_es.compile(optimizer='Adam', loss=keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])

In [None]:
# Fit the model to the data while providing a validation set for each epoch
history_es = model_es.fit(x_train, y_train, epochs=50, batch_size=32, validation_data=(x_test, y_test), callbacks=[es])
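
The stopping rule itself is simple and can be sketched without Keras. The function below is a simplified illustration of patience-based early stopping on a list of validation losses. It is not the Keras implementation, and the function name and the example losses are made up for illustration. The callback above does not set `patience`, so it relies on the Keras default, which to my knowledge is 0.

```python
def early_stopping_epoch(val_losses, patience=0):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified sketch: stop once the validation loss has failed to improve
    on the best value seen so far for `patience` consecutive epochs
    (at least one such epoch is always required).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch.
losses = [0.50, 0.40, 0.45, 0.30, 0.35, 0.36]
print(early_stopping_epoch(losses))              # stops at the first setback
print(early_stopping_epoch(losses, patience=2))  # tolerates one bad epoch
```

With no patience, a single noisy epoch ends training, which is why the early-stopping run here can be much shorter than 50 epochs. A larger `patience` trades longer training for robustness to such fluctuations.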


In [None]:
# Build the model with dropout
model_dropout = keras.Sequential([keras.layers.Flatten(input_shape=(28, 28)),
                      keras.layers.Dense(128, activation='relu'),
                      keras.layers.Dropout(.2),
                      # No dropout after the softmax output: it would zero out predicted class probabilities
                      keras.layers.Dense(10, activation='softmax')])

# Compile the model and optimize with adam
model_dropout.compile(optimizer='Adam', loss=keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])

In [None]:
# Fit the model to the data while providing a validation set for each epoch
history_dropout = model_dropout.fit(x_train, y_train, epochs=50, batch_size=32, validation_data=(x_test, y_test))


In [None]:
# summarize history for accuracy
plt.plot(hist_val_acc[0],'-o', label='Standard')
plt.plot(history_l2.history['val_accuracy'],'-o', label="L2")
plt.plot(history_es.history['val_accuracy'],'-o', label="Early Stopping")
plt.plot(history_dropout.history['val_accuracy'],'-o', label="Dropout")
plt.title('model accuracy on test')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='upper right')
plt.show()