**Aim: Write a program to demonstrate the change in accuracy, loss and convergence time when the 
optimizer is changed (stochastic gradient descent, Adam, AdaGrad, RMSprop and Nadam) for any suitable 
application**

Objectives: 
1. To learn optimization algorithms 
2. To learn and understand hyperparameters 

Theory: 

SGD, Adam, RMSprop, Nadam 
The word 'stochastic' describes a system or process governed by random probability. Hence, in 
Stochastic Gradient Descent, a few samples are selected randomly for each iteration instead of the 
whole dataset. In Gradient Descent, the term "batch" denotes the number of samples from the dataset 
that are used to calculate the gradient in each iteration. In typical Gradient Descent optimization, 
such as Batch Gradient Descent, the batch is the whole dataset. Using the whole dataset reaches the 
minimum in a less noisy and less random manner, but every update then needs a pass over all samples, 
which becomes expensive when the dataset gets big. 
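
To make the role of the batch concrete, here is a minimal sketch in plain Python. It fits a line to synthetic data, once with the whole dataset per update (batch gradient descent) and once with small random batches (mini-batch SGD). The data, the function names (`make_data`, `gradient`, `train`) and the hyperparameter values are assumptions chosen for illustration and are not part of the experiment below.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic 1-D regression data: y = 3x + 2 plus small noise (made up for illustration)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys

def gradient(w, b, xs, ys):
    """Gradient of the mean squared error of y = w*x + b over the given samples."""
    n = len(xs)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def train(xs, ys, batch_size, epochs=50, lr=0.1, seed=1):
    """batch_size == len(xs) gives batch gradient descent; smaller values give mini-batch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a new random order of the samples in every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            gw, gb = gradient(w, b, [xs[i] for i in batch], [ys[i] for i in batch])
            w -= lr * gw
            b -= lr * gb
    return w, b

xs, ys = make_data()
w_full, b_full = train(xs, ys, batch_size=len(xs))  # one exact update per epoch
w_mini, b_mini = train(xs, ys, batch_size=16)       # 13 noisier updates per epoch
print(f"batch GD      : w={w_full:.3f}, b={b_full:.3f}")
print(f"mini-batch SGD: w={w_mini:.3f}, b={b_mini:.3f}")
```

With the same number of epochs, the mini-batch run performs many more parameter updates, each computed from only 16 samples, which is the trade-off the paragraph above describes.
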
Adam is an optimization algorithm that can be used in place of stochastic gradient descent for 
training deep learning models. Adam combines properties of the AdaGrad and RMSProp algorithms: it 
keeps moving averages of both the gradients and the squared gradients, which gives every parameter 
its own step size and lets it handle sparse gradients on noisy problems. 
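
As a rough illustration of the Adam update, the sketch below implements the standard rule (moving averages of the gradient and of its square, each with bias correction) in plain Python. The function name `adam_minimize`, the test function and the learning rate of 0.05 are assumptions made for this example; the Keras optimizer used later has its own defaults.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimise a function of several parameters given its gradient function."""
    theta = list(theta)
    m = [0.0] * len(theta)  # moving average of gradients (first moment)
    v = [0.0] * len(theta)  # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction for the zero initialisation
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter step size
    return theta

def grad(theta):
    # Gradient of the illustrative function f(x, y) = x**2 + 100 * y**2 (minimum at the origin).
    x, y = theta
    return [2.0 * x, 200.0 * y]

x_opt, y_opt = adam_minimize(grad, [1.0, 1.0])
print(f"x={x_opt:.6f}, y={y_opt:.6f}")
```

Because each coordinate is divided by its own gradient scale, the steep `y` direction and the shallow `x` direction are handled with the same learning rate.
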
The RMSprop optimizer is similar to the gradient descent algorithm with momentum. RMSprop restricts 
the oscillations in the vertical (steep) direction. Therefore, we can increase the learning rate, and 
the algorithm can take larger steps in the horizontal direction, converging faster. The difference 
between RMSprop and gradient descent with momentum lies in how the update is computed from the 
gradients: momentum steps along an exponentially weighted average of past gradients, whereas RMSprop 
divides each gradient by the square root of an exponentially weighted average of past squared 
gradients. The decay rate of these averages is denoted by beta and is usually set to 0.9. 
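
The two update rules can be written side by side. The following is a minimal sketch in plain Python, assuming the common textbook forms of both updates; the function names, the test function and the learning rate are illustrative assumptions. It applies one step of each rule from the same starting point and prints how far each coordinate moves.

```python
import math

def momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    """Gradient descent with momentum: step along a moving average of the gradients."""
    velocity = [beta * v + (1 - beta) * g for v, g in zip(velocity, grad)]
    theta = [t - lr * v for t, v in zip(theta, velocity)]
    return theta, velocity

def rmsprop_step(theta, grad, sq_avg, lr=0.01, beta=0.9, eps=1e-8):
    """RMSprop: divide each gradient by the root of a moving average of its square."""
    sq_avg = [beta * s + (1 - beta) * g * g for s, g in zip(sq_avg, grad)]
    theta = [t - lr * g / (math.sqrt(s) + eps) for t, g, s in zip(theta, grad, sq_avg)]
    return theta, sq_avg

def grad_fn(theta):
    # Gradient of f(x, y) = 0.5 * (x**2 + 50 * y**2): shallow in x, steep in y.
    x, y = theta
    return [x, 50.0 * y]

start = [1.0, 1.0]
mom_theta, _ = momentum_step(start, grad_fn(start), [0.0, 0.0])
rms_theta, _ = rmsprop_step(start, grad_fn(start), [0.0, 0.0])

# Distance moved along each coordinate by the first update.
mom_steps = [s - t for s, t in zip(start, mom_theta)]
rms_steps = [s - t for s, t in zip(start, rms_theta)]
print("momentum first step (x, y):", mom_steps)
print("RMSprop  first step (x, y):", rms_steps)
```

With momentum, the step in the steep direction is 50 times the step in the shallow one, which is what causes the oscillation. RMSprop normalizes each coordinate by its own gradient magnitude, so both steps come out equal.
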
Nadam combines Nesterov accelerated gradient (NAG) and Adam. Nadam is employed for noisy gradients or 
for gradients with high curvature. It accelerates learning by building each step from the 
exponentially decaying moving average of the previous gradients together with the current gradient.
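
A simplified form of the Nadam update can be sketched as follows. This is an illustration under assumptions: it uses the commonly quoted simplified rule with constant `beta1`, whereas library implementations may differ in details such as momentum scheduling. The function name `nadam_minimize` and the test function are made up for this example.

```python
import math

def nadam_minimize(grad_fn, x, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimise a scalar function with a simplified Nadam update."""
    m = 0.0  # moving average of gradients (first moment)
    v = 0.0  # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        # Nesterov-style look-ahead: mix the momentum term with the current gradient.
        # Plain Adam would use m_hat alone in the update below.
        m_nesterov = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        x -= lr * m_nesterov / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical test problem: f(x) = (x - 3)**2, whose minimum is at x = 3.
x_min = nadam_minimize(lambda x: 2.0 * (x - 3.0), x=0.0)
print(f"x after Nadam: {x_min:.4f}")
```
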

Code:

In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models, datasets
from tensorflow.keras.applications.vgg16 import VGG16


In [4]:
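# Load CIFAR-100: 50,000 training and 10,000 test images (32x32 RGB, 100 classes)
(train_images, train_labels), (test_images, test_labels) = datasets.cifar100.load_data()
# Scale pixel values from [0, 255] to [0, 1]
train_images = train_images / 255
test_images = test_images / 255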

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
169001437/169001437 ━━━━━━━━━━━━━━━━━━━━ 18s 0us/step


In [6]:
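# Frozen VGG16 convolutional base (ImageNet weights by default).
# Note: `base` is never added to `model` below, so the network that is actually
# trained is Flatten -> Dense(1200) -> Dense(100) on the raw pixels.
base = VGG16(include_top=False, input_shape=(32, 32, 3))
base.trainable = False
model = models.Sequential()
model.add(layers.Flatten())
model.add(layers.Dense(1200, activation="relu"))
model.add(layers.Dense(100, activation="softmax"))  # one output per CIFAR-100 class
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels), steps_per_epoch=200)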

Epoch 1/10
200/200 ━━━━━━━━━━━━━━━━━━━━ 8s 32ms/step - accuracy: 0.0533 - loss: 4.5279 - val_accuracy: 0.1159 - val_loss: 3.8610
Epoch 2/10
200/200 ━━━━━━━━━━━━━━━━━━━━ 6s 30ms/step - accuracy: 0.1314 - loss: 3.7814 - val_accuracy: 0.1515 - val_loss: 3.7106
Epoch 3/10
200/200 ━━━━━━━━━━━━━━━━━━━━ 6s 30ms/step - accuracy: 0.1625 - loss: 3.6146 - val_accuracy: 0.1705 - val_loss: 3.6002
Epoch 4/10
200/200 ━━━━━━━━━━━━━━━━━━━━ 6s 30ms/step - accuracy: 0.1851 - loss: 3.4920 - val_accuracy: 0.1800 - val_loss: 3.5515
Epoch 5/10
200/200 ━━━━━━━━━━━━━━━━━━━━ 6s 30ms/step - accuracy: 0.1953 - loss: 3.4132 - val_accuracy: 0.1929 - val_loss: 3.5004
Epoch 6/10
200/200 ━━━━━━━━━━━━━━━━━━━━ 6s 30ms/step - accuracy: 0.2090 - loss: 3.3473 - val_accuracy: 0.2010 - val_loss: 3.4430
Epoch 7/10
200/200

<keras.src.callbacks.history.History at 0x23a7f0121e0>

In [8]:
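# compile() only swaps the optimizer; the weights trained with Adam above are kept,
# so this run continues from that model rather than starting from scratch.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels), steps_per_epoch=50)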

Epoch 1/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 59ms/step - accuracy: 0.2719 - loss: 3.0147 - val_accuracy: 0.2403 - val_loss: 3.2510
Epoch 2/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 53ms/step - accuracy: 0.2859 - loss: 2.9660 - val_accuracy: 0.2447 - val_loss: 3.2415
Epoch 3/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 53ms/step - accuracy: 0.2881 - loss: 2.9606 - val_accuracy: 0.2448 - val_loss: 3.2360
Epoch 4/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 53ms/step - accuracy: 0.2885 - loss: 2.9569 - val_accuracy: 0.2448 - val_loss: 3.2321
Epoch 5/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 53ms/step - accuracy: 0.2892 - loss: 2.9478 - val_accuracy: 0.2460 - val_loss: 3.2296
Epoch 6/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 56ms/step - accuracy: 0.2947 - loss: 2.9356 - val_accuracy: 0.2450 - val_loss: 3.2280
Epoch 7/10
50/50 ━━━━

<keras.src.callbacks.history.History at 0x23a7f03b7d0>

In [9]:
model.compile(optimizer="adagrad", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels), steps_per_epoch=50)

Epoch 1/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 4s 69ms/step - accuracy: 0.2917 - loss: 2.9275 - val_accuracy: 0.2453 - val_loss: 3.2219
Epoch 2/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 59ms/step - accuracy: 0.2925 - loss: 2.9297 - val_accuracy: 0.2453 - val_loss: 3.2217
Epoch 3/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 60ms/step - accuracy: 0.2977 - loss: 2.9220 - val_accuracy: 0.2452 - val_loss: 3.2214
Epoch 4/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 63ms/step - accuracy: 0.2946 - loss: 2.9289 - val_accuracy: 0.2459 - val_loss: 3.2211
Epoch 5/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 60ms/step - accuracy: 0.2978 - loss: 2.9177 - val_accuracy: 0.2457 - val_loss: 3.2208
Epoch 6/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 60ms/step - accuracy: 0.2969 - loss: 2.9134 - val_accuracy: 0.2459 - val_loss: 3.2206
Epoch 7/10
50/50 ━━━━

<keras.src.callbacks.history.History at 0x23a7edd92b0>

In [10]:
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels), steps_per_epoch=50)

Epoch 1/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 4s 70ms/step - accuracy: 0.2196 - loss: 3.4991 - val_accuracy: 0.2043 - val_loss: 3.4704
Epoch 2/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 5s 104ms/step - accuracy: 0.2505 - loss: 3.1410 - val_accuracy: 0.2073 - val_loss: 3.4186
Epoch 3/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 6s 117ms/step - accuracy: 0.2545 - loss: 3.1191 - val_accuracy: 0.2214 - val_loss: 3.3732
Epoch 4/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 6s 111ms/step - accuracy: 0.2575 - loss: 3.1021 - val_accuracy: 0.2250 - val_loss: 3.3299
Epoch 5/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 10s 101ms/step - accuracy: 0.2600 - loss: 3.0806 - val_accuracy: 0.2248 - val_loss: 3.3344
Epoch 6/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 5s 105ms/step - accuracy: 0.2595 - loss: 3.0682 - val_accuracy: 0.2262 - val_loss: 3.3191
Epoch 7/10
50/50

<keras.src.callbacks.history.History at 0x23a7effebd0>

In [11]:
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels), steps_per_epoch=50)

Epoch 1/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 4s 68ms/step - accuracy: 0.2986 - loss: 2.8901 - val_accuracy: 0.2531 - val_loss: 3.2010
Epoch 2/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 56ms/step - accuracy: 0.3139 - loss: 2.8185 - val_accuracy: 0.2539 - val_loss: 3.1949
Epoch 3/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 54ms/step - accuracy: 0.3209 - loss: 2.8102 - val_accuracy: 0.2555 - val_loss: 3.1929
Epoch 4/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 57ms/step - accuracy: 0.3186 - loss: 2.7999 - val_accuracy: 0.2542 - val_loss: 3.1917
Epoch 5/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 59ms/step - accuracy: 0.3197 - loss: 2.7938 - val_accuracy: 0.2556 - val_loss: 3.1913
Epoch 6/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 58ms/step - accuracy: 0.3230 - loss: 2.7971 - val_accuracy: 0.2556 - val_loss: 3.1900
Epoch 7/10
50/50 ━━━━

<keras.src.callbacks.history.History at 0x23a7efee810>

In [12]:
model.compile(optimizer="nadam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels), steps_per_epoch=50)

Epoch 1/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 7s 122ms/step - accuracy: 0.3028 - loss: 2.8646 - val_accuracy: 0.2513 - val_loss: 3.2085
Epoch 2/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 7s 144ms/step - accuracy: 0.3143 - loss: 2.7969 - val_accuracy: 0.2502 - val_loss: 3.1991
Epoch 3/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 5s 103ms/step - accuracy: 0.3210 - loss: 2.7768 - val_accuracy: 0.2517 - val_loss: 3.2025
Epoch 4/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 5s 106ms/step - accuracy: 0.3257 - loss: 2.7563 - val_accuracy: 0.2496 - val_loss: 3.2006
Epoch 5/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 6s 128ms/step - accuracy: 0.3260 - loss: 2.7414 - val_accuracy: 0.2512 - val_loss: 3.2061
Epoch 6/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 5s 103ms/step - accuracy: 0.3345 - loss: 2.7270 - val_accuracy: 0.2544 - val_loss: 3.1917
Epoch 7/10
50/50

<keras.src.callbacks.history.History at 0x23a7f08f0b0>
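
The runs recorded above recompile a single model, so each optimizer starts from the weights left by the previous one, and `steps_per_epoch` is 200 for Adam but 50 for the others. A like-for-like comparison would rebuild the model for every optimizer and keep all other settings fixed. The sketch below shows one possible way to do that. The helper names (`build_model`, `compare_optimizers`), the `batch_size` of 256 and the choice of reported metrics are assumptions for illustration, and the code has not been run against the results shown here.

```python
import time

# Optimizer names accepted as strings by Keras `model.compile`.
OPTIMIZERS = ["sgd", "adam", "adagrad", "rmsprop", "nadam"]

def build_model():
    """Fresh copy of the dense classifier, so every optimizer starts from new random weights."""
    # Imported here so that defining these helpers does not itself require TensorFlow.
    from tensorflow.keras import layers, models
    return models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Flatten(),
        layers.Dense(1200, activation="relu"),
        layers.Dense(100, activation="softmax"),
    ])

def compare_optimizers(train_data, val_data, epochs=10, batch_size=256):
    """Train one fresh model per optimizer; return final validation metrics and training time."""
    x_train, y_train = train_data
    results = {}
    for name in OPTIMIZERS:
        model = build_model()
        model.compile(optimizer=name, loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        start = time.perf_counter()
        history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                            validation_data=val_data, verbose=0)
        results[name] = {
            "seconds": time.perf_counter() - start,  # wall-clock training time
            "val_accuracy": history.history["val_accuracy"][-1],
            "val_loss": history.history["val_loss"][-1],
        }
    return results
```

With the arrays loaded earlier, `compare_optimizers((train_images, train_labels), (test_images, test_labels))` would return a dictionary keyed by optimizer name, which can then be printed or plotted to compare accuracy, loss and training time.
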