
In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

In [None]:
# Loading the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalizing the data
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
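As a rough illustration of what the two preprocessing steps above do, the sketch below reproduces them with NumPy only. The helper `one_hot` is a hypothetical stand-in for `to_categorical`, and the tiny arrays are made-up values, not MNIST data.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: integer labels -> one-hot rows."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1.0 in the column given by each label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Made-up pixel values covering the full uint8 range
pixels = np.array([[0, 128, 255]], dtype="uint8")
scaled = pixels / 255.0  # same scaling as the notebook: values land in [0, 1]

# Two made-up digit labels encoded over 10 classes
labels = one_hot([3, 0], num_classes=10)

print(scaled)
print(labels)
```

Scaling to [0, 1] keeps the inputs in a small range, and the one-hot rows match the 10-unit softmax output that `categorical_crossentropy` expects.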

In [None]:
# Defining a simple neural network
def create_model(optimizer):
    model = Sequential([
        tf.keras.Input(shape=(28, 28)),  # declare the input shape explicitly
        Flatten(),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
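To get a feel for the size of this network, the trainable parameters can be counted by hand. The sketch below is plain Python arithmetic; the helper name `dense_params` is hypothetical and not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

flattened = 28 * 28                      # Flatten turns each 28x28 image into 784 values
hidden = dense_params(flattened, 128)    # Dense(128, activation='relu')
output = dense_params(128, 10)           # Dense(10, activation='softmax')
total = hidden + output

print(f"hidden: {hidden}, output: {output}, total: {total}")
# prints "hidden: 100480, output: 1290, total: 101770"
```

Every optimizer in the comparison updates this same set of roughly 100k parameters, so differences in the results come from the update rule and its settings rather than from model capacity.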

In [None]:
# Hyperparameters
batch_size = 32
epochs = 10

In [None]:
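# Define optimizers
# Note: model.fit below uses batch_size=32 for every run, so all three plain-SGD
# entries are mini-batch SGD with identical settings (the Keras SGD defaults are
# learning_rate=0.01, momentum=0.0). Full-batch gradient descent would need
# batch_size=len(x_train), and per-sample SGD would need batch_size=1.
optimizers = {
    "Gradient Descent": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0),
    "Stochastic Gradient Descent": tf.keras.optimizers.SGD(),
    "SGD with Momentum": tf.keras.optimizers.SGD(momentum=0.9),
    "Mini-Batch Gradient Descent": tf.keras.optimizers.SGD(),
    # The adaptive optimizers below run at their Keras default learning_rate of 0.001
    "Adagrad": tf.keras.optimizers.Adagrad(),
    "RMSProp": tf.keras.optimizers.RMSprop(),
    "AdaDelta": tf.keras.optimizers.Adadelta(),
    "Adam": tf.keras.optimizers.Adam(),
}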
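In the textbook sense, gradient descent, stochastic gradient descent, and mini-batch gradient descent differ only in how many samples feed each weight update. The sketch below shows that relationship, assuming the standard 60,000-image MNIST training split; the function name `updates_per_epoch` is hypothetical.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)

num_train = 60_000  # assumed size of the MNIST training split

# Batch size that each textbook variant would use
variants = {
    "Gradient Descent (full batch)": num_train,
    "Stochastic Gradient Descent (per sample)": 1,
    "Mini-Batch Gradient Descent": 32,
}

for name, batch in variants.items():
    print(f"{name}: batch_size={batch}, updates per epoch={updates_per_epoch(num_train, batch)}")
```

In Keras this choice is made through the `batch_size` argument of `model.fit`, not through the optimizer object, which is why the optimizer entries alone cannot tell the three variants apart.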

In [None]:
# Train and evaluate the model with different optimizers
results = {}
for name, optimizer in optimizers.items():
    print(f"Training with optimizer: {name}")
    model = create_model(optimizer)
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    results[name] = (test_loss, test_accuracy)

Training with optimizer: Gradient Descent
Training with optimizer: Stochastic Gradient Descent
Training with optimizer: SGD with Momentum
Training with optimizer: Mini-Batch Gradient Descent
Training with optimizer: Adagrad
Training with optimizer: RMSProp
Training with optimizer: AdaDelta
Training with optimizer: Adam
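
The two numbers that `model.evaluate` returns here are the categorical cross-entropy loss and the accuracy. A minimal NumPy sketch of how such metrics can be computed for a batch follows; the function names and the two-sample batch are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples whose highest-probability class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Made-up batch: two samples, three classes
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```

The second sample is misclassified, so it lowers the accuracy and also contributes the larger share of the loss, since the loss penalizes low probability on the true class.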


In [None]:
# Compare results
for optimizer, (loss, accuracy) in results.items():
    print(f"Optimizer: {optimizer}, Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")

Optimizer: Gradient Descent, Test Loss: 0.1669, Test Accuracy: 0.9530
Optimizer: Stochastic Gradient Descent, Test Loss: 0.1627, Test Accuracy: 0.9524
Optimizer: SGD with Momentum, Test Loss: 0.0687, Test Accuracy: 0.9791
Optimizer: Mini-Batch Gradient Descent, Test Loss: 0.1603, Test Accuracy: 0.9528
Optimizer: Adagrad, Test Loss: 0.3226, Test Accuracy: 0.9142
Optimizer: RMSProp, Test Loss: 0.0846, Test Accuracy: 0.9781
Optimizer: AdaDelta, Test Loss: 1.0357, Test Accuracy: 0.7966
Optimizer: Adam, Test Loss: 0.0820, Test Accuracy: 0.9796
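
The printed metrics are easier to compare once sorted. The sketch below copies the numbers reported above into a dictionary (named `reported` here, a hypothetical stand-in for the notebook's `results`) and ranks the optimizers by test accuracy.

```python
# (test_loss, test_accuracy) copied from the output above
reported = {
    "Gradient Descent": (0.1669, 0.9530),
    "Stochastic Gradient Descent": (0.1627, 0.9524),
    "SGD with Momentum": (0.0687, 0.9791),
    "Mini-Batch Gradient Descent": (0.1603, 0.9528),
    "Adagrad": (0.3226, 0.9142),
    "RMSProp": (0.0846, 0.9781),
    "AdaDelta": (1.0357, 0.7966),
    "Adam": (0.0820, 0.9796),
}

# Highest accuracy first
ranked = sorted(reported.items(), key=lambda item: item[1][1], reverse=True)
for rank, (name, (loss, acc)) in enumerate(ranked, start=1):
    print(f"{rank}. {name:<28} accuracy={acc:.4f}  loss={loss:.4f}")

# Optimizer with the lowest test loss
best_by_loss = min(reported, key=lambda name: reported[name][0])
print(f"Lowest test loss: {best_by_loss}")
```

Ranking by accuracy and by loss gives slightly different winners, so it helps to state which metric a comparison is based on.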


<pre>
1. Gradient Descent
Advantages: Simple; with a suitably small learning rate it converges to the global minimum of a convex function.
Disadvantages: Each update needs a pass over the full dataset, so convergence is slow; sensitive to the learning rate.

2. Stochastic Gradient Descent (SGD)
Advantages: Updates after every sample, so each step is cheap and progress starts early.
Disadvantages: High variance in the per-sample gradient can make the loss oscillate.

3. SGD with Momentum
Advantages: Accelerates convergence and reduces oscillations.
Disadvantages: Requires tuning of the momentum hyperparameter.

4. Mini-Batch Gradient Descent
Advantages: Balances efficiency and stability in updates.
Disadvantages: Choosing the right batch size can be tricky.

5. Adagrad
Advantages: Adapts the learning rate per parameter, which helps with sparse features.
Disadvantages: The accumulated squared gradients only grow, so the effective learning rate can shrink too quickly.

6. RMSProp
Advantages: Adapts learning rates, effective for non-stationary objectives.
Disadvantages: Requires tuning of decay rate.

7. AdaDelta
Advantages: Replaces Adagrad's ever-growing gradient sum with a decaying average, so the learning rate does not vanish.
Disadvantages: More complex and can converge more slowly.

8. Adam
Advantages: Combines benefits of momentum and RMSProp for fast convergence.
Disadvantages: Has several hyperparameters (learning rate, beta_1, beta_2, epsilon), which can complicate tuning.
</pre>
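
The trade-offs listed above follow from the update rules themselves. The sketch below writes textbook versions of five of them in plain Python and runs each on the toy objective f(w) = 0.5 * w**2. The function names, the toy objective, the starting point, and the hyperparameter values are all illustrative assumptions; the rules are simplified and are not the exact Keras implementations. AdaDelta is left out for brevity.

```python
import math

def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * w**2."""
    return w

def sgd_step(w, g, state, lr=0.1):
    # Plain gradient step
    return w - lr * g

def momentum_step(w, g, state, lr=0.1, beta=0.9):
    # Velocity accumulates past gradients and carries the update forward
    state["v"] = beta * state.get("v", 0.0) - lr * g
    return w + state["v"]

def adagrad_step(w, g, state, lr=0.1, eps=1e-7):
    # Sum of squared gradients only grows, so the step size only shrinks
    state["G"] = state.get("G", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["G"]) + eps)

def rmsprop_step(w, g, state, lr=0.1, rho=0.9, eps=1e-7):
    # Decaying average of squared gradients instead of a running sum
    state["s"] = rho * state.get("s", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["s"]) + eps)

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-7):
    # Momentum-style first moment plus RMSProp-style second moment,
    # both bias-corrected for their zero initialization
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

rules = {
    "SGD": sgd_step,
    "SGD with Momentum": momentum_step,
    "Adagrad": adagrad_step,
    "RMSProp": rmsprop_step,
    "Adam": adam_step,
}

final = {}
for name, step in rules.items():
    w, state = 1.0, {}  # same starting point and fresh state for every rule
    for _ in range(100):
        w = step(w, grad(w), state)
    final[name] = w
    print(f"{name}: w after 100 steps = {w:.6f}")
```

A one-dimensional quadratic is far simpler than the MNIST loss surface, so the final values here say nothing about the test accuracies above; the point is only to show which quantities each rule tracks between steps.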


**Conclusion:**
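Adam reached the highest test accuracy (0.9796) and SGD with Momentum the lowest test loss (0.0687), with RMSProp close behind on both. The three plain-SGD runs finished within 0.0006 of each other in accuracy (about 0.953), and Adagrad (0.9142) and AdaDelta (0.7966) trailed after 10 epochs at their default settings.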