# Optimization Algorithms in Deep Learning

In this notebook, we’ll compare some popular **optimization algorithms** used to train neural networks:
- **Stochastic Gradient Descent (SGD)**
- **SGD with Momentum**
- **RMSprop**
- **Adam**

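All four start from the same mini-batch gradients but turn them into weight updates differently. That choice affects how quickly training converges and how well it copes with plateaus, saddle points, and poorly scaled gradients.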

In [None]:
# Import libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

print("TensorFlow version:", tf.__version__)

## Load and preprocess dataset

In [None]:
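# Load MNIST: 28x28 grayscale images of handwritten digits with integer labels 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0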

print("Training data:", x_train.shape)
print("Test data:", x_test.shape)

## Define a function to build the model

In [None]:
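def create_model(optimizer):
    """Build and compile a small fully connected classifier for 28x28 images."""
    model = Sequential([
        tf.keras.Input(shape=(28, 28)),   # explicit input; avoids the input_shape warning in Keras 3
        Flatten(),                        # 28x28 image -> 784-dimensional vector
        Dense(128, activation='relu'),
        Dense(10, activation='softmax')   # one probability per digit class
    ])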
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
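
As a quick sanity check on the architecture above, the number of trainable parameters can be worked out by hand. The sketch below is plain Python; `dense_params` is a hypothetical helper introduced only for this illustration, and the result should agree with what `model.summary()` reports.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return n_in * n_out + n_out

flat = 28 * 28                      # Flatten: 28x28 image -> 784 features, no parameters
hidden = dense_params(flat, 128)    # Dense(128, relu)
output = dense_params(128, 10)      # Dense(10, softmax)
total = hidden + output

print(hidden, output, total)        # 100480 1290 101770
```

Every optimizer in this notebook updates these same parameters; only the update rule differs.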

## Train models with different optimizers

In [None]:
optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "SGD+Momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=0.001),
    "Adam": tf.keras.optimizers.Adam(learning_rate=0.001)
}

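histories = {}
for name, opt in optimizers.items():
    print(f"\nTraining with {name}...")
    # Reset the seed so every model starts from the same initial weights
    # (tf.keras.utils.set_random_seed requires TF >= 2.7).
    tf.keras.utils.set_random_seed(42)
    model = create_model(opt)
    # The test set doubles as validation data here; no separate hold-out split is used.
    history = model.fit(x_train, y_train, epochs=5, batch_size=32,
                        validation_data=(x_test, y_test), verbose=0)
    histories[name] = history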
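
To put the training budget in perspective, the arithmetic below counts how many weight updates each optimizer performs. It assumes the standard MNIST training split of 60,000 images together with the `batch_size=32` and `epochs=5` used in the training loop; it is a back-of-the-envelope sketch, not something Keras reports in this form.

```python
import math

n_train = 60_000   # size of the MNIST training split (assumed; check x_train.shape)
batch_size = 32    # as passed to model.fit
epochs = 5         # as passed to model.fit

steps_per_epoch = math.ceil(n_train / batch_size)   # one weight update per mini-batch
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)                # 1875 9375
```

Each optimizer therefore gets the same number of updates, so differences in the curves come from the update rule and learning rate rather than from the amount of training.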

## Plot accuracy comparison

In [None]:
plt.figure(figsize=(8,5))
for name, history in histories.items():
    val_acc = history.history['val_accuracy']
    plt.plot(range(1, len(val_acc) + 1), val_acc, marker='o', label=name)  # number epochs from 1

plt.title('Validation Accuracy with Different Optimizers')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
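
A plot makes trends easy to see, but a small table is handier for reading off exact numbers. The sketch below is one possible way to do that; `summarize` and `print_summary` are hypothetical helpers (not part of Keras), and they expect a plain dict mapping each optimizer name to its list of per-epoch validation accuracies.

```python
def summarize(val_acc_by_name):
    """Return (name, final_acc, best_acc, best_epoch) rows, best final accuracy first."""
    rows = []
    for name, accs in val_acc_by_name.items():
        # Index of the highest accuracy, converted to a 1-based epoch number
        best_epoch = max(range(len(accs)), key=lambda i: accs[i]) + 1
        rows.append((name, accs[-1], max(accs), best_epoch))
    return sorted(rows, key=lambda r: r[1], reverse=True)


def print_summary(rows):
    """Print the rows produced by summarize() as an aligned text table."""
    print(f"{'optimizer':<14}{'final':>8}{'best':>8}{'epoch':>7}")
    for name, final, best, epoch in rows:
        print(f"{name:<14}{final:>8.4f}{best:>8.4f}{epoch:>7d}")
```

With the `histories` dict from the training cell, this could be called as `print_summary(summarize({n: h.history['val_accuracy'] for n, h in histories.items()}))`.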

## Key Takeaways
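- **SGD**: The simplest update rule. With a fixed learning rate it usually converges more slowly than the other three.
- **SGD+Momentum**: Accumulates a velocity from past gradients, which speeds up progress along consistent directions, dampens oscillation, and can carry the weights past shallow local minima and saddle points.
- **RMSprop**: Adapts the step size per parameter using a running average of squared gradients. Often a good fit for RNNs.
- **Adam**: Combines momentum with RMSprop-style per-parameter scaling and adds bias correction. A widely used default optimizer.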
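
To make the differences concrete, here is a minimal pure-Python sketch of the four update rules in their textbook form, applied to a single weight. Everything in it is an assumption for illustration: the toy loss f(w) = w², the helper names, and the hyperparameter defaults (learning rates match the notebook; `rho`, `b1`, `b2`, and `eps` are commonly used defaults). Keras's own implementations may differ in details, such as where epsilon is added.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = w**2 (assumed purely for illustration)
    return 2.0 * w

def sgd(w, lr=0.01, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)                     # step straight down the gradient
    return w

def sgd_momentum(w, lr=0.01, momentum=0.9, steps=100):
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)       # velocity accumulates past gradients
        w += v
    return w

def rmsprop(w, lr=0.001, rho=0.9, eps=1e-7, steps=100):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g       # running average of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)    # step scaled by recent gradient magnitude
    return w

def adam(w, lr=0.001, b1=0.9, b2=0.999, eps=1e-7, steps=100):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g             # momentum-like first moment
        v = b2 * v + (1 - b2) * g * g         # RMSprop-like second moment
        m_hat = m / (1 - b1 ** t)             # bias correction for the zero-initialized averages
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, fn in [("SGD", sgd), ("SGD+Momentum", sgd_momentum),
                 ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name:<13} w after 100 steps from w=1.0: {fn(1.0):.4f}")
```

On this toy problem RMSprop and Adam move roughly one learning rate per step, because the gradient is divided by its own running magnitude. The distance covered in 100 steps therefore says more about the chosen learning rates than about which optimizer is better; the MNIST curves above are the more meaningful comparison.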