# Adam optimizer

# Description



The Adam optimizer (short for adaptive moment estimation) is a stochastic gradient descent method used to train artificial neural networks (ANNs). It combines ideas from two other popular optimization techniques, AdaGrad and RMSProp.

It is based on adaptive estimation of the first and second moments of the gradients. The first moment is an exponentially decaying average of past gradients, so it tracks the direction in which the slope of the error has recently been pointing. The second moment is an exponentially decaying average of past squared gradients (the uncentered variance), so it tracks how large the gradients for each parameter have typically been. Note that the second moment is not a second derivative: it measures gradient magnitude, not curvature.

Adam adjusts its learning process by keeping track of these moments as it goes along. This helps it figure out how much to change the learning rates for different parts of the neural network. By doing this, Adam can learn more effectively and get better results, especially when the data is noisy or the learning process is complicated.
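Written out, one common formulation of these moment estimates is sketched below. The symbols are notation chosen here rather than taken from the text above: $g_t$ is the gradient at step $t$, $\theta_t$ the parameters, $\alpha$ the learning rate, $\beta_1$ and $\beta_2$ the decay rates, and $\epsilon$ a small constant for numerical stability.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

Because the step is divided by $\sqrt{\hat{v}_t}$, parameters whose gradients have been consistently large take smaller steps, and parameters with small gradients take relatively larger ones.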

## Details of the Adam Optimizer

1. Adaptive Learning Rate: Adam adjusts the learning rate for each parameter individually, based on the first and second moments of the gradients.

2. Momentum: It uses momentum to accelerate optimization by accumulating an exponentially decaying average of past gradients.

3. Bias Correction: Adam applies bias correction to the estimates of the first and second moments to alleviate their initialization bias towards zero, especially in the initial training stages.

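4. Parameter Updates: Each parameter is moved in the direction of its bias-corrected first-moment estimate, with the step scaled by the learning rate and divided by the square root of its bias-corrected second-moment estimate (plus a small epsilon for numerical stability).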
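The four points above can be sketched as a single update function in NumPy. This is a minimal illustration, not the TensorFlow implementation: the function name `adam_step` and the toy objective `f(w) = w^2` are assumptions made for the example, and the default values mirror the documented defaults of `tf.keras.optimizers.Adam`.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one Adam update. `t` is the 1-based step count (hypothetical helper)."""
    # Momentum: exponentially decaying average of past gradients (first moment).
    m = beta_1 * m + (1.0 - beta_1) * grad
    # Exponentially decaying average of past squared gradients (second moment).
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2
    # Bias correction: m and v start at zero, so early estimates are too small.
    m_hat = m / (1.0 - beta_1 ** t)
    v_hat = v / (1.0 - beta_2 ** t)
    # Per-parameter adaptive step.
    param = param - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v


# Toy usage: minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.0.
w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 201):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
```

On the very first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is approximately the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale. Without bias correction the first step would instead be scaled by `(1 - beta_1) / sqrt(1 - beta_2)`.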

## Pros of Adam Optimizer

1. Efficient: Adam is known for its efficiency in training deep neural networks, often converging in fewer epochs than plain stochastic gradient descent.

2. Adaptive Learning Rates: Its adaptive learning rate mechanism allows for dynamic adjustments to the learning rates for each parameter, which can be beneficial in different stages of training.

3. Suitability for Sparse Data: Adam performs well with sparse gradients and noisy data, making it suitable for a wide range of applications.

## Cons of Adam Optimizer

1. Memory Usage: Adam maintains additional state variables for each parameter, which can increase memory consumption, especially for large models.

2. Hyperparameter Sensitivity: Adam requires careful tuning of hyperparameters such as learning rate, beta1, beta2, and epsilon, to achieve optimal performance. Improper tuning may lead to suboptimal results.

3. Sensitivity to Noise: Although Adam generally copes well with noisy gradients, in some cases it may still be sensitive to noise in the objective function, causing fluctuations in convergence behavior.
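To make the memory point concrete, the sketch below estimates the extra optimizer state under the assumption that Adam stores two values per parameter (the first and second moment) in 32-bit floats. The helper name and the model size are hypothetical, chosen only for illustration.

```python
def adam_state_bytes(num_params, bytes_per_value=4):
    """Estimate the extra memory used by Adam's two moment buffers (m and v)."""
    return 2 * num_params * bytes_per_value


# Hypothetical model with one million trainable parameters stored as float32.
extra = adam_state_bytes(1_000_000)
print(f"{extra / 1e6:.1f} MB of optimizer state")  # 8.0 MB of optimizer state
```

In other words, the optimizer state is about twice the size of the weights themselves, on top of the memory needed for the weights and gradients.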

## References
- https://keras.io/api/optimizers/adam/
- https://www.geeksforgeeks.org/adam-optimizer-in-tensorflow/
- https://www.geeksforgeeks.org/intuition-of-adam-optimizer/
- https://spotintelligence.com/2023/03/01/adam-optimizer/

# Implementation using TensorFlow on the Fashion MNIST dataset

In [None]:
from fashionmnist_model import FMM
import tensorflow as tf

In [None]:
# Load the data
X_train, y_train, X_test, y_test = FMM.load_data()

In [None]:
X_train, X_test = FMM.reshape_data(X_train, X_test)

In [None]:
optimizer = tf.keras.optimizers.Adam()

In [None]:
# Create and compile the model with the optimizer
model = FMM.create_model_v2()
print(f"Training with {optimizer.__class__.__name__} optimizer...")
history = FMM.compile_and_train(
    model, X_train, y_train, optimizer
)

In [None]:
FMM.evaluate(model, X_test, y_test, history)
FMM.plot_history(history)

### Application to Fashion MNIST Dataset

- Loss: The "loss" measures how well the neural network is performing. It represents the difference between the predicted output and the actual output. Lower loss values indicate better performance.

- Accuracy: The "accuracy" measures how often the neural network makes correct predictions. Higher accuracy values indicate better performance.

Analysis of the results:

1. Epochs: The model was trained for 30 epochs, where one epoch is a full pass over the training set.

2. Training and Validation: The neural network was trained on a training set and evaluated on a validation set after each epoch. This helped to monitor how well the model generalizes to new data.

3. Improvement: Over the epochs, both training and validation loss decrease, and accuracy increases. This indicates that the model is learning from the data and improving its performance.

4. Performance: The neural network achieved an accuracy of around 87% on the validation set by the end of training. This means that it correctly classified the images in the Fashion MNIST dataset about 87% of the time.

5. Stability: The loss and accuracy values on the validation set appear to stabilize towards the end of training, suggesting that the model's performance isn't changing significantly after a certain number of epochs.
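Observations such as "accuracy stabilizes towards the end" can also be checked numerically. The sketch below assumes the per-epoch metrics are available as a dictionary of lists, as in the `history.history` attribute of a Keras `History` object. Whether `FMM.compile_and_train` returns such an object is an assumption, the helper `summarize_history` is hypothetical, and the numbers in the usage example are made up for illustration rather than taken from the run above.

```python
def summarize_history(history_dict, metric="val_accuracy", last_k=3):
    """Report the best epoch for a metric and how much it changed over the last k epochs."""
    values = history_dict[metric]
    # Index of the first epoch that reached the maximum value.
    best_index = max(range(len(values)), key=values.__getitem__)
    window = values[-last_k:]
    return {
        "best_epoch": best_index + 1,  # epochs are reported 1-based
        "best_value": values[best_index],
        "final_value": values[-1],
        "change_over_last_k": window[-1] - window[0],
    }


# Made-up values, for illustration only.
example = {"val_accuracy": [0.80, 0.84, 0.86, 0.87, 0.87]}
summary = summarize_history(example)
print(summary)
```

A `change_over_last_k` close to zero is one simple sign that validation performance has plateaued.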

In [None]:
# Halt execution here (for example during "Run All") before the tuning cells below
raise KeyboardInterrupt

# Hyperparameter tuning

In [None]:
! pip install -U "ray[data,train,tune,serve]"

In [None]:
import ray
from ray import tune, train

In [None]:
ray.init()

In [None]:
# Define a function to train the model
def train_model(config):
    from fashionmnist_model import FMM
    import tensorflow as tf
    
    X_train, y_train, X_test, y_test = FMM.load_data()
    X_train, X_test = FMM.reshape_data(X_train, X_test)

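    # Use the same (non-legacy) Adam class as the final training cell;
    # it accepts the weight_decay argument passed below
    optimizer = tf.keras.optimizers.Adam(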
        learning_rate=config["learning_rate"],
        beta_1=config["beta_1"],
        beta_2=config["beta_2"],
        epsilon=config["epsilon"],
        weight_decay=config["weight_decay"],
    )

    model = FMM.create_model_v2()
    history = FMM.compile_and_train(
        model, X_train, y_train, optimizer
    )
    
    loss, accuracy, _, _ = FMM.evaluate(model, X_test, y_test, history)

    train.report({"accuracy": accuracy, "loss": loss, **config})

In [None]:
search_space = {
    "learning_rate": tune.grid_search([0.001, 0.0005, 0.0001]),
    "beta_1": tune.grid_search([0.9, 0.95, 0.99]),
    "beta_2": tune.grid_search([0.999, 0.9999]),
    "epsilon": tune.grid_search([1e-8, 1e-7, 1e-6]),
    "weight_decay": tune.grid_search([1e-6, 1e-5, 1e-4]),
}
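
Because every entry in the search space uses `tune.grid_search`, Ray Tune evaluates every combination of the listed values. The sketch below counts those combinations with plain Python. The dictionary `grid` simply repeats the candidate values from the cell above as ordinary lists, and is introduced here only for the calculation.

```python
from math import prod

# Same candidate values as the search space, written as plain lists.
grid = {
    "learning_rate": [0.001, 0.0005, 0.0001],
    "beta_1": [0.9, 0.95, 0.99],
    "beta_2": [0.999, 0.9999],
    "epsilon": [1e-8, 1e-7, 1e-6],
    "weight_decay": [1e-6, 1e-5, 1e-4],
}

# A full grid search runs one trial per element of the Cartesian product.
num_trials = prod(len(values) for values in grid.values())
print(num_trials)  # 162
```

Each of these trials trains a full model, so the total cost grows multiplicatively with every hyperparameter added to the grid.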

In [None]:
# Run hyperparameter tuning
analysis = tune.run(
    train_model,
    config=search_space,
    metric="accuracy",
    mode="max",
)

In [None]:
ray.shutdown()

In [None]:
FMM.plot_analysis_results(analysis, x_axis="learning_rate", y_axis="accuracy")

In [None]:
# Print the best hyperparameters and results
best_config = analysis.best_config
print("Best hyperparameters:", best_config)
print("Best accuracy:", analysis.best_result["accuracy"])

In [None]:
optimizer = tf.keras.optimizers.Adam(**best_config)
model = FMM.create_model_v2()
print(f"Training with {optimizer.__class__.__name__} optimizer...")
history = FMM.compile_and_train(
    model, X_train, y_train, optimizer
)

In [None]:
FMM.evaluate(model, X_test, y_test, history)
FMM.plot_history(history)