# Optimizer Comparison Exercise
## Objective
Explore how different optimization algorithms affect the training of neural networks.

## Background
Optimizers are algorithms or methods used to change the attributes of the neural network, such as weights and learning rate, to reduce the losses. Common optimizers include SGD (Stochastic Gradient Descent), Mini-batch SGD, Momentum, Adam, and RMSProp.


# Part 1: Classification task

## Setup
Start by importing necessary libraries and preparing a simple dataset.

In [39]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

In [40]:
# Load or create a dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [41]:
print(np.shape(X_train))
print(np.shape(y_train))
print(np.shape(X_test))
print(np.shape(y_test))

(800, 20)
(800,)
(200, 20)
(200,)


In [43]:
# Load or create a dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


history = model.fit(X_train, y_train, epochs=30, validation_data=(X_test, y_test), verbose=0)

## Building the Neural Network
Define a function to create a basic neural network. This function will take an optimizer as an argument. Define the network yourself with 2 hidden layers using 64 nodes each. They should both use relu as activation function. Dont forget the output layer which, for a binary classification problem should use what activation function?


In [28]:
def build_model(optimizer):
    # Add your model here following the above instructions.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(2, activation='sigmoid')
    ])
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model


## Experimentation
Experiment with different optimizers. For each optimizer, train the model and plot its accuracy and loss.

### Task:
1. Try the following optimizers: 'sgd', 'adam', 'rmsprop', and a custom SGD with Momentum.
2. For the custom SGD with Momentum, use: tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
3. Train each model for a fixed number of epochs (e.g., 30) and evaluate its performance on the test set.
4. Do step 1-3 same, but change batch size.

In [35]:
optimizers = ['sgd', 'adam', 'rmsprop', 'SGD_momentum']
histories = {}

for opt in optimizers:
    print(f"Training with optimizer: {opt}")
    
    if opt == "SGD_momentum": 
        model = build_model(tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9))
    else:
        model = build_model(opt)
    
    # We manually specify the validation data
    history = model.fit(X_train, y_train, epochs=30, validation_data=(X_test, y_test), verbose=0)
    histories[str(opt)] = history


Training with optimizer: sgd


ValueError: in user code:

    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\engine\training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\engine\training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\engine\training.py", line 1249, in run_step  **
        outputs = model.train_step(data)
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\engine\training.py", line 1051, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\engine\training.py", line 1109, in compute_loss
        return self.compiled_loss(
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\engine\compile_utils.py", line 265, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\losses.py", line 142, in __call__
        losses = call_fn(y_true, y_pred)
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\losses.py", line 268, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\losses.py", line 2156, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "c:\Users\Kevin\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\backend.py", line 5707, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(

    ValueError: `logits` and `labels` must have the same shape, received ((32, 2) vs (32, 1)).


## Visualization
Plot the training and validation accuracy and loss for each optimizer.

In [None]:
plt.figure(figsize=(12, 8))
for opt, history in histories.items():
    plt.plot(history.history['accuracy'], label=f'{opt} - Accuracy')
    plt.plot(history.history['val_accuracy'], label=f'{opt} - Validation Accuracy')
plt.title('Model Accuracy by Optimizer')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

plt.figure(figsize=(12, 8))
for opt, history in histories.items():
    plt.plot(history.history['loss'], label=f'{opt} - Loss')
    plt.plot(history.history['val_loss'], label=f'{opt} - Validation Loss')
plt.title('Model Loss by Optimizer')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()


## Analysis and Questions
* Which optimizer provided the fastest convergence?
* Which optimizer achieved the highest accuracy on the validation set?
* Discuss the possible reasons behind the performance differences observed.


# Part 2: Regression task
Now, let's apply the same set of optimizers to a regression problem.

## Setup for Regression
Import libraries and prepare a regression dataset. For simplicity, let's use a synthetic dataset.

In [None]:
from sklearn.datasets import make_regression

# Generate synthetic data for regression
X_reg, y_reg = make_regression(n_samples=500, n_features=2, noise=15, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)


## Define the Neural Network Model for Regression
Create a function to build a neural network model suitable for regression. Use 2 hidden layeres of 10 nodes each both using relu activations. Dont forget the output layer. For regression it is recommended to use MSE as loss and metric.

In [None]:
def build_model_regression(optimizer):
    # -- Define the model yourself here following the instuctions above.
    model = 
    model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['mse'])
    return model


## Experiment with Different Optimizers for Regression
Repeat the same process as in the classification task, but now for regression.


In [None]:
history_dict_reg = {}

for opt in optimizers:
    print(f"Training (Regression) with optimizer: {opt}")
    if opt=="SGD_momentum": 
        model = build_model_regression(tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9))
    else:
        model = build_model_regression(opt)
    history = model.fit(X_train_reg, y_train_reg, epochs=100, validation_split=0.2, verbose=0)
    history_dict_reg[opt] = history

    # Evaluate the model
    test_loss, test_mse = model.evaluate(X_test_reg, y_test_reg, verbose=0)
    print(f"Test MSE with {opt}: {test_mse:.4f}\n")


## Visualization of Results for Regression
Plot the training and validation loss (MSE) for each optimizer.

In [None]:
plt.figure(figsize=(12, 8))
for opt, history in history_dict_reg.items():
    plt.plot(history.history['mse'], label=f'{opt} - Train (Reg)')
    plt.plot(history.history['val_mse'], label=f'{opt} - Val (Reg)')

plt.title('Model MSE with Different Optimizers (Regression)')
plt.ylabel('MSE')
plt.xlabel('Epoch')
plt.legend()
plt.show()


## Analysis and Questions
* Compare the performance of the optimizers between the classification and regression tasks.
* Did certain optimizers perform better on one task than the other? Why might this be?
* Discuss the implications of these findings for selecting optimizers in real-world applications.

## Conclusion
Reflect on the importance of understanding the strengths and limitations of different optimizers in relation to the specific nature of the problem and dataset.