# Exercise - optimizers and activation functions

1. Use the **fetch_california_housing** data (remember to split your data into a training set and a test set). Use the five optimizers presented in class to train five neural networks (identical aside from the optimizer used). How well do the networks perform on the test set, as measured by MSE and MAE? Rank the optimizers.
1. Select the best optimizer and use it for this exercise. Experiment with different activation functions, including at least sigmoid, tanh, and relu. Rank the activation functions you try. 
1. Using your findings, as well as experimenting with more layers, try to minimize the test MSE.

**Note**: You may want to use https://www.tensorflow.org/api_docs/python/tf/keras/activations and https://www.tensorflow.org/api_docs/python/tf/keras/optimizers.

**Note**: **fetch_california_housing** data source file: https://github.com/scikit-learn/scikit-learn/blob/36958fb24/sklearn/datasets/_california_housing.py#L53

**See slides for more details!**

# Exercise 1

Use the fetch_california_housing data (remember to split your data into a training set and a test set). Use the five optimizers presented in class to train five neural networks (identical aside from the optimizer used). How well do the networks perform on the test set, as measured by MSE and MAE? Rank the optimizers.

In [1]:
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

x, y = fetch_california_housing(return_X_y=True)

# Use `train_test_split` to split your data into a train and a test set.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

# Scale
scaler = StandardScaler()
z_train = scaler.fit_transform(x_train)
z_test = scaler.transform(x_test)

print(z_train.shape, z_test.shape, y_train.shape, y_test.shape)

(16512, 8) (4128, 8) (16512,) (4128,)


Here is a small function you can use as a starting point for your network - but feel free to experiment!

In [2]:
def build_nn(activation='sigmoid'):
    your_regression_nn = tf.keras.models.Sequential([
        tf.keras.layers.Dense(64, activation=activation, input_shape=(8,)),  # input_shape=(8,) since there are 8 features
        tf.keras.layers.Dense(1, activation='linear'),  # linear is used for regression; 1 node since there is 1 output per observation
        ])

    return your_regression_nn

**Important note**: Remember to use "mse" as your loss function! It is okay to try other regression losses, but do not use cross-entropy (remember, that is for classification).
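To make the two metrics concrete, here is a minimal sketch of what MSE and MAE compute, written in plain Python rather than with Keras. The helper names `mse` and `mae` and the toy numbers are made up for illustration; they are not part of the exercise code.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: the average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): three perfect predictions and one that is off by 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

print(mse(y_true, y_pred))  # 4.0, since (0 + 0 + 0 + 16) / 4
print(mae(y_true, y_pred))  # 1.0, since (0 + 0 + 0 + 4) / 4
```

The single large error dominates the MSE but not the MAE, which is why it can be informative to track both.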

Go through each of the five optimizers covered in class and rank their performance on this dataset.

In [4]:
# SGD
# I have completed this code for you - use it to construct the other 4 cases (i.e. for the other 4 optimizers covered in class).
nn_sgd = build_nn()
nn_sgd.compile(
    optimizer='SGD',
    loss='mse',
    metrics=['mae'], # to also track MAE. MSE is "automatically" measured since it is the loss
    )
nn_sgd.fit(z_train, y_train, epochs=5)
mse, mae = nn_sgd.evaluate(z_test, y_test)
print(f'Test mse = {round(mse, 3)}, test mae = {round(mae, 3)}.')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test mse = 0.499, test mae = 0.511.


In [5]:
optimizers = ['sgd', 'adam', 'adagrad', 'adadelta', 'rmsprop']
for opt in optimizers:
    # Same setup as the SGD cell above, repeated once per optimizer.
    nn_gen = build_nn()
    nn_gen.compile(
        optimizer=opt,
        loss='mse',
        metrics=['mae'], # to also track MAE. MSE is "automatically" measured since it is the loss
        )
    nn_gen.fit(z_train, y_train, epochs=5, verbose=0)
    mse, mae = nn_gen.evaluate(z_test, y_test)
    print(f'Optimizer = {opt}, Test mse = {round(mse, 3)}, test mae = {round(mae, 3)}.')

Optimizer = sgd, Test mse = 0.5, test mae = 0.514.
Optimizer = adam, Test mse = 0.463, test mae = 0.485.
Optimizer = adagrad, Test mse = 0.935, test mae = 0.734.
Optimizer = adadelta, Test mse = 3.937, test mae = 1.673.
Optimizer = rmsprop, Test mse = 0.465, test mae = 0.496.
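The exercise asks for a ranking, so here is a small sketch that sorts the optimizers by test MSE. The numbers are copied from the output above; the `results` dictionary and the variable names are only illustrative.

```python
# (test MSE, test MAE) per optimizer, copied from the printed output above.
results = {
    "sgd": (0.5, 0.514),
    "adam": (0.463, 0.485),
    "adagrad": (0.935, 0.734),
    "adadelta": (3.937, 1.673),
    "rmsprop": (0.465, 0.496),
}

# Rank by test MSE, lowest (best) first.
ranking = sorted(results, key=lambda name: results[name][0])

for rank, name in enumerate(ranking, start=1):
    test_mse, test_mae = results[name]
    print(f"{rank}. {name}: mse = {test_mse}, mae = {test_mae}")
```

For this run the order is adam, rmsprop, sgd, adagrad, adadelta. The gap between adam and rmsprop is only 0.002, so a different random initialization could plausibly swap them. The poor adagrad and adadelta scores may also reflect the default learning rates and the 5-epoch budget rather than the optimizers themselves.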


# Exercise 2

Select the best optimizer and use it for this exercise. Experiment with different activation functions, including at least sigmoid, tanh, and relu. Rank the activation functions you try. 
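As a reminder of what the three required activation functions do, here is a minimal plain-Python sketch of their standard definitions. In the exercise itself you would pass the names as strings to Keras; the helper functions below are only for illustration.

```python
import math


def sigmoid(x):
    """Squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes x into (-1, 1), centered at 0."""
    return math.tanh(x)


def relu(x):
    """Passes positive values through and sets negative values to 0."""
    return max(0.0, x)


# Evaluate each function at a negative, zero, and positive input.
for x in (-2.0, 0.0, 2.0):
    print(f"x = {x}: sigmoid = {sigmoid(x):.3f}, tanh = {tanh(x):.3f}, relu = {relu(x):.1f}")
```

Sigmoid and tanh saturate for large positive or negative inputs, whereas relu is unbounded for positive inputs.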

In [6]:
activations = ['sigmoid', 'relu', 'tanh', 'elu', 'gelu', 'selu']
for act in activations:
    nn_gen = build_nn(act)
    nn_gen.compile(
        optimizer='adam',
        loss='mse',
        metrics=['mae'],
        )
    nn_gen.fit(z_train, y_train, epochs=5, verbose=0)
    mse, mae = nn_gen.evaluate(z_test, y_test)
    print(f'Activation = {act}, Test mse = {round(mse, 3)}, test mae = {round(mae, 3)}.')

Activation = sigmoid, Test mse = 0.467, test mae = 0.501.
Activation = relu, Test mse = 0.401, test mae = 0.434.
Activation = tanh, Test mse = 0.457, test mae = 0.488.
Activation = elu, Test mse = 0.46, test mae = 0.483.
Activation = gelu, Test mse = 0.422, test mae = 0.463.
Activation = selu, Test mse = 0.466, test mae = 0.488.


# Exercise 3

Using your findings, as well as experimenting with more layers, try to minimize the test MSE.

In [7]:
# Try to experiment a bit, but here is an example of a model with more layers
def build_better_nn(activation):
    your_regression_nn = tf.keras.models.Sequential([
        tf.keras.layers.Dense(32, activation=activation, input_shape=(8,)), # input_shape=(8,) since there are 8 features
        tf.keras.layers.Dense(64, activation=activation),
        tf.keras.layers.Dense(128, activation=activation),
        tf.keras.layers.Dense(1, activation='linear'), # linear is used for regression. 1 node since 1 output
        ])

    return your_regression_nn

In [8]:
nn_final = build_better_nn("relu")
nn_final.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae'],
    )
nn_final.fit(z_train, y_train, epochs=25)
mse, mae = nn_final.evaluate(z_test, y_test)
print(f'Final model test mse = {round(mse, 3)}, test mae = {round(mae, 3)}.')

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Final model test mse = 0.285, test mae = 0.35.
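To put the final numbers on a more interpretable scale, the sketch below converts the test MSE to an RMSE and expresses both errors in dollars. It assumes the target is the median house value in units of $100,000, as documented for this dataset; the variable names are illustrative.

```python
import math

# Final test metrics, copied from the output above.
test_mse = 0.285
test_mae = 0.35

# Assumption: the target is the median house value in units of $100,000.
UNIT_USD = 100_000

# RMSE is on the same scale as the target, unlike MSE.
rmse = math.sqrt(test_mse)

print(f"RMSE = {rmse:.3f} (roughly ${rmse * UNIT_USD:,.0f})")
print(f"MAE  = {test_mae:.3f} (roughly ${test_mae * UNIT_USD:,.0f})")
```

Under that assumption, the final model's typical error is on the order of tens of thousands of dollars per district.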
