# Machine Learning: Exercise session 09

In this exercise session we will focus on neural networks (NN) and convolutional neural networks (CNN).

The first two problems deal with dense NN and must be submitted for grading.

The third problem deals with CNN and it is **not** counted for grading.

The fourth problem covers additional topics on NN and it is **not** counted for grading.

## 1. Basics of dense neural networks

Make sure you have `tensorflow` installed on your computer. TensorFlow is a software library geared toward deep learning, i.e., neural networks. It comes as a Python package, so you install it as you would other packages, such as `numpy`, `pandas`, and `scikit-learn`.
Keras is a library that ships with `tensorflow` and lets you build, fit, and make predictions with neural networks.

In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

The goal of this problem is to build an image classifier with a dense NN. We will be using the Fashion-MNIST dataset (https://github.com/zalandoresearch/fashion-mnist#readme).
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples.
Each image belongs to one of 10 classes: `T-shirt/top`, `Trouser`, `Pullover`, `Dress`, `Coat`, `Sandal`, `Shirt`, `Sneaker`, `Bag`, and `Ankle boot`.

* Load the data by filling the `??`. Furthermore, investigate the dimension and type of the training data.

In [None]:
fashion_mnist = keras.datasets.fashion_mnist
(??, ??), (??, ??) = fashion_mnist.load_data()

* The pixel intensities are represented as integers (from 0 to 255). Since we are going to train the NN using gradient descent, we must scale the input features so that they lie in the [0, 1] interval.
Fill in the `??`.

_Hint_: Notice that you want your input features to be floats, but the original values are stored as integers.

In [None]:
X_train_scaled = ?? / ??
X_train_scaled.dtype

* Since the dataset has no validation set, let us create one. Split the training set so that you have 5,000 observations in the validation set. Fill in the `??`.

In [None]:
X_valid = X_train_scaled[:??]
X_train = X_train_scaled[??:]
y_valid = y_train_full[:??]
y_train = y_train_full[??:]

* Plot the first image in the training dataset using the `plt.imshow` function. Set the argument `cmap="binary"`.

* In this dataset, the classes are encoded as integers from 0 to 9. From the dataset website, retrieve the names of the classes and store them in a list of strings. Fill in the `??`.

In [None]:
class_names = [??, ??, ??, ??, ??, ??, ??, ??, ??, ??]

* Having encoded the integer classes with their names, we can now plot some images and their corresponding labels. Fill in the `??`.

In [None]:
n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))
for row in range(??):
    for col in range(??):
        index = n_cols * row + col
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(X_train[??], cmap="binary")
        plt.axis('off')
        plt.title(class_names[y_train[index]], fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()

* We are now ready to build the architecture for our neural network. We want two hidden layers with 300 and 100 nodes respectively and the `"relu"` activation function. For the final layer we use the `"softmax"` activation function. Why do we need to "flatten" the inputs? Fill in the `??`.

In [None]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[??,??]),
    keras.layers.Dense(units=??, activation=??),
    keras.layers.Dense(units=??, activation=??),
    keras.layers.Dense(units=??,  activation=??)
])
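
To make the role of the final `"softmax"` activation concrete, here is a minimal pure-Python sketch. The function name `softmax` and the example scores are illustrative assumptions, not part of the exercise. It turns a vector of raw scores into positive values that sum to one, which is why the outputs of the last layer can be read as class probabilities.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow in exp().
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a 3-class output layer.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The ordering of the scores is preserved: the largest score receives the largest probability.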

* Using the `summary()` method of the object `model`, try to understand how to compute the number of parameters.

_Hint_: do not forget about the bias terms (i.e., intercept).
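
As a small illustration of the counting rule, the sketch below computes the number of parameters of a toy fully connected network. The helper `dense_param_count` and the layer sizes are made up for this example; the network in the exercise follows the same logic with its own sizes.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of nodes per layer, inputs first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per node in the receiving layer
    return total

# Toy network: 4 inputs -> 3 hidden nodes -> 2 outputs.
toy_params = dense_param_count([4, 3, 2])
print(toy_params)  # (4*3 + 3) + (3*2 + 2) = 23
```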

* Using the `get_layer` and `get_weights` methods, you can access the values of the weights of each layer. Fill in the `??`. Why do you think the weights are not initialized to zero?

In [None]:
weights, biases = model.get_layer(??).get_weights()

* Now that the model is created, we must compile it. Specify as loss `"sparse_categorical_crossentropy"`, as optimizer `"sgd"`, and as metrics `"accuracy"`. Fill in the `??`.

In [None]:
model.compile(loss = ??,
              optimizer = ??,
              metrics= [??])
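
The following sketch illustrates what a sparse categorical cross-entropy loss measures, under the assumption that the model outputs one probability per class. "Sparse" means that the true labels are plain integers rather than one-hot vectors. The function and the toy numbers are illustrative; Keras's own implementation includes further numerical safeguards.

```python
import math

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(probs[label]) for label, probs in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

# Two toy observations with three classes each; labels are integers.
y_true = [0, 2]
y_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.3, 0.6]]
loss = sparse_categorical_crossentropy(y_true, y_prob)
print(loss)
```

The loss is small when the model puts high probability on the true class and grows without bound as that probability approaches zero.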

* Now, let us train the model over 30 epochs. The stochastic gradient descent algorithm processes one mini-batch of observations at a time (e.g., 32 observations) until it has gone through the whole training set. Each such pass is called an _epoch_. Fill in the `??`.

In [None]:
history = model.fit(x=??, y=??, 
                   validation_data=(??, ??),
                   epochs=??)
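
To relate mini-batches and epochs, the sketch below computes how many parameter updates one epoch involves. It assumes a batch size of 32 (the Keras default for `fit()`) and the 55,000 training observations left after holding out 5,000 for validation; the helper name `steps_per_epoch` is chosen for this illustration.

```python
import math

def steps_per_epoch(n_observations, batch_size=32):
    """Number of mini-batches needed to pass once through the data."""
    # The last mini-batch may be smaller, hence the ceiling.
    return math.ceil(n_observations / batch_size)

# 60,000 training images minus 5,000 held out for validation.
n_train = 60_000 - 5_000
steps = steps_per_epoch(n_train, batch_size=32)
print(steps)       # 1719 updates per epoch
print(steps * 30)  # 51570 updates over 30 epochs
```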

* Now that our model is trained, let us plot the learning curves. Fill in the `??`. Is there any evidence of overfitting? Why/why not?

In [None]:
pd.DataFrame(history.history).plot(figsize=(8,5))
plt.grid(??)
plt.gca().set_ylim(??, ??);

* Let us evaluate our model on the test set by calling the `evaluate()` method of the `model` object.

* Let us predict the class of the observations 24, 188, 3023 in the test set and plot the results. Fill in the `??`.

In [None]:
indexes = [??]
X_new = X_test[??]
y_pred = np.argmax(model.predict(??), axis = 1)
y_new = y_test[??]
print("Predicted classes:", y_pred)
print("True classes:", y_new)
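
The role of `np.argmax` in the cell above can be sketched without any library: for each observation, pick the index of the largest predicted probability. The probabilities and class labels below are made up for illustration.

```python
def argmax(values):
    """Index of the largest entry (first one in case of ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up probabilities for two observations and three classes.
toy_probs = [[0.1, 0.7, 0.2],
             [0.6, 0.3, 0.1]]
toy_names = ["class_a", "class_b", "class_c"]  # hypothetical labels

pred_idx = [argmax(row) for row in toy_probs]
pred_names = [toy_names[i] for i in pred_idx]
print(pred_idx)    # [1, 0]
print(pred_names)  # ['class_b', 'class_a']
```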

In [None]:
plt.figure(figsize=(7.2, 2.4))
for index, image in enumerate(X_new):
    cls_nm = class_names[??]
    plt.subplot(1, 3, index + 1)
    plt.imshow(image, cmap="binary")
    plt.axis('off')
    plt.title(cls_nm, fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()

## 2. Advanced topics in dense neural networks

We continue with the example of the previous problem.

#### Saving and restoring fitted models

* Since training neural networks is computationally intensive and time consuming, it is important to be able to save and restore trained models. Keras makes this very easy. Save and reload the model fitted in the previous exercise by filling in the `??`.

In [None]:
model.save(??)
model_restored = keras.models.load_model(??)

* What if training lasts several hours? This is quite common, especially when training on large datasets. In this case, you should not only save your model at the end of training, but also save checkpoints at regular intervals during training, to avoid losing everything if your computer crashes. You can do so by using the `callbacks` argument of the `fit()` method. This argument accepts a list of callbacks. A callback is an object that can perform actions at various stages of training (e.g., at the start or end of an epoch, or before or after a single batch); see https://keras.io/api/callbacks. Fill in the `??` to save the checkpoints of your model at the end of each epoch.

In [None]:
checkpoint_cb = keras.callbacks.ModelCheckpoint(??)
history = model.fit(??, ??, 
                    validation_data=(??, ??), 
                    epochs=??, callbacks=[??])

#### Avoiding overfitting

* A useful strategy to avoid overfitting with neural networks is early stopping, i.e., interrupting the training when there is no improvement on the validation set for a number of epochs. When this happens, early stopping allows you to roll back to the best model. Notice that early stopping is also implemented as a Keras callback. Fill in the `??`.

_Hint_: The number of epochs can be set to a large value since training will stop automatically when there is no more progress. In this case, there is no need to restore the best model saved because the callback will keep track of the best weights and restore them for you at the end of training.

In [None]:
early_stopping_cb = keras.callbacks.EarlyStopping(patience=??, restore_best_weights=??)

history = model.fit(??, ??,
                   validation_data=(??, ??), 
                   epochs=??, callbacks=[??])
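
The logic of early stopping with a patience parameter can be sketched in a few lines. This is an illustration of the idea on made-up validation losses, not Keras's implementation: `keras.callbacks.EarlyStopping` has additional options (for example a minimum improvement threshold).

```python
def early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Epochs are numbered from 0. Training stops once `patience` consecutive
    epochs have passed without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stopped early
    return best_epoch, len(val_losses) - 1  # ran all epochs

toy_val_losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.90]
best_epoch, stop_epoch = early_stopping(toy_val_losses, patience=2)
print(best_epoch, stop_epoch)  # 2 4
```

Here training stops at epoch 4, and rolling back to the best weights means returning to the state at the end of epoch 2.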

* In neural networks there is a high risk of overfitting the training data. Therefore, it is important to apply some form of regularization. A very popular one is dropout, which consists of randomly dropping a given percentage of nodes in some layers (during **training only**). To implement dropout using Keras, you can use the `keras.layers.Dropout` layer. Fill in the `??`.

In [None]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[??,??]),
    keras.layers.Dense(units=??, activation=??),
    keras.layers.Dense(units=??, activation=??),
    keras.layers.Dropout(rate=??),
    keras.layers.Dense(units=??,  activation=??)
])

model.compile(loss = ??,
              optimizer = ??,
              metrics= [??])
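
To see what a dropout layer does to the outputs of the preceding layer, consider the following pure-Python sketch of "inverted" dropout. The function signature and the toy activations are assumptions made for this illustration. Surviving activations are scaled by 1 / (1 - rate) during training, and nothing is dropped at prediction time.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations."""
    if not training or rate == 0.0:
        return list(activations)  # nothing is dropped at prediction time
    keep_scale = 1.0 / (1.0 - rate)
    # Each node is dropped with probability `rate`; survivors are scaled up
    # so that the expected value of each activation is unchanged.
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)
acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(acts, rate=0.5, training=True, rng=rng)
test_out = dropout(acts, rate=0.5, training=False, rng=rng)
print(train_out)  # some entries zeroed, the others doubled
print(test_out)   # identical to acts
```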

In [None]:
history = model.fit(??, ??,
                   validation_data=(??, ??), 
                   epochs=??, callbacks=[??])

In [None]:
model.evaluate(??, ??);

#### Learning rate

* Finding a good learning rate is very important. If you set it much too high, training may diverge (as we discussed in “Gradient Descent”). If you set it too low, training will eventually converge to the optimum, but it will take a very long time. One of the easiest options is to use a constant learning rate. Fill in the `??` by setting the learning rate to 0.01.

In [None]:
optimizer = keras.optimizers.SGD(learning_rate=??)
model.compile(loss = ??,
              optimizer = optimizer,
              metrics= [??])
history = model.fit(??, ??,
                   validation_data=(??, ??), 
                   epochs=??, callbacks=[??])
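
The effect of the learning rate is easiest to see on a one-dimensional toy problem. The sketch below runs plain gradient descent on f(w) = w², whose gradient is 2w; the function and the three learning rates are illustrative choices, unrelated to the values that work well for the network above.

```python
def gradient_descent(learning_rate, w0=1.0, n_steps=50):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    w = w0
    for _ in range(n_steps):
        w -= learning_rate * 2 * w
    return w

w_good = gradient_descent(0.1)    # w shrinks by a factor 0.8 per step
w_slow = gradient_descent(0.001)  # w shrinks by a factor 0.998 per step
w_bad = gradient_descent(1.1)     # w is multiplied by -1.2 per step: diverges
print(abs(w_good), abs(w_slow), abs(w_bad))
```

After 50 steps the moderate rate is very close to the optimum at 0, the tiny rate has barely moved from the starting point, and the large rate has moved far away from it.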

## 3. Convolutional neural networks

##### This problem is not graded

In this problem we show how to create a convolutional neural network and train it on the Fashion-MNIST dataset.

* We first have to adjust the predictor arrays by adding a third dimension. In a CNN, each image is encoded with three dimensions (height, width, depth), where depth is the number of color channels. Grayscale images have a single channel, so depth = 1. Color images have red, green, and blue channels (RGB), so depth = 3.

In [None]:
X_train = X_train[..., np.newaxis]
X_valid = X_valid[..., np.newaxis]
X_test = X_test[..., np.newaxis]

* We begin by building the CNN architecture.

In [None]:
from functools import partial

DefaultConv2D = partial(
    keras.layers.Conv2D,
    filters = 64, # number of filters
    kernel_size = (3, 3), # height and width of each filter
    padding = "same", # when strides = 1, input and output layers have same dimensions
    activation = "relu")

model = keras.models.Sequential([
    DefaultConv2D(filters = 64,
                  kernel_size = (7, 7), # height and width of each filter
                  input_shape = [28, 28, 1]),
    keras.layers.MaxPooling2D(pool_size=2),
    DefaultConv2D(filters=128),
    DefaultConv2D(filters=128),
    keras.layers.MaxPooling2D(pool_size=2),
    DefaultConv2D(filters=256),
    DefaultConv2D(filters=256),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(units=64, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(units=10, activation='softmax'),
])
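
To follow how the image dimensions change through the architecture above, the sketch below traces (height, width, depth) layer by layer. It assumes stride 1 with `"same"` padding for the convolutions (as set in `DefaultConv2D`) and Keras's default behavior for `MaxPooling2D`, where the stride equals the pool size and no padding is applied. The helper `trace_shapes` is written for this illustration only.

```python
def trace_shapes(input_size, layers):
    """Follow (height, width, depth) through conv and pooling layers."""
    height, width, depth = input_size
    shapes = []
    for kind, value in layers:
        if kind == "conv_same":   # stride 1 + "same" padding keeps height and width
            depth = value         # depth becomes the number of filters
        elif kind == "max_pool":  # stride = pool size, no padding
            height, width = height // value, width // value
        shapes.append((height, width, depth))
    return shapes

layers = [("conv_same", 64), ("max_pool", 2),
          ("conv_same", 128), ("conv_same", 128), ("max_pool", 2),
          ("conv_same", 256), ("conv_same", 256), ("max_pool", 2)]
shapes = trace_shapes((28, 28, 1), layers)
final_h, final_w, final_d = shapes[-1]
print(shapes[-1])                   # (3, 3, 256)
print(final_h * final_w * final_d)  # 2304 inputs to the first dense layer
```

The spatial size goes 28 → 14 → 7 → 3 while the depth grows, so `Flatten` hands 2304 values per image to the dense layers.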

* We compile the model by defining the loss, optimizer and metrics.

In [None]:
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="nadam", # alternative to SGD
              metrics=["accuracy"])

* We train the model, saving the results in a variable named `history`.

In [None]:
history = model.fit(x = X_train,
                    y = y_train,
                    epochs = 10,
                    validation_data = (X_valid, y_valid))

* Let us evaluate the model performance and make some predictions.

In [None]:
model.evaluate(X_test, y_test);

In [None]:
indexes = np.arange(1, 11)
X_new = X_test[indexes]
y_pred = np.argmax(model.predict(X_new), axis = 1)
y_new = y_test[indexes]
print("Predicted classes:", y_pred)
print("True classes     :", y_new)

## 4. Additional topics

##### This problem is not graded

### Does `model.compile` initialize all the weights?
Notice that once you create a model with `keras.models.Sequential` for example, all the weights are initialized.
When you compile a model with `model.compile` you just set:
- a loss function,
- an optimizer,
- some metrics.

Callbacks are not set at compile time; they are passed to `fit()`.

If you have a model that you already trained and you wish, for example, to change the learning rate of the optimizer, you can just compile the model again. Nothing happens to the pre-existing weights.

### Fine-tuning neural networks
Neural networks have many hyperparameters, such as the number of hidden layers, nodes per hidden layer, learning rate, dropout rate, etc.
`Keras Tuner` is a Python package that helps you define a grid of hyperparameters and run a search across it.

In [None]:
def build_model(hp):
    '''
    Hyperparameters -> Keras compiled model
    Produces a Keras compiled model, taking one combination of parameters in hp
    '''
    
    model = keras.models.Sequential([
        keras.layers.Flatten(input_shape=[28,28]),
        keras.layers.Dense(units=300, activation="relu"),
        keras.layers.Dense(units=100, activation="relu"),
        keras.layers.Dropout(rate=hp.get("dropout_rate")),
        keras.layers.Dense(units=10, activation="softmax")
    ])

    # Pass the optimizer object so that the tuned learning rate is actually used
    optimizer = keras.optimizers.SGD(learning_rate=hp.get("learning_rate"))
    model.compile(loss = "sparse_categorical_crossentropy",
                  optimizer = optimizer,
                  metrics= ["accuracy"])
    
    return model
    

In [None]:
# In recent releases the package is imported as `keras_tuner` (formerly `kerastuner`)
from keras_tuner import HyperParameters, RandomSearch

In [None]:
hp = HyperParameters()
hp.Choice('learning_rate', [1e-1, 1e-3])
hp.Choice('dropout_rate', [0.0, 0.2]);

In [None]:
tuner = RandomSearch(hypermodel = build_model,
                     hyperparameters=hp,
                     max_trials=5, 
                     objective="val_accuracy", 
                     allow_new_entries=False)
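
To get a feeling for the size of this search space, the sketch below enumerates all combinations of the two hyperparameters using only the standard library. It is independent of Keras Tuner and only illustrates the grid that the random search samples from.

```python
from itertools import product

# The same two choices per hyperparameter as in the search space above.
search_space = {
    "learning_rate": [1e-1, 1e-3],
    "dropout_rate": [0.0, 0.2],
}

names = list(search_space)
combinations = [dict(zip(names, values))
                for values in product(*search_space.values())]
for combo in combinations:
    print(combo)
print(len(combinations))  # 4
```

With two values for each of two hyperparameters there are only four distinct combinations, fewer than the `max_trials=5` requested above.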

In [None]:
tuner.search(x=X_train,
             y=y_train,
             epochs=5,
             validation_data=(X_valid, y_valid))

In [None]:
# Show a summary of the search
tuner.results_summary()

# Retrieve the best model.
best_model = tuner.get_best_models(num_models=1)[0]

# Evaluate the best model.
loss, accuracy = best_model.evaluate(X_test, y_test)

In [None]:
# Continue training the best model found by the search
history = best_model.fit(X_train, y_train, 
                         validation_data=(X_valid, y_valid), 
                         epochs=30, callbacks=[early_stopping_cb])

In [None]:
# Evaluate model
best_model.evaluate(X_test, y_test);