<a href="https://colab.research.google.com/github/Salvoaf/labDeepLearning/blob/main/09_Hyperparameter_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Hyperparameter Optimization

*   Adapted from [this tutorial](https://www.tensorflow.org/tutorials/keras/keras_tuner?hl=uk)



## Overview

The Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your TensorFlow program. The process of selecting the right set of hyperparameters for your machine learning (ML) application is called *hyperparameter tuning* or *hypertuning*.

Hyperparameters are the variables that govern the training process and the topology of an ML model. These variables remain constant over the training process and directly impact the performance of your ML program. Hyperparameters are of two types:
1. **Model hyperparameters** which influence model selection such as the number and width of hidden layers
2. **Algorithm hyperparameters** which influence the speed and quality of the learning algorithm such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k Nearest Neighbors (KNN) classifier

In this tutorial, you will use the Keras Tuner to perform hypertuning for an image classification application.

In most of the examples we have seen so far, the behaviour of our classification model may be strongly influenced by a number of hyperparameters.

*Trivial* solutions for hyperparameter tuning:
*   experience with (and solid understanding of) the underlying algorithms;
*   exploitation of the guidance of literature;
*   exploitation of large-scale computational resources.

When we only have to tune a **small number of hyperparameters**, two viable approaches are **grid search** and **random search**.
Which one is better (i.e. more effective or efficient)?


It is quite common for hyperparameter optimization problems to have a *low effective dimensionality*, i.e. the objective is more sensitive to changes in some dimensions than in others. If we know which dimensions are the most influential, we can design an appropriate grid search. Otherwise, it is better to resort to random search.

Consider the following example. We have a budget of 9 trials to explore a two-dimensional hyperparameter space (e.g. learning rate and dropout rate of a layer).

![Grid vs random](https://miro.medium.com/proxy/1*ZTlQm_WRcrNqL-nLnx6GJA.png)

The **grid search** provides an even coverage of the original 2D space, but its projections onto the two dimensions produce a poor and inefficient coverage of the 1D subspaces: only three distinct values per axis.

Conversely, **random search** provides a less evenly distributed coverage of the original space, but obtains a far more detailed insight into both subspaces.

More formally: the function $f$ to optimize can be approximated as $g$ due to low effective dimensionality:

$$f(x, y) = g(x) + h(y) \approx g(x)$$

- **Grid search** tests $g$, shown as the green plot in the figure above, for only three different values of $x$.
- **Random search**, instead, tests a distinct value of $x$ in each of the nine trials.
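
The difference in 1D coverage can be checked with a few lines of standard-library Python. This is a minimal sketch, assuming a unit-square search space and a 3 x 3 grid; the helper names `grid_trials` and `random_trials` are hypothetical and not part of any library.

```python
import itertools
import random


def grid_trials(values_per_axis):
    """Evenly spaced grid over the unit square (values_per_axis ** 2 trials)."""
    axis = [(i + 0.5) / values_per_axis for i in range(values_per_axis)]
    return list(itertools.product(axis, axis))


def random_trials(n, seed=0):
    """n points drawn uniformly at random from the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


grid = grid_trials(3)    # 9 trials
rand = random_trials(9)  # 9 trials

# Project both sets of trials onto the x axis and count distinct values.
grid_x = {x for x, _ in grid}
rand_x = {x for x, _ in rand}
print("distinct x values, grid:  ", len(grid_x))
print("distinct x values, random:", len(rand_x))
```

With the same budget of nine trials, the grid probes only three values along each axis, whereas random sampling probes a different value in every trial.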

The increasing attention gained by ML/DL has fostered the development of other, more sophisticated **algorithmic approaches** for hyperparameter tuning, including Bayesian optimization and gradient-based approaches. In this notebook we just provide the key concepts and apply them in a practical example, since they are available in dedicated Python libraries.
A recent example is the [Keras Tuner](https://keras-team.github.io/keras-tuner/) project.

Keras tuner includes four classes of tuners:
- [BayesianOptimization](https://distill.pub/2020/bayesian-optimization/)
- [Hyperband](https://arxiv.org/pdf/1603.06560)
- RandomSearch
- Sklearn

You can also use two pre-defined [HyperModel](https://keras.io/api/keras_tuner/hypermodels/) classes
- [HyperXception](https://keras.io/api/keras_tuner/hypermodels/hyper_xception/)
- [HyperResNet](https://keras.io/api/keras_tuner/hypermodels/hyper_resnet/)

for computer vision applications.

# Hyperoptimization in practice

## Setup

In [None]:
import os
import tensorflow as tf
from tensorflow import keras
import datetime

Install and import the Keras Tuner.

In [None]:
!pip install -q -U keras-tuner


In [None]:
import keras_tuner as kt

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [None]:
# Create permanent folder in Google Drive
my_dir = "/content/drive/My Drive/[AIDE] 2022-2023 - Data/Tensorboard"
os.makedirs(my_dir, exist_ok=True)
log_dir = my_dir + "/logs/" + datetime.datetime.now().strftime("%m%d-%H%M")

## Download and prepare the dataset

In this tutorial, you will use the Keras Tuner to find the best hyperparameters for a machine learning model that classifies images of clothing from the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist).

Load the data.

In [None]:
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [None]:
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

## Define the model

When you build a model for hypertuning, you also define the hyperparameter search space in addition to the model architecture. The model you set up for hypertuning is called a *hypermodel*.

You can define a hypermodel through two approaches:

* By using a model builder function
* By subclassing the `HyperModel` class of the Keras Tuner API

In this tutorial, you use a model builder function to define the image classification model. The model builder function returns a compiled model and uses hyperparameters you define inline to hypertune the model.

In [None]:
def model_builder(hp):
  model = keras.Sequential()
  model.add(keras.layers.Flatten(input_shape=(28, 28)))

  # Tune the number of units in the first Dense layer
  # Choose an optimal value between 32-512
  hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
  model.add(keras.layers.Dense(units=hp_units, activation='relu'))
  model.add(keras.layers.Dense(10))

  # Tune the learning rate for the optimizer
  # Choose an optimal value from 0.01, 0.001, or 0.0001
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model
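
To get a feeling for the size of the search space defined by the builder above, the candidate values can be enumerated directly. This is a small sketch that only mirrors the ranges used in `model_builder` (assuming that `max_value` is inclusive, as the comment "between 32-512" suggests); it does not call the Keras Tuner.

```python
import itertools

# Candidate values mirroring hp.Int('units', 32, 512, step=32)
units_options = list(range(32, 512 + 1, 32))
# Candidate values mirroring hp.Choice('learning_rate', ...)
learning_rates = [1e-2, 1e-3, 1e-4]

# Every (units, learning_rate) pair an exhaustive grid would have to train
combinations = list(itertools.product(units_options, learning_rates))

print(len(units_options), "choices for units")
print(len(learning_rates), "choices for learning_rate")
print(len(combinations), "configurations in total")
```

Even this small hypermodel yields 48 configurations, which is why the tuner samples and prunes them instead of training each one to completion.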

## Instantiate the tuner and perform hypertuning

Instantiate the tuner to perform the hypertuning. The Keras Tuner has four tuners available - `RandomSearch`, `Hyperband`, `BayesianOptimization`, and `Sklearn`. In this tutorial, you use the `Hyperband` tuner.

To instantiate the Hyperband tuner, you must specify the hypermodel, the `objective` to optimize and the maximum number of epochs to train (`max_epochs`).

In [None]:
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     overwrite=True,
                     max_epochs=10,
                     factor=3,
                     directory=log_dir,
                     project_name='09_kerasTuner')

The Hyperband tuning algorithm uses adaptive resource allocation and early stopping to quickly converge on a high-performing model. This is done using a sports-championship-style bracket: the algorithm trains a large number of models for a few epochs and carries forward only the top-performing fraction (roughly 1/`factor`) of the models to the next round. Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>`factor`</sub>(`max_epochs`) and rounding it up to the nearest integer.
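
The arithmetic and the bracket idea can be sketched in plain Python. This is an illustration under simplifying assumptions, not the Keras Tuner implementation: `brackets_from_formula` evaluates the expression quoted above, and `run_bracket` is a hypothetical toy elimination that ranks fixed, made-up scores, whereas the real algorithm retrains the survivors for more epochs in every round. The share of models kept per round is exposed as the `reduction` parameter.

```python
import math


def brackets_from_formula(max_epochs, factor):
    """Evaluate 1 + log_factor(max_epochs), rounded up to an integer."""
    return math.ceil(1 + math.log(max_epochs, factor))


def run_bracket(scores, reduction):
    """Toy elimination: each round keeps the best 1/reduction of the candidates."""
    survivors = sorted(scores, key=scores.get, reverse=True)
    rounds = []
    while len(survivors) > 1:
        keep = max(1, len(survivors) // reduction)
        survivors = survivors[:keep]
        rounds.append(list(survivors))
    return rounds


# Values used by the tuner in this notebook: max_epochs=10, factor=3
print(brackets_from_formula(max_epochs=10, factor=3))  # 4

# Made-up validation scores for eight hypothetical configurations
toy_scores = {"a": 0.81, "b": 0.88, "c": 0.79, "d": 0.90,
              "e": 0.85, "f": 0.70, "g": 0.87, "h": 0.83}
for i, survivors in enumerate(run_bracket(toy_scores, reduction=2), start=1):
    print(f"after round {i}: {survivors}")
```

With `reduction=2` each round keeps the top half of the candidates; a larger value discards candidates more aggressively.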

Create a callback that stops training early once the validation loss has stopped improving. With `patience=5`, training halts after five consecutive epochs without improvement.

In [None]:
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
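
The effect of `patience` can be illustrated without Keras. The function below is a simplified sketch of the patience rule (stop after `patience` consecutive epochs without a new best value); the name `stopping_epoch` and the loss values are hypothetical, and the real callback offers further options that are ignored here.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, one value per epoch
losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.40, 0.44, 0.45]
print(stopping_epoch(losses, patience=5))  # 8
print(stopping_epoch(losses, patience=2))  # 5
```

In this sketch a value that merely equals the best loss so far does not count as an improvement, which is why epoch 6 does not reset the counter.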

Run the hyperparameter search. The arguments for the search method are the same as those used for `tf.keras.Model.fit`, in addition to the callback above.

In [None]:
tensorboard = keras.callbacks.TensorBoard(log_dir)
tuner.search(img_train, label_train, epochs=50, validation_split=0.2, callbacks=[stop_early, tensorboard])

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"The hyperparameter search is complete. \
        The optimal number of units in the first densely-connected layer is {best_hps.get('units')} and \
        the optimal learning rate for the optimizer is {best_hps.get('learning_rate')}.")

Trial 30 Complete [00h 00m 44s]
val_accuracy: 0.89041668176651

Best val_accuracy So Far: 0.89041668176651
Total elapsed time: 00h 11m 44s
The hyperparameter search is complete.         The optimal number of units in the first densely-connected layer is 128 and         the optimal learning rate for the optimizer is 0.001.
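
Conceptually, retrieving the best hyperparameters amounts to ranking the completed trials by the objective. The sketch below shows this idea on hypothetical trial records; the numbers are illustrative and are not results from this notebook, and `best_trials` is not a Keras Tuner function.

```python
# Hypothetical trial records; the values are illustrative only
trials = [
    {"units": 64, "learning_rate": 1e-2, "val_accuracy": 0.84},
    {"units": 256, "learning_rate": 1e-3, "val_accuracy": 0.88},
    {"units": 512, "learning_rate": 1e-4, "val_accuracy": 0.86},
]


def best_trials(records, objective, num_trials=1):
    """Return the top `num_trials` records, highest objective first."""
    ranked = sorted(records, key=lambda r: r[objective], reverse=True)
    return ranked[:num_trials]


best = best_trials(trials, "val_accuracy", num_trials=1)[0]
print(best["units"], best["learning_rate"])
```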


The following two commands will show you the TensorBoard inside Colab.

[TensorBoard](https://www.tensorflow.org/tensorboard/) is the TensorFlow visualization toolkit and provides the visualization and tooling needed for machine learning experimentation:
*   Tracking and visualizing metrics such as loss and accuracy
*   Visualizing the model graph (ops and layers)
*   Viewing histograms of weights, biases, or other tensors as they change over time
*   *Projecting embeddings to a lower dimensional space*
*   Displaying images, text, and audio data
*   Profiling TensorFlow programs


In [None]:
import tensorboard
%load_ext tensorboard
%tensorboard --logdir "$log_dir"

## Train the model

Find the optimal number of epochs to train the model with the hyperparameters obtained from the search.

In [None]:
# Build the model with the optimal hyperparameters and train it on the data for 50 epochs
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Best epoch: 27
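
The `+ 1` in the cell above converts the zero-based list index into a one-based epoch number. A minimal sketch with made-up accuracy values shows the behaviour, including the fact that `list.index` returns the earliest epoch when the maximum occurs more than once.

```python
# Made-up validation accuracies, one value per epoch
val_acc = [0.80, 0.85, 0.88, 0.88, 0.86]

# index() is zero-based, epochs are counted from 1
best_epoch_demo = val_acc.index(max(val_acc)) + 1
print("Best epoch: %d" % best_epoch_demo)  # Best epoch: 3
```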


Re-instantiate the hypermodel and train it with the optimal number of epochs from above.

In [None]:
hypermodel = tuner.hypermodel.build(best_hps)

# Retrain the model
hypermodel.fit(img_train, label_train, epochs=best_epoch, validation_split=0.2)

Epoch 1/27
Epoch 2/27
Epoch 3/27
Epoch 4/27
Epoch 5/27
Epoch 6/27
Epoch 7/27
Epoch 8/27
Epoch 9/27
Epoch 10/27
Epoch 11/27
Epoch 12/27
Epoch 13/27
Epoch 14/27
Epoch 15/27
Epoch 16/27
Epoch 17/27
Epoch 18/27
Epoch 19/27
Epoch 20/27
Epoch 21/27
Epoch 22/27
Epoch 23/27
Epoch 24/27
Epoch 25/27
Epoch 26/27
Epoch 27/27


<keras.callbacks.History at 0x7f4b5a1358d0>

To finish this tutorial, evaluate the hypermodel on the test data.

In [None]:
eval_result = hypermodel.evaluate(img_test, label_test)
print("[test loss, test accuracy]:", eval_result)

[test loss, test accuracy]: [0.4060742259025574, 0.8862000107765198]


The `log_dir` directory contains detailed logs and checkpoints for every trial (model configuration) run during the hyperparameter search. By default, if you re-run the hyperparameter search, the Keras Tuner uses the existing state from these logs to resume the search. The tuner in this notebook is instantiated with `overwrite=True`, which disables this behavior and starts each search from scratch.

## Summary

In this tutorial, you learned how to use the Keras Tuner to tune hyperparameters for a model. To learn more about the Keras Tuner, check out these additional resources:

* [Keras Tuner on the TensorFlow blog](https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html)
* [Keras Tuner website](https://keras-team.github.io/keras-tuner/)

Also check out the [HParams Dashboard](https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams) in TensorBoard to interactively tune your model hyperparameters.