# Hyperparameter Tuning With KerasTuner

```{article-info}
:avatar: https://avatars.githubusercontent.com/u/25820201?v=4
:avatar-link: https://github.com/PhotonicGluon/
:author: "[Ryan Kan](https://github.com/PhotonicGluon/)"
:date: "Jul 5, 2024"
:read-time: "{sub-ref}`wordcount-minutes` min read"
```

In this example, we will explore the use of KerasTuner to tune models that use layers from Keras-MML.

:::{important}
You will need to install the KerasTuner package for this example.
:::

In [1]:
%pip install keras-tuner~=1.4.7

Note: you may need to restart the kernel to use updated packages.


:::{note}
We will use the `jax` backend for faster execution of the code. Feel free to ignore the cell below.
:::

In [2]:
import os
os.environ["KERAS_BACKEND"] = "jax"

We will perform hyperparameter tuning on a simple [multi-layer perceptron (MLP)](https://en.wikipedia.org/wiki/Multilayer_perceptron) that aims to classify handwritten digits in the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database).

Of course, other neural network architectures such as [convolutional neural networks (CNNs)](https://en.wikipedia.org/wiki/Convolutional_neural_network) are better suited for this task, but for this example we will stick with MLPs.

## Setup

First, let's define some constants relating to the data.

In [3]:
NUM_CLASSES = 10        # 10 distinct classes, 0 to 9
INPUT_SHAPE = (28, 28)  # 28 x 28 greyscale images

Load the data from the MNIST dataset, which is already available in Keras.

In [4]:
import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

Now we perform some simple preprocessing.

In [5]:
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)
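
To make the two preprocessing steps concrete, here is a small pure-Python sketch of what they do to a single pixel and a single label. The helper names `scale_pixel` and `one_hot` are hypothetical and exist only for illustration; the notebook itself relies on NumPy division and `keras.utils.to_categorical`.

```python
NUM_CLASSES = 10  # same constant as in the notebook


def scale_pixel(value: int) -> float:
    """Map an 8-bit pixel intensity (0 to 255) into the range [0, 1]."""
    return value / 255


def one_hot(label: int, num_classes: int = NUM_CLASSES) -> list[float]:
    """Return a vector with 1.0 at index `label` and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


print(scale_pixel(0), scale_pixel(255))  # 0.0 1.0
print(one_hot(3))  # the digit 3 becomes a length-10 vector with a single 1.0
```

The one-hot form is what the `categorical_crossentropy` loss used later expects as its target.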

We will further split `x_train` and `y_train` into a training set and a validation set, holding out the last 10,000 examples for validation.

In [6]:
x_val = x_train[-10000:]
x_train = x_train[:-10000]

y_val = y_train[-10000:]
y_train = y_train[:-10000]

## Defining (Our Initial) Tunable Model

To allow KerasTuner to search for the best set of hyperparameters, we need to write a function that takes in the hyperparameters and returns a *compiled* Keras model. The convention for such a function is to accept an argument `hp` for the hyperparameters when building the model.

### Defining the Search Space

In the following example, we will define a simple MLP with two `DenseMML` layers and a `Dense` layer (which acts as the classification head). Suppose we want to tune the number of units in the first `DenseMML` layer. To do so, we define an integer hyperparameter with `hp.Int("units", min_value=32, max_value=512, step=32)`. This means that the hyperparameter
- is named `units`;
- can have a minimum value of 32;
- can have a maximum value of 512; and
- can take values in steps of 32 (that is, 32, 64, 96, and so on up to 512).
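
As a quick illustration of the range described above, the following sketch lists the candidate values for `units`. It assumes linear sampling with an inclusive `max_value`; the helper `int_candidates` is hypothetical and is not part of KerasTuner.

```python
def int_candidates(min_value: int, max_value: int, step: int) -> list[int]:
    """List the values from min_value to max_value (inclusive) in increments of step."""
    return list(range(min_value, max_value + 1, step))


units_candidates = int_candidates(min_value=32, max_value=512, step=32)

print(len(units_candidates))  # 16
print(units_candidates[:3], units_candidates[-1])  # [32, 64, 96] 512
```

So the tuner has 16 possible widths to choose from for the first `DenseMML` layer.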

In [7]:
import keras_tuner
import keras_mml


def build_model(hp: keras_tuner.HyperParameters):
    model = keras.Sequential(
        [
            keras.Input(shape=INPUT_SHAPE),
            keras.layers.Flatten(),
            keras_mml.layers.DenseMML(hp.Int("units", min_value=32, max_value=512, step=32)),
            keras_mml.layers.DenseMML(256),
            keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # The last layer needs to be `Dense` for the output to work
        ]
    )
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

We can quickly check that the model indeed builds successfully.

In [8]:
build_model(keras_tuner.HyperParameters())

An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.


<Sequential name=sequential, built=True>

### Starting the Search

After defining the search space, we need to select a tuner class to run the search. Here we use `RandomSearch` as an example.

We need to specify several arguments to initialize the `RandomSearch` tuner.
- **`hypermodel`**: The model-building function, which is `build_model` in this example.
- **`objective`**: The name of the objective to optimize.
  - Note that the decision whether to minimize or maximize the `objective` is automatically inferred for built-in metrics (e.g., `loss`, `acc`).
- **`max_trials`**: The total number of trials to run during the search.
- **`executions_per_trial`**: The number of models that should be built and fit for each trial.
- **`overwrite`**: Controls whether to overwrite the previous results in the same directory (`True`) or resume the previous search instead (`False`).
- **`directory`**: A path to a directory for storing the search results.
- **`project_name`**: The name of the subdirectory in the `directory`.


:::{admonition} What is a "trial"?
:class: note

To search for the best hyperparameter values, the tuner runs multiple trials, each using a different set of hyperparameter values. Executions within the same trial share the same hyperparameter values. Running multiple executions per trial reduces the effect of training variance (for example, from random weight initialization) on the trial's score. If you want results faster, you could set `executions_per_trial = 1`.
:::
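
The cost of a search follows directly from these two settings. The sketch below works through the arithmetic for the values used in this example (`max_trials=3`, `executions_per_trial=2`, and `epochs=2` in the search call). The per-execution scores are made-up placeholders, and combining them with a mean is an assumption about how a trial's score is aggregated.

```python
from statistics import mean

max_trials = 3
executions_per_trial = 2
epochs = 2

# Every execution trains a fresh model from scratch
total_fits = max_trials * executions_per_trial
total_epochs = total_fits * epochs
print(total_fits, total_epochs)  # 6 12

# Hypothetical val_accuracy values from the two executions of one trial
execution_scores = [0.91, 0.93]

# Assumption: the trial's score is the mean over its executions
trial_score = mean(execution_scores)
print(round(trial_score, 2))
```

Doubling `executions_per_trial` therefore doubles the training time of the search without exploring any additional hyperparameter values.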

In [9]:
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective="val_accuracy",
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="misc/hyperparameter_tuning_example",
    project_name="my_tunable_model_1",
)

Once we have defined the tuner, we can print a summary of the search space.

In [10]:
tuner.search_space_summary()

Search space summary
Default search space size: 1
units (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}


We can now start the search for the best hyperparameter configuration. All the arguments passed to `search()` are passed on to `model.fit()` in each execution.

:::{important}
Remember to pass `validation_data` to evaluate the model!
:::

In [11]:
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 3 Complete [00h 00m 28s]
val_accuracy: 0.9167500138282776

Best val_accuracy So Far: 0.9240500032901764
Total elapsed time: 00h 01m 07s


### Querying the Results

We can now retrieve the best models from the search.

In [12]:
models = tuner.get_best_models(num_models=2)  # Gets the top 2 models
best_model = models[0]
best_model.summary()



We can also get a summary of the search results.

In [13]:
tuner.results_summary()

Results summary
Results in misc/hyperparameter_tuning_example/my_tunable_model_1
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 1 summary
Hyperparameters:
units: 160
Score: 0.9240500032901764

Trial 0 summary
Hyperparameters:
units: 96
Score: 0.9227499961853027

Trial 2 summary
Hyperparameters:
units: 352
Score: 0.9167500138282776
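
Because the objective direction is `max`, the summary lists trials from the highest score to the lowest. The sketch below reproduces that ordering in plain Python using the values printed above; the dictionary layout is only an illustrative choice.

```python
# Hyperparameters and scores copied from the results summary above
trials = {
    "Trial 0": {"units": 96, "score": 0.9227499961853027},
    "Trial 1": {"units": 160, "score": 0.9240500032901764},
    "Trial 2": {"units": 352, "score": 0.9167500138282776},
}

# direction="max" means that a higher val_accuracy ranks first
ranked = sorted(trials.items(), key=lambda item: item[1]["score"], reverse=True)

for name, info in ranked:
    print(f"{name}: units={info['units']}, score={info['score']:.4f}")

best_name, best_info = ranked[0]
print(best_name, best_info["units"])  # Trial 1 160
```

Note that the widest layer (352 units) scored lowest here, although with only three trials and two epochs each the differences are small.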


### Retraining the Model

If you want to train the model on the entire dataset (training and validation data combined), you can retrieve the best hyperparameters and retrain the model yourself.

In [14]:
# Get the top 2 sets of hyperparameters
best_hps = tuner.get_best_hyperparameters(2)

# Build the model with the best hyperparameters
model = build_model(best_hps[0])

Combine training and validation into one big training dataset.

In [15]:
import numpy as np

x_all = np.concatenate((x_train, x_val))
y_all = np.concatenate((y_train, y_val))

Now fit the model on that set.

In [16]:
model.fit(x=x_all, y=y_all, epochs=2)

Epoch 1/2
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.8602 - loss: 0.4470
Epoch 2/2
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9142 - loss: 0.2894


<keras.src.callbacks.history.History at 0x7f53a811b4f0>
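
The `1875/1875` in the log above is the number of batches per epoch. As a sanity check, the sketch below derives it from the dataset sizes, assuming the Keras default batch size of 32 (no `batch_size` was passed to `model.fit()`).

```python
import math

num_train = 50_000  # examples left in the training set after the split
num_val = 10_000    # examples held out for validation
batch_size = 32     # Keras default when `batch_size` is not specified

# Recombining both sets restores the full MNIST training split
num_all = num_train + num_val
steps_per_epoch = math.ceil(num_all / batch_size)

print(num_all)          # 60000
print(steps_per_epoch)  # 1875
```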

## A More Complicated Tunable Model

Now that we've seen an introduction to how KerasTuner works, let's build a more complex model.

In our new model, we make the tuner
- determine the number of hidden layers to use via the `num_layers` hyperparameter;
- determine the number of units for *each* hidden layer via each individual `units_{i}` hyperparameter;
- determine the *common* activation for all hidden layers via the `activation` hyperparameter; and
- decide whether to include 25% dropout via the `dropout` hyperparameter.

In [17]:
def build_model_new(hp: keras_tuner.HyperParameters):
    model = keras.Sequential()
    
    # These layers are the same as the previous model
    model.add(keras.Input(shape=INPUT_SHAPE))
    model.add(keras.layers.Flatten())
    
    # Tune the number of layers
    for i in range(hp.Int("num_layers", 1, 3)):  # 1 to 3 hidden layers
        model.add(
            keras_mml.layers.DenseMML(
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation=hp.Choice("activation", ["relu", "tanh", "linear"])
            )
        )
    
    # Add dropout, if specified by the hyperparameters
    if hp.Boolean("dropout"):
        model.add(keras.layers.Dropout(rate=0.25))
    
    # Classification head
    model.add(keras.layers.Dense(NUM_CLASSES, activation="softmax"))  # The last layer needs to be `Dense` for the output to work
    
    # Compile and return the model
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

Again, we will use `RandomSearch` to find the best hyperparameters. However, we will increase the number of trials to 5.

In [18]:
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model_new,
    objective="val_accuracy",
    max_trials=5,
    executions_per_trial=2,
    overwrite=True,
    directory="misc/hyperparameter_tuning_example",
    project_name="my_tunable_model_2",
)

Let's look at the search space now.

In [19]:
tuner.search_space_summary()

Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 3, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh', 'linear'], 'ordered': False}
dropout (Boolean)
{'default': False, 'conditions': []}
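
The summary lists only `units_0`, most likely because the search space is discovered by building the model once with default values, where `num_layers` is 1; `units_1` and `units_2` would only be registered once a build uses more layers. To get a feel for how large the full space is, the sketch below counts the distinct model configurations that `build_model_new` can produce. This is a back-of-the-envelope calculation, not something KerasTuner reports.

```python
units_choices = len(range(32, 512 + 1, 32))  # 16 possible widths per hidden layer
activation_choices = 3                       # "relu", "tanh", "linear"
dropout_choices = 2                          # with or without dropout

# A model with n hidden layers has n independent width choices
total_configs = sum(
    units_choices**n * activation_choices * dropout_choices
    for n in (1, 2, 3)  # possible values of `num_layers`
)

print(total_configs)  # 26208
```

With `max_trials=5`, the random search samples only a tiny fraction of these configurations, so the best result found should be read as a good candidate rather than the optimum.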


Start the search.

In [20]:
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 5 Complete [00h 00m 33s]
val_accuracy: 0.9699999988079071

Best val_accuracy So Far: 0.9699999988079071
Total elapsed time: 00h 02m 13s


Get the best model...

In [21]:
models = tuner.get_best_models(num_models=1)  # Even when `num_models` is 1, `models` returns a list...
best_model = models[0]                        # ...so we still have to do this
best_model.summary()



...and a summary of the results.

In [22]:
tuner.results_summary()

Results summary
Results in misc/hyperparameter_tuning_example/my_tunable_model_2
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 4 summary
Hyperparameters:
num_layers: 2
units_0: 448
activation: relu
dropout: True
units_1: 128
Score: 0.9699999988079071

Trial 3 summary
Hyperparameters:
num_layers: 2
units_0: 384
activation: relu
dropout: True
units_1: 352
Score: 0.9695000052452087

Trial 2 summary
Hyperparameters:
num_layers: 2
units_0: 224
activation: tanh
dropout: False
units_1: 448
Score: 0.9628500044345856

Trial 0 summary
Hyperparameters:
num_layers: 2
units_0: 256
activation: relu
dropout: False
units_1: 32
Score: 0.9627000093460083

Trial 1 summary
Hyperparameters:
num_layers: 1
units_0: 64
activation: tanh
dropout: True
units_1: 224
Score: 0.9449999928474426
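
Note that Trial 1 reports `units_1: 224` even though `num_layers` is 1. Given how `build_model_new` is written, only the first `num_layers` width hyperparameters are used to create layers, so that value has no effect on the model. The sketch below makes this explicit; the helper `active_units` is hypothetical, and the dictionaries are copied from the summary above.

```python
def active_units(hparams: dict) -> list[int]:
    """Return the widths of the hidden layers that are actually built."""
    return [hparams[f"units_{i}"] for i in range(hparams["num_layers"])]


trial_1 = {"num_layers": 1, "units_0": 64, "activation": "tanh", "dropout": True, "units_1": 224}
trial_4 = {"num_layers": 2, "units_0": 448, "activation": "relu", "dropout": True, "units_1": 128}

print(active_units(trial_1))  # [64]
print(active_units(trial_4))  # [448, 128]
```

In this run, the four best trials all used two hidden layers, while the single-layer model in Trial 1 scored lowest.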


## Conclusion

In this code example, we showed how KerasTuner can be used with Keras-MML layers for hyperparameter tuning.