# KerasTuner demo

`KerasTuner` is a general-purpose hyperparameter tuning library. It is integrated with Keras workflows but is not limited to them; it can also be used to tune `scikit-learn` models.

## Define the search space

In the following code example, we define a Keras model with two `Dense` layers. We want to tune the number of units in the first `Dense` layer, so we define an integer hyperparameter with `hp.Int('units', min_value=32, max_value=512, step=32)`, whose range is from `32` to `512` inclusive. When sampling from it, the minimum step for walking through the interval is `32`.

In [1]:
import keras
from keras import layers

def build_model_with_two_dense_layers(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            # define the hyperparameter
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            activation="relu",
        )
    )
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
    



Let us test whether the model builds successfully:

In [2]:
import keras_tuner

build_model_with_two_dense_layers(keras_tuner.HyperParameters())

<Sequential name=sequential, built=False>
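To make the search range concrete, the following sketch lists the values contained in an integer range from `32` to `512` with a step of `32`. It uses plain Python only and assumes the grid starts at `min_value`; it does not call KerasTuner.

```python
# Values covered by min_value=32, max_value=512, step=32 (both ends inclusive).
min_value, max_value, step = 32, 512, 32
candidate_units = list(range(min_value, max_value + 1, step))

print(len(candidate_units))  # 16
print(candidate_units[0], candidate_units[-1])  # 32 512
```

So tuning `units` alone means choosing among 16 layer widths.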

There are many other types of hyperparameters as well, and we can define several of them in the same function. In the following code, we tune whether to use a `Dropout` layer with `hp.Boolean()`, tune which activation function to use with `hp.Choice()`, and tune the learning rate of the optimizer with `hp.Float()`.

In [3]:
def build_model_with_two_dense_layers_and_dropout(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            # Tune number of units
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            # Tune the activation function to use
            activation=hp.Choice("activation", ["relu", "tanh"]),
        )
    )
    # Tune whether to use dropout.
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    # Define the optimizer learning rate as a hyperparameter
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

build_model_with_two_dense_layers_and_dropout(keras_tuner.HyperParameters())

<Sequential name=sequential_1, built=False>
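The learning rate above uses `sampling="log"`. As a rough illustration of what log-scale sampling means, the sketch below draws values whose logarithm is uniformly distributed over the same interval. The helper `sample_log_uniform` is hypothetical and only illustrates the idea; it is not KerasTuner's internal implementation.

```python
import math
import random


def sample_log_uniform(min_value, max_value, rng):
    """Draw a value whose logarithm is uniformly distributed."""
    log_low, log_high = math.log(min_value), math.log(max_value)
    return math.exp(rng.uniform(log_low, log_high))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]

# 1e-3 is the geometric midpoint of [1e-4, 1e-2], so roughly half of the
# samples fall below it; with linear sampling only about 9% would.
fraction_below = sum(s < 1e-3 for s in samples) / len(samples)
print(f"{fraction_below:.2f}")
```

Log-scale sampling is a common choice for learning rates because it gives each order of magnitude a similar share of the trials.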

As shown below, the hyperparameters are actual values. In fact, they are just functions returning actual values. For example, `hp.Int()` returns an `int` value. Therefore, you can put them into variables, `for` loops, or `if` conditions. 

In [4]:
hp = keras_tuner.HyperParameters()
print(hp.Int("units", min_value=32, max_value=512, step=32))

32


The hyperparameters can also be defined in advance, keeping the Keras model code in a separate function, as shown below:

In [5]:
def call_existing_code(units, activation, dropout, lr):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(layers.Dense(units=units, activation=activation))
    if dropout:
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

def build_model(hp):
    units = hp.Int("units", min_value=32, max_value=512, step=32)
    activation = hp.Choice("activation", ["relu", "tanh"])
    dropout = hp.Boolean("dropout")
    lr = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    # call existing model-building code with the hyperparameter values
    model = call_existing_code(
        units=units, activation=activation, dropout=dropout, lr=lr
    )
    return model

build_model(keras_tuner.HyperParameters())

<Sequential name=sequential_2, built=False>

Each hyperparameter is uniquely identified by its name, which is the first argument of the corresponding `hp` method.
To tune the number of units in different `Dense` layers separately, we give each layer's hyperparameter a different name, such as `f"units_{i}"`.

Notably, this is also an example of creating conditional hyperparameters. There are many hyperparameters specifying the number of units in the `Dense` layers. The number of such hyperparameters is decided by the number of layers, which is itself a hyperparameter. Therefore, the total number of hyperparameters used may differ from trial to trial, and some hyperparameters are only used when a certain condition is satisfied. For example, with `num_layers` ranging from `1` to `3`, `units_2` is only used when `num_layers` is `3`. With `KerasTuner`, one can easily define such hyperparameters dynamically while creating the model.

In [6]:
def build_model_conditional_hpars(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    # tune the number of layers
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(
            layers.Dense(
                # tune number of units separately
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation=hp.Choice("activation", ["relu", "tanh"]),
            )
        )
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
            
build_model_conditional_hpars(keras_tuner.HyperParameters())

<Sequential name=sequential_3, built=False>
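To see how the set of hyperparameters depends on `num_layers`, the sketch below lists the names that one call of the model-building function would request. `registered_names` is a hypothetical helper that mirrors the order of the `hp.*` calls above in plain Python; it does not use KerasTuner.

```python
def registered_names(num_layers):
    """Names requested by one call of the model-building function."""
    names = ["num_layers"]
    for i in range(num_layers):
        names.append(f"units_{i}")
        if "activation" not in names:
            # The same name is reused by every layer, so it appears once.
            names.append("activation")
    names += ["dropout", "lr"]
    return names


for n in (1, 2, 3):
    print(n, registered_names(n))
```

With a single layer, five names are requested. `units_1` appears only from two layers upward, and `units_2` only with three.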

## Start the search

After defining the search space, we need to select a tuner class to run the search. The available algorithms are `RandomSearch`, `BayesianOptimization`, and `Hyperband`. Here we use `RandomSearch` as an example.

To initialize the tuner we need to specify several arguments in the initializer.

* `hypermodel`: the model-building function, which is `build_model_conditional_hpars` in our case
* `objective`: the name of the objective to optimize. Note that whether to minimize or maximize is automatically inferred for built-in metrics.
* `max_trials`: the total number of trials to run during the search
* `executions_per_trial`: the number of models that should be built and fit for each trial. Different trials have different hyperparameter values, while the executions within the same trial share the same hyperparameter values. Running multiple executions per trial reduces the variance of the results and therefore gives a more accurate assessment of a model's performance. To get results faster, we could set `executions_per_trial=1` (a single round of training for each model configuration).
* `overwrite`: control whether to overwrite the previous results in the same directory or resume the previous search instead. Here we set `overwrite=True` to start a new search and ignore any previous results.
* `directory`: a path to a directory for storing the search results.
* `project_name`: the name of the sub-directory in the `directory`.

In [7]:
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model_conditional_hpars,
    objective="val_accuracy",
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="my_dir",
    project_name="helloworld",
)
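The arithmetic behind `max_trials` and `executions_per_trial` can be sketched as follows. The two execution scores are made-up numbers, and the sketch assumes that a trial's score is the average over its executions, which is consistent with the stated goal of reducing variance.

```python
from statistics import mean

max_trials = 3
executions_per_trial = 2

# Each trial trains `executions_per_trial` models with identical hyperparameters.
total_training_runs = max_trials * executions_per_trial
print(total_training_runs)  # 6

# Made-up validation accuracies for the two executions of one trial.
execution_scores = [0.958, 0.964]
trial_score = mean(execution_scores)
print(round(trial_score, 3))  # 0.961
```

Doubling `executions_per_trial` therefore doubles the training cost of the search without exploring any additional configurations.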

Print a summary of the search space:

In [8]:
tuner.search_space_summary()

Search space summary
Default search space size: 5
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 3, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh'], 'ordered': False}
dropout (Boolean)
{'default': False, 'conditions': []}
lr (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'log'}


## Preparation of the MNIST dataset for the search

In [9]:
import keras
import numpy as np

(x, y), (x_test, y_test) = keras.datasets.mnist.load_data()


x_train = x[:-10000]
x_val = x[-10000:]
y_train = y[:-10000]
y_val = y[-10000:]

x_train = np.expand_dims(x_train, -1).astype("float32") / 255.0
x_val = np.expand_dims(x_val, -1).astype("float32") / 255.0
x_test = np.expand_dims(x_test, -1).astype("float32") / 255.0

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Then, start the search for the best hyperparameter configuration. All the arguments passed to `search` are passed to `model.fit()` in each execution. Pass `validation_data` to evaluate the model.

In [10]:
print(x_val.shape)
print(y_val.shape)
print(x.shape)
print(y.shape)

(10000, 28, 28, 1)
(10000, 10)
(60000, 28, 28)
(60000,)
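The label arrays gain a second dimension of size `10` because they are one-hot encoded, which is the format that the `categorical_crossentropy` loss expects. A minimal NumPy sketch of the same transformation, using three example labels rather than the real dataset:

```python
import numpy as np

labels = np.array([5, 0, 4])  # three example digit labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])
```

The notebook itself uses `keras.utils.to_categorical` for this step; the identity-matrix indexing here is just an equivalent way to show the result.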


In [11]:
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 3 Complete [00h 00m 35s]
val_accuracy: 0.9536499977111816

Best val_accuracy So Far: 0.9613499939441681
Total elapsed time: 00h 01m 07s


During the `search`, the model-building function is called with different hyperparameter values in different trials. In each trial, the tuner generates a new set of hyperparameter values to build the model. The model is then fit and evaluated, and the metrics are recorded. The tuner progressively explores the space and finally finds a good set of hyperparameter values.
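The loop described above can be sketched in plain Python. Everything here is hypothetical: `sample_hyperparameters` draws from a reduced search space, and `toy_objective` stands in for building, fitting, and evaluating a model by returning a made-up score. It illustrates the shape of a random search, not how KerasTuner implements it.

```python
import random


def sample_hyperparameters(rng):
    """Draw one configuration from a small, hypothetical search space."""
    return {
        "units": rng.randrange(32, 512 + 1, 32),
        "activation": rng.choice(["relu", "tanh"]),
        "dropout": rng.choice([True, False]),
    }


def toy_objective(hps):
    """Stand-in for 'build, fit, and evaluate'; returns a made-up score."""
    score = 0.90 + 0.05 * hps["units"] / 512
    return score - (0.01 if hps["dropout"] else 0.0)


rng = random.Random(0)
trials = []
for trial_id in range(3):  # corresponds to max_trials=3
    hps = sample_hyperparameters(rng)
    trials.append((toy_objective(hps), trial_id, hps))

# Tuples compare by score first, so max() picks the best-scoring trial.
best_score, best_trial_id, best_hps = max(trials)
print(best_trial_id, best_hps, round(best_score, 4))
```

Random search samples each trial independently of earlier results, so a small `max_trials` only covers a few points in the space.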

## Query the results

When the search is over, you can retrieve the best model(s). The model is saved at its best performing epoch evaluated on the `validation_data`.

In [12]:
# get the top 2 models.
models = tuner.get_best_models(num_models=2)
best_model = models[0]
best_model.summary()



In [13]:
tuner.results_summary()

Results summary
Results in my_dir/helloworld
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 1 summary
Hyperparameters:
num_layers: 2
units_0: 352
activation: tanh
dropout: True
lr: 0.0006361349306744992
units_1: 160
units_2: 320
Score: 0.9613499939441681

Trial 2 summary
Hyperparameters:
num_layers: 3
units_0: 96
activation: tanh
dropout: False
lr: 0.0002752444553894322
units_1: 384
units_2: 448
Score: 0.9536499977111816

Trial 0 summary
Hyperparameters:
num_layers: 3
units_0: 32
activation: tanh
dropout: False
lr: 0.008152371271105028
units_1: 32
units_2: 32
Score: 0.9381499886512756
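Note that the summary lists `units_2` for Trial 1 even though `num_layers` is `2`. Reading this against the model-building function, that value cannot affect the model, because the loop only creates layers `0` and `1`. The sketch below filters a trial's reported values down to the ones the model actually reads; `used_by_model` is a hypothetical helper written for this illustration.

```python
# Hyperparameter values copied from the "Trial 1 summary" above.
trial_1 = {
    "num_layers": 2,
    "units_0": 352,
    "activation": "tanh",
    "dropout": True,
    "lr": 0.0006361349306744992,
    "units_1": 160,
    "units_2": 320,
}


def used_by_model(values):
    """Keep only the entries the model-building function actually reads."""
    used = {}
    for name, value in values.items():
        if name.startswith("units_"):
            layer_index = int(name.split("_")[1])
            if layer_index >= values["num_layers"]:
                continue  # layer was never created, so the value is ignored
        used[name] = value
    return used


print(sorted(used_by_model(trial_1)))
```

When comparing trials, it helps to apply the same reading to each summary so that unused values are not mistaken for influential ones.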
