# Getting started with KerasTuner

**Authors:** Luca Invernizzi, James Long, Francois Chollet, Tom O'Malley, Haifeng Jin<br>
**Date created:** 2019/05/31<br>
**Last modified:** 2021/10/27<br>
**Description:** The basics of using KerasTuner to tune model hyperparameters.

In [1]:
!pip install keras-tuner -q

## Introduction

KerasTuner is a general-purpose hyperparameter tuning library. It has strong
integration with Keras workflows, but it isn't limited to them: you could use
it to tune scikit-learn models, or anything else. In this tutorial, you will
see how to tune model architecture, training process, and data preprocessing
steps with KerasTuner. Let's start from a simple example.

## Tune the model architecture

The first thing we need to do is write a function that returns a compiled
Keras model. It takes an argument `hp`, which is used to define the
hyperparameters while building the model.

### Define the search space

In the following code example, we define a Keras model with two `Dense` layers.
We want to tune the number of units in the first `Dense` layer. To do so, we
define an integer hyperparameter with
`hp.Int('units', min_value=32, max_value=512, step=32)`, whose range is from 32
to 512 inclusive. Values are sampled from this interval in steps of 32.

In [2]:
import keras
from keras import layers


def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            # Define the hyperparameter.
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            activation="relu",
        )
    )
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
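
To make the range concrete, here is a small plain-Python sketch that lists the values implied by `min_value=32`, `max_value=512`, `step=32`. It does not call KerasTuner, and the variable names are illustrative only.

```python
# Illustrative only: enumerate the values an integer hyperparameter with
# min_value=32, max_value=512, step=32 can take (both ends inclusive).
min_value, max_value, step = 32, 512, 32
candidates = list(range(min_value, max_value + 1, step))

print(len(candidates))  # 16
print(candidates[:3], candidates[-1])  # [32, 64, 96] 512
```

So this single hyperparameter contributes 16 possible layer widths to the search space.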


You can quickly test if the model builds successfully.

In [3]:
import keras_tuner

build_model(keras_tuner.HyperParameters())

<Sequential name=sequential, built=False>

There are many other types of hyperparameters as well. We can define multiple
hyperparameters in the function. In the following code, we tune whether to
use a `Dropout` layer with `hp.Boolean()`, tune which activation function to
use with `hp.Choice()`, and tune the learning rate of the optimizer with
`hp.Float()`.

In [4]:

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            # Tune number of units.
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            # Tune the activation function to use.
            activation=hp.Choice("activation", ["relu", "tanh"]),
        )
    )
    # Tune whether to use dropout.
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    # Define the optimizer learning rate as a hyperparameter.
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


build_model(keras_tuner.HyperParameters())

<Sequential name=sequential_1, built=False>
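
The learning rate above uses `sampling="log"`, which is the usual choice when a range spans several orders of magnitude. The following is a rough plain-Python sketch of what log-scale sampling means. It is not KerasTuner's own code, and `sample_log_uniform` is a hypothetical helper introduced only for illustration.

```python
import math
import random


def sample_log_uniform(min_value, max_value, rng):
    # Draw the exponent uniformly, so each decade is equally likely.
    low, high = math.log10(min_value), math.log10(max_value)
    return 10 ** rng.uniform(low, high)


rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(1000)]

# Roughly half of the draws fall below the geometric midpoint 1e-3.
below_midpoint = sum(s < 1e-3 for s in samples)
print(min(samples), max(samples), below_midpoint)
```

With linear sampling over the same range, by contrast, only about 9% of the draws would fall below `1e-3`.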

As shown below, the hyperparameters are actual values. In fact, the `hp`
methods are just functions returning actual values. For example, `hp.Int()`
returns an `int` value. Therefore, you can put them into variables, `for`
loops, or `if` conditions.

In [5]:
hp = keras_tuner.HyperParameters()
print(hp.Int("units", min_value=32, max_value=512, step=32))

32


You can also define the hyperparameters in advance and keep your Keras code in
a separate function.

In [6]:

def call_existing_code(units, activation, dropout, lr):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(layers.Dense(units=units, activation=activation))
    if dropout:
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def build_model(hp):
    units = hp.Int("units", min_value=32, max_value=512, step=32)
    activation = hp.Choice("activation", ["relu", "tanh"])
    dropout = hp.Boolean("dropout")
    lr = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    # call existing model-building code with the hyperparameter values.
    model = call_existing_code(
        units=units, activation=activation, dropout=dropout, lr=lr
    )
    return model


build_model(keras_tuner.HyperParameters())

<Sequential name=sequential_2, built=False>

Each of the hyperparameters is uniquely identified by its name (the first
argument). To tune the number of units in different `Dense` layers separately
as different hyperparameters, we give them different names, such as
`f"units_{i}"`.

Notably, this is also an example of creating conditional hyperparameters.
There are many hyperparameters specifying the number of units in the `Dense`
layers. The number of such hyperparameters is decided by the number of layers,
which is also a hyperparameter. Therefore, the total number of hyperparameters
used may be different from trial to trial. Some hyperparameters are only used
when a certain condition is satisfied. For example, `units_2` is only used
when `num_layers` is 3. With KerasTuner, you can easily define
such hyperparameters dynamically while creating the model.

In [7]:

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    # Tune the number of layers.
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(
            layers.Dense(
                # Tune number of units separately.
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation=hp.Choice("activation", ["relu", "tanh"]),
            )
        )
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


build_model(keras_tuner.HyperParameters())

<Sequential name=sequential_3, built=False>

### Start the search

After defining the search space, we need to select a tuner class to run the
search. You may choose from `RandomSearch`, `BayesianOptimization` and
`Hyperband`, which correspond to different tuning algorithms. Here we use
`RandomSearch` as an example.

To initialize the tuner, we need to specify several arguments in the initializer.

* `hypermodel`. The model-building function, which is `build_model` in our case.
* `objective`. The name of the objective to optimize (whether to minimize or
maximize is automatically inferred for built-in metrics). We will introduce how
to use custom metrics later in this tutorial.
* `max_trials`. The total number of trials to run during the search.
* `executions_per_trial`. The number of models that should be built and fit for
each trial. Different trials have different hyperparameter values. The
executions within the same trial have the same hyperparameter values. The
purpose of having multiple executions per trial is to reduce results variance
and therefore be able to more accurately assess the performance of a model. If
you want to get results faster, you could set `executions_per_trial=1` (single
round of training for each model configuration).
* `overwrite`. Control whether to overwrite the previous results in the same
directory or resume the previous search instead. Here we set `overwrite=True`
to start a new search and ignore any previous results.
* `directory`. A path to a directory for storing the search results.
* `project_name`. The name of the sub-directory in the `directory`.

In [8]:
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective="val_accuracy",
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="my_dir",
    project_name="helloworld",
)
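
As a back-of-the-envelope check, the training budget implied by these arguments can be worked out in plain Python. The `epochs` value is an assumption here: it is whatever you later pass to `tuner.search()`.

```python
# Rough training budget for the tuner configured above.
max_trials = 3
executions_per_trial = 2
epochs = 2  # assumed value of `epochs` passed to tuner.search()

model_fits = max_trials * executions_per_trial
total_epochs = model_fits * epochs
print(model_fits, total_epochs)  # 6 12
```

Raising `executions_per_trial` multiplies the cost of every trial, so it trades search time for less noisy scores.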

You can print a summary of the search space:

In [9]:
tuner.search_space_summary()

Search space summary
Default search space size: 5
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 3, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh'], 'ordered': False}
dropout (Boolean)
{'default': False, 'conditions': []}
lr (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'log'}


Before starting the search, let's prepare the MNIST dataset.

In [10]:
import keras
import numpy as np

(x, y), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x[:-10000]
x_val = x[-10000:]
y_train = y[:-10000]
y_val = y[-10000:]

x_train = np.expand_dims(x_train, -1).astype("float32") / 255.0
x_val = np.expand_dims(x_val, -1).astype("float32") / 255.0
x_test = np.expand_dims(x_test, -1).astype("float32") / 255.0

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


Then, start the search for the best hyperparameter configuration.
All the arguments passed to `search` are passed to `model.fit()` in each
execution. Remember to pass `validation_data` to evaluate the model.

In [11]:
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 3 Complete [00h 00m 19s]
val_accuracy: 0.9569500088691711

Best val_accuracy So Far: 0.9695999920368195
Total elapsed time: 00h 02m 50s


During the `search`, the model-building function is called with different
hyperparameter values in different trials. In each trial, the tuner
generates a new set of hyperparameter values to build the model. The model is
then fit and evaluated, and the metrics are recorded. The tuner progressively
explores the space and finally finds a good set of hyperparameter values.
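
The trial loop described above can be sketched in a few lines of plain Python. This is a toy illustration of random search, not KerasTuner's implementation, and `toy_score` is a made-up stand-in for building, fitting, and evaluating a model.

```python
import random


def toy_score(units, lr):
    # Made-up objective: pretend 256 units and lr=1e-3 are ideal.
    return -abs(units - 256) / 512 - abs(lr - 1e-3)


rng = random.Random(0)
trials = []
for trial_id in range(3):  # plays the role of max_trials=3
    # 1. Sample one set of hyperparameter values.
    values = {
        "units": rng.choice(range(32, 513, 32)),
        "lr": 10 ** rng.uniform(-4, -2),
    }
    # 2. "Train and evaluate" with these values and record the score.
    trials.append((toy_score(**values), trial_id, values))

# 3. The best trial is the one with the highest recorded score.
best_score, best_id, best_values = max(trials)
print(best_id, best_values, best_score)
```

In the real tuner, step 2 is a call to `model.fit()` with the arguments given to `search()`, and the recorded score is the chosen objective.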

### Query the results

When the search is over, you can retrieve the best model(s). The model is saved
at its best performing epoch evaluated on the `validation_data`.

In [14]:
# Get the top 2 models.
models = tuner.get_best_models(num_models=2)
best_model = models[0]
best_model.summary()

You can also print a summary of the search results.

In [15]:
tuner.results_summary()

Results summary
Results in my_dir/helloworld
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 0 summary
Hyperparameters:
num_layers: 2
units_0: 512
activation: relu
dropout: True
lr: 0.0004404944375164228
units_1: 32
Score: 0.9695999920368195

Trial 1 summary
Hyperparameters:
num_layers: 3
units_0: 416
activation: relu
dropout: True
lr: 0.0031495380852312486
units_1: 96
units_2: 32
Score: 0.9656000137329102

Trial 2 summary
Hyperparameters:
num_layers: 1
units_0: 32
activation: relu
dropout: False
lr: 0.0017010977253402633
units_1: 96
units_2: 192
Score: 0.9569500088691711


You will find detailed logs, checkpoints, etc., in the folder
`my_dir/helloworld`, i.e. `directory/project_name`.

You can also visualize the tuning results using TensorBoard and the HParams
plugin. For more information, please follow
[this link](https://keras.io/guides/keras_tuner/visualize_tuning/).

### Retrain the model

If you want to train the model with the entire dataset, you may retrieve the
best hyperparameters and retrain the model by yourself.

In [16]:
# Get the best hyperparameters (from up to the top 5 trials).
best_hps = tuner.get_best_hyperparameters(5)
# Build the model with the best hp.
model = build_model(best_hps[0])
# Fit with the entire dataset.
x_all = np.concatenate((x_train, x_val))
y_all = np.concatenate((y_train, y_val))
model.fit(x=x_all, y=y_all, epochs=1)

1875/1875 ━━━━━━━━━━━━━━━━━━━━ 14s 6ms/step - accuracy: 0.8241 - loss: 0.5920


<keras.src.callbacks.history.History at 0x7951b35e9d50>

## Tune model training

To tune the model training process, we need to subclass the `HyperModel` class,
which also makes it easy to share and reuse hypermodels.

We need to override `HyperModel.build()` and `HyperModel.fit()` to tune the
model building and training process respectively. A `HyperModel.build()`
method is the same as the model-building function, which creates a Keras model
using the hyperparameters and returns it.

In `HyperModel.fit()`, you can access the model returned by
`HyperModel.build()`, `hp`, and all the arguments passed to `search()`. You need
to train the model and return the training history.

In the following code, we will tune the `shuffle` argument in `model.fit()`.

It is generally not necessary to tune the number of epochs because a built-in
callback is passed to `model.fit()` to save the model at its best epoch
evaluated by the `validation_data`.

> **Note**: The `**kwargs` should always be passed to `model.fit()` because they
contain the callbacks for model saving and TensorBoard plugins.

In [17]:

class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            # Tune whether to shuffle the data in each epoch.
            shuffle=hp.Boolean("shuffle"),
            **kwargs,
        )


Again, we can do a quick check to see if the code works correctly.

In [18]:
hp = keras_tuner.HyperParameters()
hypermodel = MyHyperModel()
model = hypermodel.build(hp)
hypermodel.fit(hp, model, np.random.rand(100, 28, 28), np.random.rand(100, 10))

4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.0799 - loss: 12.8186


<keras.src.callbacks.history.History at 0x7951b3a60a10>

## Tune data preprocessing

To tune data preprocessing, we just add an additional step in
`HyperModel.fit()`, where we can access the dataset from the arguments. In the
following code, we tune whether to normalize the data before training the
model. This time we explicitly put `x` and `y` in the function signature
because we need to use them.

In [19]:

class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, x, y, **kwargs):
        if hp.Boolean("normalize"):
            x = layers.Normalization()(x)
        return model.fit(
            x,
            y,
            # Tune whether to shuffle the data in each epoch.
            shuffle=hp.Boolean("shuffle"),
            **kwargs,
        )


hp = keras_tuner.HyperParameters()
hypermodel = MyHyperModel()
model = hypermodel.build(hp)
hypermodel.fit(hp, model, np.random.rand(100, 28, 28), np.random.rand(100, 10))

4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.0832 - loss: 12.5962


<keras.src.callbacks.history.History at 0x7951b3b8bc90>

If a hyperparameter is used in both `build()` and `fit()`, you can define it in
`build()` and use `hp.get(hp_name)` to retrieve it in `fit()`. We use the
image size as an example. It is used both as the input shape in `build()` and
by the data preprocessing step to crop the images in `fit()`.

In [20]:

class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        image_size = hp.Int("image_size", 10, 28)
        inputs = keras.Input(shape=(image_size, image_size))
        outputs = layers.Flatten()(inputs)
        outputs = layers.Dense(
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            activation="relu",
        )(outputs)
        outputs = layers.Dense(10, activation="softmax")(outputs)
        model = keras.Model(inputs, outputs)
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, x, y, validation_data=None, **kwargs):
        if hp.Boolean("normalize"):
            x = layers.Normalization()(x)
        image_size = hp.get("image_size")
        cropped_x = x[:, :image_size, :image_size, :]
        if validation_data:
            x_val, y_val = validation_data
            cropped_x_val = x_val[:, :image_size, :image_size, :]
            validation_data = (cropped_x_val, y_val)
        return model.fit(
            cropped_x,
            y,
            # Tune whether to shuffle the data in each epoch.
            shuffle=hp.Boolean("shuffle"),
            validation_data=validation_data,
            **kwargs,
        )


tuner = keras_tuner.RandomSearch(
    MyHyperModel(),
    objective="val_accuracy",
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="tune_hypermodel",
)

tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 3 Complete [00h 00m 22s]
val_accuracy: 0.972599983215332

Best val_accuracy So Far: 0.972599983215332
Total elapsed time: 00h 01m 03s


### Retrain the model

Using `HyperModel` also allows you to retrain the best model by yourself.

In [21]:
hypermodel = MyHyperModel()
best_hp = tuner.get_best_hyperparameters()[0]
model = hypermodel.build(best_hp)
hypermodel.fit(best_hp, model, x_all, y_all, epochs=1)

1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.8956 - loss: 0.3671


<keras.src.callbacks.history.History at 0x7951b2471290>

## Specify the tuning objective

In all the previous examples, we used validation accuracy
(`"val_accuracy"`) as the tuning objective to select the best model. In fact,
you can use any metric as the objective. The most commonly used metric is
`"val_loss"`, which is the validation loss.

### Built-in metric as the objective

There are many other built-in metrics in Keras you can use as the objective.
Here is [a list of the built-in metrics](https://keras.io/api/metrics/).

To use a built-in metric as the objective, you need to follow these steps:

* Compile the model with the built-in metric. For example, to use
`MeanAbsoluteError()`, compile the model with
`metrics=[MeanAbsoluteError()]`. You may also use its name string instead:
`metrics=["mean_absolute_error"]`. The name string of the metric is always
the snake case of the class name.

* Identify the objective name string. The name string of the objective is
always in the format of `f"val_{metric_name_string}"`. For example, the
objective name string of mean absolute error evaluated on the validation data
should be `"val_mean_absolute_error"`.

* Wrap it into `keras_tuner.Objective`. We usually need to wrap the objective
into a `keras_tuner.Objective` object to specify the direction in which to
optimize the objective. For example, to minimize the mean absolute error, we
can use `keras_tuner.Objective("val_mean_absolute_error", "min")`. The
direction should be either `"min"` or `"max"`.

* Pass the wrapped objective to the tuner.
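
The naming rule in the steps above can be sketched in plain Python. `objective_name` is a hypothetical helper written for illustration; it is not part of Keras or KerasTuner.

```python
import re


def objective_name(metric_class_name):
    # Hypothetical helper: snake-case the metric class name, then add "val_".
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", metric_class_name).lower()
    return f"val_{snake}"


print(objective_name("MeanAbsoluteError"))     # val_mean_absolute_error
print(objective_name("RootMeanSquaredError"))  # val_root_mean_squared_error
```

The resulting string is what you would pass as the first argument to `keras_tuner.Objective`, together with `"min"` or `"max"`.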

The following bare-bones code examples show these steps.

### Example with metrics=["mean_absolute_error"]

In [22]:

def build_regressor(hp):
    model = keras.Sequential(
        [
            layers.Dense(units=hp.Int("units", 32, 128, 32), activation="relu"),
            layers.Dense(units=1),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="mean_squared_error",
        # Objective is one of the metrics.
        metrics=[keras.metrics.MeanAbsoluteError()],
    )
    return model


tuner = keras_tuner.RandomSearch(
    hypermodel=build_regressor,
    # The objective name and direction.
    # Name is the f"val_{snake_case_metric_class_name}".
    objective=keras_tuner.Objective("val_mean_absolute_error", direction="min"),
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="built_in_metrics",
)

tuner.search(
    x=np.random.rand(100, 10),
    y=np.random.rand(100, 1),
    validation_data=(np.random.rand(20, 10), np.random.rand(20, 1)),
)

tuner.results_summary()

Trial 3 Complete [00h 00m 02s]
val_mean_absolute_error: 0.4522615373134613

Best val_mean_absolute_error So Far: 0.38685232400894165
Total elapsed time: 00h 00m 06s
Results summary
Results in my_dir/built_in_metrics
Showing 10 best trials
Objective(name="val_mean_absolute_error", direction="min")

Trial 0 summary
Hyperparameters:
units: 32
Score: 0.38685232400894165

Trial 2 summary
Hyperparameters:
units: 96
Score: 0.4522615373134613

Trial 1 summary
Hyperparameters:
units: 64
Score: 0.5142424702644348


### Example with metrics=["root_mean_squared_error"]

In [31]:

def build_regressor(hp):
    model = keras.Sequential(
        [
            layers.Dense(units=hp.Int("units", 32, 128, 32), activation="relu"),
            layers.Dense(units=1),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="mean_squared_error",
        # Objective is one of the metrics.
        metrics=[keras.metrics.RootMeanSquaredError()],
    )
    return model


tuner = keras_tuner.RandomSearch(
    hypermodel=build_regressor,
    # The objective name and direction.
    # Name is the f"val_{snake_case_metric_class_name}".
    objective=keras_tuner.Objective("val_root_mean_squared_error", direction="min"),
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="built_in_metrics",
)

tuner.search(
    x=np.random.rand(100, 10),
    y=np.random.rand(100, 1),
    validation_data=(np.random.rand(20, 10), np.random.rand(20, 1)),
)

tuner.results_summary()

Trial 3 Complete [00h 00m 03s]
val_root_mean_squared_error: 0.566635251045227

Best val_root_mean_squared_error So Far: 0.452985018491745
Total elapsed time: 00h 00m 08s
Results summary
Results in my_dir/built_in_metrics
Showing 10 best trials
Objective(name="val_root_mean_squared_error", direction="min")

Trial 0 summary
Hyperparameters:
units: 96
Score: 0.452985018491745

Trial 1 summary
Hyperparameters:
units: 128
Score: 0.463850200176239

Trial 2 summary
Hyperparameters:
units: 64
Score: 0.566635251045227


## KerasTuner includes pre-made tunable applications: HyperResNet and HyperXception

These are ready-to-use hypermodels for computer vision.

They come pre-compiled with `loss="categorical_crossentropy"` and
`metrics=["accuracy"]`.

In [29]:
from keras_tuner.applications import HyperResNet

hypermodel = HyperResNet(input_shape=(28, 28, 1), classes=10)

tuner = keras_tuner.RandomSearch(
    hypermodel,
    objective="val_accuracy",
    max_trials=2,
    overwrite=True,
    directory="my_dir",
    project_name="built_in_hypermodel",
)