# Automatic Hyperparameter Tuning

### What is hyperparameter tuning?


A machine learning model has two types of parameters:

**Trainable parameters**, which are learned by the algorithm during training. For instance, the weights of a neural network are trainable parameters.

**Hyperparameters**, which need to be set before launching the learning process. The learning rate or the number of units in a dense layer are hyperparameters.

Hyperparameters can be numerous even for small models. Tuning them can be a real brain teaser, but it is worth the challenge: a good hyperparameter combination can greatly improve your model's performance. Here we'll see that, on a simple CNN model, it can help you gain 10% accuracy on the test set!

### Hyperparameter tuning with Keras Tuner


![tuner_image](https://images.ctfassets.net/be04ylp8y0qc/1m1AB8NPTcKkuEaqg2Zgyg/e56d45a2c85b99820ead8159a280323f/hp_tuning_flow_1f745fd5e0ae8830804bab6a66e2c917_1000.png?fm=webp)

#### How does Keras Tuner work?

First, a tuner is defined. Its role is to determine which hyperparameter combinations should be tested. The library's search function then runs the iteration loop, which evaluates a given number of hyperparameter combinations. Each combination is evaluated by computing the trained model's accuracy on a held-out validation set.

Finally, the best hyperparameter combination in terms of validation accuracy can be tested on a held-out test set.

#### Building an end-to-end pipeline to tune a simple convolutional network's hyperparameters for object classification on the CIFAR10 dataset

**Install Keras Tuner**

    pip install keras-tuner


Here we use the **CIFAR10 dataset**. CIFAR10 is a common benchmarking dataset in computer vision. It contains 10 classes and is relatively small, with 60000 images. This size allows for a relatively short training time which we'll take advantage of to perform multiple hyperparameter tuning iterations.

#### Load and pre-process data


from tensorflow.keras.datasets import cifar10
# Load data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Pre-processing
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

The tuner expects floats as inputs, and the division by 255 is a data normalization step.

### Model definition

Here, we will work with a simple convolutional model to classify each image into one of the 10 available classes.

![arch](https://images.ctfassets.net/be04ylp8y0qc/1vDpd4pZYAcSLrN5lr525T/eb3a5873f1a2e4eb0df72bbae1f74448/simple_cnn_c0a8b6073d06e3c72809af2d91fd4cf9_800.jpeg?fm=webp)

Each input image goes through two convolutional blocks, each made of two convolution layers followed by a pooling layer and a dropout layer for regularization purposes. Finally, the output is flattened and goes through a dense layer, followed by a softmax layer that classifies the image into one of the 10 classes.

In Keras, this model can be defined as follows:

from tensorflow import keras
from tensorflow.keras.layers import (
    Conv2D,
    Dense,
    Dropout,
    Flatten,
    MaxPooling2D
)

INPUT_SHAPE = (32, 32, 3)
NUM_CLASSES = 10

model = keras.Sequential()
model.add(
    Conv2D(
        filters=16,
        kernel_size=3,
        activation='relu',
        input_shape=INPUT_SHAPE
    )
)
model.add(Conv2D(16, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(rate=0.25))
model.add(Conv2D(32, 3, activation='relu'))
model.add(Conv2D(64, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(units=128, activation='relu'))
model.add(Dropout(rate=0.25))
model.add(Dense(NUM_CLASSES, activation='softmax'))
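
To get a feel for the size of this network, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used above (`'valid'` padding and stride 1 for `Conv2D`, pooling stride equal to `pool_size`); the helper names `conv_out` and `conv_params` are hypothetical and only serve this illustration.

```python
# Hand computation of feature-map sizes and parameter counts for the model above.
# Assumes 'valid' padding and stride 1 for convolutions, and 2x2 pooling with stride 2.


def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size = 32               # CIFAR10 images are 32x32
size = conv_out(size)   # first Conv2D(16)  -> 30
size = conv_out(size)   # second Conv2D(16) -> 28
size //= 2              # MaxPooling2D      -> 14
size = conv_out(size)   # Conv2D(32)        -> 12
size = conv_out(size)   # Conv2D(64)        -> 10
size //= 2              # MaxPooling2D      -> 5

flat_features = size * size * 64  # Flatten output fed to the dense layer

layer_params = {
    "conv_1": conv_params(3, 16),
    "conv_2": conv_params(16, 16),
    "conv_3": conv_params(16, 32),
    "conv_4": conv_params(32, 64),
    "dense_1": flat_features * 128 + 128,
    "dense_2": 128 * 10 + 10,
}
total_params = sum(layer_params.values())  # pooling, dropout and flatten add none
print(flat_features, total_params)
```

Most of the parameters sit in the first dense layer, which is one reason why its number of units is a natural candidate for tuning.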


### Search Space definition

To perform hyperparameter tuning, we need to define the search space, that is, which hyperparameters need to be optimized and over what range. Even for this relatively small model, there are already 6 hyperparameters that can be tuned:

- the dropout rate of each of the three dropout layers
- the number of filters of the last convolutional layer
- the number of units of the dense layer
- the activation function of the dense layer

In Keras Tuner, each hyperparameter has a type (`Float`, `Int`, `Boolean`, or `Choice`) and a unique name. A set of options then needs to be set to help guide the search:

- a minimum, a maximum, and a default value for the `Float` and `Int` types
- a set of possible values for the `Choice` type
- optionally, a sampling method: linear, log, or reverse log. Setting this option lets you add prior knowledge you might have about the tuned hyperparameter. We'll see in the next section how it can be used to tune the learning rate, for instance
- optionally, a step value, i.e. the minimal step between two hyperparameter values

For instance, to set the hyperparameter 'number of filters' you can use:


filters=hp.Choice(
    'num_filters',
    values=[32, 64],
    default=64,
),

The dense layer has two hyperparameters, the number of units and the activation function:


Dense(
    units=hp.Int(
        'units',
        min_value=32,
        max_value=512,
        step=32,
        default=128
    ),
    activation=hp.Choice(
        'dense_activation',
        values=['relu', 'tanh', 'sigmoid'],
        default='relu'
    )
)

### Model Compilation

Next comes model compilation, where other hyperparameters are also present. The compilation step is where the optimizer, the loss function, and the metric are defined. Here, we'll use sparse categorical cross-entropy as the loss function (the CIFAR10 labels are integers rather than one-hot vectors) and accuracy as the metric. For the optimizer, different options are available. We'll use the popular Adam.


model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

Here, the learning rate, which represents how fast the learning algorithm progresses, is often an important hyperparameter. Usually, the learning rate is chosen on a log scale. This prior knowledge can be incorporated in the search through the setting of the sampling method:


hp.Float(
    'learning_rate',
    min_value=1e-5,
    max_value=1e-2,
    sampling='LOG',
    default=1e-3
)
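
To see why the sampling method matters for a range like this one, here is a small plain-Python comparison of linear and log sampling between `1e-5` and `1e-2`. It is a conceptual sketch and not necessarily how Keras Tuner draws its samples; the helper names are hypothetical.

```python
import math
import random

# Conceptual comparison of linear vs. log sampling over a learning-rate range.
# This is an illustration, not Keras Tuner's internal sampling code.
rng = random.Random(0)
MIN_LR, MAX_LR = 1e-5, 1e-2
N_SAMPLES = 10_000


def sample_linear():
    """Uniform in the raw value: large learning rates dominate."""
    return rng.uniform(MIN_LR, MAX_LR)


def sample_log():
    """Uniform in the exponent: every order of magnitude is equally likely."""
    exponent = rng.uniform(math.log10(MIN_LR), math.log10(MAX_LR))
    return 10 ** exponent


def share_below(samples, threshold):
    """Fraction of samples that are smaller than the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


linear_samples = [sample_linear() for _ in range(N_SAMPLES)]
log_samples = [sample_log() for _ in range(N_SAMPLES)]

linear_share = share_below(linear_samples, 1e-3)
log_share = share_below(log_samples, 1e-3)
print(f"below 1e-3: linear={linear_share:.2f}, log={log_share:.2f}")
```

With linear sampling, roughly nine samples out of ten land in the top decade `[1e-3, 1e-2]`, so the small learning rates are barely explored. With log sampling, each of the three decades receives about a third of the samples.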

### Keras Tuner Hypermodels

To put the whole hyperparameter search space together and perform hyperparameter tuning, Keras Tuner uses `HyperModel` instances. Hypermodels are reusable class objects introduced with the library, defined as follows:

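# Recent releases of the keras-tuner package are imported as `keras_tuner`
# (older releases used `kerastuner`).
from keras_tuner import HyperModel


class CNNHyperModel(HyperModel):
    def __init__(self, input_shape, num_classes):
        super().__init__()  # let the base class set up its own attributes
        self.input_shape = input_shape
        self.num_classes = num_classes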

    def build(self, hp):
        model = keras.Sequential()
        model.add(
            Conv2D(
                filters=16,
                kernel_size=3,
                activation='relu',
                input_shape=self.input_shape
            )
        )
        model.add(
            Conv2D(
                filters=16,
                activation='relu',
                kernel_size=3
            )
        )
        model.add(MaxPooling2D(pool_size=2))
        model.add(
            Dropout(rate=hp.Float(
                'dropout_1',
                min_value=0.0,
                max_value=0.5,
                default=0.25,
                step=0.05,
            ))
        )
        model.add(
            Conv2D(
                filters=32,
                kernel_size=3,
                activation='relu'
            )
        )
        model.add(
            Conv2D(
                filters=hp.Choice(
                    'num_filters',
                    values=[32, 64],
                    default=64,
                ),
                activation='relu',
                kernel_size=3
            )
        )
        model.add(MaxPooling2D(pool_size=2))
        model.add(
            Dropout(rate=hp.Float(
                'dropout_2',
                min_value=0.0,
                max_value=0.5,
                default=0.25,
                step=0.05,
            ))
        )
        model.add(Flatten())
        model.add(
            Dense(
                units=hp.Int(
                    'units',
                    min_value=32,
                    max_value=512,
                    step=32,
                    default=128
                ),
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'sigmoid'],
                    default='relu'
                )
            )
        )
        model.add(
            Dropout(
                rate=hp.Float(
                    'dropout_3',
                    min_value=0.0,
                    max_value=0.5,
                    default=0.25,
                    step=0.05
                )
            )
        )
        model.add(Dense(self.num_classes, activation='softmax'))

        model.compile(
            optimizer=keras.optimizers.Adam(
                hp.Float(
                    'learning_rate',
                    min_value=1e-4,
                    max_value=1e-2,
                    sampling='LOG',
                    default=1e-3
                )
            ),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        return model

hypermodel = CNNHyperModel(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES)
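
To give an idea of how large this search space is, the discrete part of it can be counted by hand. The sketch below mirrors the ranges declared in `build` and assumes that `max_value` is inclusive; the continuous learning rate is left out, and `grid_values` is a hypothetical helper written for this illustration.

```python
import math


def grid_values(min_value, max_value, step):
    """Values min, min + step, ..., max (max assumed inclusive)."""
    count = round((max_value - min_value) / step) + 1
    return [round(min_value + i * step, 10) for i in range(count)]


# Discrete hyperparameters declared in CNNHyperModel.build (learning rate excluded).
search_space = {
    "dropout_1": grid_values(0.0, 0.5, 0.05),
    "dropout_2": grid_values(0.0, 0.5, 0.05),
    "dropout_3": grid_values(0.0, 0.5, 0.05),
    "num_filters": [32, 64],
    "units": list(range(32, 512 + 1, 32)),
    "dense_activation": ["relu", "tanh", "sigmoid"],
}

# Number of distinct combinations if every value were tried with every other.
n_combinations = math.prod(len(values) for values in search_space.values())
print({name: len(values) for name, values in search_space.items()})
print(n_combinations)
```

Even before the learning rate is taken into account, there are far too many combinations to train them all, which is why the tuner only samples a subset of them.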


The library also offers two off-the-shelf hypermodels for computer vision, `HyperResNet` and `HyperXception`.

### Choose the tuner

Keras Tuner offers the main hyperparameter tuning methods: random search, Hyperband, and Bayesian optimization.

In this tutorial, we'll focus on random search and Hyperband. We won't go into the theory, but if you want to know more about random search and Bayesian optimization, I wrote a post about it: Bayesian optimization for hyperparameter tuning. As for Hyperband, its main idea is to make random search faster by reducing the search time.

For every tuner, a seed can be set to make experiments reproducible: `SEED = 1`.
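
To illustrate what a seed buys you, here is a plain-Python sketch of seeded random sampling from a small search space. It is a conceptual stand-in, not the `RandomSearch` implementation, and `SEARCH_SPACE` and `sample_combinations` are hypothetical names.

```python
import random

# A reduced, hypothetical search space (subset of the one used in this tutorial).
SEARCH_SPACE = {
    "num_filters": [32, 64],
    "units": list(range(32, 513, 32)),
    "dense_activation": ["relu", "tanh", "sigmoid"],
}


def sample_combinations(seed, n_trials):
    """Draw n_trials random combinations with a dedicated, seeded generator."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        for _ in range(n_trials)
    ]


run_a = sample_combinations(seed=1, n_trials=5)
run_b = sample_combinations(seed=1, n_trials=5)

# Same seed, same sequence of combinations: the search can be replayed.
print(run_a == run_b)
for combination in run_a:
    print(combination)
```

Fixing the seed makes the sequence of sampled combinations repeatable. Training itself can still vary from run to run, for example because of weight initialization.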

### Random Search
The most intuitive way to perform hyperparameter tuning is to randomly sample hyperparameter combinations and test them out. This is exactly what the RandomSearch tuner does!

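from keras_tuner import RandomSearch

SEED = 1  # seed used for reproducibility
# MAX_TRIALS and EXECUTION_PER_TRIAL are not specified in the text:
# the two values below are illustrative assumptions, adapt them to your budget.
MAX_TRIALS = 20
EXECUTION_PER_TRIAL = 2

NUM_CLASSES = 10  # cifar10 number of classes
INPUT_SHAPE = (32, 32, 3)  # cifar10 images input shape

hypermodel = CNNHyperModel(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES)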

tuner = RandomSearch(
    hypermodel,
    objective='val_accuracy',
    seed=SEED,
    max_trials=MAX_TRIALS,
    executions_per_trial=EXECUTION_PER_TRIAL,
    directory='random_search',
    project_name='cifar10'
)

The objective is the quantity to optimize. The tuner infers whether it is a maximization or a minimization problem from the metric name (here, `val_accuracy` is maximized).

Then, `max_trials` is the number of hyperparameter combinations that will be tested by the tuner, while `executions_per_trial` is the number of models that should be built and fit for each trial, for robustness purposes. How to set them is discussed further down, after the Hyperband tuner.
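
The interplay between these two settings can be sketched in plain Python. The snippet below assumes, as a simplification, that the scores of the executions of a trial are averaged; the numbers and the `noisy_validation_accuracy` helper are made up for the illustration.

```python
import random
import statistics

# Illustrative values (assumptions), kept small so the output stays readable.
MAX_TRIALS = 4
EXECUTIONS_PER_TRIAL = 3

rng = random.Random(1)


def noisy_validation_accuracy(true_accuracy):
    """Stand-in for one training run: same hyperparameters, different random init."""
    return true_accuracy + rng.gauss(0, 0.01)


trial_scores = []
n_fits = 0
for trial in range(MAX_TRIALS):
    true_accuracy = 0.60 + 0.02 * trial  # made-up quality of this combination
    runs = []
    for _ in range(EXECUTIONS_PER_TRIAL):
        runs.append(noisy_validation_accuracy(true_accuracy))
        n_fits += 1
    # Averaging several runs dampens the noise of a single training run.
    trial_scores.append(statistics.mean(runs))

print(n_fits)  # MAX_TRIALS * EXECUTIONS_PER_TRIAL model fits in total
print([round(score, 3) for score in trial_scores])
```

The total cost grows as the product of the two settings, so raising `executions_per_trial` trades the number of explored combinations for more reliable scores.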

### Hyperband

from keras_tuner import Hyperband

# HYPERBAND_MAX_EPOCHS is not specified in the text: 40 is an illustrative assumption.
HYPERBAND_MAX_EPOCHS = 40


tuner = Hyperband(
    hypermodel,
    max_epochs=HYPERBAND_MAX_EPOCHS,
    objective='val_accuracy',
    seed=SEED,
    executions_per_trial=EXECUTION_PER_TRIAL,
    directory='hyperband',
    project_name='cifar10'
)

Hyperband is an optimized version of random search which uses early-stopping to speed up the hyperparameter tuning process. The main idea is to fit a large number of models for a small number of epochs and to only continue training for the models achieving the highest accuracy on the validation set. The max_epochs variable is the max number of epochs that a model can be trained for.
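
The idea of "train many models briefly, then keep only the best" can be sketched as a single round of successive halving in plain Python. This is a simplified illustration under made-up assumptions (a toy learning curve, one bracket only), not the Hyperband implementation shipped with Keras Tuner, which runs several such brackets.

```python
import random

rng = random.Random(1)
FACTOR = 3       # keep the best third at each step (assumed reduction factor)
MAX_EPOCHS = 27  # largest training budget for a single model (illustrative)

# 27 random configurations, each with a hidden, made-up "quality".
configs = [{"id": i, "quality": rng.random()} for i in range(27)]


def validation_accuracy(config, epochs):
    """Toy learning curve: improves with epochs and saturates at the quality."""
    return config["quality"] * (1 - 0.5 ** epochs)


epochs = 1
survivors = configs
schedule = []  # (number of models trained, epochs given to each)
while len(survivors) > 1 and epochs <= MAX_EPOCHS:
    schedule.append((len(survivors), epochs))
    ranked = sorted(
        survivors,
        key=lambda config: validation_accuracy(config, epochs),
        reverse=True,
    )
    survivors = ranked[: max(1, len(survivors) // FACTOR)]
    epochs *= FACTOR

total_epochs = sum(n_models * n_epochs for n_models, n_epochs in schedule)
full_training_epochs = len(configs) * MAX_EPOCHS
print(schedule, total_epochs, full_training_epochs)
```

In this toy setting, the halving schedule spends far fewer epochs than training all 27 configurations for the full budget, because poor configurations are discarded after only a few epochs.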

Setting these variables is a slightly different problem from tuning the hyperparameters themselves. **These settings mostly depend on your available computing time and resources**. The more trials you can afford, the better! As for the number of epochs, it is best if you know how many epochs your model needs to converge. You can also use early stopping to prevent overfitting.
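
As a reminder of how early stopping works, here is a minimal patience-based rule in plain Python. It is a sketch of the general idea with made-up validation losses, and `early_stopping_epoch` is a hypothetical helper rather than a Keras function.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs were run


# Made-up validation losses: improvement stalls after the fourth epoch.
val_losses = [1.20, 0.95, 0.80, 0.78, 0.79, 0.81, 0.84, 0.90]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)
```

In Keras, the `EarlyStopping` callback implements this kind of rule. As far as I know, callbacks passed to `tuner.search` are forwarded to the underlying `fit` call, so the same callback can be used during the search.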

### Hyperparameter tuning

Once the model and the tuner are set up, a summary of the task is easily available:

tuner.search_space_summary()

![tuner](https://images.ctfassets.net/be04ylp8y0qc/RTQVnnpte1eH6A6kYjZnI/ba934cba168fd193c88daae416301d32/search_space_summary_85ac4bdeb6c76e0b63af13fa9aa69199_800.png?fm=webp)

#### Tuning can start!

N_EPOCH_SEARCH = 40

tuner.search(x_train, y_train, epochs=N_EPOCH_SEARCH, validation_split=0.1)

The `search` function takes the training data and a validation split as input and uses them to evaluate the hyperparameter combinations. The `epochs` parameter is used by random search and Bayesian optimization to define the number of training epochs for each hyperparameter combination.
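
To make the effect of `validation_split=0.1` concrete, the sketch below works out the sizes of the two subsets for the 50,000 CIFAR10 training images. It mimics the behaviour of Keras, which holds out the last fraction of the arrays before any shuffling; `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, validation_split):
    """Sizes of the training and validation subsets for a given split."""
    split_at = math.floor(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


n_train, n_val = split_sizes(50_000, 0.1)

# The validation samples are taken from the end of the arrays.
indices = list(range(50_000))
train_indices, val_indices = indices[:n_train], indices[n_train:]
print(n_train, n_val, val_indices[0])
```

Because the held-out samples are simply the last ones, it is worth checking that the data is not ordered by class before relying on `validation_split`.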

Finally, the search results can be summarized and used as follows:


# Show a summary of the search
tuner.results_summary()

# Retrieve the best model.
best_model = tuner.get_best_models(num_models=1)[0]

# Evaluate the best model.
loss, accuracy = best_model.evaluate(x_test, y_test)


The following results were obtained:

![img](https://images.ctfassets.net/be04ylp8y0qc/7LUFbWdFpTMzMF5vOXEDem/e11364fbea6c0070e43b60fd73bf3dae/tuning_results_f60a58eb6a16fe496b4819d75a7ed838_800.png?fm=webp)

These results are far from the 99.3% accuracy achieved by state-of-the-art models on the CIFAR10 dataset but not so bad for such a simple network structure. You can already see notable improvement between the baselines and the tuned models, with a boost of more than 10% in accuracy between Random Search and the first baseline.

Overall, the Keras Tuner library is a nice, easy-to-learn option for tuning the hyperparameters of your Keras and TensorFlow 2.0 models. **The main step you have to work on is adapting your model to fit the hypermodel format**.