# Hyperparameter optimization with KerasTuner

<a href="https://colab.research.google.com/drive/1moOGHGc48OAHJIPCbhuIjGpJCwxrSr7v" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
</a>

Return to the [castle](https://github.com/Nkluge-correa/TeenyTinyCastle).

Many seemingly arbitrary decisions must be made when developing a deep-learning model, including:

- How many `layers` should you use in your network?
- How many `units` should each layer contain?
- Should you use `ReLU` as the activation function, or a different one?
- How much `dropout` should you use?

These architecture-level parameters are called hyperparameters, to distinguish them from the parameters (weights) that a model learns during training. Over time, practitioners develop intuition about which choices work and which do not, but many of these values are still found through experimentation. We can also try to automate this process, which leads us to the study of automatic hyperparameter optimization.

![hyper_optimization](https://miro.medium.com/max/1142/1*5mStLTnIxsANpOHSwAFJhg.png)

[Source](https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d).
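
To make the idea concrete, here is a minimal sketch of one of the simplest automated strategies, random search, in plain Python. The `toy_validation_score` function is a hypothetical stand-in for "train a model and measure its validation accuracy"; its shape and the ranges used are assumptions chosen only for illustration.

```python
import math
import random


def toy_validation_score(learning_rate, units):
    """Hypothetical stand-in for training a model and returning its validation accuracy."""
    # Made-up objective that peaks at learning_rate=1e-3 and units=256.
    lr_penalty = (math.log10(learning_rate) + 3) ** 2
    units_penalty = ((units - 256) / 256) ** 2
    return 1.0 - 0.1 * lr_penalty - 0.1 * units_penalty


def random_search(num_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-4, -2),  # between 1e-4 and 1e-2
            "units": rng.choice(range(32, 513, 32)),     # 32, 64, ..., 512
        }
        score = toy_validation_score(**config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score


best_config, best_score = random_search(num_trials=20)
print(best_config, round(best_score, 3))
```

Every trial is independent of the others, so running more trials with the same seed can only keep or improve the best score found so far.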

For models built with Keras, we can use `KerasTuner` to automate this work.

> _[KerasTuner](https://keras.io/keras_tuner/) is a general-purpose hyperparameter tuning library that helps you choose the optimal set of hyperparameters for your neural network. It has strong integration with Keras workflows but isn't limited to them: you could use it to tune scikit-learn models, for example. To learn more about these optimization algorithms, visit [keras-team](https://github.com/keras-team/keras-tuner/tree/master/keras_tuner/tuners)._

In this notebook, we show how to use `KerasTuner` with the [Chest X-Ray Pneumonia dataset](https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia). This dataset contains 5,856 X-ray images in JPEG format from pediatric patients at the Guangzhou Women and Children's Medical Center.

> **Note**: all datasets and models related to the course and repo are in the Hub 🤗.


Before starting, let us install the `KerasTuner` library.

In [None]:
%pip install keras-tuner -q

To use `KerasTuner`, we first write a function that returns a compiled Keras model and takes an `hp` argument, which it uses to define hyperparameters during model construction. Defining these hyperparameters and their ranges gives the tuner a search space in which to look for the combination that performs best.

For our model, we tune the number of layers, the number of units in each dense layer, the activation function, whether to apply dropout, and the learning rate. For instance, we toggle the use of a `Dropout` layer with `hp.Boolean()`, choose the activation function with `hp.Choice()`, set the layer sizes with `hp.Int()`, and tune the optimizer's learning rate with `hp.Float()`. These are just a few of the hyperparameters you can define for your own problem.

You can define as many hyperparameters as you need within the function. For more details, refer to the [Keras Tuner documentation](https://keras.io/guides/keras_tuner/getting_started/).
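
The learning rate in the function below is sampled with `sampling="log"`. The following sketch illustrates why that matters for a range spanning several orders of magnitude. It is a conceptual illustration in plain Python, not KerasTuner's internal code; the helper names and the sample count are assumptions.

```python
import math
import random

rng = random.Random(0)
MIN_LR, MAX_LR = 1e-4, 1e-2  # same range as the "lr" hyperparameter


def sample_linear():
    """Draw uniformly between MIN_LR and MAX_LR."""
    return rng.uniform(MIN_LR, MAX_LR)


def sample_log():
    """Draw uniformly in log10 space, then map back to a learning rate."""
    return 10 ** rng.uniform(math.log10(MIN_LR), math.log10(MAX_LR))


n = 10_000
linear_small = sum(sample_linear() < 1e-3 for _ in range(n)) / n
log_small = sum(sample_log() < 1e-3 for _ in range(n)) / n

print(f"linear: {linear_small:.1%} of samples fall below 1e-3")
print(f"log:    {log_small:.1%} of samples fall below 1e-3")
```

With linear sampling, only about 9% of the draws land in the lower decade (1e-4 to 1e-3), whereas log sampling gives each decade roughly equal weight.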

In [None]:
import keras_tuner
from tensorflow import keras

def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(32,32)))
    # Tune the number of layers. The positional arguments are (min_value, max_value, step),
    # so with min_value=2, max_value=3, and step=4 only the value 2 is ever sampled.
    for i in range(hp.Int("num_layers", 2, 3, 4)):
        model.add(
            keras.layers.Dense(
                # Tune the number of units of each layer separately. With step=32, the tuner
                # considers the values 32, 64, 96, ..., 512.
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation=hp.Choice("activation", ["relu", "tanh"]),
            )
        )
    if hp.Boolean("dropout"):
        model.add(keras.layers.Dropout(rate=0.20))
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Testing the model
build_model(keras_tuner.HyperParameters()).summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 1024)              0         
                                                                 
 dense_3 (Dense)             (None, 32)                32800     
                                                                 
 dense_4 (Dense)             (None, 32)                1056      
                                                                 
 dense_5 (Dense)             (None, 1)                 33        
                                                                 
Total params: 33889 (132.38 KB)
Trainable params: 33889 (132.38 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
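
The parameter counts in this summary can be checked by hand. The sketch below assumes the default hyperparameter values used in the test call (two hidden layers of 32 units each, no dropout) and the usual count for a dense layer: one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is ours.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


# Flattened 32 x 32 input, two hidden layers of 32 units, one sigmoid output.
layer_sizes = [32 * 32, 32, 32, 1]
per_layer = [dense_params(i, u) for i, u in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [32800, 1056, 33]
print(sum(per_layer))  # 33889
```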


After defining the search space, we must select a tuner class to run the search. You may choose from `RandomSearch`, `BayesianOptimization`, and `Hyperband`, which correspond to different tuning algorithms. Here, we will use `RandomSearch`.

First, we need to specify several arguments to initialize the tuner:

- `hypermodel`: The model-building function. In our case, this is `build_model`.

- `objective`: The name of the objective to optimize (whether to minimize or maximize is automatically inferred for built-in metrics).

- `max_trials`: The total number of trials to run during the search.

- `executions_per_trial`: The number of models that should be built and fit for each trial. Different trials use different hyperparameter values, while executions within the same trial share the same values, which reduces the variance of the results. If you want results faster, you could set `executions_per_trial=1` (a single round of training for each model configuration).

- `overwrite`: Controls whether to overwrite the previous results in the same directory or resume the previous search instead. We set `overwrite=True` to start a new search and ignore any previous results.

- `directory`: A path to a directory for storing the search results.

- `project_name`: The name of the sub-directory in the directory.
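
How `max_trials` and `executions_per_trial` interact can be sketched in a few lines of plain Python. This is a conceptual sketch, not KerasTuner's implementation: `noisy_fit` is a hypothetical stand-in for a training run, the quality values are made up, and we assume the score of a trial is the average over its executions.

```python
import random
import statistics


def noisy_fit(true_quality, rng):
    """Hypothetical training run: the configuration's quality plus run-to-run noise."""
    return true_quality + rng.gauss(0, 0.05)


def run_search(true_qualities, executions_per_trial, seed=0):
    """Run one trial per configuration and average the scores of its executions."""
    rng = random.Random(seed)
    trial_scores = []
    for true_quality in true_qualities:
        scores = [noisy_fit(true_quality, rng) for _ in range(executions_per_trial)]
        trial_scores.append(statistics.mean(scores))
    total_fits = len(true_qualities) * executions_per_trial
    return trial_scores, total_fits


qualities = [0.72, 0.86, 0.93, 0.83, 0.72]  # made-up values, one per trial
trial_scores, total_fits = run_search(qualities, executions_per_trial=2)
print(total_fits)  # 10 model fits for 5 trials with 2 executions each
```

Raising `executions_per_trial` makes each trial's score less noisy, at the cost of proportionally more training runs.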

In [None]:
import os
project = "keras-tuner"

os.makedirs(project, exist_ok=True)

tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective="val_accuracy",
    max_trials=5,
    executions_per_trial=2,
    overwrite=True,
    directory="/content",
    project_name=project,
)

# Print a summary of the search space
tuner.search_space_summary()

Search space summary
Default search space size: 6
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 3, 'step': 4, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh'], 'ordered': False}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
dropout (Boolean)
{'default': False, 'conditions': []}
lr (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'log'}
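
The summary above lets us estimate how many distinct discrete configurations the tuner can draw from. The sketch below assumes that an `Int` hyperparameter with linear sampling takes the values `min_value`, `min_value + step`, and so on up to `max_value`; the helper name `int_values` is ours.

```python
def int_values(min_value, max_value, step):
    """Values an integer hyperparameter can take under the assumption above."""
    return list(range(min_value, max_value + 1, step))


num_layers_values = int_values(2, 3, 4)  # only [2], since 2 + 4 exceeds max_value
units_values = int_values(32, 512, 32)   # 32, 64, ..., 512
activation_values = ["relu", "tanh"]
dropout_values = [False, True]

# units_0 and units_1 are tuned independently; lr is continuous and not counted here.
discrete_combinations = (
    len(num_layers_values)
    * len(units_values) ** 2
    * len(activation_values)
    * len(dropout_values)
)

print(num_layers_values)      # [2]
print(len(units_values))      # 16
print(discrete_combinations)  # 1024
```

With `max_trials=5`, the search samples only a small fraction of these combinations, on top of the continuous learning-rate range.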


Before starting the search, let us load our dataset from the Hub 🤗.

In [None]:
!pip install datasets -q

from datasets import load_dataset

dataset = load_dataset('AiresPucrs/chest-xray', split='train')

# turn the dataset into a pandas.DataFrame
df = dataset.to_pandas()

display(df)

| | image | label |
|---|---|---|
| 0 | {'bytes': None, 'path': '/root/.cache/huggingf... | 0 |
| 1 | {'bytes': None, 'path': '/root/.cache/huggingf... | 0 |
| 2 | {'bytes': None, 'path': '/root/.cache/huggingf... | 0 |
| 3 | {'bytes': None, 'path': '/root/.cache/huggingf... | 0 |
| 4 | {'bytes': None, 'path': '/root/.cache/huggingf... | 0 |
| ... | ... | ... |
| 5851 | {'bytes': None, 'path': '/root/.cache/huggingf... | 1 |
| 5852 | {'bytes': None, 'path': '/root/.cache/huggingf... | 1 |
| 5853 | {'bytes': None, 'path': '/root/.cache/huggingf... | 1 |
| 5854 | {'bytes': None, 'path': '/root/.cache/huggingf... | 1 |


Converting the dataset into a `pandas.DataFrame` gives a table with two columns, `image` and `label`. The `image` column holds the path of each downloaded image. These images are quite large (1857 x 1317). To make this tutorial less computationally costly, let us convert them to grayscale and resize them to 32 x 32. The `Flatten` layer at the input of our network then turns each image into a vector of length 1024.

These arrays and their labels will be the inputs and targets for our dense network.

In [None]:
from PIL import Image
import pandas as pd
import numpy as np
import tqdm

image_arrays = list()

for image in tqdm.tqdm(df.image):
  # Open the image
  img = Image.open(image['path'])

  # Convert the image to grayscale (drop the color channels)
  img = img.convert("L")

  # Resize the image
  small_img = img.resize((32,32))

  # Turn image into an array
  small_img_array = np.array(small_img)

  # Append the array to the image_arrays list
  image_arrays.append(small_img_array)

flatten_df = pd.DataFrame({"image": image_arrays, "label": df.label})

display(flatten_df.head())

100%|██████████| 5856/5856 [01:31<00:00, 63.88it/s] 


| | image | label |
|---|---|---|
| 0 | [[31, 28, 27, 27, 26, 28, 105, 44, 35, 74, 104... | 0 |
| 1 | [[36, 104, 147, 178, 186, 173, 170, 182, 180, ... | 0 |
| 2 | [[35, 33, 31, 32, 32, 30, 27, 43, 86, 109, 133... | 0 |
| 3 | [[82, 89, 97, 111, 111, 116, 117, 119, 123, 12... | 0 |
| 4 | [[35, 28, 28, 28, 25, 35, 86, 38, 51, 70, 96, ... | 0 |


After organizing our data into a `pandas.DataFrame`, we will split it into train and test using `sklearn.model_selection.train_test_split` and convert the splits into `numpy` arrays.

In [None]:
from sklearn.model_selection import train_test_split

# Turn the lists into numpy arrays of type float32
flatten_df['image'] = flatten_df.image.apply(lambda x: np.asarray(x).astype('float32'))
flatten_df['label'] = flatten_df.label.apply(lambda x: np.asarray(x).astype('float32'))

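# Stack the per-sample arrays into a single array of shape (batch, height, width).
# np.stack keeps each image's orientation, whereas transposing the output of
# np.dstack would also swap the height and width axes.
X = np.stack(flatten_df.image.tolist())
y = np.stack(flatten_df.label.tolist())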

# Split the dataset into train and test splits
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# As a sanity check, it is always good to print your shapes!
print("Number of samples in the training set (x_train):", len(x_train))
print("x_train shape: ", x_train.shape)
print("Number of samples in the testing set (x_test):", len(x_test))
print("x_test shape: ", x_test.shape)
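
As a cross-check on the printed sizes, the split can be worked out by hand. This small sketch assumes that `train_test_split` rounds the test set size up when `test_size` is given as a fraction, which is our understanding of scikit-learn's behavior.

```python
import math

n_samples = 5856  # total number of images in the dataset
test_size = 0.2   # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # test size is rounded up (assumption)
n_train = n_samples - n_test

print("Expected x_train shape:", (n_train, 32, 32))
print("Expected x_test shape: ", (n_test, 32, 32))
```

Under that assumption, we expect 4,684 training samples and 1,172 test samples.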

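Then, we start the search for the best hyperparameter configuration. All the arguments passed to `tuner.search()` are forwarded to `model.fit()` in each execution. We also pass our test split as `validation_data`, so `val_accuracy` is computed on it. Since the same split is used again for the final evaluation, that final score is likely to be somewhat optimistic; a separate validation split would give a cleaner estimate.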

In [None]:
tuner.search(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

Trial 5 Complete [00h 00m 22s]
val_accuracy: 0.717576801776886

Best val_accuracy So Far: 0.9343003332614899
Total elapsed time: 00h 02m 08s


Now, we can inspect the results of the entire search.

In [None]:
# Print a summary of the search results
tuner.results_summary()

Results summary
Results in /content/keras_tuner
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 2 summary
Hyperparameters:
num_layers: 2
units_0: 384
activation: relu
units_1: 128
dropout: False
lr: 0.00022141149517017307
Score: 0.9343003332614899

Trial 1 summary
Hyperparameters:
num_layers: 2
units_0: 32
activation: tanh
units_1: 256
dropout: True
lr: 0.00013088220863233485
Score: 0.8634812235832214

Trial 3 summary
Hyperparameters:
num_layers: 2
units_0: 416
activation: relu
units_1: 384
dropout: True
lr: 0.003788350537374789
Score: 0.8255119621753693

Trial 0 summary
Hyperparameters:
num_layers: 2
units_0: 256
activation: tanh
units_1: 160
dropout: False
lr: 0.0007323512392313405
Score: 0.717576801776886

Trial 4 summary
Hyperparameters:
num_layers: 2
units_0: 256
activation: relu
units_1: 32
dropout: True
lr: 0.0027547362874968547
Score: 0.717576801776886
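
The summary lists trials from best to worst according to the objective. The same ordering can be reproduced from the scores recorded above with a plain sort; the dictionary below simply copies those scores, and the variable names are ours.

```python
# Scores copied from the results summary above (objective: val_accuracy, direction: max).
trial_scores = {
    "Trial 0": 0.717576801776886,
    "Trial 1": 0.8634812235832214,
    "Trial 2": 0.9343003332614899,
    "Trial 3": 0.8255119621753693,
    "Trial 4": 0.717576801776886,
}

# Sort from highest to lowest score; Python's sort is stable, so ties keep their order.
ranked = sorted(trial_scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.4f}")

best_trial, best_score = ranked[0]
```

Here `best_trial` is `"Trial 2"`, the configuration whose hyperparameters come first when we ask the tuner for the best ones.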


Now, we can retrain a model using the best hyperparameters we found in our search! 🚀

In [None]:
# Get the two best hyperparameter configurations
best_hps = tuner.get_best_hyperparameters(2)

# Build the model with the best configuration
model = build_model(best_hps[0])

# Train the model
model.fit(x_train, y_train, epochs=10)

# Evaluate the model
test_loss_score, test_acc_score = model.evaluate(x_test, y_test)

print(f'Final Loss: {test_loss_score:.2f}.')
print(f'Final Performance: {test_acc_score * 100:.2f} %.')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Final Loss: 0.38.
Final Performance: 93.17 %.


Hyperparameter optimization lets us explore hyperparameter combinations systematically instead of by hand. In the search above, for example, validation accuracy ranged from about 0.72 to 0.93 across just five trials. Without proper tuning, a model may underperform or fail to learn effectively, even with a well-curated dataset and a sensible architecture.

Overall, hyperparameter optimization is a practical way to get more out of a model, and tools like `KerasTuner` make it easy to add to a Keras workflow. 🙃
