# Optimization of the hyperparameters of a Neural Network with Keras

### Level: Intermediate

In this notebook, we show how to use Keras Tuner to optimize the hyperparameters of an artificial neural network (ANN).

First, we create a basic neural network for this task. Then, we address how to optimize some inner and outer hyperparameters.

References to useful documentation on this topic are listed at the end of this notebook.

#### Dependencies

In [1]:
import keras
import keras_tuner as kt
import pandas as pd
from sklearn.model_selection import train_test_split


#### Data

Load the data and split it into train and test sets. If the Parquet files are not found, a small toy dataset is used instead.

In [2]:
try:
    X = pd.read_parquet("features.parquet")
    y = pd.read_parquet("targets.parquet")
except FileNotFoundError:
    X = pd.DataFrame(
        [
            [0, 0],
            [0, 1],
            [1, 0],
            [1, 0],
            [0, 1],
            [0, 1],
            [1, 1],
            [1, 0],
            [1, 1],
            [0, 0],
            [0, 0],
        ]
    )
    y = pd.Series([0, 1, 2, 2, 1, 1, 3, 2, 3, 0, 0])


# we set a fixed random state for reproducibility and teaching purposes,
# but our results have to be consistent across multiple seeds to be relevant
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.9, test_size=0.1, random_state=0
)

# also fix keras random seed
keras.utils.set_random_seed(0)

# shape of input features and number of output units
features_shape = X_train.iloc[0].shape
# `Dense` expects an integer number of units: one per target column
target_shape = y_train.shape[1] if y_train.ndim > 1 else 1
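
With the 11-sample fallback dataset the resulting splits are very small, which is worth keeping in mind when reading the metrics in the later cells (they also pass `validation_split=0.1` to Keras). The following back-of-the-envelope sketch estimates the split sizes. It assumes that scikit-learn rounds the test size up and the train size down, and that Keras rounds the fitting part down before holding out the rest for validation; the function name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples, train_size=0.9, test_size=0.1, validation_split=0.1):
    """Estimate how many samples end up in each split (rounding rules are assumptions)."""
    n_test = math.ceil(test_size * n_samples)  # assumed: test size rounded up
    n_train = math.floor(train_size * n_samples)  # assumed: train size rounded down
    n_fit = math.floor(n_train * (1.0 - validation_split))  # samples used for fitting
    n_val = n_train - n_fit  # samples held out for validation
    return {"fit": n_fit, "validation": n_val, "test": n_test}


sizes = split_sizes(11)  # 11 is the size of the fallback dataset defined above
print(sizes)
```

Under these assumptions the fallback data yields 8 fitting samples, 1 validation sample and 2 test samples, so any validation metric is computed on a single sample and is very noisy.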

#### Some parameters

In [3]:
# model optimizer
loss = keras.losses.MeanAbsoluteError()  # or "mae"
lr = {"min_value": 1e-4, "max_value": 1e-2}  # values from 0.0001 to 0.01
metrics = ["accuracy"]

# training
epochs = 200
batch_size = 64

# hp tuner
max_trials = 50

# early stopping
monitor = "val_loss"
patience = int(0.1 * epochs)
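
To make the role of `monitor` and `patience` concrete, here is a minimal pure-Python sketch of the stopping rule, assuming training stops once the monitored value has not improved for `patience` consecutive epochs. It is an illustration, not Keras' actual implementation; `early_stopping_epoch` and the sample losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # no improvement for `patience` epochs
    return len(val_losses)  # the stopping rule never triggered


# hypothetical validation losses, for illustration only
sample_losses = [1.0, 0.5, 0.6, 0.7, 0.4]
print(early_stopping_epoch(sample_losses, patience=2))
print(early_stopping_epoch(sample_losses, patience=3))
```

With `epochs = 200`, the setting above gives `patience = 20`, so a run is only interrupted after 20 epochs without a better `val_loss`.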

#### Neural Network model

We use the `Functional API` inside a function to define a model for the hyperparameter search [1]. The function receives the hyperparameter container `hp` as its argument.

In this example, we only optimize the learning rate, that is, how much the NN parameters change at each training step.
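
As a rough illustration of why the learning rate matters, the sketch below runs plain gradient descent on a one-dimensional quadratic with the two bounds of the search range used in this notebook. This is a toy problem with a hypothetical helper `gradient_descent`, not the Nadam optimizer used by the model.

```python
def gradient_descent(learning_rate, steps=100, start=0.0, target=3.0):
    """Minimize (w - target)**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= learning_rate * gradient  # the learning rate scales each update
    return w


for learning_rate in (1e-4, 1e-2):  # bounds of the search range in this notebook
    w = gradient_descent(learning_rate)
    print(f"lr={learning_rate:g}: w={w:.3f} (optimum at 3.000)")
```

After the same number of steps, the larger learning rate ends much closer to the optimum on this toy problem. On real losses, a value that is too large can also make training unstable, which is why the value is tuned rather than simply maximized.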

In [4]:
def model_builder(hp):
    """Build a neural network model."""
    input_layer = keras.Input(shape=features_shape)
    inner_layer_1 = keras.layers.Dense(64, activation="selu")(input_layer)
    inner_layer_2 = keras.layers.Dense(32, activation="selu")(inner_layer_1)
    inner_layer_3 = keras.layers.Dense(16, activation="selu")(inner_layer_2)
    output_layer = keras.layers.Dense(target_shape)(inner_layer_3)

    model = keras.Model(inputs=input_layer, outputs=output_layer, name="NN_model")

    # Tune the learning rate for the optimizer, choose an optimal value [2]
    hp_learning_rate = hp.Float(
        "learning_rate", min_value=lr["min_value"], max_value=lr["max_value"]
    )

    model.compile(
        loss=loss,
        optimizer=keras.optimizers.Nadam(learning_rate=hp_learning_rate),
        metrics=metrics,
    )

    return model
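
To get a feeling for the size of this network, the number of trainable parameters of a stack of `Dense` layers can be counted by hand: each layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 2 input features and 1 output unit, as in the fallback dataset; `dense_parameter_count` is a hypothetical helper.

```python
def dense_parameter_count(n_inputs, layer_units):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += (n_inputs + 1) * units  # weights plus one bias per unit
        n_inputs = units  # outputs of this layer feed the next one
    return total


# assumption: 2 input features and 1 output, as in the fallback dataset
n_parameters = dense_parameter_count(n_inputs=2, layer_units=[64, 32, 16, 1])
print(n_parameters)
```

Under these assumptions the model has 2817 trainable parameters, far more than the number of samples in the fallback dataset, so that dataset is only suitable for checking that the code runs.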

#### Inner Hyperparameter optimization

Once the model is defined, select the tuning method [3] and run the search.

In [5]:
# we set a fixed seed for reproducibility and teaching purposes, but our
# results have to be consistent across multiple seeds to be relevant. This can
# be checked by not fixing `seed` and setting `executions_per_trial` to more
# than one
tuner = kt.GridSearch(
    hypermodel=model_builder,  # function that takes hyperparameters and returns a Model instance
    objective=metrics,  # metric(s) to optimize; the direction is inferred for built-in metric names
    max_trials=max_trials,  # maximum number of model configurations to test
    tune_new_entries=True,  # if hyperparameter entries requested by the hypermodel should be added to the search space
    allow_new_entries=True,  # if the hypermodel may request entries not listed in `hyperparameters`
    seed=0,
    project_name="KerasTuner",  # prefix for files saved by this Tuner, which are checkpoints for each model configuration
    # executions_per_trial=10,
)
# NOTE: if hyperparameter search is executed again, Keras Tuner will use
# `project_name` saved files to resume the search. To avoid that, set
# `overwrite=True`.

# set an early stopping for training
callbacks = [keras.callbacks.EarlyStopping(monitor=monitor, patience=patience)]

# finally tune the hyperparameters, using 10 percent of the train data for
# validation. Input arguments are the same as those for keras.Model.fit [4]
tuner.search(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1,
    callbacks=callbacks,
)

# print search space summary
tuner.search_space_summary(extended=False)

# get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(
    f"""\nThe hyperparameter search is complete. The optimal learning 
    rate for the optimizer is {best_hps.get('learning_rate')}."""
)

Reloading Tuner from ./KerasTuner/tuner0.json
Search space summary
Default search space size: 1
learning_rate (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'linear'}

The hyperparameter search is complete. The optimal learning 
    rate for the optimizer is 0.008515000000000002.
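
The reported optimum is one of a small set of candidate values, because a grid search has to discretize a `Float` range that has no `step`. The sketch below assumes that the range is split into 10 equal-width bins and that the center of each bin is tried; this is consistent with the value reported above, but it is an assumption that should be checked against the installed Keras Tuner version. The helper `linear_grid` is hypothetical.

```python
def linear_grid(min_value, max_value, n_points=10):
    """Return the centers of `n_points` equal-width bins between the two bounds."""
    width = max_value - min_value
    return [min_value + (i + 0.5) / n_points * width for i in range(n_points)]


# same bounds as the `learning_rate` entry in the search space summary
candidates = linear_grid(1e-4, 1e-2)
for value in candidates:
    print(f"{value:.6f}")
```

If this assumption holds, only 10 configurations exist for a single `Float` hyperparameter, so the limit `max_trials=50` is never reached. The `step` and `sampling` entries shown in the search space summary are the settings that control this discretization.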




#### Outer Hyperparameter optimization

Once the optimal inner hyperparameters are found, use them to train the model and find the optimal outer hyperparameters, such as the number of training epochs.

This epoch optimization can be safely skipped if, for example, early stopping is considered enough to avoid a loss of performance, or if the model is saved at different epochs during training [5].

In [6]:
# build the model with the optimal hyperparameters
model = tuner.hypermodel.build(best_hps)

# train the model again
history = model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1,
    callbacks=callbacks,
)

# obtain the epoch with the best loss
val_loss_per_epoch = history.history[monitor]
best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
print("\nBest epoch: %d" % (best_epoch,))
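
As a worked example of the selection above, the snippet below applies the same expression to the first six `val_loss` values printed in the log of this cell. The full run continues beyond these epochs, so the result is only illustrative.

```python
# first six `val_loss` values from the training log of this cell
val_loss_per_epoch = [0.1571, 0.6839, 0.7806, 0.0344, 0.0587, 0.5108]

# `list.index` returns the first position of the minimum; epochs are 1-based
best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
print(best_epoch)
```

Among these six epochs the lowest validation loss is 0.0344, at epoch 4. If several epochs share the minimum, the earliest one is selected.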

Epoch 1/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step - accuracy: 0.2500 - loss: 1.4377 - val_accuracy: 1.0000 - val_loss: 0.1571
Epoch 2/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step - accuracy: 0.3750 - loss: 0.3712 - val_accuracy: 1.0000 - val_loss: 0.6839
Epoch 3/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 90ms/step - accuracy: 0.3750 - loss: 0.2796 - val_accuracy: 1.0000 - val_loss: 0.7806
Epoch 4/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step - accuracy: 0.3750 - loss: 0.7899 - val_accuracy: 1.0000 - val_loss: 0.0344
Epoch 5/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 88ms/step - accuracy: 0.3750 - loss: 0.0744 - val_accuracy: 1.0000 - val_loss: 0.0587
Epoch 6/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 92ms/step - accuracy: 0.3750 - loss: 0.5230 - val_accuracy: 0.0000e+00 - val_loss: 0.5108
Epoch 7/200
1/1 ━━━━━━━━━

#### Last train with optimized hyperparameters

Finally, train the final version of the model with the optimized inner and outer hyperparameters, then evaluate and save it.

In [7]:
# build again the model with the optimal hyperparameters
hypermodel = tuner.hypermodel.build(best_hps)

# retrain the model for the last time
hypermodel.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=best_epoch,
    validation_split=0.1,
    callbacks=callbacks,
)

score = hypermodel.evaluate(X_test, y_test, verbose=0)
print(f"\n test {hypermodel.metrics_names} : {score}")

# save the trained model (`save` returns None, so its result is not assigned)
hypermodel.save("../models/model__Hp_optimization_NN_Keras.keras")

Epoch 1/19
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step - accuracy: 0.2500 - loss: 1.5043 - val_accuracy: 1.0000 - val_loss: 0.2221
Epoch 2/19
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.3750 - loss: 0.4666 - val_accuracy: 1.0000 - val_loss: 0.3304
Epoch 3/19
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.3750 - loss: 0.5005 - val_accuracy: 0.0000e+00 - val_loss: 0.6251
Epoch 4/19
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.2500 - loss: 0.4581 - val_accuracy: 1.0000 - val_loss: 0.0041
Epoch 5/19
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step - accuracy: 0.3750 - loss: 0.1805 - val_accuracy: 1.0000 - val_loss: 0.7392
Epoch 6/19
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 92ms/step - accuracy: 0.3750 - loss: 0.1895 - val_accuracy: 1.0000 - val_loss: 0.2253
Epoch 7/19
1/1 ━━━━━━━━━━━━━━━━

#### References

.. [1] https://www.tensorflow.org/tutorials/keras/keras_tuner?hl=es-419 

.. [2] https://keras.io/api/keras_tuner/hyperparameters/

.. [3] https://keras.io/api/keras_tuner/tuners/

.. [4] https://keras.io/api/keras_tuner/tuners/base_tuner/ 

.. [5] https://keras.io/api/callbacks/model_checkpoint/ 