### Introduction to Keras Tuner

Keras Tuner is a library for hyperparameter tuning of Keras models. Hyperparameter tuning is the process of searching for the best set of hyperparameters, the settings that are fixed before training (such as the learning rate or the number of layers) rather than learned from data. This matters because the choice of hyperparameters can significantly affect a model's performance.

#### What Can Be Tuned?

With Keras Tuner, you can tune virtually any aspect of a model. Common hyperparameters include:

1. **Learning Rate:** One of the most critical hyperparameters for training neural networks.
2. **Number of Layers:** The depth of the network.
3. **Number of Neurons in Each Layer:** Influences the model's capacity.
4. **Activation Functions:** Such as ReLU, sigmoid, or tanh.
5. **Batch Size:** Number of samples processed before the model is updated.
6. **Optimizer:** Such as Adam, SGD, etc.
7. **Dropout Rate:** Used in dropout layers to prevent overfitting.

#### Types of Tuners Available

Keras Tuner offers several algorithms for hyperparameter optimization:

1. **Random Search:** Randomly selects a combination of hyperparameters to construct a model. This method is simple and can be surprisingly effective.

2. **Hyperband:** A bandit-based extension of random search that gives more training epochs to promising trials and stops underperforming trials early, which usually makes it more efficient.

3. **Bayesian Optimization:** Fits a probabilistic model (a Gaussian process) to the results of past trials and uses it to decide which hyperparameters to try next. It is more systematic than random search and often finds good values in fewer trials.

4. **Sklearn:** A tuner for scikit-learn models rather than Keras models. `kt.SklearnTuner` pairs one of the search algorithms above with cross-validation to score each candidate.
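
To make the random search idea concrete, here is a minimal pure-Python sketch that does not use Keras Tuner at all. The search space, the `toy_score` function, and the `random_search` helper are hypothetical stand-ins chosen for illustration; in a real search the score would come from training and validating a model.

```python
import math
import random

def sample_config(rng):
    """Draw one random combination of hyperparameters (hypothetical search space)."""
    return {
        "n_hidden": rng.randint(0, 8),
        "n_neurons": rng.randint(16, 256),
        "learning_rate": rng.choice([1e-4, 1e-3, 1e-2]),
    }

def toy_score(config):
    """Stand-in for validation accuracy; it peaks at lr=1e-3 and 3 hidden layers."""
    lr_penalty = (math.log10(config["learning_rate"]) + 3) ** 2
    depth_penalty = 0.01 * (config["n_hidden"] - 3) ** 2
    return 1.0 - 0.1 * lr_penalty - depth_penalty

def random_search(score_fn, max_trials, seed=42):
    """Evaluate max_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(max_trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(toy_score, max_trials=20)
print(best_config, round(best_score, 4))
```

Each trial is independent of the others, which is why random search is simple to run and to resume, but also why it cannot learn from earlier trials the way Bayesian optimization does.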

### Example: Hyperparameter Tuning with Fashion MNIST

#### Fashion MNIST Dataset

We'll demo the hyperparameter tuning using the Fashion MNIST data.


In [1]:
# If necessary, install keras tuner
%pip install keras_tuner

Collecting keras_tuner
  Obtaining dependency information for keras_tuner from https://files.pythonhosted.org/packages/2b/39/21f819fcda657c37519cf817ca1cd03a8a025262aad360876d2a971d38b3/keras_tuner-1.4.6-py3-none-any.whl.metadata
  Downloading keras_tuner-1.4.6-py3-none-any.whl.metadata (5.4 kB)
Collecting kt-legacy (from keras_tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Downloading keras_tuner-1.4.6-py3-none-any.whl (128 kB)
Installing collected packages: kt-legacy, keras_tuner
Successfully installed keras_tuner-1.4.6 kt-legacy-1.0.5

In [8]:
import tensorflow as tf

#Load the data
fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]

#Clear the session for a fresh start
tf.keras.backend.clear_session()
tf.random.set_seed(42)


import keras_tuner as kt

def build_model(hp):
    # Set up parameters in the "hp" object, which is passed in by the keras tuner when iterating through
    # different models
    n_hidden = hp.Int("n_hidden", min_value=0, max_value=8, default=2)
    n_neurons = hp.Int("n_neurons", min_value=16, max_value=256)

    # sampling="log" draws values uniformly on a log scale, so each order of magnitude is equally likely
    learning_rate = hp.Float("learning_rate", min_value=1e-4, max_value=1e-2,
                             sampling="log")
    optimizer = hp.Choice("optimizer", values=["sgd", "adam"])
    if optimizer == "sgd":
        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
    else:
        optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    # The following builds the model based on the values sampled from the hp object
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten())
    for _ in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
                  metrics=["accuracy"])
    return model
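
The effect of `sampling="log"` on the learning rate can be illustrated with a short pure-Python sketch. This is only the idea, not Keras Tuner's internal implementation; the helper names `sample_linear` and `sample_log` are hypothetical.

```python
import math
import random

def sample_linear(rng, low, high):
    """Uniform sample on a linear scale."""
    return rng.uniform(low, high)

def sample_log(rng, low, high):
    """Uniform sample on a log scale: draw uniformly in log10 space, then exponentiate."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

rng = random.Random(42)
low, high = 1e-4, 1e-2
linear = [sample_linear(rng, low, high) for _ in range(10_000)]
logged = [sample_log(rng, low, high) for _ in range(10_000)]

# Fraction of samples that fall in the lower decade [1e-4, 1e-3).
frac_linear = sum(x < 1e-3 for x in linear) / len(linear)
frac_log = sum(x < 1e-3 for x in logged) / len(logged)
print(f"linear: {frac_linear:.2f}, log: {frac_log:.2f}")
```

With linear sampling only about 9% of the draws land below 1e-3, whereas log sampling puts roughly half of them there, so small learning rates get a fair share of the trials.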

Now, to do a search, you initialize the tuner by passing in the `build_model` function.

In [9]:
# Note that max_trials=5, so only 5 hyperparameter combinations will be tried
random_search_tuner = kt.RandomSearch(
    build_model, objective="val_accuracy", max_trials=5, overwrite=True,
    directory="my_fashion_mnist", project_name="my_rnd_search", seed=42)

# Finally, this kicks it all off
random_search_tuner.search(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

Trial 5 Complete [00h 00m 28s]
val_accuracy: 0.8370000123977661

Best val_accuracy So Far: 0.8583999872207642
Total elapsed time: 00h 02m 24s


In each trial, the tuner builds a model using hyperparameters sampled randomly within their respective ranges, trains that model for 10 epochs, and saves it to a subdirectory of the `my_fashion_mnist/my_rnd_search` directory. Since `overwrite=True`, the `my_rnd_search` directory is deleted before training starts. If you run this code a second time with `overwrite=False` and `max_trials=10`, the tuner continues where it left off and runs 5 more trials, so you don't have to run all the trials in one shot. Lastly, since `objective` is set to `"val_accuracy"`, the tuner prefers models with a higher validation accuracy. Once the search has finished, you can get the best models like this:

In [11]:
top3_models = random_search_tuner.get_best_models(num_models=3) 
best_model = top3_models[0]



Or you can get the best hyperparameters directly:

In [12]:
top3_params = random_search_tuner.get_best_hyperparameters(num_trials=3)
top3_params[0].values # best hyperparameter values

{'n_hidden': 7,
 'n_neurons': 100,
 'learning_rate': 0.0012482904754698163,
 'optimizer': 'sgd'}

You can also get a richer summary by querying the "oracle" associated with a tuner:

In [14]:
best_trial = random_search_tuner.oracle.get_best_trials(num_trials=1)[0]
best_trial.summary()

Trial 1 summary
Hyperparameters:
n_hidden: 7
n_neurons: 100
learning_rate: 0.0012482904754698163
optimizer: sgd
Score: 0.8583999872207642


You can also take your best-performing model and continue training it on the full training set if you'd like:

In [15]:

best_model.fit(X_train_full, y_train_full, epochs=10) 
test_loss, test_accuracy = best_model.evaluate(X_test, y_test)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### Tuning preprocessing

It is also possible to use Keras Tuner to explore different preprocessing strategies. To do this, subclass `kt.HyperModel` and define its `build()` and `fit()` methods. The `fit()` method takes a hyperparameters object and a compiled model, along with all the other arguments to `model.fit()`, and returns a history object. It is the place to decide how to preprocess the data, tweak the batch size, and so on. For instance:


In [16]:

class MyClassificationHyperModel(kt.HyperModel): 
    def build(self, hp):
        # Just using the build_model function we defined previously
        return build_model(hp)
    
    def fit(self, hp, model, X, y, **kwargs): 
        # Here, we decide whether or not to normalize the inputs based on a hyperparameter setting.
        if hp.Boolean("normalize"):
            # Note: the Normalization "layer" here is just being used to normalize the data. It is not actually added to the network.
            # This means that it is only applied to the training data! We'd probably want it as part of the model in a production scenario.
            norm_layer = tf.keras.layers.Normalization()
            X = norm_layer(X)
        return model.fit(X, y, **kwargs)
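
The caveat in the comments above, that the normalization only touches the training data, can be illustrated with a small pure-Python sketch. This is not Keras code, and the helpers `fit_normalizer` and `normalize` are hypothetical. The point is that the statistics are learned once from the training set and then reused unchanged for validation (and test) data.

```python
import statistics

def fit_normalizer(train_values):
    """Learn the mean and standard deviation from the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return mean, std

def normalize(values, mean, std, eps=1e-7):
    """Apply previously learned statistics; eps guards against division by zero."""
    return [(v - mean) / (std + eps) for v in values]

train_pixels = [0.0, 51.0, 102.0, 153.0, 204.0, 255.0]  # toy pixel intensities
valid_pixels = [25.0, 128.0, 230.0]

mean, std = fit_normalizer(train_pixels)
train_norm = normalize(train_pixels, mean, std)
valid_norm = normalize(valid_pixels, mean, std)  # same statistics as for training
print(mean, round(std, 2))
```

Putting the normalization inside the model achieves the same thing automatically, since every input then passes through the same learned statistics.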

Now we can pass this to a tuner of our choice. We'll use the Hyperband tuner, which trains many models for a few epochs, discards the worst ones, and continues training only the top 1/`factor` of them, repeating until a single model remains.

In [17]:
hyperband_tuner = kt.Hyperband(
    MyClassificationHyperModel(), objective="val_accuracy", seed=42,
    max_epochs=10, factor=3, hyperband_iterations=2,
    overwrite=True, directory="my_fashion_mnist", project_name="hyperband")
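
The "train briefly, keep the top 1/factor, train longer" loop can be sketched in plain Python. This is a simplified, single-bracket version of successive halving, not Keras Tuner's actual implementation (real Hyperband runs several such brackets and resumes training instead of rescoring from scratch). The `toy_score` function and the `quality` field are hypothetical.

```python
import random

def successive_halving(configs, score_fn, min_epochs=1, max_epochs=9, factor=3):
    """Run one successive-halving bracket; return the survivors and a log of rounds."""
    survivors = list(configs)
    epochs = min_epochs
    rounds = []
    while True:
        # "Train" every surviving model for the current budget and rank them.
        scored = sorted(survivors, key=lambda c: score_fn(c, epochs), reverse=True)
        rounds.append((epochs, len(scored)))
        if epochs >= max_epochs or len(scored) == 1:
            return scored, rounds
        # Keep the top 1/factor of the models (at least one) and give them more epochs.
        keep = max(1, len(scored) // factor)
        survivors = scored[:keep]
        epochs = min(max_epochs, epochs * factor)

def toy_score(config, epochs):
    """Hypothetical validation score: better configs score higher, more epochs help."""
    return config["quality"] * (1.0 - 0.5 ** epochs)

rng = random.Random(42)
configs = [{"id": i, "quality": rng.random()} for i in range(9)]
winners, rounds = successive_halving(configs, toy_score)
print(rounds)  # (epochs, number of models trained) per round: [(1, 9), (3, 3), (9, 1)]
print(winners[0])
```

Most of the budget goes to the few configurations that looked good early, which is where the efficiency gain over plain random search comes from.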

We can also use TensorBoard here to analyze our results for each of the different trials.

In [20]:
from pathlib import Path 
root_logdir = Path(hyperband_tuner.project_dir) / "tensorboard" 
tensorboard_cb = tf.keras.callbacks.TensorBoard(root_logdir) 
#Also using an early stopping callback here
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=2) 
# Finally, search
hyperband_tuner.search(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid), callbacks=[early_stopping_cb, tensorboard_cb])

Trial 60 Complete [00h 00m 42s]
val_accuracy: 0.8479999899864197

Best val_accuracy So Far: 0.8804000020027161
Total elapsed time: 00h 16m 16s


Run tensorboard...

In [22]:
%load_ext tensorboard 
%tensorboard --logdir=my_fashion_mnist/hyperband/tensorboard 
from IPython.display import display, HTML
# Display this inline here
display(HTML('<a href="http://localhost:6006/">http://localhost:6006/</a>'))

Launching TensorBoard...

### Guide on Selecting Hyperparameters

Selecting the right hyperparameters is crucial for training effective neural network models. Here's a quick guide covering key hyperparameters, along with rules of thumb and intuitions for training:

#### 1. Number of Hidden Layers

- **General Rule:** Start with one or two hidden layers for simple problems. For more complex problems, gradually increase the number of layers.
- **Deep Networks:** More layers allow the network to learn more complex patterns (hierarchical feature learning). However, deeper networks are harder to train and more prone to overfitting.
- **Transfer Learning:** Re-using hidden layers from a pre-trained model can significantly boost performance, especially when data is limited. This is effective when the new task is similar to the task originally trained on.

#### 2. Number of Neurons per Hidden Layer

- **Size Configurations:** Often configured in a funnel shape (e.g., decreasing number of neurons in each successive layer).
- **Capacity of the Model:** More neurons increase the learning capacity but can lead to overfitting and longer training times.
- **Rule of Thumb:** A common practice is to use a number of neurons that is between the number of input and output neurons, experimenting with values to find the optimal size.

#### 3. Learning Rate

- **Impact:** One of the most critical hyperparameters. Too low, and training will be slow; too high, and the network may not converge.
- **Learning Rate Annealing:** Gradually reducing the learning rate during training can be very effective: larger steps make fast progress early on, and smaller steps help the model settle later.
- **Optimization Techniques:** Upcoming topics like learning rate schedules and adaptive learning rate methods (e.g., Adam, RMSprop) offer more nuanced control.
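
As a rough illustration of annealing, here is a minimal sketch of an exponential decay schedule. The function name and the specific values (`initial_lr=0.01`, a tenfold drop every 1,000 steps) are assumptions chosen for the example.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

steps = (0, 500, 1000, 2000)
schedule = [exponential_decay(0.01, 0.1, 1000, s) for s in steps]
for s, lr in zip(steps, schedule):
    print(f"step {s:>4}: lr = {lr:.6f}")
```

The learning rate starts at 0.01 and shrinks by a factor of 10 every 1,000 steps, so training takes large steps at first and progressively finer ones later.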

#### 4. Optimizer

- **Role of Optimizer:** Determines how the network will be updated based on the loss gradient. It affects the speed and quality of the learning process.
- **Choices:** Common optimizers include SGD (Stochastic Gradient Descent), Adam, RMSprop. Adam is a good default choice, as it combines the benefits of other extensions of SGD.
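
To show what "how the network is updated based on the loss gradient" means in practice, the sketch below applies plain gradient descent and Adam to a toy one-dimensional loss, f(w) = (w - 3)^2. It is a simplified illustration, not the Keras implementation; the loss, the starting point, and the learning rate of 0.1 are assumptions, while `beta1`, `beta2`, and `eps` are the commonly cited Adam defaults.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=500):
    """Plain gradient descent: step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: running averages of the gradient (m) and squared gradient (v) scale each step."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd(0.0)
w_adam = adam(0.0)
print(round(w_sgd, 4), round(w_adam, 4))
```

Both optimizers end up near the minimum at w = 3 on this easy problem. The difference that matters on real networks is that Adam rescales each parameter's step by its gradient history, which tends to make it less sensitive to the learning rate choice.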

#### 5. Batch Size

- **Trade-offs:** Larger batches provide more accurate estimates of the gradient, whereas smaller batches produce noisier, less stable updates that act as a regularizer and can help the optimizer escape poor local minima.
- **Hardware Constraints:** Limited by memory constraints. Larger batches require more memory.
- **Typical Values:** Common batch sizes range from 32 to 256, but the optimal size depends on the specific problem and hardware.
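
The claim that larger batches give more accurate gradient estimates can be checked with a small simulation. The per-example "gradients" below are synthetic Gaussian noise around a true value of 1.0, an assumption made purely for illustration.

```python
import random
import statistics

rng = random.Random(0)
# Toy per-example gradients: true mean 1.0 with a lot of noise.
per_example_grads = [rng.gauss(1.0, 2.0) for _ in range(50_000)]

def batch_gradient_std(batch_size, n_batches=200):
    """Standard deviation of the mini-batch gradient estimate across random batches."""
    estimates = [
        statistics.fmean(rng.sample(per_example_grads, batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(estimates)

noise = {bs: batch_gradient_std(bs) for bs in (32, 256)}
for bs, std in noise.items():
    print(f"batch size {bs:>3}: std of gradient estimate = {std:.3f}")
```

The spread of the estimate shrinks roughly with the square root of the batch size, so going from 32 to 256 samples cuts the noise by a factor of about 2.8 while costing 8 times more computation per update.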

#### 6. Activation Function

- **Non-linearity:** Functions like ReLU (and its variants like Leaky ReLU) are common because they help with faster training and reduce the likelihood of vanishing gradients in deep networks.
- **Problem Specific:** For the output layer, use softmax for multi-class classification and sigmoid for binary classification.
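
For reference, the two output activations mentioned above can be written in a few lines of plain Python. This is a sketch for intuition only; in a Keras model you would simply pass `activation="softmax"` or `activation="sigmoid"` to the output layer.

```python
import math

def sigmoid(z):
    """Squash a single logit into a probability in (0, 1), for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of logits into class probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
p_positive = sigmoid(0.0)
print([round(p, 3) for p in probs], p_positive)
```

Softmax preserves the ordering of the logits while producing a proper probability distribution over the classes, and a logit of 0 under the sigmoid corresponds to a probability of exactly 0.5.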

#### 7. Number of Iterations and Early Stopping

- **Epochs vs. Iterations:** An epoch is a complete pass over the entire training dataset, while an iteration is a single update of the model's parameters.
- **Early Stopping:** A technique to prevent overfitting by stopping training when the model's performance on a validation set starts to degrade.
- **Monitoring Performance:** Adjust the number of epochs based on monitoring performance metrics on a validation set.
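
Both ideas in this list can be made concrete with a short sketch. The first helper counts iterations per epoch for the 55,000-sample training split and a batch size of 32 (the batch size is an assumption; it is the Keras default). The second mimics patience-based early stopping on a made-up list of validation losses. The helper names are hypothetical, and this is not the Keras `EarlyStopping` implementation.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of parameter updates (iterations) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None if it never triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

steps = steps_per_epoch(55_000, 32)
stop_epoch = early_stopping_epoch([0.60, 0.50, 0.45, 0.47, 0.46, 0.44], patience=2)
print(steps, stop_epoch)  # 1719 5
```

Note that training stops at epoch 5 even though epoch 6 would have been better: a small `patience` saves time but can stop too early, so it is worth restoring the best weights and choosing the patience with some care.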

### Conclusion

Selecting hyperparameters is often more art than science, requiring experimentation and iteration. It's common to start with certain defaults and adjust based on the specific problem and observed performance. Transfer learning, adaptive learning rates, and early stopping are advanced strategies that can significantly improve model performance and training efficiency. Remember, the goal is to find a balance that allows the network to learn effectively without overfitting, underfitting, or requiring excessive computational resources.