# Introduction to Deep Learning Hyperparameter Tuning

Deep learning models are highly sensitive to their hyperparameters, which can strongly influence model performance. Hyperparameter tuning is the process of selecting the set of hyperparameters that gives the best performance for a learning algorithm. A hyperparameter is a parameter whose value is set before the learning process begins, unlike the model weights, which are learned during training.

Hyperparameters can broadly be classified into two categories:

- **Model hyperparameters**, which define the model itself, such as the number and width of hidden layers in a neural network.
- **Algorithm hyperparameters**, which influence the speed and quality of the learning algorithm, such as the learning rate or batch size.

The goal of hyperparameter tuning is to find the combination of hyperparameters that yields the best model performance, usually measured on held-out validation data.


### Importance of Hyperparameter Tuning

Hyperparameter tuning is critical in deep learning for several reasons:

1. **Improves Model Performance:** Proper tuning can lead to significant improvements in model accuracy.
2. **Controls Overfitting:** By tuning regularization hyperparameters such as the dropout rate or weight decay, we can control model complexity and mitigate overfitting.
3. **Efficiency:** Optimal hyperparameters can make training faster and more efficient, saving time and computational resources.

### Key Hyperparameters

- **Learning Rate**: Scales the size of each weight update during training. Too small a value slows convergence; too large a value can make training unstable or diverge.
- **Batch Size**: The number of samples processed before the model's weights are updated.
- **Number of Epochs**: The number of times the model iterates through the entire training dataset.
- **Optimizer**: The algorithm used to update the model's weights (e.g., Stochastic Gradient Descent, Adam).
- **Network Architecture**: Number of layers, neurons per layer, activation functions.
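To make the learning-rate entry concrete, the toy sketch below runs plain gradient descent on the one-dimensional loss f(w) = w² (an assumption chosen for illustration, not a neural network). Each step multiplies w by (1 - 2*lr), so a small rate converges slowly, a moderate rate converges quickly, and a rate above 1.0 overshoots and diverges.

```python
def gradient_descent(lr, w0=5.0, steps=20):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # step against the gradient, scaled by the learning rate
    return w


for lr in (0.01, 0.1, 0.5, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4g}")
```

With lr=0.5 this particular loss is minimized in a single step, while lr=1.1 ends farther from the minimum than it started.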

### Techniques for Hyperparameter Tuning

Several techniques exist for hyperparameter tuning, each with its own advantages and trade-offs:

1. **Grid Search:** Exhaustively tries every combination of hyperparameters specified in a grid.
2. **Random Search:** Randomly selects combinations of hyperparameters to try.
3. **Bayesian Optimization:** Uses a probabilistic model to guide the search for the best hyperparameters.
4. **Gradient-based Optimization:** Adjusts continuous hyperparameters using gradients of the validation loss with respect to those hyperparameters, when such gradients can be computed.
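The difference between the first two strategies can be sketched in plain Python. The scoring function below is hypothetical; in practice each evaluation means training and validating a model. Grid search spends its budget on every listed combination, while random search draws the same number of samples from the ranges and can land on values between grid points. Bayesian and gradient-based methods are not shown.

```python
import itertools
import math
import random


def toy_score(lr, units):
    """Hypothetical stand-in for validation accuracy; best at lr=1e-3, units=256."""
    return -((math.log10(lr) + 3) ** 2) - ((units - 256) / 256) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid and return (best, number_evaluated)."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    return max(candidates, key=lambda c: toy_score(**c)), len(candidates)


def random_search(n_trials, seed=0):
    """Sample n_trials random combinations and return the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = {
            "lr": 10 ** rng.uniform(-4, -2),       # log-uniform in [1e-4, 1e-2]
            "units": rng.randrange(32, 513, 32),  # 32, 64, ..., 512
        }
        if best is None or toy_score(**candidate) > toy_score(**best):
            best = candidate
    return best


best_grid, n_evaluated = grid_search({"lr": [1e-4, 1e-3, 1e-2], "units": [64, 256, 512]})
print("grid:", best_grid, "after", n_evaluated, "evaluations")
print("random:", random_search(n_trials=9))
```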

# Hyperparameter Tuning Example with Keras and Keras Tuner

This example demonstrates how to perform hyperparameter tuning for a simple feedforward neural network (FNN) on the MNIST dataset. We use Keras for model construction and Keras Tuner for the tuning process. The goal is to find the best combination of hyperparameters that yields the highest accuracy on the validation set.

## Hyperparameters Tuned

- **Number of Layers (`num_layers`):** The number of hidden dense layers, from 1 to 5. Deeper networks can model more complex patterns but are more computationally expensive and more prone to overfitting.
- **Number of Neurons in Each Layer (`units_0` to `units_4`):** The width of each hidden layer, from 32 to 512 in steps of 32. More neurons can capture more information but also make the network more complex and more prone to overfitting.
- **Optimizer (`optimizer`):** The weight-update algorithm, chosen from Adam, SGD, and RMSprop.
- **Learning Rate (`learning_rate`):** How large a step the optimizer takes when updating the network's parameters, sampled on a log scale between 1e-4 and 1e-2. A well-chosen learning rate can make training faster and more stable.

## Required Libraries

First, make sure TensorFlow (which includes Keras) and Keras Tuner are installed. Keras Tuner can be installed with pip:

```bash
pip install keras-tuner
```

Then import the libraries used in this example:

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
```

### Loading and Preprocessing the Data
The MNIST dataset is a collection of handwritten digits, which we'll use for a classification task.

```python
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1] and flatten each 28x28 image into a 784-vector
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)
```
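The preprocessing relies on each MNIST image being 28 × 28 = 784 pixels with integer values from 0 to 255. The NumPy check below applies the same transformation to a small synthetic batch (random stand-in data, not real MNIST images) to confirm the resulting shape and value range.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for x_train[:5]: five 28x28 uint8 "images"
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

scaled = fake_images.astype("float32") / 255.0  # values now in [0, 1]
flat = scaled.reshape(-1, 784)                  # one 784-long row per image

print(flat.shape)
print(float(flat.min()) >= 0.0, float(flat.max()) <= 1.0)
```

Each row of `flat` is the row-major flattening of one image, which is the input layout the `Input(shape=(784,))` layer expects.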


### Defining the Model
We define a function that constructs a neural network model. This function will be used by Keras Tuner to create different models with various hyperparameters.

```python
def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Input(shape=(784,)))

    # Tune the depth (1 to 5 hidden layers) and the width of each hidden layer.
    for i in range(hp.Int('num_layers', 1, 5)):
        model.add(layers.Dense(units=hp.Int(f'units_{i}',
                                            min_value=32,
                                            max_value=512,
                                            step=32),
                               activation='relu'))
    # Output layer: one probability per digit class.
    model.add(layers.Dense(10, activation='softmax'))

    # Tune the optimizer and its learning rate (sampled on a log scale).
    optimizer_choice = hp.Choice('optimizer', values=['adam', 'sgd', 'rmsprop'])
    learning_rate = hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
    optimizer_classes = {
        'adam': keras.optimizers.Adam,
        'sgd': keras.optimizers.SGD,
        'rmsprop': keras.optimizers.RMSprop,
    }
    optimizer = optimizer_classes[optimizer_choice](learning_rate=learning_rate)

    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```
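The `sampling='log'` argument asks Keras Tuner to sample the learning rate on a logarithmic scale, so each decade between 1e-4 and 1e-2 is equally likely. The plain-Python sketch below illustrates that idea (it is not Keras Tuner's internal implementation): about half of the log-uniform draws fall below 1e-3, whereas uniform sampling over the same interval would put only about 9% of draws there.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(draws, threshold=1e-3):
    """Share of draws smaller than threshold."""
    return sum(d < threshold for d in draws) / len(draws)


rng = random.Random(42)
log_draws = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]
linear_draws = [rng.uniform(1e-4, 1e-2) for _ in range(10_000)]

print(f"log-uniform: {fraction_below(log_draws):.2f}, uniform: {fraction_below(linear_draws):.2f}")
```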

### Tuning the Model
We use Keras Tuner's `RandomSearch` class to search the hyperparameter space. This method randomly samples combinations of hyperparameters, then builds and evaluates a model for each.
For each trial, the tuner trains the model on the training data (`x_train`, `y_train`) for 10 epochs with `validation_split=0.2`: Keras holds out the last 20% of the training samples (12,000 images) as validation data, and the trial is scored by `val_accuracy` on that held-out portion.
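One reason to prefer random search here is the size of the space. The arithmetic below counts only the discrete choices in `build_model` (depth, per-layer width, optimizer) and ignores the continuous learning rate; even so, `max_trials=10` covers a tiny fraction of them.

```python
# Widths allowed by hp.Int('units_i', min_value=32, max_value=512, step=32)
width_choices = len(range(32, 512 + 1, 32))  # 16

# One width choice per hidden layer, for 1 to 5 hidden layers
architectures = sum(width_choices ** n for n in range(1, 6))

optimizers = 3  # adam, sgd, rmsprop
discrete_configs = architectures * optimizers

max_trials = 10
print(f"{width_choices} widths, {architectures:,} architectures, {discrete_configs:,} discrete configurations")
print(f"fraction tried with max_trials={max_trials}: {max_trials / discrete_configs:.1e}")
```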

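```python
tuner = kt.RandomSearch(build_model,
                        objective='val_accuracy',
                        max_trials=10,  # The number of different hyperparameter combinations to try
                        executions_per_trial=1,  # The number of models that should be built and fit for each trial
                        directory='my_dir',
                        project_name='mnist_keras_tuner')

tuner.search(x_train, y_train, epochs=10, validation_split=0.2)
```

```
Reloading Tuner from my_dir/mnist_keras_tuner/tuner0.json
```

This message means the tuner found state saved by an earlier run in `my_dir/mnist_keras_tuner` and reloaded it rather than starting a new search. To discard the saved trials and search from scratch, pass `overwrite=True` to `kt.RandomSearch`.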
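Each trial is a full training run, so the cost of the search grows with `max_trials`. The arithmetic below estimates the number of weight updates, assuming Keras's default `batch_size` of 32 (the `search` call above does not set one) and `executions_per_trial=1`.

```python
import math

N_TOTAL = 60_000           # MNIST training images returned by mnist.load_data()
VALIDATION_FRACTION = 0.2  # validation_split passed to tuner.search
BATCH_SIZE = 32            # assumed: Keras Model.fit default when batch_size is not passed
EPOCHS = 10
MAX_TRIALS = 10

n_val = round(N_TOTAL * VALIDATION_FRACTION)  # images held out for val_accuracy
n_fit = N_TOTAL - n_val                       # images used for weight updates
steps_per_epoch = math.ceil(n_fit / BATCH_SIZE)
updates_per_trial = steps_per_epoch * EPOCHS
total_updates = updates_per_trial * MAX_TRIALS

print(n_fit, steps_per_epoch, updates_per_trial, total_updates)
```

Raising `executions_per_trial` multiplies this total, which is the price of averaging out noise between runs.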


### Reviewing the Results
After the search is complete, we can review the best models and their performances:

```python
# Get the top 3 models.
models = tuner.get_best_models(num_models=3)

# Get the hyperparameters of the best model.
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

# Evaluate the best model on the held-out test set.
best_model = models[0]
loss, accuracy = best_model.evaluate(x_test, y_test)
print(f"Best Loss: {loss}, Best Accuracy: {accuracy}")
```

```
Best Loss: 0.08258586376905441, Best Accuracy: 0.9789999723434448
```


```python
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.values)
```

```
{'num_layers': 1, 'units_0': 448, 'learning_rate': 0.0011107930392350116, 'units_1': 32, 'units_2': 32, 'units_3': 384, 'units_4': 192, 'optimizer': 'adam'}
```


These values are the best hyperparameters Keras Tuner found for the neural network. Each key-value pair in the dictionary has a specific meaning for the model's configuration:

- **`'num_layers': 1`**: The best model has 1 hidden dense layer (excluding the input and output layers). Although the search space allows up to 5 hidden layers, `build_model` only reads the first layer's width when `num_layers` is 1.

- **`'units_0': 448`**: The single hidden layer has 448 neurons. Among the 10 configurations sampled, a wide single layer performed best; with so few trials, this is suggestive rather than conclusive.

- **`'learning_rate': 0.0011107930392350116`**: The learning rate passed to the Adam optimizer. It is close to the Keras default for Adam (0.001), a common, moderate step size for this kind of task.

- **`'units_1': 32, 'units_2': 32, 'units_3': 384, 'units_4': 192`**: Keras Tuner keeps every hyperparameter that any trial registered and assigns it a value in each trial, so these widths appear even though they are unused: with `num_layers` equal to 1, the loop in `build_model` never builds layers 2 to 5. They are artifacts of the search process and have no effect on the best model.

- **`'optimizer': 'adam'`**: Adam was the optimizer in the best trial out of the options tested (adam, sgd, rmsprop). Because only 10 random trials were run, this shows that Adam did best among the sampled configurations, not that it is generally superior for this task.

In summary, the best configuration found has one hidden dense layer with 448 units and uses the Adam optimizer with a learning rate of approximately 0.0011, reaching about 97.9% accuracy on the test set. The unit values for layers beyond the first do not apply to this configuration.
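Because the dictionary mixes active and unused entries, it can help to extract only the settings the best model actually uses. The helper below is a hypothetical convenience function (not part of Keras Tuner); it mirrors the loop in `build_model` by reading `units_i` only for `i < num_layers`, and here it is applied to the values printed above.

```python
def active_config(values):
    """Return only the hyperparameters that build_model actually reads (hypothetical helper)."""
    n_layers = values["num_layers"]
    return {
        "layer_units": [values[f"units_{i}"] for i in range(n_layers)],
        "optimizer": values["optimizer"],
        "learning_rate": values["learning_rate"],
    }


# Copied from the best_hps.values output above
best_values = {'num_layers': 1, 'units_0': 448, 'learning_rate': 0.0011107930392350116,
               'units_1': 32, 'units_2': 32, 'units_3': 384, 'units_4': 192, 'optimizer': 'adam'}

print(active_config(best_values))
# {'layer_units': [448], 'optimizer': 'adam', 'learning_rate': 0.0011107930392350116}
```

In the notebook itself, the same call would take `best_hps.values` directly instead of the copied dictionary.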
