
## Installing Libraries

In [5]:
! pip install keras
! pip install keras-tuner



## Loading libraries

In [6]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from keras_tuner.tuners import RandomSearch, BayesianOptimization
from keras_tuner import HyperParameters
import os
import json

## The different methods for NAS and hyperparameter tuning

1. **Random Search**: Randomly samples hyperparameter configurations or neural network architectures within predefined ranges or search spaces, often serving as a baseline optimization technique.

2. **Grid Search**: Exhaustively evaluates every combination in a manually specified grid of hyperparameter values. It is guaranteed to find the best configuration on that grid, but the number of combinations grows exponentially with the number of hyperparameters, which makes it expensive in high-dimensional spaces.

3. **Bayesian Optimization**: Utilizes probabilistic models, such as Gaussian processes, to intelligently select hyperparameters or architectures based on past evaluations, efficiently balancing exploration and exploitation.

4. **Genetic Algorithms (GA)**: Inspired by natural selection, GA maintains a population of candidate solutions and evolves them over generations using selection, crossover, and mutation operations.

5. **NeuroEvolution of Augmenting Topologies (NEAT)**: A genetic algorithm specifically designed for evolving neural network architectures, starting with simple architectures and evolving them over generations by adding or removing neurons and connections.

6. **Gradient-Based Optimization**: Treats architecture choices or hyperparameters as continuous, differentiable quantities (for example via a continuous relaxation of the search space) and optimizes them with gradient descent variants such as Adam or RMSProp.

7. **Reinforcement Learning (RL)**: Formulates the architecture search problem as a reinforcement learning task, where an agent learns to sequentially select architectural decisions based on rewards obtained from evaluating candidate architectures.

8. **Model-Based Optimization**: Constructs surrogate models of the objective function, such as Bayesian neural networks or Gaussian processes, to guide the search for optimal architectures or hyperparameters.

9. **Evolutionary Strategies**: A family of optimization algorithms inspired by biological evolution, maintaining a population of candidate solutions and iteratively evolving them using mutation and recombination operators.

10. **Meta-Learning**: Utilizes meta-learning techniques to learn optimization algorithms that adaptively search for optimal architectures or hyperparameters across different tasks or datasets.
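
To make the first two entries of the list concrete, here is a minimal sketch of grid search and random search over the same two hyperparameters. The `toy_score` function is a made-up stand-in for a validation metric (it is not part of this notebook), chosen so that the best point is known in advance.

```python
import itertools
import math
import random


def toy_score(learning_rate, units):
    """Hypothetical stand-in for validation accuracy; peaks at lr=1e-3, units=128."""
    return -(math.log10(learning_rate) + 3) ** 2 - ((units - 128) / 64) ** 2


def grid_search(learning_rates, unit_options):
    """Evaluate every combination on the grid; return (best_config, n_evaluations)."""
    configs = list(itertools.product(learning_rates, unit_options))
    best = max(configs, key=lambda cfg: toy_score(*cfg))
    return best, len(configs)


def random_search(n_trials, seed=0):
    """Sample configurations independently; return (best_config, n_evaluations)."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n_trials):
        # Sample the learning rate log-uniformly between 1e-4 and 1e-2.
        learning_rate = 10 ** rng.uniform(-4, -2)
        units = rng.randrange(32, 513, 32)
        configs.append((learning_rate, units))
    best = max(configs, key=lambda cfg: toy_score(*cfg))
    return best, len(configs)


grid_best, grid_evals = grid_search([1e-2, 1e-3, 1e-4], [64, 128, 192])
random_best, random_evals = random_search(n_trials=9)
print("grid:", grid_best, "after", grid_evals, "evaluations")
print("random:", random_best, "after", random_evals, "evaluations")
```

Both searches spend the same budget of nine evaluations. The grid only finds the optimum here because the optimum was placed on a grid point; random search is not restricted to those nine fixed points.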


## Functions and helpers

#### Loading and preparing data

In [7]:
# Load and preprocess the dataset
def load_and_prepare_mnist():
    # Load the MNIST dataset
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

    # Normalize pixel values to between 0 and 1
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # Reshape images to the format (batch_size, height, width, channels)
    train_images = train_images.reshape((-1, 28, 28, 1))
    test_images = test_images.reshape((-1, 28, 28, 1))

    return (train_images, train_labels), (test_images, test_labels)
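
The effect of the two preprocessing steps can be checked on a small synthetic batch without downloading MNIST. This is a minimal sketch: the `fake_images` array is made up for illustration and only mimics the `uint8` layout returned by `mnist.load_data()`.

```python
import numpy as np

# Hypothetical stand-in for a batch of four 28x28 grayscale images (uint8, 0-255).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same two steps as load_and_prepare_mnist(): scale to [0, 1], then add a channel axis.
scaled = fake_images / 255.0
prepared = scaled.reshape((-1, 28, 28, 1))

print(prepared.shape)   # (4, 28, 28, 1)
print(prepared.dtype)   # float64
print(prepared.min() >= 0.0, prepared.max() <= 1.0)
```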

#### Building the model

In [8]:
# Define the model-building function
def build_model(hp):
    model = tf.keras.Sequential()

    # First convolutional layer (always present): tune its number of filters
    model.add(tf.keras.layers.Conv2D(filters=hp.Int('conv_1_filters', min_value=32, max_value=256, step=32),
                                     kernel_size=3,
                                     activation='relu',
                                     input_shape=(28, 28, 1)))

    # Tune the number of additional convolution + max-pooling blocks (1 to 3)
    for i in range(hp.Int('num_conv_layers', 1, 3)):
        model.add(tf.keras.layers.Conv2D(filters=hp.Int(f'conv_{i+2}_filters', min_value=32, max_value=256, step=32),
                                         kernel_size=3,
                                         activation='relu'))
        model.add(tf.keras.layers.MaxPooling2D(pool_size=2))

    model.add(tf.keras.layers.Flatten())

    # Tune the number of dense layers
    for i in range(hp.Int('num_dense_layers', 1, 3)):
        model.add(tf.keras.layers.Dense(units=hp.Int(f'dense_{i}_units', min_value=32, max_value=512, step=32),
                                         activation='relu'))

    # Output layer
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    # Tune learning rate
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model
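
Because every `Conv2D` layer above uses the default `'valid'` padding and each tuned block ends with a 2x2 max-pooling layer, the feature map shrinks quickly. The following sketch is plain arithmetic rather than part of the notebook (`feature_map_size` is a hypothetical helper). It traces the spatial size for each value of `num_conv_layers` to check that the deepest configuration is still valid on 28x28 inputs.

```python
def feature_map_size(num_conv_layers, input_size=28, kernel_size=3, pool_size=2):
    """Spatial size after the first conv layer plus `num_conv_layers` conv+pool blocks."""
    size = input_size - kernel_size + 1      # first Conv2D, 'valid' padding
    for _ in range(num_conv_layers):
        size = size - kernel_size + 1        # Conv2D, 'valid' padding
        size = size // pool_size             # MaxPooling2D, stride equal to pool size
    return size


for n in (1, 2, 3):
    print(n, feature_map_size(n))
# 1 12
# 2 5
# 3 1
```

With three tuned blocks the feature map is already 1x1, so the upper bound of 3 is the deepest this layout allows; a fourth 3x3 convolution would have no valid output.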

#### Printing out the results

In [9]:
def summarize_tuner_attempts(directory):
    tuner_summaries = []

    # Iterate over each entry in `directory`
    for subdir in os.listdir(directory):
        subdir_path = os.path.join(directory, subdir)

        # Check if it's a directory
        if os.path.isdir(subdir_path):
            # Check if it contains a 'trial.json' file
            trial_file = os.path.join(subdir_path, 'trial.json')
            if os.path.exists(trial_file):
                # Load hyperparameters from 'trial.json'
                with open(trial_file, 'r') as f:
                    trial_data = json.load(f)
                hp = HyperParameters.from_config(trial_data['hyperparameters'])

                # Get the validation accuracy from 'trial.json'
                val_accuracy = trial_data.get('score')

                # Add hyperparameters and validation accuracy to the summaries list
                tuner_summaries.append((hp, val_accuracy))

    # Sort tuner summaries by the number of convolutional layers
    tuner_summaries.sort(key=lambda x: x[0].values['num_conv_layers'])

    return tuner_summaries

# Function to print CNN hyperparameters as a table
def print_hyperparameters_table(hp):
    print("Number of Convolutional Layers:", hp.values['num_conv_layers'])
    print("Number of Dense Layers:", hp.values['num_dense_layers'])
    print("Learning Rate:", hp.values['learning_rate'])
    print("\nCNN Hyperparameters:")
    sorted_conv_keys = sorted([key for key in hp.values.keys() if key.startswith('conv')])
    for key in sorted_conv_keys:
        print(f"| {key}: {hp.values[key]} |")
    print("\nDense Layer Hyperparameters:")
    sorted_dense_keys = sorted([key for key in hp.values.keys() if key.startswith('dense')])
    for key in sorted_dense_keys:
        print(f"| {key}: {hp.values[key]} |")
    print("\n")

# Function to print summary table
def print_summary_table(summary):
    print("Summary:")
    print("| Attempt | Accuracy | Num Conv Layers | Num Dense Layers |")
    print("|---------|----------|-----------------|-------------------|")
    for i, (hp, val_accuracy) in enumerate(summary, 1):
        num_conv_layers = hp.values['num_conv_layers']
        num_dense_layers = hp.values['num_dense_layers']
        print(f"| {i} | {val_accuracy} | {num_conv_layers} | {num_dense_layers} |")
    print("\n")
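
Since `trial_data.get('score')` returns `None` if no score was recorded for a trial (for example when a search is interrupted), it can help to filter those entries before ranking. A minimal sketch, where `best_attempt` is a hypothetical helper and `SimpleNamespace` objects stand in for `HyperParameters` instances:

```python
from types import SimpleNamespace


def best_attempt(summaries):
    """Return the (hp, score) pair with the highest score, skipping missing scores."""
    scored = [(hp, score) for hp, score in summaries if score is not None]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])


# Made-up entries in the same (hp, val_accuracy) shape as summarize_tuner_attempts().
example = [
    (SimpleNamespace(values={"num_conv_layers": 1, "num_dense_layers": 2}), 0.97),
    (SimpleNamespace(values={"num_conv_layers": 3, "num_dense_layers": 1}), None),
    (SimpleNamespace(values={"num_conv_layers": 2, "num_dense_layers": 1}), 0.99),
]
hp, score = best_attempt(example)
print(hp.values["num_conv_layers"], score)  # 2 0.99
```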



#### Global parameters

`OPTIM_OBJ`: This variable specifies the metric to optimize during the tuning process. It is a string naming a metric, such as `val_accuracy` or `val_loss`.

`MAX_TRIAL_TUNER`: This variable determines the maximum number of different hyperparameter combinations that will be tested during the tuning process. Once this number is reached, the tuning process stops.

`NUM_EXEC_PER_TRIAL`: This variable specifies the number of executions to run for each trial (i.e., each set of hyperparameters). This is useful for reducing the effect of randomness in the training process.

`RESULT_DIR`: This variable specifies the directory where the tuning results will be saved.

`NUM_EPOCH_SEARCH`: This variable specifies the number of epochs to run in each execution during the tuning process.

`NUM_EPOCH_VAL`: This variable specifies the number of epochs to run when training the best model found by the search.

`VAL_SPLIT`: This variable specifies the fraction of the training data that is held out as the validation set.


In [10]:
# Global parameters
MAX_TRIAL_TUNER = 5
NUM_EXEC_PER_TRIAL = 3
NUM_EPOCH_SEARCH = 5
NUM_EPOCH_VAL = 10
VAL_SPLIT = 0.1
OPTIM_OBJ = 'val_accuracy'
RESULT_DIR = "results"
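
These settings fix the search budget. As a rough back-of-the-envelope check, the cost of a search that passes `executions_per_trial=NUM_EXEC_PER_TRIAL` works out as follows. This assumes the standard 60,000-image MNIST training set (the name `NUM_TRAIN_IMAGES` is introduced here for illustration only).

```python
MAX_TRIAL_TUNER = 5
NUM_EXEC_PER_TRIAL = 3
NUM_EPOCH_SEARCH = 5
VAL_SPLIT = 0.1
NUM_TRAIN_IMAGES = 60_000  # assumed size of the standard MNIST training set

training_runs = MAX_TRIAL_TUNER * NUM_EXEC_PER_TRIAL   # models trained during the search
search_epochs = training_runs * NUM_EPOCH_SEARCH       # total epochs across all runs
val_images = round(NUM_TRAIN_IMAGES * VAL_SPLIT)       # images held out for validation
fit_images = NUM_TRAIN_IMAGES - val_images             # images actually used for fitting

print(training_runs, search_epochs, val_images, fit_images)
# 15 75 6000 54000
```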

## Random Search Optimization

Random search is a fundamental technique in neural architecture search (NAS) and hyperparameter optimization. It samples hyperparameter configurations or network architectures at random from predefined ranges or search spaces. Unlike Bayesian optimization or evolutionary algorithms, it does not use past evaluations to inform subsequent samples. Despite its simplicity, random search is often competitive with more complex methods, particularly when only a few hyperparameters strongly affect performance, and its trials are independent, so they are easy to run in parallel. Its effectiveness depends on the size and structure of the search space and on the trial budget. Random search does not guarantee finding the optimal solution, but it serves as a strong baseline and is widely used in both NAS and hyperparameter optimization experiments.

More on `RandomSearch`: https://keras.io/api/keras_tuner/tuners/random/
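
To get a sense of how sparsely a handful of random trials covers the space defined in `build_model`, the number of distinct configurations can be counted directly. This sketch relies only on the ranges written in `build_model` and counts active hyperparameters only (filters or units of layers that are not built are ignored); `n_values` is a hypothetical helper.

```python
def n_values(min_value, max_value, step):
    """Number of values in an inclusive integer range with a fixed step."""
    return (max_value - min_value) // step + 1


filter_options = n_values(32, 256, 32)   # choices per Conv2D layer
unit_options = n_values(32, 512, 32)     # choices per Dense layer
learning_rates = 3                       # 1e-2, 1e-3, 1e-4

# One fixed conv layer plus 1 to 3 tuned conv blocks.
conv_configs = sum(filter_options ** (1 + n) for n in (1, 2, 3))
# 1 to 3 dense layers.
dense_configs = sum(unit_options ** n for n in (1, 2, 3))

total = conv_configs * dense_configs * learning_rates
print(filter_options, unit_options, total)
# 8 16 61221888
```

With `MAX_TRIAL_TUNER = 5`, the search evaluates 5 of roughly 61 million configurations, so its result is best read as a baseline rather than an optimum.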

In [None]:
def main():
    # Load and prepare the MNIST dataset
    (train_images, train_labels), (test_images, test_labels) = load_and_prepare_mnist()

    # Initialize tuner
    tuner = RandomSearch(
        build_model,
        objective=OPTIM_OBJ,
        max_trials=MAX_TRIAL_TUNER,
        executions_per_trial=NUM_EXEC_PER_TRIAL,
        directory=RESULT_DIR,
        project_name='mnist_tuning_random'
    )

    # Perform the hyperparameter search
    tuner.search(train_images, train_labels, epochs=NUM_EPOCH_SEARCH, validation_split=VAL_SPLIT)

    # Get the best model
    best_model = tuner.get_best_models(num_models=1)[0]

    # Print the best hyperparameters (not the full layer-by-layer model config)
    print("\nBest Hyperparameters:")
    print(tuner.get_best_hyperparameters(num_trials=1)[0].values)

    # Continue training the best model (a VAL_SPLIT fraction of the data is held out for validation)
    print("\nTraining the best model...")
    best_model.fit(train_images, train_labels, epochs=NUM_EPOCH_VAL, validation_split=VAL_SPLIT)

    # Evaluate the best model on the test dataset
    print("\nEvaluating the best model on the test dataset...")
    loss, accuracy = best_model.evaluate(test_images, test_labels)
    print(f'Test accuracy: {accuracy}')

if __name__ == "__main__":
    main()


Trial 4 Complete [00h 03m 33s]
val_accuracy: 0.39455554882685345

Best val_accuracy So Far: 0.991611103216807
Total elapsed time: 00h 15m 45s

Search: Running Trial #5

Value             |Best Value So Far |Hyperparameter
96                |32                |conv_1_filters
3                 |3                 |num_conv_layers
192               |96                |conv_2_filters
1                 |2                 |num_dense_layers
416               |448               |dense_0_units
0.01              |0.001             |learning_rate
32                |192               |conv_3_filters
192               |160               |conv_4_filters
256               |32                |dense_1_units
224               |None              |dense_2_units


In [None]:
# Print the outcomes and summary
summaries = summarize_tuner_attempts('results/mnist_tuning_random')
print_summary_table(summaries)
for i, (hp, val_accuracy) in enumerate(summaries, 1):
    print(f"Attempt {i}:")
    print_hyperparameters_table(hp)
    print(f"Validation Accuracy: {val_accuracy}\n")

## Bayesian Optimization
Bayesian optimization is a technique for optimizing expensive-to-evaluate black-box functions, commonly used in neural architecture search (NAS) and hyperparameter optimization. Unlike random search, it selects the next set of hyperparameters or the next architecture based on past evaluations. It models the objective function with a probabilistic surrogate, typically a Gaussian process, which provides both a prediction and an uncertainty estimate for untried configurations. An acquisition function, such as expected improvement or an upper confidence bound, then scores candidates by balancing exploration (sampling uncertain regions) and exploitation (sampling near promising regions). In practice this often finds good configurations in fewer evaluations than random search, which makes it attractive when the evaluation budget is limited, although it does not guarantee reaching the global optimum.
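
The loop of fitting a surrogate, scoring candidates, and evaluating the most promising one can be sketched in a few lines of NumPy. This is an illustrative toy, not KerasTuner's implementation: the one-dimensional `objective`, the RBF kernel and its length scale, and the upper-confidence-bound rule with `beta = 2.0` are all assumptions chosen for brevity.

```python
import numpy as np


def objective(x):
    """Hypothetical expensive function to maximize; its peak is at x = 0.3."""
    return -(x - 0.3) ** 2


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_obs = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_new, x_obs)
    mean = k_cross @ np.linalg.solve(k_obs, y_obs)
    # Prior variance is 1 for this kernel; subtract what the observations explain.
    explained = np.sum(k_cross * np.linalg.solve(k_obs, k_cross.T).T, axis=1)
    std = np.sqrt(np.clip(1.0 - explained, 0.0, None))
    return mean, std


candidates = np.linspace(0.0, 1.0, 101)
x_obs = np.array([0.0, 1.0])            # two initial evaluations
y_obs = objective(x_obs)

beta = 2.0                              # larger values favour exploration
for _ in range(8):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + beta * std             # exploitation term + exploration term
    x_next = candidates[np.argmax(ucb)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print("evaluations:", len(x_obs), "best x found:", best_x)
```

Each iteration uses all previous evaluations to choose the next point, which is the key difference from random search, where every sample is drawn independently.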

In [None]:
def main():
    # Load and prepare the MNIST dataset
    (train_images, train_labels), (test_images, test_labels) = load_and_prepare_mnist()

    # Initialize tuner
    tuner = BayesianOptimization(
        build_model,
        objective=OPTIM_OBJ,
        max_trials=MAX_TRIAL_TUNER,
        directory=RESULT_DIR,
        project_name='mnist_tuning_baye'
    )

    # Perform the hyperparameter search
    tuner.search(train_images, train_labels, epochs=NUM_EPOCH_SEARCH, validation_split=VAL_SPLIT)

    # Get the best model
    best_model = tuner.get_best_models(num_models=1)[0]

    # Print the best hyperparameters (not the full layer-by-layer model config)
    print("\nBest Hyperparameters:")
    print(tuner.get_best_hyperparameters(num_trials=1)[0].values)

    # Continue training the best model (a VAL_SPLIT fraction of the data is held out for validation)
    print("\nTraining the best model...")
    best_model.fit(train_images, train_labels, epochs=NUM_EPOCH_VAL, validation_split=VAL_SPLIT)

    # Evaluate the best model on the test dataset
    print("\nEvaluating the best model on the test dataset...")
    loss, accuracy = best_model.evaluate(test_images, test_labels)
    print(f'Test accuracy: {accuracy}')

if __name__ == "__main__":
    main()


In [None]:
# Print the outcomes and summary
summaries = summarize_tuner_attempts('results/mnist_tuning_baye')
print_summary_table(summaries)
for i, (hp, val_accuracy) in enumerate(summaries, 1):
    print(f"Attempt {i}:")
    print_hyperparameters_table(hp)
    print(f"Validation Accuracy: {val_accuracy}\n")