# 5. Hyper-parameter tuning

> <div class="alert alert-block alert-info">
<b>Now that we've implemented a data augmentation technique, we need to find the hyper-parameter settings that maximize model performance. We will use the keras-tuner library, a hyper-parameter optimization framework that provides several tuning algorithms, including RandomSearch, Hyperband and BayesianOptimization.</b>
</div>

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import tensorflow as tf
from tensorflow import keras
import seaborn as sns

2023-02-23 08:53:42.523455: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-23 08:53:44.236412: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-02-23 08:53:44.236525: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64


### Load data

In [2]:
X_train = np.load('./tmp/X_train.npy')
y_train = np.load('./tmp/y_train.npy')
X_val = np.load('./tmp/X_val.npy')
y_val = np.load('./tmp/y_val.npy')
X_test = np.load('./tmp/X_test.npy')

### 4.1 Creating an ImageDataGenerator

In [3]:
from keras.preprocessing.image import ImageDataGenerator

In [4]:
batch_size = 32

In [5]:
datagen = ImageDataGenerator(
        rotation_range=10,  
        width_shift_range=0.1, 
        height_shift_range=0.1,
        zoom_range = 0.10,  
)

In [6]:
train_generator = datagen.flow(X_train, y_train, batch_size=batch_size)

### Scores

In [7]:
from keras import backend as K

# Precision (using keras backend)
def precision_metric(y_true, y_pred):
    threshold = 0.5  # Training threshold 0.5
    # Binarize the predictions so precision is computed on hard 0/1 labels
    y_pred = K.cast(K.greater(K.clip(y_pred, 0, 1), threshold), K.floatx())

    true_positives = K.sum(K.clip(y_true * y_pred, 0, 1))
    false_positives = K.sum(K.clip((1-y_true) * y_pred, 0, 1))

    precision = true_positives / (true_positives + false_positives + K.epsilon())
    return precision

# Recall (using keras backend)
def recall_metric(y_true, y_pred):
    threshold = 0.5 #Training threshold 0.5
    y_pred = K.cast(K.greater(K.clip(y_pred, 0, 1), threshold), K.floatx())

    true_positives = K.sum(K.clip(y_true * y_pred, 0, 1))
    false_negatives = K.sum(K.clip(y_true * (1-y_pred), 0, 1))

    recall = true_positives / (true_positives + false_negatives + K.epsilon())
    return recall

# F1-score (using keras backend)
def f1_metric(y_true, y_pred):
    precision = precision_metric(y_true, y_pred)
    recall = recall_metric(y_true, y_pred)
    f1 = 2 * ((precision * recall) / (recall+precision+K.epsilon()))
    return f1
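
To make the metric definitions above concrete, here is a small NumPy sketch that applies the same 0.5 threshold and the same precision, recall and F1 formulas to a hand-made batch. The arrays and the helper names `binarize` and `precision_recall_f1` are illustrative assumptions, not part of this notebook; `eps` plays the role of `K.epsilon()`.

```python
import numpy as np

def binarize(y_pred, threshold=0.5):
    """Turn predicted probabilities into hard 0/1 predictions."""
    return (np.clip(y_pred, 0, 1) > threshold).astype(float)

def precision_recall_f1(y_true, y_pred, eps=1e-7):
    """Micro-averaged precision, recall and F1 over a one-hot batch."""
    y_hat = binarize(y_pred)
    tp = np.sum(y_true * y_hat)          # predicted 1, truly 1
    fp = np.sum((1 - y_true) * y_hat)    # predicted 1, truly 0
    fn = np.sum(y_true * (1 - y_hat))    # predicted 0, truly 1
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# Hypothetical batch: 3 samples, 3 classes, one-hot labels
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.90, 0.05, 0.05],   # correct
                   [0.60, 0.30, 0.10],   # wrong: class 0 predicted instead of class 1
                   [0.20, 0.20, 0.60]])  # correct

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Keras evaluates function metrics like these per batch and averages them over the epoch, so the reported values approximate the dataset-level scores rather than matching them exactly.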

### 5.1 Building hyper-parameter model

In [8]:
import keras_tuner as kt

In [9]:
def build_model_hp(hp):
    inp = keras.layers.Input(shape=[28,28,1])
    
    dropout = hp.Choice('conv_block_dropout', [0.125,0.25,0.375,0.5])
    conv_kernel_size = hp.Choice('conv_kernel_size', [5]) # Kernel size 5 was found to be optimal in multiple earlier experiments
    
    n_layers = hp.Choice('n_conv_blocks', [2,3,4])

    filter_choice = hp.Choice('filter_combination_choice', [0,1,2,3])
    
    filter_combinations_2 = [[16,32],[32,64],[64,128],[128,256]]
    filter_combinations_3 = [[16,32,48],[16,32,64],[32,64,128],[64,128,256]]
    filter_combinations_4 = [[16,16,32,32],[32,32,64,64],[64,64,128,128],[128,128,256,256]]

    if n_layers==2:
        filter_settings = filter_combinations_2[filter_choice]
    elif n_layers==3:
        filter_settings = filter_combinations_3[filter_choice]
    elif n_layers==4:
        filter_settings = filter_combinations_4[filter_choice]
        
    for i in range(n_layers):
        if i == 0:
            x = keras.layers.Conv2D(filters=filter_settings[i], 
                            kernel_size=conv_kernel_size,
                            strides=1, padding='SAME', 
                            activation='relu')(inp)
        else:
            x = keras.layers.Conv2D(filters=filter_settings[i], 
                            kernel_size=conv_kernel_size,
                            strides=1, padding='SAME', 
                            activation='relu')(x)

        x = keras.layers.MaxPool2D(pool_size=2)(x)
        x = keras.layers.BatchNormalization()(x)
        x = keras.layers.Dropout(dropout)(x)
            
    x = keras.layers.Flatten()(x) 
    
    n_fc_layers = hp.Choice('n_fc_layers', [1,2,3])
    
    fc_choice = hp.Choice('fc_units_combination_choice', [0,1])
    
    fc_combinations_1 = [[128],[256]]
    fc_combinations_2 = [[128,64],[256,128]]
    fc_combinations_3 = [[512,256,128],[256,128,64]]
    
    if n_fc_layers==1:
        fc_units = fc_combinations_1[fc_choice]
    elif n_fc_layers==2:
        fc_units = fc_combinations_2[fc_choice]
    elif n_fc_layers==3:
        fc_units = fc_combinations_3[fc_choice]
    
    for j in range(n_fc_layers):
        x = keras.layers.Dense(fc_units[j], activation='relu')(x)
        x = keras.layers.Dropout(hp.Choice('fc_dropout', [0.125,0.25,0.5]))(x)
    
    out = keras.layers.Dense(10, activation='softmax')(x)
    
    model = keras.Model(inputs=inp, outputs=out)
    
    model.compile(loss=keras.losses.CategoricalCrossentropy(), optimizer=keras.optimizers.Adam(learning_rate=0.0001),
                 metrics=['accuracy', f1_metric, recall_metric, precision_metric])
    
    return model

In [10]:
# tuner = kt.RandomSearch(hypermodel=build_model_hp, objective='val_loss', max_trials=200, 
#                         overwrite=False, project_name='random_search')

In [11]:
tuner = kt.Hyperband(hypermodel=build_model_hp, objective='val_loss', max_epochs=50, executions_per_trial=2,
                        overwrite=False, project_name='hyperband_results')

INFO:tensorflow:Reloading Tuner from ./hyperband_results/tuner0.json


> <div class="alert alert-block alert-info">
<b>I tested both RandomSearch and Hyperband and found that Hyperband gave much better results, so we will use it.</b>
</div>

In [12]:
tuner.search_space_summary()

Search space summary
Default search space size: 7
conv_block_dropout (Choice)
{'default': 0.125, 'conditions': [], 'values': [0.125, 0.25, 0.375, 0.5], 'ordered': True}
conv_kernel_size (Choice)
{'default': 5, 'conditions': [], 'values': [5], 'ordered': True}
n_conv_blocks (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
filter_combination_choice (Choice)
{'default': 0, 'conditions': [], 'values': [0, 1, 2, 3], 'ordered': True}
n_fc_layers (Choice)
{'default': 1, 'conditions': [], 'values': [1, 2, 3], 'ordered': True}
fc_units_combination_choice (Choice)
{'default': 0, 'conditions': [], 'values': [0, 1], 'ordered': True}
fc_dropout (Choice)
{'default': 0.125, 'conditions': [], 'values': [0.125, 0.25, 0.5], 'ordered': True}
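
The summary reports a search space size of 7. That is the number of hyper-parameters, not the number of candidate configurations. As a rough sketch, the number of distinct configurations is the product of the number of values per hyper-parameter. The dictionary below is copied from the summary above; the variable names are illustrative.

```python
from math import prod

# Values per hyper-parameter, copied from the search space summary
search_space = {
    "conv_block_dropout": [0.125, 0.25, 0.375, 0.5],
    "conv_kernel_size": [5],
    "n_conv_blocks": [2, 3, 4],
    "filter_combination_choice": [0, 1, 2, 3],
    "n_fc_layers": [1, 2, 3],
    "fc_units_combination_choice": [0, 1],
    "fc_dropout": [0.125, 0.25, 0.5],
}

# Every hyper-parameter is an independent Choice, so the sizes multiply
n_configurations = prod(len(values) for values in search_space.values())
print(n_configurations)  # 864
```

With several hundred combinations and multiple executions per trial, training every configuration to completion would be expensive, which is the motivation for an early-stopping tuner.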


### 5.2 Hyper-parameter search (using HyperBand)

> <div class="alert alert-block alert-info">
<b>The Hyperband tuning algorithm is an extension of the Successive Halving Algorithm (SHA) for adaptive resource allocation with early stopping.</b>
<br></br>
<b>Here is the original paper: https://jmlr.org/papers/volume18/16-558/16-558.pdf</b>
<br></br>
<b>Essentially, this means that at the start of tuning, all hyper-parameter sets get an equal opportunity (uniform allocation of resources). For efficiency, the algorithm uses only limited resources at the start (e.g. 2-3 epochs). After the first stage, only the best-performing fraction of hyper-parameter sets (the top half in classic successive halving, or the top third with Keras Tuner's default factor of 3) progresses to the next stage, where more resources are allocated to them (e.g. 10 epochs). The process continues until only the best configuration found within the budget remains. Pretty smart, huh!</b>
</div>
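
The staged elimination described above can be sketched as a toy successive-halving loop in plain Python. This is a simplified illustration under stated assumptions, not Keras Tuner's implementation: `successive_halving` and `toy_loss` are hypothetical helpers, `toy_loss` stands in for actually training a model, and `factor=2` keeps the top half in each round. Hyperband additionally runs several such brackets with different starting budgets.

```python
def successive_halving(configs, evaluate, min_epochs=2, factor=2):
    """Toy successive halving: keep the best 1/factor of configs each round.

    evaluate(config, epochs) must return a loss (lower is better).
    Returns the surviving config and a log of (epochs, n_candidates, n_kept).
    """
    survivors = list(configs)
    epochs = min_epochs
    history = []
    while len(survivors) > 1:
        # Rank all current candidates with the same (small) budget
        ranked = sorted(survivors, key=lambda c: evaluate(c, epochs))
        keep = max(1, len(ranked) // factor)
        history.append((epochs, len(survivors), keep))
        survivors = ranked[:keep]
        # Survivors get a larger budget in the next round
        epochs *= factor
    return survivors[0], history

def toy_loss(config, epochs):
    """Hypothetical stand-in for training: best at dropout 0.25, improves with epochs."""
    return abs(config["dropout"] - 0.25) + 1.0 / epochs

# Eight hypothetical candidates (each dropout value appears twice)
configs = [{"dropout": d} for d in (0.125, 0.25, 0.375, 0.5)] * 2
best, history = successive_halving(configs, toy_loss)

print(best)     # {'dropout': 0.25}
print(history)  # [(2, 8, 4), (4, 4, 2), (8, 2, 1)]
```

Each history entry shows the budget growing while the number of candidates shrinks, so most of the compute is spent on configurations that already look promising.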

> <div class="alert alert-block alert-info">
<b>We will now start the Hyperband tuning search. This might take a while...</b>
</div>

In [13]:
from keras import backend as K

In [14]:
steps_per_epoch = train_generator.n // train_generator.batch_size

In [15]:
tuner.search(train_generator, validation_data=(X_val, y_val), epochs=30, steps_per_epoch=steps_per_epoch,
             callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss',mode='min',patience=10,
                                                      min_delta=0.005, restore_best_weights=True),
                       keras.callbacks.ReduceLROnPlateau(monitor = 'val_loss', patience = 3)])


Trial 47 Complete [00h 02m 28s]
val_loss: 0.08093061298131943

Best val_loss So Far: 0.02467612363398075
Total elapsed time: 01h 04m 09s

Search: Running Trial #48

Value             |Best Value So Far |Hyperparameter
0.25              |0.375             |conv_block_dropout
5                 |5                 |conv_kernel_size
3                 |4                 |n_conv_blocks
0                 |3                 |filter_combination_choice
2                 |2                 |n_fc_layers
1                 |0                 |fc_units_combination_choice
0.5               |0.125             |fc_dropout
6                 |17                |tuner/epochs
0                 |6                 |tuner/initial_epoch
2                 |3                 |tuner/bracket
0                 |2                 |tuner/round

Epoch 1/6


2023-02-23 09:58:02.426699: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inmodel/dropout/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer


Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6

KeyboardInterrupt: 

> <div class="alert alert-block alert-info">
    <b>After almost 5.5 hours, the search is done! Let's check the results.</b>
</div>
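
The trial log above lists `tuner/epochs`, `tuner/bracket` and `tuner/round` for each trial. The values shown (6 epochs for bracket 2, round 0; 17 epochs for bracket 3, round 2, resuming from epoch 6) are consistent with an epoch budget of ceil(max_epochs / factor^(bracket - round)). The sketch below reproduces those numbers. The formula is an assumption inferred from the log, and `epochs_for_round` is a hypothetical helper, not part of the Keras Tuner API.

```python
import math

def epochs_for_round(max_epochs, factor, bracket, round_):
    """Assumed epoch budget for a trial in a given Hyperband bracket and round."""
    return math.ceil(max_epochs / factor ** (bracket - round_))

# max_epochs=50 comes from the tuner definition; factor=3 is Keras Tuner's default
max_epochs, factor = 50, 3

for bracket, round_ in [(2, 0), (3, 1), (3, 2), (3, 3)]:
    print(bracket, round_, epochs_for_round(max_epochs, factor, bracket, round_))
# 2 0 6
# 3 1 6
# 3 2 17
# 3 3 50
```

Under this assumption, the per-trial budget is set by the tuner itself, which would explain why the running trial shows `Epoch 1/6` rather than the `epochs=30` passed to `tuner.search`.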

### 5.3 Hyper-parameter results

In [None]:
top_model = tuner.get_best_models(1)[0]
top_model_hps = tuner.get_best_hyperparameters(1)[0]
print(top_model_hps.values)
top_model.summary()

> <div class="alert alert-block alert-info">
<b>The output above shows the hyper-parameters and model architecture of the best-performing model from hyper-parameter tuning.</b>
<br></br>
<b>Let's view the validation accuracy for the best model.</b>
</div>

In [None]:
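from sklearn.metrics import accuracy_score

# Convert one-hot labels and softmax outputs to class indices before scoring
y_val_true = np.argmax(y_val,axis=1)
y_val_pred = np.argmax(top_model.predict(X_val), axis=1)
accuracy_score(y_val_true, y_val_pred)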


> <div class="alert alert-block alert-info">
<b>Wow, the accuracy score of our top model is significantly better than that of our previous model. This shows the importance of hyper-parameter tuning, which was definitely worth the time spent.</b>
</div>
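
The accuracy computation in the cell above relies on converting one-hot labels and softmax outputs to class indices with `argmax`. Here is a minimal NumPy sketch of that step on made-up data; the arrays are hypothetical and are not taken from the validation set.

```python
import numpy as np

# Hypothetical one-hot labels and softmax outputs: 4 samples, 3 classes
y_true_onehot = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1],
                          [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.4, 0.3],   # misclassified: true class is 2
                   [0.1, 0.6, 0.3]])

# argmax over the class axis recovers the integer class index
y_true_labels = np.argmax(y_true_onehot, axis=1)   # [0, 1, 2, 1]
y_pred_labels = np.argmax(y_prob, axis=1)          # [0, 1, 1, 1]

# Accuracy is the fraction of samples whose predicted class matches the label
accuracy = np.mean(y_true_labels == y_pred_labels)
print(accuracy)  # 0.75
```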

In [None]:
tuner.results_summary(5)

> <div class="alert alert-block alert-info">
<b>Here are some more models for comparison: the network configurations of the top 5 models, ordered from best to worst.</b>
</div>