# BaseNet Prototype Hyperparameter Optimization

* The purpose of this notebook is to provide an environment for optimizing the BaseNet prototype's architecture and hyperparameters with the Keras-tuner library.
* The goal of the optimization algorithm is to maximize the performance of the BaseNet configuration on the COVIDx training data.

*Author: Dominik Chodounský, FIT CTU in Prague (Last edit: 05/11/21)*

## Library Imports

In [1]:
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
from tensorflow.keras import callbacks, layers, Model, metrics, losses
from tensorflow.keras.layers import Dense, Input, Layer, Flatten, Activation, GlobalAveragePooling2D, Dropout, ZeroPadding2D, Conv2D, DepthwiseConv2D, concatenate, MaxPool2D, Reshape, Conv2DTranspose, LeakyReLU, UpSampling2D, BatchNormalization, add
from tensorflow.keras.optimizers import SGD, Nadam, Adam
from tensorflow.python.client import device_lib
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import cv2
from matplotlib.colors import Normalize
import matplotlib.pyplot as plt
import time
import random
import numpy as np
import os
import sys

import kerastuner as kt

## Constants and Settings
* Set variable *ROOT_PATH* to contain the path to the root *BI-BAP* folder.
* Set variables *TRAIN_DIR* and *TEST_DIR* to contain paths to training and test data folders (default values are meant for training on the COVIDx8B dataset).
* Set variable *SAVE_DIR* to contain the path to *BI-BAP/models/CNN/hyperparameter-optimization* folder, where the optimization algorithm saves its progress.
* You may also change the default optimization parameters; each is described in the following cell.

In [2]:
ROOT_PATH = '../'
TRAIN_DIR = os.path.join(ROOT_PATH, 'data/COVIDx8B/train')
TEST_DIR = os.path.join(ROOT_PATH, 'data/COVIDx8B/test')
SAVE_DIR = os.path.join(ROOT_PATH, 'models/CNN/hyperparameter-optimization')

IMG_SIZE = 224   # target width and height (in pixels) that images are resized to
BATCH_SIZE = 16  # size of generated batches of images
CHANNEL_CNT = 3  # number of channels in images (3 = RGB, 1 = Grayscale)
EPOCH_CNT = 60   # number of epochs available to the tuner during optimization
VAL_SPLIT = 0.2  # training/validation split
RAND_SEED = 111  # random seed for reproducibility

# ------------------------------------------------------------------------------------------------------------------------#
np.random.seed(RAND_SEED)
tf.random.set_seed(RAND_SEED)
random.seed(RAND_SEED)
os.environ['PYTHONHASHSEED'] = str(RAND_SEED)

warnings.filterwarnings("ignore")

## File Imports


In [3]:
from utils.utils import get_generators, get_class_weights

## Preprocessor Definition

In [4]:
# Default preprocessor, only performs resize and scaling
def preprocessor(img):
    new_img = cv2.resize(img.astype('uint8'), (IMG_SIZE, IMG_SIZE))
    return new_img / 255

## Image Data Generators
* Create generators for the training and validation sets (the test generator returned by the helper is not used in this notebook).

In [5]:
datagen = ImageDataGenerator(
            validation_split=VAL_SPLIT,
            preprocessing_function=preprocessor
          )
    
train_gen, valid_gen, _ = get_generators(datagen, TRAIN_DIR, TEST_DIR, IMG_SIZE, BATCH_SIZE, CHANNEL_CNT, True, RAND_SEED)

Found 12763 images belonging to 2 classes.
Found 3189 images belonging to 2 classes.
Found 400 images belonging to 2 classes.
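The image counts printed above determine how many batches make up one epoch. The following minimal sketch uses only those counts and `BATCH_SIZE` from the settings cell; the helper name `steps_per_epoch` is introduced here for illustration. It reproduces the `798/798` step count that appears in the search log and confirms that the validation subset is close to the requested 20 % split.

```python
import math

BATCH_SIZE = 16
TRAIN_IMAGES = 12763   # "Found 12763 images" (training subset)
VALID_IMAGES = 3189    # "Found 3189 images" (validation subset)

def steps_per_epoch(n_images, batch_size):
    """Number of batches needed to cover n_images once (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)

train_steps = steps_per_epoch(TRAIN_IMAGES, BATCH_SIZE)
valid_steps = steps_per_epoch(VALID_IMAGES, BATCH_SIZE)
valid_fraction = VALID_IMAGES / (TRAIN_IMAGES + VALID_IMAGES)

print(train_steps, valid_steps, round(valid_fraction, 3))
```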


In [6]:
train_gen.class_indices

{'negative': 0, 'positive': 1}
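The `get_class_weights` helper is imported from `utils/utils.py` and its implementation is not shown in this notebook. As an assumption about what such a helper might compute, the sketch below applies the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, and keys the result by the class indices shown above, which is the dictionary format Keras expects for `class_weight`. The function name `balanced_class_weights` and the image counts are illustrative only and are not taken from the dataset.

```python
def balanced_class_weights(counts_by_index):
    """Weight each class inversely to its frequency (hypothetical helper)."""
    total = sum(counts_by_index.values())
    n_classes = len(counts_by_index)
    return {idx: total / (n_classes * count) for idx, count in counts_by_index.items()}

# Illustrative counts only: index 0 = 'negative', index 1 = 'positive'.
example_counts = {0: 900, 1: 100}
weights = balanced_class_weights(example_counts)

print(weights)  # the rarer class receives the larger weight
```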

## Hyperparameter Optimization

* Create a function that builds the BaseNet architecture from a given set of hyperparameters (HPs), which the tuner then optimizes.
* The HPs are: dropout rate, activation function, usage of a residual connection, and width of dense layer in the classifier.
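To get a feeling for the size of this search space, the sketch below enumerates every combination of the four hyperparameters. The candidate values are copied from the `hp.Choice` and `hp.Boolean` calls in the builder function; the dictionary `search_space` itself is only an illustration and is not used by the tuner.

```python
from itertools import product

# Candidate values as declared in basenet_builder.
search_space = {
    'dropout': [0.2, 0.3, 0.5],
    'activation_function': [0, 1, 2],   # indices into the list of activations
    'add_residual': [False, True],
    'dense_neurons': [128, 256, 512],
}

# Cartesian product of all candidate values.
configurations = [dict(zip(search_space, values))
                  for values in product(*search_space.values())]

print(len(configurations))
```

The count is 3 * 3 * 2 * 3 = 54 configurations, so a tuner that trains only a handful of trials explores a small part of the space.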

In [None]:
def basenet_builder(hp):
    dropout_hp = hp.Choice('dropout', values=[0.2, 0.3, 0.5])
    
    activation_functions = [tf.keras.activations.relu,
                            tf.keras.layers.LeakyReLU(alpha=0.1), 
                            tf.keras.activations.tanh]
    
    activation_id = hp.Choice('activation_function', values=[0, 1, 2])
    activation_hp = activation_functions[activation_id]

    input_layer = Input(shape=(IMG_SIZE, IMG_SIZE, CHANNEL_CNT), name='input')
                
    x1 = Conv2D(filters=64, kernel_size=5, strides=1, activation=activation_hp, padding="same")(input_layer)
    x1 = Conv2D(filters=64, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x1)
    x1 = MaxPool2D(2, strides=2)(x1)
    x1 = Dropout(dropout_hp)(x1)
        
    x2 = Conv2D(filters=128, kernel_size=3, strides=1, activation=activation_hp, padding="same")(x1)
    x2 = Conv2D(filters=128, kernel_size=3, strides=1, activation=activation_hp, padding="same")(x2)
    x2 = Conv2D(filters=128, kernel_size=3, strides=1, activation=activation_hp, padding="same")(x2)
    x2 = MaxPool2D(2, strides=2)(x2)
    x2 = Dropout(dropout_hp)(x2)
        
    residual = Conv2D(filters=128, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x2)
    residual = MaxPool2D(2, strides=8)(residual)
        
    x3 = Conv2D(filters=256, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x2)
    x3 = Conv2D(filters=256, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x3)
    x3 = MaxPool2D(2, strides=2)(x3)
    x3 = Dropout(dropout_hp)(x3)
        
    x4 = Conv2D(filters=256, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x3)
    x4 = Conv2D(filters=256, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x4)
    x4 = MaxPool2D(2, strides=2)(x4)
    x4 = Dropout(dropout_hp)(x4)
        
    x5 = Conv2D(filters=128, kernel_size=3, strides=1, activation=activation_hp, padding="same")(x4)
    x5 = Conv2D(filters=128, kernel_size=3, strides=1, activation=activation_hp, padding="same")(x5)
    x5 = Conv2D(filters=128, kernel_size=3, strides=1, activation=activation_hp, padding="same")(x5)
    x5 = MaxPool2D(2, strides=2)(x5)
    x5 = Dropout(dropout_hp)(x5)
        
    add_residual = hp.Boolean('add_residual')
    if add_residual:
        x6 = Conv2D(filters=128, kernel_size=5, strides=1, activation=activation_hp, padding="same", name='residual')(add([residual, x5]))
    else:
        x6 = Conv2D(filters=128, kernel_size=5, strides=1, activation=activation_hp, padding="same", name='residual')(x5)

    x6 = Conv2D(filters=128, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x6)
    x6 = Conv2D(filters=128, kernel_size=5, strides=1, activation=activation_hp, padding="same")(x6)
    x6 = Dropout(dropout_hp)(x6)
        
    x = GlobalAveragePooling2D(name='GAP')(x6)        
    x = Dense(hp.Choice('dense_neurons', values=[128, 256, 512]), activation=activation_hp)(x)
    x = Dropout(dropout_hp)(x)

    output = Dense(1, activation='sigmoid')(x)

    model = Model(inputs=[input_layer], outputs=[output])
    model.compile(optimizer=Adam(learning_rate=0.0001),
                  metrics=[metrics.BinaryAccuracy()], 
                  loss=losses.BinaryCrossentropy()
                 )
    return model
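The `add([residual, x5])` call only works if both tensors have the same spatial size, which is why the skip branch is pooled with `strides=8`. The sketch below checks this arithmetic for the default `IMG_SIZE` of 224, assuming the Keras default `'valid'` padding for `MaxPool2D`; the convolutions use `padding="same"` with stride 1 and therefore do not change the spatial size. The helper name `pool_out` is introduced here for illustration.

```python
def pool_out(size, pool=2, stride=2):
    """Output size of a max-pooling layer with 'valid' padding."""
    return (size - pool) // stride + 1

IMG_SIZE = 224

after_x1 = pool_out(IMG_SIZE)        # pooling at the end of block x1
after_x2 = pool_out(after_x1)        # pooling at the end of block x2

# Skip branch: one pooling layer with stride 8 applied to the x2 output.
skip = pool_out(after_x2, stride=8)

# Main branch: three more stride-2 pooling layers (blocks x3, x4, x5).
main = after_x2
for _ in range(3):
    main = pool_out(main)

print(after_x2, skip, main)
```

Both branches also end with 128 filters, so the element-wise addition is well defined. For a different `IMG_SIZE` the two sizes are not guaranteed to match and should be rechecked.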

* Define a tuner to perform the optimization. 
* The pre-selected algorithm is Hyperband, but the Keras-tuner library also offers other search strategies, such as grid search or random search.

In [None]:
tuner = kt.Hyperband(basenet_builder,
                    objective='val_binary_accuracy',
                    max_epochs=7, # max number of epochs to train one model
                    executions_per_trial=1, # number of times each HP configuration (trial) is trained
                    directory=SAVE_DIR,
                    project_name='basenet_hpo',
                    seed=RAND_SEED
)

INFO:tensorflow:Reloading Oracle from existing project ../models/CNN/hyperparameter-optimization\basenet_hpo\oracle.json
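Hyperband trains many configurations for a few epochs and promotes only the best ones to longer runs. The sketch below estimates the resulting schedule for `max_epochs=7` and the default reduction factor of 3. The rounding rules are an assumption modeled on Keras Tuner's Hyperband implementation and may differ between library versions; the function name `hyperband_schedule` is hypothetical.

```python
import math

def hyperband_schedule(max_epochs, factor=3):
    """Return (bracket, round, trials, epochs) tuples, largest bracket first."""
    num_brackets = int(math.log(max_epochs, factor)) + 1
    bracket0_end_size = math.ceil(1 + math.log(max_epochs, factor))
    schedule = []
    for bracket in reversed(range(num_brackets)):
        end_size = bracket0_end_size / (bracket + 1)
        for rnd in range(bracket + 1):
            # Each later round keeps fewer trials and trains them for more epochs.
            trials = math.ceil(end_size * factor ** (bracket - rnd))
            epochs = math.ceil(max_epochs / factor ** (bracket - rnd))
            schedule.append((bracket, rnd, trials, epochs))
    return schedule

for bracket, rnd, trials, epochs in hyperband_schedule(7):
    print(f"bracket {bracket}, round {rnd}: {trials} trials up to epoch {epochs}")
```

Under these assumptions the schedule is 5 trials trained to epoch 3, the best 2 of them continued to epoch 7, and 3 further trials trained from scratch to epoch 7. This is consistent with the `Epoch x/3`, `Epoch 4/7` and `Epoch 1/7` groups in the search log, and it explains why the `epochs=EPOCH_CNT` argument of `tuner.search` does not determine the length of each run.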


* Define callbacks that implement early stopping regularization and regular checkpointing of the model with the highest validation accuracy.

In [None]:
# model description which determines the name of the model when it is saved
model_description = {
    'name': 'basenet',
    'preprocessor': datagen.preprocessing_function.__name__,
    'optimizer': 'Adam_0.0001',
    'weights': 'random',
    'notes': '' # start with '-'
}
model_save = os.path.join(SAVE_DIR, model_description['name'] + '-' + model_description['preprocessor'] + '-' \
                          + model_description['optimizer'] + '-' + model_description['weights'] + model_description['notes'] + '.h5')

# callback for early stopping regularization
early_stopping = callbacks.EarlyStopping(
    monitor='val_binary_accuracy',
    min_delta=0,
    patience=5,
    verbose=1,
    mode='max',
    restore_best_weights=True
)

# callback for continuous checkpointing of best model configuration
checkpointing = callbacks.ModelCheckpoint(
    filepath=model_save,
    monitor='val_binary_accuracy',
    verbose=1,
    save_best_only=True,
    mode='max'
)
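The effect of `patience=5` with `mode='max'` can be illustrated without TensorFlow. The sketch below is a simplified model of the patience logic, not the Keras implementation: it stops once the monitored score has failed to improve for `patience` consecutive epochs. The function name `early_stopping_epoch` and the score sequences are illustrative.

```python
def early_stopping_epoch(val_scores, patience=5, min_delta=0.0):
    """Return the 1-based epoch after which training stops, or None if it never stops early."""
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score - min_delta > best:
            best = score      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1         # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Illustrative sequences, not taken from the search log.
stalled = [0.86, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14]
improving = [0.10, 0.20, 0.30]

print(early_stopping_epoch(stalled), early_stopping_epoch(improving))
```

With a patience of 5, a run needs at least 6 epochs before it can be stopped early, so the callback cannot trigger in the short 3-epoch Hyperband rounds.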

* Run the search for optimal hyperparameters.

In [None]:
start = time.time()  # start of the search; used below to report the total duration
tuner.search(x=train_gen,
             validation_data=valid_gen,
             epochs=EPOCH_CNT,
             verbose=2,
             callbacks=[early_stopping, checkpointing],
             class_weight=get_class_weights(TRAIN_DIR))

Epoch 1/3

Epoch 00001: val_binary_accuracy improved from -inf to 0.13515, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 687s - loss: 0.6939 - binary_accuracy: 0.6855 - val_loss: 0.6973 - val_binary_accuracy: 0.1352
Epoch 2/3

Epoch 00002: val_binary_accuracy improved from 0.13515 to 0.79492, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 783s - loss: 0.7001 - binary_accuracy: 0.4233 - val_loss: 0.6643 - val_binary_accuracy: 0.7949
Epoch 3/3

Epoch 00003: val_binary_accuracy did not improve from 0.79492
798/798 - 692s - loss: 0.6449 - binary_accuracy: 0.4874 - val_loss: 0.7274 - val_binary_accuracy: 0.4904


Epoch 1/3

Epoch 00001: val_binary_accuracy improved from -inf to 0.86453, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 722s - loss: 0.7243 - binary_accuracy: 0.5099 - val_loss: 0.6352 - val_binary_accuracy: 0.8645
Epoch 2/3

Epoch 00002: val_binary_accuracy did not improve from 0.86453
798/798 - 741s - loss: 0.7218 - binary_accuracy: 0.5068 - val_loss: 0.6959 - val_binary_accuracy: 0.1352
Epoch 3/3

Epoch 00003: val_binary_accuracy did not improve from 0.86453
798/798 - 807s - loss: 0.7174 - binary_accuracy: 0.4874 - val_loss: 0.7728 - val_binary_accuracy: 0.1352


Epoch 1/3

Epoch 00001: val_binary_accuracy improved from -inf to 0.28410, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 742s - loss: 0.6936 - binary_accuracy: 0.5214 - val_loss: 0.6929 - val_binary_accuracy: 0.2841
Epoch 2/3

Epoch 00002: val_binary_accuracy improved from 0.28410 to 0.70680, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 759s - loss: 0.6807 - binary_accuracy: 0.4807 - val_loss: 0.5815 - val_binary_accuracy: 0.7068
Epoch 3/3

Epoch 00003: val_binary_accuracy improved from 0.70680 to 0.79304, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 742s - loss: 0.5277 - binary_accuracy: 0.7317 - val_loss: 0.4952 - val_binary_accuracy: 0.7930


Epoch 1/3

Epoch 00001: val_binary_accuracy improved from -inf to 0.86485, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 635s - loss: 0.6934 - binary_accuracy: 0.3540 - val_loss: 0.6910 - val_binary_accuracy: 0.8648
Epoch 2/3

Epoch 00002: val_binary_accuracy did not improve from 0.86485
798/798 - 591s - loss: 0.6933 - binary_accuracy: 0.8098 - val_loss: 0.6930 - val_binary_accuracy: 0.8648
Epoch 3/3

Epoch 00003: val_binary_accuracy did not improve from 0.86485
798/798 - 504s - loss: 0.6933 - binary_accuracy: 0.7532 - val_loss: 0.6935 - val_binary_accuracy: 0.1352


Epoch 1/3

Epoch 00001: val_binary_accuracy improved from -inf to 0.13515, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 656s - loss: 0.6934 - binary_accuracy: 0.5654 - val_loss: 0.6938 - val_binary_accuracy: 0.1352
Epoch 2/3

Epoch 00002: val_binary_accuracy improved from 0.13515 to 0.86485, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 704s - loss: 0.6933 - binary_accuracy: 0.3372 - val_loss: 0.6923 - val_binary_accuracy: 0.8648
Epoch 3/3

Epoch 00003: val_binary_accuracy did not improve from 0.86485
798/798 - 683s - loss: 0.6933 - binary_accuracy: 0.6922 - val_loss: 0.6962 - val_binary_accuracy: 0.1352


Epoch 4/7

Epoch 00004: val_binary_accuracy improved from -inf to 0.13515, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 571s - loss: 0.6933 - binary_accuracy: 0.6415 - val_loss: 0.6968 - val_binary_accuracy: 0.1352
Epoch 5/7

Epoch 00005: val_binary_accuracy did not improve from 0.13515
798/798 - 597s - loss: 0.6933 - binary_accuracy: 0.4081 - val_loss: 0.6952 - val_binary_accuracy: 0.1352
Epoch 6/7

Epoch 00006: val_binary_accuracy improved from 0.13515 to 0.86485, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 561s - loss: 0.6933 - binary_accuracy: 0.4138 - val_loss: 0.6894 - val_binary_accuracy: 0.8648
Epoch 7/7

Epoch 00007: val_binary_accuracy did not improve from 0.86485
798/798 - 499s - loss: 0.6933 - binary_accuracy: 0.5023 - val_loss: 0.6959 - val_binary_accuracy: 0.1352


Epoch 4/7

Epoch 00004: val_binary_accuracy improved from -inf to 0.13515, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 531s - loss: 0.6934 - binary_accuracy: 0.6576 - val_loss: 0.6966 - val_binary_accuracy: 0.1352
Epoch 5/7

Epoch 00005: val_binary_accuracy did not improve from 0.13515
798/798 - 619s - loss: 0.6933 - binary_accuracy: 0.6434 - val_loss: 0.6949 - val_binary_accuracy: 0.1352
Epoch 6/7

Epoch 00006: val_binary_accuracy did not improve from 0.13515
798/798 - 535s - loss: 0.6933 - binary_accuracy: 0.5485 - val_loss: 0.6976 - val_binary_accuracy: 0.1352
Epoch 7/7

Epoch 00007: val_binary_accuracy improved from 0.13515 to 0.86485, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 600s - loss: 0.6934 - binary_accuracy: 0.2738 - val_loss: 0.6923 - val_binary_accuracy: 0.8648


Epoch 1/7

Epoch 00001: val_binary_accuracy improved from -inf to 0.13515, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 533s - loss: 0.7767 - binary_accuracy: 0.4949 - val_loss: 0.7622 - val_binary_accuracy: 0.1352
Epoch 2/7

Epoch 00002: val_binary_accuracy improved from 0.13515 to 0.86485, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 543s - loss: 0.7426 - binary_accuracy: 0.5013 - val_loss: 0.6781 - val_binary_accuracy: 0.8648
Epoch 3/7

Epoch 00003: val_binary_accuracy did not improve from 0.86485
798/798 - 571s - loss: 0.7311 - binary_accuracy: 0.4955 - val_loss: 0.5690 - val_binary_accuracy: 0.8648
Epoch 4/7

Epoch 00004: val_binary_accuracy did not improve from 0.86485
798/798 - 522s - loss: 0.6656 - binary_accuracy: 0.6017 - val_loss: 0.4795 - val_binary_accuracy: 0.8109
Epoch 5/7

Epoch 00005: val_binary_accuracy did not improve from 0.86485
798/79

Epoch 1/7

Epoch 00001: val_binary_accuracy improved from -inf to 0.42302, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 549s - loss: 0.6901 - binary_accuracy: 0.7135 - val_loss: 0.6389 - val_binary_accuracy: 0.4230
Epoch 2/7

Epoch 00002: val_binary_accuracy improved from 0.42302 to 0.67639, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 560s - loss: 0.5680 - binary_accuracy: 0.6736 - val_loss: 0.6944 - val_binary_accuracy: 0.6764
Epoch 3/7

Epoch 00003: val_binary_accuracy improved from 0.67639 to 0.82001, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 528s - loss: 0.3995 - binary_accuracy: 0.8324 - val_loss: 0.4502 - val_binary_accuracy: 0.8200
Epoch 4/7

Epoch 00004: val_binary_accuracy improved from 0.82001 to 0.88899, saving model to ../models/CNN/hyperparameter-optimization\basenet-preproc

Epoch 1/7

Epoch 00001: val_binary_accuracy improved from -inf to 0.86485, saving model to ../models/CNN/hyperparameter-optimization\basenet-preprocessor-Adam_0.0001-random.h5
798/798 - 609s - loss: 0.6933 - binary_accuracy: 0.3663 - val_loss: 0.6915 - val_binary_accuracy: 0.8648
Epoch 2/7

Epoch 00002: val_binary_accuracy did not improve from 0.86485
798/798 - 536s - loss: 0.6946 - binary_accuracy: 0.5735 - val_loss: 0.6950 - val_binary_accuracy: 0.1352
Epoch 3/7

Epoch 00003: val_binary_accuracy did not improve from 0.86485
798/798 - 614s - loss: 0.6933 - binary_accuracy: 0.4677 - val_loss: 0.6957 - val_binary_accuracy: 0.1352
Epoch 4/7

Epoch 00004: val_binary_accuracy did not improve from 0.86485
798/798 - 583s - loss: 0.6933 - binary_accuracy: 0.2535 - val_loss: 0.6938 - val_binary_accuracy: 0.1352
Epoch 5/7

Epoch 00005: val_binary_accuracy did not improve from 0.86485
798/798 - 574s - loss: 0.6933 - binary_accuracy: 0.2866 - val_loss: 0.6933 - val_binary_accuracy: 0.1352
Epoch 6

INFO:tensorflow:Oracle triggered exit
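Two validation accuracies, 0.13515 and 0.86485, recur throughout the log and sum to 1. The sketch below checks which class count would produce them for the 3189 validation images. The resulting count is inferred from the logged values and is not stated anywhere in the notebook.

```python
VALID_IMAGES = 3189   # size of the validation subset reported by the generator
LOGGED_LOW = 0.13515  # lower of the two recurring val_binary_accuracy values

# Which number of correctly classified images rounds to the logged accuracy?
candidates = [k for k in range(VALID_IMAGES + 1)
              if round(k / VALID_IMAGES, 5) == LOGGED_LOW]

minority = candidates[0]
acc_predict_minority = round(minority / VALID_IMAGES, 5)
acc_predict_majority = round((VALID_IMAGES - minority) / VALID_IMAGES, 5)

print(candidates, acc_predict_minority, acc_predict_majority)
```

Exactly one count fits, and its complement gives the second value. This suggests that in those epochs the model assigned the same class to every validation image, so a validation accuracy of 0.86485 there reflects the class imbalance rather than a learned decision boundary.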


* Summarize the optimal hyperparameters found by the tuner.
* This configuration is then used to construct the BaseNet model definition used in other experiments.

In [None]:
# get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

# human-readable names, in the same order as the activation list in basenet_builder
activation_functions = ['ReLU', 'Leaky ReLU (alpha=0.1)', 'Tanh']

print(f"""
Hyperparameter search complete after {(time.time() - start):.2f} seconds.\n
Optimal state for the residual connection: {best_hps.get('add_residual')}\n
Optimal dense layer neurons: {best_hps.get('dense_neurons')}\n
Optimal activation function: {activation_functions[best_hps.get('activation_function')]}\n
Optimal dropout rate: {best_hps.get('dropout')}
""")


Hyperparameter search complete after 46692.48 seconds.

Optimal state for the residual connection: True

Optimal dense layer neurons: 512

Optimal activation function: ReLU

Optimal dropout rate: 0.2

