# Neural Network Tuner

This notebook tunes the hyperparameters of a neural network. It uses the `keras-tuner` library, which trains multiple models over a set of configurations and selects the best one according to the chosen performance metric.

We run the tuner in Google Colab because the process is resource intensive, and this notebook is what we run there. It contains all the objects and functions needed to instantiate neural network models similar to the ones used in the project. Some objects and functions have been simplified relative to the originals, since we don't need some of their parameters.

## Install & import dependencies

First, install `keras-tuner` in the notebook.

In [None]:
!pip install -U keras-tuner

Requirement already up-to-date: keras-tuner in /usr/local/lib/python3.6/dist-packages (1.0.1)


Import the libraries needed to run the notebook.

In [None]:
import io                                       # Reading and writing files in the notebook session
import matplotlib.pyplot as plt                 # Graphs used in WindowGenerator
import numpy as np                              # Numpy
import pandas as pd                             # Load datasets from csv
import kerastuner as kt                         # Tuner of the models
import tensorflow as tf                         # Tensorflow
import tensorflow.keras.layers as layers        # Easier way to call layers instances
import tensorflow_addons.layers as layers_addon # Used in WeightNormalization

from google.colab import files                  # To upload and download files from the colab session
import IPython                                  # To clean the output of a cell

## Upload files

In [None]:
uploaded = files.upload()

## Setup

### ConfigFile

The `ConfigFile` object is used for splitting the dataset into training, validation and test sets. More information about how to use it is available in the original project.

In [None]:
class ConfigFile:
    def __init__(self):
        self.num_data = 0
        self.training = 1
        self.validation = 0.1


### Standardization

Standardize the whole dataset using the z-score. To apply this function, we assume the weather data is stationary.

In [None]:
def standardize(dataset):
    # Computes the mean and standard deviation
    ds_mean = dataset.mean()
    ds_std = dataset.std()

    # Apply the method to the dataset
    dataset = (dataset - ds_mean) / ds_std

    return dataset
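
As a small illustration of the z-score, the sketch below standardizes a short list of made-up temperatures using only Python's `statistics` module. The values and the helper name `standardize_values` are hypothetical; `statistics.stdev` is the sample standard deviation, which matches the default of pandas' `std()` (`ddof=1`).

```python
import statistics


def standardize_values(values):
    # z-score: subtract the mean, then divide by the sample standard deviation
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical temperatures, not taken from the real dataset
temps = [10.0, 12.0, 14.0, 16.0, 18.0]
scaled = standardize_values(temps)
print(scaled)
```

After standardization the values have zero mean and unit (sample) standard deviation, which is what `standardize` does column by column on the DataFrame.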

### Split dataset

Split the dataset into the datetime index originally used, the training set, the validation set and the test set. The amount of data used for each is determined by the `config_file`.

In [None]:
def split_dataset(dataset, config_file):
    # Resets index to add datetime as a normal column
    dataset = dataset.reset_index()

    # Pops the datetime from the dataset. Not used explicitly in the NN
    datetime_index = dataset.pop("Datetime")

    # Accumulates the ratio to use in slices.
    # Validation set is taken from the training set.
    train_ratio = config_file.training - config_file.validation  # e.g. from 0 to 0.5
    validation_ratio = config_file.training  # e.g. from 0.5 to 0.7

    # Divides the dataset
    train_ds = dataset[0:int(config_file.num_data * train_ratio)]
    val_ds = dataset[int(config_file.num_data * train_ratio):
                     int(config_file.num_data * validation_ratio)]
    test_ds = dataset[int(config_file.num_data * validation_ratio):]

    return datetime_index, train_ds, val_ds, test_ds

### WindowGenerator

The `WindowGenerator` sets the input and label lengths, stores the split datasets and converts them to a `tf.data.Dataset` when accessed through its properties.

In [None]:
class WindowGenerator:

    def __init__(self, input_width, label_width, shift,
                 train_ds=None, val_ds=None, test_ds=None,
                 label_columns=None, batch_size=128):
        # Store the raw data (whatever type it is, but generally DataFrame).
        self.train_ds = train_ds
        self.val_ds = val_ds
        self.test_ds = test_ds

        # Work out the label column indices.
        # Associate each label column with a number to use as internal reference
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}

        # Do the same with the input columns for the network
        # Remember that not all columns in the input are predicted (only labels)
        if train_ds is not None:
            self.column_indices = {name: i for i, name in
                                   enumerate(train_ds.columns)}

        # Work out the window parameters.
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        # Size of the batches when creating the tensorflow.data.Dataset
        self.batch_size = batch_size

        # Calculates the total amount of time-steps the window will take
        self.total_window_size = input_width + shift

        # Slices used to travel the dataset
        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        # The label (prediction) will always start counting from the end
        self.label_start = self.total_window_size - self.label_width

        # Slices used to travel the dataset
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])

    def split_window(self, features):
        inputs = features[:, self.input_slice, :]
        labels = features[:, self.labels_slice, :]
        if self.label_columns is not None:
            labels = tf.stack(
                [labels[:, :, self.column_indices[name]]
                 for name in self.label_columns],
                axis=-1)

        # Slicing doesn't preserve static shape information, so set the shapes
        # manually. This way the `tf.data.Datasets` are easier to inspect.
        inputs.set_shape([None, self.input_width, None])
        labels.set_shape([None, self.label_width, None])

        return inputs, labels

    def plot(self, plot_col, model=None, max_subplots=3):
        # Takes the inputs and labels from the example
        inputs, labels = self.example

        # Sets the figure size
        plt.figure(figsize=(12, 8))

        # Gets the number of the index to plot from the input dictionary
        plot_col_index = self.column_indices[plot_col]

        # Resolves the amount of plots to do
        max_n = min(max_subplots, len(inputs))

        # Plots the points in the graph
        for n in range(max_n):
            plt.subplot(max_n, 1, n + 1)
            plt.ylabel(f'{plot_col} [normed]')
            plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                     label='Inputs', marker='.', zorder=-10)

            # If label columns were given, look up the label index of the
            # column; it is None when the column isn't one of the labels
            if self.label_columns:
                label_col_index = self.label_columns_indices.get(
                    plot_col, None)

            # Otherwise every column is a label, so reuse the input index
            else:
                label_col_index = plot_col_index

            # Don't graph the labels and continue with the next iteration
            if label_col_index is None:
                continue

            # Graphs the label points
            plt.scatter(self.label_indices, labels[n, :, label_col_index],
                        edgecolors='k', label='Labels', c='#2ca02c', s=64)

            # Generates the predictions to graph
            if model is not None:
                predictions = model(inputs)
                plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                            marker='X', edgecolors='k', label='Predictions',
                            c='#ff7f0e', s=64)

            if n == 0:
                plt.legend()

        plt.xlabel('Time [h]')
        plt.show()

    def make_dataset(self, data):
        data = np.array(data, dtype=np.float32)
        ds = tf.keras.preprocessing.timeseries_dataset_from_array(
            data=data,
            targets=None,
            sequence_length=self.total_window_size,
            sequence_stride=1,
            shuffle=False,
            batch_size=self.batch_size, )

        ds = ds.map(self.split_window)

        return ds

    @property
    def train(self):
        return self.make_dataset(self.train_ds)

    @property
    def val(self):
        return self.make_dataset(self.val_ds)

    @property
    def test(self):
        return self.make_dataset(self.test_ds)

    @property
    def example(self):
        """
        Get and cache an example batch of `inputs, labels` for plotting
        from the training set.
        """
        result = getattr(self, '_example', None)

        if result is None:
            # No example batch was found, so get one from the `.train` dataset
            result = next(iter(self.train))
            # And cache it for next time
            self._example = result

        return result
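
The index bookkeeping in `__init__` can be checked without TensorFlow. The sketch below recomputes the input and label indices with plain lists; `window_layout` and `count_windows` are hypothetical helpers, and the window count assumes that a stride of 1 yields one window per valid start position.

```python
def window_layout(input_width, label_width, shift):
    # Mirrors the arithmetic in WindowGenerator.__init__
    total_window_size = input_width + shift
    positions = list(range(total_window_size))
    input_indices = positions[slice(0, input_width)]
    label_start = total_window_size - label_width
    label_indices = positions[slice(label_start, None)]
    return total_window_size, input_indices, label_indices


def count_windows(num_rows, total_window_size):
    # Assumption: stride 1, so one window per valid start position
    return max(num_rows - total_window_size + 1, 0)


# Tiny window that is easy to read
small = window_layout(input_width=3, label_width=2, shift=2)
print(small)

# One week of hourly inputs followed by one week of hourly labels
weekly_total, weekly_inputs, weekly_labels = window_layout(168, 168, 168)
print(weekly_total, weekly_inputs[-1], weekly_labels[0], weekly_labels[-1])
```

For the weekly setup the inputs cover positions 0 to 167 and the labels cover positions 168 to 335, so each window spans 336 hourly rows.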


### Gated activation

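Gated activation intended for the `ResidualBlock` layers and the `Conv1D` layer. It is applied element-wise, so it preserves the dimensionality. Note that the simplified `ResidualBlock` defined in this notebook uses ReLU activations instead.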

In [None]:
# Taken from https://github.com/xadrianzetx/temporal-conv-net-keras/blob/master/tcnet/activations.py

def gated_activation(x):
    # Used in PixelCNN and WaveNet
    tanh = layers.Activation('tanh')(x)
    sigmoid = layers.Activation('sigmoid')(x)
    return layers.multiply([tanh, sigmoid])
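
For intuition, the same gate can be written for a single scalar with the standard library. This is only a sketch of the formula `tanh(x) * sigmoid(x)`; the helper name `gated` is hypothetical and is not part of the notebook.

```python
import math


def gated(x):
    # tanh carries the signal, sigmoid acts as a gate between 0 and 1
    tanh = math.tanh(x)
    sigmoid = 1.0 / (1.0 + math.exp(-x))
    return tanh * sigmoid


for value in (-2.0, 0.0, 2.0):
    print(value, gated(value))
```

Because the sigmoid gate is close to 0 for negative inputs, negative values are damped much more strongly than positive ones.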


### ResidualBlock

Implementation of the residual layer used in a temporal convolutional network.

In [None]:
class ResidualBlock(tf.keras.Model):
    def __init__(self, filters, kernel_size=4, stride=1, dilation=1,
                 padding="causal", dropout=0.2):
        super(ResidualBlock, self).__init__()

        self.conv1 = layers_addon.WeightNormalization(layers.Conv1D(filters=filters,
                                                                    kernel_size=kernel_size,
                                                                    strides=stride,
                                                                    padding=padding,
                                                                    dilation_rate=dilation))
        self.activation1 = layers.Activation("relu")
        self.dropout1 = layers.Dropout(dropout)

        self.conv2 = layers_addon.WeightNormalization(layers.Conv1D(filters=filters,
                                                                    kernel_size=kernel_size,
                                                                    strides=stride,
                                                                    padding=padding,
                                                                    dilation_rate=dilation))
        self.activation2 = layers.Activation("relu")
        self.dropout2 = layers.Dropout(dropout)

        # Residual layer of conv 1x1 for feature mapping
        self.residual = layers.Conv1D(filters=filters,
                                      kernel_size=1)
        
        # Layer to add the residual layer with the rest
        self.add = layers.Add()

    def call(self, inputs, training=False):
        x = self.conv1(inputs)
        x = self.activation1(x)
        x = self.dropout1(x, training=training)

        x = self.conv2(x)
        x = self.activation2(x)
        x = self.dropout2(x, training=training)

        # Adds the residual layer
        y = self.residual(inputs)
        x = self.add([x, y])

        return x
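
One way to reason about a stack of these blocks is its receptive field, i.e. how many past time-steps can influence one output step. The sketch below is a back-of-the-envelope calculation, assuming stride 1 and two dilated convolutions per block as in `ResidualBlock`; the helper name `receptive_field` and the example values are hypothetical.

```python
def receptive_field(kernel_sizes, dilations, convs_per_block=2):
    # Each dilated convolution with kernel k and dilation d adds (k - 1) * d steps
    field = 1
    for kernel_size, dilation in zip(kernel_sizes, dilations):
        field += convs_per_block * (kernel_size - 1) * dilation
    return field


# Three blocks with the default kernel size of 4 and dilations 1, 2, 4
example_field = receptive_field([4, 4, 4], [1, 2, 4])
print(example_field)
```

Comparing this number with the input width of the window (168 hours in this notebook) gives a rough idea of whether a given configuration can see the whole input.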

## Prepare the dataset

Load the dataset used in the training and print a description of it.

In [None]:
# Loads the dataset
dataset = pd.read_csv('benchmark_hour.csv',
                      engine="c", index_col="Datetime", parse_dates=True)

# Information of the dataset
print(dataset.info(verbose=True))
print(dataset.describe().transpose())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 90393 entries, 2010-04-07 00:00:00 to 2020-07-29 08:00:00
Data columns (total 19 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Temp Out        90393 non-null  float64
 1   Hi Temp         90393 non-null  float64
 2   Low Temp        90393 non-null  float64
 3   Out Hum         90393 non-null  float64
 4   Wind Speed      90393 non-null  float64
 5   Hi Speed        90393 non-null  float64
 6   Bar             90393 non-null  float64
 7   Rain            90393 non-null  float64
 8   Solar Rad.      90393 non-null  float64
 9   Hi Solar Rad.   90393 non-null  float64
 10  In Temp         90393 non-null  float64
 11  In Hum          90393 non-null  float64
 12  Soils 1 Moist.  90393 non-null  float64
 13  Leaf Wet 1      90393 non-null  float64
 14  Leaf Wet Accum  90393 non-null  float64
 15  day sin         90393 non-null  float64
 16  day cos         90393 non-null  float64
 

Prepares the objects used in the training. This means creating the config file to split the data, standardizing the dataset and dividing it, and creating the window used in the training.

In [None]:
# *** Config File
# Used in the dataset partition
config = ConfigFile()

# Use the entire data with the benchmarks, as the models won't be saved
config.num_data, config.num_features = dataset.shape

# *** Dataset preparation
# Normalize the dataset
dataset = standardize(dataset)

# Partition the dataset
_, train, val, test = split_dataset(dataset, config)

# *** Window
# A week in hours
input_width = 7 * 24
label_width = input_width
label_columns = dataset.columns.tolist()

# Removes the sin/cos columns (the last four columns) from the labels
label_columns = label_columns[:-4]

# Window of 7 days for testing the NN
window = WindowGenerator(input_width=input_width,
                         label_width=label_width,
                         shift=label_width,
                         train_ds=train,
                         val_ds=val,
                         test_ds=test,
                         label_columns=label_columns,
                         batch_size=256)


## Define the models

### Convolutional model

Function that returns a convolutional network model, with tunable hyperparameters placed on the desired parameters.

The hyperparameters tuned are:

- Number of filters for each layer of a 3-layer architecture
  - From 16 to 512 using steps of 32
- Learning rate
  - 0.01, 0.001 or 0.0001

In [None]:
def convolutional_builder(hp):
    # Shape => [batch, input_width, features]
    inputs = layers.Input(shape=(168, 19))

    # Tune the number of filters in the first convolutional layer
    hp_filters_1 = hp.Int('filters_1', min_value=16, max_value=512, step=32)
    x = layers.Conv1D(filters=hp_filters_1,
                      kernel_size=24,
                      activation="relu")(inputs)

    x = layers.Dropout(0.1)(x)
    x = layers.MaxPool1D(pool_size=2)(x)

    # Tune the number of filters in the second convolutional layer
    hp_filters_2 = hp.Int('filters_2', min_value=16, max_value=512, step=32)
    x = layers.Conv1D(filters=hp_filters_2,
                      kernel_size=24,
                      activation="relu")(x)

    x = layers.Dropout(0.1)(x)
    x = layers.MaxPool1D(pool_size=2)(x)

    # Tune the number of filters in the third convolutional layer
    hp_filters_3 = hp.Int('filters_3', min_value=16, max_value=512, step=32)
    x = layers.Conv1D(filters=hp_filters_3,
                      kernel_size=24,
                      activation="relu")(x)

    # Shape => [batch, 1,  label_width*label_columns]
    dense = layers.Dense(units=168*15,
                         activation="linear")(x)

    # Shape => [batch, label_width, label_columns]
    outputs = layers.Reshape([168, 15])(dense)

    model = tf.keras.Model(inputs=inputs, outputs=outputs, name="conv_model")

    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=tf.losses.MeanSquaredError(),
                  metrics=[tf.metrics.MeanAbsoluteError(),
                           tf.metrics.MeanAbsolutePercentageError()])

    return model
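
The shape comments in the builder can be verified by tracing the sequence length through the layers. The sketch below assumes `'valid'` padding and stride 1 for `Conv1D`, and a `MaxPool1D` stride equal to its pool size; the helper names are hypothetical.

```python
def conv1d_valid_length(length, kernel_size):
    # 'valid' padding with stride 1 trims kernel_size - 1 steps
    return length - kernel_size + 1


def maxpool1d_length(length, pool_size):
    # Stride equals pool_size; leftover steps are dropped
    return (length - pool_size) // pool_size + 1


length = 168
trace = [length]
for layer in ("conv", "pool", "conv", "pool", "conv"):
    if layer == "conv":
        length = conv1d_valid_length(length, kernel_size=24)
    else:
        length = maxpool1d_length(length, pool_size=2)
    trace.append(length)

print(trace)
```

The sequence collapses to a single time-step after the third convolution, which is why the dense layer outputs `168*15` units that are then reshaped to `[168, 15]`.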

### Temporal convolutional model

Function that returns a temporal convolutional network model, with tunable hyperparameters placed on the desired parameters.

The hyperparameters tuned are:

- Number of filters for each layer
  - From 32 to 256 using steps of 32
- Kernel size for each layer
  - From 2 to 12 using steps of 2
- Number of dilations (influences the depth of the NN)
  - From 1 to 6 using steps of 2
- Learning rate
  - 0.01, 0.001 or 0.0001

In [None]:
def temporal_builder(hp):
    # Shape => [batch, input_width, features]
    inputs = layers.Input(shape=(168, 19))

    # Tune the number of filters in the first convolutional layer
    hp_filters_0 = hp.Int('filters_0', min_value=32, max_value=256, step=32)
    hp_kernels_0 = hp.Int('kernels_0', min_value=2, max_value=12, step=2)
    x = ResidualBlock(filters=hp_filters_0, kernel_size=hp_kernels_0)(inputs)

    # Tune the number of dilations that the network has
    dilations = hp.Int('dilations', min_value=1, max_value=6, step=2)

    for factor in range(1, dilations):
        # Tune the number of filters in the n-th convolutional layer
        hp_filters = hp.Int('filters_{}'.format(factor), min_value=32, max_value=256, step=32)
        hp_kernels = hp.Int('kernels_{}'.format(factor), min_value=2, max_value=12, step=2)

        dilation = 2 ** factor
        x = ResidualBlock(filters=hp_filters, kernel_size=hp_kernels,
                          dilation=dilation)(x)

    # Shape => [batch, label_width, label_columns]
    output = layers.Dense(15)(x)

    model = tf.keras.Model(inputs=inputs, outputs=output, name="tcn_model")

    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=tf.losses.MeanSquaredError(),
                  metrics=[tf.metrics.MeanAbsoluteError(),
                           tf.metrics.MeanAbsolutePercentageError()])

    return model
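
The relation between the `dilations` hyperparameter and the blocks that are actually built can be sketched in plain Python. The helper name `block_dilations` is hypothetical; it only mirrors the loop in `temporal_builder`.

```python
def block_dilations(dilations):
    # The first block uses the default dilation of 1
    rates = [1]
    # Each further block doubles the dilation, as in the builder's loop
    for factor in range(1, dilations):
        rates.append(2 ** factor)
    return rates


for choice in (1, 3, 5):
    print(choice, block_dilations(choice))
```

A value of 5 therefore produces five residual blocks with dilation rates 1, 2, 4, 8 and 16, which is consistent with the dilation rates printed in the tuning output of this notebook.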

## Instantiate the tuner and perform hypertuning

Callback that clears the cell's output at the end of each training run. This prevents the output from getting cluttered during the tuning.

In [None]:
class ClearTrainingOutput(tf.keras.callbacks.Callback):
    def on_train_end(*args, **kwargs):
        IPython.display.clear_output(wait=True)

### Convolutional network

Instantiate a tuner object with the model to use, the objective to maximize/minimize, and other variables.

In [None]:
convolutional_tuner = kt.Hyperband(convolutional_builder,
                                   objective="val_loss",
                                   max_epochs=10,
                                   factor=3,
                                   directory="my_dir",
                                   project_name="tuner_convolutional")
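
Hyperband is built on successive halving: many configurations are trained with a small budget, only the best fraction survives, and the survivors get a larger budget. The toy sketch below illustrates a single round of that idea with made-up loss curves. It is not the schedule `keras-tuner` uses internally (Hyperband runs several such brackets), and every name and number in it is hypothetical.

```python
def observed_loss(config, epochs):
    # Toy learning curve: the loss approaches final_loss as epochs grow
    final_loss, slowness = config
    return final_loss + slowness / epochs


def successive_halving(configs, min_epochs=1, factor=3):
    survivors = dict(configs)
    epochs = min_epochs
    history = []
    while True:
        scores = {name: observed_loss(cfg, epochs)
                  for name, cfg in survivors.items()}
        ranked = sorted(scores, key=scores.get)
        history.append((epochs, ranked))
        if len(ranked) == 1:
            break
        # Keep the best 1/factor of the configurations, then raise the budget
        keep = max(1, len(ranked) // factor)
        survivors = {name: survivors[name] for name in ranked[:keep]}
        epochs *= factor
    return history


# Nine hypothetical configurations: (final loss, slowness)
configs = {"cfg%d" % i: (0.10 + 0.05 * i, 0.5) for i in range(9)}
history = successive_halving(configs)
for epochs, ranked in history:
    print(epochs, ranked)
```

With `factor=3`, nine candidates shrink to three and then to one, while the epoch budget grows from 1 to 3 to 9.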

Use the tuner to search for the best values. Then print the best model configuration and its hyperparameters.

In [None]:
convolutional_tuner.search(window.train,
                           epochs=10,
                           validation_data=window.val,
                           callbacks=[ClearTrainingOutput()])

# Cleans the output
IPython.display.clear_output(wait=True)

# Get the optimal hyperparameters and model
best_conv_hps = convolutional_tuner.get_best_hyperparameters(3)
best_conv_model = convolutional_tuner.get_best_models(3)

# Prints the best three models
for i in range(3):
  print("The hyperparameter search is complete. The optimal model #{} is:".format(i))
  print("Learning rate: {}".format(best_conv_hps[i].get('learning_rate')))

  print("\nSummary of the convolutional layers:")
  conv_layer_counter = 0
  for model_layers in best_conv_model[i].layers:
    # Print the values of the convolutional layer
    if isinstance(model_layers, layers.Conv1D):
      print("Layer #{}. Filters: {}".format(conv_layer_counter, 
                                            model_layers.filters))
      conv_layer_counter += 1

  # summary() prints the table itself and returns None
  best_conv_model[i].summary()

### Temporal convolutional network

Instantiate a tuner object with the model to use, the objective to maximize/minimize, and other variables.

In [None]:
temporal_tuner = kt.Hyperband(temporal_builder,
                              objective="val_loss",
                              max_epochs=10,
                              factor=3,
                              seed=5,
                              directory="my_dir",
                              project_name="tuner_temporal")

INFO:tensorflow:Reloading Oracle from existing project my_dir/tuner_temporal/oracle.json
INFO:tensorflow:Reloading Tuner from my_dir/tuner_temporal/tuner0.json


Use the tuner to search for the best values. Then print the best model configuration and its hyperparameters.

In [None]:
temporal_tuner.search(window.train,
                      epochs=10,
                      validation_data=window.val,
                      callbacks=[ClearTrainingOutput()])

# Cleans the output
IPython.display.clear_output(wait=True)

# Get the optimal hyperparameters and model
best_temp_hps = temporal_tuner.get_best_hyperparameters(3)
best_temp_model = temporal_tuner.get_best_models(3)

# Prints the best three models
for i in range(3):
  print("The hyperparameter search is complete. The optimal model #{} is:".format(i))
  print("Learning rate: {}".format(best_temp_hps[i].get('learning_rate')))
  print("Dilations: {}".format(best_temp_hps[i].get('dilations')))

  print("\nSummary of the residual layers:")
  residual_layer_counter = 0
  for model_layers in best_temp_model[i].layers:
    # Print the values of the Residual Layer
    if isinstance(model_layers, ResidualBlock):
      # Unwraps the weight normalization layer into the Conv1D
      print("Layer #{}. Filters: {}".format(residual_layer_counter, 
                                            model_layers.layers[0].layer.filters))
      print("Layer #{}. Kernels: {}".format(residual_layer_counter, 
                                            model_layers.layers[0].layer.kernel_size))
      print("Layer #{}. Dilation: {}".format(residual_layer_counter, 
                                            model_layers.layers[0].layer.dilation_rate))
      residual_layer_counter += 1

  # summary() prints the table itself and returns None
  best_temp_model[i].summary()

INFO:tensorflow:Oracle triggered exit
The hyperparameter search is complete. The optimal model is:
Learning rate: 0.0001
Dilations: 5

Summary of the residual layers:
Layer #0. Filters: 192
Layer #0. Kernels: (6,)
Layer #0. Dilation: (1,)
Layer #1. Filters: 64
Layer #1. Kernels: (12,)
Layer #1. Dilation: (2,)
Layer #2. Filters: 128
Layer #2. Kernels: (10,)
Layer #2. Dilation: (4,)
Layer #3. Filters: 96
Layer #3. Kernels: (6,)
Layer #3. Dilation: (8,)
Layer #4. Filters: 128
Layer #4. Kernels: (10,)
Layer #4. Dilation: (16,)
Model: "tcn_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 168, 19)]         0         
_________________________________________________________________
residual_block (ResidualBloc (None, 168, 192)          491138    
_________________________________________________________________
residual_block_1 (ResidualBl (None, 168, 64)           4


Summary of the residual layers:
Layer #0. Filters: 160
Layer #0. Kernels: (10,)
Layer #0. Dilation: (1,)
Layer #1. Filters: 160
Layer #1. Kernels: (2,)
Layer #1. Dilation: (2,)
Layer #2. Filters: 96
Layer #2. Kernels: (2,)
Layer #2. Dilation: (4,)
Model: "tcn_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 168, 19)]         0         
_________________________________________________________________
residual_block (ResidualBloc (None, 168, 160)          576962    
_________________________________________________________________
residual_block_1 (ResidualBl (None, 168, 160)          231522    
_________________________________________________________________
residual_block_2 (ResidualBl (None, 168, 96)           114338    
_________________________________________________________________
dense (Dense)                (None, 168, 15)           1455      
Total 


Summary of the residual layers:
Layer #0. Filters: 256
Layer #0. Kernels: (6,)
Layer #0. Dilation: (1,)
Layer #1. Filters: 96
Layer #1. Kernels: (10,)
Layer #1. Dilation: (2,)
Layer #2. Filters: 96
Layer #2. Kernels: (2,)
Layer #2. Dilation: (4,)
Layer #3. Filters: 192
Layer #3. Kernels: (6,)
Layer #3. Dilation: (8,)
Layer #4. Filters: 128
Layer #4. Kernels: (12,)
Layer #4. Dilation: (16,)
Model: "tcn_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 168, 19)]         0         
_________________________________________________________________
residual_block (ResidualBloc (None, 168, 256)          851458    
_________________________________________________________________
residual_block_1 (ResidualBl (None, 168, 96)           701090    
_________________________________________________________________
residual_block_2 (ResidualBl (None, 168, 96)           8361

## Download results

Creates a zip archive of the tuning results and downloads it to the local machine.

In [None]:
# Download log of the temporal test
!zip -r /content/temporal.zip /content/my_dir/tuner_temporal/
files.download("/content/temporal.zip") 

In [None]:
# Download log of the convolutional test
!zip -r /content/convolutional.zip /content/my_dir/tuner_convolutional/
files.download("/content/convolutional.zip")
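
If the `zip` command is not available, the archive could also be created from Python with the standard library. The sketch below uses a temporary directory and a dummy file so it runs anywhere; the helper name `zip_directory` and the paths are hypothetical stand-ins for the `/content/my_dir/...` folders.

```python
import os
import shutil
import tempfile


def zip_directory(source_dir, archive_base):
    # Writes archive_base + ".zip" containing the contents of source_dir
    return shutil.make_archive(archive_base, "zip", root_dir=source_dir)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a tuner project folder
    project_dir = os.path.join(tmp, "my_dir", "tuner_temporal")
    os.makedirs(project_dir)
    with open(os.path.join(project_dir, "oracle.json"), "w") as handle:
        handle.write("{}")

    archive_path = zip_directory(project_dir, os.path.join(tmp, "temporal"))
    archive_name = os.path.basename(archive_path)
    archive_exists = os.path.exists(archive_path)

print(archive_name, archive_exists)
```

Unlike `zip -r` with an absolute path, this stores the files relative to `source_dir`, so the archive does not contain the leading folder names. In Colab the resulting path could then be passed to `files.download`.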