# Hyperparameter Optimization

## Preliminaries
### Introduction
For the best performance for an ML-Model, we need to find the best hyperparameters. This file is for finding those. We want to evaluate
* Number of Neurons in each layer (for 20 Epochs, learning rate of 0.001 and lag of 60: best at 800)
* Number of epochs/Batch_Size
* learning rate
* lag

For each of these parameters, we conduct an experiment of 50 values within chosen boundaries while all other parameters stay the same. For each of the parameters we safe the top 3 parameters and conduct a cross analysis out of those which combination works best.

### Load libraries

In [1]:
# homemade libraries
import Global_Functions as gf
import Neuronal_Networks as nn
import Data_Processing as dp

# Processing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random

# ML libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.models import Sequential
from keras.callbacks import History

import time
timestr = time.strftime("%Y-%m-%d_%H-%M/")

### Load data

In [2]:
OPEN_FOLDER = '../Data/Preped_Data/'

In [3]:
ex_1 = gf.open_CSV_file('experiment_1_short.csv', OPEN_FOLDER)
ex_4 = gf.open_CSV_file('experiment_4_short.csv', OPEN_FOLDER)
ex_9 = gf.open_CSV_file('experiment_9_short.csv', OPEN_FOLDER)
ex_20 = gf.open_CSV_file('experiment_20_short.csv', OPEN_FOLDER)
ex_21 = gf.open_CSV_file('experiment_21_short.csv', OPEN_FOLDER)
ex_22 = gf.open_CSV_file('experiment_22_short.csv', OPEN_FOLDER)
ex_23 = gf.open_CSV_file('experiment_23_short.csv', OPEN_FOLDER)
ex_24 = gf.open_CSV_file('experiment_24_short.csv', OPEN_FOLDER)

In [4]:
experiments = [ex_1, ex_4, ex_9, ex_20, ex_21, ex_22, ex_23, ex_24]
names = ['1', '4', '9', '20', '21','22', '23', '24']

In [5]:
train = "20"
val = "21"
test = "22"

ex_train = ex_20
ex_val = ex_21
ex_test = ex_22

### Specify default parameters
Once again keep in mind, we change one of those parameters in every run through.

In [6]:
NEURONS = 128       #sensible boundaries are [8, 1024]
EPOCHS = 500        #sensible boundaries are [10,500]
LAG = 60            #sensible boundaries are [1,1000]
LEARNING_RATE = 0.001 #sensible boundaries are [0.001, 1]
BATCH_SIZE = 512    #sensible boundaries are [1, 1024]
ITERATIONS = 10000

In [7]:
image_path = "../Images/Hyperparameter_Optimization/" + timestr
image_subfolder = image_path
gf.check_folder(image_subfolder)

Creation of directory ../Images/Hyperparameter_Optimization/2021-05-27_14-39/ successful.


## Preprocessing of data

In [8]:
def prepare_data(data, lag = 1, all_inputs = True, include_power = False):
    length = len(data)
    input_scaled, scaler_input = dp.scale(data['input_voltage'])
    power_scaled, scaler_power = dp.scale(data['el_power'])
    
    scaler = [scaler_input, scaler_power]
    
    df = pd.DataFrame()
    
    for i in range(lag):
        df['input_voltage_delay_' + str(i)] = np.roll(input_scaled, i)[:length]
        df['el_power_delay_' + str(i)] = np.roll(power_scaled, i)[:length]
        
    # use either all lags or only the immediate input and the voltages just before.
    if all_inputs:
        filter_cols = [col for col in df if col.startswith('input_voltage_delay')]
        X = df[filter_cols]
    else:
        X = df[['input_voltage_delay_0', 'input_voltage_delay_' + str(lag-1)]]
    
    if include_power:
        X['el_power_delay_' + str(lag -1)] = df['el_power_delay_' + str(lag-1)]
        
    df['differences'] = df['el_power_delay_' + str(lag -1)] - df['el_power_delay_0']
    differences_scaled, scaler_differences = dp.scale(df['differences'])
    df['differences'] = differences_scaled
    scaler.append(scaler_differences)
    
    y = df[['el_power_delay_0', 'differences']]
    
    X = X.values
    X = X.reshape(X.shape[0],1 , X.shape[1])
    
    y = y.values
    y = y.reshape(y.shape[0], 1, y.shape[1])
    
    return scaler, X, y

In [9]:
def plot_performance(history):
    # Plot the loss function
    fig, ax = plt.subplots(1, 1, figsize=(10,6))
    ax.plot(np.sqrt(history.history['loss']), 'r', label='train')
    ax.plot(np.sqrt(history.history['val_loss']), 'b' ,label='val')
    ax.set_xlabel(r'Epoch', fontsize=20)
    ax.set_ylabel(r'Loss', fontsize=20)
    ax.legend()
    ax.tick_params(labelsize=20)

    # Plot the accuracy
    fig, ax = plt.subplots(1, 1, figsize=(10,6))
    ax.plot(np.sqrt(history.history['accuracy']), 'r', label='train')
    ax.plot(np.sqrt(history.history['val_accuracy']), 'b' ,label='val')
    ax.set_xlabel(r'Epoch', fontsize=20)
    ax.set_ylabel(r'Accuracy', fontsize=20)
    ax.legend()
    ax.tick_params(labelsize=20)

In [10]:
def predictions(experiment, model, image_fol, batch_size = 1, specs = ""):
    scaler, X, y = prepare_data(experiment, lag = LAG)
    
    preds_scaled = model.predict(X, batch_size = batch_size)
    preds = scaler[1].inverse_transform(preds_scaled)[:,0]
    preds_diff = scaler[1].inverse_transform(scaler[2].inverse_transform(preds_scaled)[:,1])
    true = scaler[1].inverse_transform(y[:,0])
    
    fig = plt.figure(figsize = (15,10))
    plt.plot(true[:,0], color = gf.get_color("grey"), label = "True")
    plt.plot(preds, color = gf.get_color("green"), label = "Predictions")
    plt.ylabel('Electric power [W]', fontsize = 18)
    plt.xlabel('Time [sec]', fontsize = 18)
    plt.legend()
    plt.title('Predictions using loss function ' + specs, fontsize = 25)
    fig.tight_layout()
    plt.show()
    fig.savefig(image_fol + specs + "predictions.png")
    fig.savefig(image_fol + specs + "predictions.svg")
    
    fig.clear()
    
    fig = plt.figure(figsize = (15,10))
    plt.plot(true[:,1], color = gf.get_color("grey"), label = "True")
    plt.plot(preds_diff, color = gf.get_color("green"), label = "Predictions")
    plt.ylabel('Difference in Electric power [W]', fontsize = 18)
    plt.xlabel('Time [sec]', fontsize = 18)
    plt.legend()
    plt.title('Predictions of Differences using loss function ' + specs, fontsize = 25)
    fig.tight_layout()
    plt.show()
    fig.savefig(image_fol + specs + "preds_differences.png")
    fig.savefig(image_fol + specs + "preds_differences.svg")
    
    return scaler, X, y, preds_scaled, preds, preds_diff

In [11]:
def make_predictions(model, image_fol):
    investigate_experiments = [ex_train, ex_val, ex_test]
    investigate_names = ['train', 'val', 'test']
    for i in range(len(investigate_experiments)):
        predictions(investigate_experiments[i], model, image_fol, specs = "on {0} data".format(investigate_names[i]))

## Find optimal hyperparameters

### Neurons
We start with the number of neurons in each layer. Note, that this number only holds for the first value since the second layer is only half of the neurons. The output layer only contains two nodes.

In [12]:
random.seed(123)

In [13]:
image_folder = image_subfolder + "BATCH_SIZE/"
gf.check_folder(image_folder)

Creation of directory ../Images/Hyperparameter_Optimization/2021-05-27_14-39/BATCH_SIZE/ successful.


In [14]:
possible_batch_sizes = []
# for i in range(40):
#     possible_batch_sizes.append(random.randint(16, 1024))
for i in range(12):
    possible_batch_sizes.append(int(2**(i)))


In [23]:
def train_all_models(top = 3):
    all_models = list()
    all_histories = list()
    all_accs = list()
    all_accs_val = list()
    all_losses = list()
    all_losses_val = list()
    
    scaler_train, X_train, y_train = dp.prepare_data(ex_train, lag = LAG)
    scaler_val, X_val, y_val = dp.prepare_data(ex_val, lag = LAG)

    for b in possible_batch_sizes:
        batches_per_run = int(len(X_train)//b)
        EPOCHS = int(ITERATIONS//batches_per_run)
        model, history = nn.fit_lstm(X_train, y_train, X_val, y_val, batch_size = b,
                                     nb_epochs = EPOCHS, neurons = NEURONS)
        model.save("../Model/Hyper_parameter/BATCH_SIZE/" + str(b) +"/model.h5")
        
        losses = []
        val_losses = []
        for i in history:
            losses.append(i.history['loss'])
            val_losses.append(i.history['val_loss'])
        all_models.append(model)
        all_histories.append(history)
        all_losses.append(losses[0])
        all_losses_val.append(val_losses[0])
        
    all_losses_val = [config_loss[0] for config_loss in all_losses_val]
    models_losses_val_sorted = all_losses_val.copy()
    models_losses_val_sorted.sort(reverse=True)
    top_losses = models_losses_val_sorted[:top]
    top_models = list()
    top_histories = list()
    indices = list()
    for rank in top_losses:
        index = all_losses_val.index(rank)
        top_models.append(all_models[index])
        top_histories.append(all_histories[index])
        indices.append(index)
    return top_models, top_histories, indices

In [24]:
def analysze_top3(history):
    for h in history:
        plot_performance(h)

In [25]:
top_3_models, top_3_histories, indices = train_all_models(top = 3)

Epoch 10 of 12 is done.
Epoch 10 of 24 is done.
Epoch 20 of 24 is done.
Epoch 10 of 49 is done.
Epoch 20 of 49 is done.
Epoch 30 of 49 is done.
Epoch 40 of 49 is done.
Epoch 10 of 99 is done.
Epoch 20 of 99 is done.
Epoch 30 of 99 is done.
Epoch 40 of 99 is done.
Epoch 50 of 99 is done.
Epoch 60 of 99 is done.
Epoch 70 of 99 is done.
Epoch 80 of 99 is done.
Epoch 90 of 99 is done.
Epoch 10 of 200 is done.
Epoch 20 of 200 is done.
Epoch 30 of 200 is done.
Epoch 40 of 200 is done.
Epoch 50 of 200 is done.
Epoch 60 of 200 is done.
Epoch 70 of 200 is done.
Epoch 80 of 200 is done.
Epoch 90 of 200 is done.
Epoch 100 of 200 is done.
Epoch 110 of 200 is done.
Epoch 120 of 200 is done.
Epoch 130 of 200 is done.
Epoch 140 of 200 is done.
Epoch 150 of 200 is done.
Epoch 160 of 200 is done.
Epoch 170 of 200 is done.
Epoch 180 of 200 is done.
Epoch 190 of 200 is done.
Epoch 200 of 200 is done.
Epoch 10 of 400 is done.
Epoch 20 of 400 is done.
Epoch 30 of 400 is done.
Epoch 40 of 400 is done.
Epoch

Epoch 1520 of 1666 is done.
Epoch 1530 of 1666 is done.
Epoch 1540 of 1666 is done.
Epoch 1550 of 1666 is done.
Epoch 1560 of 1666 is done.
Epoch 1570 of 1666 is done.
Epoch 1580 of 1666 is done.
Epoch 1590 of 1666 is done.
Epoch 1600 of 1666 is done.
Epoch 1610 of 1666 is done.
Epoch 1620 of 1666 is done.
Epoch 1630 of 1666 is done.
Epoch 1640 of 1666 is done.
Epoch 1650 of 1666 is done.
Epoch 1660 of 1666 is done.
Epoch 10 of 3333 is done.
Epoch 20 of 3333 is done.
Epoch 30 of 3333 is done.
Epoch 40 of 3333 is done.
Epoch 50 of 3333 is done.
Epoch 60 of 3333 is done.
Epoch 70 of 3333 is done.
Epoch 80 of 3333 is done.
Epoch 90 of 3333 is done.
Epoch 100 of 3333 is done.
Epoch 110 of 3333 is done.
Epoch 120 of 3333 is done.
Epoch 130 of 3333 is done.
Epoch 140 of 3333 is done.
Epoch 150 of 3333 is done.
Epoch 160 of 3333 is done.
Epoch 170 of 3333 is done.
Epoch 180 of 3333 is done.
Epoch 190 of 3333 is done.
Epoch 200 of 3333 is done.
Epoch 210 of 3333 is done.
Epoch 220 of 3333 is d

Epoch 2830 of 3333 is done.
Epoch 2840 of 3333 is done.
Epoch 2850 of 3333 is done.
Epoch 2860 of 3333 is done.
Epoch 2870 of 3333 is done.
Epoch 2880 of 3333 is done.
Epoch 2890 of 3333 is done.
Epoch 2900 of 3333 is done.
Epoch 2910 of 3333 is done.
Epoch 2920 of 3333 is done.
Epoch 2930 of 3333 is done.
Epoch 2940 of 3333 is done.
Epoch 2950 of 3333 is done.
Epoch 2960 of 3333 is done.
Epoch 2970 of 3333 is done.
Epoch 2980 of 3333 is done.
Epoch 2990 of 3333 is done.
Epoch 3000 of 3333 is done.
Epoch 3010 of 3333 is done.
Epoch 3020 of 3333 is done.
Epoch 3030 of 3333 is done.
Epoch 3040 of 3333 is done.
Epoch 3050 of 3333 is done.
Epoch 3060 of 3333 is done.
Epoch 3070 of 3333 is done.
Epoch 3080 of 3333 is done.
Epoch 3090 of 3333 is done.
Epoch 3100 of 3333 is done.
Epoch 3110 of 3333 is done.
Epoch 3120 of 3333 is done.
Epoch 3130 of 3333 is done.
Epoch 3140 of 3333 is done.
Epoch 3150 of 3333 is done.
Epoch 3160 of 3333 is done.
Epoch 3170 of 3333 is done.
Epoch 3180 of 3333 i

In [26]:
indices

[11, 10, 9]

In [27]:
for i in indices:
    print(possible_batch_sizes[i])

2048
1024
512


We found in our case the top 3 performing models had a 16, 795 or 800 neurons. Since 795 and 800 are quite similar, we will only compare 16 and 800 in the cross-optimization.

We also found the larger the batch_size (for 500 Epochs) the better are the results. the best results were achieved with a batch_size of 2048 and 1024 (for 500 Epochs with only 1 fully connected layer as well as LSTM + 2 dense).