# Optuna Hyperparameter Optimisation

This notebook performs the Hyperparameter Optimisation (HO) of the centralised model on the centralised data. The centralised model serves as the gold-standard reference for what can be achieved in Federated Learning, so its hyperparameters are tuned on the centralised data as well. The code uses an interface that we developed to communicate with [Optuna](https://optuna.org/).

## Imports

In [None]:
import os
import sys
# Add the directory two levels up to the import path so the project modules resolve.
module_path = os.path.abspath(os.path.join('..', '..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [None]:
import pandas as pd
import numpy as np
import optuna
from experiment_parameters.TrainerFactory import dataset_model_dictionary
import xgboost as xgb

In [None]:
from experiment_parameters.model_builder.ModelBuilder import Director, get_training_configuration
from metrics.Metrics import DictOfMetrics
from metrics.Evaluator import evaluator

If a GPU is available, uncomment the following code to enable memory growth.

In [None]:
# import tensorflow as tf
#
# gpus = tf.config.experimental.list_physical_devices('GPU')
# try:
#     for gpu in gpus:
#         tf.config.experimental.set_memory_growth(gpu, True)
# except RuntimeError as e:
#     print(e)

If no GPU is available, or it does not work, we recommend running the code on the CPU by hiding the CUDA devices:

In [None]:
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
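
`CUDA_VISIBLE_DEVICES` is read when the CUDA runtime is initialised, so the assignment above is only guaranteed to take effect if it runs before the deep learning framework starts using the GPU. The sketch below shows one way to check this. It assumes the framework is TensorFlow (suggested by the commented-out cell above), and `force_cpu` is a hypothetical helper that is not part of this repository.

```python
import os
import sys


def force_cpu(framework_module="tensorflow"):
    """Hide all CUDA devices from the current process.

    Returns True when the framework module has not been imported yet,
    in which case the setting will be seen when the framework starts.
    Returns False when the module is already loaded, in which case the
    setting may be ignored. `force_cpu` is a hypothetical helper.
    """
    already_imported = framework_module in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return not already_imported


if not force_cpu():
    print("Framework already imported: restart the kernel and run this cell first.")
```

Running this cell before any model code keeps the behaviour predictable across kernel restarts.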

## Model Factory

We use a `Director` (builder pattern) to generate models from the configurations that Optuna proposes. Optuna supplies the hyperparameters of each model through the _trial_ object. The objective is [Cross Entropy Loss](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html#sklearn.metrics.log_loss) for classification and [Mean Absolute Error](https://en.wikipedia.org/wiki/Mean_absolute_error) for regression.
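
For reference, both objectives can be written out in a few lines. The sketch below is a plain-Python illustration of the two definitions, not the implementation in `metrics.Metrics`. It assumes one-hot labels and predicted class probabilities for the cross entropy, and it covers the mean absolute error (the `MAE` metric used later in this notebook). The function names are made up for this example.

```python
import math


def cross_entropy_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of one-hot labels under predicted probabilities."""
    total = 0.0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= t * math.log(p)
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [[1, 0, 0], [0, 1, 0]]
probabilities = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
print(cross_entropy_loss(labels, probabilities))
print(mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```

Both quantities are losses, so lower is better, which is why every study in this notebook is created with the `minimize` direction.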

In [None]:
director = Director()

In [None]:
def get_parameters(trial, model_type):
    parameters = get_training_configuration(trial=trial, model_type=model_type)
    return parameters

In [None]:
def get_mlp(input_dim, num_classes, parameters):
    return director.create_mlp(input_parameters=input_dim, num_classes=num_classes, parameters=parameters)

In [None]:
def get_xgboost_tree(input_dim, num_classes, parameters):
    return director.create_xgboost(input_parameters=input_dim, num_classes=num_classes, parameters=parameters)

## MLP Wine

We offer a quick training run on the Wine dataset, so you can check that the code works.

In [None]:
metric_list = ["CrossEntropyLoss"]

In [None]:
dataset_factory = dataset_model_dictionary["wine"]()
X_train, y_train = dataset_factory.get_dataset().get_training_data()
X_test, y_test = dataset_factory.get_dataset().get_test_data()

In [None]:
def wine_mlp_optimization(trial):
    parameters = get_parameters(trial=trial, model_type="mlp")
    ml_model = get_mlp(input_dim=X_train.shape[1], num_classes=y_train.shape[1], parameters=parameters)
    ml_model.fit(X_train, y_train, batch_size=parameters["batch_size"]) 
    metrics: DictOfMetrics = evaluator(X_train, y_train, ml_model, metric_list)
    return metrics.get_value_of_metric("CrossEntropyLoss")

In [None]:
from util.OptunaConnection import optuna_create_study

# Create a study object and optimize the objective function.
study = optuna_create_study("mlp_wine", direction=['minimize'])
study.optimize(wine_mlp_optimization, n_trials=60)

## MLP HAR

This is the code for optimisation on the Human Activity Recognition (HAR) dataset.

In [None]:
dataset_factory = dataset_model_dictionary["har"]()
X_train, y_train = dataset_factory.get_dataset().get_training_data()
X_test, y_test = dataset_factory.get_dataset().get_test_data()

In [None]:
def har_optimization(trial):
    parameters = get_parameters(trial=trial, model_type="mlp")
    ml_model = get_mlp(input_dim=X_train.shape[1], num_classes=y_train.shape[1], parameters=parameters)
    ml_model.fit(X_train, y_train, epochs=20, batch_size=parameters["batch_size"]) 
    metrics: DictOfMetrics = evaluator(X_train, y_train, ml_model, metric_list)
    return metrics.get_value_of_metric("CrossEntropyLoss")

In [None]:
from util.OptunaConnection import optuna_create_study

# Create a study object and optimize the objective function.
study = optuna_create_study("mlp_har", direction=['minimize'])
study.optimize(har_optimization, n_trials=60)

## MLP Edge-IIOT-Coreset

This is the code for optimisation on the IIoT attack dataset, reduced using the _Coreset_ method.

In [None]:
# Load the Edge-IIoT Coreset dataset.
from experiment_parameters.TrainerFactory import dataset_model_dictionary
from util.OptunaConnection import optuna_create_study

dataset_factory = dataset_model_dictionary["edge-iot-coreset"]()
X_train, y_train = dataset_factory.get_dataset().get_training_data()
X_test, y_test = dataset_factory.get_dataset().get_test_data()

In [None]:
def edge_iiot_coreset_optimization(trial):
    parameters = get_parameters(trial=trial, model_type="mlp")
    ml_model = get_mlp(input_dim=X_train.shape[1], num_classes=y_train.shape[1], parameters=parameters)
    ml_model.fit(X_train, y_train, epochs=30, batch_size=parameters["batch_size"]) 
    metrics: DictOfMetrics = evaluator(X_train, y_train, ml_model, metric_list)
    return metrics.get_value_of_metric("CrossEntropyLoss")

In [None]:
study = optuna_create_study("mlp_edge_iiot_coreset", direction=['minimize'])
study.optimize(edge_iiot_coreset_optimization, n_trials=30)

## MLP Electric Consumption

This is the code for optimisation on the Electric Consumption dataset.

In [None]:
# Load the Electric Consumption dataset and switch to the regression metric.
from experiment_parameters.TrainerFactory import dataset_model_dictionary
from util.OptunaConnection import optuna_create_study

metric_list = ["MAE"]

dataset_factory = dataset_model_dictionary["electric-consumption"]()
X_train, y_train = dataset_factory.get_dataset().get_training_data()
X_test, y_test = dataset_factory.get_dataset().get_test_data()

In [None]:
def electric_consumption_optimisation(trial):
    parameters = get_parameters(trial=trial, model_type="mlp")
    # A single-target regression label is 1-D, so shape[1] does not exist; use one output.
    try:
        shape = y_train.shape[1]
    except IndexError:
        shape = 1
    ml_model = get_mlp(input_dim=X_train.shape[1], num_classes=shape, parameters=parameters)
    ml_model.fit(X_train, y_train, batch_size=parameters["batch_size"]) 
    metrics: DictOfMetrics = evaluator(X_train, y_train, ml_model, metric_list)
    return metrics.get_value_of_metric("MAE")
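
The `try`/`except` in the objective above covers the case where the regression target is one-dimensional, so `y_train.shape` has no second entry. A hedged alternative that makes the intent explicit is sketched below. It assumes `y_train` exposes a NumPy-style `shape` tuple, and `infer_output_dim` is a hypothetical helper that is not part of this repository.

```python
def infer_output_dim(target_shape):
    """Return the number of output units for a target array of the given shape.

    (n_samples,)   -> 1  single-target regression
    (n_samples, k) -> k  one-hot classification or multi-target regression
    """
    return target_shape[1] if len(target_shape) > 1 else 1


print(infer_output_dim((1000,)))
print(infer_output_dim((1000, 6)))
```

In the objective this would be called as `infer_output_dim(y_train.shape)` and passed to `get_mlp` as `num_classes`.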

In [None]:
study = optuna_create_study("mlp_electric_consumption", direction=['minimize'])
study.optimize(electric_consumption_optimisation, n_trials=30)