# Hyperparameter tuning
In this laboratory, you will tune the hyperparameters of an MLP model for binary network traffic classification. The MLP model outputs a value between 0 and 1, which is the probability that the input flow is malicious.
You will tune the model using grid search and random search, training it on a dataset of benign traffic and DDoS attack traffic.

You will use a dataset of benign traffic and various DDoS attacks from the CIC-DDoS2019 dataset (https://www.unb.ca/cic/datasets/ddos-2019.html).
The network traffic has been pre-processed so that packets are grouped into bidirectional traffic flows using the 5-tuple (source IP, destination IP, source port, destination port, protocol). Each flow is represented by 21 packet-header features computed from at most 1000 packets:

| Feature nr.         | Feature Name |
|---------------------|---------------------|
| 00 | timestamp (mean IAT) | 
| 01 | packet_length (mean)| 
| 02 | IP_flags_df (sum) |
| 03 | IP_flags_mf (sum) |
| 04 | IP_flags_rb (sum) | 
| 05 | IP_frag_off (sum) |
| 06 | protocols (mean) |
| 07 | TCP_length (mean) |
| 08 | TCP_flags_ack (sum) |
| 09 | TCP_flags_cwr (sum) |
| 10 | TCP_flags_ece (sum) |
| 11 | TCP_flags_fin (sum) |
| 12 | TCP_flags_push (sum) |
| 13 | TCP_flags_res (sum) |
| 14 | TCP_flags_reset (sum) |
| 15 | TCP_flags_syn (sum) |
| 16 | TCP_flags_urg (sum) |
| 17 | TCP_window_size (mean) |
| 18 | UDP_length (mean) |
| 19 | ICMP_type (mean) |
| 20 | Packets (counter)|

In [None]:
# Author: Roberto Doriguzzi-Corin
# Project: Course on Network Intrusion and Anomaly Detection with Machine Learning
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# NOTE: keras.wrappers.scikit_learn has been removed from recent Keras releases (2.12 and later);
# with newer versions, the SciKeras package offers a scikit-learn compatible KerasClassifier
# (with a slightly different interface).
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from keras.regularizers import l1, l2
from sklearn.metrics import classification_report, f1_score
import tensorflow as tf
import os
import random as rn
import numpy as np
import logging
import time
from util_functions import *  # provides load_dataset()

# disable GPUs for test reproducibility
tf.config.set_visible_devices([], 'GPU')

# fix the random seeds of Python, NumPy and TensorFlow for reproducible results
SEED = 0
rn.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

DATASET_FOLDER = "./DOS2019_Binary"

# Load the pre-processed training, validation and test sets
X_train, y_train = load_dataset(DATASET_FOLDER + "/*" + '-train.hdf5')
X_val, y_val = load_dataset(DATASET_FOLDER + "/*" + '-val.hdf5')
X_test, y_test = load_dataset(DATASET_FOLDER + "/*" + '-test.hdf5')

In [None]:
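def compileModel(model, optimizer='sgd', lr=0.001):
    # build the optimizer selected during tuning ('sgd' or 'adam')
    if optimizer == 'sgd':
        optimizer = SGD(learning_rate=lr, momentum=0.0)
    else:
        # epsilon and decay are left at their defaults: the newer Keras optimizer API
        # does not accept decay and does not handle epsilon=None
        optimizer = Adam(learning_rate=lr, beta_1=0.9, beta_2=0.999, amsgrad=False)
    # binary cross-entropy matches the single sigmoid output unit
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])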

# Model definition
The following function defines the MLP model with configurable hyperparameters. Each hyperparameter has a default value, which is overridden by the values tried during the tuning process.
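Since `hidden_units` and `dense_layers` determine the size of the network, it can help to know how many weights each configuration trains. The sketch below counts the trainable parameters of the architecture built by `create_model` (a first hidden layer on the 21 input features, `dense_layers` further hidden layers, one sigmoid output unit); the helper name `mlp_param_count` is hypothetical and not part of the notebook.

```python
def mlp_param_count(n_features=21, dense_layers=4, hidden_units=2):
    """Trainable parameters of the MLP built by create_model (Dense weights + biases)."""
    first = n_features * hidden_units + hidden_units                       # first hidden layer
    hidden = dense_layers * (hidden_units * hidden_units + hidden_units)   # additional hidden layers
    output = hidden_units + 1                                              # sigmoid output unit
    return first + hidden + output  # Dropout layers add no parameters

print(mlp_param_count())                                  # 71
print(mlp_param_count(dense_layers=2, hidden_units=32))   # 2849
```

Calling `model.summary()` on a model returned by `create_model` should report the same totals.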

In [None]:
# Function to create the MLP model (used by GridSearchCV and RandomizedSearchCV)
def create_model(optimizer='sgd', dense_layers=4, hidden_units=2, learning_rate=0.001, dropout_rate=0, activation='relu'):
    model = Sequential(name="mlp")
    # first hidden layer on the 21 input features, followed by dropout
    model.add(Dense(hidden_units, input_shape=(21,), activation=activation))
    model.add(Dropout(dropout_rate))
    # dense_layers additional hidden layers, each followed by dropout
    for layer in range(dense_layers):
        model.add(Dense(hidden_units, activation=activation, name='hidden-fc' + str(layer)))
        model.add(Dropout(dropout_rate))
    # single sigmoid unit: probability that the flow is malicious
    model.add(Dense(1, activation='sigmoid', name='fc2'))
    compileModel(model, optimizer, learning_rate)
    return model


# Grid search

The code below performs hyperparameter tuning using the *grid search* strategy. Grid search systematically evaluates the model on every combination of the hyperparameter values listed in a grid and keeps the combination with the best performance. A starting grid of hyperparameters is already defined.
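To make the strategy concrete, here is a minimal hand-written grid search using `itertools.product`. The function `toy_score` is a hypothetical stand-in for "train the MLP with these hyperparameters and measure its accuracy"; `GridSearchCV` performs the same enumeration but scores each combination with k-fold cross-validation.

```python
import itertools

def simple_grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float('-inf')
    # itertools.product enumerates the Cartesian product of all value lists
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# hypothetical scoring function standing in for training and validating the MLP
def toy_score(learning_rate, optimizer):
    bonus = 0.05 if optimizer == 'adam' else 0.0
    return 0.9 - abs(learning_rate - 0.01) + bonus

param_grid = {'learning_rate': [0.001, 0.01], 'optimizer': ['sgd', 'adam']}
best_params, best_score = simple_grid_search(param_grid, toy_score)
print(best_params)  # {'learning_rate': 0.01, 'optimizer': 'adam'}
```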

Your first task is to add more hyperparameters to tune, such as the number of **hidden units** (integers) and the **dropout rate** (floating-point numbers between 0 and 1). Compare the performance before and after adding the new hyperparameters.
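Each added hyperparameter multiplies the number of combinations that grid search must train. The sketch below counts them for a grid extended with `hidden_units` and `dropout_rate`; the candidate values are illustrative assumptions, not recommended settings.

```python
import math

# starting grid plus two illustrative (hypothetical) value lists for the new hyperparameters
extended_grid = {
    'learning_rate': [0.001, 0.01],
    'optimizer': ['sgd', 'adam'],
    'hidden_units': [8, 16, 32],
    'dropout_rate': [0.0, 0.2, 0.5],
}

# the grid size is the product of the number of values per hyperparameter
n_combinations = math.prod(len(values) for values in extended_grid.values())
print(n_combinations)  # 36
```

With `cv=k`, the number of trainings is `n_combinations * k`, plus one final refit of the best combination on the whole training set, which `GridSearchCV` performs by default (`refit=True`).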

Note the total training time and the accuracy on training and validation sets, which are printed at the bottom of the output log. Then add **early stopping**, as done in the [Regularization](./Regularisation.ipynb) demonstration. Train again and compare training time and accuracies. 
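The rule applied by Keras's `EarlyStopping` callback to a monitored metric can be sketched in plain Python as follows (a simplified version that ignores options such as `baseline`; the function name is hypothetical). In the notebook you would instead create the real callback, for example `EarlyStopping(monitor='val_loss', patience=PATIENCE, restore_best_weights=True)`, and pass it in the `callbacks` list of `fit`.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a metric that should decrease."""
    best = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:   # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:      # no improvement for `patience` consecutive epochs
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # training ran until the last epoch

losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.20]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

Training stops at epoch 5 even though epoch 6 would have improved: a larger patience trades extra training time for a lower risk of stopping too early. With `restore_best_weights=True`, Keras would restore the weights from epoch 2.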

NOTE: pay attention to the **cv** parameter of *GridSearchCV*. It sets the number of folds in k-fold cross-validation: `cv=k` means that the training set is split into `k` folds, and each fold is used once for validation while the remaining `k-1` folds are used for training. Therefore, for each combination of hyperparameters, the model is trained `k` times.
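The fold rotation can be sketched as below. This mimics the unshuffled splitting of scikit-learn's `KFold`, where the first `n_samples % k` folds receive one extra sample; for classifiers, `GridSearchCV` may use `StratifiedKFold` instead, which also preserves the class proportions in each fold. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds, without shuffling."""
    # the first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(n_samples=5, k=2):
    print("train:", train_idx, "val:", val_idx)
# train: [3, 4] val: [0, 1, 2]
# train: [0, 1, 2] val: [3, 4]
```

Every sample appears in exactly one validation fold, which is why each hyperparameter combination costs `k` trainings.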

In [None]:
# Create a KerasClassifier based on the create_model function
model = KerasClassifier(build_fn=create_model, batch_size=100, verbose=1)

PATIENCE = 10
k=2 # number of folds for cross-validation

# Define the hyperparameters to tune and their possible values
param_grid = {
    'learning_rate' : [0.001,0.01],
### ADD YOUR CODE HERE ###

##########################
    'optimizer' : ['sgd','adam']
}

# Perform grid search with k-fold cross-validation
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=k)

### ADD early stopping HERE

###########################
start_time = time.time()

### ADD early stopping in the list of callbacks
grid_result = grid.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), callbacks= [])
stop_time = time.time()

# Total training time
print("Total training time (sec): ", stop_time-start_time)
# Print the best parameters and corresponding accuracy
print("Best parameters found: ", grid_result.best_params_)
print("Best cross-validated accuracy: {:.2f}".format(grid_result.best_score_))

# Evaluate the best model on the test set
best_model = grid.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print("Test accuracy of the best model: {:.2f}".format(test_accuracy))

# Random search
In the following cell, you will implement *random search* to tune the hyperparameters of the MLP model. Instead of trying every combination of hyperparameters as grid search does, random search samples combinations of hyperparameters from specified distributions. The number of trials is set by the user (parameter *n_iter*). Each trial is sampled independently of the results of previous trials, and the best-performing combination is selected at the end.
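A minimal random search can be sketched with Python's `random` module. Each of the `n_iter` configurations is drawn independently, and only the best one is kept. The search space and `toy_score` are hypothetical stand-ins; `RandomizedSearchCV` follows the same scheme but samples from SciPy distributions and scores with cross-validation.

```python
import random

def simple_random_search(sample_fn, score_fn, n_iter, seed=0):
    """Draw n_iter independent configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        params = sample_fn(rng)  # sampling does not depend on earlier scores
        trials.append((params, score_fn(**params)))
    best_params, best_score = max(trials, key=lambda t: t[1])
    return best_params, best_score, trials

# hypothetical search space: continuous learning rate, integer number of hidden units
def sample_params(rng):
    return {
        'learning_rate': rng.uniform(0.0001, 0.01),
        'hidden_units': rng.randrange(4, 65),  # 4..64, upper bound excluded
    }

# hypothetical scoring function standing in for training and validating the MLP
def toy_score(learning_rate, hidden_units):
    return 1.0 - abs(learning_rate - 0.005) - abs(hidden_units - 32) / 1000

best_params, best_score, trials = simple_random_search(sample_params, toy_score, n_iter=20)
print(len(trials), best_params)
```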

In the cell below, implement random search by reusing the grid search code above and by taking inspiration from the hyperparameter tuning of a [Random Forest model](./hyperparameter-tuning-RF.ipynb). Use the same hyperparameters as for the grid search, but use the *uniform* distribution to generate the floating-point values of **learning_rate** and **dropout_rate**, and *randint* to generate the integer hyperparameters (**hidden units**). Note that in SciPy, `uniform(loc, scale)` samples from the interval `[loc, loc + scale]` and `randint(low, high)` excludes `high`.
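The `loc`/`scale` convention of `scipy.stats.uniform` is an easy place to make mistakes. The sketch below builds distributions for the three hyperparameters and checks their ranges empirically; the ranges themselves are illustrative assumptions.

```python
from scipy.stats import uniform, randint

# uniform(loc, scale) samples from [loc, loc + scale], NOT from [loc, scale]
lr_dist = uniform(loc=0.0001, scale=0.01 - 0.0001)  # learning rate in [0.0001, 0.01]
dropout_dist = uniform(loc=0.0, scale=0.5)           # dropout rate in [0, 0.5]
# randint(low, high) samples integers in [low, high): high is excluded
units_dist = randint(4, 65)                          # hidden units in 4..64

lrs = lr_dist.rvs(size=1000, random_state=0)
dropouts = dropout_dist.rvs(size=1000, random_state=0)
units = units_dist.rvs(size=1000, random_state=0)
print(lrs.min(), lrs.max(), dropouts.max(), units.min(), units.max())
```

A dictionary mapping the `create_model` argument names to such distribution objects can be passed as `param_distributions` to `RandomizedSearchCV`, together with `n_iter`; plain lists are also accepted and are sampled uniformly.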

Add early stopping and set the number of iterations (*n_iter*) to 20 or 30 to keep the running time short.

In [None]:
from scipy.stats import uniform, randint
k=2 # number of folds for cross-validation
PATIENCE = 10

# Create a KerasClassifier based on the create_model function
model = KerasClassifier(build_fn=create_model, batch_size=100, verbose=1)

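### ADD YOUR CODE HERE ###
# (the code below expects the variables random_search, random_search_result,
#  start_time and stop_time to be defined here)



##########################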

# Total training time
print("Total training time (sec): ", stop_time-start_time)

# Print the best parameters and corresponding accuracy
print("Best parameters found: ", random_search_result.best_params_)
print("Best cross-validated accuracy: {:.2f}".format(random_search_result.best_score_))

# Evaluate the best model on the test set
best_model = random_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print("Test accuracy of the best model: {:.2f}".format(test_accuracy))