# DDoS traffic classification
In this laboratory, you will evaluate the ability of a pre-trained deep learning model to distinguish DDoS traffic from benign traffic. The provided script loads a previously trained CNN model and evaluates it on the test set. The classification accuracy and F1 score are displayed in the notebook and saved to a CSV file.

| <img src="../../Content/artworks/ml-workflow.png" width="60%"> |
|:--:|
| Convolutional Neural Network (LUCID) |

In [None]:
# Author: Roberto Doriguzzi-Corin
# Project: Course on Network Intrusion and Anomaly Detection with Deep Learning
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
# Must be set BEFORE TensorFlow is imported, otherwise it has no effect
# ('2' hides TensorFlow's C++ INFO and WARNING messages)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import csv
import glob
import logging
import time

import h5py
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
from sklearn.metrics import f1_score, accuracy_score

tf.get_logger().setLevel(logging.ERROR)  # hide Python-side TensorFlow warnings

In [None]:
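def load_dataset(path):
    """Load a test set stored in HDF5 format and return (X, Y)."""
    filename = glob.glob(path)[0]  # first file matching the given path/pattern
    with h5py.File(filename, "r") as dataset:  # the file is closed automatically
        set_x_orig = dataset["set_x"][:]  # features
        set_y_orig = dataset["set_y"][:]  # labels

    # Add a trailing channel dimension: (samples, rows, cols) -> (samples, rows, cols, 1),
    # the 4-D input layout expected by the CNN
    X = np.reshape(set_x_orig, (set_x_orig.shape[0], set_x_orig.shape[1], set_x_orig.shape[2], 1))
    Y = set_y_orig

    return X, Y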

## Performance metrics
The following function computes the metrics used to assess the performance of the models on the given datasets. Accuracy is the fraction of correctly classified samples, while the F1 score is the harmonic mean of precision and recall on the positive class. Both are widely used metrics in many areas of computer science. More information can be found in the ```sklearn``` documentation ([accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score), [F1 Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)) and in online resources (e.g., [Metrics to Evaluate your Machine Learning Algorithm](https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234)). 
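To make the two metrics concrete, the sketch below computes them by hand from the four confusion-matrix counts on a made-up set of eight labels, then checks the result against ```sklearn```. The label values are hypothetical, and treating 1 as the attack class is an assumption about the label encoding. It uses accuracy = (TP + TN) / (TP + TN + FP + FN) and F1 = 2 · precision · recall / (precision + recall).

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 1 = attack (assumed encoding), 0 = benign
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0])

# Confusion-matrix counts
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # attacks correctly detected
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # benign samples correctly passed
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # benign samples flagged as attacks
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # attacks missed

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)          # 3 2 2 1
print(accuracy, round(f1, 3))  # 0.625 0.667

# The hand-computed values agree with sklearn
assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
```

Accuracy counts every correct decision, whereas F1 ignores true negatives, so on a test set dominated by one class the two metrics can diverge noticeably.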

In [None]:
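def compute_metrics(Y_true, Y_pred):
    """Return the accuracy and F1 score of binary predictions."""
    # Flatten both arrays to shape (N,) so that sklearn receives 1-D label vectors
    Y_true = np.ravel(Y_true)
    Y_pred = np.ravel(Y_pred)
    accuracy = accuracy_score(Y_true, Y_pred)
    f1 = f1_score(Y_true, Y_pred)  # F1 of the positive class (label 1)
    return accuracy, f1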

In [None]:
model_list = sorted(glob.glob("./output/10t-10n*.keras"))  # list of pre-trained models (sorted for a stable order)
print("Model list:", model_list)
dataset_filelist = sorted(glob.glob("./sample-dataset/*test.hdf5"))  # list of test sets
print("Test set list:", dataset_filelist)

In [None]:
classify_fieldnames = ['Dataset', 'Samples', 'Time', 'Accuracy', 'F1Score', 'Model']
# Write mode: any previous content of results.csv is overwritten
predict_file = open('./results.csv', 'w', newline='')
predict_writer = csv.DictWriter(predict_file, fieldnames=classify_fieldnames)
predict_writer.writeheader()
predict_file.flush()

## Inference

The last cell evaluates every model in ```model_list``` on every test set in ```dataset_filelist```. The pre-trained models are loaded from the file system with the Keras ```load_model``` function. The ```load_dataset``` function defined above reads a test set and returns two NumPy arrays: ```X```, containing the test examples, and ```Y```, containing the labels. 
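The sketch below illustrates the file layout that ```load_dataset``` expects, under stated assumptions: it writes a small HDF5 file containing the two datasets ```set_x``` and ```set_y``` (these names come from the loader; the 5 × 10 × 11 sample shape and the label values are made up), reads them back with ```h5py```, and adds the trailing channel axis in the same way as the loader.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy test set: 5 samples, each a 10 x 11 matrix (sizes are made up)
x = np.random.default_rng(0).random((5, 10, 11))
y = np.array([0, 1, 1, 0, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy-test.hdf5")

    # Write the two datasets that load_dataset() reads
    with h5py.File(path, "w") as f:
        f.create_dataset("set_x", data=x)
        f.create_dataset("set_y", data=y)

    # Read them back the same way load_dataset() does
    with h5py.File(path, "r") as f:
        X = f["set_x"][:]
        Y = f["set_y"][:]

# Add a trailing channel axis, as the reshape in load_dataset() does
X = X.reshape(X.shape + (1,))
print(X.shape, Y.shape)  # (5, 10, 11, 1) (5,)
```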

The ```model.predict``` method computes the predictions for the entire test set in batches of 2048 examples. Changing the batch size affects the prediction time and memory usage, but not the predictions themselves, and therefore not the accuracy scores. Model outputs greater than 0.5 are classified as positive (label 1), and ```np.squeeze``` removes the trailing dimension of the output array. Finally, the ```compute_metrics``` function calculates the accuracy and F1 score used to evaluate the performance of the model.
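The following NumPy-only sketch illustrates both points. A hypothetical ```toy_predict``` function (a fixed linear layer followed by a sigmoid, standing in for the CNN; it is neither the LUCID model nor the Keras API) scores a random test set in mini-batches. Two different batch sizes produce the same labels, and thresholding at 0.5 followed by ```np.squeeze``` turns the ```(N, 1)``` output into an ```(N,)``` label vector, as in the evaluation loop.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 4, 3, 1))  # hypothetical test set: 1000 samples of shape (4, 3, 1)
w = rng.normal(size=(4 * 3, 1))       # hypothetical weights of the stand-in model


def toy_predict(X, batch_size):
    """Score X in mini-batches; returns an (N, 1) array like a one-output Keras model."""
    outputs = []
    for start in range(0, X.shape[0], batch_size):
        batch = X[start:start + batch_size].reshape(-1, 4 * 3)  # flatten each sample
        logits = batch @ w                                       # shape (batch, 1)
        outputs.append(1.0 / (1.0 + np.exp(-logits)))           # sigmoid score in (0, 1)
    return np.concatenate(outputs, axis=0)


scores_small = toy_predict(X, batch_size=32)
scores_large = toy_predict(X, batch_size=2048)

# Same thresholding as the evaluation loop: scores above 0.5 become True (label 1)
labels_small = np.squeeze(scores_small > 0.5)
labels_large = np.squeeze(scores_large > 0.5)

print(scores_large.shape, labels_large.shape)      # (1000, 1) (1000,)
print(np.array_equal(labels_small, labels_large))  # True
```

With a real Keras model in inference mode, each sample is likewise scored independently, so the batch size should only change speed and memory use (up to negligible floating-point differences).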

In [None]:
for model_path in model_list:
    model = load_model(model_path)
    model.summary()  # summary() prints the architecture itself and returns None
    model_filename = os.path.splitext(os.path.basename(model_path))[0]
    for dataset_file in dataset_filelist:
        dataset_filename = os.path.splitext(os.path.basename(dataset_file))[0]
        X, Y_true = load_dataset(dataset_file)
        pt0 = time.time()
        # Threshold the model outputs at 0.5 and drop the trailing axis: (N, 1) -> (N,)
        Y_pred = np.squeeze(model.predict(X, batch_size=2048) > 0.5)
        pt1 = time.time()
        accuracy, f1 = compute_metrics(Y_true, Y_pred)
        row = {'Dataset': dataset_filename, 'Samples': Y_true.shape[0], 'Time': '{:.3f}'.format(pt1 - pt0),
               'Accuracy': accuracy, 'F1Score': f1, 'Model': model_filename}
        print(row)
        predict_writer.writerow(row)
predict_file.close()
print("Classification results saved in file:", predict_file.name)