# Network intrusion detection with Softmax Regression
In this laboratory, we will use Softmax Regression to classify network traffic flows as either benign or as belonging to one of several DDoS attack classes. For each input flow, the Softmax Regression model returns the probability that the flow belongs to each of the target classes. We then apply the argmax operator to assign the flow to the class with the highest probability (benign or one of the DDoS attack classes).
We will train the Softmax Regression model on a dataset of benign traffic and DDoS attack traffic.
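
The decision rule described above (softmax probabilities followed by argmax) can be illustrated with a minimal sketch in plain Python. The scores and the three class names below are made up for illustration; they are not taken from the dataset, and the lab itself relies on Keras for this step.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores of one flow over three illustrative classes
class_names = ["benign", "dns", "syn"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)
# argmax: index of the class with the highest probability
predicted_index = max(range(len(probs)), key=probs.__getitem__)
predicted_class = class_names[predicted_index]
print(probs, predicted_class)
```

Because softmax preserves the ordering of the scores, the argmax of the probabilities is the same as the argmax of the raw scores; the probabilities are mainly useful for training and for judging how confident the model is.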

The dataset contains benign traffic and various DDoS attacks from the CIC-DDoS2019 dataset (https://www.unb.ca/cic/datasets/ddos-2019.html).
The network traffic has been pre-processed so that packets are grouped into bi-directional traffic flows using the 5-tuple (source IP, destination IP, source port, destination port, protocol). Each flow is represented by 21 packet-header features computed from at most 10 packets:

| Features           | Softmax Regression model           |
|---------------------|--------------------|
| timestamp (mean IAT)  <br> packet_length (mean) <br> IP_flags_df (sum) <br> IP_flags_mf (sum) <br> IP_flags_rb (sum) <br> IP_frag_off (sum) <br> protocols (mean) <br> TCP_length (mean) <br> TCP_flags_ack (sum) <br> TCP_flags_cwr (sum) <br> TCP_flags_ece (sum) <br> TCP_flags_fin (sum) <br> TCP_flags_push (sum) <br> TCP_flags_res (sum) <br> TCP_flags_reset (sum) <br> TCP_flags_syn (sum) <br> TCP_flags_urg (sum) <br> TCP_window_size (mean) <br> UDP_length (mean) <br> ICMP_type (mean) <br> Packets (counter) <br>| <img src="./softmax_regression_CIC2019.png" width="100%">  |
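
To make the flow grouping more concrete, here is a minimal sketch of how packets could be grouped into bi-directional flows and summarised with a few of the features listed in the table. This is not the pre-processing tool used to build the dataset: the packet dictionaries, `flow_key`, `group_packets` and `flow_features` are hypothetical names introduced for illustration. How the actual pre-processing handles packets beyond the tenth is not stated here; the sketch simply ignores them, which is an assumption.

```python
from collections import defaultdict

MAX_PACKETS_PER_FLOW = 10  # features are computed from at most 10 packets

def flow_key(src_ip, dst_ip, src_port, dst_port, protocol):
    """Direction-independent key: both directions of a conversation share it."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), protocol)

def group_packets(packets):
    """Group packets into bi-directional flows, keeping the first packets only."""
    flows = defaultdict(list)
    for pkt in packets:
        key = flow_key(pkt["src_ip"], pkt["dst_ip"],
                       pkt["src_port"], pkt["dst_port"], pkt["protocol"])
        if len(flows[key]) < MAX_PACKETS_PER_FLOW:   # assumption: extra packets are ignored
            flows[key].append(pkt)
    return flows

def flow_features(pkts):
    """Compute three of the listed features: a mean, a sum and a counter."""
    return {
        "packet_length_mean": sum(p["length"] for p in pkts) / len(pkts),
        "TCP_flags_syn_sum": sum(p["syn"] for p in pkts),
        "packets": len(pkts),
    }

# Hypothetical packets: a TCP exchange in both directions plus one UDP packet
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000, "dst_port": 80,
     "protocol": 6, "length": 60, "syn": 1},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 80, "dst_port": 40000,
     "protocol": 6, "length": 60, "syn": 1},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 5353, "dst_port": 53,
     "protocol": 17, "length": 90, "syn": 0},
]

flows = group_packets(packets)
features = {key: flow_features(pkts) for key, pkts in flows.items()}
```

Sorting the two endpoints before building the key is what makes the flow bi-directional: the request and the reply of the TCP exchange end up in the same flow.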

In [20]:
# Author: Roberto Doriguzzi-Corin
# Project: Course on Network Intrusion and Anomaly Detection with Machine Learning
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Import necessary libraries

import numpy as np
import glob
import h5py
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from util_functions import *
DATASET_FOLDER = "./DOS2019"

In [27]:
# Load training, validation and test sets
feature_names = get_feature_names()
target_names = ['benign', 'dns',  'syn', 'udplag', 'webddos'] 
target_names_full = ['benign', 'dns', 'ldap', 'mssql', 'netbios', 'ntp', 'portmap', 'snmp', 'ssdp', 'syn', 'tftp', 'udp', 'udplag', 'webddos']
X_train, y_train = load_dataset(DATASET_FOLDER + "/*" + '-train.hdf5')
X_val, y_val = load_dataset(DATASET_FOLDER + "/*" + '-val.hdf5')
X_test, y_test = load_dataset(DATASET_FOLDER + "/*" + '-test.hdf5')

## Model definition
In the next cell, set the correct activation function and the number of output units (one per class).

In [28]:
# Softmax Regression model
def SoftmaxRegression(model_name, input_shape,classes):
    ### ADD YOUR CODE HERE ### 
    activation_function = 
    
    model = Sequential(name=model_name)
    model.add(Dense( , input_shape=input_shape,activation=activation_function, name='fc1'))
    ##########################

    model.summary()  # summary() prints the table itself and returns None
    return model

## Cost function and optimisation algorithm
Use the correct loss function and try different optimizers (SGD, SGD with momentum, NAG, RMSprop or Adam). 
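
As a reminder of what a multi-class loss measures, the following sketch computes the categorical cross-entropy between a one-hot label and a vector of predicted probabilities in plain Python. The function name, the `eps` clipping value and the example vectors are assumptions made for illustration; in the lab, Keras computes the loss internally.

```python
import math

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    # Clip probabilities away from 0 so that log() is always defined
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))

# Hypothetical one-hot label: the true class is the second of three
y_true = [0, 1, 0]

confident_right = [0.05, 0.90, 0.05]   # high probability on the true class
confident_wrong = [0.90, 0.05, 0.05]   # high probability on a wrong class

loss_right = categorical_cross_entropy(y_true, confident_right)
loss_wrong = categorical_cross_entropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```

With a one-hot target, only the probability assigned to the true class contributes to the loss, so a confident wrong prediction is penalised much more heavily than a confident correct one.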

In [29]:
def compileModel(model,lr):
    ### ADD YOUR CODE HERE ###
    loss_function = 'categorical_crossentropy'
    optimizer = 
    ##########################
    model.compile(loss=loss_function, optimizer=optimizer, metrics=['accuracy'])  # set the loss function, the optimizer and the metric to track

## Train the model

In [None]:
model = SoftmaxRegression('log_reg', X_train.shape[1:4],len(target_names_full))
compileModel(model,0.001)

### ADD YOUR CODE HERE ###
EPOCHS = 
BATCH_SIZE = 
##########################

# Train the model
model.fit(X_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_data=(X_val, y_val))

## Make predictions on unseen data

In [None]:
### predict the class probabilities for the test set
y_pred = model.predict(X_test)
# Convert the predicted probabilities and the one-hot encoded labels to class indices
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test, axis=1)

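### print the classification report
# The model has one output per class in target_names_full, so the report uses the same list
print(classification_report(y_test_labels, y_pred_labels, labels=np.arange(len(target_names_full)), target_names=target_names_full, zero_division=0))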
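
The classification report lists precision, recall and F1 score for each class. The sketch below recomputes the per-class F1 score and its unweighted (macro) average by hand on made-up labels, to show why the macro average can be lower than the accuracy when some classes are classified poorly. The label vectors and the helper `per_class_f1` are hypothetical; on the real predictions, `f1_score(y_test_labels, y_pred_labels, average='macro')` from scikit-learn should give the macro average directly.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """F1 score of each class, computed from true/false positives and false negatives."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)   # 0.0 when the class never appears
    return scores

# Hypothetical class indices: class 0 is frequent, classes 1 and 2 are rare
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 1]

scores = per_class_f1(y_true, y_pred, num_classes=3)
macro_f1 = sum(scores) / len(scores)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(scores, macro_f1, accuracy)
```

Here the frequent class is predicted well while the two rare classes are not, so the macro F1 score falls below the accuracy. This is worth keeping in mind when the attack classes in the test set are unbalanced.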