## Flow-based features

While using one packet at a time to detect intrusions is simple and straightforward, one can argue that relevant context is lost: the model does not consider the position of the packet in its flow. Therefore, most RTF-based machine learning models in the literature consider packet flows rather than individual packets. Recall that a network flow is identified by its 5-tuple: source address, destination address, source port, destination port and protocol. The main idea is to group packets by their 5-tuple (flow ID) and then extract features from the resulting flows. In the literature, this is usually done as follows:

1. Iterate over the dataset in chronological order
    1. For each packet, extract the flow ID (if applicable)
    2. Store the entire packet (or a truncated version) in a data structure containing all packets with that flow ID
2. Decide on the dimensions of your input features
3. Iterate over each flow
    1. Extract the input features
    2. Store the extracted features
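The first step above can be sketched as follows. This is a minimal illustration, not the code used in the exercise: the `Packet` class, its field names and the helper functions are hypothetical, and packets are assumed to be parsed already and to arrive in chronological order.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed packet (not the exercise's packet class)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int
    payload: bytes


def flow_id(packet):
    """Return the 5-tuple that identifies the packet's flow."""
    return (packet.src_addr, packet.dst_addr,
            packet.src_port, packet.dst_port, packet.protocol)


def sort_into_flows(packets, max_bytes=64):
    """Group truncated packets by flow ID, preserving arrival order."""
    flows = defaultdict(list)
    for packet in packets:  # assumed to be in chronological order
        flows[flow_id(packet)].append(packet.payload[:max_bytes])
    return flows


packets = [
    Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, b"GET / HTTP/1.1"),
    Packet("10.0.0.3", "10.0.0.2", 5555, 53, 17, b"dns-query"),
    Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, b"Host: example"),
]
flows = sort_into_flows(packets)
print(len(flows))  # 2
```

Note that this sketch treats the two directions of a connection as separate flows, because the 5-tuple is used as-is. Whether both directions should share one flow ID is a design choice that the description above leaves open.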

These stored features can then be used for training and/or model evaluation. However, deciding on the input feature dimensions has now become slightly more complicated than when using individual packets. Mainly, two decisions need to be made:

1. How many packets do you include in one sample?
2. Do you consider only the beginning of the flow, or the entire flow?

As an example, the model we will use in this exercise was trained on samples of 4x64 bytes, created by taking the first 64 bytes of 4 subsequent packets in a flow. All packets in such a flow were considered.
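One possible way to build such 4x64-byte samples from a single flow is sketched below. It assumes that each packet is available as a `bytes` object and that the flow is split into non-overlapping groups of four packets; the function name `flow_to_samples` is hypothetical, and the exact grouping used to train the model is not specified here.

```python
import numpy as np

PACKETS_PER_SAMPLE = 4
BYTES_PER_PACKET = 64


def flow_to_samples(packet_list):
    """Split one flow into zero-padded, flattened samples of 4 x 64 bytes."""
    samples = []
    for start in range(0, len(packet_list), PACKETS_PER_SAMPLE):
        # Start from zeros so that missing packets and short packets stay zero-padded
        sample = np.zeros((PACKETS_PER_SAMPLE, BYTES_PER_PACKET), dtype=np.uint8)
        group = packet_list[start:start + PACKETS_PER_SAMPLE]
        for row, packet in enumerate(group):
            head = np.frombuffer(packet[:BYTES_PER_PACKET], dtype=np.uint8)
            sample[row, :len(head)] = head
        samples.append(sample.reshape(-1))  # 4 * 64 = 256 values per sample
    return samples


# A toy flow of five 100-byte packets yields two samples;
# the second sample holds one packet and three rows of zeros.
flow = [bytes([i + 1]) * 100 for i in range(5)]
samples = flow_to_samples(flow)
print(len(samples), samples[0].shape)  # 2 (256,)
```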

### In hardware

The above approach does not work in hardware, as it requires access to the entire dataset at once and enough storage to hold that dataset while processing it. When performing network intrusion detection in hardware, storage is limited and it is impossible to predict which packets will arrive in the future. Therefore, another approach is necessary. One solution is to sort incoming packets into buckets according to their flow ID, and to empty those buckets whenever space needs to be made for new incoming packets. Emptying a bucket then amounts to creating an input sample for the detection model. (Image: *n* individual buckets; packets are stored according to their flow ID, FID.)

<img src="./images/32_sorting.gif" width="600" height="150" />
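The bucket idea can be modelled in software roughly as follows. This is only a sketch under stated assumptions: the class `FlowBuckets` is hypothetical, the bucket that has been occupied longest is emptied when no free bucket is left, and a bucket is additionally emptied once it holds a full sample of four packets. An actual hardware design may well use a different eviction policy.

```python
from collections import OrderedDict


class FlowBuckets:
    """Fixed number of buckets with an illustrative (assumed) eviction policy."""

    def __init__(self, num_buckets, packets_per_sample=4):
        self.num_buckets = num_buckets
        self.packets_per_sample = packets_per_sample
        self.buckets = OrderedDict()  # flow ID -> packets, oldest bucket first

    def add(self, fid, packet):
        """Store a packet and return the (flow ID, packets) samples emitted."""
        emitted = []
        if fid not in self.buckets:
            if len(self.buckets) >= self.num_buckets:
                # No free bucket: empty the oldest one to make room
                emitted.append(self.buckets.popitem(last=False))
            self.buckets[fid] = []
        self.buckets[fid].append(packet)
        if len(self.buckets[fid]) == self.packets_per_sample:
            # Assumption: a full bucket is emitted as a complete sample
            emitted.append((fid, self.buckets.pop(fid)))
        return emitted


buckets = FlowBuckets(num_buckets=2)
emitted = []
for fid, packet in [("A", b"a1"), ("B", b"b1"), ("C", b"c1"), ("A", b"a2")]:
    emitted.extend(buckets.add(fid, packet))
print(emitted)  # [('A', [b'a1']), ('B', [b'b1'])]
```

In this toy run, flow A is evicted after a single packet, so its second packet starts a new, incomplete sample. This illustrates why samples created this way may contain fewer than four packets and need zero-padding.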

### In software: exercise
In software, both approaches are possible: either sorting the entire dataset beforehand, or filling and emptying buckets based on the incoming traffic. In this exercise, we will sort the entire dataset before extracting features and conducting inference.

In [2]:
import torch
import numpy as np
from lib.dataset import NIDSDataset


# Initialize the dataset
dset = NIDSDataset(
    packets_file="./data/dataset_packets_v2.npy", 
    labels_file="./data/dataset_labels_v1.npy")

# The flow_dict contains a list of packets for each flow ID:
# flow_dict = {
#    "id_1": [..., ...],
#    "id_2": [..., ...],
#    ...
# }
flow_dict = {}

# Also include the label in the flow ID to account for the possibility that different packets from the same flow 
# could be malicious.
# Iterate over the dataset, sort all valid input features to dictionary

for packet in dset:
    label = packet.get_label()
    
    # Own code here
    
    for word in packet:
        # Own code here
        pass
    
    
print("We extracted {} flow IDs.".format(len(flow_dict.keys())))

We extracted 0 flow IDs.


Once the packets have been sorted, you can use the dictionary to extract your actual features.
Tip: Initialize your input samples using *np.zeros(...)*, so that input samples that only have one, two or three instead of four packets available can account for the missing data. Similarly, in hardware the missing bytes would have to be zero-padded.

In [None]:
input_buffer = []
label_buffer = []

for flow_id, packet_list in flow_dict.items():
    # Your code here
    pass

print("We prepared {} input samples".format(len(input_buffer)))

# The items in the input_buffer should be NumPy arrays of size 4 * 64 = 256
# These are then transformed to tensors
input_tensors = []

for input_sample in input_buffer:
    # Turn the Numpy array into a PyTorch tensor
    input_tensor = torch.from_numpy(input_sample)
    # Change input dimensionality
    input_tensor = input_tensor.view(1, 1, 4 * 64)
    
    input_tensors.append(input_tensor)

Now we repeat the process of loading the CNN, performing inference and interpreting the results:

In [4]:
from lib.nn_model import ExampleCNN1D4x64

model = ExampleCNN1D4x64(13)

# Load the trained parameters
model.load_state_dict(torch.load("./data/cnn1d4x64.model"))

# Set the Batch Normalization layers for inference
model.eval()

ExampleCNN1D4x64(
  (layer1): Sequential(
    (0): Conv1d(1, 16, kernel_size=(5,), stride=(1,), padding=(2,))
    (1): BatchNorm1d(16, eps=1e-05, momentum=0.95, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (maxpool1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (layer2): Sequential(
    (0): Conv1d(16, 32, kernel_size=(5,), stride=(1,), padding=(2,))
    (1): BatchNorm1d(32, eps=1e-05, momentum=0.95, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (maxpool2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (layer3): Sequential(
    (0): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=(2,))
    (1): BatchNorm1d(64, eps=1e-05, momentum=0.95, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (maxpool3): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
  (layer4): Sequential(
    (0): Conv1d(64, 96, kernel_size=(3,), stride=(1,), padding=(1,))
    (1): Batch

In [None]:
from lib.nn_model import label_mapping

predictions = []

for input_tensor in input_tensors:
    output_tensor = model(input_tensor.float())
    
    _, predicted = torch.max(output_tensor, 1)
    predictions.append(predicted)

predictions = torch.stack(predictions, 0).numpy()

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Transform the indices to their corresponding class label:
labelled_predictions = []
for prediction in predictions:
    labelled_predictions.append(label_mapping[prediction[0]])

# Choose output figure size
_, ax = plt.subplots(figsize=(20, 20))

# Calculate the confusion matrix
cm = confusion_matrix(label_buffer, labelled_predictions, labels=label_mapping)

# Display the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=label_mapping)
disp.plot(ax=ax)
plt.show()

In [None]:
from sklearn.metrics import classification_report


print(classification_report(label_buffer, labelled_predictions, 
                            labels=label_mapping, 
                            target_names=label_mapping,
                            digits=4,
                            zero_division=0
                           ))

We expect the following results:
- BENIGN F1 = 0.9421
- Bot F1 = 0.8235
- PortScan F1 = 0.4684
- DDoS F1 = 0.5741
- F1_weighted_avg = 0.7353

Clearly, the generated samples are classified significantly worse by this second model than by the model with simple input features.
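To make the weighted average in the expected results easier to interpret: in scikit-learn's `classification_report`, the weighted average of a metric is the mean of the per-class values, weighted by the number of true samples per class (the support). The pure-Python sketch below reproduces that arithmetic on a toy example. The helper names are hypothetical and the labels are made up for illustration; they are not results from this exercise.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 score of a single class, computed from TP, FP and FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, label) * count / total
               for label, count in support.items())


# Toy labels: 4 BENIGN and 2 DDoS samples
y_true = ["BENIGN", "BENIGN", "BENIGN", "BENIGN", "DDoS", "DDoS"]
y_pred = ["BENIGN", "BENIGN", "BENIGN", "DDoS", "DDoS", "BENIGN"]

print(per_class_f1(y_true, y_pred, "BENIGN"))  # 0.75
print(per_class_f1(y_true, y_pred, "DDoS"))    # 0.5
print(round(weighted_f1(y_true, y_pred), 4))   # 0.6667
```

Because of the weighting, a frequent class such as BENIGN dominates the average, so a high weighted F1 can hide poor performance on rare attack classes.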

