### Data Encoding & Neural Network - Binary / Multiclass Classification V3 - CIIC Research - José P. Areia

**The table below lists every field in the dataset, its description, and the encoding applied to it.**

Pre-processing and data encoding steps applied: <mark>Categorical Values (Dummies)</mark>, <mark>Z-Score Normalization</mark>, <mark>Empty Cells Filling</mark>, and <mark>Value Replacing</mark>.

The fields **attack_type** and **is_malicious** (marked with \* in the table) were added to the dataset afterwards, to distinguish normal traffic from malicious traffic and to identify the type of attack. The list below gives the numeric classes used in the field **attack_type**.

- \[0\] - Normal
- \[1\] - Black Hole
- \[2\] - Hello Flood
- \[3\] - Version Number


| Field                               | Description                                                                                                                      |  Deleted | Categorical Values (Dummies) | Z-Score Normalization | Empty Cells Filling |
|:-------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------|:--------:|:----------------------------:|:---------------------:|:-------------------:|
| frame.time_relative                 | Time since reference or first frame                                                                                              | &#x2714; | -                            | -                     | -                   |
| wpan.frame_length                   | Frame length                                                                                                                     | -        | -                            | &#x2714;              | -                   |
| ipv6.src                            | Source address                                                                                                                   | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.opt.length               | Option length                                                                                                                    | &#x2714; | -                            | -                     | -                   |
| frame.cap_len                       | Frame length stored into the capture file                                                                                        | -        | -                            | &#x2714;              | -                   |
| udp.checksum                        | Checksum                                                                                                                         | &#x2714; | -                            | -                     | -                   |
| udp.dstport                         | Destination port                                                                                                                 | &#x2714; | -                            | -                     | -                   |
| frame.time_delta                    | Time delta from previous captured frame                                                                                          | &#x2714; | -                            | -                     | -                   |
| frame.time_epoch                    | Epoch time                                                                                                                       | &#x2714; | -                            | -                     | -                   |
| ipv6.dst_host                       | Destination host                                                                                                                 | &#x2714; | -                            | -                     | -                   |
| icmpv6.type                         | ICMPv6 Type - [IANA Table Types](https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml#icmpv6-parameters-2) | -        | -                            | -                     | &#x2714;            |
| frame.time_delta_displayed          | Time delta from previous displayed frame                                                                                         | &#x2714; | -                            | -                     | -                   |
| frame.protocols                     | Protocols in frame                                                                                                               | &#x2714; | -                            | -                     | -                   |
| udp.stream                          | Stream index                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| coap.payload                        | Payload (length extracted into coap.payload_length)                                                                              | -        | &#x2714;                     | -                     | -                   |
| udp.srcport                         | Source port                                                                                                                      | &#x2714; | -                            | -                     | -                   |
| wpan.seq_no                         | Sequence number                                                                                                                  | &#x2714; | -                            | -                     | -                   |
| icmpv6.checksum.status              | Checksum status                                                                                                                  | -        | -                            | -                     | &#x2714;            |
| 6lowpan.iphc.m                      | Multicast address compression (header compression)                                                                               | -        | -                            | -                     | &#x2714;            |
| 6lowpan.pattern                     | 1 Bit Pattern for different protocols                                                                                            | &#x2714; | -                            | -                     | -                   |
| udp.length                          | UDP Frame length                                                                                                                 | -        | -                            | &#x2714;              | &#x2714;            |
| frame.number                        | Frame number                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| wpan.fcf                            | Frame control field                                                                                                              | &#x2714; | -                            | -                     | -                   |
| 6lowpan.udp.src                     | Source port                                                                                                                      | &#x2714; | -                            | -                     | -                   |
| wpan.dst64                          | Destination (EUI64 Destination)                                                                                                  | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.dio.version              | DIO Version                                                                                                                      | -        | -                            | &#x2714;              | &#x2714;            |
| wpan.dst_addr_mode                  | Destination addressing mode                                                                                                      | &#x2714; | -                            | -                     | -                   |
| 6lowpan.udp.checksum                | UDP checksum                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| coap.opt.uri_path                   | Incomplete uri-path                                                                                                              | -        | &#x2714;                     | -                     | -                   |
| icmpv6.checksum                     | Checksum                                                                                                                         | &#x2714; | -                            | -                     | -                   |
| ipv6.host                           | Source or destination host                                                                                                       | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.dao.sequence             | DAO Sequence                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| ipv6.addr                           | Source or destination address                                                                                                    | &#x2714; | -                            | -                     | -                   |
| wpan.addr64                         | Extended address                                                                                                                 | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.dio.rank                 | DIO Rank                                                                                                                         | -        | -                            | &#x2714;              | &#x2714;            |
| **is_malicious**\*                  | Distinguish normal from malicious traffic                                                                                        | -        | -                            | -                     | -                   |
| udp.port                            | Source or destination port                                                                                                       | &#x2714; | -                            | -                     | -                   |
| ipv6.src_host                       | Source host                                                                                                                      | &#x2714; | -                            | -                     | -                   |
| udp.time_relative                   | Time since first frame                                                                                                           | &#x2714; | -                            | -                     | -                   |
| udp.pdu.size                        | Protocol data unit size                                                                                                          | &#x2714; | -                            | -                     | -                   |
| udp.payload                         | Payload                                                                                                                          | &#x2714; | -                            | -                     | -                   |
| coap.opt.length                     | Opt length                                                                                                                       | -        | &#x2714;                     | -                     | -                   |
| coap.type                           | Type                                                                                                                             | -        | -                            | -                     | &#x2714;            |
| ipv6.dst                            | Destination address                                                                                                              | &#x2714; | -                            | -                     | -                   |
| ipv6.plen                           | Payload length                                                                                                                   | -        | -                            | &#x2714;              | &#x2714;            |
| frame.len                           | Frame length on the wire                                                                                                         | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.opt.target.prefix        | Target                                                                                                                           | &#x2714; | -                            | -                     | -                   |
| **attack_type**\*                   | Type of attack                                                                                                                   | -        | -                            | -                     | -                   |
| ipv6.nxt                            | Next header                                                                                                                      | -        | &#x2714;                     | -                     | -                   |
| 6lowpan.nhc.udp.ports               | Ports                                                                                                                            | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.opt.type                 | RPL Options type                                                                                                                 | -        | &#x2714;                     | -                     | -                   |
| coap.payload_length                 | Payload length                                                                                                                   | &#x2714; | -                            | -                     | -                   |
| 6lowpan.iphc.nh                     | Next header                                                                                                                      | -        | -                            | -                     | &#x2714;            |
| frame.time                          | Frame arrival time                                                                                                               | &#x2714; | -                            | -                     | -                   |
| wpan.src64                          | Extended source                                                                                                                  | &#x2714; | -                            | -                     | -                   |
| coap.opt.uri_path_recon             | Complete uri-path                                                                                                                | &#x2714; | -                            | -                     | -                   |
| <ins>icmpv6.code</ins>              | ICMPv6 Code - [IANA Table Code](https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml#icmpv6-parameters-3)  | &#x2714; | -                            | -                     | -                   |
| wpan.fcs                            | WPAN Frame check sequence OK                                                                                                     | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.opt.transit.pathlifetime | Path lifetime (Default: 30 Days)                                                                                                 | -        | -                            | -                     | &#x2714;            |
| coap.mid                            | Message ID                                                                                                                       | &#x2714; | -                            | -                     | -                   |
| 6lowpan.src                         | Source                                                                                                                           | &#x2714; | -                            | -                     | -                   |
| coap.code                           | Coap code - [RFC 7252](https://www.rfc-editor.org/rfc/rfc7252#page-86)                                                             | -        | &#x2714;                     | -                     | -                   |
| icmpv6.rpl.dio.dtsn                 | Destination advertisement trigger sequence number (DTSN)                                                                         | -        | -                            | &#x2714;              | &#x2714;            |
| 6lowpan.dst                         | Destination                                                                                                                      | &#x2714; | -                            | -                     | -                   |
| udp.time_delta                      | Time since previous frame                                                                                                        | &#x2714; | -                            | -                     | -                   |

In [1]:
# Tensorflow logging: OFF
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

**The next code blocks define the methods used for pre-processing and data encoding.**

The methods are: <mark>read_data</mark>, <mark>delete_fields</mark>, <mark>fill_empty_cells</mark>, <mark>zscore_normalization</mark>, <mark>coap_payload_length</mark>, <mark>dummy_encode</mark>, and <mark>value_replacing</mark>. Each one is detailed in the code blocks below.

In [2]:
# Pre-processing & Data Encoding
import pandas as pd

# Import dataset
def read_data(dataset):
    df = pd.read_csv(dataset)

    # Print before returning; a statement placed after 'return' never runs
    print(f'[DONE] Imported Dataset')
    return df

In [3]:
# Delete Unnecessary Fields
def delete_fields(df, fields):
    # Drop all listed columns in place; raises KeyError if a field is missing
    df.drop(columns = fields, inplace = True)

    print(f'[DONE] Fields Deleted')

In [4]:
# Fill Empty Cells With N Value
def fill_empty_cells(df, fields, n):
    for i in fields:
        df[i] = df[i].fillna(n)
    
    print(f'[DONE] Empty Cells Filling')

In [5]:
from scipy.stats import zscore

# Z-Score Normalization
def zscore_normalization(df, fields):
    for i in fields:
        df[i] = zscore(df[i])

    print(f'[DONE] Z-Score Normalization')
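
To make the transformation concrete, here is a minimal sketch of what the z-score computes, written with the standard library only. It assumes the default behaviour of `scipy.stats.zscore`, which uses the population standard deviation (`ddof = 0`). The function name `zscore_by_hand` and the sample frame lengths are hypothetical.

```python
from statistics import fmean, pstdev

def zscore_by_hand(values):
    """Standardize values as (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population std (ddof = 0), matching the scipy default
    return [(v - mean) / std for v in values]

frame_lengths = [60, 60, 100, 100]  # hypothetical frame lengths in bytes
scaled = zscore_by_hand(frame_lengths)
print(scaled)  # [-1.0, -1.0, 1.0, 1.0]
```

After scaling, each field has mean 0 and unit standard deviation. A field whose values are all identical has a standard deviation of 0, so this hand-written version would divide by zero for such a field.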

In [6]:
# Extract the CoAP Payload Length into a new column
def coap_payload_length(df):
    df[["coap.payload", "coap.payload.format", "coap.payload_length"]] = df["coap.payload"].str.split(':', expand = True)
    df["coap.payload_length"] = df["coap.payload_length"].fillna(0)
    df.drop('coap.payload', axis = 1, inplace = True)
    df.drop('coap.payload.format', axis = 1, inplace = True)
    
    print(f'[DONE] CoAP Payload Length Extraction')

In [7]:
# Replace old value for a new one 
def value_replacing(df, field, old_value, new_value):
    df[field] = df[field].replace(old_value, new_value)

    print(f'[DONE] Value Replacing')

In [8]:
# Convert fields to dummy variables
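def dummy_encode(df, fields):
    for i in fields:
        # Append one indicator column per distinct value, then drop the original field
        df = pd.concat([df, pd.get_dummies(df[i], prefix = i)], axis = 1)
        df.drop(i, axis = 1, inplace = True)

    # Print before returning; a statement placed after 'return' never runs
    print(f'[DONE] Categorical Values (Dummies)')
    return df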
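
The following sketch shows what `pd.get_dummies` produces for a single field, using a small hypothetical sample of `ipv6.nxt` values. It also illustrates a detail worth knowing: with the default settings, a missing value gets no indicator column, so its row is all zeros.

```python
import pandas as pd

# Hypothetical sample: two UDP packets (17), one ICMPv6 packet (58), one missing value
sample = pd.DataFrame({"ipv6.nxt": [17.0, 58.0, None, 17.0]})

# astype(int) gives 0/1 values regardless of the pandas version's default dtype
encoded = pd.get_dummies(sample["ipv6.nxt"], prefix = "ipv6.nxt").astype(int)

print(list(encoded.columns))          # one column per distinct non-missing value
print(encoded.sum(axis = 1).tolist()) # the row with the missing value sums to 0
```

The column names follow the pattern `<field>_<value>`, which is why the encoded dataset contains names such as `ipv6.nxt_17.0`.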

In [9]:
# Import Dataset & Read Data
df = read_data('Datasets/Anomalous_Traffic_VF2.csv')

In [10]:
# Fields -- Deleted Fields
f_delete = [
    'frame.time_relative', 'ipv6.src', 'udp.checksum', 'frame.time_delta', 'frame.time_epoch', 
    'ipv6.dst_host', 'frame.time_delta_displayed', 'udp.stream', 'wpan.seq_no', '6lowpan.pattern',
    'frame.number', 'wpan.fcf', '6lowpan.udp.src', 'wpan.dst64', 'wpan.dst_addr_mode', '6lowpan.udp.checksum',
    'icmpv6.checksum', 'ipv6.host', 'icmpv6.rpl.dao.sequence', 'ipv6.addr',  'wpan.addr64', 'udp.port', 
    'ipv6.src_host', 'udp.time_relative', 'udp.pdu.size', 'udp.payload', 'ipv6.dst', 'udp.srcport', 'frame.len',
    'icmpv6.rpl.opt.target.prefix', '6lowpan.nhc.udp.ports', 'coap.payload_length', 'frame.time', 'wpan.src64',
    'coap.opt.uri_path_recon', 'wpan.fcs', 'coap.mid', '6lowpan.src', '6lowpan.dst', 'udp.time_delta',
    'icmpv6.rpl.opt.length', 'udp.dstport', 'frame.protocols', 'icmpv6.code'
]

delete_fields(df, f_delete)

[DONE] Fields Deleted


In [11]:
# Fields -- Empty Cells Filling
fill_0 = [
    'icmpv6.type', 'icmpv6.checksum.status', 'udp.length', 'icmpv6.rpl.dio.version',
    'icmpv6.rpl.dio.rank', 'ipv6.plen', 'icmpv6.rpl.dio.dtsn'
]

# Fields whose empty cells are filled with -1
fill_minus_1 = [
    '6lowpan.iphc.m', 'coap.type', '6lowpan.iphc.nh', 'icmpv6.rpl.opt.transit.pathlifetime'
]

fill_empty_cells(df, fill_0, 0)        # Fill: 0
fill_empty_cells(df, fill_minus_1, -1) # Fill: -1

[DONE] Empty Cells Filling
[DONE] Empty Cells Filling


In [12]:
# Fields -- Z-Score Normalization
zscore_fields = [
    'wpan.frame_length', 'frame.cap_len', 'udp.length', 'icmpv6.rpl.dio.version',
    'icmpv6.rpl.dio.rank', 'ipv6.plen', 'icmpv6.rpl.dio.dtsn'
]

zscore_normalization(df, zscore_fields)

[DONE] Z-Score Normalization


In [13]:
# Fields -- Extract CoAP Payload Length
coap_payload_length(df)

[DONE] CoAP Payload Length Extraction


In [14]:
# Fields -- Standard Normalization (Value Replacing)
value_replacing(df, 'icmpv6.type', 155, 1)
value_replacing(df, 'coap.type', 2, 1)
value_replacing(df, 'icmpv6.rpl.opt.transit.pathlifetime', 30, 1)

[DONE] Value Replacing
[DONE] Value Replacing
[DONE] Value Replacing


In [15]:
# Fields -- Categorical Values
dummy_fields = [
    'coap.payload_length', 'coap.opt.uri_path', 'coap.opt.length', 'ipv6.nxt',
    'icmpv6.rpl.opt.type', 'coap.code'
]

df = dummy_encode(df, dummy_fields)

In [16]:
# Display all the fields in the dataset as well as their values
display(df)

Unnamed: 0,wpan.frame_length,frame.cap_len,icmpv6.type,icmpv6.checksum.status,6lowpan.iphc.m,udp.length,icmpv6.rpl.dio.version,icmpv6.rpl.dio.rank,is_malicious,coap.type,...,coap.opt.uri_path_sensors,coap.opt.length_7.0,coap.opt.length_9.0,ipv6.nxt_17.0,ipv6.nxt_58.0,icmpv6.rpl.opt.type_4.0,icmpv6.rpl.opt.type_5.0,coap.code_2.0,coap.code_69.0,coap.code_132.0
0,-0.856025,-0.855887,1.0,1.0,-1.0,-0.541894,-0.531387,-0.084635,0,-1.0,...,0,0,0,0,1,0,0,0,0,0
1,-0.856025,-0.855887,1.0,1.0,-1.0,-0.541894,-0.531387,-0.084635,1,-1.0,...,0,0,0,0,1,0,0,0,0,0
2,-0.856025,-0.855887,1.0,1.0,-1.0,-0.541894,-0.531387,-0.084635,0,-1.0,...,0,0,0,0,1,0,0,0,0,0
3,-0.856025,-0.855887,1.0,1.0,-1.0,-0.541894,-0.531387,-0.084635,0,-1.0,...,0,0,0,0,1,0,0,0,0,0
4,1.366917,1.366764,1.0,1.0,1.0,-0.541894,2.448346,0.080501,0,-1.0,...,0,0,0,0,1,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
668211,-0.047682,-0.047650,0.0,0.0,-1.0,0.133871,-0.531387,-0.084635,0,1.0,...,0,0,0,1,0,0,0,0,0,1
668212,-0.047682,-0.047650,0.0,0.0,0.0,2.386420,-0.531387,-0.084635,0,0.0,...,1,1,0,1,0,0,0,1,0,0
668213,-0.047682,-0.047650,0.0,0.0,-1.0,0.133871,-0.531387,-0.084635,0,1.0,...,0,0,0,1,0,0,0,0,1,0
668214,-0.047682,-0.047650,0.0,0.0,0.0,2.386420,-0.531387,-0.084635,0,0.0,...,1,1,0,1,0,0,0,1,0,0


In [17]:
import numpy as np

# Build the NumPy feature matrix (x) and the one-hot target matrix (y)

# Classification Type: 0 - Binary / 1 - Multiclass
classification_type = 1

# Both label columns are dropped from the features in either mode:
# keeping 'attack_type' as a feature in binary mode would reveal the label to the model

# For a binary classification, use 'is_malicious' as target column
# For a multiclass classification, use 'attack_type' as target column

x_columns = df.columns.drop(['attack_type', 'is_malicious'])

if classification_type == 1:
    dummies = pd.get_dummies(df['attack_type'])
else:
    dummies = pd.get_dummies(df['is_malicious'])
    
x = df[x_columns].values
attack = dummies.columns
y = dummies.values

print(f'[DONE] Numpy Classification')

TypeError: drop() got an unexpected keyword argument 'axis'
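
The target matrix `y` is a one-hot encoding of the label column. The sketch below reproduces that layout with NumPy alone and shows that `np.argmax` recovers the original class, which is the step the metrics code relies on later. The label values are hypothetical, and the sketch assumes the class ids are integers, as in the **attack_type** list.

```python
import numpy as np

labels = np.array([0, 3, 1, 0, 2])   # hypothetical attack_type values
classes = np.unique(labels)          # sorted class ids, playing the role of dummies.columns

# One row per sample, one column per class, with a single 1 in the matching column
one_hot = (labels[:, None] == classes).astype(int)

# argmax returns the column position of the 1, which indexes back into the class ids
recovered = classes[np.argmax(one_hot, axis = 1)]

print(one_hot)
print(recovered)
```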

In [None]:
from sklearn.model_selection import train_test_split

# Training validation splitting 
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 42)

print(f'[DONE] Training validation splitting')

In [None]:
# Only the names used below are imported
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

# Neural Network Model
model = Sequential()
model.add(Dense(50, input_dim = x.shape[1], activation = 'relu')) # Hidden 1
model.add(Dense(25, activation = 'relu')) # Hidden 2
model.add(Dense(y.shape[1], activation = 'softmax')) # Output
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam')

print(f'[DONE] Neural Network Model')

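# Early Stopping: stop when val_loss has not improved by at least 1e-3 for 10 consecutive epochs,
# then restore the weights of the best epoch
# Note: the test split doubles as validation data here, so the reported test metrics may be optimistic
monitor = EarlyStopping(monitor = 'val_loss', min_delta = 1e-3, patience = 10, verbose = 1, mode = 'auto', restore_best_weights = True)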
model.fit(x_train, y_train, validation_data = (x_test, y_test), callbacks = [monitor], verbose = 2, epochs = 1000)

print(f'[DONE] Early Stopping')
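
The interplay between `min_delta` and `patience` can be easier to follow in plain Python. The function below is a simplified sketch of the stopping rule, not the Keras implementation; the name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta = 1e-3, patience = 10):
    """Return (stop_epoch, best_epoch) as 0-based indices; stop_epoch is None if training runs to the end."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement larger than min_delta: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            # No sufficient improvement: count the epoch against the patience budget
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [0.50, 0.40, 0.3995, 0.41, 0.42]
print(early_stopping_epoch(losses, patience = 3))  # (4, 1)
```

In this example the loss at epoch 2 (0.3995) is lower than the best so far (0.40), but by less than `min_delta`, so it does not count as an improvement. With `restore_best_weights = True`, the model would end up with the weights from the best epoch rather than the last one.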

In [None]:
# Prediction
pred = model.predict(x_test)
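
Because the output layer uses a softmax activation, each row of `pred` is a vector of class probabilities that sums to 1. The sketch below shows the computation for a single sample with the standard library; the raw scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max keeps exp() numerically stable."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5, 0.1, -1.0])     # hypothetical scores for the four classes
predicted_class = probs.index(max(probs))  # same role as np.argmax on one row of pred

print(probs)
print(predicted_class)
```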

In [None]:
from sklearn import metrics

# Metrics for the classification
def compute_metrics(pred, y_test):
    predict_classes = np.argmax(pred, axis = 1)
    expected_classes = np.argmax(y_test, axis = 1)
    
    correct = metrics.accuracy_score(expected_classes, predict_classes)
    print(f"Accuracy: {correct}")
    
    recall = metrics.recall_score(expected_classes, predict_classes, average = 'weighted')    
    print(f"Recall: {recall}")
       
    precision = metrics.precision_score(expected_classes, predict_classes, average = 'weighted')
    print(f"Precision: {precision}")
    
    f1score = metrics.f1_score(expected_classes, predict_classes, average = 'weighted')
    print(f"F1Score: {f1score}")
    
compute_metrics(pred, y_test)
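
To clarify what `average = 'weighted'` means, the sketch below recomputes weighted recall and weighted precision by hand: each class's score is weighted by its share of the expected labels. The function names and the sample labels are hypothetical. A class that is never predicted gets precision 0 here, which is an assumption mirroring the usual scikit-learn fallback.

```python
from collections import Counter

def weighted_scores(expected, predicted):
    """Return (weighted_recall, weighted_precision) for single-label multiclass data."""
    support = Counter(expected)          # samples per true class
    predicted_counts = Counter(predicted)
    total = len(expected)
    recall = precision = 0.0
    for cls, n in support.items():
        hits = sum(1 for e, p in zip(expected, predicted) if e == cls and p == cls)
        weight = n / total
        recall += weight * (hits / n)
        if predicted_counts[cls]:
            precision += weight * (hits / predicted_counts[cls])
    return recall, precision

expected  = [0, 0, 0, 0, 1, 1, 2, 3]
predicted = [0, 0, 0, 1, 1, 1, 2, 0]
accuracy = sum(e == p for e, p in zip(expected, predicted)) / len(expected)
recall, precision = weighted_scores(expected, predicted)
print(accuracy, recall, precision)
```

One consequence worth noting: for single-label data, weighted recall always equals accuracy, because the class weights cancel the per-class denominators. The two values printed by `compute_metrics` are therefore expected to be identical.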

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Confusion Matrix
predict_classes = np.argmax(pred, axis = 1)
expected_classes = np.argmax(y_test, axis = 1)    
    
cm = confusion_matrix(expected_classes, predict_classes)
cmd = ConfusionMatrixDisplay(cm)

# Plot size
fig, ax = plt.subplots(figsize = (5, 5))

cmd.plot(ax = ax)
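
For reading the plot, it helps to know how the matrix is laid out: in scikit-learn's convention, rows are the expected classes and columns are the predicted classes. The sketch below builds the same kind of matrix by hand; the function name and the sample labels are hypothetical.

```python
def confusion_counts(expected, predicted, n_classes):
    """Count samples per (expected class, predicted class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for e, p in zip(expected, predicted):
        matrix[e][p] += 1  # row = expected class, column = predicted class
    return matrix

expected  = [0, 0, 0, 0, 1, 1, 2, 3]
predicted = [0, 0, 0, 1, 1, 1, 2, 0]

for row in confusion_counts(expected, predicted, 4):
    print(row)
```

The diagonal holds the correctly classified samples. In this example the last row shows that the single class 3 sample was predicted as class 0. `ConfusionMatrixDisplay` also accepts a `display_labels` argument, which could be used to show the attack names instead of class ids.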

In [None]:
from sklearn.ensemble import ExtraTreesClassifier

# Usage of ExtraTreesClassifier for feature selection
extra_tree_forest = ExtraTreesClassifier(n_estimators = 5, criterion = 'entropy', max_features = 2)
extra_tree_forest.fit(x, y)

# Importance of each feature, averaged over all trees
feature_importance = extra_tree_forest.feature_importances_

# Note: despite its name, this is the standard deviation of each feature's importance across the trees
feature_importance_normalized = np.std([tree.feature_importances_ for tree in extra_tree_forest.estimators_], axis = 0)

print(f'[DONE] Extra Trees Classifier')

In [None]:
import matplotlib.pyplot as plot

# Plot for the ExtraTreesClassifier output

# Plot size (set when the figure is created; changing rcParams after plotting does not resize the current figure)
plot.figure(figsize = (60, 40))

plot.bar(x_columns, feature_importance_normalized)
plot.xlabel('Feature Labels')
plot.ylabel('Feature Importances')
plot.title('Comparison of different feature importances in the current dataset')
plot.xticks(rotation = 90)

plot.show()