### Data Encoding & Neural Network - Binary / Multiclass Classification V1 - CIIC Research - José P. Areia

**The table below lists every field in the dataset, its description, and the encoding applied to it.**

Pre-processing and data encoding steps applied: <mark>Categorical Values (Dummies)</mark>, <mark>Z-Score Normalization</mark>, <mark>Empty Cells Filling</mark>, and <mark>Value Replacing</mark>.

Note that the fields **is_malicious** and **attack_type** (marked with \* in the table) were added to the dataset afterwards: the former distinguishes normal traffic from malicious traffic, and the latter identifies the type of attack as a numerical class.


| Field                        | Description                                                                                                                     |  Deleted | Categorical Values (Dummies) | Z-Score Normalization | Empty Cells Filling |
|:------------------------------|:---------------------------------------------------------------------------------------------------------------------------------|:--------:|:----------------------------:|:---------------------:|:-------------------:|
| ipv6.plen                    | Payload length                                                                                                                  | -        | -                            | &#x2714;              | &#x2714;            |
| icmpv6.rpl.dao.sequence      | DAO Sequence                                                                                                                    | &#x2714; | -                            | -                     | -                   |
| frame.protocols              | Protocols in frame                                                                                                              | &#x2714; | -                            | -                     | -                   |
| ipv6.host                    | Source or destination host                                                                                                      | &#x2714; | -                            | -                     | -                   |
| ip.len                       | Total Length                                                                                                                    | -        | &#x2714;                     | -                     | -                   |
| icmpv6.rpl.dio.dagid         | DODAGID - Identifies a DODAG [RFC 9009](https://www.rfc-editor.org/rfc/rfc9009.html#name-destination-cleanup-object-)           | &#x2714; | -                            | -                     | -                   |
| ip.src                       | Source Address                                                                                                                  | &#x2714; | -                            | -                     | -                   |
| frame.time_delta_displayed   | Time delta from previous displayed frame                                                                                        | &#x2714; | -                            | -                     | -                   |
| ipv6.tclass.dscp             | Differentiated services codepoint [IANA Documentation](https://www.iana.org/assignments/dscp-registry/dscp-registry.xhtml)      | -        | -                            | -                     | &#x2714;            |
| udp.checksum                 | Checksum                                                                                                                        | &#x2714; | -                            | -                     | -                   |
| icmpv6.checksum              | Checksum                                                                                                                        | &#x2714; | -                            | -                     | -                   |
| ipv6.dst                     | Destination address                                                                                                             | &#x2714; | -                            | -                     | -                   |
| ip.dst                       | Destination address                                                                                                             | &#x2714; | -                            | -                     | -                   |
| ipv6.tclass.ecn              | Explicit congestion notification                                                                                                | -        | -                            | -                     | &#x2714;            |
| frame.number                 | Frame number                                                                                                                    | &#x2714; | -                            | -                     | -                   |
| tcp.completeness             | Conversation completeness                                                                                                       | &#x2714; | -                            | -                     | -                   |
| ipv6.dst_host                | Destination host                                                                                                                | &#x2714; | -                            | -                     | -                   |
| udp.time_relative            | Time since first frame                                                                                                          | &#x2714; | -                            | -                     | -                   |
| ip.version                   | Version                                                                                                                         | -        | &#x2714;                     | -                     | -                   |
| icmpv6.rpl.opt.target.prefix | Target                                                                                                                          | &#x2714; | -                            | -                     | -                   |
| ip.ttl                       | Time to Live                                                                                                                    | -        | &#x2714;                     | -                     | -                   |
| tcp.time_relative            | Time since first frame in this TCP stream                                                                                       | &#x2714; | -                            | -                     | -                   |
| frame.len                    | Frame length on the wire                                                                                                        | -        | -                            | &#x2714;              | -                   |
| tcp.time_delta               | Time since previous frame in this TCP stream                                                                                    | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.opt.type          | RPL Options type                                                                                                                | -        | &#x2714;                     | -                     | -                   |
| frame.time                   | Frame arrival time                                                                                                              | &#x2714; | -                            | -                     | -                   |
| frame.time_delta             | Time delta from previous captured frame                                                                                         | &#x2714; | -                            | -                     | -                   |
| icmpv6.code                  | ICMPv6 Code - [IANA Table Code](https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml#icmpv6-parameters-3) | -        | &#x2714;                     | -                     | -                   |
| ipv6.addr                    | Source or destination address                                                                                                   | &#x2714; | -                            | -                     | -                   |
| ipv6.src_host                | Source host                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| ip.host                      | Source or Destination Host                                                                                                      | &#x2714; | -                            | -                     | -                   |
| ip.addr                      | Source or Destination Address                                                                                                   | &#x2714; | -                            | -                     | -                   |
| udp.stream                   | Stream index                                                                                                                    | &#x2714; | -                            | -                     | -                   |
| udp.srcport                  | Source port                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| frame.time_epoch             | Epoch time                                                                                                                      | &#x2714; | -                            | -                     | -                   |
| ipv6.flow                    | Flow Label [RFC 2460](https://www.rfc-editor.org/rfc/rfc2460#page-25)                                                           | &#x2714; | -                            | -                     | -                   |
| ip.dst_host                  | Destination Host                                                                                                                | &#x2714; | -                            | -                     | -                   |
| tcp.analysis.rto_frame       | Retransmission timeouts (RTO) based on delta from frame                                                                         | &#x2714; | -                            | -                     | -                   |
| tcp.stream                   | Stream index                                                                                                                    | &#x2714; | -                            | -                     | -                   |
| ipv6.hlim                    | Hop Limit                                                                                                                       | -        | -                            | &#x2714;              | &#x2714;            |
| ipv6.nxt                     | Next header                                                                                                                     | -        | &#x2714;                     | -                     | -                   |
| tcp.checksum                 | Checksum                                                                                                                        | &#x2714; | -                            | -                     | -                   |
| frame.time_relative          | Time since reference or first frame                                                                                             | &#x2714; | -                            | -                     | -                   |
| udp.length                   | UDP Frame length                                                                                                                | &#x2714; | -                            | -                     | -                   |
| ip.proto                     | Protocol                                                                                                                        | -        | -                            | -                     | &#x2714;            |
| ip.id                        | Identification                                                                                                                  | &#x2714; | -                            | -                     | -                   |
| ipv6.src                     | Source address                                                                                                                  | &#x2714; | -                            | -                     | -                   |
| tcp.analysis.rto             | The RTO for this segment was                                                                                                    | &#x2714; | -                            | -                     | -                   |
| icmpv6.rpl.opt.length        | Option length                                                                                                                   | -        | -                            | &#x2714;              | &#x2714;            |
| frame.cap_len                | Frame length stored into the capture file                                                                                       | -        | -                            | &#x2714;              | -                   |
| ipv6.tclass                  | Traffic class (QoS) [RFC 2460](https://www.rfc-editor.org/rfc/rfc2460#page-25)                                                  | &#x2714; | -                            | -                     | -                   |
| udp.port                     | Source or destination port                                                                                                      | &#x2714; | -                            | -                     | -                   |
| udp.time_delta               | Time since previous frame                                                                                                       | &#x2714; | -                            | -                     | -                   |
| ip.src_host                  | Source host                                                                                                                     | &#x2714; | -                            | -                     | -                   |
| udp.dstport                  | Destination port                                                                                                                | &#x2714; | -                            | -                     | -                   |
| udp.payload                  | Payload                                                                                                                         | &#x2714; | -                            | -                     | -                   |
| ip.checksum                  | Checksum                                                                                                                        | &#x2714; | -                            | -                     | -                   |
| udp.checksum.status          | Checksum status                                                                                                                 | -        | -                            | -                     | &#x2714;            |
| icmpv6.rpl.dio.rank          | DIO Rank                                                                                                                        | -        | -                            | &#x2714;              | &#x2714;            |
| **is_malicious**\*           | Distinguish normal traffic from malicious one                                                                                   | -        | -                            | -                     | -                   |
| **attack_type**\*            | Type of attack                                                                                                                  | -        | -                            | -                     | -                   |

In [1]:
# TensorFlow logging: OFF (must be set before TensorFlow is imported)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

**The next code blocks define the functions used for pre-processing and data encoding.**

The functions are: <mark>read_data</mark>, <mark>delete_fields</mark>, <mark>fill_empty_cells</mark>, <mark>zscore_normalization</mark>, <mark>value_replacing</mark>, and <mark>dummy_encode</mark>. Each one is detailed in the code blocks below.

In [2]:
import pandas as pd

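# Import dataset from a CSV file and return it as a DataFrame
def read_data(dataset):
    df = pd.read_csv(dataset)

    # Print before returning; a statement after `return` is never executed
    print('[DONE] Dataset Import')
    return df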

In [3]:
# Delete Unnecessary Fields (modifies df in place)
def delete_fields(df, fields):
    df.drop(columns = fields, inplace = True)

    print('[DONE] Fields Deleted')

In [4]:
# Fill Empty Cells With N Value
def fill_empty_cells(df, fields, n):
    for i in fields:
        df[i] = df[i].fillna(n)
    
    print(f'[DONE] Empty Cells Filling')
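
As a small illustration of what the filling step does, the sketch below applies `fillna` to a toy column. The column values and the fill value `0` are assumptions made for the example; the notebook does not state which value `n` is used for each field.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing hop-limit value
toy = pd.DataFrame({'ipv6.hlim': [64.0, np.nan, 255.0]})

# Replace missing cells with 0 (assumed fill value, for illustration only)
toy['ipv6.hlim'] = toy['ipv6.hlim'].fillna(0)

print(toy['ipv6.hlim'].tolist())  # → [64.0, 0.0, 255.0]
```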

In [5]:
from scipy.stats import zscore

# Z-Score Normalization
def zscore_normalization(df, fields):
    for i in fields:
        df[i] = zscore(df[i])

    print(f'[DONE] Z-Score Normalization')
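
To make the normalization concrete, the sketch below computes the z-score by hand, `(x - mean) / std`, and compares it with `scipy.stats.zscore`, which uses the population standard deviation (`ddof=0`) by default. The toy values are hypothetical. It also shows that, with the default `nan_policy`, a single missing value turns the whole result into NaN, which is one plausible reason the table marks several fields for both Empty Cells Filling and Z-Score Normalization.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical frame lengths, for illustration only
values = np.array([40.0, 60.0, 80.0, 100.0])

# Manual z-score with the population standard deviation (ddof=0)
manual = (values - values.mean()) / values.std(ddof=0)

# scipy's zscore uses ddof=0 by default, so both results agree
scipy_result = zscore(values)
print(np.allclose(manual, scipy_result))  # → True

# With the default nan_policy ('propagate'), one NaN makes every output NaN
with_nan = zscore(np.array([1.0, np.nan, 3.0]))
print(np.isnan(with_nan).all())  # → True
```

If missing values must be tolerated instead of filled, `zscore` also accepts `nan_policy='omit'`.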

In [6]:
# Replace old value for a new one 
def value_replacing(df, field, old_value, new_value):
    df[field] = df[field].replace(old_value, new_value)

    print(f'[DONE] Value Replacing')
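
A minimal sketch of the replacement step on a toy column follows. The field name is taken from the table, but the old and new values are hypothetical; the notebook section shown here does not say which values are actually replaced.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({'udp.checksum.status': [2.0, 0.0, 2.0]})

# Replace every occurrence of 2.0 with 1.0 (assumed values)
toy['udp.checksum.status'] = toy['udp.checksum.status'].replace(2.0, 1.0)

print(toy['udp.checksum.status'].tolist())  # → [1.0, 0.0, 1.0]
```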

In [7]:
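# Convert fields to dummy variables (returns a new DataFrame)
def dummy_encode(df, fields):
    for i in fields:
        # Append one indicator column per distinct value, then drop the original field
        df = pd.concat([df, pd.get_dummies(df[i], prefix = i)], axis = 1)
        df.drop(i, axis = 1, inplace = True)

    # Print before returning; a statement after `return` is never executed
    print('[DONE] Categorical Values (Dummies)')
    return df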
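
The sketch below shows the effect of dummy encoding on a toy frame, using the same `pd.get_dummies(..., prefix=...)` call as the function above. The data values are hypothetical. Each distinct value of the field becomes its own indicator column named `<field>_<value>`.

```python
import pandas as pd

# Hypothetical toy data: two distinct next-header values
toy = pd.DataFrame({'ipv6.nxt': [17, 58, 17], 'frame.len': [60, 84, 60]})

# One indicator column per distinct value, prefixed with the field name
dummies = pd.get_dummies(toy['ipv6.nxt'], prefix='ipv6.nxt')

# Drop the original field and append the indicator columns
encoded = pd.concat([toy.drop(columns='ipv6.nxt'), dummies], axis=1)

print(list(encoded.columns))  # → ['frame.len', 'ipv6.nxt_17', 'ipv6.nxt_58']
```

Because `pd.concat` builds a new DataFrame rather than modifying the original, a function written this way has to return its result, and the caller has to reassign it, for example `df = dummy_encode(df, fields)`.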

In [10]:
# Import Dataset & Read Data
df = read_data('Datasets/NETSIM_Anomalous_Traffic.csv')


In [11]:
display(df)

Unnamed: 0,ipv6.plen,icmpv6.rpl.dao.sequence,frame.protocols,ipv6.host,ip.len,icmpv6.rpl.dio.dagid,ip.src,frame.time_delta_displayed,ipv6.tclass.dscp,udp.checksum,...,udp.port,udp.time_delta,ip.src_host,udp.dstport,udp.payload,ip.checksum,udp.checksum.status,icmpv6.rpl.dio.rank,is_malicious,attack_type
0,20.0,,raw:ipv6:udp:data,fdec:3017:e256:9bb8:1fe7:590a:f163:9af2,,,,0.000319,0.0,0xf845,...,39108.0,0.000000,,19550.0,6162636465666768696a6b6c,,2.0,,0,0
1,44.0,,raw:ipv6:icmpv6,fdec:3017:e256:9bb8:1fe7:67a4:88ea:13fe,,fdec:3017:e256:9bb8:1fe7:ca28:7f00:e482,,0.015784,0.0,,...,,,,,,,,16.0,0,0
2,20.0,,raw:ipv6:udp:data,fdec:3017:e256:9bb8:1fe7:590a:f163:9af2,,,,0.001182,0.0,0xa857,...,38414.0,0.000000,,40706.0,6162636465666768696a6b6c,,2.0,,0,0
3,44.0,,raw:ipv6:icmpv6,fdec:3017:e256:9bb8:1fe7:ca28:7f00:e482,,fdec:3017:e256:9bb8:1fe7:ca28:7f00:e482,,0.002626,0.0,,...,,,,,,,,1.0,0,0
4,44.0,,raw:ipv6:icmpv6,fdec:3017:e256:9bb8:1fe7:ca28:7f00:e482,,fdec:3017:e256:9bb8:1fe7:ca28:7f00:e482,,0.000000,0.0,,...,,,,,,,,1.0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048570,,,raw:ip:udp:data,,40.0,,0.0.0.40,0.000000,,0x41e5,...,49930.0,0.000000,0.0.0.40,34878.0,6162636465666768696a6b6c,0xb099,2.0,,0,0
1048571,,,raw:ip:udp:data,,40.0,,0.0.0.40,0.000011,,0x41e4,...,49930.0,0.000011,0.0.0.40,34878.0,6162636465666768696a6b6c,0xb198,2.0,,0,0
1048572,,,raw:ip:udp:data,,43.0,,11.3.1.2,0.000208,,0x9b71,...,10562.0,0.000000,11.3.1.2,57384.0,6162636465666768696a6b6c6d6e6f,0xa3b9,2.0,,0,0
1048573,,,raw:ip:udp:data,,43.0,,11.3.1.2,0.000010,,0x9b71,...,10562.0,0.000010,11.3.1.2,57384.0,6162636465666768696a6b6c6d6e6f,0xa3b9,2.0,,0,0
