# Objective: 

The aim is to analyze network traffic to detect any anomalies or suspicious activities.

## Data: 

The CICIDS 2017 dataset, which includes benign traffic and a wide range of attacks, is available here:

https://www.unb.ca/cic/datasets/ids-2017.html

## Procedure: 

- Acquire the dataset from https://www.unb.ca/cic/datasets/ids-2017.html.
- Preprocess the data.
- Perform exploratory data analysis.
- Train and compare classification machine learning algorithms:
  - Logistic Regression
  - Random Forest
  - Gradient Boosting
  - XGBoost
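The classification step can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's final pipeline: the helper name `compare_classifiers`, the 70/30 stratified split, the macro-averaged F1 metric, and the synthetic stand-in data are all illustrative choices. XGBoost is left out because it lives in a separate package.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, seed=0):
    """Fit three of the planned models and return a macro F1 score for each (hypothetical helper)."""
    # Stratify so that rare classes appear in both the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        # Logistic regression benefits from scaled inputs; the tree models do not need it.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Macro F1 weights every class equally, so a rare attack class still counts.
        scores[name] = f1_score(y_test, model.predict(X_test), average="macro")
    return scores


# Synthetic stand-in data (assumption): replace with the cleaned flow features and labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
demo_scores = compare_classifiers(X_demo, y_demo)
```

Macro F1 is used here instead of accuracy because accuracy can look high on heavily imbalanced traffic even when every attack is missed.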


## Extra notes:

Reference this site for putting pcap files into DataFrames:

https://www.automox.com/blog/visualizing-network-data-using-python-part-3

Research paper from the creators of the dataset: Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization”, 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, January 2018

## Code:

In [1]:
#Imports
from scapy.all import *
import plotly
from datetime import datetime
import pandas as pd
import numpy as np

In [2]:
#Preferences
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:.3f}')

In [3]:
def fixing_col_names(df):
    """
    This function removes leading whitespace and trailing '/s' and '.1' suffixes, and replaces spaces with '_'.
    """
    #Lists to capture alterations
    column_names = list(df.columns)
    fixed_names = []
    
    for item in column_names:
        #Removes leading whitespace
        item = item.lstrip()
        #Removes '/s'
        if item[-2:] == "/s":
            item = item[:-2]
        #Removes '.1'
        if item[-2:] == ".1":
            item = item[:-2]
        #Replaces space with underscore
        item = item.replace(" ", "_")
        fixed_names.append(item)
    
    #Replaces names in the DataFrame
    df.rename(columns=dict(zip(column_names, fixed_names)), inplace=True)
    return df
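To see what these renaming rules do to individual headers, the same steps can be applied to plain strings. This is a small sketch: `clean_name` is a hypothetical stand-alone helper, and the raw header strings are illustrative assumptions about how the CSV columns look before cleaning.

```python
def clean_name(name):
    """Apply the same renaming rules as the cell above to a single column name."""
    name = name.lstrip()
    # Strip the trailing '/s' first, then the trailing '.1', in the same order as above.
    for suffix in ("/s", ".1"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.replace(" ", "_")


# Illustrative raw headers (assumed, not read from the files).
raw = [" Destination Port", "Flow Bytes/s", " Fwd Header Length", " Fwd Header Length.1"]
cleaned = [clean_name(n) for n in raw]

# Names that occur more than once after cleaning.
duplicates = sorted({n for n in cleaned if cleaned.count(n) > 1})
print(cleaned)
print(duplicates)
```

Stripping the '.1' suffix can leave two columns with the same name, so it may be worth checking for duplicates before selecting features by name.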

In [4]:
#Get traffic data:
mon0 = fixing_col_names(pd.read_csv("csv_files/Monday-WorkingHours.pcap_ISCX.csv"))
tues0 = fixing_col_names(pd.read_csv("csv_files/Tuesday-WorkingHours.pcap_ISCX.csv"))
wed0 = fixing_col_names(pd.read_csv("csv_files/Wednesday-WorkingHours.pcap_ISCX.csv"))
thur0 = fixing_col_names(pd.read_csv("csv_files/Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv"))
thur1 = fixing_col_names(pd.read_csv("csv_files/Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv"))
fri0 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Morning.pcap_ISCX.csv"))
fri1 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv"))
fri2 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv"))
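If a single combined table is more convenient than eight separate variables, the files could also be read in a loop. This is a sketch only: `load_flow_csvs`, the glob pattern, and the `source_file` column are assumptions introduced for illustration.

```python
from pathlib import Path

import pandas as pd


def load_flow_csvs(folder, pattern="*.pcap_ISCX.csv"):
    """Read every matching CSV in `folder` into one DataFrame (hypothetical helper)."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        frame = pd.read_csv(path)
        # Remember which capture file each flow came from.
        frame["source_file"] = path.name
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    return pd.concat(frames, ignore_index=True)
```

Calling `load_flow_csvs("csv_files")` would return all days in one frame. The column-name cleaning can then be applied once to the result instead of once per file.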

### Tasks
- Check for null values.
- Class imbalance: the attack classes are a small minority in several files. Does that matter for anomaly detection?
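Both tasks can be checked with a few lines of pandas. The sketch below uses a hypothetical helper, `quality_report`, on a tiny made-up frame. It also counts infinite values, on the assumption that rate features computed by division may contain them.

```python
import numpy as np
import pandas as pd


def quality_report(df, label_col="Label"):
    """Return per-column null/infinite counts and the share of each label (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    report = pd.DataFrame({
        "nulls": df.isna().sum(),
        # np.isinf only applies to numeric columns; other columns get a count of 0.
        "infinite": np.isinf(numeric).sum().reindex(df.columns, fill_value=0),
    })
    shares = df[label_col].value_counts(normalize=True)
    return report, shares


# Tiny made-up frame standing in for one day of traffic.
demo = pd.DataFrame({
    "Flow_Bytes": [1.0, np.nan, np.inf, 4.0],
    "Flow_Duration": [10, 20, 30, 40],
    "Label": ["BENIGN", "BENIGN", "BENIGN", "Bot"],
})
report, shares = quality_report(demo)
print(report)
print(shares)
```

The label shares give a quick first answer to the imbalance question: a class that makes up a tiny fraction of the rows may need stratified splits or class weighting.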

In [5]:
#View traffic
#mon0.head()
#tues0.head()
#wed0.head()
#thur0.head()
#thur1.head()
#fri0.head()
#fri1.head()
fri2.head()

Unnamed: 0,Destination_Port,Flow_Duration,Total_Fwd_Packets,Total_Backward_Packets,Total_Length_of_Fwd_Packets,Total_Length_of_Bwd_Packets,Fwd_Packet_Length_Max,Fwd_Packet_Length_Min,Fwd_Packet_Length_Mean,Fwd_Packet_Length_Std,Bwd_Packet_Length_Max,Bwd_Packet_Length_Min,Bwd_Packet_Length_Mean,Bwd_Packet_Length_Std,Flow_Bytes,Flow_Packets,Flow_IAT_Mean,Flow_IAT_Std,Flow_IAT_Max,Flow_IAT_Min,Fwd_IAT_Total,Fwd_IAT_Mean,Fwd_IAT_Std,Fwd_IAT_Max,Fwd_IAT_Min,Bwd_IAT_Total,Bwd_IAT_Mean,Bwd_IAT_Std,Bwd_IAT_Max,Bwd_IAT_Min,Fwd_PSH_Flags,Bwd_PSH_Flags,Fwd_URG_Flags,Bwd_URG_Flags,Fwd_Header_Length,Bwd_Header_Length,Fwd_Packets,Bwd_Packets,Min_Packet_Length,Max_Packet_Length,Packet_Length_Mean,Packet_Length_Std,Packet_Length_Variance,FIN_Flag_Count,SYN_Flag_Count,RST_Flag_Count,PSH_Flag_Count,ACK_Flag_Count,URG_Flag_Count,CWE_Flag_Count,ECE_Flag_Count,Down/Up_Ratio,Average_Packet_Size,Avg_Fwd_Segment_Size,Avg_Bwd_Segment_Size,Fwd_Header_Length.1,Fwd_Avg_Bytes/Bulk,Fwd_Avg_Packets/Bulk,Fwd_Avg_Bulk_Rate,Bwd_Avg_Bytes/Bulk,Bwd_Avg_Packets/Bulk,Bwd_Avg_Bulk_Rate,Subflow_Fwd_Packets,Subflow_Fwd_Bytes,Subflow_Bwd_Packets,Subflow_Bwd_Bytes,Init_Win_bytes_forward,Init_Win_bytes_backward,act_data_pkt_fwd,min_seg_size_forward,Active_Mean,Active_Std,Active_Max,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label
0,22,1266342,41,44,2664,6954,456,0,64.976,109.865,976,0,158.045,312.675,7595.105,67.122,15075.5,104051.4,948537,0,1266342,31658.55,159355.259,996324,2,317671,7387.698,19636.448,104616,1,0,0,0,0,1328,1424,32.377,34.746,0,976,111.837,239.687,57449.785,0,0,0,1,0,0,0,0,1,113.153,64.976,158.045,1328,0,0,0,0,0,0,41,2664,44,6954,29200,243,24,32,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
1,22,1319353,41,44,2664,6954,456,0,64.976,109.865,976,0,158.045,312.675,7289.937,64.426,15706.583,104861.87,955790,1,1319353,32983.825,159247.901,996423,1,363429,8451.837,21337.263,104815,1,0,0,0,0,1328,1424,31.076,33.35,0,976,111.837,239.687,57449.785,0,0,0,1,0,0,0,0,1,113.153,64.976,158.045,1328,0,0,0,0,0,0,41,2664,44,6954,29200,243,24,32,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
2,22,160,1,1,0,0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,12500.0,160.0,0.0,160,160,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0,0,0,32,32,6250.0,6250.0,0,0,0.0,0.0,0.0,0,0,0,0,1,1,0,0,1,0.0,0.0,0.0,32,0,0,0,0,0,0,1,0,1,0,290,243,0,32,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
3,22,1303488,41,42,2728,6634,456,0,66.537,110.13,976,0,157.952,319.121,7182.268,63.675,15896.195,106554.899,956551,0,1303488,32587.2,160397.05,997357,1,346851,8459.78,23962.239,138295,0,0,0,0,0,1328,1360,31.454,32.221,0,976,111.452,241.643,58391.239,0,0,0,1,0,0,0,0,1,112.795,66.537,157.952,1328,0,0,0,0,0,0,41,2728,42,6634,29200,243,24,32,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
4,35396,77,1,2,0,0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,38961.039,38.5,14.849,49,28,0,0.0,0.0,0,0,49,49.0,0.0,49,49,0,0,0,0,32,64,12987.013,25974.026,0,0,0.0,0.0,0.0,0,0,0,0,1,1,0,0,2,0.0,0.0,0.0,32,0,0,0,0,0,0,1,0,2,0,243,290,0,32,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN


In [6]:
tues0[tues0["Label"] == "FTP-Patator"].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Destination_Port,7938.0,21.007,0.662,21.0,21.0,21.0,21.0,80.0
Flow_Duration,7938.0,4513244.598,4527255.604,0.0,219.0,4044135.0,8994940.5,10780124.0
Total_Fwd_Packets,7938.0,5.497,3.5,1.0,2.0,6.0,9.0,9.0
Total_Backward_Packets,7938.0,7.808,7.186,0.0,1.0,6.0,15.0,15.0
Total_Length_of_Fwd_Packets,7938.0,60.032,46.336,0.0,14.0,30.5,106.0,135.0
Total_Length_of_Bwd_Packets,7938.0,93.907,93.922,0.0,0.0,76.0,188.0,188.0
Fwd_Packet_Length_Max,7938.0,18.995,5.572,0.0,14.0,16.0,23.0,49.0
Fwd_Packet_Length_Min,7938.0,0.019,0.521,0.0,0.0,0.0,0.0,14.0
Fwd_Packet_Length_Mean,7938.0,9.39,2.494,0.0,7.0,9.688,11.778,15.0
Fwd_Packet_Length_Std,7938.0,9.697,0.846,0.0,9.458,9.899,9.899,15.504


In [7]:
mon0["Label"].value_counts()

BENIGN    529918
Name: Label, dtype: int64

In [8]:
tues0["Label"].value_counts()

BENIGN         432074
FTP-Patator      7938
SSH-Patator      5897
Name: Label, dtype: int64

In [9]:
wed0["Label"].value_counts()

BENIGN              440031
DoS Hulk            231073
DoS GoldenEye        10293
DoS slowloris         5796
DoS Slowhttptest      5499
Heartbleed              11
Name: Label, dtype: int64

In [10]:
thur0["Label"].value_counts()

BENIGN                        168186
Web Attack – Brute Force        1507
Web Attack – XSS                 652
Web Attack – Sql Injection        21
Name: Label, dtype: int64

In [11]:
thur1["Label"].value_counts()

BENIGN          288566
Infiltration        36
Name: Label, dtype: int64

In [12]:
fri0["Label"].value_counts()

BENIGN    189067
Bot         1966
Name: Label, dtype: int64

In [13]:
fri1["Label"].value_counts()

DDoS      128027
BENIGN     97718
Name: Label, dtype: int64

In [14]:
fri2["Label"].value_counts()

PortScan    158930
BENIGN      127537
Name: Label, dtype: int64