# Objective: 

The aim is to analyze network traffic to detect any anomalies or suspicious activities.

## Data: 

This project uses the CICIDS 2017 dataset, which includes normal traffic and a wide range of attacks. It is available here:

https://www.unb.ca/cic/datasets/ids-2017.html

## Procedure: 

Acquire the dataset from https://www.unb.ca/cic/datasets/ids-2017.html.

Preprocess the data.

Perform exploratory data analysis.

Train and compare classification machine learning algorithms:
- Logistic Regression
- Random Forest
- Gradient Boosting
- XGBoost
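
The model families listed above could be compared along these lines. This is a minimal sketch on synthetic, imbalanced data from `make_classification`, not on the CICIDS 2017 features; the sample size, hyperparameters, and the choice of macro F1 as the metric are all assumptions. XGBoost is left out so the block depends only on scikit-learn; `xgboost.XGBClassifier` follows the same `fit`/`predict` interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the traffic data: roughly 90% "normal", 10% "attack".
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.9, 0.1], random_state=42,
)

# Stratify so the minority class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    # Logistic regression benefits from scaled features, so wrap it in a pipeline.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
}

# Macro F1 weights both classes equally, which is more informative than
# accuracy when one class dominates.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")

for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
```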


## Extra notes:

Refer to this series for loading pcap files into DataFrames:

https://www.automox.com/blog/visualizing-network-data-using-python-part-1

https://www.automox.com/blog/visualizing-network-data-using-python-part-2

https://www.automox.com/blog/visualizing-network-data-using-python-part-3

Found a whole site dedicated to malware traffic analysis:
https://www.malware-traffic-analysis.net/2021/index.html

Research paper from the creators of the dataset: Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization”, 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, January 2018.

### Current Tasks
- Check for null values.
- Class imbalance: minority classes are heavily underrepresented in several cases. Does that matter for anomaly detection?
- Time series analysis – need to obtain timestamp.
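
The first two tasks could be covered by a small report like the one below. It is a sketch under assumptions: the helper name `data_quality_report` and the toy column names are hypothetical, and only the `Label` column name is taken from the cells in this notebook. Infinite values are counted separately because `isna()` does not flag them.

```python
import numpy as np
import pandas as pd


def data_quality_report(df, label_col="Label"):
    """Return (null counts, infinite-value counts, class shares) for a flow table."""
    null_counts = df.isna().sum()
    # np.isinf only makes sense on numeric columns.
    numeric = df.select_dtypes(include="number")
    inf_counts = np.isinf(numeric).sum()
    # Share of rows per class; a quick view of how imbalanced the labels are.
    label_share = df[label_col].value_counts(normalize=True)
    # Keep only the columns that actually have a problem.
    return null_counts[null_counts > 0], inf_counts[inf_counts > 0], label_share


# Toy frame standing in for one day of traffic (values are made up).
toy = pd.DataFrame({
    "Flow Duration": [10, 20, 30, 40],
    "Flow Bytes/s": [1.5, np.inf, np.nan, 2.0],
    "Label": ["BENIGN", "BENIGN", "BENIGN", "FTP-Patator"],
})

nulls, infs, share = data_quality_report(toy)
print(nulls, infs, share, sep="\n\n")
```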

## Code:

In [1]:
#Imports
from scapy.all import *
from collections import Counter
from prettytable import PrettyTable
import plotly
from datetime import datetime
import pandas as pd
import numpy as np
from prepare import *
import os

In [2]:
#Preferences
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:.3f}')

In [3]:
#Get traffic data:
load_data = False
if load_data:
    mon0 = fixing_col_names(pd.read_csv("csv_files/Monday-WorkingHours.pcap_ISCX.csv"))
    tues0 = fixing_col_names(pd.read_csv("csv_files/Tuesday-WorkingHours.pcap_ISCX.csv"))
    wed0 = fixing_col_names(pd.read_csv("csv_files/Wednesday-WorkingHours.pcap_ISCX.csv"))
    thur0 = fixing_col_names(pd.read_csv("csv_files/Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv"))
    thur1 = fixing_col_names(pd.read_csv("csv_files/Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv"))
    fri0 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Morning.pcap_ISCX.csv"))
    fri1 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv"))
    fri2 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv"))
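
`fixing_col_names` is not defined in this notebook; it is presumably imported from the local `prepare` module, whose source is not shown. The sketch below illustrates what such a helper might do, assuming its only job is to strip stray whitespace from the CSV headers. The name `strip_col_names` is hypothetical, and this is not the author's implementation.

```python
import pandas as pd


def strip_col_names(df):
    """Return a copy of df whose column names have no leading/trailing whitespace."""
    return df.rename(columns=lambda c: str(c).strip())


# Toy frame with padded headers (made-up values).
raw = pd.DataFrame({" Flow Duration": [10, 20], " Label": ["BENIGN", "BENIGN"]})
clean = strip_col_names(raw)
print(list(clean.columns))
```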

In [4]:
mon_file_size = os.path.getsize("pcap_files/Monday-WorkingHours.pcap")
print(f"Pcap file is {mon_file_size/1_000_000_000:.3f} GB")

Pcap file is 10.823 GB


In [5]:
#`PcapReader()` iterates through the pcap entries one at a time.
#`rdpcap()` loads all pcap entries into memory at once, which takes a very long time.

In [6]:
#Getting pcaps
count = 0
for packet in PcapReader("pcap_files/Monday-WorkingHours.pcap"):
    print(packet)
    count += 1
    if count >= 25:
        break

Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
Ether / IP / TCP 8.254.250.126:http > 192.168.10.5:49188 FA / Padding
802.3 70:6e:6d:1d:bb:04 > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
802.3 70:6e:6d:1d:bb:06 > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
802.3 70:6e:6d:1d:bb:0c > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
802.3 70:6e:6d:1d:bb:0c > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
802.3 70:6e:6d:1d:bb:0c > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
802.3 70:6e:6d:1d:bb:0c > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
802.3 70:6e:6d:1d:bb:0c > 01:00:0c:cc:cc:cc / LLC / SNAP / Raw
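
One way to approach the timestamp task is to stream packets and collect each packet's capture time into a DataFrame. The sketch below assumes each packet exposes `.time` (epoch seconds), `len()`, and `.summary()`, as Scapy packets do; the function name `packets_to_frame`, the column names, and the `limit` default are hypothetical. It is demonstrated on a stand-in packet class so it runs without the 10 GB capture.

```python
from datetime import datetime, timezone
from itertools import islice

import pandas as pd


def packets_to_frame(packets, limit=1000):
    """Collect timestamp, length and summary for at most `limit` packets."""
    rows = []
    # islice stops early, so a streaming reader is never read to the end.
    for pkt in islice(packets, limit):
        rows.append({
            # float() handles both plain numbers and decimal-like time values.
            "timestamp": datetime.fromtimestamp(float(pkt.time), tz=timezone.utc),
            "length": len(pkt),
            "summary": pkt.summary(),
        })
    return pd.DataFrame(rows)


class FakePacket:
    """Stand-in with the three attributes the function relies on."""

    def __init__(self, time, size):
        self.time = time
        self._size = size

    def __len__(self):
        return self._size

    def summary(self):
        return f"fake packet of {self._size} bytes"


demo = packets_to_frame(
    [FakePacket(1_499_000_000 + i, 60 + i) for i in range(5)], limit=3
)
print(demo)
```

With Scapy installed, the same function should accept a reader directly, for example `with PcapReader("pcap_files/Monday-WorkingHours.pcap") as reader: df = packets_to_frame(reader, limit=25)`.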

In [7]:
#Read pcap files:
#mon_packets = rdpcap("pcap_files/Monday-WorkingHours.pcap")

In [8]:
#View traffic
#mon0.head()
#tues0.head()
#wed0.head()
#thur0.head()
#thur1.head()
#fri0.head()
#fri1.head()
#fri2.head()

In [9]:
#tues0[tues0["Label"] == "FTP-Patator"].describe().T

In [10]:
#mon0["Label"].value_counts()

In [11]:
#tues0["Label"].value_counts()

In [12]:
#wed0["Label"].value_counts()

In [13]:
#thur0["Label"].value_counts()

In [14]:
#thur1["Label"].value_counts()

In [15]:
#fri0["Label"].value_counts()

In [16]:
#fri1["Label"].value_counts()

In [17]:
#fri2["Label"].value_counts()
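
The per-file `value_counts()` calls above could be combined into a single table so the class distribution is visible at a glance. This is a sketch under assumptions: the helper name `label_counts_by_day` is hypothetical, and the toy frames below only stand in for `mon0`, `tues0`, and the other loaded DataFrames.

```python
import pandas as pd


def label_counts_by_day(frames, label_col="Label"):
    """Build a labels-by-day table of row counts from a dict of DataFrames."""
    counts = {day: df[label_col].value_counts() for day, df in frames.items()}
    # Labels missing from a given day show up as NaN; treat them as zero rows.
    return pd.DataFrame(counts).fillna(0).astype(int)


# Toy frames with made-up labels.
frames = {
    "mon0": pd.DataFrame({"Label": ["BENIGN", "BENIGN", "BENIGN"]}),
    "tues0": pd.DataFrame({"Label": ["BENIGN", "FTP-Patator", "FTP-Patator"]}),
}

table = label_counts_by_day(frames)
print(table)
```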