# Objective: 

The aim is to analyze network traffic to detect any anomalies or suspicious activities.

## Data: 

This project uses the CICIDS 2017 dataset, which contains normal traffic alongside a wide range of attacks. It is available here:

https://www.unb.ca/cic/datasets/ids-2017.html

## Procedure: 

- Acquire the dataset from https://www.unb.ca/cic/datasets/ids-2017.html.
- Preprocess the data.
- Perform exploratory data analysis.
- Train and compare classification machine learning algorithms:
  - Logistic Regression
  - Random Forest
  - Gradient Boosting
  - XGBoost
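
The procedure above can be sketched end to end on a small synthetic frame. This is a minimal illustration under stated assumptions, not the project's pipeline: the column names mirror the CICIDS 2017 CSVs, the data is randomly generated, the helper names `clean_flows` and `compare_models` are hypothetical, and XGBoost is left out so the block needs only pandas, NumPy, and scikit-learn. The cleaning step assumes that rate features such as `Flow_Bytes` may contain infinite or missing values, which is worth checking against the real files.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def clean_flows(df: pd.DataFrame) -> pd.DataFrame:
    """Replace +/-inf with NaN, then drop rows that have any missing value."""
    return df.replace([np.inf, -np.inf], np.nan).dropna()


def compare_models(df: pd.DataFrame, label_col: str = "Label") -> dict:
    """Fit each classifier on a binary BENIGN-vs-attack target; return test F1 scores."""
    X = df.drop(columns=[label_col])
    y = (df[label_col] != "BENIGN").astype(int)  # 1 = any attack label
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    models = {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for one day's CSV (values are made up for illustration).
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "Flow_Duration": rng.integers(1, 1_000_000, n).astype(float),
    "Total_Fwd_Packets": rng.integers(1, 50, n).astype(float),
    "Flow_Bytes": rng.random(n) * 1e6,
})
demo["Label"] = np.where(demo["Total_Fwd_Packets"] > 30, "PortScan", "BENIGN")
demo.loc[0, "Flow_Bytes"] = np.inf  # simulate a divide-by-zero rate

cleaned = clean_flows(demo)
scores = compare_models(cleaned)
for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

On the real data, `demo` would be replaced by one of the loaded frames (for example `tues0`), and a multi-class target could be used instead of the binary one assumed here.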


## Resources:

### Websites:

https://www.studytonight.com/network-programming-in-python/analyzing-network-traffic

https://plainenglish.io/blog/network-traffic-analysis-with-python-f95ed4e76c28

#### pcap files into DataFrames:

https://www.automox.com/blog/visualizing-network-data-using-python-part-1

https://www.automox.com/blog/visualizing-network-data-using-python-part-2

https://www.automox.com/blog/visualizing-network-data-using-python-part-3

#### Network Traffic Visualization (Geolocation):
https://medium.com/vinsloev-academy/python-cybersecurity-network-tracking-using-wireshark-and-google-maps-2adf3e497a93

#### Examples for malware traffic analysis:
https://www.malware-traffic-analysis.net/2021/index.html

#### Specific indicator of compromise:
https://cylab.be/blog/245/network-traffic-analysis-with-python-scapy-and-some-machine-learning

### YouTube:
https://www.youtube.com/watch?v=oA7QhYOhW_0

https://www.youtube.com/watch?v=xuNuy8n8u-Y

### LinkedIn Learning:
https://www.linkedin.com/learning/applied-ai-for-it-operations-aiops/network-traffic-analysis

### Books:
https://www.techtarget.com/searchnetworking/feature/Learn-how-to-master-network-traffic-analysis-with-Python

### Research paper:
https://www.scitepress.org/papers/2018/66398/66398.pdf

### Current Tasks
- Practise reading pcap files.
- Figure out a way to capture pcap files from the notebook.
- Identify indicators of compromise.
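
For the capture task, one possible approach is scapy's `sniff()` paired with `wrpcap()`. The sketch below is an assumption about how that could look, not something the notebook does yet: the function name `capture_to_pcap` and its defaults are hypothetical, and sniffing usually requires root or administrator privileges.

```python
def capture_to_pcap(path, count=100, timeout=30, iface=None):
    """Sniff up to `count` packets (or until `timeout` seconds) and save them to `path`.

    Returns the number of packets captured. `iface=None` lets scapy pick its
    default interface.
    """
    # Imported here so that defining the helper does not itself need scapy loaded.
    from scapy.all import sniff, wrpcap

    packets = sniff(count=count, timeout=timeout, iface=iface)
    if len(packets) > 0:
        # Only write a file when something was actually captured.
        wrpcap(path, packets)
    return len(packets)
```

In a notebook cell this might be called as `capture_to_pcap("pcap_files/capture.pcap", count=500)` (the filename is only an example), and the resulting file could then be read back with `PcapReader`.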

## Code:

In [1]:
#Imports
from scapy.all import *
from collections import Counter
from prettytable import PrettyTable
import plotly
from datetime import datetime
import pandas as pd
import numpy as np
from prepare import *
import os
import psutil
import networkx

In [2]:
#Preferences
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:.3f}')

In [3]:
#Get traffic data:
load_data = True
if load_data:
    mon0 = fixing_col_names(pd.read_csv("csv_files/Monday-WorkingHours.pcap_ISCX.csv"))
    tues0 = fixing_col_names(pd.read_csv("csv_files/Tuesday-WorkingHours.pcap_ISCX.csv"))
    wed0 = fixing_col_names(pd.read_csv("csv_files/Wednesday-WorkingHours.pcap_ISCX.csv"))
    thur0 = fixing_col_names(pd.read_csv("csv_files/Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv"))
    thur1 = fixing_col_names(pd.read_csv("csv_files/Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv"))
    fri0 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Morning.pcap_ISCX.csv"))
    fri1 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv"))
    fri2 = fixing_col_names(pd.read_csv("csv_files/Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv"))
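
`fixing_col_names` comes from the local `prepare` module, whose source is not shown here. Judging only from the column names in the `head()` output further down (`Destination_Port`, `Flow_Bytes`, `Down/Up_Ratio`), it appears to trim whitespace, drop the `/s` rate suffix, and replace spaces with underscores. The following is a hypothetical re-creation under that assumption, named `fixing_col_names_sketch` to keep it distinct from the real helper; the raw header strings in the example are also assumed.

```python
import pandas as pd


def fixing_col_names_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with tidied column names (hypothetical stand-in)."""
    renamed = {}
    for col in df.columns:
        name = str(col).strip()       # " Destination Port" -> "Destination Port"
        if name.endswith("/s"):
            name = name[:-2]          # "Flow Bytes/s" -> "Flow Bytes"
        renamed[col] = name.replace(" ", "_")
    return df.rename(columns=renamed)


# Assumed raw headers, for illustration only.
raw = pd.DataFrame(columns=[" Destination Port", "Flow Bytes/s", " Down/Up Ratio", " Label"])
tidy = fixing_col_names_sketch(raw)
print(list(tidy.columns))
# ['Destination_Port', 'Flow_Bytes', 'Down/Up_Ratio', 'Label']
```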

In [4]:
for file in os.listdir("pcap_files"):
    print(f"Pcap file size: {os.path.getsize(os.path.join('pcap_files', file))/1_000_000_000:.3f} GB")
print(f"Available memory: {psutil.virtual_memory().available/1_000_000_000:.3f} GB")

Pcap file size: 8.839 GB
Pcap file size: 10.823 GB
Pcap file size: 8.303 GB
Pcap file size: 11.048 GB
Pcap file size: 13.421 GB
Available memory: 23.002 GB
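
The numbers above hint at why loading a whole capture at once is risky: each file is a large fraction of the available memory, and parsed packet objects typically need more room than their on-disk bytes. A rough helper for that comparison might look like the sketch below. The function name `fits_in_memory` and the `overhead` multiplier are assumptions for illustration; the real in-memory cost of parsed packets is not measured here.

```python
def fits_in_memory(file_bytes: int, available_bytes: int, overhead: float = 3.0) -> bool:
    """Return True if file_bytes * overhead is below the available memory.

    `overhead` is a guessed multiplier for the extra space parsed packets need.
    """
    return file_bytes * overhead < available_bytes


GB = 1_000_000_000
# Sizes taken from the cell output above (largest and smallest capture).
print(fits_in_memory(int(13.421 * GB), int(23.002 * GB)))                # False
print(fits_in_memory(int(8.303 * GB), int(23.002 * GB), overhead=1.0))   # True
```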


In [5]:
#`PcapReader()` iterates through the pcap entries lazily, yielding one packet at a time.
#`rdpcap()` loads all pcap entries into memory at once. It takes a very long time and can fill up memory.
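
Because `PcapReader` yields one packet at a time, summaries can be built in a single pass without keeping the packets around. The sketch below illustrates that pattern with a hypothetical helper, `count_by`, that works on any iterable. `top_sources` shows how it might be applied to a capture; it assumes scapy is installed and that the given path exists.

```python
from collections import Counter
from itertools import islice


def count_by(packets, key, limit=None):
    """Count key(packet) over at most `limit` packets, consuming the iterable lazily."""
    return Counter(key(packet) for packet in islice(packets, limit))


def top_sources(pcap_path, limit=10_000, n=5):
    """Most common source IPs among the first `limit` packets of a pcap file."""
    from scapy.all import IP, PcapReader  # imported lazily; needs scapy installed

    def source_ip(packet):
        # Frames without an IP layer (e.g. ARP) are grouped under None.
        return packet[IP].src if packet.haslayer(IP) else None

    with PcapReader(pcap_path) as reader:
        return count_by(reader, source_ip, limit).most_common(n)
```

For example, `top_sources("pcap_files/Monday-WorkingHours.pcap")` would read only the first 10,000 packets, so memory use stays flat regardless of the file size.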

In [6]:
#Preview the first five packets.
#`show()` prints the packet and returns None, so it is not wrapped in `print()`.
count = 0
for packet in PcapReader("pcap_files/Monday-WorkingHours.pcap"):
    packet.show()
    count += 1
    if count >= 5:
        break

###[ Ethernet ]### 
  dst       = b8:ac:6f:36:0a:8b
  src       = 00:c1:b1:14:eb:31
  type      = IPv4
###[ IP ]### 
     version   = 4
     ihl       = 5
     tos       = 0x0
     len       = 40
     id        = 6964
     flags     = 
     frag      = 0
     ttl       = 55
     proto     = tcp
     chksum    = 0x9a72
     src       = 8.254.250.126
     dst       = 192.168.10.5
     \options   \
###[ TCP ]### 
        sport     = http
        dport     = 49188
        seq       = 3755453835
        ack       = 2495101083
        dataofs   = 5
        reserved  = 0
        flags     = FA
        window    = 329
        chksum    = 0xc534
        urgptr    = 0
        options   = ''
###[ Padding ]### 
           load      = '\x00\x00\x00\x00\x00\x00'

###[ Ethernet ]### 
  dst       = b8:ac:6f:36:0a:8b
  src       = 00:c1:b1:14:eb:31
  type      = IPv4
###[ IP ]### 
     version   = 4
     ihl       = 5
     tos       = 0x0
     len       = 40
     id        = 6964
     flags     = 

In [7]:
#Read pcap files:
#mon_packets = rdpcap("pcap_files/Monday-WorkingHours.pcap")

In [8]:
#View traffic
mon0.head()
#tues0.head()
#wed0.head()
#thur0.head()
#thur1.head()
#fri0.head()
#fri1.head()
#fri2.head()

Unnamed: 0,Destination_Port,Flow_Duration,Total_Fwd_Packets,Total_Backward_Packets,Total_Length_of_Fwd_Packets,Total_Length_of_Bwd_Packets,Fwd_Packet_Length_Max,Fwd_Packet_Length_Min,Fwd_Packet_Length_Mean,Fwd_Packet_Length_Std,Bwd_Packet_Length_Max,Bwd_Packet_Length_Min,Bwd_Packet_Length_Mean,Bwd_Packet_Length_Std,Flow_Bytes,Flow_Packets,Flow_IAT_Mean,Flow_IAT_Std,Flow_IAT_Max,Flow_IAT_Min,Fwd_IAT_Total,Fwd_IAT_Mean,Fwd_IAT_Std,Fwd_IAT_Max,Fwd_IAT_Min,Bwd_IAT_Total,Bwd_IAT_Mean,Bwd_IAT_Std,Bwd_IAT_Max,Bwd_IAT_Min,Fwd_PSH_Flags,Bwd_PSH_Flags,Fwd_URG_Flags,Bwd_URG_Flags,Fwd_Header_Length,Bwd_Header_Length,Fwd_Packets,Bwd_Packets,Min_Packet_Length,Max_Packet_Length,Packet_Length_Mean,Packet_Length_Std,Packet_Length_Variance,FIN_Flag_Count,SYN_Flag_Count,RST_Flag_Count,PSH_Flag_Count,ACK_Flag_Count,URG_Flag_Count,CWE_Flag_Count,ECE_Flag_Count,Down/Up_Ratio,Average_Packet_Size,Avg_Fwd_Segment_Size,Avg_Bwd_Segment_Size,Fwd_Header_Length.1,Fwd_Avg_Bytes/Bulk,Fwd_Avg_Packets/Bulk,Fwd_Avg_Bulk_Rate,Bwd_Avg_Bytes/Bulk,Bwd_Avg_Packets/Bulk,Bwd_Avg_Bulk_Rate,Subflow_Fwd_Packets,Subflow_Fwd_Bytes,Subflow_Bwd_Packets,Subflow_Bwd_Bytes,Init_Win_bytes_forward,Init_Win_bytes_backward,act_data_pkt_fwd,min_seg_size_forward,Active_Mean,Active_Std,Active_Max,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label
0,49188,4,2,0,12,0,6,6,6.0,0.0,0,0,0.0,0.0,3000000.0,500000.0,4.0,0.0,4,4,4,4.0,0.0,4,4,0,0.0,0.0,0,0,0,0,0,0,40,0,500000.0,0.0,6,6,6.0,0.0,0.0,0,0,0,0,1,1,0,0,0,9.0,6.0,0.0,40,0,0,0,0,0,0,2,12,0,0,329,-1,1,20,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
1,49188,1,2,0,12,0,6,6,6.0,0.0,0,0,0.0,0.0,12000000.0,2000000.0,1.0,0.0,1,1,1,1.0,0.0,1,1,0,0.0,0.0,0,0,0,0,0,0,40,0,2000000.0,0.0,6,6,6.0,0.0,0.0,0,0,0,0,1,1,0,0,0,9.0,6.0,0.0,40,0,0,0,0,0,0,2,12,0,0,329,-1,1,20,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
2,49188,1,2,0,12,0,6,6,6.0,0.0,0,0,0.0,0.0,12000000.0,2000000.0,1.0,0.0,1,1,1,1.0,0.0,1,1,0,0.0,0.0,0,0,0,0,0,0,40,0,2000000.0,0.0,6,6,6.0,0.0,0.0,0,0,0,0,1,1,0,0,0,9.0,6.0,0.0,40,0,0,0,0,0,0,2,12,0,0,329,-1,1,20,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
3,49188,1,2,0,12,0,6,6,6.0,0.0,0,0,0.0,0.0,12000000.0,2000000.0,1.0,0.0,1,1,1,1.0,0.0,1,1,0,0.0,0.0,0,0,0,0,0,0,40,0,2000000.0,0.0,6,6,6.0,0.0,0.0,0,0,0,0,1,1,0,0,0,9.0,6.0,0.0,40,0,0,0,0,0,0,2,12,0,0,329,-1,1,20,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN
4,49486,3,2,0,12,0,6,6,6.0,0.0,0,0,0.0,0.0,4000000.0,666666.667,3.0,0.0,3,3,3,3.0,0.0,3,3,0,0.0,0.0,0,0,0,0,0,0,40,0,666666.667,0.0,6,6,6.0,0.0,0.0,0,0,0,0,1,1,0,0,0,9.0,6.0,0.0,40,0,0,0,0,0,0,2,12,0,0,245,-1,1,20,0.0,0.0,0,0,0.0,0.0,0,0,BENIGN


In [9]:
#tues0[tues0["Label"] == "FTP-Patator"].describe().T

In [10]:
#mon0["Label"].value_counts()

In [11]:
#tues0["Label"].value_counts()

In [12]:
#wed0["Label"].value_counts()

In [13]:
#thur0["Label"].value_counts()

In [14]:
#thur1["Label"].value_counts()

In [15]:
#fri0["Label"].value_counts()

In [16]:
#fri1["Label"].value_counts()

In [17]:
#fri2["Label"].value_counts()
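
The per-day `value_counts()` cells above could be folded into one table. A minimal sketch, assuming each frame has a `Label` column as in the `head()` output; the helper name `label_counts` and the tiny demo frames are made up for illustration.

```python
import pandas as pd


def label_counts(frames: dict) -> pd.DataFrame:
    """Build a labels-by-dataset table of row counts (0 where a label is absent)."""
    counts = {name: df["Label"].value_counts() for name, df in frames.items()}
    return pd.DataFrame(counts).fillna(0).astype(int)


# Made-up frames standing in for the loaded CSVs.
demo_frames = {
    "mon0": pd.DataFrame({"Label": ["BENIGN", "BENIGN", "BENIGN"]}),
    "tues0": pd.DataFrame({"Label": ["BENIGN", "FTP-Patator", "FTP-Patator"]}),
}
summary = label_counts(demo_frames)
print(summary)
```

With the notebook's own frames this might be called as `label_counts({"mon0": mon0, "tues0": tues0})`, extended with the remaining days as needed.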