# Kitsune Network Attack Dataset

## Exploratory Data Analysis

[Mirsky, Y., Doitshman, T., Elovici, Y., & Shabtai, A. (2018). Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. *arXiv preprint arXiv:1802.09089*.](https://arxiv.org/abs/1802.09089)

**The Kitsune Network Attack Dataset comprises nine distinct network attack scenarios, each stored in its own directory.**

1. ARP MitM (Address Resolution Protocol Man-in-the-Middle): In this attack, the adversary intercepts and potentially alters communications between two parties by exploiting the ARP protocol. This allows the attacker to eavesdrop, modify, or block data between the victim devices.

2. Fuzzing: This technique involves sending malformed or unexpected inputs to a system to discover vulnerabilities. By observing how the system handles these inputs, attackers can identify potential weaknesses that could be exploited.

3. Mirai Botnet: Mirai is malware that targets IoT devices, transforming them into bots to form a botnet. This botnet can then be used to launch large-scale Distributed Denial of Service (DDoS) attacks, overwhelming targets with traffic.

4. OS Scan (Operating System Scan): Attackers perform OS scanning to determine the operating system running on a target machine. This information aids in selecting appropriate exploits tailored to the identified OS vulnerabilities.

5. SSDP Flood (Simple Service Discovery Protocol Flood): This DDoS attack exploits the SSDP protocol by sending a flood of discovery requests, causing the target device to become overwhelmed and potentially crash or become unresponsive.

6. SSL Renegotiation: This attack abuses the SSL/TLS renegotiation feature, repeatedly requesting renegotiation to consume server resources. The excessive renegotiation requests can lead to denial of service as the server becomes overwhelmed.

7. SYN DoS (SYN Denial of Service): In this attack, the perpetrator sends a succession of SYN requests to a target's system but does not complete the handshake. This leaves multiple half-open connections, exhausting the target's resources and leading to service denial.

8. Video Injection: This involves injecting malicious video content into a stream, potentially leading to unauthorized content delivery or exploitation of vulnerabilities in the video processing components.

9. Active Wiretap: The attacker intercepts LAN traffic through a network bridge covertly installed on an exposed cable, placing themselves on the path of every packet that crosses it.
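The half-open connection mechanism behind the SYN DoS scenario (item 7) can be illustrated with a toy model. This is a minimal sketch and not part of the dataset tooling: the `SynBacklog` class, its capacity, and the IP addresses are hypothetical, and real TCP stacks add timeouts and defences such as SYN cookies that are left out here.

```python
class SynBacklog:
    """Toy model of a server's queue of half-open TCP connections."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.half_open = set()

    def on_syn(self, client):
        """Reserve a slot for a new handshake; return False if the queue is full."""
        if len(self.half_open) >= self.capacity:
            return False
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        """Complete the handshake and release the slot."""
        self.half_open.discard(client)


backlog = SynBacklog(capacity=3)

# The attacker starts handshakes from spoofed sources and never sends the final ACK.
for spoofed in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    backlog.on_syn(spoofed)

# A legitimate client now finds the queue exhausted.
legit_accepted = backlog.on_syn("192.168.100.5")
print("legitimate client accepted:", legit_accepted)
```

Once any pending handshake completes through `on_ack`, a slot is released and new clients can connect again, which is why the attack depends on never sending that final ACK.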

**Within each directory, you'll find three key files:**

1. Raw Network Capture File (.pcap or .pcapng): This file contains the original network traffic captured during the attack scenario. Packets are truncated to 200 bytes to preserve privacy.

2. Preprocessed Dataset (_dataset.csv): A CSV file where each row represents a network packet, and each column corresponds to one of the 115 features extracted using the AfterImage feature extractor. These features provide a statistical snapshot of the network's behavior at the time of each packet.

3. Labels File (_labels.csv): This CSV file contains binary labels indicating whether each packet is benign (0) or malicious (1). For Man-in-the-Middle attacks, all packets that passed through the attacker are labeled as malicious.
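Because the dataset and labels files are aligned row by row, they can be read in lockstep without loading the multi-gigabyte feature file into memory. The following is a minimal sketch using only the standard library. It assumes the label is the last field of each labels row and that a header line, if present, is flagged by the caller; neither assumption is confirmed by the files themselves. The function name `iter_labeled_rows` and the in-memory sample rows are illustrative only.

```python
import csv
import io


def iter_labeled_rows(features_file, labels_file, labels_have_header=False):
    """Yield (features, label) pairs, one per packet, from two open CSV files."""
    feature_rows = csv.reader(features_file)
    label_rows = csv.reader(labels_file)
    if labels_have_header:
        next(label_rows, None)  # skip the header line
    for features, label_row in zip(feature_rows, label_rows):
        # Assumption: the label is the last field of each labels row.
        yield [float(v) for v in features], int(float(label_row[-1]))


# Tiny in-memory stand-ins for <scenario>_dataset.csv and <scenario>_labels.csv.
# The label values here are made up for illustration.
features_csv = io.StringIO("1.0,1294.0,0.0\n1.999741,1294.0,0.0\n")
labels_csv = io.StringIO("0\n1\n")

pairs = list(iter_labeled_rows(features_csv, labels_csv))
for features, label in pairs:
    print(label, features)
```

With real data, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. Note that `zip` stops silently at the shorter file, so a row-count mismatch between the two files would need a separate check.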


In [None]:
# review files
import os

def print_directory_tree_with_sizes(path, indent=""):
    # list all items in the directory
    for item in os.listdir(path):
        item_path = os.path.join(path, item)
        
        if os.path.isfile(item_path):
            # If item is a file, print its name and size
            file_size = os.path.getsize(item_path)
            print(f"{indent}├── {item} ({file_size / (1024 * 1024):.2f} MB)")
        elif os.path.isdir(item_path):
            # If item is a directory, print its name and recurse
            print(f"{indent}├── {item}/")
            print_directory_tree_with_sizes(item_path, indent + "    ")

# Set the path to the directory containing the data
data_directory = "/home/userj/projects/SLLIM/data"

# Print the directory tree with file sizes
print(f"{data_directory}/")
print_directory_tree_with_sizes(data_directory)

/home/userj/projects/SLLIM/data/
├── Fuzzing/
    ├── Fuzzing_pcap.pcapng (470.12 MB)
    ├── Fuzzing_labels.csv (26.76 MB)
    ├── Fuzzing_dataset.csv (6166.79 MB)
├── Mirai Botnet/
    ├── mirai_labels.csv (2.19 MB)
    ├── Mirai_pcap.pcap (71.60 MB)
    ├── Mirai_dataset.csv (1304.86 MB)
├── OS Scan/
    ├── OS_Scan_dataset.csv (4665.07 MB)
    ├── OS_Scan_pcap.pcapng (317.65 MB)
    ├── OS_Scan_labels.csv (19.99 MB)
├── SSDP Flood/
    ├── SSDP_Flood_dataset.csv (11205.51 MB)
    ├── SSDP_Flood_labels.csv (49.49 MB)
    ├── SSDP_Flood_pcap.pcap (701.77 MB)
├── Active Wiretap/
    ├── Active_Wiretap_pcap.pcapng (484.56 MB)
    ├── Active_Wiretap_dataset.csv (6261.10 MB)
    ├── Active_Wiretap_labels.csv (27.19 MB)
├── Video Injection/
    ├── Video_Injection_labels.csv (29.59 MB)
    ├── Video_Injection_dataset.csv (6790.59 MB)
    ├── Video_Injection_pcap.pcapng (542.01 MB)
├── SYN DoS/
    ├── SYN_DoS_labels.csv (33.30 MB)
    ├── SYN_DoS_pcap.pcap (498.17 MB)
    ├── SYN_DoS_data

### Fuzzing Scenario

In [7]:
import pandas as pd

# sample Fuzzing_dataset.csv
file_path = "/home/userj/projects/SLLIM/data/Fuzzing/Fuzzing_dataset.csv"
fuzz_sample = pd.read_csv(file_path, nrows=1000, header=None)
fuzz_sample.head()

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1294.0 | 0.0 | 1.0 | 1294.0 | 0.0 | 1.0 | 1294.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 1294.0 | 0.0 | 1294.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1.999741 | 1294.0 | 0.0 | 1.999844 | 1294.0 | 6.984919e-10 | 1.999948 | 1294.0 | 4.656613e-10 | 1.999995 | ... | 0.0 | 0.0 | 0.0 | 1.999999 | 1294.0 | 3.1e-05 | 1294.0 | 9.313226e-10 | 0.0 | 0.0 |
| 2 | 2.999068 | 1294.0 | 0.0 | 2.999441 | 1294.0 | 6.984919e-10 | 2.999814 | 1294.0 | 2.328306e-10 | 2.999981 | ... | 0.0 | 0.0 | 0.0 | 2.999998 | 1294.0 | 1.5e-05 | 1294.0 | 2.328306e-10 | 0.0 | 0.0 |
| 3 | 3.9978 | 1294.0 | 2.328306e-10 | 3.99868 | 1294.0 | 6.984919e-10 | 3.99956 | 1294.0 | 2.328306e-10 | 3.999956 | ... | 2.328306e-10 | 0.0 | 0.0 | 3.999996 | 1294.0 | 0.0 | 1294.0 | 0.0 | 0.0 | 0.0 |
| 4 | 4.996693 | 1294.0 | 6.984919e-10 | 4.998016 | 1294.0 | 6.984919e-10 | 4.999338 | 1294.0 | 4.656613e-10 | 4.999934 | ... | 0.0 | 0.0 | 0.0 | 4.999993 | 1294.0 | 1.5e-05 | 1294.0 | 2.328306e-10 | 0.0 | 0.0 |
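The first three columns of this sample behave like a (weight, mean, variance) triple: the weight grows by slightly less than one per packet, the mean stays at 1294, and the variance is zero up to floating-point noise. That pattern is consistent with the damped incremental statistics the Kitsune paper describes for AfterImage, where past observations are decayed by a factor of 2^(-λ·Δt) before each update. The sketch below illustrates that idea under stated assumptions: the class name `DampedStat`, the decay rate of 5, and the 75-microsecond packet gap are illustrative choices, and this is not the AfterImage implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DampedStat:
    """Damped incremental statistic over a stream of (timestamp, value) pairs."""

    decay_rate: float                  # lambda: larger values forget faster
    weight: float = 0.0
    linear_sum: float = 0.0
    squared_sum: float = 0.0
    last_time: Optional[float] = None

    def update(self, timestamp, value):
        """Decay the running sums by the elapsed time, then add the new value."""
        if self.last_time is not None:
            factor = 2.0 ** (-self.decay_rate * (timestamp - self.last_time))
            self.weight *= factor
            self.linear_sum *= factor
            self.squared_sum *= factor
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    @property
    def mean(self):
        return self.linear_sum / self.weight

    @property
    def variance(self):
        return abs(self.squared_sum / self.weight - self.mean ** 2)


# Two packets of 1294 bytes, 75 microseconds apart (hypothetical timing).
stat = DampedStat(decay_rate=5.0)
stat.update(0.0, 1294.0)
stat.update(7.5e-5, 1294.0)
print(stat.weight, stat.mean, stat.variance)
```

With these inputs the weight lands just below 2 while the mean stays at the packet size, which matches the shape of the second row above. A smaller decay rate forgets more slowly, so the same packets would yield a weight even closer to 2.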


In [8]:
# Check column types and memory usage
fuzz_sample.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Columns: 115 entries, 0 to 114
dtypes: float64(115)
memory usage: 898.6 KB
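The reported memory usage can be reproduced by hand, which also gives a way to budget memory before loading more rows: each of the 115 `float64` columns costs 8 bytes per row. The sketch below does that arithmetic. The helper name `float64_frame_bytes` is hypothetical, and the 128-byte index overhead is an assumption chosen to match the figure above; it can differ between pandas versions.

```python
def float64_frame_bytes(n_rows, n_cols, index_bytes=128):
    """Approximate in-memory size of an all-float64 DataFrame, in bytes."""
    return n_rows * n_cols * 8 + index_bytes


sample_bytes = float64_frame_bytes(1000, 115)
sample_kib = sample_bytes / 1024
print(f"{sample_kib:.1f} KB for the 1000-row sample")

# Scaling the same arithmetic to one million rows.
million_row_gb = float64_frame_bytes(1_000_000, 115) / 1e9
print(f"{million_row_gb:.2f} GB per million rows")
```

At roughly 0.92 GB per million rows, reading a full scenario file at once may not fit on a typical workstation, which is a reason to keep using `nrows` or to read the file in chunks.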


In [11]:
# describe
fuzz_sample.describe()

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | ... | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 111.14767 | 1227.182903 | 148028.466609 | 140.036479 | 1227.919653 | 146927.093103 | 185.338238 | 1228.539491 | 145824.233177 | 214.229458 | ... | 155995.505237 | 3.185422e-13 | 1.816217e-09 | 74.439867 | 1217.51619 | 359.611203 | 1300.582554 | 155899.647498 | 1.972591e-13 | 1.115807e-09 |
| std | 56.562165 | 331.104458 | 60386.745676 | 75.938119 | 331.168507 | 60192.420269 | 111.03123 | 331.228137 | 60380.631021 | 135.671518 | ... | 83323.546575 | 7.159642e-13 | 4.41715e-09 | 44.877273 | 338.351809 | 158.127502 | 123.261517 | 83295.684301 | 4.086589e-13 | 2.45876e-09 |
| min | 1.0 | 60.0 | 0.0 | 1.0 | 60.0 | 0.0 | 1.0 | 60.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 60.0 | 0.0 | 60.0 | 0.0 | -2.692652e-28 | -1.708039e-17 |
| 25% | 65.039988 | 1302.774207 | 123049.764934 | 72.702511 | 1304.842008 | 121550.98933 | 83.217437 | 1307.265469 | 119750.915857 | 88.450249 | ... | 152885.701396 | 0.0 | 0.0 | 35.960516 | 1264.865972 | 390.974554 | 1270.521222 | 152861.793545 | 0.0 | 0.0 |
| 50% | 121.700588 | 1318.127147 | 158716.437124 | 150.46469 | 1317.953072 | 153525.793431 | 192.103414 | 1317.826343 | 145407.023668 | 211.644605 | ... | 178541.979824 | 0.0 | 0.0 | 71.89368 | 1335.210811 | 422.373047 | 1335.210811 | 178399.002219 | 0.0 | 0.0 |
| 75% | 153.663978 | 1337.961909 | 185903.212625 | 201.761549 | 1339.484177 | 183706.379319 | 282.824156 | 1340.594993 | 183472.630197 | 332.546456 | ... | 190939.658221 | 0.0 | 0.0 | 112.798964 | 1347.72647 | 436.762233 | 1347.72647 | 190761.247881 | 0.0 | 0.0 |
| max | 216.168957 | 1376.814791 | 425755.2781 | 266.907023 | 1371.96307 | 425755.900116 | 374.933424 | 1369.117495 | 425756.211124 | 457.268254 | ... | 425756.249689 | 3.19606e-12 | 2.332312e-08 | 161.592281 | 1383.501331 | 652.5 | 1383.501331 | 425756.249997 | 1.566777e-12 | 1.256088e-08 |


In [15]:
from scapy.all import rdpcap

# Load only the first 10 packets
packets = rdpcap("/home/userj/projects/SLLIM/data/Fuzzing/Fuzzing_pcap.pcapng", count=10)
for packet in packets:
    print(packet.summary())

Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.100.5:61904 > 192.168.2.15:https A / Padding
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
Ether / IP / TCP 192.168.100.5:61904 > 192.168.2.15:https A / Padding
Ether / IP / TCP 192.168.2.15:https > 192.168.100.5:61904 A / Raw
