Detection and Classification of Network Traffic Anomalies

Experiments are based on the lighter version of the IoT-23 [1] dataset.

1. Prerequisites

1.1. Install Project Dependencies

No	Name	Version	Description
1	Python	3.8.8	Programming Language
2	scikit-learn	0.24.1	Tools for Machine Learning in Python
3	NumPy	1.19.5	Tools for Scientific Computing in Python
4	pandas	1.2.2	Tools for Data Analysis & Data Manipulation in Python
5	matplotlib	3.3.4	Visualization with Python
6	seaborn	0.11.1	Statistical data visualization
7	psutil	5.8.0	Cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python
8	scikit-plot	0.3.7	Library for visualizations
9	pickle	-	Python object serialization (standard library), used to save trained models
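
A quick way to compare the installed packages against the table above is sketched here. It relies on the standard-library `importlib.metadata` (available since Python 3.8). The `REQUIRED` mapping and the `check_versions` helper are illustrative names, not part of the repository, and pickle is left out because it ships with Python.

```python
from importlib import metadata

# Versions copied from the dependency table; pickle is part of the standard library.
REQUIRED = {
    "scikit-learn": "0.24.1",
    "numpy": "1.19.5",
    "pandas": "1.2.2",
    "matplotlib": "3.3.4",
    "seaborn": "0.11.1",
    "psutil": "5.8.0",
    "scikit-plot": "0.3.7",
}


def check_versions(required):
    """Return {package: (expected, installed)} for each mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(REQUIRED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Newer versions may well work too; the table only records the versions the experiments were run with.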

1.2. Download & Extract Dataset

Download the lighter version of IoT-23 (archive size: 8.8 GB)

The lighter version contains only the labeled flows, without the pcap files

Extract the archive (extracted size: approx. 44 GB)

2. Setup Project

Clone this repo
Install missing libraries
Open config.py and configure required directories

iot23_scenarios_dir should point to the home folder, where iot23 scenarios are located

iot23_attacks_dir will be used to store files for each attack type from the scenarios files

iot23_data_dir will be used to store the data files extracted from the attack files

iot23_experiments_dir will be used to store experiment files, including trained models and results (Excel files & Charts)
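
As an illustration, the four settings could look like the sketch below. Only the four variable names come from this README; the `base_dir` helper and every folder name are hypothetical placeholders, and the actual layout of config.py may differ.

```python
import os

# Hypothetical base folder; replace it with the location you actually use.
base_dir = os.path.join(os.path.expanduser("~"), "iot23")

# Home folder of the extracted IoT-23 scenarios (input).
iot23_scenarios_dir = os.path.join(base_dir, "scenarios")
# One file per attack type, extracted from the scenario files.
iot23_attacks_dir = os.path.join(base_dir, "attacks")
# Data files extracted from the attack files.
iot23_data_dir = os.path.join(base_dir, "data")
# Experiment files: trained models and results (Excel files and charts).
iot23_experiments_dir = os.path.join(base_dir, "experiments")
```

Keep in mind that the extracted dataset alone takes approx. 44 GB, so the chosen drive needs room for the derived files as well.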

Check configuration by running run_step00_configuration_check.py

Make sure the output message says that you may continue to the next step. If not, then check your configuration and fix the errors.
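
If you want to sanity-check the folders yourself, the kind of validation involved can be sketched as follows. This is not the code of run_step00_configuration_check.py; `check_config` is a hypothetical helper that assumes the scenarios folder must already exist and contain data, while the other folders can be created on demand.

```python
import os


def check_config(scenarios_dir, output_dirs):
    """Return a list of problems; an empty list means it is safe to continue."""
    problems = []
    if not os.path.isdir(scenarios_dir):
        problems.append(f"Scenarios folder not found: {scenarios_dir}")
    elif not os.listdir(scenarios_dir):
        problems.append(f"Scenarios folder is empty: {scenarios_dir}")
    for path in output_dirs:
        try:
            # Output folders are created if they do not exist yet.
            os.makedirs(path, exist_ok=True)
        except OSError as err:
            problems.append(f"Cannot create {path}: {err}")
    return problems
```

It would be called with iot23_scenarios_dir as the first argument and a list of the three remaining folders as the second.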

3. Prepare Data for ML

3.1. Extract Data From Scenarios

Run data extraction by running run_step01_extract_data_from_scenarios.py

Although the dataset is organized into multiple scenarios, each scenario file still mixes attack and benign traffic. For this reason, entries of the same type are extracted into separate files. The output files are stored in iot23_attacks_dir.
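
The idea of separating entries by type can be sketched as below. This is a simplified illustration, not the repository's extraction code. `split_by_label` and `last_field_label` are hypothetical helpers, and the label rule is an assumption: comment lines start with `#`, the last whitespace-separated field is the detailed attack label, and a `-` there means the preceding field (for example Benign) is used instead.

```python
import os
from collections import defaultdict


def last_field_label(line):
    """Assumed rule: use the last field, or the one before it if the last is '-'."""
    fields = line.split()
    return fields[-2] if fields[-1] == "-" else fields[-1]


def split_by_label(lines, out_dir, label_of=last_field_label):
    """Append each record to <out_dir>/<label>.csv and return counts per label."""
    os.makedirs(out_dir, exist_ok=True)
    counts = defaultdict(int)
    handles = {}
    try:
        for line in lines:
            # Skip blank lines and header/comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            label = label_of(line)
            if label not in handles:
                target = os.path.join(out_dir, f"{label}.csv")
                handles[label] = open(target, "a", encoding="utf-8")
            handles[label].write(line if line.endswith("\n") else line + "\n")
            counts[label] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

Because the files are opened in append mode, the same function can be called once per scenario file and the per-type outputs accumulate across scenarios.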

⚠️ This step takes about 2h to complete.

3.2. Shuffle File Content

Run content shuffling by running run_step01_shuffle_file_content.py

This step yields more reliable data samples. Large files are split into 1 GB partitions, and the content of all partitions of the same file is then shuffled. When shuffling is done, the partitions are merged back into a single file that replaces the original one.
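
One common way to shuffle a file that does not fit into memory is sketched below. It is a variant of the partition idea and not necessarily what the script does: each line is scattered to a randomly chosen partition, each partition is shuffled in memory, and the partitions are concatenated into a file that replaces the original. `shuffle_file` and its parameters are hypothetical, and partitions are sized by count here rather than by the 1 GB limit.

```python
import os
import random
import tempfile


def shuffle_file(path, num_partitions=4, seed=None):
    """Shuffle the lines of `path` in place using temporary partition files."""
    rng = random.Random(seed)
    work_dir = tempfile.mkdtemp(prefix="shuffle_")
    part_paths = [os.path.join(work_dir, f"part_{i}") for i in range(num_partitions)]

    # 1. Scatter every line to a randomly chosen partition.
    #    Note: header lines, if any, are shuffled like ordinary records.
    parts = [open(p, "w", encoding="utf-8") for p in part_paths]
    try:
        with open(path, encoding="utf-8") as src:
            for line in src:
                if not line.endswith("\n"):
                    line += "\n"
                rng.choice(parts).write(line)
    finally:
        for part in parts:
            part.close()

    # 2. Shuffle each partition in memory and append it to the merged output.
    merged = path + ".shuffled"
    with open(merged, "w", encoding="utf-8") as out:
        for part_path in part_paths:
            with open(part_path, encoding="utf-8") as part:
                lines = part.readlines()
            rng.shuffle(lines)
            out.writelines(lines)
            os.remove(part_path)
    os.rmdir(work_dir)

    # 3. Replace the original file with the shuffled one.
    os.replace(merged, path)
```

Only one partition is held in memory at a time, so `num_partitions` should be chosen so that a single partition fits comfortably into RAM.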

⚠️ This step takes about 2.5 - 3h to complete.

Option 1: Run Demo

1.1. Prerequisites

Download & Extract Dataset

Setup Project

Prepare Data for ML

1.2. Run demo by running run_demo.py

Use this option to verify that everything works. It uses only 10,000 records per file, so the whole process takes a couple of minutes, provided the data is already prepared.
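
Limiting the number of records per file can be as simple as the sketch below. How run_demo.py actually selects its records is not documented here; `read_head` is a hypothetical helper that assumes taking the first records is acceptable because the files were already shuffled in step 3.2.

```python
from itertools import islice


def read_head(path, limit=10_000):
    """Return at most `limit` lines from the start of `path`."""
    with open(path, encoding="utf-8") as src:
        # islice stops reading as soon as `limit` lines have been consumed.
        return list(islice(src, limit))
```

Since reading stops after `limit` lines, the cost stays small even for multi-gigabyte files.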

Option 2: Run Designed Experiments

2.1. Prerequisites

Download & Extract Dataset

Setup Project

Prepare Data for ML

2.2. Run designed experiments by running run_experiments.py

⚠️⚠️⚠️ This step takes about 24h to complete!

Data samples for training and testing consist of more than 20M records.

TODO

Option 3: Run Custom Experiments

3.1. Prerequisites

Download & Extract Dataset

Setup Project

Prepare Data for ML

3.2. Run custom experiments by running run_experiments.py

TODO

[1]: Stratosphere Laboratory. A labeled dataset with malicious and benign IoT network traffic. January 22nd. Agustin Parmisano, Sebastian Garcia, Maria Jose Erquiaga. Online: https://www.stratosphereips.org/datasets-iot23
