Skip to content

The ML source code and dataset of my Mastter dissertation "A Research and Implementation of Machine Learning Based Open-source Virtual Industrial Control System Honeypots Identification"

License

zh250/ICS-honeypot-identification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ICS honeypot identification

Files and directories in the repository

Original datasets of three ICS protocols (ATG, modbus, S7) after label and feature processing (Datasets were based on yunyueye's work, more details about datasets please refer here, The original raw data of datasets were captured and authorized by Ditecting with some attributes were queried from Shodan's API) for legal and ethcial reasons, all IP were hidden

the implementation of feature processing and machine learning algorithm (all were impelmented with scikit-learn) the processed data of original datasets are also in corresponding sub-directory of this folder The experimental results are in the folders of their corresonding algorithms of this folder, the filenames end with result, and the common features and all features are distinguished

The experimental results in this file just a summary of all results and compare with Wu's research (the paper is not open access till now). You can execute the program by yourself to check the result (mention: datasets and source code may be updated. Although I will try to make sure all files in the repository are updated correctly, the result may be different. It is just a reference)

The main entrance of all ML program, including four machine learning algorithm experiments of three protocols, distinguishes common features and all features, and supports parameter search

Others

Originally,there is a web GUI for demonstration (like Shodan's Honeyscore). However, without the IP address databases (they cannot be disclosed), it is meaningless to upload source code here

For researchers in the field: Naive Bayes classifier is unsutiable for the classifcation task here, because it cannot make sure OS and ISP type of a captured IP mutually independent. Some cloud service providers only provide certain OSs (like Linode), and IP of cloud service providers accounts for a non-negligible proportion in these datasets

The mian innovation in this project is the feature extraction and processing method, please refer here for more details, and more explanations and details will be supplemented in the future

How to execute

Requirements

Operating system and libraries: tested on Ubuntu 22.04 LTS with Python 3.10.12, numpy 1.20.2, pandas 2.1.3, scipy 1.11.4, sklearn 1.4.1.post1 and macOS 12.7.4 (Intel) with Python 3.9.6, numpy 1.24.3, pandas 2.0.1, scipy 1.10.1, sklearn 1.2.2

Executing steps

  1. Clone this repository: git clone https://github.com/zh250/ICS-honeypot-identification.git
  2. Open the directory: cd ICS-honeypot-identification
  3. There are two options to execute the program:
    • grant permission for main.py and execute it directly
      • grant permission for main.py by command: chmod 777 main.py
      • then execute main.py directly by command: ./mainpy
    • execute main.py by python
      • (make sure python3 is already in your PATH) execute command: python3 main.py

Future works

The project can be regarded as a work of cyber assets discovery and identification since it is a multi-class classifcation

More ICS and IoT and even some IT protocols which are used by CPSs may be studied in the future

Although it demonstrated a relatively good preformance, but preformance in Modbus is obviously worse than ATG and S7. Results of permutation feature importance and some other phenomena have not been explained till now.

These phenomena indicated further study and improvement of feature extraction and processing method are necessary. Some ideas were already in my mind, but immature.

About

The ML source code and dataset of my Mastter dissertation "A Research and Implementation of Machine Learning Based Open-source Virtual Industrial Control System Honeypots Identification"

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages