
My master's project on Botnet Network Intrusion Detection on Resource-Constrained Devices for IoT Applications


Bool-urns/iot_botnet_nids


IoT-based Network Intrusion Detection

This is my master's project, which won the PwC Ireland award for M.Sc. in Computing (March 2020).

Brief Synopsis

The project explores machine learning-based network intrusion detection systems (NIDS) on so-called 'Internet of Things' (IoT) devices, focusing on one of the biggest threats to IoT devices: being compromised and recruited into a botnet.

Prior research suggested that deploying a lightweight NIDS on gateway devices in the IoT architecture could be a viable way to prevent edge IoT devices from being compromised as part of a botnet. Over the course of this project, a simple NIDS was created on a Raspberry Pi single-board computer and five lightweight classification algorithms were evaluated for use as part of the system. Each algorithm was trained using simulated botnet attack data, allowing for specific multi-class classification of attack behaviour. The whole system was evaluated in terms of a number of performance metrics and energy consumption (a circuit containing a current sensor was created to measure this).
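As a rough illustration of how readings from a current sensor can be turned into an energy figure, the sketch below integrates evenly spaced current samples with the trapezoidal rule and multiplies by the supply voltage. This is not the project's measurement code: the function name, the 5 V supply voltage, the 0.1 s sampling interval and the sample values are all assumptions made for the example.

```python
def energy_joules(current_samples_amps, sample_interval_s, supply_voltage_v=5.0):
    """Approximate energy in joules from evenly spaced current readings.

    Assumes a constant supply voltage (5 V here is a hypothetical value)
    and integrates current over time with the trapezoidal rule.
    """
    if len(current_samples_amps) < 2:
        return 0.0
    charge_coulombs = 0.0
    for previous, current in zip(current_samples_amps, current_samples_amps[1:]):
        # Area of one trapezoid: mean of two adjacent readings times the interval.
        charge_coulombs += 0.5 * (previous + current) * sample_interval_s
    return supply_voltage_v * charge_coulombs


# Made-up readings in amps, taken every 0.1 seconds.
samples = [0.50, 0.52, 0.60, 0.58, 0.51]
total_energy = energy_joules(samples, sample_interval_s=0.1)
duration_s = 0.1 * (len(samples) - 1)
average_power_w = total_energy / duration_s
print(f"energy: {total_energy:.4f} J, average power: {average_power_w:.3f} W")
```

Dividing the energy by the measurement duration gives the average power draw, which makes runs of different lengths easier to compare.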

Project Structure

  • The final project paper can be found in the docs folder
  • src contains:
    1. Dataset and Features: this contains the condensed dataset used in this project (originally from here) and Python notebooks explaining the feature extraction process
    2. Measuring: this contains the bash and Python scripts used for measuring the metrics outlined in the Metrics section below
    3. NIDS: this contains the full implementation of the basic NIDS used for testing in this project
    4. Classification: this contains the implementations used of the five classification algorithms outlined below

Installation and Usage

Because this was a research project focused on obtaining results rather than creating a usable application, this page is intended as a guide to the project itself; the code is not intended to be installed or used by others.

Data Gathering

Given the large machine learning element of the project, using a high-quality and realistic dataset was fundamental. Initially, the intention was to also create a dataset as part of this project. However, after re-evaluating the priorities of the project, I decided this was outside the scope.

Many datasets exist for ML-based intrusion detection, such as KDD and the DARPA ones. However, with the focus of the project being on IoT and botnets, these datasets seemed unrealistic (and also quite old). A number of IoT-based datasets existed, such as the one created from this paper, but as this comparison paper found in its evaluation of IoT IDS datasets, these datasets generally suffered from a lack of IoT traffic, were not realistic, or did not include realistic attack scenarios. The same paper that made these comparisons also created its own dataset, called BoT-IoT.

This dataset aimed to combat the three issues listed above by simulating more realistic attack scenarios from a range of IoT devices and was ultimately chosen to be used in this project.

Dataset Evaluation and Feature Selection

The dataset contained ten attack classes, broken up into three main groups:

  • Denial of service:
    • TCP DoS
    • TCP DDoS
    • UDP DoS
    • UDP DDoS
    • HTTP DoS
    • HTTP DDoS
  • Data theft:
    • Keylogging
    • Data Exfiltration
  • Data gathering:
    • Service Scanning
    • OS Fingerprinting

This meant that the models would be trained using a multi-class dataset of eleven classes (including normal/benign traffic). However, these eleven classes aren't represented equally in the dataset: most of the data represents the six denial of service classes. The full 15 GB dataset is available, and it was used to create a more balanced version that better represented each of the classes. Despite these efforts, the two data theft classes, Keylogging and Data Exfiltration, were still vastly under-represented, with Data Exfiltration having only 116 samples in the entire 15 GB. With these two classes removed, a total of nine classes remained.

Despite the lack of the two data theft classes, this multi-class dataset made it possible to train each of the models for more detailed botnet detection, potentially allowing the multi-class predictions to be used in real time to take more specific action against the type of threat detected.
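The balancing described above (keeping every class that has enough data, capping the dominant denial of service classes, and dropping classes that remain too small) can be sketched in plain Python. This is an illustrative sketch only, not the notebooks' actual procedure: the function name, the `min_samples` and `max_per_class` thresholds, the `label` field and the toy row counts are all assumptions.

```python
import random
from collections import Counter, defaultdict


def balance_classes(rows, label_key, min_samples, max_per_class, seed=0):
    """Drop classes with too few rows and randomly downsample large ones."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    balanced = []
    for label, members in by_class.items():
        if len(members) < min_samples:
            continue  # under-represented class: removed entirely
        if len(members) > max_per_class:
            members = rng.sample(members, max_per_class)  # cap dominant class
        balanced.extend(members)
    rng.shuffle(balanced)
    return balanced


# Toy data with made-up counts: one dominant class, one medium, one tiny.
rows = (
    [{"label": "UDP DoS"} for _ in range(1000)]
    + [{"label": "Service Scanning"} for _ in range(300)]
    + [{"label": "Data Exfiltration"} for _ in range(3)]
)
balanced = balance_classes(rows, "label", min_samples=50, max_per_class=200)
class_counts = Counter(row["label"] for row in balanced)
print(class_counts)
```

Random downsampling is only one option; it was chosen here because it needs no extra libraries.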

For the sake of brevity, the feature evaluation and selection process isn't covered fully here (see page five of the paper in the docs folder for more detail). In short, each of the original twenty-one features included in this dataset was evaluated in terms of the CPU utilisation, latency and maximum resident set size (an approximate value for the RAM required by a process) needed to extract it. This was done to identify the most lightweight features.
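To make the three measurements concrete, here is a minimal sketch of timing a feature-extraction function from inside Python using only the standard library. It is not the project's measuring scripts: `measure` and `mean_packet_size` are hypothetical names, and the toy feature is not one of the dataset's twenty-one features. Two caveats: the `resource` module is Unix-only, and `ru_maxrss` reports the peak for the whole process so far (in kilobytes on Linux, bytes on macOS), so isolating a single feature would require running it in its own process.

```python
import resource  # Unix-only standard library module
import time


def measure(func, *args, **kwargs):
    """Run func once and report latency, CPU utilisation and peak RSS."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start

    usage = resource.getrusage(resource.RUSAGE_SELF)
    stats = {
        "latency_s": wall_seconds,
        # Fraction of elapsed wall-clock time that was spent on the CPU.
        "cpu_utilisation": cpu_seconds / wall_seconds if wall_seconds > 0 else 0.0,
        # Peak resident set size of the process so far (unit is platform-dependent).
        "max_rss": usage.ru_maxrss,
    }
    return result, stats


def mean_packet_size(packet_sizes):
    """Hypothetical toy feature: mean packet size of a flow, in bytes."""
    return sum(packet_sizes) / len(packet_sizes)


value, stats = measure(mean_packet_size, [60, 1500, 40, 400])
print(value, stats)
```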

The Five Algorithms

The five algorithms below were used for comparison:

1. ProtoNN
2. Bonsai
3. Super Fast Support Vector Classifier
4. XGBoost
5. Random Forest (scikit-learn implementation)

These are listed from the most to the least resource-constrained implementation. For more information on each algorithm and its configuration for this project, see pages three to nine of the paper in the docs folder.
