## IoT Botnet Detection

With the growing use of digital communication, cyber security is crucial to maintaining a high level of safety. Traditional security measures such as firewalls and encryption are not enough to prevent the increasing number of cyber attacks. Intrusion detection systems are needed that can integrate with these traditional systems and help keep data secure.


The dataset contains 11 different CSV files, one per traffic type: one benign file and ten attack files. The attacks come from two malware families, Mirai and Gafgyt.

* benign_traffic.csv - Benign Traffic
* mirai_attacks/ack.csv - Malicious 1
* mirai_attacks/scan.csv - Malicious 2
* mirai_attacks/syn.csv - Malicious 3
* mirai_attacks/udp.csv - Malicious 4
* mirai_attacks/udpplain.csv - Malicious 5
* gafgyt_attacks/combo.csv - Malicious 6
* gafgyt_attacks/junk.csv - Malicious 7
* gafgyt_attacks/scan.csv - Malicious 8
* gafgyt_attacks/tcp.csv - Malicious 9
* gafgyt_attacks/udp.csv - Malicious 10
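
The folder and file names in the list above already encode the malware family and the attack type, so a label can be derived from the path alone. The following is a minimal sketch of that idea; `label_from_path` is a hypothetical helper that is not part of the original notebook, and it assumes attack folders always end in `_attacks`.

```python
import posixpath


def label_from_path(relative_path):
    """Return (family, attack) for a path such as 'mirai_attacks/ack.csv'.

    Hypothetical helper: assumes attack folders are named '<family>_attacks'
    and the benign file is named 'benign_traffic.csv'.
    """
    # Normalise Windows separators so the same logic works on any platform.
    normalised = relative_path.replace("\\", "/")
    folder, filename = posixpath.split(normalised)
    attack = posixpath.splitext(filename)[0]
    if attack == "benign_traffic":
        return ("benign", "benign")
    family = posixpath.basename(folder).replace("_attacks", "")
    return (family, attack)


print(label_from_path("mirai_attacks/ack.csv"))
print(label_from_path("gafgyt_attacks/combo.csv"))
print(label_from_path("benign_traffic.csv"))
```

Returning the family and the attack type separately leaves room for either a binary label (benign vs. malicious) or a finer multi-class label later on.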

In [1]:
# Importing necessary modules
import pandas as pd
import numpy as np
import seaborn as sns
import os
from glob import glob

In [2]:
# Data folder path
base_directory = '../rawdata'
file_extension = "*.csv"

The data covers 9 different IoT devices, each with benign and malicious traffic data. Since the CSV files are in separate folders, I created a simple function to load them into a data frame.

In [3]:
# Read every benign_traffic.csv under PATH and combine them into one dataframe.
def benign_traffic_data(PATH, EXT):
    """
    Creates a data frame from all the benign-traffic .csv files in a given directory. The directory
    should be where the unzipped data files are stored. Assumes the file structure is
        device name
            mirai_attacks(folder)
            gafgyt_attacks(folder)
            benign_traffic.csv
    Parameters
    ----------
    PATH : str
        The directory in which the data files are stored.
    EXT : str
        Glob pattern for the files to read, e.g. "*.csv".

    Returns
    -------
    benign_data : pandas data frame
        All benign rows, with added 'label' and 'device' columns.

    """
    benign_dfs = []
    for path, subdir, files in os.walk(PATH):
        for file in glob(os.path.join(path, EXT)):
            if 'benign_traffic' in os.path.basename(file):
                data = pd.read_csv(file)
                data['label'] = 'Benign'
                # The device name is the folder that directly contains the file.
                # Using os.path avoids hard-coding the Windows '\\' separator.
                data['device'] = os.path.basename(os.path.dirname(file))
                benign_dfs.append(data)

    # Raise instead of returning the exception object, so that a failed
    # load cannot be mistaken for a data frame by the caller.
    if not benign_dfs:
        raise FileNotFoundError(f"No benign_traffic files matching {EXT!r} under {PATH!r}")
    benign_data = pd.concat(benign_dfs, ignore_index=True)
    return benign_data
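
The `device` column is taken from the folder that holds each `benign_traffic.csv`. Splitting the path on `'\\'` only works with Windows-style separators, so the sketch below shows a separator-independent alternative based on `pathlib`. The helper name `device_from_path` is an assumption made for illustration, and the pure path classes are used only so that both path styles can be demonstrated on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath


def device_from_path(path):
    """Return the name of the folder that directly contains the file.

    Hypothetical helper: assumes the layout <root>/<device>/benign_traffic.csv.
    """
    return path.parent.name


# The same device name is recovered from a POSIX-style path ...
posix_device = device_from_path(
    PurePosixPath("../rawdata/Danmini_Doorbell/benign_traffic.csv")
)
# ... and from a Windows-style path with backslashes.
windows_device = device_from_path(
    PureWindowsPath(r"..\rawdata\Danmini_Doorbell\benign_traffic.csv")
)
print(posix_device, windows_device)
```

Inside the loader, the equivalent call would be `pathlib.Path(file).parent.name`, which picks the right path flavour for the platform the notebook runs on.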

#### Loading Benign Data

In [4]:
benign_data = benign_traffic_data(base_directory, file_extension)

In [5]:
type(benign_data)

pandas.core.frame.DataFrame

In [6]:
# Checking the first few rows of the data
benign_data.head(3)

Unnamed: 0,MI_dir_L5_weight,MI_dir_L5_mean,MI_dir_L5_variance,MI_dir_L3_weight,MI_dir_L3_mean,MI_dir_L3_variance,MI_dir_L1_weight,MI_dir_L1_mean,MI_dir_L1_variance,MI_dir_L0.1_weight,...,HpHp_L0.1_pcc,HpHp_L0.01_weight,HpHp_L0.01_mean,HpHp_L0.01_std,HpHp_L0.01_magnitude,HpHp_L0.01_radius,HpHp_L0.01_covariance,HpHp_L0.01_pcc,label,device
0,1.0,60.0,0.0,1.0,60.0,0.0,1.0,60.0,0.0,1.0,...,0.0,1.0,60.0,0.0,60.0,0.0,0.0,0.0,Benign,Danmini_Doorbell
1,1.0,354.0,0.0,1.0,354.0,0.0,1.0,354.0,0.0,1.0,...,0.0,5.319895,344.262695,4.710446,344.262695,22.188299,0.0,0.0,Benign,Danmini_Doorbell
2,1.857879,360.45898,35.789338,1.912127,360.275733,35.923972,1.969807,360.091968,35.991542,1.996939,...,0.0,6.318264,347.703087,9.03466,347.703087,81.625077,0.0,0.0,Benign,Danmini_Doorbell


In [7]:
# Dimensions of the data
benign_data.shape

(555932, 117)

In [12]:
# Information about Dataframe
benign_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 555932 entries, 0 to 555931
Columns: 117 entries, MI_dir_L5_weight to device
dtypes: float64(115), object(2)
memory usage: 496.2+ MB


In [13]:
# Number of benign rows per device
benign_data['device'].value_counts()

Philips_B120N10_Baby_Monitor                175240
Provision_PT_838_Security_Camera             98514
Provision_PT_737E_Security_Camera            62154
Samsung_SNH_1011_N_Webcam                    52150
Danmini_Doorbell                             49548
SimpleHome_XCS7_1002_WHT_Security_Camera     46585
Ennio_Doorbell                               39100
SimpleHome_XCS7_1003_WHT_Security_Camera     19528
Ecobee_Thermostat                            13113
Name: device, dtype: int64

In [14]:
# Checking whether the data contains any null values
benign_data.isnull().values.any()

False
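
The check above collapses the whole frame into a single boolean. If it ever returned `True`, a per-column count would show where the gaps are. The following is a small sketch of that follow-up step, run on a toy frame rather than the real data; `null_counts` is a hypothetical helper, and the column names in the toy frame are only illustrative.

```python
import numpy as np
import pandas as pd


def null_counts(df):
    """Return the number of null entries per column, for columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


# Toy frame standing in for the traffic data (values are made up).
toy = pd.DataFrame(
    {
        "MI_dir_L5_weight": [1.0, np.nan, 1.8],
        "MI_dir_L5_mean": [60.0, 354.0, 360.4],
        "device": ["Danmini_Doorbell", None, None],
    }
)
report = null_counts(toy)
print(report)
```

For a frame without nulls, such as `benign_data` here, the returned series would simply be empty.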