## IoT Botnet Detection

As digital communication becomes ever more widespread, cyber security is crucial for maintaining a high level of safety. Traditional security mechanisms such as firewalls and encryption are no longer enough to stop the growing number of cyber attacks. Intrusion detection systems are needed that can integrate with these traditional systems and help keep data secure.

In [1]:
# Importing necessary modules
import pandas as pd
import numpy as np
import seaborn as sns
import os

In [2]:
# Data folder path
raw_data_path = '../rawdata'

The dataset contains traffic from 9 different IoT devices. For each device there is both benign and malicious traffic data.
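Because each device has its own folder under `raw_data_path`, the device list can be discovered instead of hard-coded. This is a minimal sketch. It assumes that every device folder contains a `benign_traffic.csv` file, as the Danmini_Doorbell folder below does. `list_devices` is a hypothetical helper and not part of the dataset.

```python
import os


def list_devices(root):
    """Return the sorted names of device folders directly under `root`.

    Assumption: a device folder is any sub-directory that contains a
    benign_traffic.csv file, as Danmini_Doorbell does.
    """
    devices = []
    for name in os.listdir(root):
        folder = os.path.join(root, name)
        # isfile() is False for plain files and folders without benign data
        if os.path.isfile(os.path.join(folder, "benign_traffic.csv")):
            devices.append(name)
    return sorted(devices)


# Usage: for this dataset the list should hold 9 device names.
# devices = list_devices(raw_data_path)
```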

#### Device 1 - Danmini_Doorbell

The Danmini_Doorbell data set contains 11 CSV files. One file holds benign traffic and the other ten hold attack traffic, one file per attack type. Two malware families were used to generate the attacks: Mirai and Gafgyt.

* benign_traffic.csv - Benign traffic
* mirai_attacks/ack.csv - Malicious 1
* mirai_attacks/scan.csv - Malicious 2
* mirai_attacks/syn.csv - Malicious 3
* mirai_attacks/udp.csv - Malicious 4
* mirai_attacks/udpplain.csv - Malicious 5
* gafgyt_attacks/combo.csv - Malicious 6
* gafgyt_attacks/junk.csv - Malicious 7
* gafgyt_attacks/scan.csv - Malicious 8
* gafgyt_attacks/tcp.csv - Malicious 9
* gafgyt_attacks/udp.csv - Malicious 10
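The eleven files above map naturally onto an 11-class label, or onto a binary benign/malicious label. This is a minimal sketch that assumes the file layout listed above. `FILE_CLASSES` and `class_of` are hypothetical names, and the class ids simply reuse the list's numbering.

```python
# Class ids follow the numbering in the list above:
# 0 = benign, 1-5 = Mirai attacks, 6-10 = Gafgyt attacks.
FILE_CLASSES = {
    "benign_traffic.csv": 0,
    "mirai_attacks/ack.csv": 1,
    "mirai_attacks/scan.csv": 2,
    "mirai_attacks/syn.csv": 3,
    "mirai_attacks/udp.csv": 4,
    "mirai_attacks/udpplain.csv": 5,
    "gafgyt_attacks/combo.csv": 6,
    "gafgyt_attacks/junk.csv": 7,
    "gafgyt_attacks/scan.csv": 8,
    "gafgyt_attacks/tcp.csv": 9,
    "gafgyt_attacks/udp.csv": 10,
}


def class_of(relative_path):
    """Map a path relative to a device folder to (class_id, family, attack)."""
    key = relative_path.replace("\\", "/")  # tolerate Windows separators
    class_id = FILE_CLASSES[key]            # KeyError for unknown files
    if class_id == 0:
        return class_id, "benign", None
    folder, filename = key.split("/")
    family = folder.split("_")[0]           # "mirai" or "gafgyt"
    attack = filename[: -len(".csv")]       # e.g. "ack", "udpplain"
    return class_id, family, attack
```

With this mapping, a binary benign/malicious label is just `class_id > 0`.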

In [3]:
# Benign Traffic Data
damini_doorbell_path = raw_data_path + '/Danmini_Doorbell/'
damini_doorbell_benign = pd.read_csv(damini_doorbell_path + 'benign_traffic.csv')

In [4]:
damini_doorbell_benign.shape

(49548, 115)

In [5]:
damini_doorbell_benign.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49548 entries, 0 to 49547
Columns: 115 entries, MI_dir_L5_weight to HpHp_L0.01_pcc
dtypes: float64(115)
memory usage: 43.5 MB


In [6]:
damini_doorbell_benign.head()

,MI_dir_L5_weight,MI_dir_L5_mean,MI_dir_L5_variance,MI_dir_L3_weight,MI_dir_L3_mean,MI_dir_L3_variance,MI_dir_L1_weight,MI_dir_L1_mean,MI_dir_L1_variance,MI_dir_L0.1_weight,...,HpHp_L0.1_radius,HpHp_L0.1_covariance,HpHp_L0.1_pcc,HpHp_L0.01_weight,HpHp_L0.01_mean,HpHp_L0.01_std,HpHp_L0.01_magnitude,HpHp_L0.01_radius,HpHp_L0.01_covariance,HpHp_L0.01_pcc
0,1.0,60.0,0.0,1.0,60.0,0.0,1.0,60.0,0.0,1.0,...,0.0,0.0,0.0,1.0,60.0,0.0,60.0,0.0,0.0,0.0
1,1.0,354.0,0.0,1.0,354.0,0.0,1.0,354.0,0.0,1.0,...,34.095047,0.0,0.0,5.319895,344.262695,4.710446,344.262695,22.188299,0.0,0.0
2,1.857879,360.45898,35.789338,1.912127,360.275733,35.923972,1.969807,360.091968,35.991542,1.996939,...,100.081513,0.0,0.0,6.318264,347.703087,9.03466,347.703087,81.625077,0.0,0.0
3,1.0,337.0,0.0,1.0,337.0,0.0,1.0,337.0,0.0,1.0,...,0.0,0.0,0.0,1.0,337.0,0.0,337.0,0.0,0.0,0.0
4,1.680223,172.140917,18487.44875,1.79358,182.560279,18928.1753,1.925828,193.165753,19153.79581,1.992323,...,0.0,0.0,0.0,1.0,60.0,0.0,60.0,0.0,0.0,0.0


In [7]:
import glob

In [8]:
# Mirai attack data
damini_doorbell_mirai = pd.concat(map(pd.read_csv, glob.glob(damini_doorbell_path + 'mirai_attacks/*.csv')))

In [10]:
damini_doorbell_mirai.shape

(652100, 115)

In [11]:
damini_doorbell_mirai.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 652100 entries, 0 to 81981
Columns: 115 entries, MI_dir_L5_weight to HpHp_L0.01_pcc
dtypes: float64(115)
memory usage: 577.1 MB
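The `Int64Index: 652100 entries, 0 to 81981` line shows that `pd.concat` kept each file's own 0-based index. As a result, index labels repeat across the five Mirai files, and the Gafgyt frame below shows the same pattern. `glob.glob` also returns files in no guaranteed order. The sketch below is a minimal, more reproducible loader that assumes the folder layout above. `read_attack_folder` is a hypothetical helper. The extra `attack` column it adds (giving 116 columns) is an assumption meant to help later labeling and is not part of the original data.

```python
import glob
import os

import pandas as pd


def read_attack_folder(folder):
    """Concatenate every CSV file in `folder` into a single DataFrame."""
    # glob.glob returns paths in arbitrary order; sorting makes runs reproducible.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")

    frames = []
    for path in paths:
        df = pd.read_csv(path)
        # Keep the attack name (file name without .csv) for later labeling.
        df["attack"] = os.path.splitext(os.path.basename(path))[0]
        frames.append(df)

    # ignore_index=True builds a fresh 0..n-1 index instead of repeating
    # each file's own 0-based index.
    return pd.concat(frames, ignore_index=True)


# Usage (hypothetical variable name):
# mirai = read_attack_folder(damini_doorbell_path + 'mirai_attacks')
# mirai.index.is_unique  # True, whereas the index of damini_doorbell_mirai repeats
```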


In [12]:
damini_doorbell_mirai.head()

,MI_dir_L5_weight,MI_dir_L5_mean,MI_dir_L5_variance,MI_dir_L3_weight,MI_dir_L3_mean,MI_dir_L3_variance,MI_dir_L1_weight,MI_dir_L1_mean,MI_dir_L1_variance,MI_dir_L0.1_weight,...,HpHp_L0.1_radius,HpHp_L0.1_covariance,HpHp_L0.1_pcc,HpHp_L0.01_weight,HpHp_L0.01_mean,HpHp_L0.01_std,HpHp_L0.01_magnitude,HpHp_L0.01_radius,HpHp_L0.01_covariance,HpHp_L0.01_pcc
0,1.0,566.0,0.0,1.0,566.0,0.0,1.0,566.0,0.0,1.0,...,0.0,0.0,0.0,1.0,566.0,0.0,566.0,0.0,0.0,0.0
1,1.996585,566.0,5.820766e-11,1.99795,566.0,5.820766e-11,1.999316,566.0,0.0,1.999932,...,0.0,0.0,0.0,1.0,566.0,0.0,566.0,0.0,0.0,0.0
2,2.958989,566.0,0.0,2.975291,566.0,5.820766e-11,2.991729,566.0,5.820766e-11,2.999171,...,0.0,0.0,0.0,1.0,566.0,0.0,566.0,0.0,0.0,0.0
3,3.958979,566.0,0.0,3.975285,566.0,0.0,3.991727,566.0,1.164153e-10,3.999171,...,0.0,0.0,0.0,1.0,566.0,0.0,566.0,0.0,0.0,0.0
4,4.914189,566.0,1.164153e-10,4.948239,566.0,5.820766e-11,4.982654,566.0,5.820766e-11,4.998261,...,0.0,0.0,0.0,1.0,566.0,0.0,566.0,0.0,0.0,0.0


In [9]:
# Gafgyt attack data
damin_doorbell_gafgyt = pd.concat(map(pd.read_csv, glob.glob(damini_doorbell_path + 'gafgyt_attacks/*.csv')))

In [13]:
damin_doorbell_gafgyt.shape

(316650, 115)

In [14]:
damin_doorbell_gafgyt.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 316650 entries, 0 to 105873
Columns: 115 entries, MI_dir_L5_weight to HpHp_L0.01_pcc
dtypes: float64(115)
memory usage: 280.2 MB


In [15]:
damin_doorbell_gafgyt.head()

,MI_dir_L5_weight,MI_dir_L5_mean,MI_dir_L5_variance,MI_dir_L3_weight,MI_dir_L3_mean,MI_dir_L3_variance,MI_dir_L1_weight,MI_dir_L1_mean,MI_dir_L1_variance,MI_dir_L0.1_weight,...,HpHp_L0.1_radius,HpHp_L0.1_covariance,HpHp_L0.1_pcc,HpHp_L0.01_weight,HpHp_L0.01_mean,HpHp_L0.01_std,HpHp_L0.01_magnitude,HpHp_L0.01_radius,HpHp_L0.01_covariance,HpHp_L0.01_pcc
0,1.0,98.0,0.0,1.0,98.0,0.0,1.0,98.0,0.0,1.0,...,0.0,0.0,0.0,1.0,98.0,0.0,98.0,0.0,0.0,0.0
1,1.029,98.0,1.818989e-12,1.11952,98.0,0.0,1.492583,98.0,3.637979e-12,1.93164,...,1.818989e-12,0.0,0.0,1.992944,98.0,1e-06,138.592929,1.818989e-12,0.0,0.0
2,1.504156,76.725612,228.1808,1.729662,79.499272,249.746357,2.294102,84.051188,251.7926,2.904273,...,0.0,0.0,0.0,1.0,66.0,0.0,114.856432,0.0,0.0,0.0
3,2.460087,75.617679,137.22,2.699075,77.461807,164.269331,3.280499,80.987267,196.4467,3.902546,...,0.0,0.0,0.0,1.0,74.0,0.0,74.0,0.0,0.0,0.0
4,3.460055,75.150149,98.09937,3.699054,76.525944,122.224798,4.28049,79.354915,159.2943,4.902545,...,0.0,0.0,0.0,1.0,74.0,0.0,74.0,0.0,0.0,0.0


There are 49,548 rows of benign data, 652,100 rows of Mirai data and 316,650 rows of Gafgyt data. The Mirai and Gafgyt dataframes each combine all 5 attack classes of that botnet.
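These row counts imply a strong class imbalance. The short calculation below uses only the `.shape` values printed above to work out each class's share of the Danmini_Doorbell rows.

```python
# Row counts taken from the .shape outputs above
counts = {"benign": 49548, "mirai": 652100, "gafgyt": 316650}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:>7}: {counts[name]:>7,} rows ({share:.1%})")
print(f"  total: {total:>7,} rows")
```

Benign traffic makes up only about 4.9% of the 1,018,298 rows (Mirai about 64.0%, Gafgyt about 31.1%). A classifier that always predicts "malicious" would therefore already reach roughly 95% accuracy on this device, so accuracy on its own may be a misleading metric.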

In [18]:
damini_doorbell_benign.columns.equals(damini_doorbell_mirai.columns)

True

In [19]:
damin_doorbell_gafgyt.columns.equals(damini_doorbell_mirai.columns)

True

In [20]:
damini_doorbell_benign.columns.equals(damin_doorbell_gafgyt.columns)

True

All three dataframes have the same 115 feature columns, in the same order.
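Checking frames pair by pair becomes tedious once all nine devices are loaded, so a single check over all frames may be more convenient. This is a minimal sketch. `same_columns` is a hypothetical helper built on pandas' `Index.equals`, which compares the column labels and their order. Unlike an element-wise `==`, `Index.equals` simply returns `False` instead of raising when the column counts differ.

```python
import pandas as pd


def same_columns(*frames):
    """True if every frame has exactly the same column names in the same order.

    Expects at least one DataFrame.
    """
    first = frames[0].columns
    return all(first.equals(df.columns) for df in frames[1:])


# Usage with the frames loaded above:
# same_columns(damini_doorbell_benign, damini_doorbell_mirai, damin_doorbell_gafgyt)
```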