# Network-based Intrusion Detection System
*Evaluation of Classifiers (Scikit Learn)*

**Dataset** [NSL-KDD](https://www.unb.ca/cic/datasets/nsl.html)

**Classification**

1. Binary (Benign and Attack classes)
2. Multi-class (Benign, Probe, DoS, U2R, and R2L classes)
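
The two classification modes differ only in how each raw attack type is mapped to a class label. The following is a minimal sketch of that idea, using a small illustrative excerpt of the mapping; the helper name `categorise` is hypothetical and not part of the notebook.

```python
# Illustrative excerpt of an attack-type -> category mapping (not the full table).
multi_class_mapping = {
    'normal': 'benign',
    'neptune': 'dos',
    'nmap': 'probe',
    'rootkit': 'u2r',
    'guess_passwd': 'r2l',
}

# Binary view: keep 'benign' and collapse every other category into 'attack'.
binary_mapping = {
    attack_type: ('benign' if category == 'benign' else 'attack')
    for attack_type, category in multi_class_mapping.items()
}


def categorise(attack_type, classification):
    """Return the class label of one record under the chosen classification.

    `categorise` is a hypothetical helper used only for this illustration.
    """
    if classification == 'Multi-class':
        return multi_class_mapping[attack_type]
    return binary_mapping[attack_type]


print(categorise('neptune', 'Multi-class'))
print(categorise('neptune', 'Binary'))
```

Deriving the binary mapping from the multi-class one keeps the two views consistent when attack types are added.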

**Classifiers**

1. Decision Tree
2. K-Nearest Neighbours
3. Classification and Regression Tree
4. Random Forest
5. AdaBoost
6. Logistic Regression
7. Linear Discriminant Analysis
8. Quadratic Discriminant Analysis
9. Multi-Layer Perceptron
10. Linear SVC

**Metrics**

1. Accuracy
2. Precision
3. Recall
4. F1-score
5. Execution Time
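
With more than two classes, precision, recall and F1 are computed per class and then averaged, and the choice between "macro" averaging (every class counts equally) and "weighted" averaging (classes count in proportion to their support) can change the reported number. The sketch below uses only the standard library and made-up labels to show the difference; the function names are hypothetical, and in the notebook itself these values come from scikit-learn.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of records whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_class_scores(actual, predicted):
    """Return {label: (precision, recall, f1, support)} for each label seen."""
    pairs = Counter(zip(actual, predicted))
    scores = {}
    for label in sorted(set(actual) | set(predicted)):
        true_pos = pairs[(label, label)]
        predicted_as = sum(n for (_, p), n in pairs.items() if p == label)
        support = sum(n for (a, _), n in pairs.items() if a == label)
        precision = true_pos / predicted_as if predicted_as else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support)
    return scores


def averaged(scores, index, average):
    """Average one metric (0=precision, 1=recall, 2=F1) as 'macro' or 'weighted'."""
    if average == 'macro':
        return sum(s[index] for s in scores.values()) / len(scores)
    total_support = sum(s[3] for s in scores.values())
    return sum(s[index] * s[3] for s in scores.values()) / total_support


# Made-up labels for illustration only.
actual = ['benign', 'benign', 'benign', 'attack', 'attack', 'attack', 'attack', 'attack']
predicted = ['benign', 'benign', 'attack', 'attack', 'attack', 'attack', 'attack', 'benign']

scores = per_class_scores(actual, predicted)
print('accuracy        ', accuracy(actual, predicted))
print('recall, macro   ', averaged(scores, 1, 'macro'))
print('recall, weighted', averaged(scores, 1, 'weighted'))
```

Weighted recall always equals accuracy, because each class's recall is multiplied by its own support before dividing by the total. Macro recall instead highlights weak performance on rare classes such as U2R.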

Metrics and Confusion Matrices: https://github.com/BenoyRNair/NIDS/blob/main/NIDS_EvaluateClassifiers.pdf


[This notebook](https://github.com/BenoyRNair/NIDS_MultipleClassifiers) was tested in ***Google Colab***.

## Credits

1. Adapted from [Predicting Network Attacks](https://colab.research.google.com/github/smlra-kjsce/Cyber-ML-DL-101/blob/master/Predicting_Network_Attacks.ipynb)
2. [Scikit Learn](https://scikit-learn.org/)
3. [Alive progress](https://pypi.org/project/alive-progress/)

## Licence
@Author [Benoy R Nair](https://github.com/BenoyRNair)

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Setup

In [None]:
!pip install wget
# Download NSL-KDD.zip from the website
!wget http://205.174.165.80/CICDataset/NSL-KDD/Dataset/NSL-KDD.zip
# alternatively if the above link does not work
#!wget -O NSL-KDD.zip https://cloudstor.aarnet.edu.au/plus/s/P13PYzoU5olpDwz/download
!wget -O name.txt http://kdd.ics.uci.edu/databases/kddcup99/training_attack_types
#!wget -O name.txt http://kdd.ics.uci.edu/databases/kddcup99/kddcup.names
!unzip NSL-KDD.zip
# To show the progress of execution with the different classifiers
!pip install alive-progress

In [None]:
classification_list = ['Binary', 'Multi-class']

#@markdown Specify the classification (Binary or Multi-class).

#@markdown Binary: Benign and Attack classes

#@markdown Multi-class: Benign, Probe, DoS, U2R, and R2L classes


from ipywidgets import interactive
import ipywidgets as widgets

def model(CLASSIFICATION):
  return CLASSIFICATION

classification_widget = interactive (model, CLASSIFICATION=classification_list)
display (classification_widget)

# Preprocessing

In [None]:
import os
from collections import defaultdict
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

dataset_root = '/content'
train_file = os.path.join(dataset_root, 'KDDTrain+.txt')
test_file = os.path.join(dataset_root, 'KDDTest+.txt')

header_names = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'num_compromised', 'root_shell', 'su_attempted', 'num_root', 'num_file_creations', 'num_shells', 'num_access_files', 'num_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate', 'attack_type', 'success_pred']

col_names = np.array(header_names)

nominal_idx = [1, 2, 3]
binary_idx = [6, 11, 13, 14, 20, 21]
numeric_idx = list(set(range(41)).difference(nominal_idx).difference(binary_idx))

nominal_cols = col_names[nominal_idx].tolist()
binary_cols = col_names[binary_idx].tolist()
numeric_cols = col_names[numeric_idx].tolist()

category = defaultdict(list)
category['benign'].append('normal')

name = os.path.join(dataset_root,'name.txt')

attack_mapping = {'apache2': 'dos',
 'back': 'dos',
 'buffer_overflow': 'u2r',
 'ftp_write': 'r2l',
 'guess_passwd': 'r2l',
 'httptunnel': 'r2l',
 'imap': 'r2l',
 'ipsweep': 'probe',
 'land': 'dos',
 'loadmodule': 'u2r',
 'mailbomb': 'dos',
 'mscan': 'probe',
 'multihop': 'r2l',
 'named': 'r2l',
 'neptune': 'dos',
 'nmap': 'probe',
 'normal': 'benign',
 'perl': 'u2r',
 'phf': 'r2l',
 'pod': 'dos',
 'portsweep': 'probe',
 'processtable': 'dos',
 'ps': 'u2r',
 'rootkit': 'u2r',
 'saint': 'probe',
 'satan': 'probe',
 'sendmail': 'r2l',
 'smurf': 'dos',
 'snmpgetattack': 'r2l',
 'snmpguess': 'r2l',
 'spy': 'r2l',
 'sqlattack': 'u2r',
 'teardrop': 'dos',
 'udpstorm': 'dos',
 'warezclient': 'r2l',
 'warezmaster': 'r2l',
 'worm': 'dos',
 'xlock': 'r2l',
 'xsnoop': 'r2l',
 'xterm': 'u2r'}

# Binary view: every attack type collapses to 'attack'; 'normal' stays 'benign'
attack_mapping_2 = {key: (value if key == 'normal' else 'attack')
                    for key, value in attack_mapping.items()}

selected_classification = classification_widget.result

train_df = pd.read_csv(train_file, names=header_names)
train_df['attack_category'] = train_df['attack_type'].map(lambda x: attack_mapping[x] if classification_widget.result == 'Multi-class' else attack_mapping_2[x])
train_df.drop(['success_pred'], axis=1, inplace=True)
    
test_df = pd.read_csv(test_file, names=header_names)
test_df['attack_category'] = test_df['attack_type'].map(lambda x: attack_mapping[x] if classification_widget.result == 'Multi-class' else attack_mapping_2[x])
test_df.drop(['success_pred'], axis=1, inplace=True)

train_attack_types = train_df['attack_type'].value_counts()
train_attack_cats = train_df['attack_category'].value_counts()

test_attack_types = test_df['attack_type'].value_counts()
test_attack_cats = test_df['attack_category'].value_counts()

train_df[binary_cols].describe().transpose()
train_df.groupby(['su_attempted']).size()

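# Assign the result back: an in-place replace on a selected column may not modify the frame in newer pandas versions
train_df['su_attempted'] = train_df['su_attempted'].replace(2, 0)
test_df['su_attempted'] = test_df['su_attempted'].replace(2, 0)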
train_df.groupby(['su_attempted']).size()

train_df.groupby(['num_outbound_cmds']).size()

train_df.drop('num_outbound_cmds', axis = 1, inplace=True)
test_df.drop('num_outbound_cmds', axis = 1, inplace=True)
numeric_cols.remove('num_outbound_cmds')

train_Y = train_df['attack_category']
train_x_raw = train_df.drop(['attack_category','attack_type'], axis=1)
test_Y = test_df['attack_category']
test_x_raw = test_df.drop(['attack_category','attack_type'], axis=1)

combined_df_raw = pd.concat([train_x_raw, test_x_raw])
combined_df = pd.get_dummies(combined_df_raw, columns=nominal_cols, drop_first=True)

train_x = combined_df[:len(train_x_raw)]
test_x = combined_df[len(train_x_raw):]

dummy_variables = list(set(train_x)-set(combined_df_raw))

from sklearn.preprocessing import StandardScaler

durations = train_x['duration'].values.reshape(-1, 1)
standard_scaler = StandardScaler().fit(durations)
scaled_durations = standard_scaler.transform(durations)
#pd.Series(scaled_durations.flatten()).describe()
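
The cell above fits a `StandardScaler` on the training `duration` column, but `scaled_durations` is not written back into `train_x`, so the classifiers receive the unscaled values. If scaling is wanted, the usual pattern is to learn the mean and standard deviation on the training data only and reuse them for the test data. The sketch below shows that pattern with the standard library and made-up numbers; `fit_standardiser` and `standardise` are hypothetical names, and the notebook would use `StandardScaler` itself.

```python
from statistics import fmean, pstdev


def fit_standardiser(train_values):
    """Learn the mean and population standard deviation from training values only."""
    mean = fmean(train_values)
    std = pstdev(train_values)
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)


def standardise(values, mean, std):
    """Apply previously learned statistics to any set of values."""
    return [(v - mean) / std for v in values]


# Made-up durations for illustration only.
train_durations = [0.0, 2.0, 4.0, 6.0]
test_durations = [3.0, 9.0]

mean, std = fit_standardiser(train_durations)
scaled_train = standardise(train_durations, mean, std)
scaled_test = standardise(test_durations, mean, std)  # reuses the training statistics

print(scaled_train)
print(scaled_test)
```

Reusing the training statistics for the test split avoids leaking information from the test set into preprocessing.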

# Evaluation of Classifiers (NIDS)

Classifier training & prediction

Review confusion matrix for each classifier
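
When reading a confusion matrix, the row and column order matters. Unless a `labels` argument is passed, scikit-learn's `confusion_matrix` orders classes by sorted label, so tick labels on a heatmap have to follow the same order. The sketch below reproduces the counting with the standard library and made-up labels to show how the layout changes with the label order; `confusion_counts` is a hypothetical helper.

```python
from collections import Counter


def confusion_counts(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in labels] for a in labels]


# Made-up labels for illustration only.
actual = ['benign', 'benign', 'attack', 'attack', 'attack']
predicted = ['benign', 'attack', 'attack', 'attack', 'benign']

# Sorted order puts 'attack' first.
alphabetical = confusion_counts(actual, predicted, sorted(set(actual)))
# Explicit order puts 'benign' first.
explicit = confusion_counts(actual, predicted, ['benign', 'attack'])

print(alphabetical)
print(explicit)
```

The same counts appear in both matrices, but in different cells, which is why the axis labels and the matrix must share one explicit ordering.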

In [None]:
import time
import warnings
warnings.filterwarnings("ignore")

from sklearn import model_selection
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

models = []
models.append(('DT', DecisionTreeClassifier(random_state=17)))
models.append(('KNN', KNeighborsClassifier(n_neighbors=9)))
models.append(('CART', DecisionTreeClassifier(max_depth=5)))
models.append(('RF', RandomForestClassifier(max_depth=5, n_estimators=5, max_features=3)))    
models.append(('ABoost', AdaBoostClassifier()))
models.append(('LR', LogisticRegression(solver='lbfgs', max_iter=200)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('QDA', QuadraticDiscriminantAnalysis()))
models.append(('MLP', MLPClassifier()))
models.append(('LinSVC', LinearSVC()))

from sklearn.metrics import confusion_matrix, zero_one_loss, accuracy_score, precision_score, recall_score, f1_score
import seaborn as sns

from alive_progress import alive_bar

metrics_summary = {}

if classification_widget.result != selected_classification:
  print ('Classification has been changed.\nRun the \'Preprocessing\' section again.')
else:
  xlabel = 'Predicted'
  ylabel = 'Actual'
  title = 'Confusion Matrix'

  if classification_widget.result == 'Multi-class':
    tickLabels = ['benign', 'probe','dos','u2r','r2l']
    figsize=(15, 12)
  else:
    tickLabels = ['benign', 'attack']
    figsize=(5, 4)

  print ("\n")
  
  with alive_bar(len (models), force_tty = True, stats = False) as bar:
    for name, classifier in models:
      bar.title_length = 10
      bar.title = name
      start_time = time.time()

      classifier.fit(train_x, train_Y)
      pred_y = classifier.predict(test_x)

      delta = time.time() - start_time

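      # Fix the row/column order so it matches the tick labels set on the heatmap below
      results = confusion_matrix(test_Y, pred_y, labels=tickLabels)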
      error = zero_one_loss(test_Y, pred_y)
      accuracy = accuracy_score (test_Y, pred_y)
      # Use the same averaging for precision, recall and F1 so the three are comparable
      precision = precision_score (test_Y, pred_y, average='weighted')
      recall = recall_score (test_Y, pred_y, average='weighted')
      f1score = f1_score (test_Y, pred_y, average='weighted')
      
      plt.figure(figsize=figsize)
      plt.subplots_adjust(hspace=0.5)
      ax = plt.subplot()
      sns.heatmap(results, annot=True, ax = ax, cmap='Blues', fmt='g', cbar=False)

      ax.set_xlabel (xlabel)
      ax.set_ylabel (ylabel)
      ax.set_title (title + ': ' + name)
      ax.xaxis.set_ticklabels (tickLabels)
      ax.yaxis.set_ticklabels (tickLabels)

      metrics_summary [name] = {
          'accuracy' : accuracy,
          'precision' : precision,
          'recall' : recall,
          'f1score' : f1score,
          'delta': delta
      }

      bar()

  print ("\nConfusion Matrices...\n")
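
The execution time recorded in the loop above covers training and prediction together. If the two phases need to be compared separately, they can be timed individually. The following is a minimal sketch, assuming a classifier with scikit-learn-style `fit` and `predict` methods; `MajorityClassifier` and `timed_fit_predict` are hypothetical names used only for this illustration.

```python
import time


class MajorityClassifier:
    """Hypothetical stand-in classifier: always predicts the most common training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def timed_fit_predict(classifier, train_x, train_y, test_x):
    """Return (predictions, fit_seconds, predict_seconds)."""
    start = time.perf_counter()
    classifier.fit(train_x, train_y)
    fitted = time.perf_counter()
    predictions = classifier.predict(test_x)
    done = time.perf_counter()
    return predictions, fitted - start, done - fitted


# Made-up data for illustration only.
predictions, fit_seconds, predict_seconds = timed_fit_predict(
    MajorityClassifier(),
    [[0], [1], [2]],
    ['attack', 'attack', 'benign'],
    [[3], [4]],
)
print(predictions, fit_seconds, predict_seconds)
```

`time.perf_counter()` is a monotonic, high-resolution clock, which makes it a common choice for measuring short intervals.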

## Metrics

Review Accuracy, Precision, Recall, F1-score and Execution Time

In [None]:
if classification_widget.result != selected_classification:
  print ('Classification has been changed.\nRun the \'Preprocessing\' section again.')
else:
  print("{:<10} {:<8}\t{:<8}\t{:<8}\t{:<8}\t{:<20}".format('Name', 'Accuracy', 'Precision', 'Recall', 'F1-score', 'Execution Time'))

  for key, value in metrics_summary.items():
    print("{:<10} {:.2f} %\t{:.2f} % \t{:.2f} % \t{:.2f} % \t{:.2f} secs".format(key, value['accuracy'] * 100, value['precision'] * 100, value['recall'] * 100, value['f1score'] * 100, value['delta']))
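
To compare classifiers at a glance, the summary can also be ranked by a chosen metric. The sketch below assumes a dictionary shaped like `metrics_summary`; the model names and numbers are placeholders for illustration, not results from this notebook, and `rank_by` is a hypothetical helper.

```python
# Placeholder values for illustration only; these are not measured results.
example_summary = {
    'ModelA': {'accuracy': 0.80, 'precision': 0.82, 'recall': 0.78, 'f1score': 0.79, 'delta': 2.5},
    'ModelB': {'accuracy': 0.85, 'precision': 0.86, 'recall': 0.84, 'f1score': 0.85, 'delta': 10.0},
    'ModelC': {'accuracy': 0.75, 'precision': 0.70, 'recall': 0.72, 'f1score': 0.71, 'delta': 0.4},
}


def rank_by(summary, metric, descending=True):
    """Return classifier names ordered by one metric."""
    return sorted(summary, key=lambda name: summary[name][metric], reverse=descending)


# Best F1-score first.
for name in rank_by(example_summary, 'f1score'):
    row = example_summary[name]
    print("{:<10} {:6.2f} %  {:8.2f} secs".format(name, row['f1score'] * 100, row['delta']))

# Fastest classifier first.
print(rank_by(example_summary, 'delta', descending=False))
```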