# **PLEASE RUN THIS IN GOOGLE COLAB ONLY!**

# **Network Intrusion Detection System using ML**

### 1. Mount drive to download Datasets
https://www.kaggle.com/datasets/sampadab17/network-intrusion-detection/data



In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

### 2. Download and unzip datasets



Make directory `IDS_Dataset`

In [None]:
%cd /content/gdrive/My\ Drive
%mkdir -p IDS_Dataset
%cd IDS_Dataset/

In [None]:
!wget --no-check-certificate http://kdd.org/cupfiles/KDDCupData/1999/kddcup.data.zip
!wget --no-check-certificate http://kdd.org/cupfiles/KDDCupData/1999/kddcup.data_10_percent.zip
!wget --no-check-certificate http://kdd.org/cupfiles/KDDCupData/1999/kddcup.newtestdata_10_percent_unlabeled.zip
!wget --no-check-certificate http://kdd.org/cupfiles/KDDCupData/1999/kddcup.testdata.unlabeled.zip
!wget --no-check-certificate http://kdd.org/cupfiles/KDDCupData/1999/corrected.zip
!wget --no-check-certificate http://kdd.org/cupfiles/KDDCupData/1999/kddcup.testdata.unlabeled_10_percent.zip


In [None]:
!unzip kddcup.data.zip -d data
!unzip kddcup.data_10_percent.zip -d data
!unzip kddcup.newtestdata_10_percent_unlabeled.zip -d data
!unzip kddcup.testdata.unlabeled.zip -d data
!unzip corrected.zip -d data
!unzip kddcup.testdata.unlabeled_10_percent.zip -d data

### 3. Reading Datasets

In [None]:
import pandas as pd

# Column names in the order listed in kddcup.names (41 features followed by the label)
dataset_head = ['duration','protocol_type','service','flag','src_bytes','dst_bytes','land','wrong_fragment','urgent',
'hot','num_failed_logins','logged_in','num_compromised','root_shell','su_attempted','num_root','num_file_creations',
'num_shells','num_access_files','num_outbound_cmds','is_host_login','is_guest_login','count','srv_count',
'serror_rate','srv_serror_rate','rerror_rate','srv_rerror_rate','same_srv_rate','diff_srv_rate','srv_diff_host_rate',
'dst_host_count','dst_host_srv_count','dst_host_same_srv_rate','dst_host_diff_srv_rate','dst_host_same_src_port_rate',
'dst_host_srv_diff_host_rate','dst_host_serror_rate','dst_host_srv_serror_rate','dst_host_rerror_rate','dst_host_srv_rerror_rate','class']

train = pd.read_csv(r'/content/gdrive/My Drive/IDS_Dataset/data/kddcup.data.txt', header=None, nrows=4817099)
test = pd.read_csv(r'/content/gdrive/My Drive/IDS_Dataset/data/kddcup.data_10_percent.txt', header=None, nrows=485797)

train.columns = dataset_head
test.columns = dataset_head

print(train.head())
print(test.tail())

### 4. Preprocessing
Preprocessing puts the dataset into the format the model expects. Here that mainly means converting the non-numeric columns (`protocol_type`, `service`, `flag` and `class`) to numbers.

Two common methods are available for this: simple label encoding and one-hot encoding. This notebook uses label encoding.
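To make the difference between the two methods concrete, here is a minimal sketch on a toy column. The frame `demo` and its values are made up for illustration and are not part of the KDD data; `pd.get_dummies` is used for the one-hot step, which is one of several ways to do it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column standing in for a categorical feature such as protocol_type
demo = pd.DataFrame({'protocol_type': ['tcp', 'udp', 'icmp', 'tcp']})

# Label encoding: one integer per category, assigned in sorted order of the categories
demo_label = LabelEncoder().fit_transform(demo['protocol_type'])

# One-hot encoding: one 0/1 column per category
demo_onehot = pd.get_dummies(demo['protocol_type'], prefix='protocol_type', dtype=int)

print(demo_label)
print(demo_onehot)
```

Label encoding keeps a single column but implies an ordering between categories, which tree-based models usually tolerate. One-hot encoding avoids the implied ordering at the cost of one extra column per category.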

 **Label Encoder**

In [None]:
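import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Encode every non-numeric column. Each column gets its own encoder, fitted on the
# combined train and test values so that both frames share the same mapping.
encoders = {}
for column in train.columns:
    if not pd.api.types.is_numeric_dtype(train[column]):
        le = LabelEncoder()
        le.fit(pd.concat([train[column], test[column]]))
        train[column] = le.transform(train[column])
        test[column] = le.transform(test[column])
        encoders[column] = le
# After the loop, `le` is the encoder of the last column, 'class'.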
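One thing to watch with label encoding is that the integer codes depend on the values seen during `fit`. The sketch below uses two made-up lists (`toy_train`, `toy_test`) to show how fitting separately on each split can assign different codes to the same category, and how reusing one fitted encoder avoids that.

```python
from sklearn.preprocessing import LabelEncoder

toy_train = ['icmp', 'tcp', 'udp', 'tcp']
toy_test = ['tcp', 'udp']  # 'icmp' never appears in this split

# Separate fits: 'tcp' becomes 1 in the training codes but 0 in the test codes
codes_train_sep = LabelEncoder().fit_transform(toy_train)
codes_test_sep = LabelEncoder().fit_transform(toy_test)

# Shared fit: the mapping is learned once and reused for the test split
shared = LabelEncoder().fit(toy_train)
codes_test_shared = shared.transform(toy_test)

print(codes_train_sep, codes_test_sep, codes_test_shared)
```

Note that `transform` raises an error for a category the encoder has not seen, so the encoder has to be fitted on data that covers every value in both splits.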

**Split Train & Test dataset into X and Y**

In [None]:
# The last column holds the labels, so we split each frame into X and Y:
# X contains the 41 features and Y the assigned threat label

trainX = train.iloc[:,:41]
trainY = train.iloc[:,-1]

testX = test.iloc[:,:41]
testY = test.iloc[:,-1]



### 5. Models used for training

**Random Forest Classifier**

A random forest classifier is an ensemble, tree-based learning algorithm. It trains a set of decision trees, each on a randomly selected subset of the training set, and aggregates their outputs to decide the final prediction.
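The aggregation step can be checked directly on a small synthetic dataset. This is only a sketch: the `demo_*` names and the data from `make_classification` are illustrative assumptions, unrelated to the KDD data. scikit-learn's forest combines its trees by averaging their predicted class probabilities rather than by counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic binary problem, used only for this demonstration
demo_X, demo_y = make_classification(n_samples=200, n_features=8,
                                     n_informative=4, random_state=0)
demo_forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(demo_X, demo_y)

# Average the class probabilities predicted by the individual trees
tree_probs = np.mean([tree.predict_proba(demo_X) for tree in demo_forest.estimators_], axis=0)

# The class with the highest averaged probability is the forest's prediction
manual_pred = demo_forest.classes_[np.argmax(tree_probs, axis=1)]

print(np.allclose(tree_probs, demo_forest.predict_proba(demo_X)))
print((manual_pred == demo_forest.predict(demo_X)).all())
```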


* Training the model

In [None]:
from sklearn.ensemble import RandomForestClassifier
#Create a RandomForest Classifier
RFclf = RandomForestClassifier(n_estimators=100, n_jobs=-1)  # n_estimators refers to the number of trees in the forest


#Train the model using the training dataset
print(RFclf.fit(trainX,trainY))


* Measuring the accuracy

In [None]:
from sklearn.metrics import accuracy_score

predrf = RFclf.predict(testX)  # Predict on the test dataset
print(accuracy_score(testY, predrf) * 100, '%')  # accuracy_score expects (y_true, y_pred)
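Overall accuracy can look high even when a rare class is never detected, which matters for heavily imbalanced data such as KDD Cup 99. The sketch below uses made-up labels (`toy_true`, `toy_pred`) to show per-class recall and balanced accuracy alongside plain accuracy. The same functions could be applied to `testY` and `predrf`.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Made-up labels: 8 samples of class 0, 2 samples of class 1
toy_true = [0] * 8 + [1] * 2
# A classifier that always predicts the majority class
toy_pred = [0] * 10

toy_acc = accuracy_score(toy_true, toy_pred)
# Recall computed separately for each class
toy_recall = recall_score(toy_true, toy_pred, average=None, zero_division=0)
# Mean of the per-class recalls
toy_bal = balanced_accuracy_score(toy_true, toy_pred)

print(toy_acc, toy_recall, toy_bal)
```

Here accuracy is 80% although class 1 is missed entirely, which the per-class recall and the balanced accuracy make visible.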

**Visualisation of alert messages**

In [None]:
import numpy as np

# Predict using the trained model
predictions = RFclf.predict(testX)

# `le` was last fitted on the 'class' column, so it maps label names to codes.
# In the raw KDD Cup 99 files benign traffic is labelled 'normal.';
# every other label is treated as an intrusion.
normal_code = le.transform(['normal.'])[0]
intrusion_idx = np.flatnonzero(predictions != normal_code)

# Display summary (only the first 5 intrusions)
print("Intrusion Detection Summary:")
for i in intrusion_idx[:5]:
    print(f"Sample {i+1}: Network Intrusion Detected")

# Display total intrusion count over the whole test set
print(f"\nTotal Network Intrusions Detected: {len(intrusion_idx)}/{len(predictions)}")


In [None]:
"""  *****alternate method to display intrusions******

# Predict using the trained model
predictions = RFclf.predict(testX)

# Code of the benign class; `le` was last fitted on the 'class' column
normal_code = le.transform(['normal.'])[0]

# Check predictions and display detection messages
for i, pred_label in enumerate(predictions):
    if pred_label != normal_code:  # any label other than 'normal.' is an intrusion
        print(f"Alert: Network Intrusion detected in sample {i}.")
    else:
        print(f"No Intrusion detected in sample {i}.")
"""

In [None]:
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import numpy as np

# Class membership probabilities; column j corresponds to RFclf.classes_[j]
probs = RFclf.predict_proba(testX)

# Micro-averaged one-vs-rest AUC over all classes
auc_score = roc_auc_score(testY, probs, multi_class='ovr', average='micro', labels=RFclf.classes_)
print(f'Micro-averaged OvR AUC: {auc_score:.4f}')

plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], linestyle='--', color='red', label='Random')

# Plot the one-vs-rest ROC curve for each class and report its AUC in the legend
for j, cls in enumerate(RFclf.classes_):
    is_cls = (testY == cls).astype(int)
    if is_cls.nunique() < 2:
        continue  # ROC is undefined when the class is absent from (or is all of) the test set
    fpr, tpr, _ = roc_curve(is_cls, probs[:, j])
    plt.plot(fpr, tpr, label=f'Class {cls} (AUC = {roc_auc_score(is_cls, probs[:, j]):.2f})')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve for Multiclass')
plt.legend(fontsize='small', ncol=2)
plt.show()