# Case Study: Automated Machine Learning (AutoML) for Autonomous Intrusion Detection System Development 
This is the code for the paper entitled "**[Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis](https://ieeexplore.ieee.org/document/10472316)**" published in *IEEE Transactions on Network and Service Management* (IF:5.3).<br>
Authors: Li Yang (liyanghart@gmail.com), Mirna El Rajab, Abdallah Shami, and Sami Muhaidat<br>

L. Yang, M. E. Rajab, A. Shami, and S. Muhaidat, "Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis," IEEE Transactions on Network and Service Management, pp. 1-28, 2024, doi: https://doi.org/10.1109/TNSM.2024.3376631.

# Code Part 3: Adversarial Machine Learning (AML) Attack and Defense
As many network services and functionalities rely on AI/ML models, they are becoming increasingly vulnerable to AML attacks. AML attacks exploit the weaknesses and vulnerabilities of ML models by generating adversarial inputs that can deceive or manipulate the models into making incorrect predictions. In networks, AML attacks pose a significant threat to overall network security and reliability.  
This case study demonstrates the detrimental impact that AML attacks can have on ML models in networks and presents basic defense strategies to mitigate these attacks and preserve the accuracy of the ML-based IDS. Three common types of adversarial attacks, namely the decision tree attack (DTA), the fast gradient sign method (FGSM), and the basic iterative method (BIM), are used to generate adversarial samples that probe the vulnerability of the IDS.  
Subsequently, basic defense mechanisms, including adversarial sample detection and filtering, are devised to safeguard the AutoML-based IDS against AML attacks.  
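
To make the idea of an adversarial input concrete, the following is a minimal, dependency-free sketch of a one-step FGSM perturbation. It assumes a toy logistic-regression model with hypothetical weights (`w`, `b`) and a hypothetical min-max-scaled sample; the notebook itself uses the ART library against a Keras network, so this illustrates the principle only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of the 'attack' class under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Binary cross-entropy for a single sample."""
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm(w, b, x, y, eps):
    """One-step FGSM: x' = clip(x + eps * sign(dLoss/dx), 0, 1)."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]            # dLoss/dx for logistic regression
    sign = [(g > 0) - (g < 0) for g in grad]     # -1, 0 or +1 per feature
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]

# Hypothetical model weights and one sample labelled as an attack (y = 1)
w, b = [4.0, -3.0, 2.0], -1.0
x, y = [0.6, 0.2, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=0.1)

print("P(attack) before:", round(predict_proba(w, b, x), 3))
print("P(attack) after: ", round(predict_proba(w, b, x_adv), 3))
```

Each feature moves by at most `eps`, yet the model's confidence in the true class drops, which is the effect the attacks in this case study exploit.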

## Dataset 1: CICIDS2017
This notebook uses a subset of network traffic data randomly sampled from the [CICIDS2017 dataset](https://www.unb.ca/cic/datasets/ids-2017.html).  

The Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS2017) dataset contains up-to-date network threat scenarios. It is close to real-world network data because it provides a large volume of network traffic, a variety of network features, multiple attack types, and highly imbalanced classes.

## Import libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split,cross_val_score
import lightgbm as lgb
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from scipy.stats import shapiro
from imblearn.over_sampling import SMOTE
import time

In [2]:
import warnings 
warnings.filterwarnings('ignore')

## Read the sampled CICIDS2017 dataset

In [3]:
df = pd.read_csv("Data/cic_0.01km.csv")

In [4]:
df

Unnamed: 0,Flow Duration,Total Length of Fwd Packets,Fwd Packet Length Max,Fwd Packet Length Mean,Bwd Packet Length Max,Bwd Packet Length Min,Flow IAT Mean,Flow IAT Min,Fwd IAT Min,Fwd Header Length,Bwd Header Length,Fwd Packets/s,Bwd Packets/s,Min Packet Length,URG Flag Count,Down/Up Ratio,Init_Win_bytes_forward,Init_Win_bytes_backward,min_seg_size_forward,Label
0,50833,0,0,0.0000,0,0,5.083300e+04,50833,0,32,32,19.672260,19.672260,0,1,1,319,153,32,0
1,49,0,0,0.0000,0,0,4.900000e+01,49,49,64,0,40816.326530,0.000000,0,0,0,277,-1,32,0
2,306,6,6,6.0000,6,6,3.060000e+02,306,0,20,20,3267.973856,3267.973856,6,0,1,0,0,20,0
3,63041,65,65,65.0000,124,124,6.304100e+04,63041,0,32,32,15.862693,15.862693,65,0,1,-1,-1,32,0
4,47682,43,43,43.0000,59,59,4.768200e+04,47682,0,32,32,20.972275,20.972275,43,0,1,-1,-1,32,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28298,45,0,0,0.0000,0,0,4.500000e+01,45,0,32,32,22222.222220,22222.222220,0,1,1,349,307,32,0
28299,114309573,511,427,31.9375,746,0,3.941709e+06,94,165,332,424,0.139971,0.122474,0,0,0,8192,343,20,0
28300,48850,80,40,40.0000,72,72,1.628333e+04,1,48,64,64,40.941658,40.941658,40,0,1,-1,-1,32,0
28301,260,66,33,33.0000,97,97,8.666667e+01,48,48,40,40,7692.307692,7692.307692,33,0,1,-1,-1,20,0


# 1. Automated Data Pre-Processing

## Automated Transformation/Encoding
Automatically identify string/text features and transform them into numerical features so that ML models can process them.

In [5]:
# Define the automated data encoding function
def Auto_Encoding(df):
    cat_features=[x for x in df.columns if df[x].dtype=="object"] ## Find string/text features
    le=LabelEncoder()
    for col in cat_features:
        # Transform each string/text column to integer codes (the encoder is re-fitted per column)
        df[col] = le.fit_transform(df[col].astype(str))
    return df

In [6]:
df=Auto_Encoding(df)
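
As an illustration of what label encoding does to a string column, here is a small pure-Python sketch. The column values are hypothetical (the sampled CICIDS2017 file shown above is already numeric), and the helper name `label_encode` is not part of the notebook. Like scikit-learn's `LabelEncoder`, it assigns integer codes to the distinct values in sorted order.

```python
def label_encode(values):
    """Map each distinct string to an integer code, in sorted order of the strings."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping

# Hypothetical categorical column
protocols = ["tcp", "udp", "tcp", "icmp"]
codes, mapping = label_encode(protocols)

print(mapping)
print(codes)
```

The integer codes carry no ordering information of their own, which is usually acceptable for tree-based models such as LightGBM and random forest.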

## Automated Imputation
Detect and impute missing values to improve data quality

In [7]:
# Define the automated data imputation function
def Auto_Imputation(df):
    if df.isnull().values.any() or np.isinf(df).values.any(): # if there is any empty or infinite values
        df.replace([np.inf, -np.inf], np.nan, inplace=True)
        df.fillna(0, inplace = True)  # Replace empty values with zeros; there are other imputation methods discussed in the paper
    return df

In [8]:
df=Auto_Imputation(df)
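
The zero-fill rule used above can be sketched without pandas as follows. The helper name `impute_with_zero` and the sample rows are hypothetical; the sketch treats missing (`None`), NaN, and infinite entries the same way, mirroring the replace-then-fill logic of `Auto_Imputation`.

```python
import math

def impute_with_zero(rows, fill=0.0):
    """Replace None, NaN and +/-inf entries with a fill value."""
    def is_bad(v):
        return v is None or math.isnan(v) or math.isinf(v)
    return [[fill if is_bad(v) else v for v in row] for row in rows]

# Hypothetical feature rows containing problematic values
rows = [[1.0, float("inf")], [float("nan"), None], [2.5, -float("inf")]]
clean = impute_with_zero(rows)

print(clean)
```

Zero filling is the simplest choice; mean or median imputation would follow the same pattern with a different `fill` value per column.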

## Automated normalization
Normalize the range of features to a similar scale to improve data quality

In [9]:
def Auto_Normalization(df):
    stat, p = shapiro(df)
    print('Statistics=%.3f, p=%.3f' % (stat, p))
    # interpret
    alpha = 0.05
    numeric_features = df.drop(['Label'],axis = 1).dtypes[df.dtypes != 'object'].index
    
    # check if the data distribution follows a Gaussian/normal distribution
    # If so, select the Z-score normalization method; otherwise, select the min-max normalization
    # Details are in the paper
    if p > alpha:
        print('Sample looks Gaussian (fail to reject H0)')
        df[numeric_features] = df[numeric_features].apply(
            lambda x: (x - x.mean()) / (x.std()))
        print('Z-score normalization is automatically chosen and used')
    else:
        print('Sample does not look Gaussian (reject H0)')
        df[numeric_features] = df[numeric_features].apply(
            lambda x: (x - x.min()) / (x.max()-x.min()))
        print('Min-max normalization is automatically chosen and used')
    return df

In [10]:
df=Auto_Normalization(df)

Statistics=0.076, p=0.000
Sample does not look Gaussian (reject H0)
Min-max normalization is automatically chosen and used
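
For reference, the two normalization formulas chosen between above can be worked through on a handful of values. This is a minimal sketch using four `Flow Duration` values from the table printed earlier; the scaled results differ from the notebook's because the minimum and maximum are taken over these four values only. Returning zeros for a constant column is an assumption of this sketch: the notebook's lambda would produce NaN in that case, which may be why `Auto_Imputation` is applied again in the next cell.

```python
import math

def min_max(values):
    """Min-max scaling: (x - min) / (max - min), mapped to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # assumption: constant columns map to 0
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z-score scaling: (x - mean) / std, using the sample std (ddof=1) as pandas does."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

durations = [50833, 49, 306, 63041]
scaled = min_max(durations)
standardized = z_score(durations)

print([round(v, 4) for v in scaled])
print([round(v, 4) for v in standardized])
```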


In [11]:
df=Auto_Imputation(df)

## Train-test split
Split the dataset into the training and the test set

In [12]:
X = df.drop(['Label'],axis=1)
y = df['Label']

# Here we used the 80%/20% split, it can be changed based on specific tasks
#X_train, X_test, y_train, y_test = train_test_split(X,y, train_size = 0.8, test_size = 0.2, shuffle=False,random_state = 0)
X_train, X_test, y_train, y_test = train_test_split(X,y, train_size = 0.8, test_size = 0.2,random_state = 0)

## Automated data balancing
Generate synthetic minority-class samples to address class imbalance and improve data quality.  
The Synthetic Minority Over-sampling Technique (SMOTE) is used.

In [13]:
pd.Series(y_train).value_counts()

0    18126
1     4516
Name: Label, dtype: int64

In [14]:
# For binary data (can be modified for multi-class data with the same logic)
def Auto_Balancing(X_train, y_train):
    number0 = pd.Series(y_train).value_counts().iloc[0]
    number1 = pd.Series(y_train).value_counts().iloc[1]
    
    if number0 > number1:
        nlarge = number0
    else:
        nlarge = number1
    
    # Treat the incoming dataset as imbalanced when one class has more than 1.5 times the samples of the other
    if (number1/number0 > 1.5) or (number0/number1 > 1.5):
        smote=SMOTE(sampling_strategy={0:nlarge, 1:nlarge})
        # fit_resample is the resampling method supported by current imbalanced-learn releases
        X_train, y_train = smote.fit_resample(X_train, y_train)
        
    return X_train, y_train

In [15]:
X_train, y_train = Auto_Balancing(X_train, y_train)

In [16]:
pd.Series(y_train).value_counts()

1    18126
0    18126
Name: Label, dtype: int64
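
The core of SMOTE is linear interpolation between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that step in pure Python under simplifying assumptions (Euclidean distance, hypothetical 2-D points, and a helper named `smote_like`); it is not the imbalanced-learn implementation used above.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Hypothetical minority-class samples (already scaled to [0, 1])
minority = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10], [0.12, 0.30]]
new_points = smote_like(minority, n_new=6)

for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class.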

# 2. Automated Feature Engineering
Feature selection method 1: **Information Gain (IG)**, used to remove irrelevant features to improve model efficiency  
Feature selection method 2: **Pearson Correlation**, used to remove redundant features to improve model efficiency and accuracy  

In [17]:
# Remove irrelevant features and select important features
def Feature_Importance_IG(data):
    features = data.drop(['Label'],axis=1).values  # "Label" should be changed to the target class variable name if different
    labels = data['Label'].values
    
    # Extract feature names
    feature_names = list(data.drop(['Label'],axis=1).columns)

    # Empty array for feature importances
    feature_importance_values = np.zeros(len(feature_names))
    model = lgb.LGBMRegressor(verbose = -1)
    model.fit(features, labels)
    feature_importances = pd.DataFrame({'feature': feature_names, 'importance': model.feature_importances_})

    # Sort features according to importance
    feature_importances = feature_importances.sort_values('importance', ascending = False).reset_index(drop = True)

    # Normalize the feature importances to add up to one
    feature_importances['normalized_importance'] = feature_importances['importance'] / feature_importances['importance'].sum()
    feature_importances['cumulative_importance'] = np.cumsum(feature_importances['normalized_importance'])
    
    cumulative_importance=0.90 # Only keep the important features with cumulative importance scores>=90%. It can be changed.

    # Make sure most important features are on top
    feature_importances = feature_importances.sort_values('cumulative_importance')

    # Identify the features not needed to reach the cumulative_importance
    record_low_importance = feature_importances[feature_importances['cumulative_importance'] > cumulative_importance]

    to_drop = list(record_low_importance['feature'])
#     print(feature_importances.drop(['importance'],axis=1))
    return to_drop
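
The cumulative-importance rule in `Feature_Importance_IG` can be traced on a toy example. The feature names and scores below are hypothetical, and `low_importance_features` is an illustrative helper rather than part of the notebook.

```python
def low_importance_features(importances, threshold=0.90):
    """Return features whose cumulative normalized importance exceeds the threshold."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ranked)
    to_drop, cumulative = [], 0.0
    for name, value in ranked:
        cumulative += value / total
        if cumulative > threshold:
            to_drop.append(name)
    return to_drop

# Hypothetical importance scores
scores = {"f_a": 50, "f_b": 30, "f_c": 15, "f_d": 5}
dropped = low_importance_features(scores)

print(dropped)
```

With these scores the cumulative values are 0.50, 0.80, 0.95, and 1.00, so `f_c` and `f_d` are dropped. Note that the feature that crosses the 90% line is itself removed under this rule, so the retained features cover at most 90% of the total importance.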

In [18]:
# Remove redundant features
def Feature_Redundancy_Pearson(data):
    correlation_threshold=0.90 # Only remove features with the redundancy>90%. It can be changed
    features = data.drop(['Label'],axis=1)
    corr_matrix = features.corr()

    # Extract the upper triangle of the correlation matrix
    upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k = 1).astype(bool))

    # Select the features with correlations above the threshold
    # Need to use the absolute value
    to_drop = [column for column in upper.columns if any(upper[column].abs() > correlation_threshold)]

    # Dataframe to hold correlated pairs
    record_collinear = pd.DataFrame(columns = ['drop_feature', 'corr_feature', 'corr_value'])

    # Iterate through the columns to drop
    for column in to_drop:

        # Find the correlated features
        corr_features = list(upper.index[upper[column].abs() > correlation_threshold])

        # Find the correlated values
        corr_values = list(upper[column][upper[column].abs() > correlation_threshold])
        drop_features = [column for _ in range(len(corr_features))]    

        # Record the information (need a temp df for now)
        temp_df = pd.DataFrame.from_dict({'drop_feature': drop_features,
                                         'corr_feature': corr_features,
                                         'corr_value': corr_values})
        record_collinear = pd.concat([record_collinear, temp_df], ignore_index = True)
#     print(record_collinear)
    return to_drop
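
The redundancy rule above keeps the first feature of each highly correlated pair and drops the later one, because only the upper triangle of the correlation matrix is inspected. A minimal pure-Python sketch of that logic, with hypothetical column names and values, is:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def redundant_features(columns, threshold=0.90):
    """Drop a column if |r| with any earlier column exceeds the threshold."""
    names = list(columns)
    to_drop = []
    for j, later in enumerate(names):
        # Compare only with earlier columns (the upper triangle of the matrix)
        if any(abs(pearson(columns[earlier], columns[later])) > threshold
               for earlier in names[:j]):
            to_drop.append(later)
    return to_drop

# Hypothetical feature columns
columns = {
    "pkt_len_max":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "pkt_len_mean": [1.1, 2.0, 2.9, 4.2, 5.0],
    "flow_iat":     [5.0, 1.0, 4.0, 2.0, 3.0],
}
dropped = redundant_features(columns)

print(dropped)
```

Here `pkt_len_mean` is almost a copy of `pkt_len_max` and is dropped, while `flow_iat` is only weakly correlated with both and is kept.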

In [19]:
def Auto_Feature_Engineering(df):
    drop1 = Feature_Importance_IG(df)
    dfh1 = df.drop(columns = drop1)
    
    drop2 = Feature_Redundancy_Pearson(dfh1)
    dfh2 = dfh1.drop(columns = drop2)
    
    return dfh2

In [20]:
dfh2 = Auto_Feature_Engineering(df)
dfh2

Unnamed: 0,Flow Duration,Total Length of Fwd Packets,Fwd Packet Length Max,Fwd Packet Length Mean,Bwd Packet Length Max,Flow IAT Mean,Flow IAT Min,Fwd IAT Min,Fwd Header Length,Bwd Packets/s,Init_Win_bytes_forward,Init_Win_bytes_backward,Label
0,4.236419e-04,0.000000,0.000000,0.000000,0.000000,4.707129e-04,4.707129e-04,0.000000e+00,0.000150,1.311484e-05,0.004883,0.002350,0
1,4.416669e-07,0.000000,0.000000,0.000000,0.000000,4.907407e-07,4.907407e-07,4.083333e-07,0.000299,0.000000e+00,0.004242,0.000000,0
2,2.583334e-06,0.000008,0.000242,0.001556,0.000516,2.870370e-06,2.870370e-06,0.000000e+00,0.000094,2.178649e-03,0.000015,0.000015,0
3,5.253752e-04,0.000090,0.002619,0.016856,0.010660,5.837500e-04,5.837500e-04,0.000000e+00,0.000150,1.057513e-05,0.000000,0.000000,0
4,3.973835e-04,0.000060,0.001732,0.011151,0.005072,4.415370e-04,4.415370e-04,0.000000e+00,0.000150,1.398152e-05,0.000000,0.000000,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
28298,4.083335e-07,0.000000,0.000000,0.000000,0.000000,4.537037e-07,4.537037e-07,0.000000e+00,0.000150,1.481481e-02,0.005341,0.004700,0
28299,9.525802e-01,0.000710,0.017204,0.008282,0.064133,3.649735e-02,9.074074e-07,1.375000e-06,0.001554,8.164962e-08,0.125015,0.005249,0
28300,4.071168e-04,0.000111,0.001612,0.010373,0.006190,1.508086e-04,4.629629e-08,4.000000e-07,0.000299,2.729444e-05,0.000000,0.000000,0
28301,2.200001e-06,0.000092,0.001330,0.008558,0.008339,8.395061e-07,4.814815e-07,4.000000e-07,0.000187,5.128205e-03,0.000000,0.000000,0


## Data Split & Balancing (After Feature Engineering)

In [21]:
X = dfh2.drop(['Label'],axis=1)
y = dfh2['Label']

#X_train, X_test, y_train, y_test = train_test_split(X,y, train_size = 0.8, test_size = 0.2, shuffle=False,random_state = 0)
X_train, X_test, y_train, y_test = train_test_split(X,y, train_size = 0.8, test_size = 0.2,random_state = 0)

In [22]:
X_train, y_train = Auto_Balancing(X_train, y_train)

# 3. Automated Model Selection
Select the best-performing model among five common machine learning models (Naive Bayes, KNN, random forest, LightGBM, and ANN/MLP) by evaluating their learning performance

## Model learning (for Comparison)

In [23]:
%%time
lg = lgb.LGBMClassifier(verbose = -1)
lg.fit(X_train,y_train)
t1=time.time()
predictions = lg.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 99.77000000000001%
Precision: 99.466%
Recall: 99.378%
F1-score: 99.422%
Time: 2.29035
Wall time: 315 ms
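
The four scores printed in these cells follow from the confusion-matrix counts, with class 1 (attack) as the positive class. The sketch below recomputes them in pure Python on a small hypothetical prediction vector; `binary_metrics` is an illustrative helper, not a scikit-learn function.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical imbalanced test set: 8 benign flows, 2 attacks, one attack missed
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)

print(accuracy, precision, recall, round(f1, 4))
```

On imbalanced data, accuracy can stay high while recall collapses, which is why all four scores are reported. The `Time` value in the cells is the prediction time per test sample in microseconds, computed as `(t2-t1)/len(y_test)*1000000`.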


In [24]:
%%time
rf = RandomForestClassifier()
rf.fit(X_train,y_train)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 99.735%
Precision: 99.465%
Recall: 99.2%
F1-score: 99.332%
Time: 9.51379
Wall time: 2.82 s


# 4. Hyperparameter Optimization
Optimize the best-performing machine learning model (LightGBM) by tuning its hyperparameters

## Hold-out validation

In [25]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Define the objective function
def objective(params):
    params = {
        'n_estimators': int(params['n_estimators']), 
        'max_depth': int(params['max_depth']),
        'learning_rate': abs(float(params['learning_rate'])),
        "num_leaves": int(params['num_leaves']),
        "min_child_samples": int(params['min_child_samples']),
    }
    clf = lgb.LGBMClassifier( **params)
    clf.fit(X_train,y_train)
    predictions = clf.predict(X_test)
    score = accuracy_score(y_test,predictions)
    return {'loss':-score, 'status': STATUS_OK }

# Define the hyperparameter configuration space
space = {
    'n_estimators': hp.quniform('n_estimators', 50, 500, 20),
    'max_depth': hp.quniform('max_depth', 5, 50, 1),
    "learning_rate":hp.uniform('learning_rate', 0, 1),
    "num_leaves":hp.quniform('num_leaves',100,2000,100),
    "min_child_samples":hp.quniform('min_child_samples',10,50,5),
}

# Detect the optimal hyperparameter values
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=50)
print("LightGBM: Hyperopt estimated optimum {}".format(best))



100%|██████████| 50/50 [01:22<00:00,  1.65s/trial, best loss: -0.9982335276452924]
LightGBM: Hyperopt estimated optimum {'learning_rate': 0.06157312210088248, 'max_depth': 22.0, 'min_child_samples': 20.0, 'n_estimators': 360.0, 'num_leaves': 2000.0}


In [28]:
%%time
clf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")

Accuracy: 99.823%
Precision: 99.468%
Recall: 99.644%
F1-score: 99.556%
Wall time: 7.11 s


In the run shown above, hyperparameter optimization improved the hold-out accuracy of LightGBM from 99.770% to 99.823%.

In [59]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier

# Define the objective function
def objective(params):
    params = {
        'n_estimators': int(params['n_estimators']), 
        'max_depth': int(params['max_depth']),
        'min_samples_split': int(params['min_samples_split']),
        'min_samples_leaf': int(params['min_samples_leaf']),
        'criterion': str(params['criterion'])
    }
    clf = RandomForestClassifier(**params)
    clf.fit(X_train,y_train)
    predictions = clf.predict(X_test)
    score = accuracy_score(y_test,predictions)
    return {'loss':-score, 'status': STATUS_OK }

# Define the hyperparameter configuration space
space = {
    'n_estimators': hp.quniform('n_estimators', 20, 500, 20),
    'max_depth': hp.quniform('max_depth', 5, 50, 1),
    'min_samples_split': hp.quniform('min_samples_split', 2, 11, 1),
    'min_samples_leaf': hp.quniform('min_samples_leaf', 1, 11, 1),
    'criterion': hp.choice('criterion', ['gini', 'entropy'])
}

# Detect the optimal hyperparameter values
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=50)
print("RandomForest: Hyperopt estimated optimum {}".format(best))


100%|██████████| 50/50 [07:12<00:00,  8.65s/trial, best loss: -0.9973502914679385]
RandomForest: Hyperopt estimated optimum {'criterion': 0, 'max_depth': 19.0, 'min_samples_leaf': 1.0, 'min_samples_split': 3.0, 'n_estimators': 380.0}
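
The dictionary returned by `fmin` is not directly usable as model arguments: values sampled with `hp.quniform` come back as floats, and `hp.choice` returns the index of the selected option rather than the option itself (here `'criterion': 0` refers to `'gini'`). The following sketch decodes the result printed above by hand; `decode_rf_best` is a hypothetical helper. Hyperopt's `space_eval(space, best)` can resolve the choice index, although integer casts are still needed.

```python
def decode_rf_best(best, criterion_choices=("gini", "entropy")):
    """Convert a raw Hyperopt result into RandomForestClassifier keyword arguments."""
    return {
        "n_estimators": int(best["n_estimators"]),
        "max_depth": int(best["max_depth"]),
        "min_samples_split": int(best["min_samples_split"]),
        "min_samples_leaf": int(best["min_samples_leaf"]),
        # hp.choice reports the position of the chosen option in the list
        "criterion": criterion_choices[int(best["criterion"])],
    }

# Result copied from the output above
best = {'criterion': 0, 'max_depth': 19.0, 'min_samples_leaf': 1.0,
        'min_samples_split': 3.0, 'n_estimators': 380.0}
rf_params = decode_rf_best(best)

print(rf_params)
```

The decoded values match the arguments passed to `RandomForestClassifier` in the next cell.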


In [60]:
%%time
clf = RandomForestClassifier(max_depth=19, n_estimators = 380, min_samples_split = 3,
                         min_samples_leaf = 1, criterion = 'gini')
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")

Accuracy: 99.735%
Precision: 99.465%
Recall: 99.2%
F1-score: 99.332%
Wall time: 10.3 s


# AML

### Experiment 1
1. Train the original ML model for IDS
2. Generate adversarial samples using DecisionTreeAttack or other attacks
3. Test the ML model under attack
4. Develop an adversarial sample detection model
5. Remove adversarial samples from the training set
6. Re-train the IDS model
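
The defense part of this workflow (steps 4 and 5) can be sketched on toy data as follows. All values are hypothetical, and `toy_detector` is a stand-in threshold rule assumed only for illustration; the notebook trains a LightGBM classifier as the detector instead.

```python
def build_detection_set(x_adv, x_clean):
    """Step 4: label adversarial samples 1 and clean samples 0."""
    X = x_adv + x_clean
    y = [1] * len(x_adv) + [0] * len(x_clean)
    return X, y

def filter_training_set(X, ids_labels, is_adversarial):
    """Step 5: keep only the samples the detector does not flag."""
    kept = [(x, label) for x, label in zip(X, ids_labels) if not is_adversarial(x)]
    return [x for x, _ in kept], [label for _, label in kept]

# Hypothetical clean samples with their IDS labels, and perturbed copies of them
x_clean = [[0.10, 0.20], [0.80, 0.90]]
ids_labels = [0, 1]
x_adv = [[0.35, 0.20], [0.55, 0.90]]

X_det, y_det = build_detection_set(x_adv, x_clean)

def toy_detector(x):
    """Stand-in detector (assumption): flags samples whose first feature is mid-range."""
    return 0.3 < x[0] < 0.7

# Adversarial copies inherit the IDS labels of the samples they were derived from
X_kept, y_kept = filter_training_set(X_det, ids_labels + ids_labels, toy_detector)

print(y_det)
print(X_kept, y_kept)
```

After filtering, only the clean samples and their original IDS labels remain, and these are what the IDS model is re-trained on in step 6.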

## DecisionTreeAttack

### Step 1: Original ML model for IDS

In [30]:
%%time
clf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")

Accuracy: 99.823%
Precision: 99.468%
Recall: 99.644%
F1-score: 99.556%
Wall time: 6.91 s


### Step 2: generate adversarial samples using DecisionTreeAttack

In [33]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_digits
from matplotlib import pyplot as plt
import numpy as np

from art.attacks.evasion import DecisionTreeAttack
from art.estimators.classification import SklearnClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train,y_train)
from sklearn.preprocessing import OneHotEncoder
y_train1 = y_train.values.reshape(-1, 1)
y_train1_AML = OneHotEncoder().fit_transform(y_train1).toarray()

clf_art = SklearnClassifier(clf)
attack = DecisionTreeAttack(clf_art)

# Generate adversarial examples
x_adv = attack.generate(X_train.values,y_train1_AML)

Decision tree attack: 100%|██████████| 36252/36252 [00:18<00:00, 1999.66it/s]


### Step 3: test ML model under attack

In [35]:
%%time
#DTA
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(x_adv,y_train)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 78.608%
Precision: 15.873000000000001%
Recall: 1.778%
F1-score: 3.197%
Time: 6.69349
Wall time: 1.31 s


### Step 4: develop adversarial sample detection model


In [36]:
X_new = np.concatenate([x_adv,X_train])
y1 = pd.Series(np.ones(y_train.shape[0]))
y2 = pd.Series(np.zeros(y_train.shape[0]))
y_new = np.concatenate([y1,y2])


X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_new,y_new, train_size = 0.1, test_size = 0.9,random_state = 0)

In [37]:
%%time
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(X_train_d,y_train_d)
t1=time.time()
predictions2 = rf.predict(X_test_d)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test_d,predictions2),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test_d,predictions2),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test_d,predictions2),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test_d,predictions2),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test_d)*1000000,5)))

Accuracy: 99.937%
Precision: 99.932%
Recall: 99.942%
F1-score: 99.937%
Time: 4.44764
Wall time: 1.7 s


### Step 5: remove adversarial samples from the training set

In [38]:
detection_results = rf.predict(X_new)
y_new1 = np.concatenate([y_train,y_train])

# Keep only the samples that the detector does not flag as adversarial (label 1)
keep_mask = detection_results != 1
X_new1 = X_new[keep_mask]
y_new1 = y_new1[keep_mask]

### Step 6: re-train IDS model

In [39]:
%%time
#DTA
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(X_new1,y_new1)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 99.806%
Precision: 99.37899999999999%
Recall: 99.644%
F1-score: 99.512%
Time: 10.21826
Wall time: 5.66 s


## FastGradientMethod

### Step 1: Original ML model for IDS

In [40]:
%%time
clf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")

Accuracy: 99.823%
Precision: 99.468%
Recall: 99.644%
F1-score: 99.556%
Wall time: 6 s


### Step 2: generate adversarial samples using FastGradientMethod

In [41]:
from keras.layers import Input,Dense,Dropout,BatchNormalization,Activation
from art.estimators.classification import KerasClassifier
from keras import Model
import keras.backend as K
import keras.callbacks as kcallbacks
from keras import optimizers
from keras.optimizers import Adam

from keras.callbacks import EarlyStopping

from art.attacks import evasion
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import SklearnClassifier

def ANN(optimizer = 'sgd',neurons=16,batch_size=1024,epochs=80,activation='relu',patience=8,loss='binary_crossentropy'):
    inputs=Input(shape=(X.shape[1],))
    x=Dense(1000)(inputs)
    x=BatchNormalization()(x)
    x=Activation('relu')(x)
    x=Dropout(0.3)(x)
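    # Note: this layer is connected to `inputs`, so the Dense(1000) block above is not part of the final model graph
    x=Dense(256)(inputs)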
    x=BatchNormalization()(x)
    x=Activation('relu')(x)
    x=Dropout(0.25)(x)
    x=Dense(2,activation='softmax')(x)
    model=Model(inputs=inputs,outputs=x,name='base_nlp')
    model.compile(optimizer='adam',loss='categorical_crossentropy')
#     model.compile(optimizer=Adam(lr = 0.01),loss='categorical_crossentropy',metrics=['accuracy'])
    early_stopping = EarlyStopping(monitor="loss", patience = patience)# early stop patience
    history = model.fit(X, pd.get_dummies(y).values,
              batch_size=batch_size,
              epochs=epochs,
              callbacks = [early_stopping],
              verbose=0) #verbose set to 1 will show the training process
    return model

# Create a KerasClassifier
classifier = KerasClassifier(model=ANN(), clip_values=(0, 1))

# Create the FastGradientMethod attack
attack = FastGradientMethod(estimator=classifier, eps=0.1)

# Generate adversarial examples
x_adv = attack.generate(X_train.values)

Using TensorFlow backend.


### Step 3: test ML model under attack

In [42]:
%%time
#FGM
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(x_adv,y_train)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 86.31%
Precision: 96.296%
Recall: 32.356%
F1-score: 48.436%
Time: 15.67928
Wall time: 8.78 s


### Step 4: develop adversarial sample detection model


In [43]:
X_new = np.concatenate([x_adv,X_train])
y1 = pd.Series(np.ones(y_train.shape[0]))
y2 = pd.Series(np.zeros(y_train.shape[0]))
y_new = np.concatenate([y1,y2])


X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_new,y_new, train_size = 0.1, test_size = 0.9,random_state = 0)

In [44]:
%%time
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(X_train_d,y_train_d)
t1=time.time()
predictions2 = rf.predict(X_test_d)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test_d,predictions2),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test_d,predictions2),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test_d,predictions2),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test_d,predictions2),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test_d)*1000000,5)))

Accuracy: 99.407%
Precision: 99.667%
Recall: 99.144%
F1-score: 99.405%
Time: 8.21566
Wall time: 2.44 s


### Step 5: remove adversarial samples from the training set

In [45]:
detection_results = rf.predict(X_new)
y_new1 = np.concatenate([y_train,y_train])

# Keep only the samples that the detector does not flag as adversarial (label 1)
keep_mask = detection_results != 1
X_new1 = X_new[keep_mask]
y_new1 = y_new1[keep_mask]

### Step 6: re-train IDS model

In [46]:
%%time
#FGM
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(X_new1,y_new1)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 99.77000000000001%
Precision: 99.291%
Recall: 99.556%
F1-score: 99.423%
Time: 13.91828
Wall time: 6.78 s


## BasicIterativeMethod

### Step 1: Original ML model for IDS

In [47]:
%%time
clf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Accuracy: "+str(round(accuracy_score(y_test,predictions),5)*100)+"%")
print("Precision: "+str(round(precision_score(y_test,predictions),5)*100)+"%")
print("Recall: "+str(round(recall_score(y_test,predictions),5)*100)+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions),5)*100)+"%")

Accuracy: 99.823%
Precision: 99.468%
Recall: 99.644%
F1-score: 99.556%
Wall time: 8.45 s


### Step 2: generate adversarial samples using BasicIterativeMethod

In [48]:
from keras.layers import Input,Dense,Dropout,BatchNormalization,Activation
from art.estimators.classification import KerasClassifier
from keras import Model
import keras.backend as K
import keras.callbacks as kcallbacks
from keras import optimizers
from keras.optimizers import Adam

from keras.callbacks import EarlyStopping

from art.attacks import evasion
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import SklearnClassifier

def ANN(optimizer = 'sgd',neurons=16,batch_size=1024,epochs=80,activation='relu',patience=8,loss='binary_crossentropy'):
    inputs=Input(shape=(X.shape[1],))
    x=Dense(1000)(inputs)
    x=BatchNormalization()(x)
    x=Activation('relu')(x)
    x=Dropout(0.3)(x)
    # Note: this layer takes `inputs`, so the Dense(1000) block above is not connected to the output
    x=Dense(256)(inputs)
    x=BatchNormalization()(x)
    x=Activation('relu')(x)
    x=Dropout(0.25)(x)
    x=Dense(2,activation='softmax')(x)
    model=Model(inputs=inputs,outputs=x,name='base_nlp')
    model.compile(optimizer='adam',loss='categorical_crossentropy')
#     model.compile(optimizer=Adam(lr = 0.01),loss='categorical_crossentropy',metrics=['accuracy'])
    early_stopping = EarlyStopping(monitor="loss", patience = patience)# early stop patience
    history = model.fit(X, pd.get_dummies(y).values,
              batch_size=batch_size,
              epochs=epochs,
              callbacks = [early_stopping],
              verbose=0) #verbose set to 1 will show the training process
    return model

# Create a KerasClassifier
classifier = KerasClassifier(model=ANN(), clip_values=(0, 1))

# Create the BasicIterativeMethod (BIM) attack
attack = evasion.BasicIterativeMethod(estimator=classifier, eps=0.1, eps_step=0.1, max_iter=200, batch_size=32, verbose = False)

# Generate adversarial examples
x_adv = attack.generate(X_train.values)
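
For intuition, the iteration that BIM performs can be sketched in plain NumPy against a toy logistic-regression model. This is an illustrative sketch only, not the ART implementation: `bim_attack`, the toy weights `w_toy` and `b_toy`, the random data, and the step settings are all hypothetical, whereas the notebook attacks a Keras network through `evasion.BasicIterativeMethod`. Each iteration takes a small signed-gradient step that increases the loss, then projects the result back into the `eps`-ball around the original input and into the valid feature range.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x, y, w, b):
    """Mean binary cross-entropy of the toy logistic model on (x, y)."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def bim_attack(x, y, w, b, eps=0.1, eps_step=0.02, max_iter=10, clip_values=(0.0, 1.0)):
    """Untargeted BIM sketch (hypothetical helper) for p = sigmoid(x @ w + b)."""
    x_adv = x.copy()
    for _ in range(max_iter):
        p = sigmoid(x_adv @ w + b)
        # Gradient of the per-sample cross-entropy w.r.t. the input: (p - y) * w
        grad = (p - y)[:, None] * w[None, :]
        # Signed-gradient ascent step on the loss
        x_adv = x_adv + eps_step * np.sign(grad)
        # Project back into the L-infinity ball of radius eps around x
        x_adv = np.clip(x_adv, x - eps, x + eps)
        # Keep features inside the valid (normalized) range
        x_adv = np.clip(x_adv, *clip_values)
    return x_adv

# Toy data and model (assumptions for illustration only)
rng = np.random.default_rng(0)
x_clean = rng.random((8, 3))
w_toy = np.array([2.0, -1.0, 0.5])
b_toy = -0.5
y_toy = (x_clean @ w_toy + b_toy > 0).astype(float)

x_adv_toy = bim_attack(x_clean, y_toy, w_toy, b_toy)
loss_clean = bce_loss(x_clean, y_toy, w_toy, b_toy)
loss_adv = bce_loss(x_adv_toy, y_toy, w_toy, b_toy)
print(f"loss on clean inputs: {loss_clean:.4f}")
print(f"loss on adversarial inputs: {loss_adv:.4f}")
```

The sketch uses a step smaller than `eps` so that several iterations contribute to the perturbation; the notebook cell above sets `eps_step` equal to `eps`.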

### Step 3: test ML model under attack

In [49]:
%%time
#BIM: train on the adversarial training samples, then evaluate on the clean test set
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(x_adv,y_train)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions)*100,3))+"%")
print("Precision: "+str(round(precision_score(y_test,predictions)*100,3))+"%")
print("Recall: "+str(round(recall_score(y_test,predictions)*100,3))+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions)*100,3))+"%")
# Average prediction time per test sample, in microseconds
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 88.324%
Precision: 96.586%
Recall: 42.756%
F1-score: 59.273%
Time: 16.47628
Wall time: 16.4 s
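
The F1-score above falls much further than precision because F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two. As a quick sanity check, the reported value can be recomputed from the printed precision and recall. The inputs below are the rounded values from the output, so agreement is only expected to about three decimals.

```python
# Precision and recall copied from the printed output above (as fractions)
precision = 0.96586
recall = 0.42756

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
f1_pct = round(f1 * 100, 3)
print(f"F1-score: {f1_pct}%")
```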


### Step 4: develop adversarial sample detection model


In [50]:
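# Stack the adversarial samples on top of the original training samples
X_new = np.concatenate([x_adv,X_train])
y1 = pd.Series(np.ones(y_train.shape[0]))   # label 1 = adversarial sample
y2 = pd.Series(np.zeros(y_train.shape[0]))  # label 0 = original sample
y_new = np.concatenate([y1,y2])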


X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_new,y_new, train_size = 0.1, test_size = 0.9,random_state = 0)

In [51]:
%%time
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(X_train_d,y_train_d)
t1=time.time()
predictions2 = rf.predict(X_test_d)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test_d,predictions2)*100,3))+"%")
print("Precision: "+str(round(precision_score(y_test_d,predictions2)*100,3))+"%")
print("Recall: "+str(round(recall_score(y_test_d,predictions2)*100,3))+"%")
print("F1-score: "+str(round(f1_score(y_test_d,predictions2)*100,3))+"%")
# Average prediction time per test sample, in microseconds
print("Time: "+str(round((t2-t1)/len(y_test_d)*1000000,5)))

Accuracy: 99.453%
Precision: 99.673%
Recall: 99.23%
F1-score: 99.451%
Time: 9.85973
Wall time: 2.88 s


### Step 5: remove adversarial samples from the training set

In [52]:
# Flag every sample in the combined set with the detection model (1 = adversarial)
detection_results = rf.predict(X_new)
y_new1 = np.concatenate([y_train,y_train])

# Keep only the samples that the detector did not flag as adversarial
keep = detection_results != 1
X_new1 = X_new[keep]
y_new1 = y_new1[keep]
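
Because `X_new` stacks the adversarial samples first and the original samples second, it can be useful to check how many removed samples came from each half, since any original sample flagged by the detector is also dropped from the training set. The following is a minimal sketch of that check; `detection_results_toy` is a hypothetical stand-in for the real detector output, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical detector output: first half = adversarial samples, second half = originals
detection_results_toy = np.array([1, 1, 0, 1, 0, 0, 1, 0])
n_half = len(detection_results_toy) // 2

flagged = detection_results_toy == 1
removed_adversarial = int(flagged[:n_half].sum())  # adversarial samples correctly removed
removed_original = int(flagged[n_half:].sum())     # original samples removed by mistake
kept_total = int((~flagged).sum())                 # samples left for re-training

print("Removed adversarial samples:", removed_adversarial)
print("Removed original samples:", removed_original)
print("Samples kept for re-training:", kept_total)
```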

### Step 6: re-train IDS model

In [53]:
%%time
#BIM
rf = lgb.LGBMClassifier(max_depth=22, learning_rate= 0.06157312210088248, n_estimators = 360, 
                         num_leaves = 2000, min_child_samples = 20)
rf.fit(X_new1,y_new1)
t1=time.time()
predictions = rf.predict(X_test)
t2=time.time()
print("Accuracy: "+str(round(accuracy_score(y_test,predictions)*100,3))+"%")
print("Precision: "+str(round(precision_score(y_test,predictions)*100,3))+"%")
print("Recall: "+str(round(recall_score(y_test,predictions)*100,3))+"%")
print("F1-score: "+str(round(f1_score(y_test,predictions)*100,3))+"%")
# Average prediction time per test sample, in microseconds
print("Time: "+str(round((t2-t1)/len(y_test)*1000000,5)))

Accuracy: 99.806%
Precision: 99.379%
Recall: 99.644%
F1-score: 99.512%
Time: 16.12124
Wall time: 8.05 s
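
To compare the three BIM stages side by side, the metrics printed in Steps 1, 3, and 6 can be collected and expressed as differences from the original model. This is only a small bookkeeping sketch: the numbers are copied from the outputs above, and the dictionary layout and stage names are illustrative choices.

```python
# Metrics (in percent) copied from the printed outputs of Steps 1, 3, and 6
results = {
    "original": {"accuracy": 99.823, "recall": 99.644, "f1": 99.556},
    "under BIM attack": {"accuracy": 88.324, "recall": 42.756, "f1": 59.273},
    "after filtering": {"accuracy": 99.806, "recall": 99.644, "f1": 99.512},
}

baseline = results["original"]
for stage, metrics in results.items():
    # Difference from the original model, in percentage points
    deltas = {name: round(value - baseline[name], 3) for name, value in metrics.items()}
    print(stage, deltas)

recall_drop = round(baseline["recall"] - results["under BIM attack"]["recall"], 3)
accuracy_gap = round(baseline["accuracy"] - results["after filtering"]["accuracy"], 3)
```

In these figures, recall is the metric most affected by the attack, and the re-trained model comes back to within a few hundredths of a percentage point of the original accuracy.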
