## Description

The Jupyter notebook consists of three parts: 

1. Preprocessing of the NSL-KDD data set
2. Training of a fully connected DNN
3. Execution of the XAI methods to generate explanations for the model

The code for these steps is not part of the notebook. Instead, each step is implemented in a separate Python class. The Jupyter notebook acts like a 'main.py' that runs the individual steps of the paper.

### Data preprocessing

In [None]:
# Load module
from xai_anomaly_detection.preprocessing import preprocessing

# Initialise instance which loads the data
Preprocessing = preprocessing.PreprocessNSLKDD()
# show initial shape and first 2 lines of data set
Preprocessing.print_overview('train')

In [None]:
# Run the preprocessing step, which performs:
# - one-hot encoding of categorical features
# - min-max normalization
# - conversion of all attack sub-classes to a common 'attack' label
Preprocessing.preprocessing()
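
The three steps named in the comments above can be illustrated on a toy table. This is a minimal, dependency-free sketch, not the code inside `PreprocessNSLKDD`: the sample rows, the helper names `one_hot`, `min_max`, and `binarize_label`, and the choice of 0/1 as label encoding are all assumptions made for illustration.

```python
# Illustrative sketch only: toy rows and helper names are assumptions,
# not the implementation behind Preprocessing.preprocessing().

rows = [
    {"duration": 0.0, "protocol_type": "tcp", "outcome": "normal"},
    {"duration": 5.0, "protocol_type": "udp", "outcome": "neptune"},
    {"duration": 10.0, "protocol_type": "tcp", "outcome": "smurf"},
]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1.0 if row[column] == category else 0.0
        encoded.append(new_row)
    return encoded


def min_max(rows, column):
    """Scale a numeric column to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return [{**row, column: (row[column] - low) / span} for row in rows]


def binarize_label(rows, column="outcome", normal="normal"):
    """Map every attack sub-class to 1 and normal traffic to 0 (assumed encoding)."""
    return [{**row, column: 0 if row[column] == normal else 1} for row in rows]


processed = binarize_label(min_max(one_hot(rows, "protocol_type"), "duration"))
for row in processed:
    print(row)
```

In a real pipeline the category list and the min/max values would be derived from the training split and reused for the test split, so that both end up with the same columns and scale.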

In [None]:
# Shape and first 2 lines
Preprocessing.print_overview('train')

# The paper reports 122 features after preprocessing,
# but I get 124 columns, counting the label column
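
The expected feature count can be checked with simple arithmetic. The sketch below assumes the commonly cited NSL-KDD layout of 41 raw features, three of them categorical (`protocol_type`, `service`, `flag`) with 3, 70, and 11 distinct values in the training split. These numbers are assumptions and should be verified against the loaded data; the helper `expected_feature_count` is hypothetical.

```python
# Assumed cardinalities of the categorical columns in the training split;
# verify them against the loaded data before relying on the result.
categorical_cardinalities = {"protocol_type": 3, "service": 70, "flag": 11}
raw_feature_count = 41  # assumed number of raw features, label excluded


def expected_feature_count(raw_count, cardinalities):
    """Each categorical column is replaced by one column per category."""
    numeric_count = raw_count - len(cardinalities)
    return numeric_count + sum(cardinalities.values())


expected = expected_feature_count(raw_feature_count, categorical_cardinalities)
print("expected features after one-hot encoding:", expected)
```

If these assumptions hold, a total of 124 columns including the label would leave one column unaccounted for. One candidate worth checking is the difficulty score that the NSL-KDD files carry in addition to the label; comparing the printed column list against the expected names would confirm or rule this out.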

In [None]:
# get the training data, split into features and labels
(x_train, y_train) = Preprocessing.get_data()

print("Shape y: ", y_train.shape)
print("Shape X: ", x_train.shape)

# columns of features
columns = Preprocessing.test_data.columns[Preprocessing.test_data.columns != 'outcome']
print(columns)

### Model initialization and training

In [None]:
import tensorflow as tf
from xai_anomaly_detection.model.FCModel import FCModel, f1_m, precision_m, recall_m

# initialise the subclassed tf.keras model
model = FCModel(x_train.shape[1])
# compile model
model.compile(
    loss = tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics = ['accuracy', precision_m, recall_m, f1_m]
)
model.build(x_train.shape)
model.summary()
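
`precision_m`, `recall_m`, and `f1_m` are imported from the project's model module, and their implementation is not shown in this notebook. As a reminder of what such metrics compute for the binary normal/attack task, here is a minimal plain-Python sketch. The function `binary_scores` and the sample labels are hypothetical and are not taken from the project code.

```python
def binary_scores(y_true, y_pred):
    """Return (precision, recall, f1) for 0/1 labels, with 1 meaning attack."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels and predictions, for illustration only
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = binary_scores(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For intrusion detection, recall is often watched closely because a missed attack (false negative) tends to be more costly than a false alarm.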

In [None]:
# train the model unless saved weights already exist
from os.path import exists
if exists('tmp/weights.index'):
    model.load_weights('tmp/weights')
else:
    model.fit(x_train, y_train, epochs=5, batch_size=64)
    model.save_weights('tmp/weights', save_format='tf')


In [None]:
# evaluate the model on the test set
(x_test, y_test) = Preprocessing.get_data(test_data=True)
scores = model.evaluate(x_test, y_test)
# skip index 0 (the loss) and print the remaining metrics as percentages
for name, score in zip(model.metrics_names[1:], scores[1:]):
    print(f"{name}: {score * 100:.2f}%")

### Generating explanations

#### Build another model for SHAP
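A second, sequential model is needed because of a TensorFlow bug affecting subclassed models; the details are given in the comments of the following cell.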

In [None]:
from xai_anomaly_detection.model.FCModel import get_sequential_model

# A TensorFlow bug causes 'model.outputs' to be 'None' for subclassed models,
# see https://github.com/tensorflow/tensorflow/issues/45202
# This forces me to create a second, sequential model for SHAP

# get compiled model
seq_model = get_sequential_model(x_train.shape[1])

# train the model unless saved weights already exist
if exists('tmp/seq_model_weights.index'):
    seq_model.load_weights('tmp/seq_model_weights')
else:
    seq_model.fit(x_train, y_train, epochs=5, batch_size=64)
    seq_model.save_weights('tmp/seq_model_weights', save_format='tf')

# evaluate on the test set; skip index 0 (the loss) when printing
scores = seq_model.evaluate(x_test, y_test)
for name, score in zip(seq_model.metrics_names[1:], scores[1:]):
    print(f"{name}: {score * 100:.2f}%")

#### SHAP

In [None]:
from xai_anomaly_detection.explanations.shap import shap_explanations
# initialise shap class and create explainer for model
Shap = shap_explanations(seq_model, x_train, x_test)

In [None]:
# generate global explanation with SHAP summary plot
Shap.generate_summary_plot(columns)

In [None]:
# local explanation with a SHAP force plot
Shap.generate_force_plot(columns)
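
A force plot visualises how each feature pushes a single prediction away from a base value. The additive property behind it can be illustrated by computing exact Shapley values for a tiny model, enumerating all feature orderings. This is a sketch of the underlying idea only, not of how the `shap` library or the `shap_explanations` class computes its values; the toy model, sample, and baseline are assumptions.

```python
from itertools import permutations


def toy_model(x):
    """Hypothetical stand-in for a trained model, with one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] * x[2]


def shapley_values(model, sample, baseline):
    """Average each feature's marginal contribution over all orderings."""
    n = len(sample)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start from the baseline input
        previous = model(current)
        for feature in order:
            current[feature] = sample[feature]  # switch this feature on
            value = model(current)
            totals[feature] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]


sample = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, baseline)
print("Shapley values:", phi)
# The contributions add up to the gap between the prediction and the base value
print("sum:", sum(phi), "gap:", toy_model(sample) - toy_model(baseline))
```

Exact enumeration grows factorially with the number of features, so it is only feasible for toy inputs; with more than a hundred features after preprocessing, an approximation is required.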

In [None]:
# local explanation for multiple samples (currently disabled)
# Shap.generate_collective_force_plot(columns, x_test)
# see https://github.com/slundberg/shap

#### LIME

In [None]:
# Local explanations with LIME

# select random sample
import numpy as np
x_rand = x_test[np.random.randint(x_test.shape[0], size=1)].flatten()

from xai_anomaly_detection.explanations.lime import lime_explanations
Lime = lime_explanations(x_train, columns)

# note: the graph background is transparent,
# so the plot is hard to read in dark mode

# here I use the original subclassed model instead of the sequential model;
# this shows that the model is built correctly and that only the bug
# in tf prevents SHAP from running on it
Lime.generate_lime_explanation(model, x_rand, num_features=10, show_table=True)
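
The idea behind a LIME explanation is to perturb the selected sample, query the model on the perturbed points, and fit a simple linear model that weights nearby points more heavily. The sketch below shows that idea in plain Python. It is a simplified illustration under stated assumptions, not the implementation of the `lime` package or of `lime_explanations`: the black-box function, the Gaussian perturbation, the kernel, and all parameter values are hypothetical.

```python
import math
import random


def black_box(x):
    """Hypothetical model to explain; linear so the result is easy to check."""
    return 1.0 + 2.0 * x[0] - 3.0 * x[1]


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def local_surrogate(model, sample, n_samples=200, scale=0.5, width=1.0, seed=0):
    """Fit a proximity-weighted linear model around `sample`."""
    rng = random.Random(seed)
    d = len(sample)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        point = [v + rng.gauss(0.0, scale) for v in sample]  # perturb the sample
        dist2 = sum((p - v) ** 2 for p, v in zip(point, sample))
        weight = math.exp(-dist2 / width ** 2)  # closer points count more
        features = [1.0] + point  # the leading 1.0 models the intercept
        target = model(point)
        for i in range(d + 1):
            xtwy[i] += weight * features[i] * target
            for j in range(d + 1):
                xtwx[i][j] += weight * features[i] * features[j]
    return solve(xtwx, xtwy)  # [intercept, one coefficient per feature]


coefficients = local_surrogate(black_box, sample=[0.5, 0.2])
print("intercept and local feature weights:", coefficients)
```

The sign and magnitude of each fitted coefficient play the role of the per-feature bars in a LIME plot: they describe how the prediction changes near the chosen sample, not globally.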