# Epileptic Seizure Classification with an Autoencoder
This notebook classifies time-series EEG data to detect epileptic seizures, based on the preprocessed CHB-MIT Scalp EEG Database.
The code is structured as follows:
1. [Imports](#1-imports)
2. [Load Preprocessed Dataset](#2-load-preprocessed-dataset)
3. [Split Dataset](#3-split-dataset)
4. [Normalize Dataset](#4-normalize-dataset)
5. [Autoencoder](#5-autoencoder) <br>
5.1 [Seperate Normal & Anomaly Data](#51-seperate-normal--anomaly-data) <br>
5.2 [Define Autoencoder-Model](#52-define-autoencoder-model) <br>
5.3 [Compile Autoencoder-Model](#53-compile-autoencoder-model) <br>
5.4 [Train Autoencoder](#54-train-autoencoder-model) <br>
6. [Visualize Reconstruction Error](#6-visualize-reconstruction-error)
7. [Binary Classification](#7-binary-classification)
8. [Conclusions](#8-conclusion)

## 1. Imports
Import the required libraries. <br>
External packages can be installed via the `pip install` command.

In [None]:
import numpy as np

# Pre-Processing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Neural Network
import tensorflow as tf
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM, Dense, Bidirectional, Input, RepeatVector, TimeDistributed, Flatten, Conv1D
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.regularizers import L1L2
from tensorflow.keras.models import Model

import plotly.express as px
import plotly.graph_objects as go
from sklearn.metrics import f1_score, roc_auc_score, precision_score, recall_score
from imblearn.metrics import geometric_mean_score

## 2. Load Preprocessed Dataset
The preprocessed dataset, created with the notebook `00_Preprocessing.ipynb`, is loaded and the NumPy arrays for the features and labels are extracted. <br>
To check the distribution of the classes in the dataset, the classes and their respective counts are printed.

In [None]:
dataset = np.load('../00_Data/Processed-Data/classification_dataset_max.npz')
X = dataset["features"]
y = dataset["labels"]

In [None]:
print("Shapes: ", X.shape, y.shape)
print(np.unique(y, return_counts=True))

## 3. Split Dataset
In order to validate and test the trained model, the dataset is split into `train`, `test`, and `validation` subsets (60 % / 20 % / 20 %). <br>
To preserve the class distribution within each split, the `stratify` option is enabled.

In [None]:
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, shuffle=True, stratify=np.ravel(y), random_state=34)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, shuffle=True, stratify=np.ravel(y_rest), random_state=34)
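
Whether stratification kept the class ratio can be checked by comparing the class fractions of each split. The following is a minimal sketch on synthetic labels, not the notebook's data. It uses a hand-rolled NumPy split as a stand-in for `train_test_split(..., stratify=...)`; the helpers `stratified_split_indices` and `class_fractions` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def stratified_split_indices(labels, fractions=(0.6, 0.2, 0.2), seed=34):
    """Hand-rolled stratified split (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    labels = np.ravel(labels)
    cuts = np.cumsum(fractions)[:-1]  # relative cut points, e.g. [0.6, 0.8]
    parts = [[] for _ in fractions]
    for cls in np.unique(labels):
        # Shuffle the indices of one class and cut them at the same relative positions
        idx = rng.permutation(np.flatnonzero(labels == cls))
        bounds = np.round(cuts * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, bounds)):
            part.append(chunk)
    return [np.concatenate(p) for p in parts]

def class_fractions(labels):
    """Return the fraction of samples per class (hypothetical helper)."""
    classes, counts = np.unique(np.ravel(labels), return_counts=True)
    return {int(c): float(n) / float(counts.sum()) for c, n in zip(classes, counts)}

# Synthetic labels (assumption): 800 normal samples, 200 seizure samples
y_demo = np.array([0] * 800 + [1] * 200)
train_idx, test_idx, val_idx = stratified_split_indices(y_demo)
for name, idx in [("train", train_idx), ("test", test_idx), ("val", val_idx)]:
    print(name, len(idx), class_fractions(y_demo[idx]))
```

Applied to the notebook, the same check would amount to calling `class_fractions` on `y_train`, `y_test`, and `y_val`.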

## 4. Normalize Dataset
When working with neural networks, it is important to normalize the data before training and testing. This typically speeds up training, avoids numerical instabilities, and can improve the generalization of the neural network. EEG data poses an additional requirement, because the individual channels differ in their characteristics and value ranges. Therefore, the normalization is done channel by channel: the scaler is fitted on the training subset and then applied to the test and validation splits.

In [None]:
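def normalize_features(X_train:np.ndarray, X_test:np.ndarray, X_val:np.ndarray, use_standard_scaler:bool=False) -> tuple:
    """Scale each channel (last axis) separately, fitting only on the training split.

    Assumes arrays of shape (samples, time steps, channels).
    """
    X_train_norm = np.zeros(shape=X_train.shape, dtype='float32')
    X_test_norm = np.zeros(shape=X_test.shape, dtype='float32')
    X_val_norm = np.zeros(shape=X_val.shape, dtype='float32')
    for feature_col in range(X_train.shape[2]):
        # Fresh scaler per channel, so statistics are never shared between channels
        scaler = StandardScaler() if use_standard_scaler else MinMaxScaler()
        # Index the channel axis explicitly: X[:][:][i] would select sample i, not channel i.
        # Samples and time steps are flattened into one column so the whole channel shares one scale.
        X_train_norm[:, :, feature_col] = scaler.fit_transform(X_train[:, :, feature_col].reshape(-1, 1)).reshape(X_train.shape[:2])
        X_test_norm[:, :, feature_col] = scaler.transform(X_test[:, :, feature_col].reshape(-1, 1)).reshape(X_test.shape[:2])
        X_val_norm[:, :, feature_col] = scaler.transform(X_val[:, :, feature_col].reshape(-1, 1)).reshape(X_val.shape[:2])
    return X_train_norm, X_test_norm, X_val_norm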

In [None]:
X_train_normalized, X_test_normalized, X_val_normalized = normalize_features(X_train, X_test, X_val, True)
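
Per-channel standardization with training statistics can also be written with NumPy broadcasting alone. The sketch below assumes arrays shaped (samples, time steps, channels) and uses synthetic data with two channels of very different value ranges. `standardize_per_channel` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def standardize_per_channel(X_train, *others, eps=1e-8):
    """Z-score each channel using training statistics only (hypothetical helper).

    All arrays are assumed to have shape (samples, time steps, channels).
    """
    mean = X_train.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, channels)
    std = X_train.std(axis=(0, 1), keepdims=True)

    def scale(X):
        # Broadcasting applies each channel's mean/std along the last axis
        return ((X - mean) / (std + eps)).astype("float32")

    return tuple(scale(X) for X in (X_train, *others))

# Synthetic data (assumption): channel 0 is small, channel 1 is large and offset
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(64, 32, 2))
X_te = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(16, 32, 2))

X_tr_n, X_te_n = standardize_per_channel(X_tr, X_te)
# After scaling, each training channel has mean close to 0 and standard deviation close to 1
print(X_tr_n.mean(axis=(0, 1)), X_tr_n.std(axis=(0, 1)))
```

The test split is scaled with the training mean and standard deviation, so its per-channel statistics will be close to, but not exactly, 0 and 1.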

## 5. Autoencoder
The following section covers the data preparation, definition, and training of the autoencoder. <br>

<b>What is an autoencoder?</b><br>
An autoencoder is a neural network architecture used for unsupervised machine learning tasks. It consists of two main components: the encoder and the decoder. The encoder transforms the input data into a lower-dimensional representation, the so-called "latent space". The decoder takes this representation and tries to reconstruct the original input data. The objective during training is to minimize the reconstruction error. <br>

<b>How can autoencoders be used for the detection of epileptic seizures in EEG-data?</b><br>
There are two ways to use autoencoders for the detection of epileptic seizures in EEG data: via the reconstruction error or via the latent space. <br>
In the first approach, the autoencoder is trained only on data that does not contain any epileptic seizures, so the reconstruction error is minimized for "normal" data. When the model is then given a sample with an active seizure, the reconstruction error is expected to be higher. By defining an error threshold, a binary classification can be performed to separate normal samples from samples with an epileptic seizure.

The second option is to use the latent space for the classification. The autoencoder is trained on the complete dataset with the same objective of minimizing the reconstruction error. Detaching the decoder from the autoencoder exposes the latent space. Because samples that contain an epileptic seizure differ from normal samples, their representations in the reduced space are expected to differ as well. Based on this assumption, the samples can be classified with a clustering approach.

The following code implements the first approach (reconstruction error).
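
The reconstruction-error idea can be illustrated without a neural network. The sketch below swaps in a linear encoder/decoder (PCA computed via SVD) for the LSTM autoencoder and runs on synthetic data, so it only demonstrates the principle and is not the notebook's model. All names and data in it are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "normal" data (assumption): points close to a 2-D subspace of a 10-D space
basis = rng.normal(size=(2, 10))
normal_train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))
normal_test = rng.normal(size=(100, 2)) @ basis + 0.05 * rng.normal(size=(100, 10))
# Synthetic "anomalies" (assumption): unstructured noise that ignores the subspace
anomalies = 2.0 * rng.normal(size=(100, 10))

# "Train" the linear encoder/decoder on normal data only
mean = normal_train.mean(axis=0)
_, _, vt = np.linalg.svd(normal_train - mean, full_matrices=False)
components = vt[:2]  # encoder weights; the transpose serves as the decoder

def reconstruction_error(X):
    """Per-sample mean squared error between input and reconstruction."""
    latent = (X - mean) @ components.T          # encode into the 2-D latent space
    reconstructed = latent @ components + mean  # decode back to 10-D
    return np.mean(np.square(X - reconstructed), axis=1)

err_normal = reconstruction_error(normal_test)
err_anomaly = reconstruction_error(anomalies)
print("mean error normal:", err_normal.mean())
print("mean error anomaly:", err_anomaly.mean())
```

Because the model has only seen normal data, unseen normal samples are reconstructed well while anomalies are not, and a threshold between the two error ranges separates the classes. With real EEG data the two error distributions may overlap considerably.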

### 5.1 Seperate Normal & Anomaly Data
In order to detect anomalies in the EEG data that indicate an epileptic seizure, the autoencoder is trained only on samples where no seizure is present. Therefore, the samples must be separated into "normal" and "anomaly" data.

In [None]:
X_train_normal = X_train_normalized[np.where(y_train == 0)[0]]
X_train_anomalies = X_train_normalized[np.where(y_train == 1)[0]]

X_val_normal = X_val_normalized[np.where(y_val == 0)[0]]
X_val_anomalies = X_val_normalized[np.where(y_val == 1)[0]]

### 5.2 Define Autoencoder-Model
As described, an autoencoder consists of an encoder and a decoder component. The encoder reduces the dimensionality of the data and the decoder tries to reconstruct the input data. The resulting neural network model can be defined as follows:

In [None]:
def autoencoder_model(X):
    inputs = Input(shape=(X.shape[1], X.shape[2]))
    L1 = LSTM(50, return_sequences=True)(inputs)
    do1 = Dropout(0.5)(L1)
    L2 = LSTM(40, return_sequences=True)(do1)
    do2 = Dropout(0.5)(L2)
    L3 = LSTM(20, return_sequences=True)(do2)
    do3 = Dropout(0.5)(L3)
    L4 = LSTM(15, return_sequences=False)(do3)
    L5 = RepeatVector(X.shape[1])(L4)
    L6 = LSTM(15, return_sequences=True)(L5)
    do4 = Dropout(0.5)(L6)
    L7 = LSTM(20, return_sequences=True)(do4)
    do5 = Dropout(0.5)(L7)
    L8 = LSTM(40, return_sequences=True)(do5)
    do6 = Dropout(0.5)(L8)
    L9 = LSTM(50, return_sequences=True)(do6)
    output = TimeDistributed(Dense(X.shape[2]))(L9)   
    model = Model(inputs=inputs, outputs=output)
    return model

### 5.3 Compile Autoencoder-Model
After defining the neural network, the model must be compiled. This step also sets the optimizer and the loss function.

In [None]:
autoencoder = autoencoder_model(X_train_normal)

opt = tf.keras.optimizers.legacy.Adam(learning_rate=0.00001)
autoencoder.compile(optimizer=opt, loss='mse')
autoencoder.summary()

In [None]:
earlystopper = EarlyStopping(patience=25, restore_best_weights=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=5, min_lr=0.0000001, verbose=1, cooldown=10)

### 5.4 Train Autoencoder Model

In [None]:
history = autoencoder.fit(
    X_train_normal, 
    X_train_normal,
    epochs=100,
    batch_size=50,
    validation_data=(X_val_normal, X_val_normal),
    shuffle=True,
    verbose=1,
    callbacks=[earlystopper, reduce_lr]
)

In [None]:
fig = go.Figure(
    data = [
        go.Scatter(y=history.history['loss'], name="train"),
        go.Scatter(y=history.history['val_loss'], name="val"),
    ],
    layout = {"yaxis": {"title": "Loss [MSE]"}, "xaxis": {"title": "Epoch"}, "title": "Model Loss over Epochs"}
)

fig.show()

## 6. Visualize Reconstruction Error

In [None]:
X_val_pred_normal = autoencoder.predict(X_val_normal)
mse = np.mean(np.square(X_val_normal - X_val_pred_normal), axis=(1, 2))
fig = px.histogram(x=mse, nbins=20)
fig.show()

In [None]:
X_val_pred_anomalies = autoencoder.predict(X_val_anomalies)
mse = np.mean(np.square(X_val_anomalies - X_val_pred_anomalies), axis=(1, 2))
fig = px.histogram(x=mse, nbins=20)
fig.show()
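
One common way to turn the two error distributions into a decision rule is to derive the threshold from the errors of normal validation samples, for example as a high percentile. The sketch below uses synthetic error values as stand-ins for the reconstruction errors. The 95th percentile and the helper `threshold_from_normal_errors` are assumptions for illustration, not choices made in this notebook.

```python
import numpy as np

def threshold_from_normal_errors(errors_normal, percentile=95.0):
    """Pick a threshold that `percentile` % of the normal errors stay below (hypothetical helper)."""
    return float(np.percentile(errors_normal, percentile))

# Synthetic stand-ins for per-sample reconstruction errors (illustration only)
rng = np.random.default_rng(2)
mse_normal_demo = rng.gamma(shape=2.0, scale=0.01, size=1000)
mse_anomaly_demo = rng.gamma(shape=2.0, scale=0.05, size=1000)

threshold = threshold_from_normal_errors(mse_normal_demo)
# Share of normal samples wrongly flagged, and share of anomalies correctly flagged
false_alarm_rate = float(np.mean(mse_normal_demo > threshold))
detection_rate = float(np.mean(mse_anomaly_demo > threshold))
print(threshold, false_alarm_rate, detection_rate)
```

The percentile directly controls the false-alarm rate on normal validation data, while the detection rate depends on how far the anomaly errors are shifted.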

In [None]:
# autoencoder.save('./AE_model')

In [None]:
a = np.transpose(X_train_normal, (0,2,1))
data = []
for i in a[0]:
    data.append(
        go.Scatter(y=i)
    )

fig = go.Figure(
    data = data,
    layout = {"yaxis": {"title": "Normalized amplitude"}, "xaxis": {"title": "Time step"}, "title": "Original EEG sample (all channels)"}
)

fig.show()

In [None]:
pred = autoencoder.predict(X_train_normal[:1])
a = np.transpose(pred, (0,2,1))
data = []
for i in a[0]:
    data.append(
        go.Scatter(y=i)
    )

fig = go.Figure(
    data = data,
    layout = {"yaxis": {"title": "Normalized amplitude"}, "xaxis": {"title": "Time step"}, "title": "Reconstructed EEG sample (all channels)"}
)

fig.show()

## 7. Binary Classification

In [None]:
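X_test_pred = autoencoder.predict(X_test_normalized)
# Per-sample reconstruction error over time steps and channels
mse = np.mean(np.square(X_test_pred - X_test_normalized), axis=(1, 2))
# A high reconstruction error indicates an anomaly, i.e. a seizure (label 1)
y_test_predictions = np.where(mse > 0.1, 1, 0)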

In [None]:
f1score = f1_score(y_test, y_test_predictions)
gm = geometric_mean_score(y_test, y_test_predictions, average="binary")
auc = roc_auc_score(y_test, y_test_predictions, average="weighted")
precision = precision_score(y_test, y_test_predictions)
recall = recall_score(y_test, y_test_predictions)

In [None]:
print(f1score, gm, auc, precision, recall)
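
To make the reported numbers easier to interpret, the thresholded metrics can be derived by hand from the confusion matrix. The sketch below is a plain NumPy illustration with a hypothetical helper `binary_metrics`. It computes the geometric mean as the square root of sensitivity times specificity, which is the usual definition for the binary case. The toy labels describe a classifier that marks every sample as a seizure.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, specificity, F1 and G-mean from the confusion matrix (hypothetical helper)."""
    y_true = np.ravel(y_true).astype(bool)
    y_pred = np.ravel(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = float(np.sqrt(recall * specificity))
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "g_mean": g_mean}

# Toy example (assumption): balanced labels, every sample predicted as seizure
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.ones_like(y_true_demo)
metrics_demo = binary_metrics(y_true_demo, y_pred_demo)
print(metrics_demo)
```

In this toy case recall is perfect while precision is 0.5 and the geometric mean is 0, because no normal sample is recognized. A near-zero geometric mean combined with high recall therefore suggests that almost all samples are assigned to the seizure class. ROC AUC is usually more informative when computed from continuous scores such as the reconstruction error rather than from thresholded labels.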

In [None]:
# 1st try: 0.6601055484239052 0.06778158069014832 0.500582460274657 0.49349541480059717 0.9965546942291128