# Epileptic Seizure Classification - Comparison of Developed Models
This notebook compares the different models developed for classifying time series EEG data to detect epileptic seizures, based on the preprocessed CHB-MIT Scalp EEG Database.<br>
The code is structured as follows:
1. [Imports](#1-imports)
2. [Define General Functions](#2-define-general-functions)
3. [Get Prediction Metrics](#3-get-prediction-metrics)
4. [Model Comparison](#4-comparison)
5. [Conclusion](#5-conclusions)

## 1. Imports
Import the required libraries. <br>
External packages can be installed via the `pip install -r requirements.txt` command or the notebook cell below.

In [None]:
! pip install -r ../requirements.txt

In [42]:
# Import datascience libraries
import numpy as np
import pandas as pd

# Import preprocessing-libraries, classification metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import f1_score, roc_auc_score, precision_score, recall_score
from imblearn.metrics import geometric_mean_score

# Import visualization libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Import ML libraries
import tensorflow as tf
import joblib

## 2. Define General Functions
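Before the metrics of the saved models can be calculated, a general helper function has to be defined that prepares the data for prediction: `normalize_features` scales every feature of the train, test and validation subsets, fitting the scaler on the training subset only.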

In [2]:
def normalize_features(X_train:np.ndarray, X_test:np.ndarray, X_val:np.ndarray, use_standard_scaler:bool=False) -> tuple:
    if(use_standard_scaler):
        scaler = StandardScaler() # Create Z-Score normalizer
    else:
        scaler = MinMaxScaler() # Create Min-Max normalizer
    X_train_norm = np.zeros(shape=(X_train.shape), dtype='float32') # Create empty array for normalized train-data
    X_test_norm = np.zeros(shape=(X_test.shape), dtype='float32') # Create empty array for normalized test-data
    X_val_norm = np.zeros(shape=(X_val.shape), dtype='float32') # Create empty array for normalized val-data
    for feature_col in range(X_train.shape[2]): # Iterate over features in dataset
        X_train_norm[:,:,feature_col] = scaler.fit_transform(X_train[:,:,feature_col]) # Fit and apply normalizer on current feature in train subset
        X_test_norm[:,:,feature_col] = scaler.transform(X_test[:,:,feature_col]) # Apply normalizer on current feature in test subset
        X_val_norm[:,:,feature_col] = scaler.transform(X_val[:,:,feature_col]) # Apply normalizer on current feature in val subset
    return X_train_norm, X_test_norm, X_val_norm
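Because scikit-learn scalers compute their statistics column by column, and each slice `X_train[:, :, feature_col]` has one column per time step, `normalize_features` learns a separate mean and standard deviation for every (time step, feature) pair, always from the training subset only. The following NumPy-only sketch reproduces that Z-score computation for illustration; the function `zscore_per_timestep_feature` is hypothetical and not part of the notebook.

```python
import numpy as np

def zscore_per_timestep_feature(X_train: np.ndarray, X_other: np.ndarray) -> tuple:
    """Sketch of normalize_features(..., use_standard_scaler=True) for one extra subset.

    Statistics are computed over the sample axis only, so one mean/std pair
    is learned per (time step, feature) position.
    """
    mean = X_train.mean(axis=0)            # shape (n_timesteps, n_features)
    std = X_train.std(axis=0)              # population std (ddof=0), as in StandardScaler
    std = np.where(std == 0, 1.0, std)     # constant positions: avoid division by zero
    X_train_norm = ((X_train - mean) / std).astype("float32")
    X_other_norm = ((X_other - mean) / std).astype("float32")  # reuse training statistics only
    return X_train_norm, X_other_norm

# Hypothetical toy data: (samples, time steps, features)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 8, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 8, 3))
train_norm, test_norm = zscore_per_timestep_feature(X_train_demo, X_test_demo)
print(train_norm.shape, test_norm.shape)
```

The test subset is transformed with the training statistics, so its values are generally not exactly zero-mean; this is intended and prevents information from the test data leaking into the preprocessing.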

In [3]:
metric_df = pd.DataFrame(columns=['Model', 'F1','GMean','AUC','Precision','Recall'])

## 3. Get Prediction Metrics
The following section loads all saved models and calculates each of the defined metrics. To allow a later comparison, the values are stored in a dataframe.
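All five metrics can be derived from the confusion matrix of the seizure class. The plain-Python sketch below, built around a hypothetical helper `binary_metrics`, makes the definitions explicit: G-Mean is the geometric mean of sensitivity (recall) and specificity, which corresponds to imblearn's `geometric_mean_score` with `average="binary"`. It also shows what the neural network cells report as AUC: they pass thresholded 0/1 predictions to `roc_auc_score`, and with hard labels the ROC curve has a single interior point, so the area reduces to (TPR + TNR) / 2. This is an illustration only, not a replacement for the scikit-learn and imblearn calls used in the notebook.

```python
import math

def binary_metrics(y_true, y_pred):
    """Hypothetical helper: comparison metrics from hard 0/1 labels (1 = seizure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = math.sqrt(recall * specificity)             # geometric mean of TPR and TNR
    auc_hard = (recall + specificity) / 2               # ROC AUC when the "scores" are 0/1 labels
    return {"F1": f1, "GMean": gmean, "AUC": auc_hard, "Precision": precision, "Recall": recall}

# Toy example: 4 seizure and 6 non-seizure windows, one missed seizure and one false alarm
example = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
print({name: round(value, 4) for name, value in example.items()})
```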

### 3.1 Random Forest Classifier

In [4]:
dataset = np.load('./00_Data/Processed-Data/classification_dataset_mean.npz')
X = dataset["features"]
y = dataset["labels"]

n_samples, n_timesteps, n_features = X.shape
X_reshaped = np.reshape(X, (n_samples, (n_timesteps * n_features)))

X_train, X_val, y_train, y_val = train_test_split(X_reshaped, y, test_size=0.3, shuffle=True, stratify=np.ravel(y), random_state=34)

rf_classifier = joblib.load("./99_Assets/01_Saved Models/00_random_forest_classifier.joblib")

y_val_pred = rf_classifier.predict(X_val) # Predict labels for X_val
y_val_pred_proba = rf_classifier.predict_proba(X_val) # Predict class probabilities for X_val
f1 = f1_score(y_val, y_val_pred, average="macro") #Compute f1-score
auc = roc_auc_score(np.ravel(y_val), y_val_pred_proba[:,1], average="macro", multi_class="ovr") #Compute AUC
gmean = geometric_mean_score(y_val, y_val_pred, average="macro") #Compute G-Mean
precision = precision_score(y_val, y_val_pred)
recall = recall_score(y_val, y_val_pred)

metric_df.loc[len(metric_df)] = ['RandomForest', f1, gmean, auc, precision, recall]


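Unlike the neural networks, the Random Forest expects a two-dimensional feature matrix, so the cell above flattens each window of shape (time steps, features) into a single row, turning every (time step, feature) pair into an independent input feature. A small NumPy check on a hypothetical toy array shows how the columns are laid out, which can matter when interpreting the forest's feature importances.

```python
import numpy as np

# Hypothetical toy array: 2 samples, 3 time steps, 4 features (channels)
n_samples, n_timesteps, n_features = 2, 3, 4
X_demo = np.arange(n_samples * n_timesteps * n_features).reshape(n_samples, n_timesteps, n_features)

# Same operation as in the Random Forest cell: one flat row per sample
X_flat = np.reshape(X_demo, (n_samples, n_timesteps * n_features))

# With NumPy's default C order, column t * n_features + f holds time step t of feature f
t, f = 2, 1
print(X_flat.shape, X_flat[0, t * n_features + f] == X_demo[0, t, f])
```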

### 3.2 Bi-LSTM

In [5]:
dataset = np.load('./00_Data/Processed-Data/classification_dataset_max.npz') # Load compressed numpy array
X = dataset["features"] # Extract feature-array from compressed file
y = dataset["labels"] # Extract label-array from compressed file

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, shuffle=True, stratify=np.ravel(y), random_state=254)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, shuffle=True, stratify=np.ravel(y_rest), random_state=865)

X_train_norm, X_test_norm, X_val_norm = normalize_features(X_train, X_test, X_val, True)

model = tf.keras.models.load_model('./99_Assets/01_Saved Models/01_bi_lstm.h5')

y_test_predictions = model.predict(X_test_norm)
y_test_predictions = (y_test_predictions >= 0.5).astype(int)
f1score = f1_score(y_test, y_test_predictions)
gm = geometric_mean_score(y_test, y_test_predictions, average="binary")
auc = roc_auc_score(y_test, y_test_predictions, average="weighted")
precision = precision_score(y_test, y_test_predictions)
recall = recall_score(y_test, y_test_predictions)

metric_df.loc[len(metric_df)] = ['Bi-LSTM', f1score, gm, auc, precision, recall]

Metal device set to: Apple M1 Pro

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB



2023-07-02 16:30:22.386746: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2023-07-02 16:30:22.770317: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }




### 3.3 Autoencoder - Latent Space Classification

In [6]:
dataset = np.load('./00_Data/Processed-Data/classification_dataset_max.npz') # Load compressed numpy array
X = dataset["features"] # Extract feature-array from compressed file
y = dataset["labels"] # Extract label-array from compressed file

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, shuffle=True, stratify=np.ravel(y), random_state=34)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, shuffle=True, stratify=np.ravel(y_rest), random_state=34)

X_train_normalized, X_test_normalized, X_val_normalized = normalize_features(X_train, X_test, X_val, True)

clf = tf.keras.models.load_model('./99_Assets/01_Saved Models/03_classifier_latent_space_clf.h5')

y_test_predictions = clf.predict(X_test_normalized)
y_test_predictions = (y_test_predictions >= 0.5).astype(int)
f1score = f1_score(y_test, y_test_predictions)
gm = geometric_mean_score(y_test, y_test_predictions, average="binary")
auc = roc_auc_score(y_test, y_test_predictions, average="weighted")
precision = precision_score(y_test, y_test_predictions)
recall = recall_score(y_test, y_test_predictions)

metric_df.loc[len(metric_df)] = ['AE-LatentSpace', f1score, gm, auc, precision, recall]



### 3.4 Autoencoder - Threshold Classification

In [7]:
dataset = np.load('./00_Data/Processed-Data/classification_dataset_max.npz')
X = dataset["features"]
y = dataset["labels"]

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, shuffle=True, stratify=np.ravel(y), random_state=34)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, shuffle=True, stratify=np.ravel(y_rest), random_state=34)

X_train_normalized, X_test_normalized, X_val_normalized = normalize_features(X_train, X_test, X_val, True)

autoencoder = tf.keras.models.load_model('./99_Assets/01_Saved Models/02_autoencoder_threshold.h5')

X_test_pred = autoencoder.predict(X_test_normalized)
mse = np.mean(np.square(X_test_pred - X_test_normalized), axis=(1, 2))
y_test_predictions = np.where(mse >= 0.2, 1, 0)

f1score = f1_score(y_test, y_test_predictions)
gm = geometric_mean_score(y_test, y_test_predictions, average="binary")
auc = roc_auc_score(y_test, y_test_predictions, average="weighted")
precision = precision_score(y_test, y_test_predictions)
recall = recall_score(y_test, y_test_predictions)

metric_df.loc[len(metric_df)] = ['AE-Threshold', f1score, gm, auc, precision, recall]



## 4. Comparison
This section compares the developed models on the defined metrics. The findings are summarized and discussed in the following section.

In [79]:
metrics = [("F1", "F1-Score"), ("GMean", "G-Mean"), ("AUC", "AUC"), ("Precision", "Precision"), ("Recall", "Recall")] # (column in metric_df, display name)
fig = make_subplots(rows=2, cols=3, subplot_titles=[title for _, title in metrics])
for i, (column, title) in enumerate(metrics):
    fig.add_trace(
        go.Bar(
            x=metric_df.Model,
            y=metric_df[column],
            text=metric_df[column],
            name=title # Legend entry named after the plotted metric
        ),
        row=i // 3 + 1, # First three metrics in row 1, the remaining two in row 2
        col=i % 3 + 1
    )
fig.update_layout(
    title="Comparison Metrics over all Models",
    autosize=False,
    width=2000,
    height=1000
)
fig.update_traces(texttemplate='%{text:.4}', textposition='inside')
fig.show()

## 5. Conclusions
The aim of this project was to classify EEG data for epileptic seizure detection. For this purpose, four models were developed: a random forest, a neural network with Bi-LSTM and attention layers, an autoencoder with a downstream classifier on its latent space, and an autoencoder with a reconstruction-error threshold. All models were able to classify the samples successfully. However, each approach has its strengths and weaknesses.

`Autoencoder Threshold Classification` <br>
Classifying the EEG data with an autoencoder and then evaluating the reconstruction error gave satisfactory results. However, this model performs worst in the overall comparison on all metrics except recall. This can be traced back to how the samples are created. The sliding time window technique also produces samples in which only part of the window contains an epileptic seizure, so the reconstruction error is elevated only for that fraction of the time steps. Since the error is averaged over all time steps of a sample, samples in which the seizure covers only a small fraction of the window cannot be detected correctly. This approach is therefore only suitable if the data contains only time series in which the label applies to all time steps.
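The dilution effect can be illustrated with a toy calculation. The sketch below assumes hypothetical per-time-step errors (0.05 for normal EEG, 0.6 for seizure activity) for a single-channel, 10-step window and applies the threshold of 0.2 used in the notebook. In the notebook the mean is additionally taken over all channels (`axis=(1, 2)`); the sketch keeps one channel for simplicity.

```python
import numpy as np

THRESHOLD = 0.2  # reconstruction-error threshold from the autoencoder threshold cell

# Hypothetical per-time-step reconstruction errors for a 10-step window
normal_error, seizure_error, n_steps = 0.05, 0.6, 10

def window_errors(n_seizure_steps: int) -> np.ndarray:
    """Error profile of a window whose last n_seizure_steps time steps contain a seizure."""
    errors = np.full(n_steps, normal_error)
    errors[n_steps - n_seizure_steps:] = seizure_error
    return errors

for k in (0, 1, 2, 3, 5, 10):
    mean_error = window_errors(k).mean()  # averaging dilutes short seizure segments
    print(f"{k:2d}/{n_steps} seizure steps -> mean error {mean_error:.3f} -> label {int(mean_error >= THRESHOLD)}")
```

Under these assumed values, windows with one or two seizure time steps stay below the threshold and are labeled as non-seizure, even though their seizure time steps are reconstructed poorly.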

`Random Forest Classification` <br>
The Random Forest classifier is the third-best model and also achieved good results. The model created with hyperparameter optimization achieved an F1 score, G-Mean and precision of about 0.83 and an AUC of 0.90. The comparatively simple model can therefore already make reliable predictions. Only its recall is the lowest of all models, which indicates that it misses a larger share of the epileptic seizures than the other approaches.

`Autoencoder Latent Space Classification` <br>
This model is the most complex of the presented methods: it combines an autoencoder with a second neural network that takes the latent space as input and performs the classification based on it. This approach ranked second and achieved very good results across all metrics. The problem encountered with the threshold method matters less here, since the reconstruction error is only used to train the autoencoder and not to make the classification decision. Only the recall is somewhat limited, which suggests that the problem described above still has a smaller influence.

`Bi-LSTM` <br>
Across all metrics, the neural network with Bi-LSTM and attention layer performed best. It achieved an F1 score, G-Mean and AUC of 0.96. This result indicates a very good classification ability that also copes well with the slightly imbalanced data set. Precision and recall confirm these findings, showing both that the model detects epileptic seizures very reliably and that it produces very few false alarms. The good results can be attributed to the combination of LSTM layers and the attention layer: the LSTM layers capture the temporal dependencies in the data, and the attention layer maps them very effectively to the subsequent dense layers. Combined with the high sensitivity of this network architecture, this makes the model very well suited for epileptic seizure detection in EEG data. Another advantage of the Bi-LSTM model is that explainability libraries can be used to understand how the model works. This makes it possible to analyze the influence of individual channels on the results and, on this basis, to draw conclusions about the affected brain region.
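The exact attention layer of the Bi-LSTM model is not shown in this comparison notebook, so the following is only a generic NumPy sketch of one common variant, dot-product attention pooling: each time step of the Bi-LSTM output receives a score, the scores are normalized with a softmax over time, and the weighted sum forms a fixed-size vector for the dense layers. The scoring vector `w`, the helper `attention_pooling` and all shapes are hypothetical.

```python
import numpy as np

def attention_pooling(hidden_states: np.ndarray, w: np.ndarray) -> tuple:
    """Score each time step, softmax over time, return the weighted sum and the weights.

    hidden_states: (n_timesteps, n_units), e.g. concatenated forward/backward LSTM outputs
    w: (n_units,) scoring vector (learned in a real model, random here)
    """
    scores = hidden_states @ w                        # one scalar score per time step
    scores = scores - scores.max()                    # shift for numerical stability of exp
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the time axis
    context = weights @ hidden_states                 # (n_units,) weighted sum of time steps
    return context, weights

rng = np.random.default_rng(42)
H = rng.normal(size=(16, 8))   # hypothetical: 16 time steps, 2 x 4 Bi-LSTM units
w = rng.normal(size=8)
context, weights = attention_pooling(H, w)
print(context.shape, weights.argmax())
```

Whatever the window length, the dense layers receive a vector of constant size, and time steps with high scores contribute most to it.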

Classification of EEG data for epileptic seizure detection is a current and relevant area of research. Neural networks not only support the diagnosis of epilepsy but can also be used as an early warning system and for targeted treatment. Because the data are time series, architectures with Bi-LSTM and attention layers are particularly suitable for this task. However, to validate the results obtained, further tests with other data sets are required to demonstrate that the findings generalize. In addition, there are further classification approaches that have proven successful in research. One promising methodology would be the use of autoencoders for dimensionality reduction. Another very important aspect is the use of frequency information: with algorithms such as the Fourier transform, more information can be extracted from the EEG data, which can provide more detail about the exact circumstances of a seizure. However, since this project was developed within a single lecture, these aspects could not be considered due to time constraints.
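As a rough illustration of how frequency information could be extracted, the sketch below computes the power of a single-channel window in the classic EEG bands with NumPy's FFT. The band limits, the helper `band_powers` and the 256 Hz sampling rate (the rate of the raw CHB-MIT recordings; the preprocessed windows used in this notebook may differ) are assumptions for illustration. Such band powers could be appended to the feature arrays, but this was not done in the project.

```python
import numpy as np

FS = 256  # assumed sampling rate in Hz
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # assumed limits

def band_powers(signal: np.ndarray, fs: int = FS) -> dict:
    """Return the spectral power of a single-channel signal in each EEG band."""
    spectrum = np.fft.rfft(signal)                       # one-sided spectrum of a real signal
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)     # frequency (Hz) of each bin
    power = np.abs(spectrum) ** 2
    return {name: power[(freqs >= lo) & (freqs < hi)].sum() for name, (lo, hi) in BANDS.items()}

# Synthetic 2-second window: strong 10 Hz component plus a weak 20 Hz component
t = np.arange(2 * FS) / FS
demo = np.sin(2 * np.pi * 10 * t) + 0.1 * np.sin(2 * np.pi * 20 * t)
powers = band_powers(demo)
print(max(powers, key=powers.get))
```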