# Detection of interictal periods in EEG signals using machine learning

## Train and test machine learning models to detect interictal periods in EEG signals with PyCaret

In this notebook, we use the PyCaret library to train and test machine learning models that detect interictal periods in EEG signals. The models are trained on features previously extracted from the EEG signals and stored in a CSV file. The notebook gives a high-level overview of the process and reports which model performs best at detecting interictal periods. The models are evaluated with metrics such as accuracy, precision, recall, and F1-score; the best model is selected based on these metrics and saved for further analysis.
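All of the evaluation metrics listed above derive from the four cells of a binary confusion matrix. The sketch below is a hypothetical helper, `classification_metrics`, written only to illustrate the definitions (PyCaret computes these metrics internally); it treats label 1 as the positive class, and the counts in the example are invented, not results from this notebook.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted positive / no positives exist
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only
print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
```

Accuracy alone can look good on an imbalanced dataset, which is why precision, recall, and F1-score are reported alongside it.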

## Prepare the environment

### Install requirements

In [None]:
!pip install -r ../requirements.txt

### Global variables

In [1]:
PATH_DATASET = "../datasets"
PATH_SCRIPTS = "../scripts"
PATH_RESULTS = "../results"

### Import libraries

In [3]:
import pandas as pd
from pycaret.classification import setup, compare_models, plot_model, get_config

## Loading final dataset

In [5]:
# Load the feature table produced earlier (semicolon-separated CSV)
Path_final_dataset = PATH_RESULTS + "/features/EEG_features_AllFeatures.csv"
df_final = pd.read_csv(Path_final_dataset, sep=';')
df_final

| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | ... | Energy | Std | ZeroCrossings | DWT_A_Energy | DWT_A_Std | DWT_L1_Energy | DWT_L1_Std | DWT_L2_Std | DWT_L3_Std | Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.910810 | -0.432789 | 1.557870 | 1.238957 | 6.244845 | 5.608685 | -0.011796 | 4.849851 | 6.243332 | 1.448628 | ... | 6701313.0 | 28.625071 | 161 | 6.494952e+06 | 77.592879 | 1.851422e+05 | 18.901216 | 6.525675 | 2.479045 | 1 |
| 1 | 0.669241 | -2.770761 | 2.628015 | 1.059362 | 5.188711 | 8.703730 | -2.145227 | 8.147524 | 7.337089 | 0.293559 | ... | 77128620.0 | 133.475722 | 72 | 7.466569e+07 | 368.810753 | 2.346342e+06 | 67.289651 | 18.696915 | 3.012844 | 1 |
| 2 | 1.796252 | 2.188240 | -0.747876 | -1.133820 | -3.750040 | -2.946909 | -1.082376 | -1.652842 | -2.302908 | -1.303553 | ... | 23778006.0 | 71.958091 | 189 | 2.198122e+07 | 193.852127 | 1.590235e+06 | 55.385295 | 15.882118 | 4.972672 | 1 |
| 3 | 2.203993 | 2.113146 | -0.424998 | -1.584056 | -5.539428 | -5.599065 | -0.269519 | -3.230361 | -5.071597 | -1.470821 | ... | 10945840.0 | 37.801572 | 136 | 1.071575e+07 | 103.497921 | 2.593951e+05 | 22.377317 | 6.419182 | 2.324330 | 1 |
| 4 | 0.535932 | 1.589788 | -0.389449 | -1.399532 | -2.005868 | -2.518123 | 0.547491 | 0.839877 | -2.527554 | -2.759672 | ... | 30095867.0 | 84.285276 | 236 | 2.640545e+07 | 221.600514 | 2.929839e+06 | 75.145144 | 28.542184 | 6.026236 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 645 | 20.023325 | 16.020371 | 6.290595 | 4.731961 | 4.266542 | 6.531234 | -11.489270 | -5.686753 | 18.477839 | -8.366114 | ... | 5185146.0 | 126.518221 | 172 | 1.603541e+07 | 345.614337 | 1.217490e+06 | 95.318972 | 39.374376 | 6.534350 | 0 |
| 646 | 31.160014 | 12.274358 | 6.142229 | 4.872937 | 4.144671 | 0.125958 | 2.868405 | 7.321399 | 9.279057 | -0.844818 | ... | 5837538.0 | 115.398081 | 147 | 1.381258e+07 | 320.763931 | 2.126342e+05 | 39.811116 | 24.477559 | 12.198242 | 0 |
| 647 | 0.635756 | 4.026442 | 0.969354 | -15.829528 | -19.835517 | 7.931998 | 3.804053 | 1.651805 | 4.298282 | -16.361731 | ... | 5785262.0 | 124.482693 | 166 | 1.453175e+07 | 329.283893 | 1.303496e+06 | 98.613805 | 18.890919 | 3.344671 | 0 |
| 648 | -24.988395 | 20.037430 | 11.400389 | 7.526375 | -10.123523 | 7.438246 | -1.055514 | 0.284052 | -2.857728 | 5.791968 | ... | 5935519.0 | 123.019851 | 161 | 1.562052e+07 | 341.202046 | 4.119846e+05 | 55.443075 | 15.198340 | 5.862354 | 0 |
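Before training, it can be worth sanity-checking the loaded table: the class balance (relevant when interpreting accuracy), the number of missing values, and any non-numeric columns. The helper below, `summarize_dataset`, is hypothetical and not part of the original notebook; it is a minimal pandas sketch demonstrated on a tiny synthetic frame, and in the notebook one would pass `df_final` instead.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame, target: str = "Label") -> dict:
    """Return basic sanity-check information for a feature table with a target column."""
    features = df.drop(columns=[target])
    return {
        "n_rows": len(df),
        "n_features": features.shape[1],
        # Samples per class; a strong imbalance would make accuracy misleading
        "class_counts": df[target].value_counts().sort_index().to_dict(),
        # Total number of missing values across all feature columns
        "n_missing": int(features.isna().sum().sum()),
        # Non-numeric feature columns (these would need encoding before training)
        "non_numeric": list(features.select_dtypes(exclude="number").columns),
    }


# Tiny synthetic frame for illustration only
demo = pd.DataFrame({
    "PC1": [0.1, -0.4, 1.2, None],
    "Energy": [6.7e6, 7.7e7, 2.4e7, 1.1e7],
    "Label": [1, 1, 0, 0],
})
print(summarize_dataset(demo))
```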


## Training Models

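We use PyCaret's setup function to prepare the data for training. With its default settings, setup splits the data into a training set (about 70%) and a hold-out test set (about 30%), stratified by the target, and builds a preprocessing pipeline that imputes missing values and encodes categorical variables (all 172 features here are numeric, so no encoding is needed). We also specify the target variable, Label, which indicates whether the signal is interictal or not, and set session_id=123 so that the split is reproducible.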

In [17]:

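# Name of the binary target column
Target_data = 'Label'
# session_id fixes the random seed so the train/test split is reproducible
clf1 = setup(data=df_final, target=Target_data, session_id=123, verbose=True)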

| | Description | Value |
|---|---|---|
| 0 | Session id | 123 |
| 1 | Target | Label |
| 2 | Target type | Binary |
| 3 | Original data shape | (650, 173) |
| 4 | Transformed data shape | (650, 173) |
| 5 | Transformed train set shape | (454, 173) |
| 6 | Transformed test set shape | (196, 173) |
| 7 | Numeric features | 172 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |
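For intuition, the two default steps most relevant here, a stratified train/test split and mean imputation of missing numeric values, can be approximated with scikit-learn. This is a simplified sketch that assumes PyCaret's defaults are in effect (train_size=0.7, stratified split, mean imputation for numeric features); PyCaret's real pipeline contains more steps, and the data below is synthetic rather than the EEG features.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_final: 650 rows, 5 numeric features, binary label
rng = np.random.default_rng(123)
X = rng.normal(size=(650, 5))
X[rng.random(X.shape) < 0.01] = np.nan  # sprinkle roughly 1% missing values
y = rng.integers(0, 2, size=650)

# Stratified 70/30 split with a fixed seed (analogous to train_size=0.7, session_id=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=123
)

# Mean imputation fitted on the training set only, then applied to both sets
imputer = SimpleImputer(strategy="mean")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

print(X_train_imp.shape, X_test_imp.shape)
```

Fitting the imputer on the training split only keeps test-set statistics from leaking into training, which is the same reason PyCaret fits its pipeline on the training data.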


## Compare models

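We use the compare_models function to train and evaluate the classifiers available in PyCaret's model library. Each model is scored on the training set with stratified 10-fold cross-validation (the default), and the mean scores are displayed in a grid: accuracy, AUC, recall, precision, F1-score, Cohen's kappa, MCC, and training time in seconds. The grid is sorted by accuracy, the default metric, and the function returns the best model, here the Extra Trees Classifier. The grid itself is not returned; it can be retrieved as a DataFrame with pull() immediately after the call.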

In [18]:
best_model = compare_models()

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| et | Extra Trees Classifier | 0.9734 | 0.9935 | 0.965 | 0.9662 | 0.965 | 0.9435 | 0.9443 | 0.188 |
| rf | Random Forest Classifier | 0.9713 | 0.9938 | 0.9595 | 0.9666 | 0.9623 | 0.9391 | 0.94 | 0.31 |
| lightgbm | Light Gradient Boosting Machine | 0.9625 | 0.99 | 0.9425 | 0.9607 | 0.9501 | 0.9201 | 0.9219 | 1.594 |
| gbc | Gradient Boosting Classifier | 0.9602 | 0.9889 | 0.9425 | 0.9548 | 0.9471 | 0.9153 | 0.9172 | 1.846 |
| xgboost | Extreme Gradient Boosting | 0.9582 | 0.9895 | 0.9373 | 0.954 | 0.9443 | 0.9109 | 0.9125 | 0.456 |
| ada | Ada Boost Classifier | 0.9339 | 0.9794 | 0.9141 | 0.9166 | 0.914 | 0.8604 | 0.8621 | 0.45 |
| dt | Decision Tree Classifier | 0.9271 | 0.9275 | 0.9301 | 0.891 | 0.9072 | 0.8474 | 0.852 | 0.114 |
| lr | Logistic Regression | 0.9271 | 0.9783 | 0.9186 | 0.8961 | 0.9056 | 0.8462 | 0.8487 | 0.735 |
| ridge | Ridge Classifier | 0.8919 | 0.912 | 0.9431 | 0.8176 | 0.8725 | 0.7804 | 0.7909 | 0.047 |
| lda | Linear Discriminant Analysis | 0.8898 | 0.9211 | 0.949 | 0.8065 | 0.8703 | 0.7762 | 0.7862 | 0.055 |
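Conceptually, compare_models runs one cross-validation loop per model and ranks the averaged scores. The scikit-learn sketch below reproduces that idea on synthetic data for four of the models in the grid; it illustrates the technique rather than PyCaret's implementation, the scaler added in front of logistic regression is this sketch's own choice, and the printed scores are unrelated to the EEG results.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset standing in for the EEG feature table
X, y = make_classification(n_samples=300, n_features=20, random_state=123)

candidates = {
    "et": ExtraTreesClassifier(random_state=123),
    "rf": RandomForestClassifier(random_state=123),
    "dt": DecisionTreeClassifier(random_state=123),
    # Scaling helps the linear model converge; tree models do not need it
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Stratified 10-fold CV, matching the number of folds used by compare_models by default
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=123)
rows = []
for name, model in candidates.items():
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=["accuracy", "roc_auc", "recall", "precision", "f1"])
    # Average each metric over the 10 folds
    rows.append({
        "Model": name,
        "Accuracy": scores["test_accuracy"].mean(),
        "AUC": scores["test_roc_auc"].mean(),
        "Recall": scores["test_recall"].mean(),
        "Prec.": scores["test_precision"].mean(),
        "F1": scores["test_f1"].mean(),
    })

# Rank by mean accuracy, best first, as in the grid above
grid = pd.DataFrame(rows).sort_values("Accuracy", ascending=False).reset_index(drop=True)
print(grid.round(4))
```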



## Review shapes and keys of the training data



In [None]:
# Training split as stored by setup()
X_train = get_config('X_train')
y_train = get_config('y_train')

print(X_train.columns)  # feature names
print(X_train.shape)    # (n_samples, n_features)
print(y_train.shape)

Index(['PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'PC6', 'PC7', 'PC8', 'PC9', 'PC10',
       ...
       'Complexity', 'Energy', 'Std', 'ZeroCrossings', 'DWT_A_Energy',
       'DWT_A_Std', 'DWT_L1_Energy', 'DWT_L1_Std', 'DWT_L2_Std', 'DWT_L3_Std'],
      dtype='object', length=172)
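Besides the feature names and shapes, one can check that the split preserved the class balance of the full dataset, which is what a stratified split aims for. The two helpers below, `class_proportions` and `max_proportion_gap`, are hypothetical and written for this illustration; the example uses made-up labels, and in the notebook the call would be `max_proportion_gap(df_final['Label'], y_train)`.

```python
import pandas as pd


def class_proportions(y) -> pd.Series:
    """Fraction of samples per class, indexed by class label."""
    return pd.Series(y).value_counts(normalize=True).sort_index()


def max_proportion_gap(y_full, y_subset) -> float:
    """Largest absolute difference in class fractions between two label vectors."""
    full, sub = class_proportions(y_full), class_proportions(y_subset)
    # Align on the union of labels so a class missing from one side counts as 0
    full, sub = full.align(sub, fill_value=0.0)
    return float((full - sub).abs().max())


# Made-up labels: a 60/40 dataset and a subset with the same 60/40 ratio
y_full = [0] * 60 + [1] * 40
y_sub = [0] * 42 + [1] * 28
print(max_proportion_gap(y_full, y_sub))
```

A gap close to zero indicates that the training labels have essentially the same class mix as the full dataset.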


## Visualize diagnostic plots for the best model

In [20]:
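# Confusion matrix on the hold-out test set
plot_model(best_model, plot='confusion_matrix', save=True)

# ROC curves with AUC
plot_model(best_model, plot='auc', save=True)

# Learning curve: score as a function of the number of training samples
plot_model(best_model, plot='learning', save=True)

# Feature importance (requires a model exposing feature importances or coefficients)
plot_model(best_model, plot='feature', save=True)

# Per-class precision, recall and F1-score
plot_model(best_model, plot='class_report', save=True)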

'Class Report.png'
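The confusion matrix, AUC, and feature-importance plots summarize quantities that can also be computed directly. The scikit-learn sketch below fits an Extra Trees classifier (the top model in this run) on synthetic data with hypothetical feature names (`feat_0`, `feat_1`, ...). It assumes the feature plot reflects the model's impurity-based `feature_importances_`, the attribute scikit-learn tree ensembles expose; it is not a reproduction of PyCaret's plotting code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical feature names standing in for the EEG features
X, y = make_classification(n_samples=400, n_features=12, n_informative=4, random_state=123)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=123
)

model = ExtraTreesClassifier(random_state=123).fit(X_train, y_train)

# Confusion matrix on held-out data: rows = true class, columns = predicted class
cm = confusion_matrix(y_test, model.predict(X_test))
# AUC uses the predicted probability of the positive class (label 1)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Impurity-based importances (normalized to sum to 1), sorted from most to least important
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

print(cm)
print(f"AUC = {auc:.3f}")
print(importances.head(5))
```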

## Save the best model

In [21]:

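import joblib

# Persist the best estimator returned by compare_models (Extra Trees in this run);
# the file name is derived from the model class so it always matches the saved model
joblib.dump(best_model, f"{type(best_model).__name__}.joblib")

['ExtraTreesClassifier.joblib']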
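A saved model is only useful if it can be loaded back and behaves identically. The sketch below, on synthetic data and in a temporary directory, dumps a fitted Extra Trees classifier with joblib, reloads it, and checks that the predictions match; the data and file location are assumptions of this illustration.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data and a small fitted model to serialize
X, y = make_classification(n_samples=200, n_features=10, random_state=123)
model = ExtraTreesClassifier(n_estimators=50, random_state=123).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / f"{type(model).__name__}.joblib"
    joblib.dump(model, str(path))        # serialize the fitted estimator
    restored = joblib.load(str(path))    # load it back, e.g. in a later session

# The restored model should reproduce the original predictions exactly
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(path.name, same_predictions)
```

Note that the estimator returned by compare_models does not include the preprocessing fitted by setup. PyCaret's save_model(best_model, 'model_name') stores the whole pipeline instead, and load_model reads it back.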