
Classification Model & Pipeline


ℹ️ To view the diagrams of the classification pipeline and server architecture, head to the Server architecture section.

🔬 Scientific and technical context of sleep classification

Currently, the standard for sleep stage classification in the medical field is manual classification of signals over time according to the AASM classification guide. As mentioned in the context of the project, the inter-scorer agreement amongst the technicians who perform this work is 82.6% [Rosenberg and Van Hout, 2013]. This relatively low agreement may explain why automatic classification methods have not yet been able to penetrate this field.

However, some algorithms do attempt to improve on these results. SleepEEGNet is an example of an artificial neural network that approaches the state of the art. It uses a convolutional neural network to perform automatic feature extraction on the EEG data, followed by a bidirectional recurrent neural network that takes the temporal information of the signals into account. This type of network achieves an accuracy of 84.26% [Mousavi et al., 2019] on the same dataset we used. However, random forests are recognized as the most efficient for this type of classification and can reach up to 93% accuracy [Fiorillo et al., 2019].

Like most of the papers mentioned above, the classification algorithm developed for this project uses only the EEG channels of the data provided in Sleep-EDF. The objective is to provide a classification method that takes less time than the current clinical standard and that simplifies the biosignal acquisition process. In the following sections we explain how the classification algorithm we developed works and analyse its performance.


21 hours of EEG recording for the Fpz-Cz channel

21 hours of EEG recording for the Pz-Oz channel

📚 Dataset

The dataset comes from the public Sleep-EDF dataset on PhysioNet [Kemp, 2018]. It provides two electroencephalographic signals (Fpz-Cz and Pz-Oz) sampled at 100 Hz, as well as the associated hypnograms (time series of sleep stages) obtained from a polysomnographic analysis performed manually by a clinical technician according to the 1968 Rechtschaffen and Kales (R&K) scoring manual.

To be consistent with the new scoring standards stated by the AASM, we merged stages 3 and 4. The data also include an EOG signal, a submental (chin) EMG signal, oro-nasal airflow, rectal temperature and the events that took place during the night. However, we only use the electroencephalographic data, as we want to reduce the complexity of using our algorithm and facilitate signal acquisition.

The dataset contains 153 nights of sleep distributed among 83 subjects. Note that for each hypnogram we have the following metadata: the age and sex of the subject, as well as the date and start time of the recording and the bedtime. As defined by the AASM, the hypnograms decompose the night into 5 stages: Wake, N1, N2, N3 and REM.

πŸ” Feature extraction

In order to classify sleep stages, certain features were extracted from the EEG signals to facilitate discrimination between the different stages. To do so, we used signal-processing techniques selected from this notebook.


Table 1: Frequency range of each EEG sub-band

⌛ Time-domain features

Time-domain features were extracted from each of our two EEG signals (Fpz-Cz and Pz-Oz) for each 30-second epoch. First, we computed the four statistical moments of the epoch: mean, variance, skewness and kurtosis. Next, we computed the zero-crossing rate of the mean-centered signal, followed by the Hjorth mobility and complexity parameters.
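As a rough illustration, a minimal sketch of these time-domain features is given below (not the project's actual code); it assumes a NumPy array `epoch` holding one 30-second epoch sampled at 100 Hz.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def time_domain_features(epoch):
    """Time-domain features for a single 30-second EEG epoch (1-D NumPy array)."""
    mean = np.mean(epoch)
    variance = np.var(epoch)
    skewness = skew(epoch)
    kurt = kurtosis(epoch)

    # Zero-crossing rate of the mean-centered signal.
    centered = epoch - mean
    zcr = np.sum(np.abs(np.diff(np.sign(centered))) > 0) / len(epoch)

    # Hjorth parameters: mobility and complexity.
    diff1 = np.diff(epoch)
    diff2 = np.diff(diff1)
    mobility = np.sqrt(np.var(diff1) / np.var(epoch))
    complexity = np.sqrt(np.var(diff2) / np.var(diff1)) / mobility

    return {
        "mean": mean, "variance": variance, "skewness": skewness,
        "kurtosis": kurt, "zcr": zcr,
        "hjorth_mobility": mobility, "hjorth_complexity": complexity,
    }
```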

〽️ Frequency domain features

Each epoch was also transformed into the frequency domain using the Fourier transform and the Welch method. Windows of 256 samples were used and their periodograms averaged to obtain the power spectral density of the signal.

This change of domain allowed us to extract features such as the absolute power of each of the frequency bands defined in Table 1, as well as the relative power of each of these bands. Another interesting feature extracted from the frequency domain is the spectral edge frequency difference, which has been shown to better discriminate REM from the other sleep stages [Imtiaz and Rodriguez-Villegas, 2014].
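A minimal sketch of these frequency-domain features is shown below; the band limits and the choice of edge fractions (95% and 50%) are illustrative assumptions, not necessarily the exact values used in the project.

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import simpson

# Illustrative sub-band limits in Hz (see Table 1 for the project's definitions).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def frequency_domain_features(epoch, fs=100):
    """Band powers and spectral edge frequency difference for one epoch."""
    # Welch method: 256-sample windows, periodograms averaged into one PSD.
    freqs, psd = welch(epoch, fs=fs, nperseg=256)

    features = {}
    total_power = simpson(psd, x=freqs)
    for name, (low, high) in BANDS.items():
        mask = (freqs >= low) & (freqs < high)
        abs_power = simpson(psd[mask], x=freqs[mask])
        features[f"{name}_abs_power"] = abs_power
        features[f"{name}_rel_power"] = abs_power / total_power

    # Spectral edge frequency: frequency below which a given fraction of the
    # total spectral power lies; the difference (e.g. SEF95 - SEF50) is the feature.
    cumulative = np.cumsum(psd) / np.sum(psd)
    sef95 = freqs[np.searchsorted(cumulative, 0.95)]
    sef50 = freqs[np.searchsorted(cumulative, 0.50)]
    features["sef_diff"] = sef95 - sef50
    return features
```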

🤿 Features based on the frequency sub-bands in the time domain

Butterworth IIR band-pass filters were applied to the original signals in order to isolate the different frequency sub-bands. The average energy of each filtered sub-band was then used as an additional feature for classification.
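A minimal sketch of this sub-band energy feature is shown below; the filter order and the `low`/`high` band edges are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def subband_energy(epoch, low, high, fs=100, order=4):
    """Average energy of the signal filtered to the [low, high] Hz sub-band."""
    nyquist = fs / 2
    b, a = butter(order, [low / nyquist, high / nyquist], btype="band")
    filtered = filtfilt(b, a, epoch)   # zero-phase Butterworth band-pass filtering
    return np.mean(filtered ** 2)      # average energy of the filtered sub-band
```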

📈 Methodology for classification model benchmarking and selection

The classification model we have chosen is a voting classifier composed of a random forest (RF) and a support vector classifier (SVC). Soft voting is used: the predicted class is the one with the highest weighted sum of the class probabilities output by the two algorithms, with weights of 0.837 for the random forest and 0.163 for the SVC. The methodology used to choose this algorithm was first to explore several different classifiers, selecting the best hyperparameters for each through grid-search cross-validation, and then to compare them by observing the metrics obtained on the test set.
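A minimal sketch of such a voting classifier with scikit-learn is given below, using the weights quoted above; all other hyperparameters are placeholders rather than the project's tuned values.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC

voting_clf = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probabilities needed for soft voting
    ],
    voting="soft",           # sum the predicted class probabilities of both estimators...
    weights=[0.837, 0.163],  # ...weighted by the importance given to each estimator
)
# voting_clf.fit(X_train, y_train); voting_clf.predict(X_test)
```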

Since we are dealing with hierarchical data, we had to take into account the individuals that make up the dataset in order to avoid data leakage across our different splits. The train/test split is therefore done by randomly selecting subjects (and all of their sleep sequences) from the original dataset. Similarly, for cross-validation, we specify groups so that the training and validation sets never contain data from the same subject at the same time.
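A minimal sketch of this subject-aware splitting with scikit-learn is shown below; `X`, `y` and `subjects` are placeholder names for the feature matrix, the labels and the subject identifier of each epoch.

```python
from sklearn.model_selection import GroupShuffleSplit, GroupKFold

# Train/test split: whole subjects (all of their nights) go to one side only.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=subjects))

# Cross-validation on the training set, again grouped by subject so that no
# subject appears in both the training and validation folds.
cv = GroupKFold(n_splits=5)
for fold_train_idx, fold_val_idx in cv.split(X[train_idx], y[train_idx],
                                             groups=subjects[train_idx]):
    pass  # fit/evaluate a candidate model on each fold
```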

We initially targeted a few classification algorithms, namely random forest, k-nearest neighbors, naïve Bayes, support vector classification and the voting classifier. The results obtained for each of the models can be observed here. For each model, the pipeline also includes normalization of the continuous-valued features. At each step of the hyperparameter search, we also evaluated whether dimensionality reduction gave the best trade-off, i.e. a clear improvement in training time with only a minor decrease in agreement. For the dimensionality reduction, we evaluated whether to proceed with linear discriminant analysis or principal component analysis.
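A minimal sketch of one candidate pipeline from this benchmark is given below: normalization, optional dimensionality reduction (PCA or LDA) and a classifier, tuned by a grid search that respects the subject groups. The parameter grid is illustrative, not the project's actual search space.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

pipeline = Pipeline([
    ("scaler", StandardScaler()),   # normalization of the continuous-valued features
    ("reduce", "passthrough"),      # optionally replaced by PCA or LDA in the grid
    ("clf", RandomForestClassifier(random_state=42)),
])

param_grid = {
    "reduce": ["passthrough", PCA(n_components=10), LinearDiscriminantAnalysis()],
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 20],
}

search = GridSearchCV(pipeline, param_grid, cv=GroupKFold(n_splits=5))
# search.fit(X_train, y_train, groups=subjects_train)
```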

🔗 Hidden Markov model

In order to automatically classify the sleep stages, one can exploit the temporal dependency of the stages throughout a sleep sequence. Indeed, the sequence involves cycles between the different sleep stages previously listed, and these cycles follow patterns that can be observed on both larger and smaller scales. Thus, the literature shows promising results for models with recurrent neural networks as well as for sequential probabilistic models. One such model is the hidden Markov model. Like a first-order Markov model, it assumes that each hidden state depends only on the hidden state at the previous time step. However, the hidden state is not directly measured; rather, it is the emissions that are observed, which themselves depend solely on the hidden state at the corresponding time step.

In the literature, different approaches are defined for applying a hidden Markov model to the output of a classifier, or to a network that computes the probabilities of each stage based only on the EEG signals [Jiang et al., 2019] [Malafeev et al., 2018]. The chosen implementation consists in defining the hidden state as the sleep stage manually scored by the medical electrophysiology technician.

The measured variables are the stages predicted by the classifier, in our case the voting classifier composed of a random forest and an SVC. The transition and start probability tables are computed from the scoring data of the training and validation sets. The emission probability table is computed on the validation sets across all folds. Finally, the Viterbi algorithm is applied in order to find the most likely hidden state sequence given the emissions observed on our test set.
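A minimal sketch of the Viterbi decoding step is given below, assuming the 5 stages and placeholder probability tables; the start and transition tables would be estimated from the manually scored training data and the emission table from the validation folds, as described above.

```python
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]

def viterbi(observations, start_p, trans_p, emit_p):
    """Most likely hidden stage sequence given the observed (predicted) stages.

    observations: array of predicted stage indices from the voting classifier
    start_p[i]:   P(first hidden stage = i)
    trans_p[i,j]: P(hidden stage j at t | hidden stage i at t-1)
    emit_p[i,k]:  P(classifier predicts stage k | true hidden stage i)
    """
    n_states, T = len(start_p), len(observations)
    log_delta = np.full((T, n_states), -np.inf)
    backpointer = np.zeros((T, n_states), dtype=int)

    log_delta[0] = np.log(start_p) + np.log(emit_p[:, observations[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = log_delta[t - 1] + np.log(trans_p[:, j])
            backpointer[t, j] = np.argmax(scores)
            log_delta[t, j] = scores[backpointer[t, j]] + np.log(emit_p[j, observations[t]])

    # Backtrack to recover the most likely sequence of hidden stages.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(log_delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = backpointer[t + 1, path[t + 1]]
    return path
```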

πŸ† Results

Full results and performance analysis are part of our React web app, available here.

Currently, there are still some problems with the way we classify the data. The Sleep-EDF dataset, which is the standard in the literature for automatic sleep classification, was scored according to the 1968 R&K manual and uses 6 classes (W, REM, S1, S2, S3 and S4). However, the manual generally used today, the AASM manual (2007), comprises 5 classes (W, REM, N1, N2, N3); it suggests simply grouping the R&K classes S3 and S4 together to form the AASM class N3. In addition, as stated before, the inter-scorer agreement amongst the electrophysiology technicians who perform manual sleep scoring is 82.6% [Rosenberg and Van Hout, 2013]. In order to obtain a little more information on the quality of our classification, we asked the electrophysiologist to score one night of sleep from the dataset according to the standards set out by the AASM. We then compared the results obtained with those from the dataset, which were scored using the R&K guide. The video discussion (in French) with the expert is available here.