This repository contains a series of experiments that improved the classification performance of EEG-Spectrogram Data in the Kaggle competition HMS - Harmful Brain Activity Classification.

Cranjis-McB/HMS-Harmful-Brain-Activity-Classification

Description

  • The data consists of 50-second EEG samples plus matched spectrograms covering a 10-minute window centered at the same time. The labels apply to the central 10 seconds.
  • Expert voters assign each sample to one of six categories: Seizure, LPD, GPD, LRDA, GRDA, or Other.
  • The number of expert votes per sample ranges from 1 to 28.
  • The competition metric is the KL-divergence loss between the predicted probabilities and the observed target.
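
To make the metric concrete, here is a minimal NumPy sketch of a mean KL divergence between vote-derived targets and predicted probabilities. The function name `kl_divergence`, the `eps` clipping value, and the example vote counts are assumptions for illustration; the official competition implementation may clip or reduce differently.

```python
import numpy as np

def kl_divergence(target, pred, eps=1e-15):
    """Mean KL(target || pred) over samples; each row is a probability vector."""
    target = np.asarray(target, dtype=float)
    # Clip predictions away from zero so the logarithm stays finite (eps is assumed).
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0)
    # Classes with zero target probability contribute nothing to the sum.
    ratio = np.where(target > 0, target / pred, 1.0)
    per_sample = np.sum(target * np.log(ratio), axis=1)
    return float(per_sample.mean())

# Hypothetical vote counts for two samples over the six classes
# (Seizure, LPD, GPD, LRDA, GRDA, Other).
votes = np.array([[3, 0, 0, 0, 0, 0],
                  [1, 1, 0, 0, 0, 2]], dtype=float)
target = votes / votes.sum(axis=1, keepdims=True)  # normalize votes to probabilities
uniform = np.full_like(target, 1.0 / 6.0)          # an uninformed prediction

print(kl_divergence(target, target))   # a perfect prediction gives zero loss
print(kl_divergence(target, uniform))  # a uniform prediction gives a positive loss
```

Normalizing the vote counts to a probability vector is what lets samples with different numbers of voters share one target format.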

I focused primarily on the spectrogram image data, using both CNN- and Transformer-based approaches to improve performance.

Configuration

Most experiments used the configuration below.

  • Model: EfficientNets
  • Fold: StratifiedGroupKFold (5 folds)
  • Epochs: 6
  • Eval_per_epoch: 2
  • Optimizer: AdamW
  • Learning Rate: 1e-3 (CNN) / 1e-4 (Transformers)
  • Scheduler: One Cycle Policy with MaxLR: 1e-3 (CNN) / 1e-4 (Transformers)
  • Loss: KLDiv Loss

1. Baseline

Our baseline model processes a spectrogram image composed of four panels stacked vertically: LL, LP, RL, and RP.

| Input | OOF-CV | Public LB |
| --- | --- | --- |
| Spectrogram Images | 0.7287 | 0.45 |
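
As a rough illustration of the baseline input, the sketch below stacks four panels along the frequency axis. The panel size (100 frequency bins by 300 time steps) and the random data are placeholders chosen for the example, not values taken from this repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel size: 100 frequency bins x 300 time steps.
n_freq, n_time = 100, 300
LL, LP, RL, RP = (rng.random((n_freq, n_time)) for _ in range(4))

# Stack along the frequency axis so the four panels sit one above another.
baseline_img = np.vstack([LL, LP, RL, RP])

print(baseline_img.shape)  # four panels high, one panel wide
```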

2. Global Features

Rather than using the panel images directly, we extract the following statistics from the four panel images and use them as input to the CNN:

 # Statistics over the four panels
 X_min = np.min([LL, LP, RL, RP])
 X_max = np.max([LL, LP, RL, RP])
 X_mean = np.mean([LL, LP, RL, RP])
 X_var = X_max - X_min  # range (max - min), despite the name

These can be seen as global spectrogram features.
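
The snippet above does not specify an axis, so it is not clear from the text alone how the statistics become an image. One plausible reading, shown here purely as an assumption, is that each statistic is taken element-wise across the four panels and the resulting maps are stacked like the baseline panels. The helper name `global_features` is hypothetical.

```python
import numpy as np

def global_features(LL, LP, RL, RP):
    """Element-wise statistics across the four panels (assumed interpretation)."""
    panels = np.stack([LL, LP, RL, RP], axis=0)  # shape: (4, n_freq, n_time)
    x_min = panels.min(axis=0)
    x_max = panels.max(axis=0)
    x_mean = panels.mean(axis=0)
    x_range = x_max - x_min                      # called X_var in the text above
    # Stack the four statistic maps vertically, mirroring the baseline layout.
    return np.vstack([x_min, x_max, x_mean, x_range])

# Tiny example: constant panels with values 0, 1, 2, 3.
demo_panels = [np.full((2, 3), float(k)) for k in range(4)]
features = global_features(*demo_panels)
print(features.shape)
```

With this reading the output has the same shape as the baseline image, so the same CNN can consume either input.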

Input OOF-CV Public LB
Global Features 0.7324 0.46

3. Ensemble of 1 + 2

An ensemble can be built in two ways:

    1. Model Ensemble: take the weighted sum of the two models' predictions to get the final output.
    2. Input Feature Ensemble: concatenate the input features from 1 and 2, then train a single model on them.

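 # 1. Model Ensemble: weighted sum of the two models' predictions
 pred = 0.5 * pred_1 + 0.5 * pred_2

 # 2. Input Ensemble: concatenate both feature sets, then train one model
 input_features = np.hstack([baseline_features, global_features])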

| Type | OOF-CV | Public LB |
| --- | --- | --- |
| Model Ensemble | NA | 0.42 |
| Input Feature Ensemble | 0.7027 | 0.43 |

4. EEG Spectrograms

Instead of using the Kaggle-provided spectrograms, we generated spectrograms from the EEG data as described in this notebook.

Note that for the baseline model, we concatenated percentile features with the input features. That gave us a good 0.04 boost on CV and 0.01 boost on LB.

 # Percentiles along axis 0 of X
 X_20p = np.percentile(X, q=20, axis=0)
 X_40p = np.percentile(X, q=40, axis=0)
 X_60p = np.percentile(X, q=60, axis=0)
 X_80p = np.percentile(X, q=80, axis=0)
 X_percentiles = np.vstack([X_20p, X_40p, X_60p, X_80p])

 # Append the percentile rows to the input image
 input_img = np.hstack([input_img, X_percentiles])

| Inputs | OOF-CV | Public LB |
| --- | --- | --- |
| Baseline + Percentiles | 0.7104 | 0.45 |
| Global Features | 0.7537 | 0.46 |
| Model Ensemble | NA | 0.42 |

5. Kaggle + EEG Ensemble

This is an ensemble of the models from 3 and 4.

| Type | OOF-CV | Public LB |
| --- | --- | --- |
| Kaggle Ensemble | NA | 0.42 |
| EEG Ensemble | NA | 0.42 |
| Kaggle + EEG Ensemble | NA | 0.38 |

6. Vote-Weighted KLDiv Loss

So far we used only the KL-Divergence Loss as the cost function, ignoring the total number of expert votes for a given sample.

The idea here is that samples with more votes are more reliable, so we weight each sample's KLDiv Loss by its total number of votes:

Loss = KLDiv * torch.log(total_votes + 1)

This alone gave us a 0.02 boost in CV and a 0.02 boost in LB. We further added the percentile features described in 4 and used Same-Class CutMix augmentation to get an additional 0.03 boost in CV and 0.01 boost in LB over the baseline described in 1.
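
The vote-weighted loss can be sketched in NumPy as follows. The per-sample KL term is multiplied by `log(total_votes + 1)` as in the formula above; the mean reduction over the batch, the `eps` value, and the function name `vote_weighted_kl` are assumptions, since the text does not state them.

```python
import numpy as np

def vote_weighted_kl(target, pred, total_votes, eps=1e-15):
    """Mean over samples of KL(target || pred) * log(total_votes + 1)."""
    target = np.asarray(target, dtype=float)
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0)
    # Classes with zero target probability contribute nothing to the sum.
    ratio = np.where(target > 0, target / pred, 1.0)
    kl = np.sum(target * np.log(ratio), axis=1)
    # Samples with more expert votes receive a larger weight.
    weights = np.log(np.asarray(total_votes, dtype=float) + 1.0)
    return float(np.mean(kl * weights))

# Same target and prediction, different vote counts.
target = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
pred = np.full((1, 6), 1.0 / 6.0)
loss_few_votes = vote_weighted_kl(target, pred, [1])
loss_many_votes = vote_weighted_kl(target, pred, [28])
print(loss_few_votes, loss_many_votes)
```

The logarithm keeps the weight growing with the vote count while damping the gap between, say, 10 and 28 votes.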

| Input | OOF-CV | Public LB |
| --- | --- | --- |
| Spectrogram + Percentiles | 0.6767 | 0.42 |
| Global Features | 0.6971 | 0.42 |
| Ensemble | NA | 0.40 |

Table: Kaggle Spectrograms

7. Global Normalization

So far we have been normalizing each spectrogram image by its own mean and standard deviation, as shown below.

# Per-image normalization (NaN-safe)
ep = 1e-6  # small constant to avoid division by zero (value assumed)
m = np.nanmean(img.flatten())
s = np.nanstd(img.flatten())
img = (img - m) / (s + ep)

Instead of doing this, we derived a single mean and standard deviation from the training data and used them to normalize every image. This gave us a good 0.04 boost in CV and 0.02 boost in LB. (Thanks to Sandeep Anna for suggesting this idea.)
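
A minimal sketch of this global normalization, assuming the statistics are pooled over all pixels of all training images and that NaNs are ignored as in the per-image version. The helper names `fit_global_stats` and `normalize_global` and the `eps` value are hypothetical.

```python
import numpy as np

def fit_global_stats(train_imgs):
    """Pool every pixel of every training image and return (mean, std), ignoring NaNs."""
    values = np.concatenate([np.asarray(img, dtype=float).ravel() for img in train_imgs])
    return float(np.nanmean(values)), float(np.nanstd(values))

def normalize_global(img, mean, std, eps=1e-6):
    """Normalize one image with dataset-level statistics instead of its own."""
    return (np.asarray(img, dtype=float) - mean) / (std + eps)

# Tiny training set with one missing value.
train_imgs = [np.array([[0.0, 2.0], [4.0, np.nan]]), np.array([[6.0, 8.0]])]
global_mean, global_std = fit_global_stats(train_imgs)
normalized = normalize_global(train_imgs[1], global_mean, global_std)
print(global_mean, global_std)
```

Unlike per-image normalization, this keeps differences in overall intensity between images visible to the model. With cross-validation, the statistics would normally be computed from the training folds only.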

| Input | OOF-CV | Public LB |
| --- | --- | --- |
| Spectrograms | 0.6355 | 0.40 |
| Global Features | 0.6566 | 0.41 |
| Ensemble | NA | 0.38 |

Table: Kaggle Spectrograms

8. Mosaic Warmup + xloss

  • Mosaic Warmup: We combined 4 spectrogram images into one image and labeled it with the average of their labels. We used these images and labels for 3 epochs of warmup training.

  • xloss: We further modified the loss function to

    Loss = KLDiv * torch.clamp(total_votes, 10)
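
The Mosaic Warmup bullet above does not say how the four images are arranged. The sketch below assumes a 2x2 grid in which each image is first downsampled by a factor of 2 with plain striding, so the mosaic keeps the original size. The function name `make_mosaic` and this layout are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def make_mosaic(imgs, labels):
    """Tile four equally shaped 2-D images into a 2x2 grid and average their labels."""
    assert len(imgs) == 4 and len(labels) == 4
    # Downsample each image by 2 along both axes (simple striding; an assumption).
    small = [np.asarray(im)[::2, ::2] for im in imgs]
    top = np.hstack([small[0], small[1]])
    bottom = np.hstack([small[2], small[3]])
    mosaic = np.vstack([top, bottom])
    # The mosaic label is the mean of the four probability vectors.
    label = np.mean(np.asarray(labels, dtype=float), axis=0)
    return mosaic, label

# Four constant images and one-hot labels over the six classes.
imgs = [np.full((4, 6), float(k)) for k in range(4)]
labels = np.eye(6)[[0, 0, 1, 5]]
mosaic, mosaic_label = make_mosaic(imgs, labels)
print(mosaic.shape, mosaic_label)
```

Because each input label sums to 1, the averaged label is still a valid probability vector and can be used directly with the KLDiv loss.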

| Input | OOF-CV | Public LB |
| --- | --- | --- |
| Spectrograms | 0.6290 | 0.37 |
| Global Features | 0.6402 | 0.39 |
| Ensemble | NA | 0.36 |

Table: Kaggle Spectrograms

9. 2-Stage Training

  • We divided the samples into two categories based on the total number of expert votes.

    1. 1-3 votes => Noisy Labels
    2. 4-28 votes => Good Labels

In Stage 1, we trained the model only on the Noisy Labels (1-3 votes). In Stage 2, we fine-tuned the model on the Good Labels (4-28 votes).
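
The split by vote count can be sketched as an index selection. The threshold of 4 votes comes from the list above; the function name `split_by_votes` and the example vote counts are made up for illustration.

```python
import numpy as np

def split_by_votes(total_votes, threshold=4):
    """Return (noisy_idx, good_idx): samples below the threshold vs. at or above it."""
    total_votes = np.asarray(total_votes)
    noisy_idx = np.flatnonzero(total_votes < threshold)   # 1-3 votes: Stage 1
    good_idx = np.flatnonzero(total_votes >= threshold)   # 4-28 votes: Stage 2
    return noisy_idx, good_idx

# Hypothetical vote counts for six samples.
total_votes = np.array([1, 3, 4, 12, 28, 2])
noisy_idx, good_idx = split_by_votes(total_votes)
print(noisy_idx, good_idx)
```

Stage 1 would then train on the samples at `noisy_idx`, and Stage 2 would continue from those weights using only the samples at `good_idx`.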

| Input | OOF-CV | Public LB |
| --- | --- | --- |
| Kaggle Ensemble | 0.4142 | 0.34 |
| EEG Ensemble | 0.4608 | 0.37 |
| Final Ensemble | NA | 0.32 |

Table: Ensemble

Competition Result

  • Our final solution ranked 329th out of 2768 candidates.
