[MRG] USLEEP model #282

Merged: 88 commits into braindecode:master on Sep 10, 2021

Conversation

tgnassou
Contributor

Add USleep model for sleep staging
Joint work with @l-omar-chehab

l-omar-chehab and others added 30 commits June 21, 2021 18:21
…taset` to return sequences of windows

- adding modified sleep staging examples that work on sequences of EEG windows
- adding a method `embed` to `SleepStagerChambon2018` class to reuse it as a feature extractor in the sleep staging on sequences examples
- adding tests
… test

- updating sleep_staging_sequences example to use sampler instead of SequenceDataset
- adding seq_target_transform property to BaseConcatDataset to transform the targets when extracting sequences of SequenceWindowsDataset
- adding functionality in BaseConcatDataset to index with a list of indices, such that a sequence of windows and targets can be returned
- fixing SleepPhysionet bug where cropping would cause an error with some recordings
- changing hyperparameters of sleep staging example to improve performance in a few epochs
- adding more representative splitting functionality in sleep staging example
- using class weights for imbalanced classes in sleep staging example
…ecode into sequence-dataset

Merge usleep + sequence dataset
@hubertjb
Collaborator

hubertjb commented Jul 8, 2021

I just had a chat with @tgnassou and @l-omar-chehab. There are three things that still need to be done in this PR:

  1. Clarify which hyperparameters were actually used in the USleep paper. The paper, the Supplementary Materials and the keras implementation all have different descriptions of how the numbers of convolutional channels are computed (a small sketch comparing the three rules follows this list). A quick summary:

    • Paper (p.8): c_t = floor(c_{t-1} * np.sqrt(2)); c_1 = 5
    • Suppl. (p.32): c_t = floor(c_{t-1} * np.sqrt(1.67)); c_1 = 6 (doesn't seem to fit with the values in the table)
    • Keras : c_t = floor(h_t * np.sqrt(2)) where h_t = floor(h_{t-1} * np.sqrt(2)); h_1 = 5

@tgnassou and @l-omar-chehab will reach out to the authors to ask them which version they ended up using.

  2. Make sure the model can be used with more than just one fixed input sampling rate. Right now the kernel size and maxpool size are hardcoded to avoid previously seen problems with the dimensions of layers. It looks like the actual problem is that the maxpool size and the depth (i.e. number of blocks) need to be compatible. @tgnassou is looking at that.

  3. We need another test that compares the dimensions of the model to those reported in the paper.
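
For reference, here is a minimal sketch (not from the PR; the helper name is ours) comparing the three channel-count rules listed in point 1:

```python
import numpy as np

def channel_counts(n_blocks, variant="paper"):
    """Per-block convolutional channel counts under the three rules above."""
    if variant == "paper":            # c_t = floor(c_{t-1} * sqrt(2)), c_1 = 5
        c, growth = 5, np.sqrt(2)
    elif variant == "supplementary":  # c_t = floor(c_{t-1} * sqrt(1.67)), c_1 = 6
        c, growth = 6, np.sqrt(1.67)
    elif variant == "keras":          # c_t = floor(h_t * sqrt(2)), h_t = floor(h_{t-1} * sqrt(2)), h_1 = 5
        h, counts = 5, []
        for _ in range(n_blocks):
            h_next = int(np.floor(h * np.sqrt(2)))
            counts.append(h_next)     # c_t equals h_{t+1}, i.e. one extra sqrt(2) step
            h = h_next
        return counts
    else:
        raise ValueError(variant)
    counts = [c]
    for _ in range(n_blocks - 1):
        c = int(np.floor(c * growth))
        counts.append(c)
    return counts

print(channel_counts(6, variant="paper"))  # [5, 7, 9, 12, 16, 22]
```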

@hubertjb
Collaborator

hubertjb commented Jul 8, 2021

Also, to answer your previous comment @robintibor : crop_wake_mins actually shouldn't be decreased below 30 (it's the typical value used in the literature). I think at this point the data is really as small as it can get, unless we want to only use a few examples from the two sleep recordings that are used in the example.

@robintibor
Contributor

Maybe a clarification regarding examples as I see it @hubertjb
Obviously the nicest thing would be to have the best-performing pipeline as the examples, so people can directly use the code and expect great results, while at the same time the code is validated to run by the CI.
However, this would take too long on the CI, and in the end we decided that:

  • all decoding examples will run in CI
  • the code should be as close as possible to a good pipeline, so users are not confused into running suboptimal hyperparameters while expecting good results
  • to ensure fast runs on CI, we drastically reduce data and epochs, even if this makes the example nonsensical as is, and we clearly note that these are not useful values, indicating what the useful values are

This way, we can make sure the code always runs, is fast enough on CI, and should prevent users from mistakenly adopting suboptimal pipelines. And it is acceptable if, due to the reduced data and number of epochs, the example does not learn anything as is (e.g., our trialwise decoding example (https://braindecode.org/master/auto_examples/plot_bcic_iv_2a_moabb_trial.html) is barely above chance on train and exactly at chance on valid at the end of training).

Otherwise, with examples that are "more realistic" but with suboptimal hyperparameters, we have seen users adopt suboptimal hyperparameters in their training and expect good results.

@tgnassou
Contributor Author

This PR is ready to merge

@tgnassou changed the title from [WIP] USLEEP model to [MRG] USLEEP model on Jul 16, 2021
depth = 10
time_conv_size = 9 # 0.09s at sfreq = 100 Hz
max_pool_size = 2 # 0.02s at sfreq = 100 Hz
n_time_filters = 5
Collaborator

If we expose this parameter in the init, we'll need to add a test to make sure it works for different values, no? No strong feeling here. @hubertjbanville thoughts?
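
A sketch of the kind of parametrized test being suggested; the constructor arguments are assumptions based on the snippets in this thread, so the actual signature of `USleep` may differ:

```python
import pytest
import torch

from braindecode.models import USleep  # the model added in this PR


@pytest.mark.parametrize("depth", [8, 10, 12])
def test_usleep_works_for_different_depths(depth):
    # Assumed setup: 2 EEG channels, 35 windows of 30 s sampled at 100 Hz.
    sfreq, input_size_s = 100, 35 * 30
    model = USleep(n_classes=5, depth=depth, input_size_s=input_size_s,
                   with_skip_connection=True, apply_softmax=False)
    x = torch.randn(1, 2, input_size_s * sfreq)  # (batch, channels, time)
    y = model(x)
    assert y.shape[0] == 1  # forward pass runs and keeps the batch dimension
```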

@@ -209,6 +209,7 @@
with_skip_connection=True,
n_classes=n_classes,
input_size_s=input_size_samples / sfreq,
apply_softmax=True
Collaborator

You need to use nn.CrossEntropyLoss here because you use class weights? If so, can you add a comment
so it's clear why you need to change a default parameter in a first example?

Contributor

What about these questions @tgnassou? Please do add a comment about the depth=12 line before the model creation, so it shows up clearly in the example doc why we reduce it here. I am also confused about apply_softmax=True combined with nn.CrossEntropyLoss, as the cross-entropy loss doc states:
https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

This criterion combines LogSoftmax and NLLLoss in one single class.

[...]

The input is expected to contain raw, unnormalized scores for each class.
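
For context, a minimal standalone sketch (not from the PR) showing why nn.CrossEntropyLoss expects raw logits, i.e. why applying softmax beforehand is redundant:

```python
import torch
from torch import nn

logits = torch.randn(2, 5)     # hypothetical scores: 2 windows, 5 sleep stages
targets = torch.tensor([0, 3])
criterion = nn.CrossEntropyLoss()

# Correct: raw, unnormalized scores; the loss applies log-softmax internally.
loss_ok = criterion(logits, targets)

# Problematic: softmax is effectively applied twice, compressing the scores
# and distorting the gradients.
loss_double_softmax = criterion(torch.softmax(logits, dim=1), targets)

print(loss_ok.item(), loss_double_softmax.item())  # the values generally differ
```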

Contributor

Did you see this comment here @hubertjb ?


.. warning::
The example is written to have a very short execution time.
This number of epochs is here too small and very few recordings are used.
Collaborator

Suggested change
- This number of epochs is here too small and very few recordings are used.
+ The number of epochs is here too small and very few recordings are used.

.. warning::
The example is written to have a very short execution time.
This number of epochs is here too small and very few recordings are used.
To obtain competitive results you need to use more data and more epochs.
Collaborator

Given what you observe, can you be more specific? How many epochs will lead to
what accuracy (roughly)? Something like: taking for example at least 30 recordings
for training and doing 40 epochs should lead to a balanced accuracy close to 0.8.

Collaborator

can you also put the same warning in plot_sleep_staging.py ?

Contributor

Yes, and also mention the change of the depth parameter, in case that was done for execution time @tgnassou

@robintibor
Contributor

See the comments by me and Alex. Also, @tgnassou, I invited you to be a maintainer so you don't always have to wait for me to approve workflows. Could you accept this invitation?

@hubertjb mentioned this pull request on Jul 24, 2021
@robintibor
Contributor

So I think the loss comment from me above (#282 (comment)) definitely needs addressing. Is somebody still actively working on it, or do we need to find somebody? @hubertjb @tgnassou @agramfort

@tgnassou
Contributor Author

tgnassou commented Aug 9, 2021

I am on holidays, that's why I'm not very active these days. But I will continue the work in September! Is that ok for you @robintibor?

@robintibor
Contributor

Ok, my plan was to have a release around the end of August, but maybe we can shift to mid-September anyway; then this one could be included as well.

@hubertjb
Collaborator

hubertjb commented Aug 9, 2021

I'm still working on it and should have results soon @robintibor

… number of filters of each layer, first conv layer in decoder blocks)

- defining kernel size based on sfreq
…an even number

- adding test for usleep number of parameters
@agramfort
Collaborator

@hubertjbanville nice! Can you just share a learning/convergence curve before we merge this? Just to check that it's learning well and reaching good performance, e.g. vs. Chambon et al. 🙏

@hubertjb
Collaborator

Yes, I'll share that asap!

So far I can say that the balanced sequence sampler does seem to speed up learning a bit. The USleep model, particularly in its depth=12 version, is massively overparametrized though, especially given a smallish dataset like Sleep Physionet. Reducing the depth and adding some weight decay helps, but SleepStagerChambon2018 learns much more quickly and so far reaches higher test performance.

Also I did see a minor improvement in validation performance when combining softmax + CrossEntropyLoss, but so far not as much as what @tgnassou reported.

More to come soon!

@hubertjb
Collaborator

hubertjb commented Sep 7, 2021

Trying to conclude on this @robintibor @agramfort @tgnassou

  1. U-Sleep is much harder to train on the "small-ish" SleepPhysionet. I found some configurations that could somewhat learn more quickly (e.g. model depth=8, weight decay=5e-5, learning rate=1e-5), but they never go much above 50% test balanced accuracy. SleepStagerChambon2018 in contrast can easily reach >70% in under 100 epochs. Using really low learning rates (e.g. 1e-7 like in the paper) does yield very smooth learning curves, but we would have to train models for a lot more than the 100 epochs I tested on to get interesting performance.

  2. The balanced sampler seems to help quite a lot. The models learn faster and reach slightly better test performance. Here are some learning curves obtained with SleepStagerChambon2018:

[image: learning curves obtained with SleepStagerChambon2018]

  3. The self-ensembling also does help, with gains of 5-10% balanced accuracy in my limited experiments on (overfitting) U-Sleep.

Overall, I'm confident the model implementation is accurate. I went through the original keras implementation and made sure the dimensions and the number of parameters were exactly the same (there are now unit tests for these). The model does not learn as well on SleepPhysionet, but my guess is this is an issue with the scale of the data and the training budget rather than the model itself.
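
As an aside, a minimal sketch of the kind of parameter-count check mentioned above (the helper is ours, not from the PR's unit tests):

```python
from torch import nn

def count_trainable_params(model: nn.Module) -> int:
    # Sum the sizes of all trainable tensors, to compare against the count
    # reported by the reference keras implementation.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```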

@agramfort
Collaborator

Great summary. I suggest we merge. Thanks @hubertjb for finishing this, and @tgnassou and @l-omar-chehab for the groundwork.

@tgnassou
Contributor Author

tgnassou commented Sep 7, 2021

Very nice! Thank you @hubertjb!

@agramfort
Collaborator

@robintibor the green button is yours!

@robintibor
Contributor

There were some conflicts due to another PR merge, which I have resolved.
More importantly @hubertjb, it seems the comment regarding applying softmax and using CrossEntropyLoss has not yet been addressed (#282)? CrossEntropyLoss will apply softmax again, so what is happening there is not correct, I think... Maybe apply_softmax=False could already solve it? Or do I have some misunderstanding? :)

@hubertjb
Collaborator

hubertjb commented Sep 9, 2021

Thanks @robintibor for fixing the conflicts and for pointing that out! I forgot to change that line - it should indeed be apply_softmax=False.

@robintibor merged commit fa6042b into braindecode:master on Sep 10, 2021
@robintibor
Contributor

Great stuff, thanks @tgnassou @l-omar-chehab @hubertjb , merged now

@agramfort
Collaborator

awesome team effort @tgnassou @l-omar-chehab @hubertjb 🚀 🚀 🚀 🚀 👏 👏 👏 👏
