
WIP: added unified validation scheme #397

Merged
merged 9 commits into braindecode:master on Aug 20, 2022

Conversation

martinwimpff
Collaborator

Added a unified validation scheme example as discussed in #378
This tutorial shares some similarities with #349
@bruAristimunha: let me know what you think about this first draft.

@codecov

codecov bot commented Aug 4, 2022

Codecov Report

Merging #397 (b099a1a) into master (337d4c8) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #397   +/-   ##
=======================================
  Coverage   82.63%   82.63%           
=======================================
  Files          53       53           
  Lines        3738     3738           
=======================================
  Hits         3089     3089           
  Misses        649      649           

@bruAristimunha
Collaborator

Hi @martinwimpff,

Thank you so much for your PR!

Here is the rendered version:
https://output.circle-artifacts.com/output/job/a34faa77-933e-41f7-8a6f-4905c1fe2ef4/artifacts/0/dev/auto_examples/plot_unified_validation_scheme.html#sphx-glr-auto-examples-plot-unified-validation-scheme-py

I am still revising it, but it is already pretty cool! I will try to add a figure to make it easier to differentiate the three schemes.

Collaborator

@sliwy sliwy left a comment

Great job @martinwimpff @bruAristimunha! To me the description is clear and I hope people will read it before doing ML for EEG. My comments are mainly typos and some reformulations.

A visualization of the CV schemes would be highly appreciated.

One more thing: this tutorial and plot_hyperparameter_tuning_with_scikit-learn.py are now quite similar. Maybe we can add hyperparameter search to the title of this one and keep just this tutorial? What do you think?

Collaborator

@bruAristimunha bruAristimunha left a comment

It is very good! I tried to give some small suggestions while trying not to change the text. Feel free to accept the suggestions or not. I'll try to put together some images to illustrate the options.

Co-authored-by: Maciej Sliwowski <maciejo988@wp.pl>
@bruAristimunha
Collaborator

bruAristimunha commented Aug 4, 2022

Hi @martinwimpff,

For the tutorial, I was wondering if we could add something like this:

[schema_train_test image]

It's a vectorized image, so it can probably be rendered larger in the tutorial.

@martinwimpff
Collaborator Author

@sliwy: first of all, thanks a lot for your review!

> One more thing: this tutorial and plot_hyperparameter_tuning_with_scikit-learn.py are now quite similar. Maybe we can add hyperparameter search to the title of this one and keep just this tutorial? What do you think?

I agree with that; there shouldn't be two tutorials. The new title could be something like: How to train, test and tune your model. But I am open to your suggestions ;)

As you suggested, I added additional parts for options 2 and 3 that do not use GridSearch.

@bruAristimunha: thanks for your suggestions as well; I added them.
Regarding the visualization, I also agree with both of you that we should include it. The image you posted looks really nice, but in my opinion something simpler would probably make the point clearer. Here is a quick example of what I had in mind:

[CV scheme image]

@agramfort
Collaborator

agramfort commented Aug 5, 2022 via email

@sliwy
Collaborator

sliwy commented Aug 5, 2022

@martinwimpff thanks for the corrections!

How to train, test and tune your model - works for me :)

Having the visualization done with matplotlib and sklearn could also be awesome.

Can we find some names for the CV options (it is easier to use them outside of the tutorial context)? My proposition (a rough code sketch follows the list):

  1. Option 1 - train-test split
  2. Option 2 - train-valid-test split
  3. Option 3 - k-fold blockwise CV
    In case there is another CV inside the k-fold blockwise CV for hyperparameter search, this could be indicated by adding 'nested'?
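A minimal sketch of what these three options could look like with plain scikit-learn (assuming generic arrays `X`, `y` and an estimator `clf`; `shuffle=False` keeps the splits blockwise):

```python
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Option 1 - train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Option 2 - train-valid-test split (split the training part once more)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.2, shuffle=False
)

# Option 3 - k-fold blockwise CV on the training data, test set kept aside
cv = KFold(n_splits=5, shuffle=False)
scores = cross_val_score(clf, X_train, y_train, cv=cv)
```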

@martinwimpff For the option 3 visualization I would add the test_set as a supplementary option, because in some articles the CV result is the final one while hyperparameters are optimized in a nested CV.

I took a while (quite short, so I just scanned each paper for its CV method) to prepare a summary of the evaluation methods used for the models that are currently available in braindecode. Maybe it can be useful for the discussion and for showing how it is done in papers. Feel free to include it in the tutorial if you think it may be useful. I feel that it would be nice if we could express all the methods from this table in terms of our evaluation tutorial terminology. What do you think?

| Article | Models | Hyperparameter optimization | Final evaluation |
| --- | --- | --- | --- |
| Schirrmeister et al. 2017 | ShallowFBCSPNet, Deep4Net | Option 2 | Option 1 |
| Lawhern et al. 2018 | EEGNet | n/a | Option 3 (hold-out test only for SMR dataset) |
| Chambon et al. 2018 | SleepStagerChambon2018 | Nested option 3 | Option 3, no hold-out test set |
| Eldele et al. 2021 | SleepStagerEldele2021 | Option 3 (without hold-out test) | Option 3 (without hold-out test) |
| Perslev et al. 2021 | USleep | Option 1 | Option 1 on hold-out test |

Schirrmeister, R.T., Springenberg, J.T., Fiederer, L.D.J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W. and Ball, T. (2017), Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp., 38: 5391-5420. https://doi.org/10.1002/hbm.23730

Lawhern, V.J., Solon, A.J., Waytowich, N.R., Gordon, S.M., Hung, C.P., & Lance, B. (2018). EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. Journal of Neural Engineering, 15.

Chambon, S., Galtier, M.N., Arnal, P.J., Wainrib, G., & Gramfort, A. (2018). A Deep Learning Architecture for Temporal Sleep Stage Classification Using Multivariate and Multimodal Time Series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26, 758-769.

Eldele, E., Chen, Z., Liu, C., Wu, M., Kwoh, C., Li, X., & Guan, C. (2021). An Attention-Based Deep Learning Approach for Sleep Stage Classification With Single-Channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 809-818.

Perslev, M., Darkner, S., Kempfner, L., Nikolic, M., Jennum, P.J., & Igel, C. (2021). U-Sleep: resilient high-frequency sleep staging. NPJ Digital Medicine, 4.

@martinwimpff
Collaborator Author

> How to train, test and tune your model - works for me :)

👍

> Can we find some names for the CV options (it is easier to use them outside of the tutorial context)? My proposition:
>
>   1. Option 1 - train-test split
>   2. Option 2 - train-valid-test split
>   3. Option 3 - k-fold blockwise CV
>     In case there is another CV inside the k-fold blockwise CV for hyperparameter search, this could be indicated by adding 'nested'?

Agree. I am not sure about the nested CV. What we do in option 3 is a k-fold blockwise CV (with a fixed hold-out set, the untouched test set). As far as I understand, this is what should be used in BCI research, as the test set should always come from an unseen session or subject. Further, most datasets provide this two-fold split anyway.
With nested CV one would throw everything into one data set and run two loops: an outer loop for splitting out the test set and an inner loop for the validation set. I don't really see a benefit in doing that for BCI research, as one would have to be extremely careful with how to design the splitters and the improvement is quite marginal.
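For reference, a minimal sketch of what such a nested CV would look like in scikit-learn (assuming generic `X`, `y`, an estimator `clf` and a hypothetical `param_grid`):

```python
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Outer loop splits out the test folds, inner loop selects the hyperparameters.
# shuffle=False keeps both splits blockwise.
outer_cv = KFold(n_splits=5, shuffle=False)
inner_cv = KFold(n_splits=3, shuffle=False)

search = GridSearchCV(clf, param_grid, cv=inner_cv)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(nested_scores.mean())
```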

> @martinwimpff For the option 3 visualization I would add the test_set as a supplementary option, because in some articles the CV result is the final one while hyperparameters are optimized in a nested CV.

I actually had the same thought while creating the image. Maybe we have to make the division between HP search and "normal" training with 3 splits clearer. For HP tuning, the test set should never be used.
Option 2 may be used if you need a validation set, e.g. for EarlyStopping. Otherwise (if all HPs are fixed) option 1 should be used.
The same holds for option 3, with the addition that the final test score is the average over k test scores.
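To illustrate the option 2 case, a rough sketch of early stopping on a predefined validation split with skorch/braindecode (assuming `model`, `train_set` and `valid_set` already exist; all parameter values are made up):

```python
from braindecode import EEGClassifier
from skorch.callbacks import EarlyStopping
from skorch.helper import predefined_split

clf = EEGClassifier(
    model,
    train_split=predefined_split(valid_set),  # validation set only guides early stopping
    callbacks=[EarlyStopping(monitor="valid_loss", patience=5)],
    max_epochs=100,
)
clf.fit(train_set, y=None)
# The untouched test set is evaluated only once, after training is finished.
```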

We could include the table that you created in an additional section called something like Reporting? There we could also highlight the (large) differences between the publications and the occasional lack of documentation regarding the tuning/evaluation, to further motivate why this page was created.

@agramfort
Collaborator

agramfort commented Aug 5, 2022 via email

@bruAristimunha
Collaborator

> I would produce images with matplotlib like sklearn does to demo the cross validation generators

Something like that @agramfort?

https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html
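For reference, a minimal sketch of that kind of plot (the splitter, sample count and styling are just placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import KFold

n_samples = 100
X = np.zeros((n_samples, 1))
cv = KFold(n_splits=5, shuffle=False)

fig, ax = plt.subplots(figsize=(8, 3))
for fold, (train_idx, valid_idx) in enumerate(cv.split(X)):
    # One horizontal row per fold: 0 = train, 1 = validation
    indicator = np.zeros(n_samples)
    indicator[valid_idx] = 1
    ax.scatter(np.arange(n_samples), [fold] * n_samples,
               c=indicator, cmap="coolwarm", marker="_", lw=10)
ax.set_xlabel("Sample index")
ax.set_ylabel("CV fold")
plt.show()
```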

@bruAristimunha
Collaborator

> How to train, test and tune your model - works for me :)

+2. I love the new title =)

> Can we find some names for the CV options (it is easier to use them outside of the tutorial context)? My proposition:
>
>   1. Option 1 - train-test split
>   2. Option 2 - train-valid-test split
>   3. Option 3 - k-fold blockwise CV
>     In case there is another CV inside the k-fold blockwise CV for hyperparameter search, this could be indicated by adding 'nested'?

> Agree. I am not sure about the nested CV. What we do in option 3 is a k-fold blockwise CV (with a fixed hold-out set, the untouched test set). As far as I understand, this is what should be used in BCI research, as the test set should always come from an unseen session or subject. Further, most datasets provide this two-fold split anyway. With nested CV one would throw everything into one data set and run two loops: an outer loop for splitting out the test set and an inner loop for the validation set. I don't really see a benefit in doing that for BCI research, as one would have to be extremely careful with how to design the splitters and the improvement is quite marginal.

> @martinwimpff For the option 3 visualization I would add the test_set as a supplementary option, because in some articles the CV result is the final one while hyperparameters are optimized in a nested CV.

> I actually had the same thought while creating the image. Maybe we have to make the division between HP search and "normal" training with 3 splits clearer. For HP tuning, the test set should never be used. Option 2 may be used if you need a validation set, e.g. for EarlyStopping. Otherwise (if all HPs are fixed) option 1 should be used. The same holds for option 3, with the addition that the final test score is the average over k test scores.

> We could include the table that you created in an additional section called something like Reporting? There we could also highlight the (large) differences between the publications and the occasional lack of documentation regarding the tuning/evaluation, to further motivate why this page was created.

I love the idea of a Reporting subsection to discuss a little the potential differences between the metrics reported in articles and the ones obtained when replicating them. I think you just went through this with Deep4Net, right @martinwimpff?

I would love this table in my thesis, but not in the library. With more deep learning methods, it may become unfeasible to maintain the table.

I wonder if we're putting too much information into a tutorial. I would suggest that we don't spend too much energy on definitions, as indicated by @agramfort, and close with a simple matplotlib view, like https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html. What do you think, @martinwimpff, @sliwy and @agramfort?

@martinwimpff
Collaborator Author

> To avoid confusing the community I would aim to follow sklearn naming conventions and refer to the sklearn documentation as much as possible. Maintaining good documentation takes energy, so let's put our energy where we are very unique.

I think that this is a valid point and it might be better if we reorganize the content of this tutorial, also regarding the merge with the current hyperparameter tuning tutorial.

So, as mentioned above, I would like to make the division between "normal" training and HP tuning clearer. Therefore I would propose the following structure:

  1. Introduction (motivate the topic)
  2. How to train and evaluate your model
    2.1 Simple Train-Test Split
    2.2 Train-Val-Test Split
    2.3 (blockwise) k-Fold CV (with Holdout test set)
  3. Hyperparameter Tuning
    3.1 Train-Val(-Test) Split
    3.2 (blockwise) k-Fold CV
  4. Reporting

As @agramfort mentioned, we should reference the sklearn library as often as possible (especially in sections 2 and 3). Further, the explanations of the schemes should be minimal and rather focus on the specifics of BCI data (e.g. the test set should be from another session or subject, the train-val split should be blockwise; is there anything else?). Is this what you meant by too much information, @bruAristimunha?

@bruAristimunha
Collaborator

Yes @martinwimpff

@sliwy
Collaborator

sliwy commented Aug 8, 2022

@bruAristimunha I agree, the table may be too much for the tutorial. The matplotlib simple plots look good to me! :)

@martinwimpff I like the new structure of the tutorial. Just for point 2.3, I would mention that it can also be used without a hold-out test set.

@martinwimpff
Collaborator Author

I just committed a new version with the restructuring mentioned above.
@sliwy @bruAristimunha what do you think of this?
@bruAristimunha How should we proceed with the visualizations? Have you already had a look into it?

@bruAristimunha
Collaborator

Hello @martinwimpff and @sliwy,
I have included three visualizations. Can you take a look and review them?

@bruAristimunha bruAristimunha removed the request for review from robintibor August 17, 2022 17:34
@martinwimpff
Collaborator Author

Great job @bruAristimunha!

I have two small suggestions for the visualizations:

  1. I would keep the colors consistent throughout the images (currently the test set color changes from 1 to 2&3)
  2. I would keep the scaling for option 3 (x-axis ratio) s.t. it's clearer that the test set (size) remains the same in all 3 options. You probably wanted to make the k-fold visualization as big as possible and clearly separate the test set, so this one is probably a matter of taste ;)

Collaborator

@sliwy sliwy left a comment

Great job @martinwimpff @bruAristimunha! Thanks for that!

The only small thing is that the code for generating the visualization complicates the tutorial a little bit, but I don't have any idea how to fix that apart from moving those functions to some module. I am not a big fan of this option either.

To me, after the small changes in the visualization mentioned by @martinwimpff it is ready to be merged! :)

@bruAristimunha
Collaborator

I have incorporated all comments. For me, we can merge, @agramfort.

@sliwy
Collaborator

sliwy commented Aug 18, 2022

@bruAristimunha I almost forgot, do we finally remove the plot_hyperparameter_tuning_with_scikit-learn.py?

@bruAristimunha
Collaborator

Can we handle this in another pull request?

@sliwy
Collaborator

sliwy commented Aug 19, 2022

@bruAristimunha sure!

Once again, thanks for this great tutorial! 👏

Comment on lines +315 to +323
from torch.utils.data import Subset
from sklearn.model_selection import train_test_split
from skorch.helper import predefined_split, SliceDataset

X_train = SliceDataset(train_set, idx=0)
y_train = np.array([y for y in SliceDataset(train_set, idx=1)])
train_indices, val_indices = train_test_split(
    X_train.indices_, test_size=0.2, shuffle=False
)
Collaborator

this is neat but a bit convoluted for an intro tutorial.

I would consider exposing indices on train_set, or something that can simply be passed to train_test_split.

Collaborator Author

One could use

train_test_split(np.arange(len(train_set)), test_size=0.2, shuffle=False)

instead, but X_train and y_train would still be necessary for cross_val_score, search.fit() and the visualizations.
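For illustration, a rough sketch of how X_train and y_train are consumed downstream (assuming `clf` is an EEGClassifier and `param_grid` is a hypothetical parameter grid):

```python
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=False)

# Plain blockwise cross-validation with fixed hyperparameters
scores = cross_val_score(clf, X_train, y_train, cv=cv)

# Hyperparameter search over the same blockwise splits
search = GridSearchCV(clf, param_grid, cv=cv)
search.fit(X_train, y_train)
```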

train_val_split = KFold(n_splits=5, shuffle=False)

######################################################################
# .. note::
Collaborator

this is something that should appear at the top and maybe as a warning as it's not a detail.

Collaborator

maybe we can use copy.deepcopy(model) in every EEGClassifier init, so we won't modify the weights of the original model?
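For reference, a minimal sketch of that idea (assuming `model` is the shared torch module; all other EEGClassifier parameters are omitted):

```python
import copy

from braindecode import EEGClassifier

# Each classifier trains its own copy, so fitting leaves `model` untouched.
clf = EEGClassifier(copy.deepcopy(model))
```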

Collaborator Author

I prefer the warning option. If we use deepcopy we would have to include a small note on that as well to avoid confusion.

@agramfort
Collaborator

agramfort commented Aug 19, 2022 via email

@martinwimpff
Collaborator Author

I included the comments from @agramfort.

I did not change this part:

from torch.utils.data import Subset
from sklearn.model_selection import train_test_split
from skorch.helper import predefined_split, SliceDataset

X_train = SliceDataset(train_set, idx=0)
y_train = np.array([y for y in SliceDataset(train_set, idx=1)])
train_indices, val_indices = train_test_split(
    X_train.indices_, test_size=0.2, shuffle=False
)

as X_train and y_train are used multiple times later in the tutorial.

I changed the final note to a warning and moved it to the start, as I personally prefer this option for the tutorial. I also like the deepcopy option in general (since it is a very common thing to do) but I think that it might confuse people in this tutorial.
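For context, in the sphinx-gallery source this roughly amounts to a block like the following (the wording of the warning is illustrative, not the actual text):

```python
######################################################################
# .. warning::
#    Do not reuse the same model instance across the options below:
#    fitting a classifier modifies the weights of the underlying
#    module in place.
```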

Collaborator

@agramfort agramfort left a comment

@agramfort agramfort merged commit 4feffb7 into braindecode:master Aug 20, 2022
@agramfort
Collaborator

thanks @martinwimpff @bruAristimunha 🙏

@bruAristimunha
Collaborator

A huge thank you to you, @martinwimpff, for building the entire tutorial! Welcome to the braindecode contributor group, and we look forward to your next contributions! Always feel welcome to contribute and interact with the library.

@robintibor
Contributor

Also wanted to say a big thank you to @martinwimpff, @bruAristimunha and also @agramfort: the tutorial really looks incredible, really a valuable addition to the library. I had been a bit absent and feel super happy to see such great work once I found the time to look haha :)
