
WIP: added unified validation scheme #397

Merged
merged 9 commits into braindecode:master on Aug 20, 2022

Conversation

martinwimpff
Collaborator

Added a unified validation scheme example as discussed in #378
This tutorial shares some similarities with #349
@bruAristimunha: let me know what you think about this first draft.

@codecov

codecov bot commented Aug 4, 2022

Codecov Report

Merging #397 (b099a1a) into master (337d4c8) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #397   +/-   ##
=======================================
  Coverage   82.63%   82.63%           
=======================================
  Files          53       53           
  Lines        3738     3738           
=======================================
  Hits         3089     3089           
  Misses        649      649           

@bruAristimunha
Collaborator

Hi @martinwimpff,

Thank you so much for your PR!

Here is the rendered version:
https://output.circle-artifacts.com/output/job/a34faa77-933e-41f7-8a6f-4905c1fe2ef4/artifacts/0/dev/auto_examples/plot_unified_validation_scheme.html#sphx-glr-auto-examples-plot-unified-validation-scheme-py

I am still revising it, but it is already pretty cool! I will try to add a figure to make it easier to differentiate the three schemes.

Collaborator

@sliwy sliwy left a comment

Great job @martinwimpff @bruAristimunha! To me the description is clear and I hope people will read it before doing ML for EEG. My comments are mainly typos and some reformulations.

A visualization of the CV schemes would be highly appreciated.

One more thing: this tutorial and plot_hyperparameter_tuning_with_scikit-learn.py are now quite similar. Maybe we can add hyperparameter search to the title of this one and keep just this tutorial? What do you think?

Collaborator

@bruAristimunha bruAristimunha left a comment

It is very good! I tried to give some small suggestions while trying not to change the text. Feel free to accept the suggestions or not. I'll try to put together some images to illustrate the options.

Co-authored-by: Maciej Sliwowski <maciejo988@wp.pl>
@bruAristimunha
Collaborator

bruAristimunha commented Aug 4, 2022

Hi @martinwimpff,

For the tutorial, I was wondering if we could add something like this:

[schema_train_test image]

It's a vectorized image, so it can probably be rendered larger in the tutorial.

@martinwimpff
Collaborator Author

@sliwy: first of all, thanks a lot for your review!

> One more thing: this tutorial and plot_hyperparameter_tuning_with_scikit-learn.py are now quite similar. Maybe we can add hyperparameter search to the title of this one and keep just this tutorial? What do you think?

I agree with that; there shouldn't be two tutorials. The new title could be something like: How to train, test and tune your model. But I am open to your suggestions ;)

As you suggested, I added additional parts for options 2 and 3 that do not use GridSearch.

@bruAristimunha: thanks for your suggestions as well; I added them.
Regarding the visualization, I also agree with both of you that we should include it. The image you posted looks really nice, but in my opinion something simpler would probably make the point clearer. Here is a quick example of what I had in mind:

[CV scheme image]

@agramfort
Collaborator

agramfort commented Aug 5, 2022 via email

@sliwy
Collaborator

sliwy commented Aug 5, 2022

@martinwimpff thanks for the corrections!

How to train, test and tune your model - works for me :)

Having the visualization done with matplotlib and sklearn could also be awesome.

Can we find some names for the CV options (it is easier to use them outside of the tutorial context)? My proposition (a rough code sketch follows the list):

  1. Option 1 - train-test split
  2. Option 2 - train-valid-test split
  3. Option 3 - k-fold blockwise CV
    In case there is another CV inside the k-fold blockwise CV for hyperparameter search, this could be indicated by adding 'nested'?
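A minimal sketch of what these three options could look like with plain scikit-learn (assuming generic arrays `X`, `y` and an estimator `clf`; `shuffle=False` keeps the splits blockwise):

```python
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Option 1 - train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Option 2 - train-valid-test split (split the training part once more)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.2, shuffle=False
)

# Option 3 - k-fold blockwise CV on the training data, test set kept aside
cv = KFold(n_splits=5, shuffle=False)
scores = cross_val_score(clf, X_train, y_train, cv=cv)
```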

@martinwimpff For the option 3 visualization I would add the test_set as a supplementary option, because in some articles the CV result is the final one while hyperparameters are optimized in a nested CV.

I took a while (quite short, so I just scanned each paper for its CV method) to prepare a summary of the evaluation methods used for the models that are currently available in braindecode. Maybe it can be useful for the discussion and for showing how it is done in papers. Feel free to include it in the tutorial if you think it may be useful. I feel that it would be nice if we could express all the methods from this table in terms of our evaluation tutorial terminology. What do you think?

| Article | Models | Hyperparameter optimization | Final evaluation |
| --- | --- | --- | --- |
| Schirrmeister et al. 2017 | ShallowFBCSPNet, Deep4Net | Option 2 | Option 1 |
| Lawhern et al. 2018 | EEGNet | n/a | Option 3 (hold-out test only for SMR dataset) |
| Chambon et al. 2018 | SleepStagerChambon2018 | Nested option 3 | Option 3, no hold-out test set |
| Eldele et al. 2021 | SleepStagerEldele2021 | Option 3 (without hold-out test) | Option 3 (without hold-out test) |
| Perslev et al. 2021 | USleep | Option 1 | Option 1 on hold-out test |

Schirrmeister, R.T., Springenberg, J.T., Fiederer, L.D.J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W. and Ball, T. (2017), Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp., 38: 5391-5420. https://doi.org/10.1002/hbm.23730

Lawhern, V.J., Solon, A.J., Waytowich, N.R., Gordon, S.M., Hung, C.P., & Lance, B. (2018). EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. Journal of Neural Engineering, 15.

Chambon, S., Galtier, M.N., Arnal, P.J., Wainrib, G., & Gramfort, A. (2018). A Deep Learning Architecture for Temporal Sleep Stage Classification Using Multivariate and Multimodal Time Series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26, 758-769.

Eldele, E., Chen, Z., Liu, C., Wu, M., Kwoh, C., Li, X., & Guan, C. (2021). An Attention-Based Deep Learning Approach for Sleep Stage Classification With Single-Channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 809-818.

Perslev, M., Darkner, S., Kempfner, L., Nikolic, M., Jennum, P.J., & Igel, C. (2021). U-Sleep: resilient high-frequency sleep staging. NPJ Digital Medicine, 4.

@martinwimpff
Collaborator Author

> How to train, test and tune your model - works for me :)

👍

> Can we find some names for the CV options (it is easier to use them outside of the tutorial context)? My proposition:
>
>   1. Option 1 - train-test split
>   2. Option 2 - train-valid-test split
>   3. Option 3 - k-fold blockwise CV
>     In case there is another CV inside the k-fold blockwise CV for hyperparameter search, this could be indicated by adding 'nested'?

Agree. I am not sure about the nested CV. What we do in option 3 is a k-fold blockwise CV (with a fixed hold-out set, the untouched test set). As far as I understand, this is what should be used in BCI research, as the test set should always come from an unseen session or subject. Further, most datasets provide this two-fold split anyway.
With nested CV one would throw everything into one data set and run two loops: an outer loop for splitting out the test set and an inner loop for the validation set. I don't really see a benefit in doing that for BCI research, as one would have to be extremely careful with how to design the splitters and the improvement is quite marginal.
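For reference, a minimal sketch of what such a nested CV would look like in scikit-learn (assuming generic `X`, `y`, an estimator `clf` and a hypothetical `param_grid`):

```python
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Outer loop splits out the test folds, inner loop selects the hyperparameters.
# shuffle=False keeps both splits blockwise.
outer_cv = KFold(n_splits=5, shuffle=False)
inner_cv = KFold(n_splits=3, shuffle=False)

search = GridSearchCV(clf, param_grid, cv=inner_cv)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(nested_scores.mean())
```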

> @martinwimpff For the option 3 visualization I would add the test_set as a supplementary option, because in some articles the CV result is the final one while hyperparameters are optimized in a nested CV.

I actually had the same thought while creating the image. Maybe we have to make the division between HP search and "normal" training with 3 splits clearer. For HP tuning, the test set should never be used.
Option 2 may be used if you need a validation set, e.g. for EarlyStopping. Otherwise (if all HPs are fixed) option 1 should be used.
The same holds for option 3, with the addition that the final test score is the average over k test scores.
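To illustrate the option 2 case, a rough sketch of early stopping on a predefined validation split with skorch/braindecode (assuming `model`, `train_set` and `valid_set` already exist; all parameter values are made up):

```python
from braindecode import EEGClassifier
from skorch.callbacks import EarlyStopping
from skorch.helper import predefined_split

clf = EEGClassifier(
    model,
    train_split=predefined_split(valid_set),  # validation set only guides early stopping
    callbacks=[EarlyStopping(monitor="valid_loss", patience=5)],
    max_epochs=100,
)
clf.fit(train_set, y=None)
# The untouched test set is evaluated only once, after training is finished.
```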

We could include the table that you created in an additional section called something like Reporting? There we could also highlight the (large) differences between the publications and the occasional lack of documentation regarding the tuning/evaluation, to further motivate why this page was created.

@agramfort
Collaborator

agramfort commented Aug 5, 2022 via email

@bruAristimunha
Collaborator

> I would produce images with matplotlib like sklearn does to demo the cross validation generators

Something like that @agramfort?

https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html
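For reference, a minimal sketch of that kind of plot (the splitter, sample count and styling are just placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import KFold

n_samples = 100
X = np.zeros((n_samples, 1))
cv = KFold(n_splits=5, shuffle=False)

fig, ax = plt.subplots(figsize=(8, 3))
for fold, (train_idx, valid_idx) in enumerate(cv.split(X)):
    # One horizontal row per fold: 0 = train, 1 = validation
    indicator = np.zeros(n_samples)
    indicator[valid_idx] = 1
    ax.scatter(np.arange(n_samples), [fold] * n_samples,
               c=indicator, cmap="coolwarm", marker="_", lw=10)
ax.set_xlabel("Sample index")
ax.set_ylabel("CV fold")
plt.show()
```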

@bruAristimunha
Collaborator

> How to train, test and tune your model - works for me :)

+2. I love the new title =)

> Can we find some names for the CV options (it is easier to use them outside of the tutorial context)? My proposition:
>
>   1. Option 1 - train-test split
>   2. Option 2 - train-valid-test split
>   3. Option 3 - k-fold blockwise CV
>     In case there is another CV inside the k-fold blockwise CV for hyperparameter search, this could be indicated by adding 'nested'?

> Agree. I am not sure about the nested CV. What we do in option 3 is a k-fold blockwise CV (with a fixed hold-out set, the untouched test set). As far as I understand, this is what should be used in BCI research, as the test set should always come from an unseen session or subject. Further, most datasets provide this two-fold split anyway. With nested CV one would throw everything into one data set and run two loops: an outer loop for splitting out the test set and an inner loop for the validation set. I don't really see a benefit in doing that for BCI research, as one would have to be extremely careful with how to design the splitters and the improvement is quite marginal.

> @martinwimpff For the option 3 visualization I would add the test_set as a supplementary option, because in some articles the CV result is the final one while hyperparameters are optimized in a nested CV.

> I actually had the same thought while creating the image. Maybe we have to make the division between HP search and "normal" training with 3 splits clearer. For HP tuning, the test set should never be used. Option 2 may be used if you need a validation set, e.g. for EarlyStopping. Otherwise (if all HPs are fixed) option 1 should be used. The same holds for option 3, with the addition that the final test score is the average over k test scores.

> We could include the table that you created in an additional section called something like Reporting? There we could also highlight the (large) differences between the publications and the occasional lack of documentation regarding the tuning/evaluation, to further motivate why this page was created.

I love the idea of a Reporting subsection to discuss a little the potential differences between the metrics reported in articles and the ones obtained when replicating them. I think you just went through this with Deep4Net, right @martinwimpff?

I would love this table in my thesis, but not in the library. With more deep learning methods, it may become unfeasible to maintain the table.

I wonder if we're putting too much information into a tutorial. I would suggest that we don't spend too much energy on definitions, as indicated by @agramfort, and close with a simple matplotlib view, like https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html. What do you think, @martinwimpff, @sliwy and @agramfort?

@martinwimpff
Collaborator Author

> To avoid confusing the community I would aim to follow sklearn naming conventions and refer to the sklearn documentation as much as possible. Maintaining good documentation takes energy, so let's put our energy where we are very unique.

I think that this is a valid point and it might be better if we reorganize the content of this tutorial, also regarding the merge with the current hyperparameter tuning tutorial.

So, as mentioned above, I would like to make the division between "normal" training and HP tuning clearer. Therefore I would propose the following structure:

  1. Introduction (motivate the topic)
  2. How to train and evaluate your model
    2.1 Simple Train-Test Split
    2.2 Train-Val-Test Split
    2.3 (blockwise) k-Fold CV (with Holdout test set)
  3. Hyperparameter Tuning
    3.1 Train-Val(-Test) Split
    3.2 (blockwise) k-Fold CV
  4. Reporting

As @agramfort mentioned, we should reference the sklearn library as often as possible (especially in sections 2 and 3). Further, the explanations of the schemes should be minimal and rather focus on the specifics of BCI data (e.g. the test set should be from another session or subject, the train-val split should be blockwise; is there anything else?). Is this what you meant by too much information, @bruAristimunha?

@bruAristimunha
Collaborator

Yes @martinwimpff

@sliwy
Collaborator

sliwy commented Aug 8, 2022

@bruAristimunha I agree, the table may be too much for the tutorial. The matplotlib simple plots look good to me! :)

@martinwimpff I like the new structure of the tutorial. Just for point 2.3, I would mention that it can also be used without a hold-out test set.

@martinwimpff
Collaborator Author

I just committed a new version with the restructuring mentioned above.
@sliwy @bruAristimunha what do you think of this?
@bruAristimunha How should we proceed with the visualizations? Have you already had a look into it?

@bruAristimunha
Collaborator

Hello @martinwimpff and @sliwy,
I have included three visualizations. Can you take a look and review them?

@bruAristimunha bruAristimunha removed the request for review from robintibor August 17, 2022 17:34
@martinwimpff
Collaborator Author

Great job @bruAristimunha!

I have two small suggestions for the visualizations:

  1. I would keep the colors consistent throughout the images (currently the test set color changes from 1 to 2&3)
  2. I would keep the scaling for option 3 (x-axis ratio) s.t. it's clearer that the test set (size) remains the same in all 3 options. You probably wanted to make the k-fold visualization as big as possible and clearly separate the test set, so this one is probably a matter of taste ;)

Collaborator

@sliwy sliwy left a comment

Great job @martinwimpff @bruAristimunha! Thanks for that!

The only small thing is that the code for generating the visualization complicates the tutorial a little bit, but I don't have any idea how to fix that apart from moving those functions to some module. I am not a big fan of this option either.

To me, after the small changes in the visualization mentioned by @martinwimpff it is ready to be merged! :)

@bruAristimunha
Collaborator

I have incorporated all comments. For me, we can merge, @agramfort.

@sliwy
Collaborator

sliwy commented Aug 18, 2022

@bruAristimunha I almost forgot, do we finally remove the plot_hyperparameter_tuning_with_scikit-learn.py?

@bruAristimunha
Collaborator

Can we handle this in another pull request?

@sliwy
Collaborator

sliwy commented Aug 19, 2022

@bruAristimunha sure!

Once again, thanks for this great tutorial! 👏

Comment on lines +315 to +323
from torch.utils.data import Subset
from sklearn.model_selection import train_test_split
from skorch.helper import predefined_split, SliceDataset

X_train = SliceDataset(train_set, idx=0)
y_train = np.array([y for y in SliceDataset(train_set, idx=1)])
train_indices, val_indices = train_test_split(
    X_train.indices_, test_size=0.2, shuffle=False
)
Collaborator

this is neat but a bit convoluted for an intro tutorial.

I would consider exposing indices on train_set, or something that can simply be passed to train_test_split.

Collaborator Author

One could use

train_test_split(np.arange(len(train_set)), test_size=0.2, shuffle=False)

instead, but X_train and y_train would still be necessary for cross_val_score, search.fit() and the visualizations.
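For illustration, a rough sketch of how X_train and y_train are consumed downstream (assuming `clf` is an EEGClassifier and `param_grid` is a hypothetical parameter grid):

```python
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=False)

# Plain blockwise cross-validation with fixed hyperparameters
scores = cross_val_score(clf, X_train, y_train, cv=cv)

# Hyperparameter search over the same blockwise splits
search = GridSearchCV(clf, param_grid, cv=cv)
search.fit(X_train, y_train)
```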

train_val_split = KFold(n_splits=5, shuffle=False)

######################################################################
# .. note::
Collaborator

this is something that should appear at the top and maybe as a warning as it's not a detail.

Collaborator

maybe we can use copy.deepcopy(model) in every EEGClassifier init, so we won't modify the weights of the original model?
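For reference, a minimal sketch of that idea (assuming `model` is the shared torch module; all other EEGClassifier parameters are omitted):

```python
import copy

from braindecode import EEGClassifier

# Each classifier trains its own copy, so fitting leaves `model` untouched.
clf = EEGClassifier(copy.deepcopy(model))
```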

Collaborator Author

I prefer the warning option. If we use deepcopy we would have to include a small note on that as well to avoid confusion.

@agramfort
Collaborator

agramfort commented Aug 19, 2022 via email

@martinwimpff
Collaborator Author

I included the comments from @agramfort.

I did not change this part:

from torch.utils.data import Subset
from sklearn.model_selection import train_test_split
from skorch.helper import predefined_split, SliceDataset

X_train = SliceDataset(train_set, idx=0)
y_train = np.array([y for y in SliceDataset(train_set, idx=1)])
train_indices, val_indices = train_test_split(
    X_train.indices_, test_size=0.2, shuffle=False
)

as X_train and y_train are used multiple times later in the tutorial.

I changed the final note to a warning and moved it to the start, as I personally prefer this option for the tutorial. I also like the deepcopy option in general (since it is a very common thing to do) but I think that it might confuse people in this tutorial.
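For context, in the sphinx-gallery source this roughly amounts to a block like the following (the wording of the warning is illustrative, not the actual text):

```python
######################################################################
# .. warning::
#    Do not reuse the same model instance across the options below:
#    fitting a classifier modifies the weights of the underlying
#    module in place.
```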

Collaborator

@agramfort agramfort left a comment

@agramfort agramfort merged commit 4feffb7 into braindecode:master Aug 20, 2022
@agramfort
Collaborator

thanks @martinwimpff @bruAristimunha 🙏

@bruAristimunha
Collaborator

A huge thank you to you, @martinwimpff, for building the entire tutorial! Welcome to the braindecode contributor group, and we look forward to your next contributions! Always feel welcome to contribute and interact with the library.

@robintibor
Contributor

Also wanted to say a big thank you to @martinwimpff, @bruAristimunha and also @agramfort: the tutorial really looks incredible, really a valuable addition to the library. I had been a bit absent and feel super happy to see such great work once I found the time to look haha :)
