
[WIP] Drop last in train #411

Merged
4 commits merged into master on Oct 7, 2022

Conversation

robintibor
Contributor

Closes #16
This PR would by default drop the last batch of the training set during training if it is smaller than the given batch size (which happens when the number of examples in the training set is not exactly divisible by the batch size). This would affect existing training code, i.e., it might change results (hence tests had to be updated), but I think it is such a sensible default that it should be on by default...
If you give a batch size that is larger than the entire training set, the behavior might be somewhat unexpected (the training loop is simply skipped in skorch; it does not crash, just no training happens)... But that might be such a rare case that we can live with it? Otherwise we would have to catch it on the call to fit(), I guess.
What do you think @bruAristimunha @agramfort @gemeinl @sliwy ?
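
For context, the "drop last" behavior described above corresponds to PyTorch's DataLoader(..., drop_last=True). The snippet below is only a minimal illustration of the effect (toy data, not the actual braindecode/skorch wiring of this PR):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy training set: 10 examples with batch size 4 -> batches of sizes 4, 4, 2.
dataset = TensorDataset(torch.randn(10, 3), torch.zeros(10, dtype=torch.long))

loader_keep = DataLoader(dataset, batch_size=4, drop_last=False)
loader_drop = DataLoader(dataset, batch_size=4, drop_last=True)

print([len(x) for x, _ in loader_keep])  # [4, 4, 2] -> trailing batch of size 2
print([len(x) for x, _ in loader_drop])  # [4, 4]    -> incomplete last batch dropped

# The corner case discussed above: batch size larger than the whole training set.
loader_empty = DataLoader(dataset, batch_size=16, drop_last=True)
print(len(list(loader_empty)))           # 0 -> skorch would simply skip training
```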

@codecov

codecov bot commented Sep 23, 2022

Codecov Report

Merging #411 (fb06e90) into master (3fc8cd2) will increase coverage by 0.01%.
The diff coverage is n/a.

❗ Current head fb06e90 differs from the pull request's most recent head cf962d8. Consider uploading reports for commit cf962d8 to get more accurate results.

@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage   84.17%   84.18%   +0.01%     
==========================================
  Files          46       46              
  Lines        3601     3617      +16     
==========================================
+ Hits         3031     3045      +14     
- Misses        570      572       +2     

@agramfort
Collaborator

What experiment on your end motivated you to change this default? Did you observe issues without it?

@robintibor
Contributor Author

So actually, when writing the original HBM paper (https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730), this issue caused the longest "bug hunt" I ever had (about 2 weeks). One subject had a dataset size that led to the final training batch having size 1, which would cause the loss to diverge after some random number of epochs (say, 200 epochs everything fine, then a sudden divergence).
Since even today very state-of-the-art deep learning papers encounter mysterious training instabilities, I think anything that is likely to increase training stability without costing anything should be done.
E.g., a current paper with mysterious instabilities related to the precise batches being used: https://arxiv.org/abs/2204.02311
[image: excerpt from the linked paper on training instabilities]

So I maybe should have done this much earlier, but it is what it is.

@agramfort
Collaborator

agramfort commented Sep 26, 2022 via email

@robintibor
Contributor Author

OK, then we still have to consider whether to catch the case where the batch size is larger than the training set, i.e., the case where this change would mean no training at all. In general, I think the user should be made aware that the requested batch size is impossible because the dataset is too small; the question is just how to solve this nicely.

@bruAristimunha bruAristimunha self-requested a review September 29, 2022 16:36
Collaborator

@bruAristimunha bruAristimunha left a comment


All changes seem OK to me. No change in logic. Robin, can you update the what's new file?

@cedricrommel
Collaborator

What experiment on your end motivated you to change this default? Did you observe issues without it?

We were getting this in our experiments, if you remember, @agramfort :)
We were doing the same in our experiment code, and I think it makes total sense to make it the default.

@bruAristimunha bruAristimunha mentioned this pull request Sep 30, 2022
@robintibor
Contributor Author

So, with regard to giving the user some error/warning in case the batch size is larger than the training set (so that dropping the last batch leaves no batches at all): should we do this? I think it is not super trivial; on the call to fit() one would probably need to check the length of the train loader iterator and then give some warning (or error?) if it is 0... Should we still do this now, or even defer it to 0.8? Skorch doesn't raise an error in that case and runs without training and without computing training metrics...
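
One possible shape for such a check (just a rough sketch; check_train_loader_not_empty is a made-up helper name, and this is not how braindecode or skorch actually hook into fit()):

```python
import warnings

from torch.utils.data import DataLoader


def check_train_loader_not_empty(dataset, batch_size, drop_last=True):
    """Hypothetical helper: warn if drop_last would leave zero training batches."""
    loader = DataLoader(dataset, batch_size=batch_size, drop_last=drop_last)
    if len(loader) == 0:  # with drop_last=True this is len(dataset) // batch_size
        warnings.warn(
            f"batch_size={batch_size} is larger than the training set "
            f"({len(dataset)} examples), so with drop_last=True no training "
            "batches will be produced and fit() would effectively do nothing."
        )
    return loader
```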

@bruAristimunha
Collaborator

So, with regard to giving the user some error/warning in case the batch size is larger than the training set (so that dropping the last batch leaves no batches at all): should we do this? I think it is not super trivial; on the call to fit() one would probably need to check the length of the train loader iterator and then give some warning (or error?) if it is 0... Should we still do this now, or even defer it to 0.8? Skorch doesn't raise an error in that case and runs without training and without computing training metrics...

It looks like this behaviour is more unexpected on the skorch side than on ours. For me, we do nothing in version 0.7 and try to solve it together (braindecode and skorch) for version 0.8 with a warning.

@bruAristimunha
Collaborator

Can we merge @robintibor?

@robintibor
Contributor Author

Yes, feel free to merge then, @bruAristimunha.

Successfully merging this pull request may close: Ensure small training batches are dropped (#16)