[MRG] Random sampler #295
Conversation
I am not sure where the balanced sampling is done.
Have you tried this on a real problem / dataset already?
If not, I would suggest you try it first to see how well it works. Thanks @tgnassou
I tried my function on a dataset, and it returns the samples I want, but I'm not sure it is what we want. I choose a random subject, choose a random class (every class has the same probability of being chosen), and create the sequence around this window. Because all the classes have the same probability, I thought the sampling was balanced.
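The balanced sampling described above can be sketched as follows. This is a hypothetical illustration, not braindecode's actual API: `sample_balanced_window` and the `metadata` tuple layout are made-up names for the sketch.

```python
import numpy as np


def sample_balanced_window(metadata, rng):
    """Draw one window with uniform probability over classes.

    metadata: list of (recording_id, class_label, window_index) tuples.
    These names are illustrative, not braindecode's actual API.
    """
    classes = sorted({m[1] for m in metadata})
    # Every class is equally likely, regardless of how many windows it has
    cls = classes[rng.randint(len(classes))]
    candidates = [m for m in metadata if m[1] == cls]
    return candidates[rng.randint(len(candidates))]


rng = np.random.RandomState(42)
metadata = [(0, "W", 0), (0, "W", 1), (0, "N2", 2), (1, "REM", 0)]
window = sample_balanced_window(metadata, rng)
```

Because the class is drawn first with uniform probability, minority classes are sampled as often as majority ones, which is what makes the sampler "balanced" despite the imbalance in window counts.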
I mean, does it enable you to get better performance on real data?
Does it lead to a running time increase when learning?
Basically, is it useful according to you, and if so, is it a free lunch in running time?
Codecov Report
@@            Coverage Diff             @@
##           master     #295      +/-   ##
==========================================
+ Coverage   80.38%   80.59%   +0.20%
==========================================
  Files          49       49
  Lines        3085     3123      +38
==========================================
+ Hits         2480     2517      +37
- Misses        605      606       +1
braindecode/samplers/base.py
Outdated
@@ -81,6 +87,10 @@ def __iter__(self):
    def n_recordings(self):
        return self.info.shape[0]

    @property
It might actually be safer to supply n_classes explicitly? Otherwise you may run into problems:
- if not all classes are present in some subset (imagine a small evaluation subset in a debugging experiment)
- targets stored elsewhere, as in Discrete and synchronized targets support #261, may not work.
The first point is fixed in the latest commit. I'm not sure about the second point though - the sampler requires categorical targets in the BaseConcatDataset's metadata
attribute. What use case exactly did you have in mind @robintibor ?
I added tests and made the implementation consistent so it follows the other samplers' logic more closely. Next step is to test #282 with this sampler and make sure we get better results than with naive sampling of sequences.
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: robintibor <robintibor@gmail.com>
- modifying _init_info to check for required keys in metadata when creating info dataframe - reviewing computation of start_ind - improving docstrings - adding tests
Force-pushed from 69734db to 9a0b77c
Thanks @robintibor ! The validation of the USleep architecture might lead to some minor changes, but otherwise I think this is ready to be merged.
Great, merged!
Add a new sampler inspired by the USleep article. We define a fixed number of sequences. For each sequence, a window is chosen from a random recording and a random class, and the sequence is created around this window at a random position. Each class has the same probability of being chosen, so this method equitably represents the classes despite their differences in proportion.
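The "sequence created around this window at a random position" step can be sketched as below. This is a hedged illustration with made-up names (`sample_sequence_start` is not braindecode's API): given an anchor window drawn in a class-balanced way, we place a fixed-length run of consecutive windows around it at a random offset, clipped to the recording boundaries.

```python
import numpy as np


def sample_sequence_start(anchor_idx, n_windows, seq_len, rng):
    """Start index of a seq_len-long run of consecutive windows that
    contains the anchor window, placed at a random position."""
    offset = rng.randint(seq_len)  # anchor's position inside the sequence
    start = anchor_idx - offset
    # Clip so the whole sequence stays inside the recording
    return int(min(max(start, 0), n_windows - seq_len))


rng = np.random.RandomState(0)
start = sample_sequence_start(anchor_idx=10, n_windows=100, seq_len=5, rng=rng)
```

Clipping near the recording edges slightly biases the anchor's position within the sequence, but guarantees every sampled sequence is valid and still contains the class-balanced anchor window.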
@agramfort @hubertjb @l-omar-chehab