Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Random sampler #295

Merged
merged 14 commits into from
Aug 9, 2021
Merged

Conversation

tgnassou
Copy link
Contributor

@tgnassou tgnassou commented Jul 6, 2021

Add a new sampler inspired by USleep article. We define a fixed number of sequence. For each sequence, a window is chosen for a random recording and a random class. The sequence is created around this window with a random position. Each class has the same probability to be chosen. So this method allow to equitably represent the classes despite their difference in proportion.

@agramfort @hubertjb @l-omar-chehab

Copy link
Collaborator

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure where the balanced sampling is done.

have you tried this on a real problem / dataset already?

it not I would suggest you do it first to see how well it works. thx @tgnassou

docs/whats_new.rst Outdated Show resolved Hide resolved
docs/api.rst Outdated Show resolved Hide resolved
braindecode/samplers/base.py Outdated Show resolved Hide resolved
braindecode/samplers/base.py Outdated Show resolved Hide resolved
braindecode/samplers/base.py Outdated Show resolved Hide resolved
braindecode/samplers/base.py Outdated Show resolved Hide resolved
braindecode/samplers/base.py Show resolved Hide resolved
@tgnassou
Copy link
Contributor Author

tgnassou commented Jul 7, 2021

I tried my fonction on dataset. It returns sampling as i want. But i'm not sure that is what we want.

I choose a random number of subject, I choose a random class (every class has the same probabilitie to be chosen) and I create the sequence with this window. Because all the class have the same probabilitie i thought it was balanced.

@agramfort
Copy link
Collaborator

agramfort commented Jul 7, 2021 via email

@codecov
Copy link

codecov bot commented Jul 8, 2021

Codecov Report

Merging #295 (9a0b77c) into master (5b826c4) will increase coverage by 0.20%.
The diff coverage is 97.72%.

@@            Coverage Diff             @@
##           master     #295      +/-   ##
==========================================
+ Coverage   80.38%   80.59%   +0.20%     
==========================================
  Files          49       49              
  Lines        3085     3123      +38     
==========================================
+ Hits         2480     2517      +37     
- Misses        605      606       +1     

@@ -81,6 +87,10 @@ def __iter__(self):
def n_recordings(self):
return self.info.shape[0]

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be actually safer to supply n_classes explicitly? Otherwise you may run into problems:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first point is fixed in the latest commit. I'm not sure about the second point though - the sampler requires categorical targets in the BaseConcatDataset's metadata attribute. What use case exactly did you have in mind @robintibor ?

@hubertjb hubertjb marked this pull request as ready for review July 24, 2021 04:33
@hubertjb
Copy link
Collaborator

I added tests and uniformized the implementation so it follows the other samplers' logic more closely. Next step is to test #282 with this sampler and make sure we get better results than with naive sampling of sequences.

@robintibor
Copy link
Contributor

maybe rebase this on #319 then tests should be fine. is there more to do before this is ready on merge @hubertjb ?

tgnassou and others added 14 commits August 9, 2021 13:50
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: robintibor <robintibor@gmail.com>
- modifying _init_info to check for required keys in metadata when creating info dataframe
- reviewing computation of start_ind
- improving docstrings
- adding tests
@hubertjb hubertjb changed the title [WIP] Random sampler [MRG] Random sampler Aug 9, 2021
@hubertjb
Copy link
Collaborator

hubertjb commented Aug 9, 2021

Thanks @robintibor ! The validation of the USleep architecture might lead to some minor changes, but otherwise I think this is ready to be merged.

@robintibor robintibor merged commit 5266556 into braindecode:master Aug 9, 2021
@robintibor
Copy link
Contributor

Great, merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants