[WIP] Self-supervised learning example on Sleep Physionet #178
Conversation
It looks like the failed CI is caused by differences in the recorded loss of other tests. In any case, the rendered example can be found here!
braindecode/datasets/base.py
Outdated
```python
@property
def metadata(self):
    """Concatenate the metadata and description of the wrapped Epochs.

    NOTE: This is implemented as a property to avoid having to keep a very
    large DataFrame in case the dataset contains very long or many
    recordings.

    Returns
    -------
    pd.DataFrame:
        DataFrame containing as many rows as there are windows in the
        BaseConcatDataset, with the metadata and description information
        for each window.
    """
    if not all([isinstance(ds, WindowsDataset) for ds in self.datasets]):
        raise TypeError('Property metadata can only be computed when all '
                        'datasets are WindowsDataset.')

    all_dfs = list()
    for ds in self.datasets:
        df = ds.windows.metadata
        for k, v in ds.description.items():
            df[k] = v
        all_dfs.append(df)

    return pd.concat(all_dfs)
```
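As a hedged illustration of what this concatenation produces, here is a pure-pandas sketch (the data and column values are made up for illustration; this is not the braindecode code itself): each recording's window metadata gets its scalar description values broadcast into new columns, then everything is concatenated.

```python
import pandas as pd

# Pretend window metadata for two recordings, as mne.Epochs.metadata would hold
md1 = pd.DataFrame({'i_window_in_trial': [0, 1], 'target': [0, 1]})
md2 = pd.DataFrame({'i_window_in_trial': [0, 1, 2], 'target': [1, 1, 0]})

# Per-recording descriptions (e.g. subject/session identifiers)
desc1 = {'subject': 1, 'session': 'a'}
desc2 = {'subject': 2, 'session': 'a'}

all_dfs = []
for md, desc in [(md1, desc1), (md2, desc2)]:
    df = md.copy()
    for k, v in desc.items():
        df[k] = v  # broadcast the scalar description value to every window
    all_dfs.append(df)

metadata = pd.concat(all_dfs)
print(len(metadata))                      # 5 rows, one per window
print(metadata['subject'].tolist())       # [1, 1, 2, 2, 2]
```

The result has one row per window across all recordings, with the recording-level information repeated on every row of that recording.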
I guess this is the critical part of this PR with regards to discussion. I am not sure I have a good mental overview of the dataset part, so as far as I see, metadata here will be concatenated from the windows datasets' metadata.
Regarding the implementation as a property: could be fine, but I am a bit worried, as I don't think we use properties in other places, and it can always cause some confusion if there is non-negligible execution time on access to what looks like a member from the caller's side. But as said, could be fine.
@gemeinl do you think this metadata here fits in nicely with the rest of the data structures as you see them? Or do you envision any potential conflicts/confusions?
I agree with you, this part requires some discussion. I guess this could instead be a method get_metadata(), which would make the fact that this requires computation more explicit.
@gemeinl can maybe also give his view tomorrow on metadata vs get_metadata(). I am not sure: metadata is more consistent with windows.metadata, get_metadata() is more explicit that there is a computation... what does yoda @agramfort think?
braindecode/samplers/base.py
Outdated
```
metadata : pd.DataFrame
    DataFrame describing at least one of {subject, session, run}
    information for each window in the BaseConcatDataset to sample examples
    from. Normally obtained as the `metadata` propery of BaseConcatDataset.
```
Typo on "propery".
Also, this is still not so clear to me; maybe you can also put a minimal example in the docstring? It does not have to be something that runs by itself, just the relevant lines for understanding what is in metadata and what would happen.
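For illustration, a minimal sketch (all values invented) of what such a metadata DataFrame could contain, and how a sampler might use it to group windows by recording:

```python
import pandas as pd

# Hypothetical metadata for 5 windows from 2 recordings; the columns
# follow the {subject, session, run} convention discussed above.
metadata = pd.DataFrame({
    'subject': [1, 1, 1, 2, 2],
    'session': ['a', 'a', 'a', 'a', 'a'],
    'run': [0, 0, 0, 0, 0],
    'i_window_in_trial': [0, 1, 2, 0, 1],
})

# A sampler can group rows by recording; each group's index then gives the
# absolute positions of that recording's windows in the concat dataset.
groups = metadata.groupby(['subject', 'session', 'run']).groups
print({k: list(v) for k, v in groups.items()})
# → {(1, 'a', 0): [0, 1, 2], (2, 'a', 0): [3, 4]}
```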
Hm, does it not help to rebase from master? I thought the problem is solved now.
In any case, thanks for sharing the self-supervised code, really appreciate it!
It looks like it's already based on master, actually. I haven't looked yet, but maybe there's been another skorch update in the meantime...?
Yeah, sliwy's PR has the same problem, so feel free to ignore it. We will have to look at it separately again :(
Rebasing on master should now make all tests run again @hubertjb. There seem to be some strange Travis UI delays at the moment, but I received a mail that they pass.
> what does yoda @agramfort <https://github.com/agramfort> think?
😂
+1 to avoid properties if they involve computations, unless you cache the results.
- adding `metadata` property to `BaseConcatDataset`
- adding `samplers` module, with base class `RecordingSampler` and implementation of a relative positioning sampler
- changing default `on_missing` in `SleepPhysionet` from `raise` to `warn`
- adding batch norm option to `SleepStagerChambon2018` model
- adding tests
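As a hedged sketch of the idea behind the relative positioning sampler mentioned above (the function and parameter names `rp_label`, `tau_pos`, and `tau_neg` are illustrative, following the preprint's terminology, not necessarily the names used in this PR): a pair of windows is a positive example if their onsets are close in time, and a negative example if they are far apart.

```python
# Illustrative labeling rule for the relative positioning pretext task:
# pairs closer than tau_pos samples are positives, pairs farther than
# tau_neg are negatives; in-between pairs are ambiguous and skipped.
def rp_label(onset_i, onset_j, tau_pos, tau_neg):
    d = abs(onset_i - onset_j)
    if d <= tau_pos:
        return 1        # positive pair: windows are temporally close
    if d > tau_neg:
        return 0        # negative pair: windows are temporally distant
    return None         # ambiguous pair: typically not sampled

print(rp_label(0, 100, tau_pos=300, tau_neg=900))   # → 1
print(rp_label(0, 1200, tau_pos=300, tau_neg=900))  # → 0
```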
- adding example of `metadata` in docstring of `Sampler` class
- changing `metadata` property of `BaseConcatDataset` to a method `get_metadata`
@robintibor I just pushed the latest changes, including changing the `metadata` property to a `get_metadata` method.
I am asking myself if it is necessary or good that the dataset implements
```diff
 preload: bool
-    if True, preload the data of the Raw objects.
+    If True, preload the data of the Raw objects.
 load_eeg_only: bool
```
It seems that this is currently not used. Do you think it is necessary to have this option? It looks like a shortcut to avoid using mne.Raw.drop_channels or mne.Raw.pick / mne.Raw.pick_channels / mne.Raw.pick_types in a preprocessing step.
Ok, nevermind. It is used and channels are already excluded on read from HDD.
Hm, to me
@gemeinl I think it makes sense to have BaseConcatDataset return a
ok, will merge this! Keep in mind:
We may want to adapt the code at that point. But until then:
🍾 🎉 🍺
Awesome, thanks @robintibor! :) Yes, I agree, if this can be implemented with data augmentations it will be worth looking into changing the implementation of this example. To keep in mind though, I think using a
This PR introduces a new example, along with a new `samplers` module, to demonstrate how self-supervised learning can be used to learn representations in an unsupervised manner on raw EEG. The example implements the relative positioning task described in this preprint.

I've kept the number of recordings used in the example, as well as the number of self-supervised samples and training epochs, pretty low so the example can run relatively quickly. The results are already interesting in that setting, although nicer results are obtained when more examples are included. For instance, with 40 recordings: