
[WIP] Self-supervised learning example on Sleep Physionet #178

Merged
merged 5 commits into from
Nov 16, 2020

Conversation

hubertjb
Collaborator

@hubertjb hubertjb commented Nov 4, 2020

This PR introduces a new example, along with a new samplers module, to demonstrate how self-supervised learning can be used to learn representations in an unsupervised manner on raw EEG. The example implements the relative positioning task described in this preprint.
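For readers new to the pretext task, here is a minimal sketch of relative positioning sampling: pairs of EEG windows get a binary label depending on whether they are close in time (within `tau_pos`) or far apart (beyond `tau_neg`). Function and parameter names are illustrative, not this PR's actual API:

```python
import numpy as np

def sample_rp_pairs(starts, tau_pos, tau_neg, n_pairs, rng):
    """Sample (anchor, other, label) window pairs for relative positioning.

    label 1: the two windows start within tau_pos seconds of each other
    ("close in time"); label 0: they are at least tau_neg seconds apart.
    Pairs falling between the two thresholds are discarded.
    """
    pairs = []
    while len(pairs) < n_pairs:
        i, j = rng.randint(0, len(starts), size=2)
        dist = abs(starts[i] - starts[j])
        if dist <= tau_pos:
            pairs.append((i, j, 1))
        elif dist >= tau_neg:
            pairs.append((i, j, 0))
    return pairs

rng = np.random.RandomState(87)
starts = np.arange(0, 1000, 30)  # window onsets every 30 s (sleep epochs)
pairs = sample_rp_pairs(starts, tau_pos=60, tau_neg=300, n_pairs=10, rng=rng)
```

An embedder then maps each window of a pair to a feature vector, and a small head is trained to predict the label from the pair of features.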

I've kept the number of recordings used in the example as well as the number of self-supervised samples and training epochs pretty low, so the example could run relatively quickly. The results are already interesting in that setting, although nicer results are obtained when more examples are included. For instance, with 40 recordings:

[Figures: relative positioning learning curves and PCA of the learned features, 40 subjects]

@hubertjb
Collaborator Author

hubertjb commented Nov 5, 2020

It looks like the failed CI is caused by differences in the recorded loss of other tests (test_cropped_decoding.py etc.), similar to what happened some time ago due to a change in skorch version. I'm not sure there's a link with the changes of this PR.

In any case, the rendered example can be found here!

Comment on lines 163 to 185

@property
def metadata(self):
    """Concatenate the metadata and description of the wrapped Epochs.

    NOTE: This is implemented as a property to avoid having to keep a very
    large DataFrame in case the dataset contains very long or many
    recordings.

    Returns
    -------
    pd.DataFrame:
        DataFrame containing as many rows as there are windows in the
        BaseConcatDataset, with the metadata and description information
        for each window.
    """
    if not all(isinstance(ds, WindowsDataset) for ds in self.datasets):
        raise TypeError('Property metadata can only be computed when all '
                        'datasets are WindowsDataset.')

    all_dfs = list()
    for ds in self.datasets:
        df = ds.windows.metadata
        for k, v in ds.description.items():
            df[k] = v
        all_dfs.append(df)

    return pd.concat(all_dfs)
Contributor

I guess this is the critical part of this PR as far as discussion goes. I am not sure I have a good mental overview of the dataset part; as far as I can see, metadata here is concatenated from the windows datasets' metadata.
Regarding the implementation as a property: it could be fine, but I am a bit worried, as I don't think we use properties in other places, and it can always cause some confusion if there is non-negligible execution time on access to a member (it looks like plain member access from the caller's side). But as said, it could be fine.

@gemeinl do you think this metadata here fits in nicely with rest of data structures as you see them? or do you envision any potential conflicts/confusions?

Collaborator Author

I agree with you, this part requires some discussion. I guess this could instead be a method get_metadata(), which would make the fact that this requires computations more explicit.

Contributor

@gemeinl can maybe also give his view tomorrow on metadata vs get_metadata(). I am not sure: metadata is more consistent with windows.metadata, while get_metadata() is more explicit that there is a computation... what does yoda @agramfort think?

Comment on lines 18 to 21
metadata : pd.DataFrame
DataFrame describing at least one of {subject, session, run}
information for each window in the BaseConcatDataset to sample examples
from. Normally obtained as the `metadata` propery of BaseConcatDataset.
Contributor

Typo on "propery".
Also, this is still not so clear to me; maybe you can also put a minimal example in the docstring? It does not have to be something that runs by itself, just show the relevant lines for understanding what is in metadata and what would happen.
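Such a minimal docstring example might look like the following. The window bookkeeping columns are the usual braindecode/MNE ones, and `subject` stands in for a description column; exact column names are an assumption for illustration:

```python
import pandas as pd

# Illustrative `metadata` DataFrame: one row per window. The i_* columns
# come from the windowing step; `subject` was broadcast from each
# recording's description.
metadata = pd.DataFrame({
    "i_window_in_trial": [0, 1, 0, 1],
    "i_start_in_trial":  [0, 3000, 0, 3000],
    "i_stop_in_trial":   [3000, 6000, 3000, 6000],
    "target":            [0, 1, 0, 2],
    "subject":           [1, 1, 2, 2],
})

# A sampler can then group window indices by recording:
windows_per_subject = metadata.groupby("subject").indices
```

With this structure, a sampler only needs the `{subject, session, run}` columns to know which windows belong to the same recording.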

@robintibor
Contributor

> It looks like the failed CI is caused by differences in the recorded loss of other tests (test_cropped_decoding.py etc.), similar to what happened some time ago due to a change in skorch version. I'm not sure there's a link with the changes of this PR.
>
> In any case, the rendered example can be found here!

Hm does it not help to rebase from master? like I thought the problem is solved now.

@robintibor
Contributor

In any case thanks for sharing self-supervised code, really appreciate it!

@hubertjb
Collaborator Author

hubertjb commented Nov 5, 2020

> It looks like the failed CI is caused by differences in the recorded loss of other tests (test_cropped_decoding.py etc.), similar to what happened some time ago due to a change in skorch version. I'm not sure there's a link with the changes of this PR.
> In any case, the rendered example can be found here!

> Hm does it not help to rebase from master? like I thought the problem is solved now.

It looks like it's already based on master actually. I haven't looked yet but maybe there's been another skorch update in the meantime...?

@robintibor
Contributor

> It looks like the failed CI is caused by differences in the recorded loss of other tests (test_cropped_decoding.py etc.), similar to what happened some time ago due to a change in skorch version. I'm not sure there's a link with the changes of this PR.
> In any case, the rendered example can be found here!

> Hm does it not help to rebase from master? like I thought the problem is solved now.

> It looks like it's already based on master actually. I haven't looked yet but maybe there's been another skorch update in the meantime...?

Yeah, since sliwy's PR has the same problem, feel free to ignore. We will have to look at it separately again :(

@robintibor
Contributor

Rebasing on master should now make all tests run again @hubertjb. There seem to be some strange Travis UI delays atm, but I received a mail that they pass.
Then you could also fix the typo, and we can merge once the decision about metadata vs get_metadata() is made (within the next few days?).

@agramfort
Collaborator

agramfort commented Nov 11, 2020 via email

- adding metadata property to BaseConcatDataset
- adding samplers module, with base class RecordingSampler and implementation of a relative positioning sampler
- changing default on_missing in SleepPhysionet from raise to warn
- adding batch norm option to SleepStagerChambon2018 model
- adding tests
- adding example of `metadata` in docstring of `Sampler` class
- changing `metadata` property of `BaseConcatDataset` to a method `get_metadata`
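As a rough sketch of the property-to-method change listed above, with a minimal stand-in class rather than braindecode's actual BaseConcatDataset (the fake datasets and column names here are only for illustration):

```python
import pandas as pd
from types import SimpleNamespace


class ConcatDatasetSketch:
    """Minimal stand-in for BaseConcatDataset, only to illustrate
    get_metadata(); the real braindecode class does much more."""

    def __init__(self, datasets):
        self.datasets = datasets

    def get_metadata(self):
        # Concatenate each sub-dataset's per-window metadata, broadcasting
        # its per-recording description entries into new columns.
        all_dfs = []
        for ds in self.datasets:
            df = ds.windows.metadata.copy()
            for k, v in ds.description.items():
                df[k] = v
            all_dfs.append(df)
        return pd.concat(all_dfs, ignore_index=True)


# Two fake "recordings" with two windows each and a per-recording subject id.
def fake_ds(subject):
    windows = SimpleNamespace(metadata=pd.DataFrame({"i_window": [0, 1]}))
    return SimpleNamespace(windows=windows, description={"subject": subject})

concat = ConcatDatasetSketch([fake_ds(1), fake_ds(2)])
md = concat.get_metadata()  # 4 rows, columns: i_window, subject
```

Being a method rather than a property makes the concatenation cost explicit at the call site, which was the main argument in the discussion above.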
@hubertjb
Collaborator Author

@robintibor I just pushed the latest changes, including changing the metadata property to a get_metadata method, and making sure that the model that is used for visualization is the one that achieved the best valid loss.

@gemeinl
Collaborator

gemeinl commented Nov 12, 2020

I am asking myself if it is necessary or good that the dataset implements get_metadata. From what I see, all the information is accessible from the outside, dataset.description as well as dataset.windows.metadata so we could just move the code into the RecordingSampler and RelativePositionSampler. This would avoid all the problems regarding class properties or naming and could also avoid potential confusion about the term metadata and what it is. Any thoughts on this?

preload: bool
If True, preload the data of the Raw objects.
load_eeg_only: bool
Collaborator

It seems that this is currently not used. Do you think it is necessary to have this option? It looks like a shortcut to avoid using mne.Raw.drop_channels or mne.Raw.pick / mne.Raw.pick_channels / mne.Raw.pick_types in a preprocessing step.

Collaborator

@gemeinl gemeinl Nov 12, 2020

Ok, nevermind. It is used and channels are already excluded on read from HDD.

@robintibor
Contributor

> I am asking myself if it is necessary or good that the dataset implements get_metadata. From what I see, all the information is accessible from the outside, dataset.description as well as dataset.windows.metadata, so we could just move the code into the RecordingSampler and RelativePositionSampler. This would avoid all the problems regarding class properties or naming and could also avoid potential confusion about the term metadata and what it is. Any thoughts on this?

Hm, get_metadata still seems ok to me: it makes sense to have it in the class, since it uses a lot of the class members and logically belongs there. It also does not introduce any changes outside the new function itself...

@hubertjb
Collaborator Author

> I am asking myself if it is necessary or good that the dataset implements get_metadata. From what I see, all the information is accessible from the outside, dataset.description as well as dataset.windows.metadata, so we could just move the code into the RecordingSampler and RelativePositionSampler. This would avoid all the problems regarding class properties or naming and could also avoid potential confusion about the term metadata and what it is. Any thoughts on this?

> Hm, get_metadata still seems ok to me: it makes sense to have it in the class, since it uses a lot of the class members and logically belongs there. It also does not introduce any changes outside the new function itself...

@gemeinl I think it makes sense to have BaseConcatDataset return a metadata DataFrame directly, as it's pretty much what we're already doing with the description attribute (the description attributes of the individual WindowsDatasets are also concatenated into a single DataFrame, also called description). I think there are other scenarios in which it might be convenient to have direct access to this concatenated metadata DataFrame too, e.g., for splitting datasets, converting to an (X, y) dataset, sanity checking, etc. For that reason I would at least implement get_metadata outside of the Sampler objects so it can be easily reused down the line.
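One of those downstream uses, a subject-wise split, could look like the following. The DataFrame here is a hand-built stand-in for the concatenated metadata, and the column names are assumptions for illustration:

```python
import pandas as pd

# Concatenated metadata as it might come out of get_metadata(): one row
# per window, with a `subject` column broadcast from each description.
md = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3],
    "target":  [0, 1, 0, 1, 0, 1],
})

# Subject-wise split: no subject appears in both train and valid,
# avoiding leakage of subject-specific features across the split.
train_subjects = {1, 2}
train_idx = md.index[md["subject"].isin(train_subjects)].tolist()
valid_idx = md.index[~md["subject"].isin(train_subjects)].tolist()
```

The resulting index lists can then be used to build per-split subsets of the windows.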

@robintibor
Contributor

ok, will merge this!

Keep in mind:

  1. If we agree on some specific data augmentation implementation
    AND
  2. this way of implementing augmentations can also very nicely cover the case here (I am aware this is not the same as augmentation here)
    AND
  3. it is different from what you have done

We may want to adapt the code at that point.

But until then:
https://twitter.com/malonebarry/status/1325284260777955328 👍 💯 🥳 😃

@robintibor robintibor merged commit 12334c4 into braindecode:master Nov 16, 2020
@agramfort
Collaborator

🍾 🎉 🍺

@hubertjb hubertjb deleted the rel_pos branch November 16, 2020 14:21
@hubertjb
Collaborator Author

Awesome, thanks @robintibor! :)

Yes, I agree, if this can be implemented with data augmentations it will be worth it to look into changing the implementation of this example. To keep in mind though, I think using a Sampler object is pretty natural and intuitive here, so this could be a reason to keep this implementation.
