Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve create_from_X_y #148

Closed
wants to merge 3 commits into from

Conversation

gemeinl
Copy link
Collaborator

@gemeinl gemeinl commented Jul 24, 2020

split create_from_X_y into two functions, one returning BaseDatasets for further preprocessing on trial level, the second directly returning WindowsDatasets.

…rther preprocessing on trial level, the second directly returning windows.
braindecode/datautil/xy.py Outdated Show resolved Hide resolved
braindecode/datautil/xy.py Outdated Show resolved Hide resolved

Returns
-------
windows_datasets: BaseConcatDataset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why windows_datasets of type BaseConcatDataset and not WindowsDataset?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So probably naming is the problem here. In braindecode, a BaseDataset is a wrapper around a single mne.Raw. Similarly, a WindowsDataset is a wrapper around a single mne.Epochs. Both can be concatenated using the BaseConcatDataset class. So, since here multiple WindowsDatasets are returned, it is actually a BaseConcatDataset of WindowsDataset.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation. In mne when you concat raw you still get raw and if you concat epochs you still get epochs. Is it really necessary to have another type of dataset in the public API?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the dataset part we tried to stick to the PyTorch API as closely as possible. Our BaseDataset inherits from torch.utils.data.dataset.Dataset and our BaseConcatDataset inherits from torch.utils.data.dataset.ConcatDataset, see https://pytorch.org/docs/stable/data.html#torch.utils.data.ConcatDataset. But we can always discuss with @robintibor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok then I would just propose to update the docstring:

Suggested change
windows_datasets: BaseConcatDataset
concat_dataset: BaseConcatDataset

@@ -102,7 +102,8 @@ Data Utils
.. autosummary::
:toctree: generated/

create_from_X_y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's an API breakage without deprecation

Copy link
Collaborator Author

@gemeinl gemeinl Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agramfort Is it fine to use mne.utils.deprecated to add a DeprecationWarning? Or is this confusing, since the function will then appear to be a mne and not a braindecode function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would copy such a util function in the braindecode somewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks! I will discuss with @robintibor about where to put it and how to do it.

@@ -103,6 +103,8 @@ Data Utils
:toctree: generated/

create_from_X_y
create_trials_from_X_y
create_windows_from_X_y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to keep a single create_from_X_y and then internally, dependings with items of X have the same length use windows or base? Basically as a user do I need to care of I can just pass the chunks of EEG data I have an the models will just deal with them? if you expect some breakage as some models only work with WindowsDataset it's another story but if you expect models to 'just work' maybe we can avoid some complexity on the user side.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will try to explain what's going on. We just had a usecase (see #136) where a user had a dataset in Xy format, however, it was not preprocessed. So he wanted to convert it into braindecode format to be able to use the available braindecode/mne preprocessing. So far, the create_from_X_y function assumed data was already preprocessed and directly transformed it into compute windows for decoding. So it did not allow for preprocessing. I suggested something similar to you (see #136 (comment)), @robintibor did not agree. It is still undecided / open for discussion.

@agramfort
Copy link
Collaborator

agramfort commented Jul 27, 2020 via email

@gemeinl
Copy link
Collaborator Author

gemeinl commented Jul 29, 2020

I totally see your points. @robintibor will be back on monday, we will then hopefully discuss all the PRs and issues.
Thank you very much for your valuable input @agramfort! I appreaciate it!

@robintibor
Copy link
Contributor

Ok yes in the end maybe we settle towards our main example:

  • starting from X,y
  • explicitly going through mne (like creating raws etc.) in the example (noting this can be skipped if one already has mne data)
  • then starting all braindecode from here

@robintibor
Copy link
Contributor

closing this as it has diverged a bit much from current code, still very useful for addressing #544

@robintibor robintibor closed this Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default sfreq in create_from_X_y Import Error when importing create_from_X_y
3 participants