Improve create_from_X_y #148
Conversation
split create_from_X_y into two functions, one returning BaseDatasets for further preprocessing on trial level, the second directly returning windows.
- fixed examples
Returns
-------
windows_datasets: BaseConcatDataset
Why is windows_datasets of type BaseConcatDataset and not WindowsDataset?
So probably naming is the problem here. In braindecode, a BaseDataset is a wrapper around a single mne.Raw. Similarly, a WindowsDataset is a wrapper around a single mne.Epochs. Both can be concatenated using the BaseConcatDataset class. So, since here multiple WindowsDatasets are returned, it is actually a BaseConcatDataset of WindowsDatasets.
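To make the wrapper hierarchy concrete, here is a minimal toy sketch of the pattern described above. The class names mirror braindecode's but these are illustrative stand-ins, not the real classes; the flat cumulative-size indexing is the same idea torch.utils.data.ConcatDataset uses.

```python
from bisect import bisect_right

class ToyWindowsDataset:
    """Stand-in for a WindowsDataset wrapping a single mne.Epochs."""
    def __init__(self, windows):
        self.windows = windows  # e.g. a list of (signal, label) pairs

    def __len__(self):
        return len(self.windows)

    def __getitem__(self, i):
        return self.windows[i]

class ToyConcatDataset:
    """Stand-in for BaseConcatDataset: flat indexing over several
    wrapped datasets via cumulative sizes, like torch's ConcatDataset."""
    def __init__(self, datasets):
        self.datasets = datasets
        self.cumulative_sizes, total = [], 0
        for d in datasets:
            total += len(d)
            self.cumulative_sizes.append(total)

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, idx):
        # find which wrapped dataset idx falls into, then index locally
        ds_idx = bisect_right(self.cumulative_sizes, idx)
        if ds_idx > 0:
            idx -= self.cumulative_sizes[ds_idx - 1]
        return self.datasets[ds_idx][idx]

concat = ToyConcatDataset([ToyWindowsDataset(["w0", "w1"]),
                           ToyWindowsDataset(["w2"])])
print(len(concat), concat[2])  # -> 3 w2
```

So "concatenating" does not merge the underlying Epochs objects; it just presents the per-recording wrappers behind one flat index.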
Thanks for the explanation. In MNE, when you concatenate Raw objects you still get a Raw, and if you concatenate Epochs you still get Epochs. Is it really necessary to have another type of dataset in the public API?
For the dataset part we tried to stick to the PyTorch API as closely as possible. Our BaseDataset inherits from torch.utils.data.dataset.Dataset and our BaseConcatDataset inherits from torch.utils.data.dataset.ConcatDataset, see https://pytorch.org/docs/stable/data.html#torch.utils.data.ConcatDataset. But we can always discuss with @robintibor.
Ok, then I would just propose to update the docstring:
- windows_datasets: BaseConcatDataset
+ concat_dataset: BaseConcatDataset
@@ -102,7 +102,8 @@ Data Utils
.. autosummary::
   :toctree: generated/

   create_from_X_y
that's an API breakage without deprecation
@agramfort Is it fine to use mne.utils.deprecated to add a DeprecationWarning? Or is this confusing, since the function will then appear to be an mne and not a braindecode function?
I would copy such a util function into braindecode somewhere.
Ok, thanks! I will discuss with @robintibor about where to put it and how to do it.
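A util copied into braindecode could be a small decorator that emits a DeprecationWarning on call, loosely modeled on mne.utils.deprecated but with no mne dependency. This is a hypothetical sketch (the decorator name and message format are assumptions, not braindecode code):

```python
import functools
import warnings

def deprecated(extra=""):
    """Decorator marking a function as deprecated (illustrative sketch)."""
    def decorator(func):
        msg = f"{func.__name__} is deprecated"
        if extra:
            msg += f"; {extra}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not the wrapper
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("use create_windows_from_X_y instead")
def create_from_X_y(X, y):
    # stand-in body; the real function builds braindecode datasets
    return list(zip(X, y))
```

Calling the decorated function still works but warns, which gives users a release cycle to migrate before the old name is removed.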
@@ -103,6 +103,8 @@ Data Utils
   :toctree: generated/

   create_from_X_y
   create_trials_from_X_y
   create_windows_from_X_y
Would it make sense to keep a single create_from_X_y and then internally, depending on whether the items of X have the same length, use windows or base? Basically, as a user, do I need to care, or can I just pass the chunks of EEG data I have and the models will just deal with them? If you expect some breakage because some models only work with WindowsDataset it's another story, but if you expect models to 'just work', maybe we can avoid some complexity on the user side.
Ok, I will try to explain what's going on. We just had a use case (see #136) where a user had a dataset in X/y format; however, it was not preprocessed. He wanted to convert it into braindecode format to be able to use the available braindecode/mne preprocessing. So far, the create_from_X_y function assumed the data was already preprocessed and directly transformed it into compute windows for decoding, so it did not allow for preprocessing. I suggested something similar to your idea (see #136 (comment)), but @robintibor did not agree. It is still undecided / open for discussion.
To be honest, rather than making the braindecode API more complex, I would have written an example or referred the user to the mne documentation to obtain raw instances and do his preprocessing properly. My worry is: how are users going to discover the use cases for these functions? The more different ways you have to do the same thing, the harder it is to document. If there is only one preferred way, it may force you to do an extra step, but at least all scripts will look similar among users.
I totally see your points. @robintibor will be back on Monday; we will then hopefully discuss all the PRs and issues.
Ok yes in the end maybe we settle towards our main example:
Closing this as it has diverged a bit too much from the current code; still very useful for addressing #544.
split create_from_X_y into two functions, one returning BaseDatasets for further preprocessing on trial level, the second directly returning WindowsDatasets.