[MRG] Unifying preprocessors #197
Conversation
The tests failed because the examples have not been updated yet, and there is no updated version check for MNE.
so far so good!
@sbbrandt You also had something in this direction? How do you feel about this implementation? Is it compatible with what you had?
@robintibor You mean the work on the data augmentation issue? There I used a `DataLoader` to manipulate the data during training. That wouldn't be affected. The data augmentation feature proposed by @Simon-Free worked at the dataset level and could need some updates after this PR. Besides that, I think unifying the preprocessors is good and I like the implementation.
@robintibor @sbbrandt what do you guys think about going back to the old way of defining preprocessing steps with an OrderedDict?
braindecode/datasets/tuh.py (Outdated)
```python
import glob


def read_all_file_names(directory, extension):
    """Read all files with the specified extension from the given path.

    Parameters
    ----------
    directory : str
        Parent directory to be searched for files of the specified type.
    extension : str
        File extension, e.g. ".edf" or ".txt".

    Returns
    -------
    file_paths : list(str)
        A list of all files found in (sub)directories of the path.
    """
    assert extension.startswith(".")
    file_paths = glob.glob(directory + "**/*" + extension, recursive=True)
    assert len(file_paths) > 0, (
        f"something went wrong. Found no {extension} files in {directory}")
    return file_paths
```
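For context, a quick self-contained sketch of how this function behaves. The function body is copied from the diff above; the temporary directory layout is invented purely for illustration:

```python
import glob
import os
import tempfile


def read_all_file_names(directory, extension):
    """Return all files with the given extension found under directory."""
    assert extension.startswith(".")
    file_paths = glob.glob(directory + "**/*" + extension, recursive=True)
    assert len(file_paths) > 0, (
        f"something went wrong. Found no {extension} files in {directory}")
    return file_paths


with tempfile.TemporaryDirectory() as tmp:
    # Create a nested session directory with two .edf files and one .txt file.
    sub = os.path.join(tmp, "session1")
    os.makedirs(sub)
    for name in ("a.edf", "b.edf", "notes.txt"):
        open(os.path.join(sub, name), "w").close()
    # The pattern expects the directory to end with a separator for the
    # recursive "**" component to take effect.
    edf_files = read_all_file_names(tmp + os.sep, ".edf")
    print(len(edf_files))  # 2
```

Note that, because `directory` is concatenated directly with the glob pattern, callers need to pass a trailing separator; this is one reason the function may be worth generalizing when it moves out of `tuh.py`.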
Why is there a change in this file?
Good point, I had to make this change for something else I was working on, and I forgot to keep the change out of the commit. Essentially, this `read_all_file_names` function should not be specific to the TUH dataset, and in fact I reused it for another dataset class I was working on. That's why I've moved it to `braindecode.util`. Should I move this to a separate PR?
So is this function currently being used anywhere else but in tuh.py? In the TUH EEG Corpus PR it will currently be removed.
That's correct, it's only used in tuh.py for now. We should leave it in this PR for the tests, but we can remove it after your PR @gemeinl ?
Do you think you can make it backwards-compatible, with a deprecation notice inside the constructors of MNEPreproc/NumpyPreproc, @hubertjb? Otherwise too many things will break, I feel. What do others think about this preprocessing API change @gemeinl @sliwy?
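A backwards-compatible deprecation of the kind suggested here can be sketched as thin subclasses that warn in their constructors before delegating to the unified class. This is only an illustration; the constructor signatures below are assumptions, not braindecode's actual API:

```python
import warnings


class Preprocessor:
    """Unified preprocessor: holds a fn (string or callable) plus kwargs."""

    def __init__(self, fn, *, apply_on_array=True, **kwargs):
        self.fn = fn
        self.apply_on_array = apply_on_array
        self.kwargs = kwargs


class MNEPreproc(Preprocessor):
    """Deprecated alias kept for backwards compatibility (sketch)."""

    def __init__(self, fn, **kwargs):
        warnings.warn("MNEPreproc is deprecated; use Preprocessor instead.",
                      DeprecationWarning)
        # MNE-level preprocessors operate on the Raw/Epochs object itself.
        super().__init__(fn, apply_on_array=False, **kwargs)


class NumpyPreproc(Preprocessor):
    """Deprecated alias kept for backwards compatibility (sketch)."""

    def __init__(self, fn, **kwargs):
        warnings.warn("NumpyPreproc is deprecated; use Preprocessor instead.",
                      DeprecationWarning)
        # numpy-level preprocessors operate on the underlying array.
        super().__init__(fn, apply_on_array=True, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    p = NumpyPreproc(lambda x, factor: x * factor, factor=1e6)
    print(issubclass(caught[0].category, DeprecationWarning))  # True
    print(p.apply_on_array)  # True
```

Old code keeps running but emits a `DeprecationWarning`, which can later be escalated to an error before the aliases are removed entirely.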
Having one unified Preprocessor sounds great. Regarding the OrderedDict, I would prefer the explicit Preprocessor definition: to me it's clearer and easier to understand. Generally, regarding the API, our Preprocessors do a similar job to torchvision transforms (except we apply Preprocessors to whole datasets instead of tensors). I would prefer to have a similar API, with something close to torchvision.transforms.Compose, e.g.:

```python
from braindecode.preprocessors import Compose, Preprocessor

preprocessors = Compose([
    Preprocessor('pick_types', eeg=True, meg=False, stim=False),
    Preprocessor(scale, factor=factor)
])
preprocessors(ds)
```

It is similar to what people may already know from torchvision (when it comes to the way of creating and applying), and it allows easily applying preprocessors without the need to import the `preprocess` function.
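The Compose idea suggested above is small enough to sketch end to end. This is not braindecode code: `Compose`, the toy `Preprocessor`, and the `FakeRaw` object below are all hypothetical stand-ins meant to show the calling convention without requiring MNE:

```python
class Compose:
    """Chain preprocessors and apply them in order, torchvision-style."""

    def __init__(self, preprocessors):
        self.preprocessors = preprocessors

    def __call__(self, ds):
        for preproc in self.preprocessors:
            preproc(ds)
        return ds


class Preprocessor:
    """Toy stand-in: fn is either a method name on the target or a callable."""

    def __init__(self, fn, **kwargs):
        self.fn, self.kwargs = fn, kwargs

    def __call__(self, ds):
        if isinstance(self.fn, str):
            getattr(ds, self.fn)(**self.kwargs)
        else:
            self.fn(ds, **self.kwargs)


class FakeRaw:
    """Tiny fake dataset to demonstrate chaining without MNE installed."""

    def __init__(self):
        self.data = [1.0, 2.0]
        self.picked = None

    def pick_types(self, eeg=True):
        self.picked = "eeg" if eeg else "other"


def scale(raw, factor):
    raw.data = [x * factor for x in raw.data]


preprocessors = Compose([
    Preprocessor('pick_types', eeg=True),
    Preprocessor(scale, factor=10),
])
raw = preprocessors(FakeRaw())
print(raw.picked, raw.data)  # eeg [10.0, 20.0]
```

Because `Compose` is itself callable, users apply a whole pipeline with one call instead of importing and calling a separate apply function, which is exactly the ergonomic point being made.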
@robintibor Thanks for the feedback! Very good point about the OrderedDict, I had forgotten about that problem. I'll take it out of the PR. I'll see how I can add a deprecation notice for MNEPreproc and NumpyPreproc. @sliwy I really like the idea of a Compose.
- fixing test_preprocess and adding new relevant tests - some formatting
- cutting down the number of training epochs in relative positioning example to speed up the CI
…d modifying tests and examples accordingly - adding deprecation notice for MNEPreproc and NumpyPreproc
@robintibor The tests have passed, let me know if you have any additional comments!
great, thanks, merged
Don't forget to add a line in what's new. After Lukas' PR is closed I will also merge that one, and then we can always directly add it inside the PR @hubertjb
Following the discussion in #160, I took a shot at unifying the preprocessors now that `apply_function` is available for the `Epochs` object too (mne-tools/mne-python#9235). This is still work in progress, and I wanted to hear what @gemeinl @robintibor @agramfort think before going further. Notably, I made the choice to keep the following three cases possible:

1. `fn` is a string -> the corresponding method of Raw or Epochs is used
2. `fn` is a callable and `apply_on_array` is True -> the `apply_function` method of Raw or Epochs is used to apply the callable on the underlying numpy array
3. `fn` is a callable and `apply_on_array` is False -> the callable must modify the Raw or Epochs object directly (e.g. by modifying its `_data` attribute in place or calling its methods)

I'm not sure `apply_on_array` is the best choice of name for this. Also, although case 3 seems like it might enable bad practice in some cases, it was required to allow the existing `preprocess.filterbank` function to work, and so I included it.

The following are still missing: … (`fn`, `kwargs`) instead of a list of preprocessor objects.
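The three cases above amount to a single dispatch in the preprocessor's apply step. The sketch below is a simplified, self-contained illustration of that dispatch; `FakeRaw` stands in for an MNE Raw/Epochs object, and the class is not braindecode's actual implementation:

```python
import numpy as np


class Preprocessor:
    """Apply fn to a Raw/Epochs-like object following the three cases."""

    def __init__(self, fn, apply_on_array=True, **kwargs):
        self.fn, self.apply_on_array, self.kwargs = fn, apply_on_array, kwargs

    def apply(self, raw_or_epochs):
        if isinstance(self.fn, str):
            # Case 1: call the object's own method by name.
            getattr(raw_or_epochs, self.fn)(**self.kwargs)
        elif self.apply_on_array:
            # Case 2: let apply_function run the callable on the array.
            raw_or_epochs.apply_function(self.fn, **self.kwargs)
        else:
            # Case 3: the callable mutates the object directly.
            self.fn(raw_or_epochs, **self.kwargs)
        return raw_or_epochs


class FakeRaw:
    """Minimal Raw stand-in exposing the pieces the dispatch relies on."""

    def __init__(self):
        self._data = np.ones((2, 4))
        self.filtered = False

    def filter(self, l_freq=None, h_freq=None):
        self.filtered = True

    def apply_function(self, fn, **kwargs):
        self._data = fn(self._data, **kwargs)


raw = FakeRaw()
Preprocessor('filter', l_freq=4., h_freq=30.).apply(raw)          # case 1
Preprocessor(lambda x, factor: x * factor, factor=2).apply(raw)   # case 2
Preprocessor(lambda r: setattr(r, '_data', r._data + 1),
             apply_on_array=False).apply(raw)                     # case 3
print(raw.filtered, raw._data[0, 0])  # True 3.0
```

Case 3 is the escape hatch: it trades safety (a callable can mutate `_data` arbitrarily) for the flexibility that `preprocess.filterbank` needs, which matches the trade-off described above.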