Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Unifying preprocessors #197

Merged
merged 9 commits into from
May 15, 2021

Conversation

hubertjb
Copy link
Collaborator

@hubertjb hubertjb commented Apr 7, 2021

Following the discussion in #160, I took a shot at unifying the preprocessors now that apply_function is available for the Epochs object too (mne-tools/mne-python#9235).

This is still work in progress, and I wanted to hear what @gemeinl @robintibor @agramfort think before going further. Notably, I made the choice to keep the following three cases possible:

  1. fn is a string -> the corresponding method of Raw or Epochs is used
  2. fn is a callable and apply_on_array is True -> the apply_function method of Raw or Epochs is used to apply the callable on the underlying numpy array
  3. fn is a callable and apply_on_array is False -> the callable must modify the Raw or Epochs object directly (e.g. by modifying its _data attribute inplace or calling its methods)

I'm not sure apply_on_array is the best choice of name for this. Also, although case 3 seems like it might enable bad practice in some cases, it was required to allow the existing preprocess.filterbank function to work and so I included it.

The following are still missing:

  • I haven't modified the examples yet.
  • Version check for the latest MNE main (commit 5cc6eb8d7ebd102049843336d9d4a342f0f3e966 from 2021-04-05, which is not in the latest release v0.22.1)
  • If we decide to go with this new unified preprocessor object we might be able to go back to defining the preprocessing steps inside an OrderedDict as pairs (fn, kwargs) instead of a list of preprocessor objects.

@hubertjb
Copy link
Collaborator Author

hubertjb commented Apr 7, 2021

The tests failed because the examples have not been updated yet, and there is no updated version check for MNE.

Copy link
Collaborator

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far so good !

braindecode/datautil/preprocess.py Outdated Show resolved Hide resolved
@robintibor
Copy link
Contributor

@sbbrandt You also had something in this direction? How do you feel about this implementation? Is it compatible with what you had?

@sbbrandt
Copy link
Collaborator

@sbbrandt You also had something in this direction? How do you feel about this implementation? Is it compatible with what you had?

@robintibor You mean the work on the data augmentation issue? There I used a 'DataLoader' to manipulate the data during the training. That wouldn't be affected. The proposed data augmentation feature by @Simon-Free worked on the data set level and could need some updates after this PR.

Beside that I think unifying the preprocessors is good and I like the implementation.

@hubertjb
Copy link
Collaborator Author

@robintibor @sbbrandt what do you guys think about going back to the old way of defining preprocessing steps with an OrderedDict? I like it because it simplifies the API (the user doesn't have to understand another object).

@hubertjb hubertjb force-pushed the unifying-preprocs branch from 04b931f to ad4f9fc Compare April 15, 2021 17:07
@hubertjb hubertjb changed the title [WIP] Unifying preprocessors [MRG] Unifying preprocessors Apr 19, 2021
Comment on lines 106 to 213
def read_all_file_names(directory, extension):
"""Read all files with specified extension from given path and sorts them
based on a given sorting key.

Parameters
----------
directory: str
parent directory to be searched for files of the specified type
extension: str
file extension, i.e. ".edf" or ".txt"

Returns
-------
file_paths: list(str)
a list to all files found in (sub)directories of path
"""
assert extension.startswith(".")
file_paths = glob.glob(directory + "**/*" + extension, recursive=True)
assert len(file_paths) > 0, (
f"something went wrong. Found no {extension} files in {directory}")
return file_paths


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a change in this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I had to make this change for something else I was working on and I forgot to keep the change out of the commit. Essentially this read_all_file_names function should not be specific to the TUH dataset and in fact I reused it for another dataset class I was working on. That's why I've moved it to braindecode.util. Should I move this to a separate PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this function currently being used anywhere else but in tuh.py? In the TUH EEG Corpus PR it will currently be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, it's only used in tuh.py for now. We should leave it in this PR for the tests, but we can remove it after your PR @gemeinl ?

@robintibor
Copy link
Contributor

Do you think you can make it backwards-compatible with deprecation notice inside constructor of MNE/NumpyPreproc @hubertjb ? Otherwise too many things will break I feel. What do others feel about this preprocessing API change @gemeinl @sliwy ?
regarding ordered dict, I am more against it, you will have to learn the structure then anyways what the different fields mean, to me object just more explicit about it... also remember last time we had the bug-generating issue that of course if you have twice the same key in the ordereddict only one will exist... but against it in any case :)

@sliwy
Copy link
Collaborator

sliwy commented May 12, 2021

Having one unified Preprocessor sounds great.

Regarding the OrderedDict, I would prefer to use explicit Preprocessors definition - to me it's clearer and easier to understand.

Generally, regarding the API, our Preprocessors do similar job to torchvision transforms (except we apply Preprocessors to whole datasets instead of tensors). I would prefer to have similar API with something close to torchvision.transforms.Compose, e.g.:

from barindecode.preprocessors import Compose, Preprocessor
preprocessors = Compose([
        Preprocessor('pick_types', eeg=True, meg=False, stim=False),
        Preprocessor(scale, factor=factor)
])
preprocessors(ds)

It is similar to what people can already know from torchvision (when it comes to way of creating and applying) and it allows easily applying preprocessors without need to import preprocess function. Created preprocessors object can be next saved and as it contains all the needed tools/transforms to preprocess dataset, it can be used for other datasets without need to import preprocess function. I am not sure if we need this additional layer of abstraction with Compose class but in the end now we have it as well in a form of preprocess function. If you like the idea, maybe we should change it later and not in this PR just to not block the release.

@hubertjb
Copy link
Collaborator Author

@robintibor Thanks for the feedback! Very good point about OrderedDict, I had forgotten about that problem. I'll take it out of the PR. I'll see how I can make a deprecation notice for MNEPreproc and NumpyPreproc too.

@sliwy I really like the idea of a Compose object for preprocessors, it would be a lot more consistent with on-the-fly transforms. I agree with you, maybe let's keep this for a future PR?

@hubertjb hubertjb force-pushed the unifying-preprocs branch from ad4f9fc to 225722b Compare May 13, 2021 00:47
@hubertjb
Copy link
Collaborator Author

@robintibor The tests have passed, let me know if you have any additional comments!

@robintibor robintibor merged commit 7b29379 into braindecode:master May 15, 2021
@robintibor
Copy link
Contributor

robintibor commented May 15, 2021

great thanks merged

@robintibor
Copy link
Contributor

don't forget to add line in what's new. after lukas PR closed I will also merge that one and than we can always directly add inside the PR @hubertjb

@hubertjb hubertjb deleted the unifying-preprocs branch May 16, 2021 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants