
[MRG] Unifying preprocessors #197

Merged
9 commits merged into braindecode:master from unifying-preprocs on May 15, 2021

Conversation

@hubertjb (Collaborator) commented Apr 7, 2021

Following the discussion in #160, I took a shot at unifying the preprocessors now that apply_function is available for the Epochs object too (mne-tools/mne-python#9235).

This is still a work in progress, and I wanted to hear what @gemeinl @robintibor @agramfort think before going further. Notably, I made the choice to keep the following three cases possible:

  1. fn is a string -> the corresponding method of Raw or Epochs is used
  2. fn is a callable and apply_on_array is True -> the apply_function method of Raw or Epochs is used to apply the callable on the underlying numpy array
  3. fn is a callable and apply_on_array is False -> the callable must modify the Raw or Epochs object directly (e.g. by modifying its _data attribute inplace or calling its methods)

I'm not sure apply_on_array is the best name for this. Also, although case 3 seems like it might enable bad practice in some cases, it was required to allow the existing preprocess.filterbank function to work, so I included it.
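The three-way dispatch described above could be sketched roughly as follows. This is a hypothetical, simplified sketch, not the actual braindecode implementation: `FakeRaw` is a stand-in for an MNE Raw/Epochs object, with an `apply_function` that maps a callable over its underlying data.

```python
class FakeRaw:
    """Stand-in for an MNE Raw/Epochs object (hypothetical, for illustration)."""
    def __init__(self, data):
        self._data = list(data)

    def crop(self, n):  # stands in for a Raw/Epochs method like crop/resample
        self._data = self._data[:n]
        return self

    def apply_function(self, fn, **kwargs):
        # Apply a callable to the underlying data, as mne's apply_function does.
        self._data = [fn(x, **kwargs) for x in self._data]
        return self


class Preprocessor:
    """Apply `fn` to a Raw/Epochs-like object following the three cases."""
    def __init__(self, fn, apply_on_array=False, **kwargs):
        self.fn = fn
        self.apply_on_array = apply_on_array
        self.kwargs = kwargs

    def apply(self, raw):
        if isinstance(self.fn, str):
            # Case 1: fn names a method of the Raw/Epochs object.
            return getattr(raw, self.fn)(**self.kwargs)
        if self.apply_on_array:
            # Case 2: apply the callable to the underlying array.
            return raw.apply_function(self.fn, **self.kwargs)
        # Case 3: the callable modifies the object directly.
        return self.fn(raw, **self.kwargs)


raw = FakeRaw([1.0, 2.0, 3.0, 4.0])
Preprocessor('crop', n=3).apply(raw)                      # case 1
Preprocessor(lambda x, factor: x * factor,
             apply_on_array=True, factor=2.0).apply(raw)  # case 2
print(raw._data)  # [2.0, 4.0, 6.0]
```

The real version would additionally validate that `fn` is a string or callable and handle kwargs forwarding for each MNE method; the sketch only shows the branching.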

The following are still missing:

  • I haven't modified the examples yet.
  • Version check for the latest MNE main (commit 5cc6eb8d7ebd102049843336d9d4a342f0f3e966 from 2021-04-05, which is not in the latest release v0.22.1)
  • If we decide to go with this new unified preprocessor object, we might be able to go back to defining the preprocessing steps inside an OrderedDict as (fn, kwargs) pairs instead of a list of preprocessor objects.

@hubertjb (Collaborator, Author) commented Apr 7, 2021

The tests failed because the examples have not been updated yet, and there is no updated version check for MNE.

@agramfort (Collaborator) left a comment

So far so good!

braindecode/datautil/preprocess.py (outdated review thread; resolved)
@robintibor (Contributor)

@sbbrandt You also had something in this direction? How do you feel about this implementation? Is it compatible with what you had?

@sbbrandt (Collaborator)

> @sbbrandt You also had something in this direction? How do you feel about this implementation? Is it compatible with what you had?

@robintibor You mean the work on the data augmentation issue? There I used a 'DataLoader' to manipulate the data during the training. That wouldn't be affected. The proposed data augmentation feature by @Simon-Free worked on the data set level and could need some updates after this PR.

Besides that, I think unifying the preprocessors is good, and I like the implementation.

@hubertjb (Collaborator, Author)

@robintibor @sbbrandt what do you guys think about going back to the old way of defining preprocessing steps with an OrderedDict? I like it because it simplifies the API (the user doesn't have to understand another object).

@hubertjb hubertjb changed the title from [WIP] Unifying preprocessors to [MRG] Unifying preprocessors on Apr 19, 2021
Comment on lines 106 to 213
import glob
import os


def read_all_file_names(directory, extension):
    """Read all files with the specified extension from the given path,
    searching subdirectories recursively.

    Parameters
    ----------
    directory: str
        parent directory to be searched for files of the specified type
    extension: str
        file extension, i.e. ".edf" or ".txt"

    Returns
    -------
    file_paths: list(str)
        paths to all files found in (sub)directories of `directory`
    """
    assert extension.startswith(".")
    file_paths = glob.glob(
        os.path.join(directory, "**", "*" + extension), recursive=True)
    assert len(file_paths) > 0, (
        f"Something went wrong. Found no {extension} files in {directory}")
    return file_paths


Contributor:

Why is there a change in this file?

Collaborator Author:

Good point, I had to make this change for something else I was working on and I forgot to keep the change out of the commit. Essentially this read_all_file_names function should not be specific to the TUH dataset and in fact I reused it for another dataset class I was working on. That's why I've moved it to braindecode.util. Should I move this to a separate PR?

Collaborator:

So is this function currently being used anywhere else but in tuh.py? In the TUH EEG Corpus PR it will currently be removed.

Collaborator Author:

That's correct, it's only used in tuh.py for now. We should leave it in this PR for the tests, but we can remove it after your PR @gemeinl ?

@robintibor (Contributor)

Do you think you can make it backwards-compatible with a deprecation notice inside the constructors of MNEPreproc/NumpyPreproc @hubertjb? Otherwise I feel too many things will break. What do others think about this preprocessing API change @gemeinl @sliwy?
Regarding the OrderedDict, I am against it: you would still have to learn what the different fields mean, and to me the object is just more explicit about that. Also, remember the bug-generating issue we ran into last time: if the same key appears twice in the OrderedDict, only one entry will exist. So I'm against it in any case :)
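The duplicate-key pitfall mentioned here can be shown in a few lines (the step names and kwargs below are made up for illustration):

```python
from collections import OrderedDict

# Two preprocessing steps keyed by the same method name: the second
# entry silently overwrites the first, so one step is lost.
steps = OrderedDict([
    ('crop', dict(tmin=0)),
    ('crop', dict(tmax=10)),
])
print(len(steps))     # 1
print(steps['crop'])  # {'tmax': 10}
```

A list of preprocessor objects has no such constraint: the same operation can appear any number of times with different arguments.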

@sliwy (Collaborator) commented May 12, 2021

Having one unified Preprocessor sounds great.

Regarding the OrderedDict, I would prefer to use explicit Preprocessor definitions; to me it's clearer and easier to understand.

Generally, regarding the API, our Preprocessors do a similar job to torchvision transforms (except we apply Preprocessors to whole datasets instead of tensors). I would prefer to have a similar API, with something close to torchvision.transforms.Compose, e.g.:

from braindecode.preprocessors import Compose, Preprocessor

preprocessors = Compose([
    Preprocessor('pick_types', eeg=True, meg=False, stim=False),
    Preprocessor(scale, factor=factor),
])
preprocessors(ds)

It is similar to what people may already know from torchvision (in how it is created and applied), and it allows applying preprocessors without needing to import the preprocess function. The created preprocessors object can then be saved, and since it contains all the tools/transforms needed to preprocess a dataset, it can be reused for other datasets. I am not sure we need this additional layer of abstraction with a Compose class, but in the end we already have one in the form of the preprocess function. If you like the idea, maybe we should change it later and not in this PR, just to not block the release.
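A minimal sketch of such a Compose container, modeled on torchvision.transforms.Compose (hypothetical, not existing braindecode API; plain callables stand in for Preprocessor instances):

```python
class Compose:
    """Chain preprocessors and apply them to a dataset in order.

    Hypothetical sketch in the spirit of torchvision.transforms.Compose;
    each preprocessor is any callable taking the dataset as argument.
    """
    def __init__(self, preprocessors):
        self.preprocessors = list(preprocessors)

    def __call__(self, ds):
        for preproc in self.preprocessors:
            preproc(ds)  # each step modifies the dataset in place
        return ds


# Usage with plain callables standing in for Preprocessor instances:
log = []
composed = Compose([
    lambda ds: log.append(('pick_types', ds)),
    lambda ds: log.append(('scale', ds)),
])
composed('my_dataset')
print([name for name, _ in log])  # ['pick_types', 'scale']
```

The container itself is trivial; the value is in the familiar create-then-apply pattern and in being able to pass the composed object around or save it alongside the model.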

@hubertjb (Collaborator, Author)

@robintibor Thanks for the feedback! Very good point about OrderedDict, I had forgotten about that problem. I'll take it out of the PR. I'll see how I can make a deprecation notice for MNEPreproc and NumpyPreproc too.

@sliwy I really like the idea of a Compose object for preprocessors, it would be a lot more consistent with on-the-fly transforms. I agree with you, maybe let's keep this for a future PR?

@hubertjb (Collaborator, Author)

@robintibor The tests have passed, let me know if you have any additional comments!

@robintibor robintibor merged commit 7b29379 into braindecode:master May 15, 2021
@robintibor (Contributor) commented May 15, 2021

Great, thanks, merged!

@robintibor (Contributor)

Don't forget to add a line in the what's new. After Lukas' PR is closed I will also merge that one, and then we can always add it directly inside the PR @hubertjb

@hubertjb hubertjb deleted the unifying-preprocs branch May 16, 2021 17:43