Disk caching of preprocessing/transformation result #385

Closed
PierreGtch opened this issue May 26, 2023 · 5 comments · Fixed by #408

Comments

@PierreGtch
Collaborator

PierreGtch commented May 26, 2023

As mentioned in #367, it would be great to have the option to save the results of computationally expensive preprocessing/transformations to disk.

Each such disk-cached result should be unique for every combination of:

Bonus: save the preprocessed data in a BIDS format!
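
One way to meet the uniqueness requirement is to derive a deterministic key from whatever identifies a cached result. The sketch below is only an illustration: the fields it hashes (a dataset code, a subject number, and a dictionary of processing parameters) and the function name `cache_key` are hypothetical choices, not an existing MOABB API.

```python
import hashlib
import json


def cache_key(dataset_code, subject, params):
    """Return a short, deterministic key for one cached result.

    All three arguments are hypothetical identifying fields chosen for
    illustration; a real cache may need more (or different) ones.
    """
    payload = json.dumps(
        {"dataset": dataset_code, "subject": subject, "params": params},
        sort_keys=True,  # key order must not change the hash
        default=repr,    # fall back to repr() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# The same combination gives the same key, whatever the dict ordering.
key_a = cache_key("DatasetA", 1, {"fmin": 8, "fmax": 32})
key_b = cache_key("DatasetA", 1, {"fmax": 32, "fmin": 8})
# A different subject gives a different key.
key_c = cache_key("DatasetA", 2, {"fmin": 8, "fmax": 32})
```

Such a key could serve as a directory or file name, so that a lookup reduces to checking whether the path exists.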

@PierreGtch
Collaborator Author

It could be interesting to introduce a notion of paradigm hierarchy. For example, the following would evaluate to True:

MotorImagery(events=['right_hand', 'left_hand']) <= MotorImagery(events=['right_hand', 'left_hand', 'feet']) 
# True

FilterBankMotorImagery(filters=[(8, 12), (12, 16)]) <= MotorImagery(fmin=1, fmax=40) 
# True (but we could have edge effects if we apply filters on epochs directly)

MotorImagery(channels=["C3",]) <= MotorImagery(channels=None) 
# tricky... a dataset can be without channel C3, even if we use all its channels

The semantics of a <= b would be: a can be computed from b.
This way, for every new preprocessing we want to compute, if the result of a preprocessing higher in the hierarchy has already been cached, we could reuse it instead of loading the raw signals again.
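
The "a can be computed from b" ordering can be sketched with Python's `__le__` hook. This is a minimal illustration, not MOABB's paradigm classes: `ParadigmSpec` and `find_reusable` are hypothetical names, and the comparison only covers events and the pass band (it ignores the channel and edge-effect caveats noted above).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ParadigmSpec:
    """Hypothetical stand-in for a paradigm (not a MOABB class)."""

    events: FrozenSet[str]
    fmin: float
    fmax: float

    def __le__(self, other: "ParadigmSpec") -> bool:
        # self can be computed from other if other keeps every event
        # self needs and other's pass band contains self's pass band.
        return (
            self.events <= other.events
            and other.fmin <= self.fmin
            and self.fmax <= other.fmax
        )


def find_reusable(
    target: ParadigmSpec, cached: Iterable[ParadigmSpec]
) -> Optional[ParadigmSpec]:
    """Return the first cached spec the target can be computed from."""
    for candidate in cached:
        if target <= candidate:
            return candidate
    return None


two_class = ParadigmSpec(frozenset({"right_hand", "left_hand"}), 8.0, 32.0)
three_class = ParadigmSpec(
    frozenset({"right_hand", "left_hand", "feet"}), 1.0, 40.0
)
reusable = find_reusable(two_class, [three_class])  # the three_class spec
```

Because this is only a partial order, two specs may be incomparable (neither `a <= b` nor `b <= a`), in which case the cache lookup falls back to the raw signals.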

@sylvchev
Member

sylvchev commented Jun 1, 2023

Interesting. Let's discuss this in the BCI meeting.

@PierreGtch
Collaborator Author

Yes, looking forward to it!

@PierreGtch
Collaborator Author

Another note: maybe we should save the preprocessed raws on disk, because the expensive preprocessing steps are loading the data, applying the frequency filters, and resampling.
If we saved the raw data (already filtered and resampled), we could then read it with preload=False, and the epoching would only load into memory the channels and events we need.

With this solution, each cached dataset would use more disk space, but it would also be more general. Also, the BIDS format is only compatible with mne.Raw, not mne.Epochs (https://mne.tools/mne-bids/stable/index.html#supported-file-formats).

@sylvchev
Member

sylvchev commented Jun 5, 2023

OK, this is something to consider: preloading is mandatory in MOABB, but it is a big limitation for huge datasets (like those that could be used to train DL models). Also, if we could find a clever approach that encompasses the BIDS format, that would be really nice; see #391.

2 participants