Disk caching of preprocessing/transformation result #385

PierreGtch · 2023-05-26T09:26:15Z

As mentioned in #367, it would be great to have the option to save to disk the results of computationally expensive preprocessing/transformations.

Each cached result should be unique for every combination of:

dataset code
dataset args
paradigm name
paradigm args
possibly the transformer (see Allow passing fixed transformers to evaluations #367)
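
One way to picture the uniqueness requirement above is a deterministic key built from those five components. The following is only a minimal sketch, not an existing MOABB function: `cache_key`, the dataset code `"ExampleDataset"`, and the use of `repr()` to fingerprint a transformer are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(dataset_code, dataset_args, paradigm_name, paradigm_args, transformer=None):
    """Return a deterministic hex digest identifying one preprocessing result.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        "dataset_code": dataset_code,
        "dataset_args": dataset_args,
        "paradigm_name": paradigm_name,
        "paradigm_args": paradigm_args,
        # Assumption: repr() stands in for a real transformer fingerprint.
        "transformer": None if transformer is None else repr(transformer),
    }
    # sort_keys makes the serialization independent of dict insertion order;
    # default=repr covers values that json cannot encode natively.
    blob = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Same arguments in a different order give the same key.
key_a = cache_key("ExampleDataset", {"subjects": [1, 2]}, "MotorImagery", {"fmin": 8, "fmax": 32})
key_b = cache_key("ExampleDataset", {"subjects": [1, 2]}, "MotorImagery", {"fmax": 32, "fmin": 8})
# Changing any argument gives a different key.
key_c = cache_key("ExampleDataset", {"subjects": [1, 2]}, "MotorImagery", {"fmin": 8, "fmax": 30})
```

The digest could then serve as a directory or file name for the cached result. A real implementation would need a more robust transformer fingerprint than `repr()`, since two differently fitted objects can share the same representation.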

Bonus: save the preprocessed data in a BIDS format!

PierreGtch · 2023-05-27T14:57:56Z

It could be interesting to introduce a notion of paradigm hierarchy. For example, the following would evaluate to true:

MotorImagery(events=['right_hand', 'left_hand']) <= MotorImagery(events=['right_hand', 'left_hand', 'feet']) 
# True

FilterBankMotorImagery(filters=[(8, 12), (12, 16)]) <= MotorImagery(fmin=1, fmax=40) 
# True (but we could have edge effects if we apply filters on epochs directly)

MotorImagery(channels=["C3",]) <= MotorImagery(channels=None) 
# tricky... a dataset can be without channel C3, even if we use all its channels

The semantics of a <= b would be: a can be computed from b.
This way, for every new preprocessing we want to compute, if the result of a preprocessing higher in the hierarchy has already been cached, we could reuse it instead of loading the raw signals again.
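
To make the "a can be computed from b" relation concrete, here is a small sketch using a toy class. `ToyParadigm` and `find_reusable` are hypothetical names, not MOABB classes, and the comparison rules are assumptions: the band check ignores the filter edge effects mentioned above, and the channel check optimistically assumes that a paradigm using all channels contains every requested channel, which is exactly the tricky case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyParadigm:
    """Hypothetical stand-in for a paradigm; not the MOABB class."""

    events: Tuple[str, ...]
    fmin: float
    fmax: float
    channels: Optional[Tuple[str, ...]] = None  # None means "all channels"

    def __le__(self, other):
        """Return True if self could be computed from the cached result of other."""
        if not isinstance(other, ToyParadigm):
            return NotImplemented
        events_ok = set(self.events) <= set(other.events)
        # Assumption: a narrower band can be re-filtered from a wider one
        # (edge effects on epochs are ignored here).
        band_ok = other.fmin <= self.fmin and self.fmax <= other.fmax
        if other.channels is None:
            # Optimistic assumption: the dataset has every requested channel.
            channels_ok = True
        elif self.channels is None:
            # self needs all channels but other kept only a subset.
            channels_ok = False
        else:
            channels_ok = set(self.channels) <= set(other.channels)
        return events_ok and band_ok and channels_ok


def find_reusable(requested, cached):
    """Return the first cached paradigm that `requested` can be derived from, else None."""
    for candidate in cached:
        if requested <= candidate:
            return candidate
    return None


wide = ToyParadigm(events=("right_hand", "left_hand", "feet"), fmin=1, fmax=40)
narrow = ToyParadigm(events=("right_hand", "left_hand"), fmin=8, fmax=12)
hit = find_reusable(narrow, [wide])   # the wide result can be reused
miss = find_reusable(wide, [narrow])  # nothing suitable is cached
```

A cache lookup would then try `find_reusable` first and fall back to loading the raw signals only when it returns `None`.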

sylvchev · 2023-06-01T23:36:02Z

Interesting. Let's discuss this in the BCI meeting.

PierreGtch · 2023-06-02T07:55:02Z

Yes, looking forward to it!

PierreGtch · 2023-06-05T07:11:19Z

Another note: maybe we should save the preprocessed raws to disk, because the expensive preprocessing steps are loading the data, applying the frequency filters, and resampling.
If we saved the raw data (already filtered and resampled), we could then read it with preload=False, and the epoching would only load into memory the channels and events we need.

With this solution, each cached dataset would use more disk space, but it would also be more general. Also, the BIDS format is only compatible with mne.Raw, not mne.Epochs (https://mne.tools/mne-bids/stable/index.html#supported-file-formats).

sylvchev · 2023-06-05T07:57:42Z

OK, this is something to consider, as preloading is mandatory for MOABB but it is a big limitation for huge datasets (like those that could be used to train deep learning models). Also, if we could have some clever approach that encompasses the BIDS format, that would be really nice, see #391.

PierreGtch mentioned this issue May 26, 2023

Allow passing fixed transformers to evaluations #367

Closed

sylvchev added the enhancement label Jun 1, 2023

PierreGtch mentioned this issue Jun 23, 2023

Re-structuring the moabb core, implementing caching, creating tutorials, fixing some bugs and more #408

Merged

bruAristimunha closed this as completed in #408 Aug 1, 2023
