Sanity check script #184

jsosulski · 2021-04-09T14:38:04Z

I just noticed in the BNCI2015003 dataset, the first two subjects have:

150 Target / 5250 NonTarget events

whereas the remaining ones have

300 Target / 1500 NonTarget events

I did not find any indication in the docstring why that may be the case.
Maybe we should create a dummy paradigm that just loads the data without any filtering, and use it in an example script that simply shows subject-, session- and run-wise information for each dataset.
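A minimal sketch of the per-subject/session/run overview suggested above. It assumes the events have already been loaded and flattened into (subject, session, run, label) tuples; that record layout and the function names `count_events` and `format_counts` are hypothetical and not part of MOABB.

```python
from collections import Counter, defaultdict


def count_events(records):
    """Count event labels per (subject, session, run).

    `records` is an iterable of (subject, session, run, label) tuples.
    This layout is an assumption made for illustration only.
    """
    counts = defaultdict(Counter)
    for subject, session, run, label in records:
        counts[(subject, session, run)][label] += 1
    return dict(counts)


def format_counts(counts):
    """Render one human-readable line per (subject, session, run)."""
    lines = []
    for (subject, session, run), label_counts in sorted(counts.items()):
        labels = ", ".join(
            f"{name}={n}" for name, n in sorted(label_counts.items())
        )
        lines.append(f"subject {subject} / {session} / {run}: {labels}")
    return lines


# Toy records, not real dataset contents.
records = (
    [(1, "session_0", "run_0", "Target")] * 3
    + [(1, "session_0", "run_0", "NonTarget")] * 9
    + [(2, "session_0", "run_0", "Target")] * 2
)

for line in format_counts(count_events(records)):
    print(line)
```

Keeping the counting separate from the formatting means the same counts could later feed a table or a plot instead of printed lines.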

jsosulski · 2021-05-10T07:18:05Z

Example script.
This script would create a suite of 5 plots for each subject (pooled over sessions, and additionally for each session individually) using a fairly default P300 preprocessing (band-pass filtering from 0.5 Hz to 16 Hz). Note that this takes far too long for CI; however, something like this could be useful for debugging and assessing datasets. One could also add a feature to this script that writes out the characteristics of each dataset, e.g. number of channels, number of stimuli, SNR, etc.

If someone wants to look at all sessions/subjects in detail, let me know, the plots are 150MB.

Occasional issues I found so far:

BNCI2015003: subjects 1 and 2 have 150 / 5250 Target / NonTarget stimuli, whereas the remaining subjects have 300 / 1500. This is not documented. Subjects 4 and 5 have very artifactual channels:
[Figure: Subject 3]

[Figure: Subject 4]
DemonsP300: no clear ERP is discernible for most subjects. However, given the low number of target stimuli (50) and the dry/sponge (?) electrodes, I guess that is expected. E.g.:
[Figure: Subject 45]
Subject 45 also shows that my chosen 0.5 Hz high-pass cutoff for P300 is not sufficient for this dataset. However, I have some closed auditory word datasets where a high-pass above 0.5 Hz cuts out a lot of discriminative information due to late activations.
All datasets: some sessions/subjects have very suspicious ERPs for specific channels (mostly frontal). This could mean that the classification information is partly based on eye blinks.

sylvchev · 2021-06-01T15:09:51Z

This is a nice script. Indeed, it will be difficult to include it in the CI. Maybe we could have two levels of sanity check:

a complete one, like you did, with plots and visual checks, that is not part of the CI
a short one, that extracts the number of events from the MNE conversion of the downloaded datasets.

As both of these methods require downloading the whole datasets, they are not suitable for CI (except with small and/or cached datasets). But it could be useful to have something like a describe (pandas-like) or info (MNE-style) method that could be called on a dataset to summarize this information. What do you think?
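As a rough sketch of what such a describe-style summary could report, the snippet below takes per-subject event counts and flags subjects whose counts differ from the most common pattern. The function name `describe`, the input layout (subject mapped to a Target/NonTarget count pair) and the choice of five subjects are assumptions for illustration; only the 150 / 5250 and 300 / 1500 figures come from this thread.

```python
from collections import Counter


def describe(per_subject):
    """Summarize event counts per subject and flag outliers.

    `per_subject` maps subject -> (n_target, n_nontarget). The most
    common count pair is used as the reference; any subject with a
    different pair is marked as deviating.
    """
    reference, _ = Counter(per_subject.values()).most_common(1)[0]
    rows = []
    for subject in sorted(per_subject):
        n_target, n_nontarget = per_subject[subject]
        ratio = n_nontarget / n_target if n_target else float("nan")
        rows.append(
            {
                "subject": subject,
                "n_target": n_target,
                "n_nontarget": n_nontarget,
                "nontarget_per_target": ratio,
                "deviates": (n_target, n_nontarget) != reference,
            }
        )
    return rows


# Counts as reported in this thread for BNCI2015003; the number of
# subjects listed here (five) is an illustrative assumption.
bnci_counts = {
    1: (150, 5250),
    2: (150, 5250),
    3: (300, 1500),
    4: (300, 1500),
    5: (300, 1500),
}

for row in describe(bnci_counts):
    print(row)
```

A check like this only needs event counts, so it could run on cached metadata rather than on the full downloaded recordings.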

jsosulski added the question label Apr 9, 2021

sylvchev mentioned this issue Jun 4, 2021

Utility for testing EEG data-cleaning pipelines? #193

Open

sylvchev added documentation enhancement labels Jun 4, 2021

jsosulski mentioned this issue Dec 1, 2021

DemonsP300 has only targets in the first third of data #216

Open

sylvchev mentioned this issue Jan 21, 2022

Creating a Global Benchmarking Pipeline and Results Page #190

Closed

Div12345 added this to Datasets in Benchmarking paper Jan 21, 2022

jsosulski mentioned this issue Feb 2, 2022

Visualize all ERP datasets #261

Merged

sylvchev closed this as completed in #261 Feb 21, 2022

Benchmarking paper automation moved this from Datasets to Done Feb 21, 2022

sylvchev mentioned this issue Dec 31, 2022

bi2013a not downloadable #309

Closed
