Sanity check script #184

Closed
jsosulski opened this issue Apr 9, 2021 · 2 comments · Fixed by #261

@jsosulski
Collaborator

I just noticed in the BNCI2015003 dataset, the first two subjects have:

150 Target / 5250 NonTarget events

whereas the remaining ones have

300 Target / 1500 NonTarget events

I did not find any indication in the docstring why that may be the case.
Maybe we should create a dummy paradigm that just loads the data without any filtering or other preprocessing, and use it in an example script that simply shows subject-, session- and run-wise information for each dataset.
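
A minimal sketch of what such an overview could compute, assuming the per-event information is already available as (subject, session, run, label) tuples. How these tuples are obtained from a dataset is left open here; the records below are hand-written toy values, and the helper names `count_events` and `format_counts` are hypothetical:

```python
from collections import Counter


def count_events(records):
    """Count events per (subject, session, run, label) combination.

    `records` is an iterable of (subject, session, run, label) tuples,
    one tuple per event.
    """
    return Counter(records)


def format_counts(counts):
    """Turn the counter into sorted, human-readable lines."""
    lines = []
    for (subject, session, run, label), n in sorted(counts.items()):
        lines.append(
            f"subject={subject} session={session} run={run} {label}: {n}"
        )
    return lines


# Toy records standing in for real per-event metadata (hypothetical values).
records = (
    [(1, "session_0", "run_0", "Target")] * 2
    + [(1, "session_0", "run_0", "NonTarget")] * 5
    + [(2, "session_0", "run_0", "Target")] * 3
)

counts = count_events(records)
for line in format_counts(counts):
    print(line)
```

Because nothing is filtered or epoched, a table like this only depends on the event markers, which makes unexpected differences between subjects easy to spot.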

@jsosulski
Collaborator Author

Example script.
This script would create a suite of 5 plots for each subject (pooled over sessions, and additionally for each session individually) using a fairly standard P300 preprocessing (band-pass filtering from 0.5 Hz to 16 Hz). Note that this takes far too long for CI; however, something like this could be useful for debugging / assessing datasets. One could also add a feature to this script that writes out the characteristics of each dataset, i.e. #channels, #stimuli, SNR, etc.

If someone wants to look at all sessions/subjects in detail, let me know; the plots are 150 MB.

Occasional issues I found so far:

  • BNCI2015003: subjects 1 and 2 have 150 / 5250 Target / Non-target stimuli, whereas the remaining subjects have 300 / 1500. This is not documented. Subjects 4 and 5 have channels with strong artifacts:
    Subject 3: [image: butterfly plot]
    Subject 4: [image: butterfly plot]

  • DemonsP300: no clear ERP discernible for most subjects. However, due to the low number of target stimuli (50) and dry/sponge (?) electrodes, I guess that is expected. E.g.:
    Subject 45. This also shows that my chosen 0.5 Hz high-pass cutoff for P300 is not sufficient for this dataset. However, I have some closed auditory word datasets where a high-pass cutoff above 0.5 Hz removes a lot of discriminative information due to late activations.
    [image: butterfly plot]

  • All datasets: Some sessions/subjects have very suspicious ERPs for specific channels (frontal mostly). This could mean that classification information is partly based on eye blinks.
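
The kind of discrepancy noted for BNCI2015003 could also be flagged automatically. The following is only a sketch: it assumes the Target / Non-target counts per subject have already been extracted, uses the counts quoted above for subjects 1 to 5 only, and the function name `find_outlier_subjects` is hypothetical:

```python
from collections import Counter


def find_outlier_subjects(counts_by_subject):
    """Return the most common (target, non_target) pair and the subjects
    whose counts differ from it."""
    pair_frequency = Counter(counts_by_subject.values())
    most_common_pair, _ = pair_frequency.most_common(1)[0]
    outliers = sorted(
        subject
        for subject, pair in counts_by_subject.items()
        if pair != most_common_pair
    )
    return most_common_pair, outliers


# (Target, NonTarget) counts as reported in this issue; only subjects 1-5
# are listed here for illustration.
counts_by_subject = {
    1: (150, 5250),
    2: (150, 5250),
    3: (300, 1500),
    4: (300, 1500),
    5: (300, 1500),
}

expected_pair, outliers = find_outlier_subjects(counts_by_subject)
print("most common counts:", expected_pair)
print("subjects deviating from them:", outliers)
```

Comparing against the most common pair is a deliberately simple heuristic; it says nothing about which variant is correct, only that the subjects are not consistent with each other.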

@sylvchev
Member

sylvchev commented Jun 1, 2021

This is a nice script. Indeed, it will be difficult to include it in the CI. Maybe we could have two levels of sanity check:

  • a complete one, like you did, with plots and visual checks, that is not part of the CI
  • a short one, that extracts the number of events from the MNE conversion of the downloaded datasets.

As both of these methods require downloading the whole datasets, they are not suitable for CI (except with small and/or cached datasets). But it could be useful to have something like a describe (pandas-like) or info (MNE-style) method that could be called on a dataset to summarize this information. What do you think?
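
To make the idea more concrete, here is a rough sketch of what such a summary could return. It is not an existing method: the function name `describe`, the nested input layout (subject, then session, then label counts) and the toy numbers are all assumptions made for illustration:

```python
from collections import Counter


def describe(events):
    """Summarize event counts for one dataset.

    `events` maps subject -> session -> {label: number of events}.
    """
    totals = Counter()
    sessions_per_subject = {}
    for subject, sessions in events.items():
        sessions_per_subject[subject] = len(sessions)
        for label_counts in sessions.values():
            # Counter.update adds the counts of each label to the totals.
            totals.update(label_counts)
    return {
        "n_subjects": len(events),
        "sessions_per_subject": sessions_per_subject,
        "events_per_class": dict(totals),
    }


# Hypothetical toy input, not taken from any real dataset.
events = {
    1: {"session_0": {"Target": 10, "NonTarget": 50}},
    2: {
        "session_0": {"Target": 10, "NonTarget": 50},
        "session_1": {"Target": 12, "NonTarget": 60},
    },
}

summary = describe(events)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Returning a plain dictionary keeps the sketch independent of pandas or MNE; an actual implementation could just as well return a DataFrame or print an MNE-style info block.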
