Fix metadata across 45 datasets based on paper validation by bruAristimunha · Pull Request #1001 · NeuroTechX/moabb

bruAristimunha · 2026-02-28T17:45:40Z

Summary

- Validated the METADATA blocks of all 45 MOABB datasets against their original publications (~920 corrections in total)
- Fixed recurring hallucination patterns (number of affected datasets in parentheses):
  - Country codes (41 datasets): full names → ISO 3166-1 alpha-2
  - Preprocessing conflation (38 datasets): removed analysis pipeline steps incorrectly listed as shared data state
  - BCI application inflation (36 datasets): removed fabricated/aspirational applications
  - Feedback type confusion (28 datasets): corrected cues that were listed as feedback
  - Software misattribution (21 datasets): removed analysis tools (EEGLAB, MATLAB, FieldTrip) listed as acquisition software
  - Acquisition reference (19 datasets): replaced CAR with the correct hardware reference (CMS/DRL for BioSemi, named electrodes for others)
  - Auxiliary channel fabrication (16 datasets): removed fabricated GSR/PPG/EMG channels
- Added a BIDS export fallback that uses publication_year when meas_date is missing
- Added validation tests for DOI format and metadata quality
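To make the country-code correction concrete, here is a minimal sketch of normalizing a country field to ISO 3166-1 alpha-2. The helper `normalize_country` and its small lookup table are hypothetical and not part of MOABB; a complete mapping would come from the standard's full table (for example via the `pycountry` package).

```python
import re

# Hypothetical subset of a full-name -> ISO 3166-1 alpha-2 table.
# A real implementation would cover every country in the standard.
_NAME_TO_ALPHA2 = {
    "austria": "AT",
    "france": "FR",
    "germany": "DE",
    "netherlands": "NL",
    "united states": "US",
}

# Shape check only: two uppercase letters. It does not verify that the
# code is actually assigned in ISO 3166-1.
_ALPHA2_RE = re.compile(r"^[A-Z]{2}$")


def normalize_country(value: str) -> str:
    """Return an alpha-2 code for a full country name or an existing code."""
    stripped = value.strip()
    if _ALPHA2_RE.match(stripped):
        return stripped  # already in alpha-2 form
    try:
        return _NAME_TO_ALPHA2[stripped.lower()]
    except KeyError:
        raise ValueError(f"unknown country name: {value!r}") from None
```

Raising on unknown names, rather than passing them through, keeps a full name from silently surviving in the metadata.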

Methodology

Each dataset was validated by an isolated agent with access only to:

- The dataset .py file
- The original publication PDF
- The schema definition (schema.py)
- AlexMI as a gold-standard reference

Corrections were classified by confidence (HIGH/MEDIUM/LOW), and only HIGH- and MEDIUM-confidence corrections were applied. Individual validation reports are available in moabb_tmp_folder/papers/validation_results/.
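The confidence gate can be sketched as a simple filter. The `Correction` record and its fields are hypothetical stand-ins for whatever the validation reports actually contain; only the HIGH/MEDIUM/LOW levels and the "apply HIGH and MEDIUM" rule come from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class Correction:
    # Hypothetical fields for illustration only.
    dataset: str
    field: str
    new_value: object
    confidence: Confidence


# Levels that are applied; LOW-confidence findings are only reported.
APPLIED_LEVELS = {Confidence.HIGH, Confidence.MEDIUM}


def corrections_to_apply(corrections):
    """Keep HIGH and MEDIUM confidence corrections, preserving input order."""
    return [c for c in corrections if c.confidence in APPLIED_LEVELS]
```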

Files changed

- 34 dataset .py files (metadata corrections)
- bids_interface.py (publication_year fallback)
- 2 new test files (DOI validation, BIDS enrichment tests)
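A minimal sketch of the `publication_year` fallback for a missing `meas_date`, assuming the fallback uses January 1 of the publication year at midnight UTC. The function name `resolve_meas_date` is hypothetical, and the actual logic in bids_interface.py may differ.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_meas_date(
    meas_date: Optional[datetime], publication_year: Optional[int]
) -> Optional[datetime]:
    """Prefer the recorded date; otherwise fall back to the publication year."""
    if meas_date is not None:
        return meas_date  # a real acquisition date always wins
    if publication_year is None:
        return None  # nothing to fall back on; leave the date unset
    # Assumption: a year-only date is anchored at Jan 1, 00:00 UTC.
    return datetime(publication_year, 1, 1, tzinfo=timezone.utc)
```

With MNE-Python, the resolved value could then be applied with `raw.set_meas_date(...)` before BIDS export; MNE works with timezone-aware UTC datetimes, which is why the sketch sets `tzinfo`.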

Test plan

- All 229 test_datasets.py tests pass
- Pre-commit hooks pass (black, ruff, codespell, isort)
- CI pipeline validation
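A DOI format test could check a shape like the one below. This is a hedged sketch, not the test added in this PR: the regex accepts only the bare `10.<registrant>/<suffix>` form, so URL-prefixed values such as `https://doi.org/...` or `doi:` prefixes fail. It does not check that the DOI actually resolves.

```python
import re

# "10." + 4-9 digit registrant code + "/" + non-empty suffix without whitespace.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_doi(doi: str) -> bool:
    """Return True if `doi` looks like a bare DOI string."""
    return bool(DOI_RE.match(doi))
```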

chatgpt-codex-connector

💡 Codex Review

moabb/moabb/datasets/thielen2021.py

Line 184 in 57aff95

doi="10.1088/1741-2552/abecef",

Preserve Thielen2021 paper DOI linkage in metadata

Updating the metadata DOI here without retaining the previously tracked paper DOI leaves 10.1088/1741-2552/ab4057 untracked, even though it is still referenced in the module comments/doc context. This is why test_docstring_dois_tracked[Thielen2021] now fails on this commit. The metadata should continue to carry that DOI (e.g., via associated_paper_doi) so that DOI auditing remains consistent.
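The check described above can be sketched as a set difference between DOIs found in the module docstring and DOIs carried by the metadata. Treating the metadata as a plain dict with `doi` and `associated_paper_doi` keys is a simplification, and `untracked_dois` is a hypothetical helper, not the implementation of `test_docstring_dois_tracked`.

```python
import re

# Bare DOIs in free text; stop at whitespace, commas, semicolons, or closing brackets.
_DOI_IN_TEXT = re.compile(r"10\.\d{4,9}/[^\s,;)\]]+")


def untracked_dois(docstring: str, metadata: dict) -> set:
    """Return DOIs mentioned in the docstring but absent from the metadata."""
    # Strip a sentence-ending period that the regex would otherwise capture.
    found = {m.rstrip(".") for m in _DOI_IN_TEXT.findall(docstring)}
    tracked = {metadata.get("doi"), metadata.get("associated_paper_doi")}
    return found - tracked
```

For Thielen2021, keeping 10.1088/1741-2552/ab4057 under `associated_paper_doi` while `doi` points at the new paper makes the difference empty again.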


moabb/datasets/stieger2021.py

Validated METADATA blocks for all 45 MOABB datasets against their original publications.

Key corrections (~920 total):
- Country codes: full names → ISO 3166-1 alpha-2 (41 datasets)
- Preprocessing conflation: removed analysis pipeline steps listed as shared data state (38 datasets)
- BCI application inflation: removed fabricated applications (36 datasets)
- Acquisition reference: fixed CAR → correct hardware reference (19 datasets)
- Software misattribution: removed analysis tools (EEGLAB, MATLAB, etc.) listed as acquisition software (21 datasets)
- Auxiliary channels: removed fabricated GSR/PPG/EMG channels (16 datasets)
- Hardware/electrode: removed fabricated materials and manufacturers

Also includes:
- BIDS export: use publication_year as fallback for missing meas_date
- Validation tests for DOI format and metadata quality

The unversioned figshare DOI (10.6084/m9.figshare.13123148) does not resolve via citeproc+json, causing test_dois_resolve[Stieger2021] and test_doi_cache_complete to fail. Use the versioned DOI (.v1), which resolves correctly, and add it to doi_cache.json.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
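Resolving a DOI "via citeproc+json" commonly means DOI content negotiation: requesting `https://doi.org/<doi>` with a CSL-JSON `Accept` header. The sketch below shows that request plus a cache-completeness check. The helper names are hypothetical, and the real `test_dois_resolve` / `test_doi_cache_complete` tests may be structured differently.

```python
import json
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_citeproc_request(doi: str) -> urllib.request.Request:
    """Build a content-negotiation request for CSL-JSON metadata of `doi`."""
    return urllib.request.Request(f"https://doi.org/{doi}", headers={"Accept": CSL_JSON})


def fetch_csl(doi: str, timeout: float = 10.0) -> dict:
    """Network call; raises urllib.error.HTTPError if the DOI does not resolve."""
    with urllib.request.urlopen(build_citeproc_request(doi), timeout=timeout) as resp:
        return json.load(resp)


def missing_from_cache(dois, cache: dict) -> list:
    """Return DOIs that have no entry in a doi_cache.json-style mapping."""
    return [d for d in dois if d not in cache]
```

Caching the resolved records lets most test runs avoid the network, which is why a missing cache entry is reported as its own failure.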


bruAristimunha · 2026-03-01T01:54:27Z

26 dataset classes NOT changed (already correct or not yet validated):

BI2015b, BNCI2014_001, BNCI2014_002, BNCI2014_008, BNCI2014_009, BNCI2015_001, BNCI2015_003, BNCI2015_004, BNCI2015_006, BNCI2015_007, BNCI2015_008, BNCI2015_009, BNCI2015_010, BNCI2016_002, BNCI2019_001, BNCI2020_001, BNCI2020_002, BNCI2022_001, BNCI2024_001, BNCI2025_001, BNCI2025_002, Dreyer2023A, Dreyer2023B, Dreyer2023C, MAMEM3, PhysionetMI

chatgpt-codex-connector bot reviewed Feb 28, 2026


moabb/datasets/stieger2021.py (outdated, resolved)

bruAristimunha added 2 commits February 28, 2026 21:05

Add whats_new entry for metadata validation fixes (#1001)

cea0f19

bruAristimunha force-pushed the metadata-validation-fixes branch from 57aff95 to cea0f19 on February 28, 2026 23:49

bruAristimunha and others added 3 commits March 1, 2026 01:06

Add Stieger2021 Figshare DOI to doi_cache.json

6e961eb

Merge remote-tracking branch 'origin/develop' into metadata-validatio…

2992368

…n-fixes # Conflicts: # moabb/tests/doi_cache.json

bruAristimunha merged commit 74910fb into develop Mar 1, 2026
14 checks passed

bruAristimunha deleted the metadata-validation-fixes branch March 1, 2026 12:32

bruAristimunha merged 5 commits into develop from metadata-validation-fixes
