Fix metadata across 45 datasets based on paper validation #1001
bruAristimunha merged 5 commits into develop from
Conversation
💡 Codex Review
moabb/moabb/datasets/thielen2021.py
Line 184 in 57aff95
Updating the metadata DOI here without retaining the previously tracked paper DOI leaves 10.1088/1741-2552/ab4057 (still referenced in the module's comments/docstring) untracked, which is why test_docstring_dois_tracked[Thielen2021] now fails on this commit. The metadata should continue to carry that DOI (e.g., via associated_paper_doi) so DOI auditing remains consistent.
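The fix Codex suggests could be sketched as keeping both DOIs in the metadata block. A hypothetical fragment (the field names and the helper below are assumptions, not the actual moabb schema or test code):

```python
# Hypothetical METADATA fragment: keep the paper DOI alongside the dataset DOI
# so DOI-auditing tests (e.g. test_docstring_dois_tracked) can still find it.
METADATA = {
    "doi": "<dataset-doi-here>",  # placeholder, not a real DOI
    "associated_paper_doi": "10.1088/1741-2552/ab4057",  # DOI cited in the docstring
}


def tracked_dois(metadata: dict) -> set:
    # Collect every DOI-like value the metadata tracks, for auditing.
    return {
        value
        for key, value in metadata.items()
        if key.endswith("doi") and value.startswith("10.")
    }
```

With both fields present, the paper DOI stays in the audited set even after the dataset DOI is updated.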
Validated METADATA blocks for all 45 MOABB datasets against their original publications. Key corrections (~920 total):
- Country codes: full names → ISO 3166-1 alpha-2 (41 datasets)
- Preprocessing conflation: removed analysis pipeline steps listed as shared data state (38 datasets)
- BCI application inflation: removed fabricated applications (36 datasets)
- Acquisition reference: fixed CAR → correct hardware reference (19 datasets)
- Software misattribution: removed analysis tools (EEGLAB, MATLAB, etc.) listed as acquisition software (21 datasets)
- Auxiliary channels: removed fabricated GSR/PPG/EMG channels (16 datasets)
- Hardware/electrode: removed fabricated materials and manufacturers

Also includes:
- BIDS export: use publication_year as fallback for missing meas_date
- Validation tests for DOI format and metadata quality
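The BIDS-export fallback mentioned above could be sketched as follows (the function name and signature are assumptions, not the actual bids_interface.py code):

```python
from datetime import datetime, timezone


def resolve_meas_date(meas_date, publication_year):
    """Return a measurement date for BIDS export.

    Falls back to January 1st of the publication year (UTC) when
    meas_date is missing. Hypothetical sketch; the real
    bids_interface.py implementation may differ.
    """
    if meas_date is not None:
        return meas_date
    return datetime(publication_year, 1, 1, tzinfo=timezone.utc)
```

Using the publication year keeps the BIDS sidecar valid while making it obvious the date is approximate rather than a recorded acquisition time.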
57aff95 to cea0f19
The unversioned figshare DOI (10.6084/m9.figshare.13123148) does not resolve via citeproc+json, causing test_dois_resolve[Stieger2021] and test_doi_cache_complete to fail. Use the versioned DOI (.v1) which resolves correctly, and add it to doi_cache.json. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
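The versioned-DOI fix can be as simple as appending the version suffix the commit message calls for; a minimal sketch (the helper name is hypothetical):

```python
def versioned_figshare_doi(doi: str, version: int = 1) -> str:
    """Append a figshare version suffix (".v1", ".v2", ...) if absent.

    Per the commit message above, the unversioned figshare DOI does not
    resolve via citeproc+json, while the versioned form does.
    """
    suffix = f".v{version}"
    return doi if doi.endswith(suffix) else doi + suffix
```

The helper is idempotent, so it is safe to apply to DOIs that are already versioned.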
…n-fixes

# Conflicts:
#	moabb/tests/doi_cache.json
26 dataset classes NOT changed (already correct or not yet validated): BI2015b, BNCI2014_001, BNCI2014_002, BNCI2014_008, BNCI2014_009, BNCI2015_001, BNCI2015_003, BNCI2015_004, BNCI2015_006, BNCI2015_007, BNCI2015_008, BNCI2015_009, BNCI2015_010, …
Summary
- BIDS export: publication_year fallback for missing meas_date

Methodology
Each dataset was validated by an isolated agent with access to only:
- The dataset's .py file
- The metadata schema (schema.py)

Corrections were classified by confidence (HIGH/MEDIUM/LOW), and only HIGH- and MEDIUM-confidence corrections were applied. Individual validation reports are available in moabb_tmp_folder/papers/validation_results/.

Files changed
- Dataset .py files (metadata corrections)
- bids_interface.py (publication_year fallback)

Test plan
- test_datasets.py tests pass
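A DOI-format validation test like the one added in this PR might look as follows (the regex and helper name are assumptions, not the actual test code):

```python
import re

# Bare DOI syntax: "10.", a 4-9 digit registrant code, "/", then a suffix.
# Deliberately rejects "doi:" and "https://doi.org/" prefixes so the
# metadata stores bare DOIs only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_doi(doi: str) -> bool:
    return bool(DOI_PATTERN.match(doi))
```

Storing bare DOIs and normalizing prefixes at display time keeps the cache keys in doi_cache.json consistent.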