Refactor: consolidate BIDS, add orchestration layer, extract longitudinal#267
Merged
Conversation
Move all BIDS-related code from scattered locations into a cohesive rbc.bids/ package and extract per-workflow export/resolve functions from CLI modules. Structural changes: - core/bids/ -> bids/builder.py (Bids class), bids/_schema.py (auto-generated) - core/bids2table.py -> bids/query.py (load_table, find_file, etc.) - New bids/anatomical.py, functional.py, metrics.py, qc.py with export_*() and resolve_*() functions extracted from CLI modules - Tests moved: test_bids.py, test_bids2table.py -> tests/unit/bids/ - New tests/unit/bids/test_exports.py with real Bids instance tests Bug fixes caught during refactor: - Atlas names with underscores (schaefer_200) caused BIDS validation errors; now sanitized via bids_safe_label() inside export functions - Regressor names with hyphens (36-parameter) had the same issue - QC resolve used .mat extension but functional export writes .txt The CLI modules are now thin orchestration layers that call shared export/resolve functions instead of duplicating BIDS naming logic.
Unifies the pattern across all resolve functions: resolve_qc already returned QCInputs (TypedDict), now resolve_functional returns FunctionalInputs and resolve_metrics returns MetricsInputs. This gives callers proper type safety when accessing resolved paths. Part of #259.
Moves SessionTables, load_session, iter_session_files, and the BIDS grouping constants (SUB_SES_QUERY, ANAT_GROUP_ENTITIES, FUNC_GROUP_ENTITIES) from cli/ into bids/session.py so that cli/ exclusively handles CLI concerns. Constants are renamed to drop the leading underscore since they are now public API in the bids package. Closes #259.
Adds discover_anatomical(), discover_functional(), and discover_derivative_runs() to the bids package, removing BIDS-specific iteration logic (entity extraction, row-to-path conversion, DataFrame filtering/grouping) from CLI modules. CLI modules now call discovery functions and receive structured NamedTuples (AnatomicalRun, FunctionalRun, DerivativeRun) instead of manually iterating DataFrames.
Creates src/rbc/orchestration/ with per-workflow run() functions that own the full pipeline loop: filtering, sub/ses iteration, discovery, processing, and export. CLI modules are now thin arg-parsing wrappers that delegate to orchestration.run(). New modules: - orchestration/__init__.py: Filters dataclass - orchestration/anatomical.py: run() + process_session() - orchestration/functional.py: run() + process_session() - orchestration/metrics.py: run() + process_run() - orchestration/qc.py: run() - orchestration/all.py: run() composing per-session stages - orchestration/longitudinal.py: run() + process_anat() + process_func() - bids/longitudinal.py: resolve + export for longitudinal workflow CLI modules now contain only: Args dataclass, main() (setup runner + call orchestration.run()), and register_command(). Closes #266.
Orchestration run() functions now handle runner setup (init_runner) and workflow start/complete logging. CLI modules are reduced to pure arg parsing: construct Filters + RunnerConfig, call run(), return 0. Introduces RunnerConfig dataclass and init_runner() in orchestration. Moves _DEFAULT_ENV_VARS from cli/__init__.py to orchestration/.
This was referenced Apr 3, 2026
kaitj
reviewed
Apr 6, 2026
Contributor
kaitj
left a comment
There was a problem hiding this comment.
A few questions, one small documentation change, and a few notes to revisit, otherwise largely looks good.
Comment on lines
+70
to
+89
| aex.save( | ||
| _require_file(outputs.brain_mask, "brain_mask"), | ||
| suffix=Suffix.MASK, | ||
| desc="T1w", | ||
| ) | ||
| aex.save( | ||
| _require_file(outputs.csf_mask, "csf_mask"), | ||
| suffix=Suffix.MASK, | ||
| desc="csf", | ||
| ) | ||
| aex.save( | ||
| _require_file(outputs.gm_mask, "gm_mask"), | ||
| suffix=Suffix.MASK, | ||
| desc="gm", | ||
| ) | ||
| aex.save( | ||
| _require_file(outputs.wm_mask, "wm_mask"), | ||
| suffix=Suffix.MASK, | ||
| desc="wm", | ||
| ) |
Contributor
There was a problem hiding this comment.
We should revisit if these files are still needed. I think this was originally included because they were needed to compute regressors in template space (which isn't the case anymore I think).
| func_q: Bids, | ||
| tpl_q: Bids, | ||
| func_df: pl.DataFrame, | ||
| tpl_df: pl.DataFrame, |
Contributor
There was a problem hiding this comment.
More I look at this, I wonder if we should rename this variable to make distinct "standard" templates (e.g. MNI, OASIS, etc.) vs "longitudinal template" or is that more confusing?
| extension=".txt", | ||
| extra={"from": "bold", "to": "T1w", "mode": "image"}, | ||
| ), | ||
| "sbref": func_q.expect(func_df, suffix=Suffix.SBREF, without=["space"]), |
Contributor
There was a problem hiding this comment.
Oh cool, first time I noticed the without argument 🚀
kaitj
approved these changes
Apr 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Major architecture refactor that introduces clean separation of concerns across four layers:
bids/- BIDS naming contracts (discover, resolve, export per workflow)orchestration/- Pipeline loops (filter, iterate, discover -> process -> export)workflows/- Processing step chains (computation only)cli/- Arg parsing only (delegates to orchestration)What changed
New
rbc.bids/package (moved fromcore/bids/+core/bids2table.py+cli/query.py):builder.py- Bids class_schema.py- auto-generated entity validationquery.py- bids2table wrapperssession.py- session loading, iteration, grouping constantsanatomical.py- discover + exportfunctional.py- discover + resolve + export, withFunctionalInputsTypedDictmetrics.py- resolve + export, withMetricsInputsTypedDictqc.py- resolve + export, withQCInputsTypedDictlongitudinal.py- resolve + export for longitudinal workflowsNew
rbc.orchestration/package:__init__.py-Filtersdataclass (participant/session/task filters)anatomical.py-run()+process_session()functional.py-run()+process_session()(returns outputs for downstream use)metrics.py-run()+process_run()qc.py-run()all.py-run()composing per-session stages with in-memory output passinglongitudinal.py-run()+process_anat()+process_func()Simplified CLI modules: Each CLI is now ~40-60 lines (Args dataclass +
main()that callsorchestration.run()+register_command()). Zero BIDS or workflow logic.Deleted
src/rbc/core/bids/(entire package)src/rbc/core/bids2table.pysrc/rbc/cli/query.pyBugs fixed
schaefer_200) caused BIDS validation errors36-parameter) caused the same issue.matextension but functional export writes.txtTests
test_bids.py,test_bids2table.py,test_query.pytotests/unit/bids/tests/unit/bids/test_exports.py(11 tests with real Bids instances)Closes #259, closes #266.