
[codex] Add explicit PE SOI path override for rebuild CLI#155

Draft
anth-volk wants to merge 2 commits into main from codex/fix-147-soi-targets-fallback


@anth-volk anth-volk commented Jun 1, 2026

Fixes #147

Summary

  • Keeps the existing repo-based SOI route unchanged: policyengine_us_data/storage/soi.csv under --policyengine-us-data-repo.
  • Adds explicit soi_path plumbing through the default PE-US rebuild source-provider factory and checkpoint CLI (--soi-path).
  • Adds regression coverage that the source-provider factory forwards soi_path to the PUF provider.
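The forwarding behavior described above can be illustrated with a minimal sketch. The names `PUFProviderSketch` and `build_source_providers_sketch` are hypothetical stand-ins, not the actual classes or functions in microplex-us; only the `soi_path` parameter name comes from this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


@dataclass(frozen=True)
class PUFProviderSketch:
    """Hypothetical stand-in for the PUF source provider."""

    policyengine_us_data_repo: Optional[Path] = None
    soi_path: Optional[Path] = None


def build_source_providers_sketch(
    policyengine_us_data_repo: Optional[PathLike] = None,
    soi_path: Optional[PathLike] = None,
) -> List[PUFProviderSketch]:
    """Hypothetical factory: hands soi_path to the PUF provider unchanged."""
    repo = Path(policyengine_us_data_repo) if policyengine_us_data_repo else None
    soi = Path(soi_path) if soi_path else None
    return [PUFProviderSketch(policyengine_us_data_repo=repo, soi_path=soi)]


def test_factory_forwards_soi_path() -> None:
    """Regression-style check: the explicit path must reach the provider."""
    providers = build_source_providers_sketch(
        policyengine_us_data_repo="/data/policyengine-us-data",
        soi_path="/data/custom/soi.csv",
    )
    assert providers[0].soi_path == Path("/data/custom/soi.csv")
```

A test of this shape fails if a later refactor drops the argument anywhere between the CLI and the provider, which is the regression this PR guards against.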

Root Cause

The PUF PE-SOI uprating path expects policyengine_us_data/storage/soi.csv when only --policyengine-us-data-repo is provided. In the affected fresh-checkout run, that file was absent while the tracked calibration target file existed elsewhere. Rather than guessing alternate storage paths, callers can now supply the intended SOI file explicitly with --soi-path.
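The resolution order implied here (explicit path first, repo default second, no guessing) might look like the following sketch. `resolve_soi_path` and `DEFAULT_SOI_RELATIVE_PATH` are hypothetical names chosen for illustration; the actual implementation in `puf.py` may be structured differently.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]

# Default location under the policyengine-us-data repo, as stated in this PR.
DEFAULT_SOI_RELATIVE_PATH = Path("policyengine_us_data") / "storage" / "soi.csv"


def resolve_soi_path(
    policyengine_us_data_repo: Optional[PathLike] = None,
    soi_path: Optional[PathLike] = None,
) -> Path:
    """Hypothetical helper: prefer an explicit soi_path, else the repo default."""
    if soi_path is not None:
        candidate = Path(soi_path)
    elif policyengine_us_data_repo is not None:
        candidate = Path(policyengine_us_data_repo) / DEFAULT_SOI_RELATIVE_PATH
    else:
        raise ValueError("Provide either soi_path or policyengine_us_data_repo.")

    # Fail loudly instead of searching alternate storage paths.
    if not candidate.is_file():
        raise FileNotFoundError(f"SOI file not found: {candidate}")
    return candidate
```

Raising on a missing file, rather than falling back to another location, keeps the choice of SOI artifact with the caller.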

SOI Route Check

  • microplex-us default repo route: <policyengine-us-data repo>/policyengine_us_data/storage/soi.csv.
  • Current PE-US-data download_prerequisites.py does not list soi.csv; it downloads PUF/demographic/NP and geography prerequisites, not this SOI artifact.
  • In the local PE-US-data checkout, policyengine_us_data/storage/soi.csv exists but is not tracked; policyengine_us_data/storage/calibration_targets/soi_targets.csv is tracked. Use --soi-path for that alternate file when needed.
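As a rough illustration of pointing the rebuild at the tracked file, the sketch below parses the two flags named in this PR with `argparse`. The parser itself, the `prog` name, and the `/data/policyengine-us-data` checkout location are assumptions for the example; the real checkpoint CLI has other options that are not shown.

```python
import argparse
from pathlib import Path


def build_parser_sketch() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the two flags discussed in this PR."""
    parser = argparse.ArgumentParser(prog="rebuild-checkpoint-sketch")
    parser.add_argument("--policyengine-us-data-repo", type=Path, default=None)
    parser.add_argument("--soi-path", type=Path, default=None)
    return parser


# Assumed checkout location, used only for this example.
repo = Path("/data/policyengine-us-data")
tracked_targets = (
    repo / "policyengine_us_data" / "storage" / "calibration_targets" / "soi_targets.csv"
)

args = build_parser_sketch().parse_args(
    [
        "--policyengine-us-data-repo", str(repo),
        "--soi-path", str(tracked_targets),
    ]
)

# argparse maps --soi-path to args.soi_path; the repo flag is kept alongside it.
print(args.policyengine_us_data_repo, args.soi_path)
```

When `--soi-path` is omitted, `args.soi_path` stays `None` and the repo-based default route applies.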

Validation

  • uv run --no-sync ruff check src/microplex_us/data_sources/puf.py src/microplex_us/pipelines/pe_us_data_rebuild.py src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py tests/test_puf_source_provider.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py
  • uv run --no-sync python -m py_compile src/microplex_us/data_sources/puf.py src/microplex_us/pipelines/pe_us_data_rebuild.py src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py tests/test_puf_source_provider.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py

Notes

@anth-volk changed the title from "[codex] Resolve PE SOI targets from fresh data checkout" to "[codex] Add explicit PE SOI path override for rebuild CLI" on Jun 1, 2026

Development

Successfully merging this pull request may close these issues.

PE-US rebuild smoke expects missing policyengine_us_data/storage/soi.csv