
Add compact PolicyEngine dataset export #31

Merged
MaxGhenis merged 1 commit into main from codex/mp-compact-export
May 28, 2026
@MaxGhenis
Contributor

Summary

  • add a supported CLI to compact PolicyEngine H5 datasets by keeping the largest household weights
  • preserve linked person/tax-unit/SPM/family/marital-unit rows and optionally rescale household weights to the source total
  • cover entity filtering, external weight selection, CLI summary output, and mismatched-weight validation
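
The compaction described above can be sketched roughly as follows. This is a minimal illustration only, not the code in `compact_policyengine_dataset.py`: the function name `compact_households`, its parameters, and the use of plain lists are assumptions made for the example. The actual CLI works on H5 datasets and also filters tax-unit, SPM, family, and marital-unit rows, which would presumably follow the same pattern as the person mask shown here.

```python
def compact_households(household_ids, household_weights,
                       person_household_ids, top_n, rescale=True):
    """Hypothetical sketch: keep the top_n households by weight.

    Returns the kept household ids, their (optionally rescaled) weights,
    and a boolean mask selecting the person rows linked to kept households.
    """
    # Reject weight vectors that do not line up with the household rows.
    if len(household_weights) != len(household_ids):
        raise ValueError("expected exactly one weight per household")

    # Rank household positions by descending weight. sorted() is stable,
    # so ties keep their source order.
    ranked = sorted(range(len(household_ids)),
                    key=lambda i: -household_weights[i])
    keep = sorted(ranked[:top_n])  # restore source order

    kept_ids = [household_ids[i] for i in keep]
    kept_weights = [float(household_weights[i]) for i in keep]

    # Optionally rescale so the kept weights sum to the source total.
    kept_total = sum(kept_weights)
    if rescale and kept_total > 0:
        factor = sum(household_weights) / kept_total
        kept_weights = [w * factor for w in kept_weights]

    # Preserve only person rows whose household survived the cut.
    kept_set = set(kept_ids)
    person_mask = [h in kept_set for h in person_household_ids]
    return kept_ids, kept_weights, person_mask
```

For example, with household ids `[10, 20, 30, 40]`, weights `[1, 5, 3, 1]`, and `top_n=2`, households 20 and 30 are kept and their weights are scaled by 10/8 so the total weight stays at 10.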

Validation

  • uv run --python 3.13 --extra dev ruff check pyproject.toml src/microplex_us/pipelines/compact_policyengine_dataset.py tests/pipelines/test_compact_policyengine_dataset.py
  • uv run --python 3.13 --extra dev ruff format --check pyproject.toml src/microplex_us/pipelines/compact_policyengine_dataset.py tests/pipelines/test_compact_policyengine_dataset.py
  • uv run --python 3.13 --extra dev --extra policyengine pytest -q tests/pipelines/test_compact_policyengine_dataset.py
  • real-data smoke: compacted current MP candidate to 1,000 households and wrote summary under artifacts/mp300k_compact_screen_20260528

Notes

This supports the compact mp-size release path. Current screening found that top-100k and top-120k compacts can meet the size and runtime gates on the stored May 17 target matrix. However, latest-us-data scoring still loses to eCPS, because the current broad target surface is dominated by regressions in the national IRS other and state AGI distribution targets.

@MaxGhenis MaxGhenis merged commit c5caea0 into main May 28, 2026
3 checks passed
@MaxGhenis MaxGhenis deleted the codex/mp-compact-export branch May 28, 2026 08:34