Construct CPS tax units from household records #824

Merged
MaxGhenis merged 2 commits into main from codex/cps-tax-unit-construction on Apr 25, 2026

@MaxGhenis
Contributor

Summary

Builds CPS tax units from ASEC household records instead of using Census TAX_ID as the production tax-unit graph.

Key points:

  • Constructs tax units from reciprocal spouses, parent pointers, student/disability/age eligibility, gross-income limits, and relationship rules.
  • Keeps only constructed TAX_IDs in the staged CPS data path; PE-US infers tax-unit roles and filing statuses from the resulting graph.
  • Preserves CENSUS_TAX_ID for audit/comparison.
  • Adds a census_documented comparison mode and two validation harnesses.
  • Adds unit and integration coverage for assignment edge cases and CPS staging.
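
For reviewers, the assignment idea in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in tax_unit_construction.py: the `Person` fields, the helper names, and the age thresholds (under 19, under 24 for students, any age if disabled) are assumptions, and the gross-income and relationship tests are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    # All field names here are hypothetical, not actual ASEC column names.
    line: int                           # person line number within the household
    age: int
    spouse_line: Optional[int] = None   # pointer to the spouse's line, if any
    parent_line: Optional[int] = None   # pointer to a parent's line, if any
    is_student: bool = False
    is_disabled: bool = False


def is_reciprocal_spouse(a: Person, b: Person) -> bool:
    # Two people form a couple only if each points at the other.
    return a.spouse_line == b.line and b.spouse_line == a.line


def age_eligible_child(p: Person) -> bool:
    # Simplified age test (assumed thresholds; income test omitted).
    return p.is_disabled or p.age < 19 or (p.is_student and p.age < 24)


def assign_tax_units(household: List[Person]) -> Dict[int, int]:
    """Return {person line: tax unit id} for one household."""
    by_line = {p.line: p for p in household}
    unit_of: Dict[int, int] = {}
    next_id = 1

    def is_dependent(p: Person) -> bool:
        parent = by_line.get(p.parent_line)
        return parent is not None and p.spouse_line is None and age_eligible_child(p)

    # Pass 1: every non-dependent heads a unit; a reciprocal spouse joins it.
    for p in sorted(household, key=lambda q: q.line):
        if p.line in unit_of or is_dependent(p):
            continue
        unit_of[p.line] = next_id
        spouse = by_line.get(p.spouse_line)
        if spouse is not None and is_reciprocal_spouse(p, spouse):
            unit_of[spouse.line] = next_id
        next_id += 1

    # Pass 2: dependents follow their parent's unit; repeat to resolve chains.
    pending = [p for p in household if p.line not in unit_of]
    while pending:
        progressed = False
        for p in list(pending):
            if p.parent_line in unit_of:
                unit_of[p.line] = unit_of[p.parent_line]
                pending.remove(p)
                progressed = True
        if not progressed:
            # Circular parent pointers: fall back to singleton units.
            for p in pending:
                unit_of[p.line] = next_id
                next_id += 1
            pending = []
    return unit_of


household = [
    Person(line=1, age=45, spouse_line=2),
    Person(line=2, age=44, spouse_line=1),
    Person(line=3, age=10, parent_line=1),
    Person(line=4, age=30, parent_line=1),
]
units = assign_tax_units(household)
```

In this toy household the couple and their 10-year-old share one unit, while the 30-year-old child fails the age test and files separately. Because spouses are paired before anything else, a reciprocal couple cannot be split under this scheme.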

Closes #815.

Validation

Structural comparison against the 2025 ASEC public person file for tax year 2024:

  • Household exact match vs Census: 93.55%
  • Exact match excluding Census spouse-split households: 96.91%
  • Person same-unit match: 92.97%
  • Reciprocal spouse splits: Census 6.82%, constructed 0.00%
  • Qualifying-child parent-pointer splits: Census 1.86%, constructed 0.34%
  • Minor singleton rate: Census 1.35%, constructed 1.28%
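
As a rough guide to how the match rates above could be computed, here is a sketch that compares the two groupings as partitions of people, so the ID labels themselves do not matter. The function names, the unweighted averaging, and the definition of a person-level match (same set of co-members under both groupings) are assumptions for illustration; validation/cps_tax_unit_validation.py may define them differently.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Ids = Dict[int, int]  # {person line: tax unit id} within one household


def partition(ids: Ids) -> Set[FrozenSet[int]]:
    """Group people by unit id and discard the id labels."""
    groups = defaultdict(set)
    for person, unit in ids.items():
        groups[unit].add(person)
    return {frozenset(g) for g in groups.values()}


def household_exact_match_rate(households: List[Tuple[Ids, Ids]]) -> float:
    """Share of households whose two groupings are identical (unweighted)."""
    matches = sum(partition(c) == partition(k) for c, k in households)
    return matches / len(households)


def person_same_unit_rate(households: List[Tuple[Ids, Ids]]) -> float:
    """Share of people with the same co-members under both groupings."""
    same = total = 0
    for census, constructed in households:
        c_group = {p: g for g in partition(census) for p in g}
        k_group = {p: g for g in partition(constructed) for p in g}
        for person in census:
            total += 1
            same += c_group[person] == k_group[person]
    return same / total


# Toy data: (Census ids, constructed ids) per household.
households = [
    ({1: 10, 2: 10, 3: 11}, {1: 1, 2: 1, 3: 2}),  # same grouping, different labels
    ({1: 20, 2: 21, 3: 21}, {1: 1, 2: 1, 3: 2}),  # different grouping
]
hh_rate = household_exact_match_rate(households)
person_rate = person_same_unit_rate(households)
```

Comparing partitions instead of raw IDs matters here because constructed TAX_IDs are not expected to reuse the Census numbering.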

Outcome comparison holding staged CPS arrays fixed and swapping only tax-unit IDs (Census TAX_ID -> constructed):

  • Taxable SOI rows mean absolute relative error: 1.12884 -> 1.12773
  • Selected tax rows mean absolute relative error: 0.68472 -> 0.67852
  • Selected tax rows RMSE relative error: 2.45406 -> 2.40138
  • Aggregate SOI rows mean absolute relative error: 1.17441 -> 1.17380
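
For reference, the two error summaries reported above can be sketched as below. This assumes the relative error of a row is (estimate - target) / |target| and that rows with a zero target are skipped; both choices and the function names are illustrative, and validation/cps_tax_unit_outcome_validation.py may handle them differently.

```python
import math
from typing import List, Sequence


def relative_errors(estimates: Sequence[float], targets: Sequence[float]) -> List[float]:
    # Assumed definition: signed error scaled by the target's magnitude.
    return [(e - t) / abs(t) for e, t in zip(estimates, targets) if t != 0]


def mean_abs_rel_error(estimates: Sequence[float], targets: Sequence[float]) -> float:
    errs = relative_errors(estimates, targets)
    return sum(abs(x) for x in errs) / len(errs)


def rmse_rel_error(estimates: Sequence[float], targets: Sequence[float]) -> float:
    errs = relative_errors(estimates, targets)
    return math.sqrt(sum(x * x for x in errs) / len(errs))


# Toy rows: simulated aggregates against target aggregates.
estimates = [110.0, 90.0, 200.0]
targets = [100.0, 100.0, 100.0]
mare = mean_abs_rel_error(estimates, targets)
rmse = rmse_rel_error(estimates, targets)
```

The RMSE variant weights large misses more heavily, which is consistent with it moving more than the mean absolute error when only a few rows change.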

Commands run:

  • uv run ruff check policyengine_us_data/datasets/cps/census_cps.py policyengine_us_data/datasets/cps/cps.py policyengine_us_data/datasets/cps/tax_unit_construction.py policyengine_us_data/datasets/cps/tax_unit_rule_helpers.py tests/integration/test_census_cps.py tests/integration/test_cps.py tests/unit/datasets/test_cps_tax_unit_construction.py validation/cps_tax_unit_validation.py validation/cps_tax_unit_outcome_validation.py
  • uv run pytest tests/unit/datasets/test_cps_tax_unit_construction.py -q
  • uv run pytest tests/integration/test_census_cps.py -k 'resolve_person_usecols or fill_missing_optional_person_columns or create_tax_unit_table' -q
  • uv run pytest tests/integration/test_cps.py -k 'add_personal_variables or add_id_variables or validate_raw_cps_schema' -q
  • PYTHONPATH=/Users/maxghenis/PolicyEngine/policyengine-us uv run python validation/cps_tax_unit_outcome_validation.py policyengine_us_data/storage/cps_2024.h5 /tmp/asecpub25csv.zip --csv-name pppub25.csv --year 2024 --output /tmp/cps_tax_unit_outcome_validation_ids_only.json

@MaxGhenis MaxGhenis marked this pull request as ready for review April 25, 2026 00:59
@MaxGhenis MaxGhenis enabled auto-merge (squash) April 25, 2026 00:59
@MaxGhenis MaxGhenis merged commit 40314a7 into main Apr 25, 2026
10 checks passed
@MaxGhenis MaxGhenis deleted the codex/cps-tax-unit-construction branch April 25, 2026 09:34
Development

Successfully merging this pull request may close these issues.

Explore rules-based tax-unit construction as alternative to Census TAX_ID

1 participant