Add state tax-credit program validations (40 baseline levels + 12 repeal reforms) by DTrim99 · Pull Request #295 · PolicyEngine/populace

DTrim99 · 2026-07-02T19:10:35Z

Extends reform validation beyond federal benchmarks with out-of-sample state program checks, comparing modeled state EITC/WFC/CTC totals against official state statistics, so the calibration dashboard can show how well populace reproduces the state programs that downstream tools (e.g. the child-poverty dashboard) simulate.

What's added

us/state_program_levels.json — 40 baseline-level backtests across 28 states (26 official actuals, 14 documented approximations flagged benchmark.score_type: "approximation"). Each compares the populace baseline weighted total of a state credit variable to a published program cost. State credit variables are defined_for their state, so the national sum is the state total — no geography filtering needed. Sources are state DOR/FTB/comptroller statistical or tax-expenditure reports (every figure fetched and independently re-verified at the cited page); approximations are things like match_rate × IRS federal EITC in the state (IRS EITC Central TY2024) or ACS counts × statutory amounts, with the method in the source string.
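The `defined_for` point can be illustrated with a toy example. This is a minimal sketch, not the populace harness: the `Record` class, the `mn_credit` field, and every number are hypothetical. It shows why a credit that is zero outside its state makes the national weighted sum equal the state total, and how a match-rate approximation benchmark would be formed.

```python
from dataclasses import dataclass


@dataclass
class Record:
    state: str        # state of residence
    weight: float     # survey weight
    mn_credit: float  # hypothetical state credit; zero outside MN by construction


def weighted_total(records):
    """Weighted sum of the credit over the given records."""
    return sum(r.weight * r.mn_credit for r in records)


# Toy microdata (made-up values, not populace output).
records = [
    Record("MN", 100.0, 500.0),
    Record("MN", 50.0, 0.0),
    Record("CA", 200.0, 0.0),  # out of state, so the credit is zero
]

national_total = weighted_total(records)
state_total = weighted_total([r for r in records if r.state == "MN"])

# The geography filter changes nothing: out-of-state rows contribute zero.
assert national_total == state_total

# An "approximation" benchmark of the match-rate kind: statutory match rate
# times the federal EITC claimed in the state (both inputs are illustrative).
match_rate = 0.5
federal_eitc_in_state = 1_000_000.0
approx_benchmark = match_rate * federal_eitc_in_state
```

The same reasoning would break for any variable that is not restricted to a single state, which is presumably why the configs only use state credit variables.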

us/state_program_reforms.json — 12 repeal-style reform validations (neutralize the credit variable, score the change in state_income_tax) for the programs with the strongest tax-year actuals: MN CTC+WFC, CA CalEITC + YCTC, NY EITC + ESCC, CO EITC + CTC, CT, NM ×2, MD (refundable+nonrefundable sum), IL.
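The repeal-style scoring can be sketched in a few lines. This is an assumption-laden toy, not the harness or the policyengine-us model: it treats state income tax as tax before credits minus one refundable credit, and "neutralizing" the credit means forcing it to zero. All names and numbers are hypothetical.

```python
def state_income_tax(tax_before_credits, credit, neutralized=False):
    """Toy liability: tax before credits minus a refundable credit."""
    applied_credit = 0.0 if neutralized else credit
    return tax_before_credits - applied_credit


# (weight, tax_before_credits, credit) per tax unit; made-up values.
tax_units = [
    (100.0, 1000.0, 300.0),
    (50.0, 200.0, 400.0),   # refundable: the credit exceeds the liability
    (200.0, 500.0, 0.0),
]


def weighted_state_income_tax(neutralized):
    return sum(
        weight * state_income_tax(tax, credit, neutralized)
        for weight, tax, credit in tax_units
    )


baseline_total = weighted_state_income_tax(neutralized=False)
reform_total = weighted_state_income_tax(neutralized=True)

# The repeal score is the change in aggregate state income tax.
repeal_score = reform_total - baseline_total
```

In this toy the score equals the weighted total of the credit itself. In a real tax model the two can differ, for example when a nonrefundable credit is limited by liability, which is one reason to score the change in `state_income_tax` rather than the credit variable.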

Harness (backward-compatible): BaselineLevelSpec gains per-spec category and benchmark_score_type (SOI rows keep "IRS SOI actual" / "actual" — tested); new loaders state_program_level_specs / state_program_reform_specs; default_baseline_level_specs() aggregates SOI + state levels and the release tool uses it. load_default_reform_specs includes the state repeals. The calibration-diagnostics dashboard renders the new category with zero changes.
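One way to picture the backward-compatible change is a spec with defaulted fields. The sketch below is illustrative only: `LevelSpecSketch`, `parse_level_specs`, and the JSON keys (`name`, `variable`, `benchmark.value`) are hypothetical and are not the actual `BaselineLevelSpec` or config schema. Only the default strings and the `benchmark.score_type` key come from the description above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpecSketch:
    name: str
    variable: str
    official_total: float
    # Defaults preserve the historical SOI labels for specs that omit them.
    category: str = "IRS SOI actual"
    benchmark_score_type: str = "actual"


def parse_level_specs(text: str, category: str) -> list[LevelSpecSketch]:
    """Parse a JSON list of specs, tagging each with the given category."""
    specs = []
    for row in json.loads(text):
        benchmark = row["benchmark"]
        specs.append(
            LevelSpecSketch(
                name=row["name"],
                variable=row["variable"],
                official_total=float(benchmark["value"]),
                category=category,
                # Rows without an explicit score_type count as actuals.
                benchmark_score_type=benchmark.get("score_type", "actual"),
            )
        )
    return specs


# Hypothetical config content with one actual and one approximation.
config = json.dumps([
    {"name": "example_actual", "variable": "example_credit",
     "benchmark": {"value": 100.0}},
    {"name": "example_approx", "variable": "other_credit",
     "benchmark": {"value": 50.0, "score_type": "approximation"}},
])

state_specs = parse_level_specs(config, category="State program")
soi_spec = LevelSpecSketch("soi_example", "soi_variable", 1.0)
```

Because the new fields carry defaults, existing call sites that build SOI specs without them keep producing the same rows.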

Out-of-sample by construction

State program totals are not calibration targets (the surface calibrates federal EITC by AGI bracket and aggregate state income-tax liability), so every row lands in the dashboard's out-of-sample KPIs.

Pilot (released `populace_us_2024.h5`)

| Program | populace vs official |
|---|---|
| MN CTC + WFC (TY2024) | +6.1% |
| CO EITC (TY2023, 50% match both years) | −7.8% |
| NY EITC (TY2024) | −9.7% |
| CA CalEITC (TY2023) | −12.9% |
| CA YCTC (TY2023) | +38.7% |
| CO CTC (TY2023) | +30.4% |

The under-6 credits (YCTC, CO CTC) running ~30–40% hot is a genuine dataset finding this validation is designed to surface. EITCs land modestly below officials, consistent with take-up.
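The pilot percentages are presumably signed relative differences between the modeled and official totals. A minimal sketch of that arithmetic follows; the function names and the example totals are hypothetical, since the underlying dollar amounts are not given here.

```python
def percent_difference(modeled: float, official: float) -> float:
    """Signed relative difference: positive means the model runs above official."""
    if official == 0:
        raise ValueError("official total must be nonzero")
    return modeled / official - 1.0


def format_row(program: str, modeled: float, official: float) -> str:
    """Render one row as 'program<TAB>+x.x%'."""
    return f"{program}\t{percent_difference(modeled, official):+.1%}"


# Illustrative totals only (not the pilot's actual dollar figures).
over_row = format_row("Example credit A", 106.1, 100.0)
under_row = format_row("Example credit B", 92.2, 100.0)
```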

Documented exclusions (in the file comments)

VA (take-better-of refundable/nonrefundable — no single published line matches the variable); GA/DC/RI CTCs + PA EITC (not in effect in 2024); CT CTC (no ongoing credit, only the 2022 rebate); ME Dependent Exemption Tax Credit (only the refundable excess is published); UT CTC; WA WFTC repeal row (WA levies no income tax — it is levels-only).

Tests

test_reform_validation.py: +6 tests (loader parsing, per-spec category/score_type rows, SOI-default preservation, aggregator, shipped-config well-formedness). 29/29 pass; ruff format + check clean. Variable names verified against policyengine-us 1.755.5.

Suggested verification before publishing: run a staging build, then check that /populace/reforms?release=staging:{run_id} on the dashboard shows the "State program" rows.

🤖 Generated with Claude Code

…eal reforms)

Extends reform validation beyond federal benchmarks with out-of-sample STATE program checks, sourced from official state statistics collected and double-verified from revenue-department reports (Jul 2026):

- us/state_program_levels.json: 40 baseline-level backtests (26 official actuals, 14 documented approximations) comparing populace baseline totals of state credit variables (EITC/WFC/CTC family across 28 states) to published program costs. State credit variables are defined_for their state, so the national weighted sum IS the state total - no geography filtering needed.
- us/state_program_reforms.json: 12 neutralize-variable repeal reforms scored on state_income_tax against strong tax-year actuals (MN, CA x2, NY x2, CO x2, CT, NM x2, MD, IL).
- BaselineLevelSpec gains per-spec category and benchmark score_type (approximation vs actual); SOI rows keep their historical defaults.
- New loaders state_program_level_specs / state_program_reform_specs wired into load_default_reform_specs and a new default_baseline_level_specs used by the release tool. The calibration-diagnostics dashboard renders the new category with no changes.

None of these totals are calibration targets, so every row is genuinely out-of-sample.

Documented exclusions: VA (take-better-of structure), GA/DC/RI CTCs + PA EITC (post-2024 start), CT CTC (no ongoing credit), ME DETC (only refundable excess published), UT CTC, WA WFTC repeal row (no income tax).

Pilot on the released populace_us_2024.h5: MN CTC+WFC +6.1%, CO EITC -7.8%, NY EITC -9.7%, CA CalEITC -12.9%, CA YCTC +38.7%, CO CTC +30.4% vs official.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DTrim99 and others added 2 commits July 2, 2026 15:09

Declare the state-program validation configs in country_package.json

0eadf24

The country-package contract requires every shipped file to be declared; test_us_package_loads and the spec-only manifest test enforce it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
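A declaration check of this kind could look roughly like the sketch below. It is not the repository's `test_us_package_loads`: the manifest layout (a top-level `"files"` list of relative paths) and the `undeclared_files` helper are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def undeclared_files(package_dir: Path,
                     manifest_name: str = "country_package.json",
                     key: str = "files") -> list[str]:
    """Return shipped JSON files that the manifest does not declare."""
    manifest = json.loads((package_dir / manifest_name).read_text())
    declared = set(manifest[key])  # assumed schema: list of relative paths
    shipped = {
        path.relative_to(package_dir).as_posix()
        for path in package_dir.rglob("*.json")
        if path.name != manifest_name
    }
    return sorted(shipped - declared)


# Build a throwaway package where one config was left out of the manifest.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "us").mkdir()
    (root / "us" / "state_program_levels.json").write_text("[]")
    (root / "us" / "state_program_reforms.json").write_text("[]")
    (root / "country_package.json").write_text(
        json.dumps({"files": ["us/state_program_levels.json"]})
    )
    missing = undeclared_files(root)
```

A test built on such a helper would assert that the returned list is empty, so adding a config without declaring it fails fast.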

DTrim99 wants to merge 2 commits into PolicyEngine:main from DTrim99:state-program-validation
