Add state tax-credit program validations (40 baseline levels + 12 repeal reforms) #295

Open
DTrim99 wants to merge 2 commits into PolicyEngine:main from DTrim99:state-program-validation

DTrim99 commented Jul 2, 2026

Extends reform validation beyond federal benchmarks with out-of-sample state program checks: modeled state EITC/WFC/CTC totals are compared with official state statistics, so the calibration dashboard can show how well populace reproduces the state programs that downstream tools (e.g. the child-poverty dashboard) simulate.

What's added

us/state_program_levels.json: 40 baseline-level backtests across 28 states (26 official actuals, 14 documented approximations flagged benchmark.score_type: "approximation"). Each compares the populace baseline weighted total of a state credit variable to a published program cost. State credit variables are defined_for their state, so the national sum is the state total and no geography filtering is needed. Sources are state DOR/FTB/comptroller statistical or tax-expenditure reports; every figure was fetched and independently re-verified at the cited page. Approximations are estimates such as match_rate × IRS federal EITC in the state (IRS EITC Central TY2024) or ACS counts × statutory amounts, with the method recorded in the source string.
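
The comparison each levels row makes can be sketched roughly as below. This is a minimal illustration and not the harness code: the household records, weights, and benchmark figure are invented, and `weighted_total` and `relative_error` are hypothetical helper names. It shows why a variable that is zero outside its state needs no geography filter.

```python
# Toy records: (state, household weight, credit amount). All values are invented.
# A state credit variable that is defined_for its state is zero everywhere else.
households = [
    ("MN", 1000.0, 500.0),
    ("MN", 2000.0, 250.0),
    ("CA", 1500.0, 0.0),  # outside MN, so the MN credit is 0
    ("NY", 3000.0, 0.0),
]


def weighted_total(rows, state=None):
    """Weighted sum of the credit, optionally restricted to one state."""
    return sum(w * v for s, w, v in rows if state is None or s == state)


def relative_error(modeled, official):
    """Signed deviation of the modeled total from the published figure."""
    return (modeled - official) / official


national = weighted_total(households)
mn_only = weighted_total(households, state="MN")
official_cost = 950_000.0  # hypothetical published program cost

# The unfiltered national sum and the state-filtered sum agree.
print(national == mn_only)
print(f"{relative_error(national, official_cost):+.1%}")
```

The signed percentage is assumed here to be (modeled − official) / official; the PR text does not spell out the formula behind the "populace vs official" column.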

us/state_program_reforms.json: 12 repeal-style reform validations, each neutralizing the credit variable and scoring the change in state_income_tax. They cover the programs with the strongest tax-year actuals: MN CTC+WFC, CA CalEITC + YCTC, NY EITC + ESCC, CO EITC + CTC, CT, NM ×2, MD (refundable+nonrefundable sum), IL.
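
As a rough sketch of what a repeal-style row scores, the toy example below neutralizes a credit and measures the change in weighted state income tax. Everything in it is an assumption for illustration: the households are invented, the tax rule is a one-line stand-in for a fully refundable credit, and `total_state_income_tax` is a hypothetical helper, not a harness function.

```python
# Toy households: (weight, state tax before credits, refundable credit). Invented values.
households = [
    (1000.0, 1200.0, 300.0),
    (2000.0, 400.0, 600.0),  # credit exceeds liability; refundable, so net tax is negative
    (1500.0, 2500.0, 0.0),
]


def state_income_tax(pre_credit_tax, credit):
    """Net state income tax after a fully refundable credit (toy rule)."""
    return pre_credit_tax - credit


def total_state_income_tax(rows, neutralize_credit=False):
    """Weighted state income tax; neutralizing forces the credit to zero."""
    return sum(
        w * state_income_tax(tax, 0.0 if neutralize_credit else credit)
        for w, tax, credit in rows
    )


baseline = total_state_income_tax(households)
reform = total_state_income_tax(households, neutralize_credit=True)
repeal_impact = reform - baseline  # revenue gained by repealing the credit

# For a fully refundable credit the repeal impact equals the weighted credit total.
credit_cost = sum(w * credit for w, _, credit in households)
print(repeal_impact, credit_cost)
```

For a nonrefundable credit the two numbers can differ, because the credit is capped at the filer's liability; scoring the change in state_income_tax captures that cap where a plain sum of the credit variable might not.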

Harness (backward-compatible): BaselineLevelSpec gains per-spec category and benchmark_score_type (SOI rows keep their existing "IRS SOI actual" / "actual" values, which a test covers); new loaders state_program_level_specs / state_program_reform_specs; default_baseline_level_specs() aggregates SOI + state levels and the release tool uses it. load_default_reform_specs now includes the state repeals. The calibration-diagnostics dashboard renders the new category with no dashboard changes.
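
One way to keep such a change backward-compatible is to give the new fields defaults that match the old hard-coded values. The sketch below assumes that approach; `LevelSpec` is a hypothetical stand-in for BaselineLevelSpec, its field list is invented apart from the two new attributes, the variable names are placeholders, and the "State program" category label is assumed from the dashboard row name mentioned later in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpec:
    """Hypothetical stand-in for BaselineLevelSpec; the real fields may differ."""

    name: str
    variable: str
    benchmark_value: float
    category: str = "IRS SOI actual"  # default quoted in the PR text
    benchmark_score_type: str = "actual"  # default quoted in the PR text


def soi_level_specs():
    """Existing rows are built without the new fields, so they pick up the defaults."""
    return [LevelSpec("Example SOI row", "example_soi_variable", 1.0e9)]


def state_level_specs():
    """New rows set category and score type per spec."""
    return [
        LevelSpec(
            name="Example state credit",
            variable="example_state_credit",
            benchmark_value=2.5e8,
            category="State program",
            benchmark_score_type="approximation",
        )
    ]


def default_level_specs():
    """Aggregate both sources, mirroring the role of default_baseline_level_specs()."""
    return soi_level_specs() + state_level_specs()


specs = default_level_specs()
for spec in specs:
    print(spec.name, "|", spec.category, "|", spec.benchmark_score_type)
```

Because callers that never pass the new arguments see unchanged values, existing SOI output stays the same while state rows carry their own labels.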

Out-of-sample by construction

State program totals are not calibration targets (the surface calibrates federal EITC by AGI bracket and aggregate state income-tax liability), so every row lands in the dashboard's out-of-sample KPIs.

Pilot (released populace_us_2024.h5)

| Program | populace vs official |
| --- | --- |
| MN CTC + WFC (TY2024) | +6.1% |
| CO EITC (TY2023, 50% match both years) | −7.8% |
| NY EITC (TY2024) | −9.7% |
| CA CalEITC (TY2023) | −12.9% |
| CA YCTC (TY2023) | +38.7% |
| CO CTC (TY2023) | +30.4% |

That the under-6 credits (YCTC, CO CTC) run ~30–40% hot is a genuine dataset finding of the kind this validation is designed to surface. EITC totals land modestly below the official figures, consistent with take-up.

Documented exclusions (in the file comments)

VA (take-better-of refundable/nonrefundable — no single published line matches the variable); GA/DC/RI CTCs + PA EITC (not in effect in 2024); CT CTC (no ongoing credit, only the 2022 rebate); ME Dependent Exemption Tax Credit (only the refundable excess is published); UT CTC; WA WFTC repeal row (WA levies no income tax — it is levels-only).

Tests

test_reform_validation.py: +6 tests (loader parsing, per-spec category/score_type rows, SOI-default preservation, aggregator, shipped-config well-formedness). 29/29 pass; ruff format + check clean. Variable names verified against policyengine-us 1.755.5.
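
A well-formedness check on a shipped config might look roughly like the sketch below. The schema is an assumption: only benchmark.score_type and the presence of a source string come from this description, while the other keys, the `spec_problems` helper, and the sample entries are hypothetical.

```python
ALLOWED_SCORE_TYPES = {"actual", "approximation"}


def spec_problems(spec):
    """Return a list of human-readable problems for one spec dict (empty if fine)."""
    problems = []
    for key in ("name", "variable", "benchmark"):
        if key not in spec:
            problems.append(f"missing {key}")
    benchmark = spec.get("benchmark", {})
    # A missing score_type is treated as "actual" in this sketch.
    if benchmark.get("score_type", "actual") not in ALLOWED_SCORE_TYPES:
        problems.append("unknown score_type")
    value = benchmark.get("value")
    if not isinstance(value, (int, float)) or value <= 0:
        problems.append("benchmark value must be a positive number")
    if not benchmark.get("source"):
        problems.append("missing source string")
    return problems


# Invented entries: one well-formed, one with several defects.
good = {
    "name": "Example state credit",
    "variable": "example_state_credit",
    "benchmark": {
        "value": 2.5e8,
        "score_type": "approximation",
        "source": "Example report; method: match_rate x federal EITC",
    },
}
bad = {"name": "Broken row", "benchmark": {"value": -1, "score_type": "guess"}}

print(spec_problems(good))
print(spec_problems(bad))
```

Returning a list of problems instead of raising on the first one lets a single test run report every defect in a config file.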

Suggested verification before publishing: run a staging build, then check that /populace/reforms?release=staging:{run_id} on the dashboard shows the "State program" rows.

🤖 Generated with Claude Code

DTrim99 and others added 2 commits July 2, 2026 15:09
…eal reforms)

Extends reform validation beyond federal benchmarks with out-of-sample STATE
program checks, sourced from official state statistics collected and
double-verified from revenue-department reports (Jul 2026):

- us/state_program_levels.json: 40 baseline-level backtests (26 official
  actuals, 14 documented approximations) comparing populace baseline totals of
  state credit variables (EITC/WFC/CTC family across 28 states) to published
  program costs. State credit variables are defined_for their state, so the
  national weighted sum IS the state total - no geography filtering needed.
- us/state_program_reforms.json: 12 neutralize-variable repeal reforms scored
  on state_income_tax against strong tax-year actuals (MN, CA x2, NY x2,
  CO x2, CT, NM x2, MD, IL).
- BaselineLevelSpec gains per-spec category and benchmark score_type
  (approximation vs actual); SOI rows keep their historical defaults.
- New loaders state_program_level_specs / state_program_reform_specs wired
  into load_default_reform_specs and a new default_baseline_level_specs used
  by the release tool. The calibration-diagnostics dashboard renders the new
  category with no changes.

None of these totals are calibration targets, so every row is genuinely
out-of-sample. Documented exclusions: VA (take-better-of structure), GA/DC/RI
CTCs + PA EITC (post-2024 start), CT CTC (no ongoing credit), ME DETC (only
refundable excess published), UT CTC, WA WFTC repeal row (no income tax).

Pilot on the released populace_us_2024.h5: MN CTC+WFC +6.1%, CO EITC -7.8%,
NY EITC -9.7%, CA CalEITC -12.9%, CA YCTC +38.7%, CO CTC +30.4% vs official.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The country-package contract requires every shipped file to be declared;
test_us_package_loads and the spec-only manifest test enforce it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>