Add state tax-credit program validations (40 baseline levels + 12 repeal reforms) #295
Open
DTrim99 wants to merge 2 commits into
…eal reforms)

Extends reform validation beyond federal benchmarks with out-of-sample STATE program checks, sourced from official state statistics collected and double-verified from revenue-department reports (Jul 2026):

- us/state_program_levels.json: 40 baseline-level backtests (26 official actuals, 14 documented approximations) comparing populace baseline totals of state credit variables (EITC/WFC/CTC family across 28 states) to published program costs. State credit variables are defined_for their state, so the national weighted sum IS the state total - no geography filtering needed.
- us/state_program_reforms.json: 12 neutralize-variable repeal reforms scored on state_income_tax against strong tax-year actuals (MN, CA x2, NY x2, CO x2, CT, NM x2, MD, IL).
- BaselineLevelSpec gains per-spec category and benchmark score_type (approximation vs actual); SOI rows keep their historical defaults.
- New loaders state_program_level_specs / state_program_reform_specs wired into load_default_reform_specs and a new default_baseline_level_specs used by the release tool. The calibration-diagnostics dashboard renders the new category with no changes.

None of these totals are calibration targets, so every row is genuinely out-of-sample.

Documented exclusions: VA (take-better-of structure), GA/DC/RI CTCs + PA EITC (post-2024 start), CT CTC (no ongoing credit), ME DETC (only refundable excess published), UT CTC, WA WFTC repeal row (no income tax).

Pilot on the released populace_us_2024.h5: MN CTC+WFC +6.1%, CO EITC -7.8%, NY EITC -9.7%, CA CalEITC -12.9%, CA YCTC +38.7%, CO CTC +30.4% vs official.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The country-package contract requires every shipped file to be declared; test_us_package_loads and the spec-only manifest test enforce it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extends reform validation beyond federal benchmarks with out-of-sample state program checks — modeled state EITC/WFC/CTC totals vs official state statistics — so the calibration dashboard can show how well populace reproduces the state programs downstream tools (e.g. the child-poverty dashboard) simulate.
What's added
- us/state_program_levels.json: 40 baseline-level backtests across 28 states (26 official actuals, 14 documented approximations flagged benchmark.score_type: "approximation"). Each compares the populace baseline weighted total of a state credit variable to a published program cost. State credit variables are defined_for their state, so the national sum is the state total and no geography filtering is needed. Sources are state DOR/FTB/comptroller statistical or tax-expenditure reports (every figure fetched and independently re-verified at the cited page); approximations are things like match_rate × IRS federal EITC in the state (IRS EITC Central TY2024) or ACS counts × statutory amounts, with the method in the source string.
- us/state_program_reforms.json: 12 repeal-style reform validations (neutralize the credit variable, score the change in state_income_tax) for the programs with the strongest tax-year actuals: MN CTC+WFC, CA CalEITC + YCTC, NY EITC + ESCC, CO EITC + CTC, CT, NM ×2, MD (refundable+nonrefundable sum), IL.
Harness (backward-compatible):
BaselineLevelSpec gains per-spec category and benchmark_score_type (SOI rows keep "IRS SOI actual" / "actual", which is tested); new loaders state_program_level_specs / state_program_reform_specs; default_baseline_level_specs() aggregates SOI + state levels and the release tool uses it. load_default_reform_specs includes the state repeals. The calibration-diagnostics dashboard renders the new category with zero changes.
Out-of-sample by construction
State program totals are not calibration targets (the surface calibrates federal EITC by AGI bracket and aggregate state income-tax liability), so every row lands in the dashboard's out-of-sample KPIs.
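A toy sketch of why the level checks need no geography filter: assuming a state credit variable evaluates to zero for every household outside its state (which is how the description characterizes defined_for), the national weighted sum and the state-filtered weighted sum coincide. The household records, weights, and the variable name mn_credit below are made up for illustration and are not taken from populace or policyengine-us.

```python
# Toy illustration only: every record, weight, and amount here is hypothetical.
households = [
    {"state": "MN", "weight": 1200.0, "mn_credit": 500.0},
    {"state": "MN", "weight": 800.0, "mn_credit": 0.0},
    # Assumption: outside MN the MN credit variable is always 0.
    {"state": "CA", "weight": 1500.0, "mn_credit": 0.0},
    {"state": "NY", "weight": 900.0, "mn_credit": 0.0},
]


def weighted_total(rows, variable):
    """Sum of weight * value over the given household rows."""
    return sum(row["weight"] * row[variable] for row in rows)


# Weighted sum over the whole (national) sample.
national_total = weighted_total(households, "mn_credit")

# Weighted sum restricted to households located in MN.
mn_rows = [row for row in households if row["state"] == "MN"]
state_total = weighted_total(mn_rows, "mn_credit")

# Rows outside MN contribute exactly zero, so the two totals agree.
assert national_total == state_total
print(national_total)
```

If the zero-outside-the-state assumption did not hold for some variable, the two totals would diverge and an explicit state filter would be required.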
Pilot (released populace_us_2024.h5)
The under-6 credits (YCTC, CO CTC) run roughly 30–40% above official totals, a genuine dataset finding that this validation is designed to surface. EITCs land modestly below official totals, consistent with incomplete take-up.
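The pilot percentages can be read as signed errors of the modeled total relative to the benchmark, with negative values meaning the model lands below the official figure. A minimal sketch under that assumption follows; the function names and all dollar amounts are hypothetical and are not the harness's actual API or any published figure. The second helper mirrors the "match rate × federal EITC in the state" style of approximation benchmark.

```python
def signed_pct_error(modeled_total, official_total):
    """Signed percent difference of the modeled total relative to the benchmark."""
    if official_total == 0:
        raise ValueError("official_total must be non-zero")
    return 100.0 * (modeled_total - official_total) / official_total


def approx_state_eitc_benchmark(match_rate, federal_eitc_in_state):
    """Approximation benchmark: state match rate times federal EITC claimed in the state."""
    return match_rate * federal_eitc_in_state


# Hypothetical totals in dollars, chosen only to show the sign convention.
over = signed_pct_error(1_300_000.0, 1_000_000.0)   # model above benchmark
under = signed_pct_error(900_000.0, 1_000_000.0)    # model below benchmark
print(f"{over:+.1f}%")   # +30.0%
print(f"{under:+.1f}%")  # -10.0%

# Hypothetical 25% match on a hypothetical $400M of federal EITC in the state.
benchmark = approx_state_eitc_benchmark(0.25, 400_000_000.0)
print(benchmark)  # 100000000.0
```

Keeping the sign makes it possible to tell a credit that runs hot from one that falls short, which an absolute error would hide.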
Documented exclusions (in the file comments)
VA (take-better-of refundable/nonrefundable — no single published line matches the variable); GA/DC/RI CTCs + PA EITC (not in effect in 2024); CT CTC (no ongoing credit, only the 2022 rebate); ME Dependent Exemption Tax Credit (only the refundable excess is published); UT CTC; WA WFTC repeal row (WA levies no income tax — it is levels-only).
Tests
test_reform_validation.py: +6 tests (loader parsing, per-spec category/score_type rows, SOI-default preservation, aggregator, shipped-config well-formedness). 29/29 pass; ruff format + check clean. Variable names verified against policyengine-us 1.755.5.
Suggested verification before publishing: run a staging build, then check that /populace/reforms?release=staging:{run_id} on the dashboard shows the "State program" rows.
🤖 Generated with Claude Code