Add CBO aggregate + per-AGI-bracket targets for cap gains, dividends, interest #868
… interest

The calibration optimizer was leaving capital gains, dividends, and interest aggregates 5-15x inflated relative to CBO targets in the default enhanced_cps_2024 dataset. Two records with raw $62M / $79M LTCG ended up with calibration weights of 73k / 51k, together contributing 87% of the inflated $9.92T national LTCG aggregate (real CBO target: $1.29T for 2024).

Root cause is the same as #555 at state level: when PR #537 removed the AGI ceiling on PUF imputation, calibration weights weren't re-tuned to constrain the new high-income tail. The bracket-level SOI targets in build_loss_matrix can be satisfied while a few records absorb extreme weight to fill the population/AGI targets, overshooting the un-constrained component aggregates as a side effect.

Two changes:

1. utils/loss.py (build_loss_matrix, used by enhanced_cps_2024): Add three CBO income_by_source aggregate targets (net_capital_gains, qualified_dividend_income, and taxable_interest_and_ordinary_dividends) so the optimizer has hard upper bounds on these national totals.
2. calibration/target_config.yaml (used by unified_calibration / national/US.h5): add per-AGI-bracket net_capital_gains targets (the DB already has them from SOI ETL, they just weren't included), plus re-include dividend_income, qualified_dividend_income, and taxable_interest_income aggregates that were previously dropped for "high error or tension". 30% rel-error on a soft target is still vastly better than no constraint.

Refs #555, #866.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
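The figures quoted in the commit message can be cross-checked with a few lines of arithmetic. This is only a sketch using the rounded numbers as reported above; the variable names are illustrative and nothing here reads the actual dataset.

```python
# Rounded figures as quoted in the commit message (not read from the dataset).
raw_ltcg = [62e6, 79e6]     # raw long-term capital gains of the two records, USD
weights = [73e3, 51e3]      # calibration weights assigned to those records
national_ltcg = 9.92e12     # inflated national LTCG aggregate, USD
cbo_target = 1.29e12        # CBO target for 2024, USD

# Weighted contribution of each record to the national aggregate.
contributions = [gain * weight for gain, weight in zip(raw_ltcg, weights)]

# Share of the national aggregate explained by the two records.
share = sum(contributions) / national_ltcg

# How far the aggregate overshoots the CBO target.
overshoot = national_ltcg / cbo_target

print(f"two-record contribution: ${sum(contributions) / 1e12:.2f}T")
print(f"share of national LTCG:  {share:.1%}")
print(f"overshoot vs CBO target: {overshoot:.1f}x")
```

With these rounded inputs the two records account for roughly 86% of the aggregate, which is in line with the 87% quoted (presumably computed from unrounded values), and the overshoot comes out at about 7.7×.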
Local rebuild / validation update after the latest fixes. Pushed additional fixes in
I also rolled the HF Local build path run: The final Aggregate sanity check, 2026
UBI + top LTCG rate 20% to 25%, 2026
Interpretation: the national capital-income aggregate bug is fixed. The old negative-AGI LTCG whale is gone:
Local verification:
Summary
Adds calibration targets to constrain capital-gains, dividend, and interest-income aggregates that were running 5-15x over CBO/SOI targets in the default `enhanced_cps_2024` dataset.

Fixes the national-level half of #555 (which only flagged the issue at state level) and resolves #866.
Diagnosis
In the current default `enhanced_cps_2024.h5`, two records with raw $62M / $79M LTCG ended up with calibration weights of 73k / 51k, together contributing 87% of the $9.92T national long-term capital gains aggregate. CBO target for 2024: $1.29T. So we were 7.7× over for LTCG, 12× over for `net_capital_gains`, and Gini was reading 0.93 (vs. real ~0.45) on the resulting income distribution.

Same mechanism as #555: PR #537 removed the AGI ceiling on PUF imputation, but the calibration targets weren't tightened to constrain the new high-income tail. The bracket-level SOI targets in `build_loss_matrix` can be satisfied while a few records absorb extreme weight to fill the population/AGI targets, overshooting the un-constrained component aggregates as a side effect.

Changes
1. `utils/loss.py`, `build_loss_matrix` (consumed by the legacy `EnhancedCPS_2024.generate()` path that builds `enhanced_cps_2024.h5`): Add three CBO `income_by_source` aggregate targets so the optimizer has hard upper bounds on national totals:
   - `net_capital_gains`
   - `qualified_dividend_income`
   - `taxable_interest_and_ordinary_dividends`
2. `calibration/target_config.yaml` (consumed by `unified_calibration` for `national/US.h5`):
   - Add per-AGI-bracket `net_capital_gains` targets (DB already had them from the SOI ETL; just weren't included).
   - Add `tax_unit_count` × cap-gains targets (per-bracket counts of returns with cap gains).
   - Re-include `dividend_income`, `qualified_dividend_income`, and `taxable_interest_income` aggregates that were previously dropped for "high error or tension". 30% rel-error on a soft target is much better than no constraint at all when the alternative is 5-15× inflation.

Test plan
- Unit tests (`test_policyengine_utils.py`)
- Rebuild: `python -m policyengine_us_data.datasets.cps.enhanced_cps`
- `net_capital_gains` lands within ~50% of CBO $1.29T target (currently ~16x over)
- `qualified_dividend_income` lands within ~50% of CBO $354B target (currently ~6x over)
- Gini on `household_net_income` drops from 0.93 to a more plausible level
- Re-run `unified_calibration` with the updated `target_config.yaml` and check `national_unified_diagnostics.csv` for new per-bracket cap-gains rel errors

The rebuild will reveal whether soft targets are sufficient or whether we also need the per-record contribution cap proposed in #555. If aggregates still run 2-3× over after this change, we'll need to escalate to a hard-constraint solution in `microcalibrate`.

Refs
🤖 Generated with Claude Code
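The "within ~50% of target" checks in the test plan could be scripted roughly as follows. This is a minimal sketch, not code from the repository: `relative_error`, `within_tolerance`, and `check_aggregates` are hypothetical helper names, the "before" values assume that "~16x over" and "~6x over" mean 16 and 6 times the target, and the "after" values are made-up placeholders standing in for a real rebuild.

```python
# CBO targets quoted in the test plan, in USD.
CBO_TARGETS = {
    "net_capital_gains": 1.29e12,
    "qualified_dividend_income": 354e9,
}


def relative_error(actual: float, target: float) -> float:
    """Signed relative error of an aggregate against its target."""
    return actual / target - 1.0


def within_tolerance(actual: float, target: float, tol: float = 0.5) -> bool:
    """True if the aggregate is within +/- tol (50% by default) of the target."""
    return abs(relative_error(actual, target)) <= tol


def check_aggregates(aggregates: dict, tol: float = 0.5) -> dict:
    """Map each targeted variable to a pass/fail flag."""
    return {
        name: within_tolerance(aggregates[name], target, tol)
        for name, target in CBO_TARGETS.items()
    }


# Assumption: "~16x over" / "~6x over" read as multiples of the target.
before = {
    "net_capital_gains": 16 * CBO_TARGETS["net_capital_gains"],
    "qualified_dividend_income": 6 * CBO_TARGETS["qualified_dividend_income"],
}

# Hypothetical post-rebuild aggregates, for illustration only.
after = {
    "net_capital_gains": 1.6e12,
    "qualified_dividend_income": 0.4e12,
}

print("before:", check_aggregates(before))
print("after: ", check_aggregates(after))
```

In practice the aggregate values would come from the rebuilt dataset rather than being hard-coded; that step is left out here because it depends on the loading code, which this page does not show.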