National enhanced_cps_2024 has 5-15x inflated capital-gains/dividend/interest aggregates (same as #555 at state level)

## Summary

The `enhanced_cps_2024` dataset (national, the `Microsimulation()` default) has the same inflated-aggregates problem that #555 reported at the state level. Capital gains, dividends, and interest income aggregate to roughly 5–15× their CBO/SOI targets (short-term capital gains reach ~25×), even though `adjusted_gross_income` and `income_tax` land within about 10% of theirs.

This breaks any analysis that touches the income distribution: top-share metrics, Gini, capital-gains revenue scoring, etc.

## Aggregates (2026, default `Microsimulation()`)

| Variable | Model 2026 | Target 2026 (CBO where noted) | Ratio (model / target) |
|---|---:|---:|---:|
| `net_capital_gains` | $20.75T | ~$1.7T (CBO) | **12.2×** |
| `long_term_capital_gains` | $13.30T | ~$1.7T (CBO) | 7.8× |
| `short_term_capital_gains` | $7.45T | ~$0.3T | **24.8×** |
| `qualified_dividend_income` | $2.25T | ~$0.4T | 5.6× |
| `taxable_interest_income` | $3.12T | ~$0.5T | 6.2× |
| `partnership_s_corp_income` | $1.38T | ~$0.6T | 2.3× |
| `household_net_income` | $82.21T | ~$22T | **3.7×** |
| `household_market_income` | $85.14T | ~$22T | 3.9× |
| `adjusted_gross_income` | $16.69T | $18.81T (CBO) | 0.9× ✓ |
| `income_tax` | $2.48T | ~$2.2T | 1.1× ✓ |
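
The Ratio column can be rechecked mechanically. The sketch below is a minimal, standalone illustration that hard-codes the figures from the table above (in $T) and flags anything more than 25% off target. The 25% tolerance and the names `AGGREGATES`, `TOLERANCE`, and `flag_off_target` are assumptions made for this example, not part of `policyengine_us`.

```python
# Model and target aggregates in $T, copied from the table above.
AGGREGATES = {
    "net_capital_gains": (20.75, 1.7),
    "long_term_capital_gains": (13.30, 1.7),
    "short_term_capital_gains": (7.45, 0.3),
    "qualified_dividend_income": (2.25, 0.4),
    "taxable_interest_income": (3.12, 0.5),
    "partnership_s_corp_income": (1.38, 0.6),
    "household_net_income": (82.21, 22.0),
    "household_market_income": (85.14, 22.0),
    "adjusted_gross_income": (16.69, 18.81),
    "income_tax": (2.48, 2.2),
}

TOLERANCE = 0.25  # hypothetical threshold: flag anything more than 25% off target


def flag_off_target(aggregates, tolerance=TOLERANCE):
    """Return {variable: model/target ratio} for rows outside 1 +/- tolerance."""
    flagged = {}
    for name, (model, target) in aggregates.items():
        ratio = model / target
        if abs(ratio - 1.0) > tolerance:
            flagged[name] = round(ratio, 1)
    return flagged


flagged = flag_off_target(AGGREGATES)
for name, ratio in flagged.items():
    print(f"{name}: {ratio}x target")
```

With this tolerance, `adjusted_gross_income` and `income_tax` pass while every income-component row is flagged. In a live check the model values would come from `sim.calc(...)` as in the repro further down.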

## Concentration

The bulk of the inflation comes from very few records. Records ranked by weighted LTCG contribution to the $9.92T 2024 aggregate (first 4 of the top 30 shown):

```
rank    idx     weight  raw_ltcg($M)  wtd_ltcg($B)   cum%
   1  52148   73,763.5         62.2     4,586.55   46.3
   2  52077   50,965.1         79.0     4,024.01   86.8
   3  99526   27,023.2          4.9       131.40   88.2
   4  60528   10,563.7          7.0        73.57   88.9
   ...
```

Two records account for 87% of the inflated aggregate. Their **raw** LTCG values ($62M, $79M) are realistic for an individual top-tail tax return. The problem is that they were assigned calibration weights of roughly 74,000 and 51,000, so each record stands in for tens of thousands of households at that income level. That's roughly 2–3 orders of magnitude more than the actual count of $50M+ LTCG households in the US.
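
The concentration figures can be reproduced from the four rows shown in the table above. This is a minimal sketch using only those rows and the $9.92T total. `TOP_RECORDS`, `TOTAL_LTCG_BILLIONS`, and `cumulative_shares` are names invented for the example, and because the raw values in the table are rounded to $0.1M, the weighted contributions differ slightly from the `wtd_ltcg` column.

```python
# (record index, calibration weight, raw LTCG in $M), copied from the table above.
TOP_RECORDS = [
    (52148, 73_763.5, 62.2),
    (52077, 50_965.1, 79.0),
    (99526, 27_023.2, 4.9),
    (60528, 10_563.7, 7.0),
]
TOTAL_LTCG_BILLIONS = 9_920.0  # the $9.92T 2024 aggregate


def cumulative_shares(records, total_billions):
    """Rank records by weight * value and return (idx, weighted $B, cumulative share)."""
    contributions = [
        (idx, weight * raw_millions / 1_000.0)  # $M -> $B
        for idx, weight, raw_millions in records
    ]
    contributions.sort(key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for idx, weighted in contributions:
        running += weighted
        rows.append((idx, weighted, running / total_billions))
    return rows


rows = cumulative_shares(TOP_RECORDS, TOTAL_LTCG_BILLIONS)
for idx, weighted, share in rows:
    print(f"{idx:>6}  {weighted:>9,.2f}  {share:6.1%}")
```

The same ranking applied to the full dataset (per-record weight times `long_term_capital_gains`) would be a cheap regression check to run after any recalibration.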

This matches the diagnosis in #555 ("calibration weights were not re-tuned" after PR #537 removed the AGI ceiling), but #555 limited its scope to state-level files. The same problem exists in the *national* `enhanced_cps_2024.h5` served as the default dataset.

## Repro

```python
from policyengine_us import Microsimulation
sim = Microsimulation()
print(f"net_capital_gains 2026: ${sim.calc('net_capital_gains', period=2026).sum() / 1e12:.2f}T")
# Output: net_capital_gains 2026: $20.75T   (CBO target: ~$1.7T)
```

## Knock-on effects

- **Top-share / Gini metrics are broken.** Person-weighted `household_net_income` Gini is 0.93 (real US ~0.45–0.50). The 99.99th weighted percentile of `household_net_income` is $579M.
- **Cap-gains revenue scoring is overstated** by ~3–10× for any reform that hits the top LTCG bracket. PolicyEngine API impact estimates that touch this part of the distribution will report inflated revenue effects.
- **Distributional analyses** that use deciles based on per-capita household income will show extreme top-decile means ($6.5M for D10) and underweight the lower deciles.
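
To illustrate the Gini effect in isolation, here is a minimal sketch of a weighted Gini coefficient (trapezoid rule on the Lorenz curve) applied to a toy population. The incomes and weights are hypothetical and chosen only to show the direction of the effect. The function weights by household weight alone, whereas the 0.93 figure above is person-weighted, so this is not the exact computation behind that number.

```python
def weighted_gini(values, weights):
    """Weighted Gini via the trapezoid rule on the Lorenz curve."""
    pairs = sorted(zip(values, weights))  # ascending by value
    total_weight = sum(w for _, w in pairs)
    total_value = sum(v * w for v, w in pairs)
    gini = 1.0
    cum_value = 0.0
    prev_share = 0.0
    for value, weight in pairs:
        cum_value += value * weight
        share = cum_value / total_value  # cumulative income share
        gini -= (weight / total_weight) * (share + prev_share)
        prev_share = share
    return gini


# Hypothetical toy population: three ordinary household types plus one
# top-tail record with a $79M income.
incomes = [30_000.0, 60_000.0, 120_000.0, 79_000_000.0]
plausible_weights = [40e6, 50e6, 40e6, 100.0]     # top record stands in for 100 households
inflated_weights = [40e6, 50e6, 40e6, 51_000.0]   # top record stands in for 51,000 households

gini_plausible = weighted_gini(incomes, plausible_weights)
gini_inflated = weighted_gini(incomes, inflated_weights)
print(f"plausible top weight: {gini_plausible:.3f}")
print(f"inflated top weight:  {gini_inflated:.3f}")
```

Only one weight differs between the two runs, yet the Gini rises substantially, which is the mechanism behind the broken top-share metrics.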

## Calibration target is in `build_loss_matrix` but isn't binding

`utils/loss.py` does add `capital_gains_gross` per AGI bracket × filing status (and an "All" aggregate row), but the L0 optimizer either doesn't converge to the cap-gains target or trades it off against sparsity. Two candidate explanations:
- The L0 regularization is too strong, so the optimizer prefers concentrated weights (a few records with very high weight) over distributed weights.
- A competing target (e.g., AGI total in a high-AGI bracket) is forcing weight onto these specific records.

The result is the same as #555: a couple of high-income records absorb extreme weight to satisfy other constraints, blowing up the income-component aggregates.

## Suggested fixes

1. **Add a hard per-record contribution cap** to the L0 optimizer in `microcalibrate`: `max(weight × value)` per (record, calibration variable) bounded by some fraction of the national target.
2. **Or restore the AGI ceiling** for PUF-imputed records (in effect, capping raw LTCG/dividend/interest values at, e.g., $50M). This was the pre-#537 behavior.
3. **Or add explicit national aggregate targets** as separate (not summable) loss-matrix rows for `capital_gains_gross`, `qualified_dividends`, `ordinary_dividends`, `taxable_interest_income`, `partnership_and_s_corp_income` and tighten their relative-error weight.

#555 suggests fix (1). Whichever is chosen, this needs to ship before the dataset is used for any income-distribution analysis.
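
As a rough illustration of what fix (1) means, the sketch below clips each weight so that `weight × value` never exceeds a fixed fraction of the target. It is a post-hoc clip written in plain Python, not the `microcalibrate` API, and `cap_contributions`, `TARGET`, and `MAX_SHARE` are hypothetical names and values. A real implementation would need to enforce the bound inside the optimizer, because clipping afterwards shifts every other target that these records contribute to.

```python
def cap_contributions(weights, values, target, max_share):
    """Clip weights so each record's weight * value <= max_share * target."""
    cap = max_share * target
    capped = []
    for weight, value in zip(weights, values):
        if value > 0 and weight * value > cap:
            capped.append(cap / value)  # largest weight that respects the cap
        else:
            capped.append(weight)
    return capped


# Weights and raw LTCG values ($) for the four records in the Concentration table.
weights = [73_763.5, 50_965.1, 27_023.2, 10_563.7]
values = [62.2e6, 79.0e6, 4.9e6, 7.0e6]

TARGET = 1.7e12   # illustrative: the ~$1.7T LTCG target quoted above
MAX_SHARE = 0.05  # hypothetical: no record may supply more than 5% of the target

capped = cap_contributions(weights, values, TARGET, MAX_SHARE)
for before, after, value in zip(weights, capped, values):
    print(f"weight {before:>10,.1f} -> {after:>10,.1f}  "
          f"contribution ${after * value / 1e9:,.1f}B")
```

The choice of `max_share` is a judgment call: too tight and the top tail is underrepresented, too loose and a handful of records can still dominate the aggregate.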

## Related

- #555 — same bug at state level (Feb 2026), open
- #530 / #537 — original AGI ceiling removal (the trigger)
- a83a93ab — Forbes-backed PUF top tail (April 2026); not the cause but exacerbates the top tail

