Data-quality issue in the default US dataset (Enhanced CPS), 2026-05-19 to 2026-06-12 #429

MaxGhenis (Maintainer) · 2026-06-19T20:21:48Z

Summary

Between 2026-05-19 and 2026-06-12, the Enhanced CPS (the US microdata that PolicyEngine shipped as its default dataset) had inflated top-of-distribution aggregates because of a calibration change. Society-wide microsimulation outputs that depend on high incomes or top-concentrated deductions (total AGI, top earners, itemized/charitable deductions, and reforms scored against them) were overstated during that window on the surfaces that used the affected dataset. Direct callers of the default `Microsimulation()` in policyengine-us remained affected through 2026-06-19.

This was not a bug in the tax-benefit rules — those were unchanged. It was in how the dataset's survey weights were calibrated. It's now fixed on every surface, and we're sorry to anyone whose analysis it affected. We're using this incident to invest in more transparent calibration reporting. Specifics and links below.

What went wrong (specifically)

We calibrate the Enhanced CPS by reweighting survey records to match hundreds of administrative targets (IRS SOI, JCT, CBO, Census, NIPA/BEA). On 2026-05-19, a recalibration (policyengine-us-data 1.115.x) added new targets to better capture the income of nonfilers — people who don't file tax returns, whom the filer-based IRS SOI data misses — which improves low-income and poverty accuracy. These included NIPA employment targets (PolicyEngine/policyengine-us-data#1020) and BEA state wage targets (PolicyEngine/policyengine-us-data#1034).

The Enhanced CPS carries relatively little high-income mass. Under a single flat-weighted loss, pushing harder on the new nonfiler-income targets created a tension the optimizer resolved by over-weighting a handful of synthetic very-high-income records to still hit top-bracket AGI totals. The result (TY2024):

- total AGI jumped from ~$14T (sane, through 2026-05-18) to ~$19.7T (05-19) and ~$23.9T (05-20), against an SOI level of ~$15.5T;
- the $100M+ AGI band went from ~$0.4T to ~$7.5T;
- top-concentrated itemized deductions inflated (modeled charitable deductions ran ~9× the SOI level), so reforms that repeal or cap deductions scored far too high.

We bisected the onset to the day: outputs were sane through 2026-05-18 and broke with the 2026-05-19 recalibration. The general failure mode, aggregate targets "drowned out" by flat loss weighting, is tracked in PolicyEngine/policyengine-us-data#1107.
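As a rough illustration of how far off these totals were, the sketch below compares the total-AGI figures quoted above with the SOI level. The numbers are the approximate ones from this post; the `is_out_of_band` helper and its 15% tolerance are hypothetical and are not PolicyEngine's actual calibration gate.

```python
# Approximate TY2024 figures quoted in this post, in trillions of USD.
SOI_TOTAL_AGI = 15.5

modeled_total_agi = {
    "2026-05-18": 14.0,  # last sane build
    "2026-05-19": 19.7,  # first affected build
    "2026-05-20": 23.9,
}


def ratio_to_benchmark(modeled: float, benchmark: float) -> float:
    """Return modeled / benchmark; 1.0 means an exact match."""
    return modeled / benchmark


def is_out_of_band(modeled: float, benchmark: float, tolerance: float = 0.15) -> bool:
    """Hypothetical check: flag a total that misses the benchmark by more than `tolerance`."""
    return abs(ratio_to_benchmark(modeled, benchmark) - 1.0) > tolerance


for build_date, total in modeled_total_agi.items():
    ratio = ratio_to_benchmark(total, SOI_TOTAL_AGI)
    flagged = is_out_of_band(total, SOI_TOTAL_AGI)
    print(f"{build_date}: {ratio:.2f}x SOI, flagged={flagged}")
```

With these inputs, only the two builds from 2026-05-19 onward are flagged. A real gate would need per-target tolerances rather than a single threshold.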

Who was affected

This affects society-wide / microsimulation outputs — population aggregates, distributional results, and reform scores — which use the microdata. Single-household calculations are not affected: they apply the tax-benefit rules to a household you enter and don't touch the microdata at all.

Across surfaces, society-wide outputs were affected for these windows:

| Surface | Affected window |
| --- | --- |
| policyengine.org society-wide / economy analysis (routes to the v2 simulation API → the certified bundle dataset) | 2026-05-19 → 2026-06-12 |
| `policyengine` bundle (policyengine.py) and any `managed_microsimulation` user | 2026-05-19 → 2026-06-12 |
| `policyengine-us`, default `Microsimulation()` called directly | 2026-05-19 → 2026-06-19 (fixed in 1.739.1) |
| policyengine.org single-household calculator | not affected (no microdata) |

The web app and the bundle were remediated together when the certified dataset moved to Populace (bundle 4.16.1, 2026-06-12); direct policyengine-us was the last surface, fixed on 2026-06-19 in 1.739.1. If you ran society-wide analyses in those windows, especially anything sensitive to high incomes or to itemized/charitable deductions, we recommend re-running on the current certified data.
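If you keep a log of when your analyses ran, a small helper like the one below can flag which ones fall in an affected window. This is a minimal sketch: the surface keys are made-up labels for the rows of the table above, and treating both endpoints as inclusive is a conservative assumption, since the fixes landed partway through the end dates.

```python
from datetime import date

# Windows copied from the table in this post. The keys are hypothetical labels.
# Inclusive endpoints are an assumption, chosen to err on the side of re-running.
AFFECTED_WINDOWS = {
    "web-society-wide": (date(2026, 5, 19), date(2026, 6, 12)),
    "bundle": (date(2026, 5, 19), date(2026, 6, 12)),
    "policyengine-us-direct": (date(2026, 5, 19), date(2026, 6, 19)),
    "web-household": None,  # not affected: no microdata involved
}


def needs_rerun(surface: str, run_date: date) -> bool:
    """Return True if a run on `run_date` via `surface` used the affected data."""
    window = AFFECTED_WINDOWS[surface]  # raises KeyError for an unknown surface
    if window is None:
        return False
    start, end = window
    return start <= run_date <= end


print(needs_rerun("bundle", date(2026, 6, 15)))                  # False
print(needs_rerun("policyengine-us-direct", date(2026, 6, 15)))  # True
```

The two calls show why the surface matters: the same run date is clean on the bundle but still affected for direct policyengine-us callers.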

What's fixed

- The certified US dataset is now Populace, a primary-source build that supplies real top-of-distribution mass rather than over-weighting synthetic records: https://github.com/PolicyEngine/populace
- Populace also calibrates differently. Each target's contribution to the loss is weighted by the square root of its value, so large aggregates like total AGI carry more weight than the many small targets and can't be drowned out, which was the failure mode behind this incident. A hard cap also limits how much any single record's weight can grow (50× its starting weight in the US build), so a handful of records can't be over-weighted to force a fit.
- The policyengine bundle moved to Populace at 4.16.1 (2026-06-12); the v2 simulation API, and the policyengine.org society-wide analysis it powers, followed via the bundle manifest.
- policyengine-us now defaults to the certified Populace build as of 1.739.1 (policyengine-us#8692, "Point default US dataset to certified Populace build"). A fresh Microsimulation() returns TY2026 total AGI ≈ $17.0T (vs ≈ $24.9T on the broken default) and claimed itemized deductions ≈ $0.87T.
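To make the two calibration safeguards concrete, here is a toy NumPy sketch of square-root target weighting and a per-record weight cap. It is not Populace's code: the squared-relative-error loss, the normalization, the post-hoc clipping, and all function names are assumptions for illustration, and the target values are invented apart from the ~$15.5T SOI AGI level quoted earlier.

```python
import numpy as np


def target_weights(targets, scheme):
    """Per-target loss weights, normalized to sum to 1 (illustrative only)."""
    targets = np.asarray(targets, dtype=float)
    if scheme == "flat":
        raw = np.ones_like(targets)
    elif scheme == "sqrt":
        raw = np.sqrt(np.abs(targets))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return raw / raw.sum()


def weighted_loss(estimates, targets, scheme):
    """Weighted sum of squared relative errors (an assumed loss form)."""
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    rel_err = (estimates - targets) / targets
    return float(np.sum(target_weights(targets, scheme) * rel_err**2))


def cap_record_weights(weights, initial_weights, max_growth=50.0):
    """Clip each record weight to at most `max_growth` times its starting weight."""
    return np.minimum(weights, max_growth * np.asarray(initial_weights, dtype=float))


# One large aggregate (total AGI, ~$15.5T) next to 200 invented $1B targets.
targets = np.array([15.5e12] + [1e9] * 200)
flat_share = target_weights(targets, "flat")[0]
sqrt_share = target_weights(targets, "sqrt")[0]
print(f"aggregate's share of loss weight: flat={flat_share:.3f}, sqrt={sqrt_share:.3f}")

# A record that grew 60x from its starting weight is pulled back to 50x.
print(cap_record_weights(np.array([10.0, 6000.0]), np.array([100.0, 100.0])))
```

Under flat weighting the aggregate is just one of 201 equally weighted terms, so a large miss on it barely moves the loss; under square-root weighting it carries a much larger share. A production calibrator would more likely enforce the cap inside the optimizer than clip afterward.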

What to use going forward

For analysis, use the policyengine bundle (policyengine.py) rather than calling policyengine-us directly. The bundle's managed_microsimulation pins a specific certified, gate-checked dataset build per release and advances only when a new build passes our calibration gates — which is also where we're strengthening those gates after this incident. The default Microsimulation() in policyengine-us, by contrast, tracks whatever dataset is latest with no certification step; that's why direct callers stayed on the broken Enhanced CPS even after the managed surfaces moved to Populace on 2026-06-12. The bundle is the managed, reproducible entry point.

In fact, direct microsimulation in the country packages (policyengine-us, policyengine-uk) is being deprecated and consolidated in policyengine.py, which is where dataset certification and calibration gates live. The country packages remain the home of the tax-benefit rules and household-level calculations; for society-wide microsimulation, use policyengine.py. (This migration is also why the direct policyengine-us default was the last surface to be repointed — it is a legacy entry point we're moving away from.)

Inspect the calibration yourself

We publish per-build calibration diagnostics. You can see how the current certified dataset matches each administrative target here:

→ https://calibration-diagnostics.vercel.app/populace

We're investing further in this, and we'd value your input: what information about our data's calibration would most help you judge its fit for your analysis — which targets, distributions, or programs matter most for your use case? Let us know in the comments.
