Data-quality issue in the default US dataset (Enhanced CPS), 2026-05-19 to 2026-06-12 #429
MaxGhenis announced in Announcements
Summary
Between 2026-05-19 and 2026-06-12, the Enhanced CPS — the US microdata PolicyEngine shipped as the default — had inflated top-of-distribution aggregates due to a calibration change. Society-wide microsimulation outputs that depend on high incomes or top-concentrated deductions (total AGI, top earners, itemized/charitable deductions, and reforms scored against them) were overstated during that window, on the surfaces that used the affected dataset.
This was not a bug in the tax-benefit rules — those were unchanged. It was in how the dataset's survey weights were calibrated. It's now fixed on every surface, and we're sorry to anyone whose analysis it affected. We're using this incident to invest in more transparent calibration reporting. Specifics and links below.
What went wrong (specifically)
We calibrate the Enhanced CPS by reweighting survey records to match hundreds of administrative targets (IRS SOI, JCT, CBO, Census, NIPA/BEA). On 2026-05-19, a recalibration (policyengine-us-data 1.115.x) added new targets to better capture the income of nonfilers (people who don't file tax returns, whom the filer-based IRS SOI data misses), which improves low-income and poverty accuracy. These included NIPA employment targets (PolicyEngine/policyengine-us-data#1020) and BEA state wage targets (PolicyEngine/policyengine-us-data#1034).
The Enhanced CPS carries relatively little high-income mass. Under a single flat-weighted loss, pushing harder on the new nonfiler-income targets created a tension the optimizer resolved by over-weighting a handful of synthetic very-high-income records to still hit top-bracket AGI totals. The result (TY2024):
We bisected the onset to the day: sane through 2026-05-18, broke at the 2026-05-19 recalibration. The general failure mode — aggregate targets "drowned out" by flat loss weighting — is tracked in PolicyEngine/policyengine-us-data#1107.
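To see how a flat-weighted loss can let one aggregate target be drowned out, here is a minimal sketch with made-up numbers. It is not PolicyEngine's calibration code: the 500-target setup, the error sizes, the up-weighting factor, and the `weighted_loss` helper are all hypothetical, chosen only to illustrate the failure mode.

```python
def weighted_loss(estimates, targets, weights=None):
    """Weighted mean of squared relative errors across calibration targets."""
    if weights is None:
        weights = [1.0] * len(targets)  # flat weighting: every target counts equally
    total = sum(
        w * ((est - tgt) / tgt) ** 2
        for est, tgt, w in zip(estimates, targets, weights)
    )
    return total / sum(weights)


N = 500
targets = [100.0] * N  # hypothetical targets; index 0 stands in for one aggregate total

# Scenario A: the aggregate is 40% too high, every other target is matched exactly.
miss_aggregate = [140.0] + [100.0] * (N - 1)
# Scenario B: the aggregate is matched, every other target is 2% off.
miss_details = [100.0] + [102.0] * (N - 1)

flat_a = weighted_loss(miss_aggregate, targets)
flat_b = weighted_loss(miss_details, targets)

# Up-weight the aggregate target (the factor 50 is arbitrary).
weights = [50.0] + [1.0] * (N - 1)
weighted_a = weighted_loss(miss_aggregate, targets, weights)
weighted_b = weighted_loss(miss_details, targets, weights)

print(f"flat:     A={flat_a:.6f}  B={flat_b:.6f}")
print(f"weighted: A={weighted_a:.6f}  B={weighted_b:.6f}")
```

Under flat weighting, scenario A scores lower, so an optimizer would rather miss the one aggregate badly than be slightly off on every other target. Up-weighting the aggregate reverses that preference.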
Who was affected
This affects society-wide / microsimulation outputs — population aggregates, distributional results, and reform scores — which use the microdata. Single-household calculations are not affected: they apply the tax-benefit rules to a household you enter and don't touch the microdata at all.
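The distinction can be sketched with a toy model. Everything here is hypothetical (the `toy_tax` rule, the three records, and their weights) and is not PolicyEngine code; it only shows that a household calculation never reads survey weights, while a society-wide aggregate is a weighted sum over records.

```python
def toy_tax(income):
    """Hypothetical rule: 20% of income above a 10,000 allowance."""
    return 0.2 * max(income - 10_000, 0)


def household_result(income):
    # Single-household calculation: rules applied to user-entered inputs only.
    return toy_tax(income)


def society_wide_total(records):
    # Microsimulation aggregate: each record's result scaled by its survey weight.
    return sum(weight * toy_tax(income) for income, weight in records)


# (income, survey weight) pairs; the last record is a very-high-income one.
well_calibrated = [(30_000, 1_000), (80_000, 500), (5_000_000, 1)]
over_weighted = [(30_000, 1_000), (80_000, 500), (5_000_000, 10)]

print(household_result(80_000))
print(society_wide_total(well_calibrated))
print(society_wide_total(over_weighted))
```

Changing the weight on the high-income record moves the aggregate substantially but leaves the household result untouched, because the rules themselves did not change.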
Across surfaces, society-wide outputs were affected for these windows:
policyengine bundle (policyengine.py) and any managed_microsimulation user
policyengine-us, default Microsimulation() called directly
The web app and the bundle were remediated together when the certified dataset moved to Populace (bundle 4.16.1, 2026-06-12); direct policyengine-us was the last surface, fixed today. If you ran society-wide analyses in those windows, especially anything sensitive to high incomes or to itemized/charitable deductions, we recommend re-running on the current certified data.
What's fixed
The policyengine bundle moved to Populace at 4.16.1 (2026-06-12); the v2 simulation API, and the policyengine.org society-wide analysis it powers, followed via the bundle manifest.
policyengine-us now defaults to the certified Populace build as of 1.739.1 (policyengine-us#8692, "Point default US dataset to certified Populace build"). A fresh Microsimulation() returns TY2026 total AGI ≈ $17.0T (vs ≈ $24.9T on the broken default) and claimed itemized deductions ≈ $0.87T.
What to use going forward
For analysis, use the policyengine bundle (policyengine.py) rather than calling policyengine-us directly. The bundle's managed_microsimulation pins a specific certified, gate-checked dataset build per release and advances only when a new build passes our calibration gates, which is also where we're strengthening those gates after this incident. The default Microsimulation() in policyengine-us, by contrast, tracks whatever dataset is latest with no certification step; that's why direct callers stayed on the broken Enhanced CPS even after the managed surfaces moved to Populace on 2026-06-12. The bundle is the managed, reproducible entry point.
In fact, direct microsimulation in the country packages (policyengine-us, policyengine-uk) is being deprecated and consolidated in policyengine.py, which is where dataset certification and calibration gates live. The country packages remain the home of the tax-benefit rules and household-level calculations; for society-wide microsimulation, use policyengine.py. (This migration is also why the direct policyengine-us default was the last surface to be repointed: it is a legacy entry point we're moving away from.)
Inspect the calibration yourself
We publish per-build calibration diagnostics. You can see how the current certified dataset matches each administrative target here:
→ https://calibration-diagnostics.vercel.app/populace
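For anyone still calling policyengine-us directly, a rough local check is to compare the default's total AGI with the two TY2026 figures quoted earlier in this post (≈ $17.0T on the certified build, ≈ $24.9T on the broken default). The sketch below rests on several assumptions: the helper names are hypothetical, the nearest-figure rule is an arbitrary heuristic rather than an official calibration gate, and the `adjusted_gross_income` variable name and the weighted `.sum()` call are assumed to match your installed policyengine-us version, so verify them before relying on the result.

```python
CERTIFIED_AGI = 17.0e12  # approx. TY2026 total AGI on the certified Populace build
BROKEN_AGI = 24.9e12     # approx. TY2026 total AGI on the broken default


def closer_to_broken(total_agi):
    """Heuristic: is this total nearer the broken figure than the certified one?"""
    return abs(total_agi - BROKEN_AGI) < abs(total_agi - CERTIFIED_AGI)


def default_total_agi(year=2026):
    """Weighted total AGI from the default dataset (requires policyengine-us)."""
    # Imported lazily so the heuristic above works without the package installed.
    from policyengine_us import Microsimulation

    sim = Microsimulation()  # default dataset; may need to download microdata
    return float(sim.calculate("adjusted_gross_income", period=year).sum())


# Usage (needs policyengine-us installed and access to the default dataset):
#   total = default_total_agi()
#   print(f"Total AGI: ${total / 1e12:.1f}T, suspicious: {closer_to_broken(total)}")
```

A total near the higher figure suggests the environment is still resolving to the affected dataset, in which case upgrading and re-running is the safer path.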
We're investing further in this, and we'd value your input: what information about our data's calibration would most help you judge its fit for your analysis — which targets, distributions, or programs matter most for your use case? Let us know in the comments.