Plan: replace eCPS with Microplex record-count datasets #16

MaxGhenis · 2026-05-26T13:49:04Z

Maintainer

Microplex path to replace eCPS

This is the operational plan for replacing PolicyEngine's Enhanced CPS (enhanced_cps_2024.h5) with Microplex-produced datasets.

The plan separates record-count size classes that should share source semantics, target compilation, scoring, and artifact discipline but not the same runtime envelope. Names reflect only approximate record count, similar to how LLMs are named by parameter scale:

mp-300k: a national default dataset small enough for normal PolicyEngine microsimulation.
mp-3m: a larger local-area pipeline for state, congressional district, and other subnational analyses.
mp-30m: a future full candidate universe that can produce smaller sparse tiers.

The exact realized row counts still live in artifact metadata. The names are record-count classes, not promises that every build has exactly that row count. The intended long-run path is hierarchical: build the largest feasible candidate universe, then use L0 sparsity and post-fit pruning to derive smaller tiers. In that framing, mp-300k and mp-3m should eventually derive from mp-30m or the largest available parent, rather than being unrelated builds.

That is not true of the first mp-300k candidate. The current mp-300k path is a directly sampled small build, without L0 culling from a larger parent universe. That is acceptable for the first replacement test because it answers whether the small record class can beat eCPS at routine microsimulation scale. It should not become the permanent construction contract. Over time, mp-300k should improve by selecting or culling from a larger candidate universe, not only by changing the initial sample.

The replacement claim is not "Microplex copies eCPS." The claim is that Microplex becomes the canonical data construction path for PolicyEngine-US, while policyengine-us remains the measurement and microsimulation runtime.

Current decision

Microplex should replace eCPS in stages, with the national artifact loop proven before broader local work expands:

Prove the mp-300k loop end to end: produce one candidate H5, verify PolicyEngine can load it, run a runtime smoke benchmark, compare it against a pinned eCPS baseline, and write a CI artifact-gate report that the dashboard can index.
Ship mp-300k as a beta national H5 once it passes schema compatibility, remains inside the runtime envelope, and beats the pinned latest eCPS baseline on the broad target suite.
Make mp-300k the default national dataset only after the frozen microsimulation benchmark suite, runtime gate, and rollback path pass.
Continue mp-3m as a separate local-calibration workstream, with sparse analysis artifacts or geography shards, after the national loop is mechanically reliable.

This avoids blocking the national replacement on the harder local-scale engineering problem and prevents target-quality work from masking loader, runtime, or release-contract failures.

Immediate execution loop

The first implementation slice is intentionally narrow. It is not a full local pipeline and not a permanent modeling contract, but it should be a standing CI gate set rather than a one-time readiness checklist. It answers one release-critical question on every candidate artifact: can Microplex produce a national candidate that PolicyEngine can load, run, compare, and index reproducibly?

The loop is:

Candidate artifact: build or accept one mp-300k H5 plus a manifest with record counts, source versions, config, target DB hash, and artifact hash.
Compatibility gate: run an automated H5 contract check against the current policyengine-us loader before judging model quality.
Runtime gate: run a fixed national smoke benchmark and compare candidate runtime to pinned eCPS runtime.
eCPS comparison gate: compare candidate and pinned eCPS on the same kept broad target suite and write target deltas. This is a named gate so it can become nonblocking or be removed when eCPS is retired.
Benchmark manifest gate: require the frozen microsimulation benchmark manifest before any candidate is judged on microsimulation outcomes.
Artifact gate report: emit one machine-readable mp300k_artifact_gates.json report that CI can fail on and the dashboard can index without scraping logs.
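The candidate-artifact step depends on content hashes rather than labels. Below is a minimal sketch of such a manifest, assuming a flat JSON layout. The field names (`artifact_sha256`, `target_db_sha256`, and so on) and the helper functions are hypothetical, not an existing Microplex schema.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so a large H5 never loads fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact: Path, target_db: Path, record_counts: dict,
                   source_versions: dict, config: dict) -> dict:
    """Assemble a hypothetical candidate manifest with content hashes."""
    return {
        "artifact_path": str(artifact),
        "artifact_sha256": sha256_file(artifact),
        "target_db_path": str(target_db),
        "target_db_sha256": sha256_file(target_db),
        "record_counts": record_counts,
        "source_versions": source_versions,
        "config": config,
    }


def write_manifest(manifest: dict, out_path: Path) -> None:
    # sort_keys keeps the file byte-stable across runs with identical inputs.
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

The same `sha256_file` helper also covers the pinned eCPS baseline: hashing both H5 files and the target DB the same way keeps the two sides of a comparison auditable.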

A candidate that fails compatibility or runtime stops at that gate. eCPS comparison and benchmark results only matter for artifacts that can load, run, and be reproduced. Compatibility, runtime, artifact, and benchmark-manifest gates should continue after eCPS is deprecated.
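The stop-at-gate rule can be sketched as an ordered runner. This is a minimal illustration, assuming each gate is a callable returning `(passed, details)`. Only the report filename comes from this plan; the gate names, the `STOP_GATES` set, and the JSON layout are hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# A gate returns (passed, details). The callables are hypothetical placeholders.
Gate = Callable[[], Tuple[bool, dict]]

# Gates whose failure stops evaluation of every later gate.
STOP_GATES = {"compatibility", "runtime"}


def run_gates(gates: List[Tuple[str, Gate]]) -> Dict[str, dict]:
    """Run gates in order; after a stop gate fails, mark the rest unmeasured."""
    report: Dict[str, dict] = {}
    stopped_by: Optional[str] = None
    for name, gate in gates:
        if stopped_by is not None:
            report[name] = {"status": "unmeasured", "blocked_by": stopped_by}
            continue
        passed, details = gate()
        report[name] = {"status": "pass" if passed else "fail", "details": details}
        if not passed and name in STOP_GATES:
            stopped_by = name
    return report


def write_report(report: Dict[str, dict],
                 path: str = "mp300k_artifact_gates.json") -> bool:
    """Write the JSON report; return True only if every gate passed."""
    with open(path, "w") as handle:
        json.dump({"gates": report}, handle, indent=2, sort_keys=True)
    return all(entry["status"] == "pass" for entry in report.values())
```

A CI step could then call `sys.exit(0 if write_report(run_gates(gates)) else 1)`. Marking skipped gates as `unmeasured` rather than omitting them gives the dashboard the pass/fail/unmeasured states it needs. Because the eCPS comparison is a separate named entry and not a stop gate, it can be dropped from the list later without touching the others.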

Size classes

| Product | Intended use | Size target | Main blocker | Replacement bar |
|---|---|---|---|---|
| `mp-300k` | Default national PolicyEngine microsims and public analyses | about 300k records | model quality and benchmark confidence | beats latest eCPS on broad loss and microsim benchmarks while remaining fast enough for routine use |
| `mp-3m` | State, CD, and local-area analyses | about 3m records | local runtime, disk, and sparse output design | beats PE local L0 packages on local targets and can run practical local microsims |
| `mp-30m` | Future parent universe for sparse/distilled tiers | about 30m records | cloud-scale construction and storage | improves the derived `mp-3m` and `mp-300k` frontier after sparsification |

User-facing routing:

National default microsimulation: mp-300k.
National results broken out by state or demographic group: mp-300k, unless the analysis explicitly asks for local-calibrated weights.
State-calibrated analyses: mp-3m-fast state shard when available; otherwise fall back to mp-300k only with an explicit "not state-calibrated" label.
Congressional district and smaller geographies: mp-3m-fast geography shard when available. Do not silently use mp-300k for official local claims.
Research sweeps and target debugging: mp-3m-rich.
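The routing rules above can be read as a small decision function. This is a sketch only. The function name, the geography labels, and the choice to raise an error for unshardable local requests are assumptions layered on the stated rules.

```python
from typing import Optional, Tuple


def route_dataset(geography: str, shard_available: bool,
                  research: bool = False,
                  wants_local_weights: bool = False) -> Tuple[str, Optional[str]]:
    """Return (dataset, warning_label) for a requested analysis.

    geography is one of "national", "state", "cd", or "local" (hypothetical labels).
    """
    if research:
        return "mp-3m-rich", None
    if geography == "national" and not wants_local_weights:
        # National totals and state/demographic breakouts use national weights.
        return "mp-300k", None
    if shard_available:
        return "mp-3m-fast", None
    if geography in ("national", "state"):
        # Allowed fallback, but the result must carry an explicit label.
        return "mp-300k", "not state-calibrated"
    # CD and smaller: refuse rather than silently use national weights.
    raise LookupError(f"no mp-3m-fast shard for {geography!r} analysis")
```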

Shared architecture

flowchart TD
  A["Arch target registry"] --> B["Microplex source providers"]
  B --> C["Source semantics and donor specs"]
  C --> D["Donor integration and imputation"]
  D --> E["Synthetic population spine"]
  E --> F["Fixed spine append: Forbes and other must-keep records"]
  E --> G["Relational add-ons: capital gains lots, assets, diagnostics"]
  F --> H["Residualized GD/L0 calibration"]
  G --> H
  H --> I["PE-ingestable H5 export"]
  I --> J["PolicyEngine target scoring"]
  I --> K["Microsimulation benchmark suite"]
  J --> L["Run dashboard and release gate"]
  K --> L

Important design rule: fixed records, such as Forbes top-tail records, should be added after ordinary population synthesis and excluded from donor fitting. Their weighted target contributions should be subtracted from calibration targets so nonfixed records are calibrated to the residual population.
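The residualization step amounts to subtracting the weighted contributions of the fixed records from each target before the nonfixed records are calibrated. Below is a minimal sketch, assuming each fixed record is a dict with a frozen `weight` and values keyed by target name. That layout is hypothetical.

```python
from typing import Dict, List


def residualize_targets(targets: Dict[str, float],
                        fixed_records: List[Dict[str, float]]) -> Dict[str, float]:
    """Subtract fixed-spine contributions from each calibration target.

    Each fixed record carries a frozen 'weight' plus values keyed by target name;
    a missing key means the record contributes nothing to that target.
    """
    residual = dict(targets)
    for record in fixed_records:
        weight = record["weight"]
        for name in targets:
            residual[name] -= weight * record.get(name, 0.0)
    # A residual that flips sign relative to its target usually means the fixed
    # records already overshoot it; callers should flag that, not calibrate to it.
    return residual
```

For example, with a long-term capital gains target of 1000 and fixed records contributing 300 and 2 × 50, the nonfixed records are calibrated to 600.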

`mp-300k` build path

mp-300k should use the best currently known Microplex construction path:

CPS ASEC as the core demographic and program scaffold.
ACS100k to improve household, geography, housing, and local demographic support without making the national H5 too slow.
PUF donor integration for filer/tax variables, including top-end wages, capital gains, business income, retirement income, itemized deductions, and filing concepts.
SIPP, SCF, SSA-style, and other Arch-backed sources where they improve disability, SSI, assets, and program variables.
Forbes fixed-spine append for ultra-high-wealth units, with target residualization rather than asking the regular population to absorb those aggregates.
Capital-gains lots as a relational extension, not as a reason to flatten every possible record-level detail into the primary H5.
Gradient-based calibration as the standard Microplex weight path. If l0_lambda=0 remains best for national quality, the implementation should disable hard-concrete gates rather than carrying a fake sparsity mechanism.
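To make the `l0_lambda=0` rule concrete, here is a sketch of the weight parameterization only, with no optimizer. It uses the hard-concrete gate from Louizos et al. (2018) with that paper's stretch constants. Whether Microplex uses the same constants is an assumption, and all function names are hypothetical.

```python
import math
from typing import List, Optional

# Stretch limits and temperature from the hard-concrete distribution (Louizos et al., 2018).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def deterministic_gate(log_alpha: float) -> float:
    """Test-time hard-concrete gate: stretched sigmoid clipped to [0, 1]."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


def effective_weights(log_weights: List[float], l0_lambda: float,
                      log_alphas: Optional[List[float]] = None) -> List[float]:
    """Return calibrated weights, skipping gates entirely when l0_lambda == 0."""
    base = [math.exp(lw) for lw in log_weights]
    if l0_lambda == 0:
        # No sparsity pressure: plain positive weights and no gate parameters at all.
        return base
    if log_alphas is None:
        raise ValueError("l0_lambda > 0 requires gate parameters")
    return [w * deterministic_gate(la) for w, la in zip(base, log_alphas)]


def l0_penalty(log_alphas: List[float], l0_lambda: float) -> float:
    """Expected count of open gates scaled by l0_lambda; zero when gates are disabled."""
    if l0_lambda == 0:
        return 0.0
    shift = BETA * math.log(-GAMMA / ZETA)
    return l0_lambda * sum(sigmoid(la - shift) for la in log_alphas)
```

With `l0_lambda=0`, no gate parameters exist, so nothing can silently clip weights to zero. That is the "no fake sparsity mechanism" property the build path asks for.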

`mp-300k` release gates

mp-300k can become the default national dataset when all of these are true:

Pinned baseline: every comparison records the eCPS H5 path, eCPS SHA256, policyengine-us-data commit, policyengine-us version, target DB path, and target DB SHA256. "Latest eCPS" means that pinned baseline, not a moving label.
Schema compatibility: passes an automated H5 contract check against the current policyengine-us loader, including entity tables, IDs, joins, weights, periods, dtypes, missing-value conventions, and absence of source-dataset diagnostic variables.
Target loss: broad target loss is lower than the pinned eCPS baseline on common kept targets.
Protected target families: no protected family is worse than eCPS by more than the larger of 5% relative loss or 0.005 absolute loss. Protected families are SSI, SNAP, wages, self-employment income, capital gains, interest, dividends, retirement income, disability, and household net income.
High-salience aggregates: absolute percentage error is at least as good as eCPS, or the dashboard marks and explains the regression, for SSI recipients, SSI value, SNAP recipients, SNAP value, wage income, long-term capital gains, taxable interest, ordinary dividends, self-employment income, and household net income.
Microsimulation benchmarks: passes the fixed benchmark suite covering SSI asset limits, CTC/EITC-style tax reforms, capital-gains indexing, and the frozen external Tara Watson SSI asset-limit benchmark. Pass means no unexplained fiscal or household-net-income delta larger than 5% of the eCPS estimate or $5 billion, whichever is larger.
Runtime: median runtime over the fixed national benchmark suite is no more than 1.25x eCPS. A candidate can enter beta up to 2.0x eCPS, but cannot become default above 1.25x without an explicit product decision.
Artifact size: national H5 is no more than 2x the eCPS H5 size unless the extra size is from a separately loadable relational extension.
Artifacts: writes a complete immutable bundle containing config, source versions, target DB hash, score files, target deltas, record counts, nonzero weights, effective sample size, and benchmark outputs.
Release contract: has a stable H5 publication path, rollback path, and a documented policyengine-us-data integration point.
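The numeric thresholds in these gates are easy to misread, so here they are as predicates. This is a sketch under stated assumptions: loss values and dollar amounts are plain floats, runtime compares the candidate median with the eCPS median, and an "explained" delta is represented by a boolean flag. The function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def protected_family_ok(candidate_loss: float, ecps_loss: float) -> bool:
    """No worse than eCPS by more than max(5% relative, 0.005 absolute)."""
    tolerance = max(0.05 * ecps_loss, 0.005)
    return candidate_loss - ecps_loss <= tolerance


def benchmark_delta_ok(candidate: float, ecps: float, explained: bool = False) -> bool:
    """Fiscal or net-income delta within max(5% of eCPS, $5B), unless explained."""
    tolerance = max(0.05 * abs(ecps), 5e9)
    return explained or abs(candidate - ecps) <= tolerance


def runtime_tier(candidate_seconds: Sequence[float],
                 ecps_seconds: Sequence[float]) -> str:
    """'default' at <= 1.25x eCPS median runtime, 'beta' at <= 2.0x, else 'fail'."""
    ratio = median(candidate_seconds) / median(ecps_seconds)
    if ratio <= 1.25:
        return "default"
    if ratio <= 2.0:
        return "beta"
    return "fail"


def artifact_size_ok(candidate_bytes: int, ecps_bytes: int) -> bool:
    """Main H5 at most 2x eCPS; separately loadable extensions are excluded."""
    return candidate_bytes <= 2 * ecps_bytes
```

The absolute floors matter at the small end. For a family with eCPS loss 0.02, 5% relative would allow only 0.001, but the 0.005 floor applies. For a $20 billion program, the $5 billion floor dominates the $1 billion relative bound.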

`mp-3m` build path

mp-3m should not be forced into the same artifact shape as mp-300k. Local accuracy needs more ACS/local target support, but routine local microsims need sparse or sharded outputs.

Recommended path:

Build from the same source and semantic registry as mp-300k.
Expand ACS and local targets incrementally, with the expectation that larger parent universes can feed smaller sparse tiers.
Calibrate against local target suites from Arch: state, congressional district, age/race/household-type, income, benefits, disability, and program participation where defensible.
Produce two outputs:
- mp-3m-rich: best-fit, larger research artifact.
- mp-3m-fast: sparse or sharded analysis artifact for routine PolicyEngine use.
Prefer geography shards, L0 sparsity, or post-fit sparse output over requiring every run to materialize the entire parent record universe.

`mp-3m` release gates

mp-3m can replace the PE local L0 pipeline when all of these are true:

Pinned local baselines: every comparison records the PE local small-L0 and big-L0 artifact paths or package IDs, their source commits, weight files, target DB path and SHA256, objective definition, and the exact target subset used by the incumbent objective.
Beats PE local small-L0 and big-L0 packages on their pinned actual objective.
Beats those packages on Microplex's broader Arch target suite.
Has explicit held-out target evaluation so local overfitting is visible.
Produces a fast analysis artifact whose median single-geography benchmark runtime is no more than 2x the incumbent PE local artifact.
Can rebuild mp-3m-fast on the standard cloud runner in less than 12 hours without manual cleanup, or has sharded build jobs whose slowest shard is below that bound.
Has recoverable, profiled build stages for donor integration, PE table materialization, scoring, and export.

Held-out target evaluation should hold out complete target groups, not random rows inside a target. The default split should include at least one geography family and one income/program family, so epoch tuning cannot overfit the headline national aggregates while other target families degrade unseen.
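The group-level split can be sketched as follows. It assumes targets are tagged with a group id and each group with a family. The family names and the one-group-per-required-family rule are illustrative choices, not a fixed Microplex policy.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple


def holdout_groups(group_family: Dict[str, str],
                   required_families: Iterable[str],
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hold out one whole target group from each required family."""
    rng = random.Random(seed)
    held: Set[str] = set()
    for family in required_families:
        candidates = sorted(g for g, f in group_family.items() if f == family)
        if not candidates:
            raise ValueError(f"no target group in required family {family!r}")
        held.add(rng.choice(candidates))
    train = sorted(g for g in group_family if g not in held)
    return train, sorted(held)


def split_targets(target_group: Dict[str, str],
                  held: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Assign every target by its group, so no held-out group leaks into training."""
    held_set = set(held)
    train = sorted(t for t, g in target_group.items() if g not in held_set)
    heldout = sorted(t for t, g in target_group.items() if g in held_set)
    return train, heldout
```

Splitting at the group level is what makes the held-out curve informative. If individual rows of a target were held out, the remaining rows of that same target would still pull the weights toward it during training.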

Dashboard contract

The living dashboard should be the source of truth for replacement gate status. It should show, at minimum:

latest mp-300k candidate versus latest eCPS
latest mp-3m-rich and mp-3m-fast candidates versus PE small-L0 and big-L0
broad loss, local loss, PE-actual objective loss, and microsim benchmark deltas
record counts, positive weights, effective sample size, weight concentration, H5 size, and median microsim runtime
top target wins and losses by source family
whether each release gate is passing, failing, or unmeasured
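Two of the weight diagnostics listed above have simple definitions. The sketch below uses Kish's effective sample size, (Σw)² / Σw², and the share of total weight held by the heaviest records as a concentration measure. Whether the dashboard uses exactly these definitions is an assumption, and the 1% default cutoff is hypothetical.

```python
from typing import Sequence


def kish_effective_sample_size(weights: Sequence[float]) -> float:
    """Kish ESS: (sum w)^2 / sum w^2; equals n when all weights are equal."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    return total * total / squares if squares > 0 else 0.0


def top_share(weights: Sequence[float], fraction: float = 0.01) -> float:
    """Share of total weight held by the heaviest `fraction` of positive-weight records."""
    positive = sorted((w for w in weights if w > 0), reverse=True)
    if not positive:
        return 0.0
    k = max(1, int(len(positive) * fraction))
    return sum(positive[:k]) / sum(positive)
```

A candidate with 300k records but a small Kish ESS behaves statistically like a much smaller sample. That is why the dashboard should show ESS next to record counts and positive weights, not only the raw counts.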

Every serious run should write a machine-readable loss record that the dashboard can index without scraping logs.

Dashboard/indexing and CI gate publication are release-blocking workstreams. They need:

a stable loss-result JSON schema
a run indexer that can discover completed local artifacts
dashboard cells for each release gate
a published "current candidate" artifact path
CI or scheduled refresh for static score files, where practical
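A stable loss-result schema and a run indexer could look like this minimal sketch. The file name `loss_result.json`, the one-directory-per-run layout, the field set, and `SCHEMA_VERSION` are all hypothetical. The point is that the indexer discovers runs from files, never from logs, and skips records it cannot interpret.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List

SCHEMA_VERSION = 1  # hypothetical; bump on any breaking field change


@dataclass
class LossResult:
    run_id: str
    product: str  # e.g. "mp-300k", "mp-3m-fast"
    artifact_sha256: str
    broad_loss: float
    gate_status: Dict[str, str] = field(default_factory=dict)  # gate -> pass/fail/unmeasured
    schema_version: int = SCHEMA_VERSION


def write_loss_result(result: LossResult, run_dir: Path) -> Path:
    path = run_dir / "loss_result.json"
    path.write_text(json.dumps(asdict(result), sort_keys=True))
    return path


def index_runs(root: Path) -> List[LossResult]:
    """Discover completed runs by their loss_result.json, skipping other schema versions."""
    results = []
    for path in sorted(root.glob("*/loss_result.json")):
        payload = json.loads(path.read_text())
        if payload.get("schema_version") != SCHEMA_VERSION:
            continue
        results.append(LossResult(**payload))
    return results
```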

Cross-repo dependency graph

flowchart LR
  A["arch-data: target facts and semantic scope"] --> B["microplex-us: build, score, and H5 export"]
  B --> C["microplex-evals: microsim benchmark reports"]
  B --> D["policyengine-us-data: artifact publication and loader integration"]
  C --> D
  D --> E["policyengine-us: default dataset switch and fallback behavior"]

Required handoffs:

arch-data to microplex-us: source facts, target scopes, exclusions, and coverage reports are importable and pinned by content hash.
microplex-us to microplex-evals: candidate H5, manifest, score files, and source/target provenance are sufficient to run benchmarks without rebuilding.
microplex-us to policyengine-us-data: exported H5 and metadata satisfy the dataset publication contract.
policyengine-us-data to policyengine-us: loader names, default dataset selection, fallback behavior, and release notes are stable.
policyengine-us default switch: happens only after beta artifacts and benchmark reports are available for rollback comparison.

Epics and issue-sized tasks

Epic 1: National replacement candidate

Build and score the current best mp-300k path.

Child issues:

microplex-us: produce small ASEC + ACS100k build with PUF, SIPP/SCF/Arch additions, Forbes fixed spine, and capital-gains lots enabled.
microplex-us: score candidate against pinned latest eCPS and write top target delta report.
microplex-evals: run microsimulation benchmark suite against candidate and eCPS.

Exit:

decision on whether this candidate is release-track or needs another modeling iteration

Epic 2: Target registry hardening

Make Arch the canonical source of target semantics used by both national and local Microplex builds.

Child issues:

arch-data: add source/concept exclusions for misleading broad concepts.
arch-data: add explicit target scope labels: filer, full population, recipient, household, tax unit, SPM unit, state, CD, local.
microplex-us: consume importable target coverage reports by product.
arch-data and microplex-us: add tests that prevent known semantic regressions, including proprietors income and SSI recipient/value confusion.

Exit:

no material target in the release suite lacks source, scope, and entity provenance
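The exit criterion and the regression tests can share one validator. This sketch checks that every target carries source, scope, and entity, that the scope is one of the labels listed above, and that recipient-count and dollar-value targets keep their units. The `Target` record, the `_recipients`/`_value` naming convention, and the unit strings are hypothetical stand-ins for whatever Arch actually exports.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_SCOPES = {"filer", "full_population", "recipient", "household",
                  "tax_unit", "spm_unit", "state", "cd", "local"}


@dataclass(frozen=True)
class Target:
    name: str
    source: Optional[str]
    scope: Optional[str]
    entity: Optional[str]
    unit: str  # "count" or "usd" in this sketch


def provenance_errors(targets: List[Target]) -> List[str]:
    """Return human-readable problems; an empty list means the suite passes."""
    errors = []
    for t in targets:
        for attr in ("source", "scope", "entity"):
            if not getattr(t, attr):
                errors.append(f"{t.name}: missing {attr}")
        if t.scope and t.scope not in ALLOWED_SCOPES:
            errors.append(f"{t.name}: unknown scope {t.scope!r}")
        # Guard against recipient/value confusion, e.g. SSI counts scored as dollars.
        if t.name.endswith("_recipients") and t.unit != "count":
            errors.append(f"{t.name}: recipient target must be a count, got unit {t.unit!r}")
        if t.name.endswith("_value") and t.unit != "usd":
            errors.append(f"{t.name}: value target must be in usd, got unit {t.unit!r}")
    return errors
```

A test then reduces to `assert provenance_errors(release_targets) == []`, which fails with a readable list of offending targets.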

Epic 3: Calibration simplification

Make the winning gradient-based weight path the standard Microplex path.

Child issues:

microplex-us: disable hard-concrete gates automatically when l0_lambda=0.
microplex-us: preserve L0 gates only for mp-3m-fast, parent-to-child derivation, or explicit experiments.
microplex-us: write loss curves and held-out target curves for every run.
microplex-us: define epoch stopping rules from held-out target performance, not only training loss.

Exit:

one standard mp-300k calibration command and one standard mp-3m-fast calibration command
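One common way to derive stopping from held-out performance is patience-based early stopping. The plan does not specify a rule, so this is a sketch of that standard technique. The `patience` and `min_delta` defaults are hypothetical.

```python
from typing import Iterable, Tuple


def select_stopping_epoch(heldout_losses: Iterable[float], patience: int = 5,
                          min_delta: float = 0.0) -> Tuple[int, float]:
    """Return (best_epoch, best_loss), stopping once held-out loss has failed to
    improve by more than min_delta for `patience` consecutive epochs."""
    best_epoch, best_loss = -1, float("inf")
    stale = 0
    for epoch, loss in enumerate(heldout_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, stale = epoch, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    if best_epoch < 0:
        raise ValueError("no held-out losses recorded")
    return best_epoch, best_loss
```

Because the function consumes its input lazily, the caller can pass a generator that trains one epoch and then yields that epoch's held-out loss. Breaking out of the loop then also halts training. Checkpointing weights at each new best epoch lets the run export the held-out optimum rather than the last epoch.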

Epic 4: Microsimulation benchmark suite

Codify policy outcomes that must be compared before replacement.

Child issues:

microplex-evals: add national benchmark suite covering SSI asset limits, CTC/EITC, capital-gains indexing, and the Tara Watson SSI asset-limit benchmark.
microplex-evals: freeze a benchmark manifest before judging any release candidate, including reform definitions, periods, expected output fields, and pinned baseline artifacts.
microplex-evals: report aggregate fiscal impact, household net income, winners/losers, poverty/SPM where applicable, and component deltas.
microplex-evals: enforce PolicyEngine MicroSeries operations throughout; no manual weight math.

Exit:

every candidate has a comparable benchmark report against eCPS
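Freezing the benchmark manifest can be enforced mechanically by hashing a canonical encoding and refusing to judge candidates when the digest drifts without a version bump. In this sketch, the `version` field, the version-to-digest registry, and the reform names in the example are hypothetical.

```python
import hashlib
import json
from typing import Dict


def manifest_digest(manifest: dict) -> str:
    """Hash a canonical JSON encoding so key order and whitespace do not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_frozen(manifest: dict, frozen: Dict[int, str]) -> str:
    """`frozen` maps manifest version -> recorded digest; return the digest if valid."""
    version = manifest.get("version")
    if version not in frozen:
        raise KeyError(f"manifest version {version!r} was never frozen")
    digest = manifest_digest(manifest)
    if digest != frozen[version]:
        raise ValueError("benchmark manifest changed without a version bump")
    return digest
```

Each benchmark report can record the returned digest. Two reports are then comparable exactly when their digests match.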

Epic 5: Dashboard and release gates

Make replacement claims visible from durable CI gate reports and a dashboard rather than ad hoc log inspection.

Child issues:

microplex-us: define stable artifact-gate and loss-result JSON schemas.
microplex-us: add run indexer support for national, local-rich, and local-fast candidate artifacts.
microplex-us: add dashboard gate cells for target loss, protected families, microsim benchmarks, compatibility, runtime, and artifact size.
microplex-us: publish the current candidate artifact path and score bundle.

Exit:

replacement gate status is visible from CI and one dashboard artifact

Epic 6: Compatibility and publication contract

Make the H5 and metadata contract explicit before any default switch.

Child issues:

microplex-us: add automated H5 compatibility check against policyengine-us.
policyengine-us-data: add loader/publication path for Microplex national beta artifact.
policyengine-us: define default dataset switch, feature flag, and rollback behavior.
policyengine-us-data: document the eCPS incumbent baseline and Microplex replacement status.

Exit:

PolicyEngine can load mp-300k through the normal dataset interface with a documented rollback path

Epic 7: Local pipeline scalability

Make local builds practical and diagnosable.

Child issues:

microplex-us: add profiled stage timings and RSS for donor integration, PE table construction, calibration, scoring, and export.
microplex-us: implement chunked or vectorized PE table construction where needed.
microplex-us: choose sparse output strategy: stronger L0, parent-to-child derivation, post-fit pruning, geography shards, or a combination.
microplex-us: add disk guardrails and resumable checkpoints.

Exit:

mp-3m-fast can run routine local microsims without a multi-day laptop pipeline

Initial milestones

| Milestone | Owner repo | Blocking work | Success criterion |
|---|---|---|---|
| M0: National artifact loop | `microplex-us` | candidate/baseline inputs, H5 compatibility check, runtime smoke, eCPS comparison gate, artifact-gate JSON | one CI command can validate, compare, and index one `mp-300k` candidate against pinned eCPS, while keeping non-eCPS gates reusable after eCPS is retired |
| M1: `mp-300k` candidate scored | `microplex-us` | finish current small build and run it through the standing M0 artifact gates | CI and dashboard show `mp-300k` versus eCPS with target deltas and compatibility/runtime gate status |
| M2: Benchmark manifest frozen | `microplex-evals`, `microplex-us` | microsim benchmark definitions, periods, outputs, baseline artifacts, and pass thresholds | every later benchmark report uses the same manifest unless an explicit version bump is recorded |
| M3: National benchmark report | `microplex-us`, `microplex-evals` | run the frozen microsim benchmark suite | report explains fiscal and distributional differences for a candidate that already passed compatibility and runtime gates |
| M4: Target registry release gate | `arch-data`, `microplex-us` | target scope/provenance hardening | no known source-scope mismatches in release targets |
| M5: Calibration standard path | `microplex-us` | hard-concrete auto-disable, loss curves, held-out targets | documented `mp-300k` and `mp-3m-fast` calibration commands |
| M6: Compatibility and beta publication | `microplex-us`, `policyengine-us-data` | publication path and rollback contract | `mp-300k` loads through normal dataset interfaces with a documented rollback path |
| M7: `mp-3m-fast` artifact | `microplex-us` | sparse/sharded output and profiling | local microsim runtime is practical |
| M8: Default switch | `policyengine-us-data`, `policyengine-us` | benchmark pass and rollback path | `mp-300k` can replace eCPS in a controlled release |

What not to do

Do not wait for the local pipeline to be perfect before shipping the national replacement candidate.
Do not spend local pipeline effort before the national loop can load, run, score, and index one candidate reproducibly.
Do not treat PE eCPS calibration choices as truth just because they are the incumbent.
Do not judge target loss before compatibility and runtime gates have passed.
Do not add source-dataset diagnostic variables to the PolicyEngine model just to make Microplex easier to debug.
Do not let one-off artifacts become release claims without CI gate reports, dashboard-indexed configs, and scores.
Do not flatten relational data, such as capital gains lots, into the main H5 unless PolicyEngine needs it at that entity level.

Open questions

Should the default national runtime gate remain 1.25x eCPS, or should it be stricter before the public default switch?
For mp-3m, should the primary public artifact be state shards, CD shards, or both?
Which local target groups should be held out by default to tune epochs and prevent local overfit?
Which high-wealth records beyond Forbes should be fixed-spine rather than donor-imputed?

MaxGhenis · 2026-05-26T13:49:39Z

Maintainer Author

Tracking issues filed from this plan:

Build and score the first mp-300k replacement candidate #11 - Build and score the first mp-national replacement candidate
Harden Arch target semantics for Microplex eCPS replacement arch-data#4 - Harden Arch target semantics for Microplex eCPS replacement
Standardize GD calibration for mp-300k and mp-3m-fast #12 - Standardize GD calibration and held-out target curves
Create eCPS replacement microsimulation benchmark suite for mp-300k microplex-evals#1 - Create eCPS replacement microsimulation benchmark suite
Add replacement-readiness dashboard gates for Microplex record-count tiers #13 - Add replacement-readiness dashboard gates
Add H5 compatibility checks for mp-300k exports #14 - Add H5 compatibility checks for Microplex PolicyEngine exports
Publish and load Microplex mp-300k beta dataset policyengine-us-data#1142 - Publish and load Microplex national beta dataset
Define Microplex mp-300k default switch and rollback behavior policyengine-us#8506 - Define Microplex dataset default switch and rollback behavior
Make mp-3m-fast sparse, sharded, and profiled #15 - Make mp-local fast artifact sparse, sharded, and profiled
