Plan: replace eCPS with Microplex national and local datasets

# Microplex path to replace eCPS

This is the operational plan for replacing PolicyEngine's Enhanced CPS
(`enhanced_cps_2024.h5`) with Microplex-produced datasets.

It separates two products that should share source semantics, target
compilation, scoring, and artifact discipline but should not share the same
runtime envelope:

- `mp-national`: a national default dataset small enough for normal
  PolicyEngine microsimulation.
- `mp-local`: a larger local-area pipeline for state, congressional district,
  and other subnational analyses.

The replacement claim is not "Microplex copies eCPS." The claim is that
Microplex becomes the canonical data construction path for PolicyEngine-US,
while `policyengine-us` remains the measurement and microsimulation runtime.

## Current decision

Microplex should replace eCPS in stages:

1. Ship `mp-national` as a beta national H5 once it passes schema compatibility
   and beats a pinned latest eCPS baseline on the broad target suite.
2. Make `mp-national` the default national dataset only after the microsimulation
   benchmark suite, runtime gate, and rollback path pass.
3. Continue `mp-local` as a heavier local-calibration pipeline, with a sparse
   analysis artifact or geography shards, before replacing local eCPS/L0
   workflows.

This avoids blocking the national replacement on the harder local-scale
engineering problem.

## Product split

| Product | Intended use | Size target | Main blocker | Replacement bar |
| --- | --- | --- | --- | --- |
| `mp-national` | Default national PolicyEngine microsims and public analyses | roughly current small ASEC + ACS100k scale | model quality and benchmark confidence | beats latest eCPS on broad loss and microsim benchmarks while remaining fast enough for routine use |
| `mp-local` | State, CD, and local-area analyses | larger ACS/local target coverage; may be sharded | local runtime, disk, and sparse output design | beats PE local L0 packages on local targets and can run practical local microsims |

User-facing routing:

- National default microsimulation: `mp-national`.
- National results broken out by state or demographic group: `mp-national`,
  unless the analysis explicitly asks for local-calibrated weights.
- State-calibrated analyses: `mp-local-fast` (the sparse or sharded `mp-local`
  output) state shard when available; otherwise fall back to `mp-national`
  only with an explicit "not state-calibrated" label.
- Congressional district and smaller geographies: `mp-local-fast` geography
  shard when available. Do not silently use `mp-national` for official local
  claims.
- Research sweeps and target debugging: `mp-local-rich`.
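The routing rules above can be sketched as a small lookup function. This is an illustrative sketch only: the function name, the `analysis` kinds, and the `available_shards` set are hypothetical and do not describe an existing PolicyEngine or Microplex API.

```python
from typing import Optional, Set, Tuple

NOT_STATE_CALIBRATED = "not state-calibrated"


def route_dataset(
    analysis: str, geography: Optional[str], available_shards: Set[str]
) -> Tuple[str, Optional[str]]:
    """Return (dataset name, required label or None) for an analysis request."""
    if analysis == "national":
        # Covers national totals and national results broken out by group.
        return "mp-national", None
    if analysis == "research":
        return "mp-local-rich", None
    if analysis == "state":
        if geography in available_shards:
            return "mp-local-fast", None
        # Fallback is allowed for states, but only with an explicit label.
        return "mp-national", NOT_STATE_CALIBRATED
    if analysis == "district":
        if geography in available_shards:
            return "mp-local-fast", None
        # Never silently fall back to mp-national for official local claims.
        raise LookupError(f"no mp-local-fast shard for {geography!r}")
    raise ValueError(f"unknown analysis kind: {analysis!r}")
```

Raising on a missing district shard, rather than returning a labeled fallback, is one way to make the "do not silently use `mp-national`" rule hard to bypass.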

## Shared architecture

```mermaid
flowchart TD
  A["Arch target registry"] --> B["Microplex source providers"]
  B --> C["Source semantics and donor specs"]
  C --> D["Donor integration and imputation"]
  D --> E["Synthetic population spine"]
  E --> F["Fixed spine append: Forbes and other must-keep records"]
  E --> G["Relational add-ons: capital gains lots, assets, diagnostics"]
  F --> H["Residualized GD/L0 calibration"]
  G --> H
  H --> I["PE-ingestable H5 export"]
  I --> J["PolicyEngine target scoring"]
  I --> K["Microsimulation benchmark suite"]
  J --> L["Run dashboard and release gate"]
  K --> L
```

Important design rule: fixed records, such as Forbes top-tail records, should be
added after ordinary population synthesis and excluded from donor fitting. Their
weighted target contributions should be subtracted from calibration targets so
nonfixed records are calibrated to the residual population.
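A minimal sketch of that residualization step, assuming targets are simple weighted sums and each fixed record carries its own weight and per-target contribution. The dictionary layout and the names `residual_targets`, `net_worth`, and `households` are hypothetical, not the Microplex data model.

```python
from typing import Dict, List


def residual_targets(
    targets: Dict[str, float], fixed_records: List[Dict[str, float]]
) -> Dict[str, float]:
    """Subtract the weighted contribution of fixed records from each target."""
    residual = dict(targets)
    for record in fixed_records:
        weight = record["weight"]
        for name in residual:
            # A record that does not contribute to a target counts as zero.
            residual[name] -= weight * record.get(name, 0.0)
    return residual


# Two fixed top-tail records, each representing exactly one household.
targets = {"net_worth": 1000.0, "households": 50.0}
fixed = [
    {"weight": 1.0, "net_worth": 100.0, "households": 1.0},
    {"weight": 1.0, "net_worth": 200.0, "households": 1.0},
]
residual = residual_targets(targets, fixed)
# residual == {"net_worth": 700.0, "households": 48.0}
```

The nonfixed records are then calibrated against `residual`, so the fixed records' weights never need to move.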

## `mp-national` build path

`mp-national` should use the best currently known Microplex construction path:

- CPS ASEC as the core demographic and program scaffold.
- ACS100k to improve household, geography, housing, and local demographic
  support without making the national H5 too slow.
- PUF donor integration for filer/tax variables, including top-end wages,
  capital gains, business income, retirement income, itemized deductions, and
  filing concepts.
- SIPP, SCF, SSA-style, and other Arch-backed sources where they improve
  disability, SSI, assets, and program variables.
- Forbes fixed-spine append for ultra-high-wealth units, with target
  residualization rather than asking the regular population to absorb those
  aggregates.
- Capital-gains lots as a relational extension, not as a reason to flatten
  every possible record-level detail into the primary H5.
- Gradient-based calibration as the standard Microplex weight path. If
  `l0_lambda=0` remains best for national quality, the implementation should
  disable hard-concrete gates rather than carrying a fake sparsity mechanism.

### National release gates

`mp-national` can become the default national dataset when all of these are
true:

1. Pinned baseline: every comparison records the eCPS H5 path, eCPS SHA256,
   `policyengine-us-data` commit, `policyengine-us` version, target DB path,
   and target DB SHA256. "Latest eCPS" means that pinned baseline, not a moving
   label.
2. Schema compatibility: passes an automated H5 contract check against the
   current `policyengine-us` loader, including entity tables, IDs, joins,
   weights, periods, dtypes, missing-value conventions, and absence of
   source-dataset diagnostic variables.
3. Target loss: broad target loss is lower than the pinned eCPS baseline on
   common kept targets.
4. Protected target families: no protected family's loss exceeds the eCPS loss
   by more than 5% of that loss or 0.005 in absolute terms, whichever is larger.
   Protected families are SSI, SNAP, wages, self-employment income, capital
   gains, interest, dividends, retirement income, disability, and household net
   income.
5. High-salience aggregates: absolute percentage error is no higher than
   eCPS's, or the dashboard marks and explains the regression, for SSI recipients,
   SSI value, SNAP recipients, SNAP value, wage income, long-term capital gains,
   taxable interest, ordinary dividends, self-employment income, and household
   net income.
6. Microsimulation benchmarks: passes the fixed benchmark suite covering SSI
   asset limits, CTC/EITC-style tax reforms, capital-gains indexing, and the
   frozen external Tara Watson SSI asset-limit benchmark. Pass means no
   unexplained fiscal or household-net-income delta exceeding 5% of the eCPS
   estimate or $5 billion, whichever is larger.
7. Runtime: median runtime over the fixed national benchmark suite is no more
   than 1.25x eCPS. A candidate can enter beta up to 2.0x eCPS, but cannot
   become default above 1.25x without an explicit product decision.
8. Artifact size: national H5 is no more than 2x the eCPS H5 size unless the
   extra size is from a separately loadable relational extension.
9. Artifacts: writes a complete immutable bundle containing config, source
   versions, target DB hash, score files, target deltas, record counts, nonzero
   weights, effective sample size, and benchmark outputs.
10. Release contract: has a stable H5 publication path, rollback path, and a
   documented `policyengine-us-data` integration point.
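The numeric thresholds in gates 4, 6, and 7 can be expressed as small predicates. This is a sketch under one reading of the gates: the tolerance is taken as the larger of the relative and absolute bounds, measured against the eCPS value. The function names are hypothetical, and the "unexplained" qualifier in gate 6 is not modeled here.

```python
def protected_family_passes(
    candidate_loss: float,
    baseline_loss: float,
    rel_tol: float = 0.05,
    abs_tol: float = 0.005,
) -> bool:
    """Gate 4: candidate may be worse than eCPS by at most the larger tolerance."""
    allowed = max(rel_tol * baseline_loss, abs_tol)
    return candidate_loss - baseline_loss <= allowed


def benchmark_delta_passes(
    candidate: float, baseline: float, rel_tol: float = 0.05, abs_tol: float = 5e9
) -> bool:
    """Gate 6: delta within 5% of the eCPS estimate or $5 billion, whichever is larger."""
    allowed = max(rel_tol * abs(baseline), abs_tol)
    return abs(candidate - baseline) <= allowed


def runtime_stage(candidate_median_s: float, baseline_median_s: float) -> str:
    """Gate 7: classify the median-runtime ratio against the 1.25x and 2.0x bounds."""
    ratio = candidate_median_s / baseline_median_s
    if ratio <= 1.25:
        return "default-eligible"
    if ratio <= 2.0:
        return "beta-only"
    return "fail"
```

For example, a $200 billion eCPS estimate tolerates a delta of up to $10 billion (the 5% bound), while a $20 billion estimate tolerates up to $5 billion (the absolute bound).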

## `mp-local` build path

`mp-local` should not be forced into the same artifact shape as
`mp-national`. Local accuracy needs more ACS/local target support, but routine
local microsims need sparse or sharded outputs.

Recommended path:

1. Build from the same source and semantic registry as `mp-national`.
2. Expand ACS and local targets incrementally rather than jumping straight to a
   full monolithic artifact.
3. Calibrate against local target suites from Arch: state, congressional
   district, age/race/household-type, income, benefits, disability, and program
   participation where defensible.
4. Produce two outputs:
   - `mp-local-rich`: best-fit, larger research artifact.
   - `mp-local-fast`: sparse or sharded analysis artifact for routine
     PolicyEngine use.
5. Prefer geography shards or post-fit sparse output over requiring every
   national run to materialize the entire local record universe.

### Local release gates

`mp-local` can replace the PE local L0 pipeline when all of these are true:

1. Pinned local baselines: every comparison records the PE local small-L0 and
   big-L0 artifact paths or package IDs, their source commits, weight files,
   target DB path and SHA256, objective definition, and the exact target subset
   used by the incumbent objective.
2. Beats PE local small-L0 and big-L0 packages on their pinned actual objective.
3. Beats those packages on Microplex's broader Arch target suite.
4. Has explicit held-out target evaluation so local overfitting is visible.
5. Produces a fast analysis artifact whose median single-geography benchmark
   runtime is no more than 2x the incumbent PE local artifact.
6. Can rebuild `mp-local-fast` on the standard cloud runner in less than 12
   hours without manual cleanup, or has sharded build jobs whose slowest shard is
   below that bound.
7. Has recoverable, profiled build stages for donor integration, PE table
   materialization, scoring, and export.

Held-out target evaluation should hold out complete target groups, not random
rows inside a target. The default split should include at least one geography
family and one income/program family so epoch tuning cannot overfit only the
headline national aggregates.
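One way to implement a group-level split is sketched below. It assumes each target carries a `group` label (the unit that is held out whole) and a `family` label; both field names and the family values `geography` and `income_program` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

REQUIRED_FAMILIES = {"geography", "income_program"}

Target = Dict[str, str]


def split_held_out(
    targets: List[Target], held_out_groups: Set[str]
) -> Tuple[List[Target], List[Target]]:
    """Hold out complete target groups and check required family coverage."""
    train = [t for t in targets if t["group"] not in held_out_groups]
    held = [t for t in targets if t["group"] in held_out_groups]
    missing = REQUIRED_FAMILIES - {t["family"] for t in held}
    if missing:
        raise ValueError(f"held-out set lacks families: {sorted(missing)}")
    return train, held
```

Because membership is decided by `group`, every row of a held-out target group leaves the training set together, which is what keeps the evaluation from leaking.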

## Dashboard contract

The living dashboard should be the source of truth for replacement readiness.
It should show, at minimum:

- latest `mp-national` candidate versus latest eCPS
- latest `mp-local-rich` and `mp-local-fast` candidates versus PE small-L0 and
  big-L0
- broad loss, local loss, PE-actual objective loss, and microsim benchmark
  deltas
- record counts, positive weights, effective sample size, weight concentration,
  H5 size, and median microsim runtime
- top target wins and losses by source family
- whether each release gate is passing, failing, or unmeasured
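The weight diagnostics in that list can be computed directly from a weight vector. The sketch below assumes Kish's effective sample size, (Σw)² / Σw², and takes "weight concentration" to mean the share of total weight held by the top `k` records; the plan does not fix either definition, so both are assumptions.

```python
from typing import Sequence


def positive_weight_count(weights: Sequence[float]) -> int:
    """Number of records with strictly positive weight."""
    return sum(1 for w in weights if w > 0)


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum of weights)^2 / sum of squared weights."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return (total * total) / sum_sq if sum_sq > 0 else 0.0


def top_weight_share(weights: Sequence[float], k: int) -> float:
    """Share of total weight held by the k largest weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return sum(sorted(weights, reverse=True)[:k]) / total
```

Equal weights give an effective sample size equal to the record count, and a single nonzero weight gives 1, which makes the metric a quick check on how much a calibration has collapsed the sample.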

Every serious run should write a machine-readable loss record that the dashboard
can index without scraping logs.
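A loss record of this kind might look like the sketch below. Every field name is hypothetical, since the stable schema is still one of the open tasks; the only parts taken from the plan are the pinned hashes and the three gate states.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict

GATE_STATES = {"passing", "failing", "unmeasured"}


@dataclass
class LossRecord:
    run_id: str
    product: str  # e.g. "mp-national", "mp-local-rich", "mp-local-fast"
    baseline_sha256: str
    target_db_sha256: str
    broad_loss: float
    gates: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize with sorted keys so records diff cleanly between runs."""
        bad = {g: s for g, s in self.gates.items() if s not in GATE_STATES}
        if bad:
            raise ValueError(f"invalid gate states: {bad}")
        return json.dumps(asdict(self), sort_keys=True)


record = LossRecord(
    run_id="example-run",
    product="mp-national",
    baseline_sha256="<pinned eCPS hash>",
    target_db_sha256="<pinned target DB hash>",
    broad_loss=0.042,  # placeholder value, not a real result
    gates={"target_loss": "passing", "runtime": "unmeasured"},
)
payload = record.to_json()
```

Rejecting unknown gate states at write time keeps the dashboard indexer from having to guess what a free-form status string means.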

Dashboard/indexing is a release-blocking workstream. It needs:

- a stable loss-result JSON schema
- a run indexer that can discover completed local artifacts
- dashboard cells for each release gate
- a published "current candidate" artifact path
- CI or scheduled refresh for static score files, where practical

## Cross-repo dependency graph

```mermaid
flowchart LR
  A["arch-data: target facts and semantic scope"] --> B["microplex-us: build, score, and H5 export"]
  B --> C["microplex-evals: microsim benchmark reports"]
  B --> D["policyengine-us-data: artifact publication and loader integration"]
  C --> D
  D --> E["policyengine-us: default dataset switch and fallback behavior"]
```

Required handoffs:

- `arch-data` to `microplex-us`: source facts, target scopes, exclusions, and
  coverage reports are importable and pinned by content hash.
- `microplex-us` to `microplex-evals`: candidate H5, manifest, score files, and
  source/target provenance are sufficient to run benchmarks without rebuilding.
- `microplex-us` to `policyengine-us-data`: exported H5 and metadata satisfy the
  dataset publication contract.
- `policyengine-us-data` to `policyengine-us`: loader names, default dataset
  selection, fallback behavior, and release notes are stable.
- `policyengine-us` default switch: happens only after beta artifacts and
  benchmark reports are available for rollback comparison.
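Pinning a handoff artifact by content hash can be as simple as the sketch below, which uses SHA256 as the release gates do. The helper names `sha256_file` and `verify_pin` are hypothetical, and the temporary file stands in for a real artifact.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large H5 artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pin(path: str, expected_sha256: str) -> None:
    """Raise if the artifact on disk does not match its pinned hash."""
    actual = sha256_file(path)
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch for {path}: {actual}")


# Usage with a stand-in artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
artifact_path = tmp.name
pinned = sha256_file(artifact_path)
verify_pin(artifact_path, pinned)  # passes silently
os.remove(artifact_path)
```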

## Epics and issue-sized tasks

### Epic 1: National replacement candidate

Build and score the current best `mp-national` path.

Child issues:

- `microplex-us`: produce small ASEC + ACS100k build with PUF, SIPP/SCF/Arch
  additions, Forbes fixed spine, and capital-gains lots enabled.
- `microplex-us`: score candidate against pinned latest eCPS and write top
  target delta report.
- `microplex-evals`: run microsimulation benchmark suite against candidate and
  eCPS.

Exit:

- decision on whether this candidate is release-track or needs another modeling
  iteration

### Epic 2: Target registry hardening

Make Arch the canonical source of target semantics used by both national and
local Microplex builds.

Child issues:

- `arch-data`: add source/concept exclusions for misleading broad concepts.
- `arch-data`: add explicit target scope labels: filer, full population,
  recipient, household, tax unit, SPM unit, state, CD, local.
- `microplex-us`: consume importable target coverage reports by product.
- `arch-data` and `microplex-us`: add tests that prevent known semantic
  regressions, including proprietors income and SSI recipient/value confusion.

Exit:

- no material target in the release suite lacks source, scope, and entity
  provenance

### Epic 3: Calibration simplification

Make the winning gradient-based weight path the standard Microplex path.

Child issues:

- `microplex-us`: disable hard-concrete gates automatically when
  `l0_lambda=0`.
- `microplex-us`: preserve L0 gates only for sparse local artifacts or explicit
  experiments.
- `microplex-us`: write loss curves and held-out target curves for every run.
- `microplex-us`: define epoch stopping rules from held-out target performance,
  not only training loss.
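As an illustration of the last item, a patience-based stopping rule driven by held-out loss could look like this. It is a generic early-stopping sketch, not the rule Microplex uses; `patience` and `min_delta` are hypothetical parameters.

```python
from typing import Sequence


def best_epoch(
    held_out_losses: Sequence[float], patience: int = 3, min_delta: float = 0.0
) -> int:
    """Return the index of the best held-out epoch, stopping after
    `patience` consecutive epochs without improvement."""
    best_loss = float("inf")
    best_index = -1
    stale = 0
    for index, loss in enumerate(held_out_losses):
        if loss < best_loss - min_delta:
            best_loss, best_index, stale = loss, index, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_index
```

With held-out losses `[1.0, 0.8, 0.7, 0.72, 0.75, 0.9]` and `patience=3`, the rule stops after the sixth epoch and selects index 2, even if training loss was still falling.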

Exit:

- one standard national calibration command and one standard sparse-local
  calibration command

### Epic 4: Microsimulation benchmark suite

Codify policy outcomes that must be compared before replacement.

Child issues:

- `microplex-evals`: add national benchmark suite covering SSI asset limits,
  CTC/EITC, capital-gains indexing, and the Tara Watson SSI asset-limit
  benchmark.
- `microplex-evals`: freeze a benchmark manifest before judging any release
  candidate, including reform definitions, periods, expected output fields, and
  pinned baseline artifacts.
- `microplex-evals`: report aggregate fiscal impact, household net income,
  winners/losers, poverty/SPM where applicable, and component deltas.
- `microplex-evals`: enforce PolicyEngine MicroSeries operations throughout;
  no manual weight math.

Exit:

- every candidate has a comparable benchmark report against eCPS

### Epic 5: Dashboard and release readiness

Make replacement claims visible from a durable dashboard rather than ad hoc log
inspection.

Child issues:

- `microplex-us`: define stable loss-result JSON schema.
- `microplex-us`: add run indexer support for national, local-rich, and
  local-fast candidate artifacts.
- `microplex-us`: add dashboard gate cells for target loss, protected families,
  microsim benchmarks, compatibility, runtime, and artifact size.
- `microplex-us`: publish the current candidate artifact path and score bundle.

Exit:

- replacement readiness is visible from one dashboard artifact

### Epic 6: Compatibility and publication contract

Make the H5 and metadata contract explicit before any default switch.

Child issues:

- `microplex-us`: add automated H5 compatibility check against
  `policyengine-us`.
- `policyengine-us-data`: add loader/publication path for Microplex national
  beta artifact.
- `policyengine-us`: define default dataset switch, feature flag, and rollback
  behavior.
- `policyengine-us-data`: document the eCPS incumbent baseline and Microplex
  replacement status.

Exit:

- PolicyEngine can load `mp-national` through the normal dataset interface with
  a documented rollback path

### Epic 7: Local pipeline scalability

Make local builds practical and diagnosable.

Child issues:

- `microplex-us`: add profiled stage timings and RSS for donor integration, PE
  table construction, calibration, scoring, and export.
- `microplex-us`: implement chunked or vectorized PE table construction where
  needed.
- `microplex-us`: choose sparse output strategy: stronger L0, post-fit pruning,
  geography shards, or a combination.
- `microplex-us`: add disk guardrails and resumable checkpoints.
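Stage profiling of the kind listed first could start from a context manager like the one below. This is a sketch with hypothetical names. It relies on the standard-library `resource` module, which is available only on Unix-like systems, and `ru_maxrss` is a process-wide high-water mark (kilobytes on Linux, bytes on macOS) rather than a per-stage measurement.

```python
import resource
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def profiled_stage(name: str, log: List[Dict[str, object]]) -> Iterator[None]:
    """Append wall-clock seconds and peak RSS so far for one build stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failed runs stay diagnosable.
        log.append(
            {
                "stage": name,
                "seconds": time.perf_counter() - start,
                "peak_rss": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
            }
        )


stage_log: List[Dict[str, object]] = []
with profiled_stage("donor_integration", stage_log):
    sum(range(100_000))  # stand-in for real work
```

Because peak RSS only ever grows within a process, a stage's own memory cost is visible as the increase over the previous stage's entry.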

Exit:

- `mp-local-fast` can run routine local microsims without a multi-day laptop
  pipeline

## Initial milestones

| Milestone | Owner repo | Blocking work | Success criterion |
| --- | --- | --- | --- |
| M1: National candidate scored | `microplex-us` | finish current small build and score latest eCPS comparison | dashboard shows `mp-national` versus eCPS with target deltas |
| M2: National benchmark report | `microplex-us`, `microplex-evals` | microsim benchmark suite | report explains fiscal and distributional differences |
| M3: Target registry release gate | `arch-data`, `microplex-us` | target scope/provenance hardening | no known source-scope mismatches in release targets |
| M4: Calibration standard path | `microplex-us` | hard-concrete auto-disable, loss curves, held-out targets | documented default national and sparse-local calibration commands |
| M5: Dashboard readiness | `microplex-us` | loss schema, indexer, gate cells | replacement readiness is visible from one artifact |
| M6: Compatibility and beta publication | `microplex-us`, `policyengine-us-data` | H5 contract, publication path | `mp-national` loads through normal dataset interfaces |
| M7: Local fast artifact | `microplex-us` | sparse/sharded output and profiling | local microsim runtime is practical |
| M8: Default switch | `policyengine-us-data`, `policyengine-us` | benchmark pass and rollback path | `mp-national` can replace eCPS in a controlled release |

## What not to do

- Do not wait for the local pipeline to be perfect before shipping the national
  replacement candidate.
- Do not treat PE eCPS calibration choices as truth just because they are the
  incumbent.
- Do not add source-dataset diagnostic variables to the PolicyEngine model just
  to make Microplex easier to debug.
- Do not let one-off artifacts become release claims without dashboard-indexed
  configs and scores.
- Do not flatten relational data, such as capital gains lots, into the main H5
  unless PolicyEngine needs it at that entity level.

## Open questions

1. Should the default national runtime gate remain 1.25x eCPS, or should it be
   stricter before the public default switch?
2. For `mp-local`, should the primary public artifact be state shards, CD
   shards, or both?
3. Which local target groups should be held out by default to tune epochs and
   prevent local overfit?
4. Which high-wealth records beyond Forbes should be fixed-spine rather than
   donor-imputed?

