Plan: replace eCPS with Microplex national and local datasets #10

@MaxGhenis

Microplex path to replace eCPS

This is the operational plan for replacing PolicyEngine's Enhanced CPS
(enhanced_cps_2024.h5) with Microplex-produced datasets.

It separates two products that should share source semantics, target
compilation, scoring, and artifact discipline but should not share the same
runtime envelope:

  • mp-national: a national default dataset small enough for normal
    PolicyEngine microsimulation.
  • mp-local: a larger local-area pipeline for state, congressional district,
    and other subnational analyses.

The replacement claim is not "Microplex copies eCPS." The claim is that
Microplex becomes the canonical data construction path for PolicyEngine-US,
while policyengine-us remains the measurement and microsimulation runtime.

Current decision

Microplex should replace eCPS in stages:

  1. Ship mp-national as a beta national H5 once it passes schema compatibility
    and beats a pinned latest eCPS baseline on the broad target suite.
  2. Make mp-national the default national dataset only after the microsimulation
    benchmark suite, runtime gate, and rollback path pass.
  3. Continue mp-local as a heavier local-calibration pipeline, with a sparse
    analysis artifact or geography shards, before replacing local eCPS/L0
    workflows.

This avoids blocking the national replacement on the harder local-scale
engineering problem.

Product split

| Product | Intended use | Size target | Main blocker | Replacement bar |
|---|---|---|---|---|
| mp-national | Default national PolicyEngine microsims and public analyses | roughly current small ASEC + ACS100k scale | model quality and benchmark confidence | beats latest eCPS on broad loss and microsim benchmarks while remaining fast enough for routine use |
| mp-local | State, CD, and local-area analyses | larger ACS/local target coverage; may be sharded | local runtime, disk, and sparse output design | beats PE local L0 packages on local targets and can run practical local microsims |

User-facing routing:

  • National default microsimulation: mp-national.
  • National results broken out by state or demographic group: mp-national,
    unless the analysis explicitly asks for local-calibrated weights.
  • State-calibrated analyses: mp-local-fast state shard when available;
    otherwise fall back to mp-national only with an explicit "not
    state-calibrated" label.
  • Congressional district and smaller geographies: mp-local-fast geography
    shard when available. Do not silently use mp-national for official local
    claims.
  • Research sweeps and target debugging: mp-local-rich.
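
The routing rules above can be sketched as a small lookup function. This is an illustration only: the `Analysis` enum, the `route_dataset` name, and the `shard_available` flag are hypothetical and do not come from any existing PolicyEngine or Microplex API. A national breakout that explicitly asks for local-calibrated weights is assumed to be classified as `STATE_CALIBRATED` or `SUBSTATE` by the caller.

```python
from enum import Enum
from typing import Optional, Tuple


class Analysis(Enum):
    """Hypothetical analysis kinds mirroring the routing bullets."""
    NATIONAL = "national"
    NATIONAL_BREAKOUT = "national_breakout"
    STATE_CALIBRATED = "state_calibrated"
    SUBSTATE = "substate"  # congressional district and smaller
    RESEARCH = "research"


def route_dataset(
    analysis: Analysis, shard_available: bool = False
) -> Tuple[str, Optional[str]]:
    """Return (dataset name, user-facing label or None)."""
    if analysis in (Analysis.NATIONAL, Analysis.NATIONAL_BREAKOUT):
        return "mp-national", None
    if analysis is Analysis.STATE_CALIBRATED:
        if shard_available:
            return "mp-local-fast", None
        # Fallback is allowed only with an explicit label.
        return "mp-national", "not state-calibrated"
    if analysis is Analysis.SUBSTATE:
        if shard_available:
            return "mp-local-fast", None
        # Never fall back silently for official local claims.
        raise LookupError("no mp-local-fast shard for this geography")
    if analysis is Analysis.RESEARCH:
        return "mp-local-rich", None
    raise ValueError(f"unknown analysis kind: {analysis!r}")


# Demonstrate the refusal path for a sub-state request without a shard.
refused = False
try:
    route_dataset(Analysis.SUBSTATE)
except LookupError:
    refused = True
```

Raising instead of returning a labeled fallback for sub-state requests is one way to enforce the "do not silently use mp-national" rule; a real router might instead return a structured refusal.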

Shared architecture

flowchart TD
  A["Arch target registry"] --> B["Microplex source providers"]
  B --> C["Source semantics and donor specs"]
  C --> D["Donor integration and imputation"]
  D --> E["Synthetic population spine"]
  E --> F["Fixed spine append: Forbes and other must-keep records"]
  E --> G["Relational add-ons: capital gains lots, assets, diagnostics"]
  F --> H["Residualized GD/L0 calibration"]
  G --> H
  H --> I["PE-ingestable H5 export"]
  I --> J["PolicyEngine target scoring"]
  I --> K["Microsimulation benchmark suite"]
  J --> L["Run dashboard and release gate"]
  K --> L

Important design rule: fixed records, such as Forbes top-tail records, should be
added after ordinary population synthesis and excluded from donor fitting. Their
weighted target contributions should be subtracted from calibration targets so
nonfixed records are calibrated to the residual population.
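
A minimal sketch of the residualization rule, assuming targets are simple weighted sums over records. The function name `residualize_targets` and the toy numbers are hypothetical; the real calibration code may represent targets and records differently.

```python
from typing import Dict, List


def residualize_targets(
    targets: Dict[str, float],
    fixed_weights: List[float],
    fixed_values: Dict[str, List[float]],
) -> Dict[str, float]:
    """Subtract the fixed records' weighted contributions from each target.

    targets: target name -> population total to hit.
    fixed_weights: one weight per fixed record (held constant in calibration).
    fixed_values: target name -> per-fixed-record values for that target.
    """
    residual = {}
    for name, total in targets.items():
        values = fixed_values.get(name)
        if values is None:
            # Fixed records contribute nothing to this target.
            residual[name] = total
            continue
        if len(values) != len(fixed_weights):
            raise ValueError(f"length mismatch for target {name!r}")
        contribution = sum(w * v for w, v in zip(fixed_weights, values))
        residual[name] = total - contribution
    return residual


# Toy example: two fixed top-tail records, each with weight 1.
targets = {"net_worth": 1000.0, "person_count": 100.0}
fixed_weights = [1.0, 1.0]
fixed_values = {"net_worth": [300.0, 200.0], "person_count": [1.0, 1.0]}
residual = residualize_targets(targets, fixed_weights, fixed_values)
# Nonfixed records are then calibrated to `residual`, not to `targets`.
```

In this toy case the nonfixed records would be calibrated to a net-worth total of 500 and a person count of 98, so the fixed records plus the calibrated population together reproduce the original targets.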

mp-national build path

mp-national should use the best currently known Microplex construction path:

  • CPS ASEC as the core demographic and program scaffold.
  • ACS100k to improve household, geography, housing, and local demographic
    support without making the national H5 too slow.
  • PUF donor integration for filer/tax variables, including top-end wages,
    capital gains, business income, retirement income, itemized deductions, and
    filing concepts.
  • SIPP, SCF, SSA-style, and other Arch-backed sources where they improve
    disability, SSI, assets, and program variables.
  • Forbes fixed-spine append for ultra-high-wealth units, with target
    residualization rather than asking the regular population to absorb those
    aggregates.
  • Capital-gains lots as a relational extension, not as a reason to flatten
    every possible record-level detail into the primary H5.
  • Gradient-based calibration as the standard Microplex weight path. If
    l0_lambda=0 remains best for national quality, the implementation should
    disable hard-concrete gates rather than carrying a fake sparsity mechanism.

National release gates

mp-national can become the default national dataset when all of these are
true:

  1. Pinned baseline: every comparison records the eCPS H5 path, eCPS SHA256,
    policyengine-us-data commit, policyengine-us version, target DB path,
    and target DB SHA256. "Latest eCPS" means that pinned baseline, not a moving
    label.
  2. Schema compatibility: passes an automated H5 contract check against the
    current policyengine-us loader, including entity tables, IDs, joins,
    weights, periods, dtypes, missing-value conventions, and absence of
    source-dataset diagnostic variables.
  3. Target loss: broad target loss is lower than the pinned eCPS baseline on
    common kept targets.
  4. Protected target families: each protected family is no worse than eCPS by
    more than 5% relative loss or 0.005 absolute loss, whichever is larger.
    Protected families are SSI, SNAP, wages, self-employment income, capital
    gains, interest, dividends, retirement income, disability, and household net
    income.
  5. High-salience aggregates: absolute percentage error is at least as good as
    eCPS, or the dashboard marks and explains the regression, for SSI recipients,
    SSI value, SNAP recipients, SNAP value, wage income, long-term capital gains,
    taxable interest, ordinary dividends, self-employment income, and household
    net income.
  6. Microsimulation benchmarks: passes the fixed benchmark suite covering SSI
    asset limits, CTC/EITC-style tax reforms, capital-gains indexing, and the
    frozen external Tara Watson SSI asset-limit benchmark. Pass means no
    unexplained fiscal or household-net-income delta exceeding 5% of the eCPS
    estimate or $5 billion, whichever is larger.
  7. Runtime: median runtime over the fixed national benchmark suite is no more
    than 1.25x eCPS. A candidate can enter beta up to 2.0x eCPS, but cannot
    become default above 1.25x without an explicit product decision.
  8. Artifact size: national H5 is no more than 2x the eCPS H5 size unless the
    extra size is from a separately loadable relational extension.
  9. Artifacts: writes a complete immutable bundle containing config, source
    versions, target DB hash, score files, target deltas, record counts, nonzero
    weights, effective sample size, and benchmark outputs.
  10. Release contract: has a stable H5 publication path, rollback path, and a
    documented policyengine-us-data integration point.
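
The numeric thresholds in gates 4, 6, and 7 can be expressed as small predicate functions. The sketch below is one reading of those rules, not the dashboard's implementation; all function names are hypothetical, and it assumes losses and runtimes are already computed elsewhere.

```python
def protected_family_passes(candidate_loss: float, baseline_loss: float) -> bool:
    """Gate 4: no worse than eCPS by more than max(5% relative, 0.005 absolute)."""
    tolerance = max(0.05 * baseline_loss, 0.005)
    return candidate_loss - baseline_loss <= tolerance


def benchmark_delta_passes(
    candidate: float, baseline: float, explained: bool = False
) -> bool:
    """Gate 6: delta within max(5% of the eCPS estimate, $5 billion),
    unless the delta has been explained."""
    tolerance = max(0.05 * abs(baseline), 5e9)
    return explained or abs(candidate - baseline) <= tolerance


def runtime_status(candidate_median_s: float, baseline_median_s: float) -> str:
    """Gate 7: classify median runtime relative to eCPS."""
    if baseline_median_s <= 0:
        raise ValueError("baseline runtime must be positive")
    ratio = candidate_median_s / baseline_median_s
    if ratio <= 1.25:
        return "default-eligible"
    if ratio <= 2.0:
        # Allowed in beta; default requires an explicit product decision.
        return "beta-only"
    return "fail"
```

For example, a candidate whose benchmark estimate is $54 billion against a $50 billion eCPS estimate passes gate 6, because the $4 billion delta is under the $5 billion floor even though it exceeds 5% of the baseline.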

mp-local build path

mp-local should not be forced into the same artifact shape as
mp-national. Local accuracy needs more ACS/local target support, but routine
local microsims need sparse or sharded outputs.

Recommended path:

  1. Build from the same source and semantic registry as mp-national.
  2. Expand ACS and local targets incrementally rather than jumping straight to a
    full monolithic artifact.
  3. Calibrate against local target suites from Arch: state, congressional
    district, age/race/household-type, income, benefits, disability, and program
    participation where defensible.
  4. Produce two outputs:
    • mp-local-rich: best-fit, larger research artifact.
    • mp-local-fast: sparse or sharded analysis artifact for routine
      PolicyEngine use.
  5. Prefer geography shards or post-fit sparse output over requiring every
    national run to materialize the entire local record universe.

Local release gates

mp-local can replace the PE local L0 pipeline when all of these are true:

  1. Pinned local baselines: every comparison records the PE local small-L0 and
    big-L0 artifact paths or package IDs, their source commits, weight files,
    target DB path and SHA256, objective definition, and the exact target subset
    used by the incumbent objective.
  2. Beats PE local small-L0 and big-L0 packages on their pinned actual objective.
  3. Beats those packages on Microplex's broader Arch target suite.
  4. Has explicit held-out target evaluation so local overfitting is visible.
  5. Produces a fast analysis artifact whose median single-geography benchmark
    runtime is no more than 2x the incumbent PE local artifact.
  6. Can rebuild mp-local-fast on the standard cloud runner in less than 12
    hours without manual cleanup, or has sharded build jobs whose slowest shard is
    below that bound.
  7. Has recoverable, profiled build stages for donor integration, PE table
    materialization, scoring, and export.

Held-out target evaluation should hold out complete target groups, not random
rows inside a target. The default split should include at least one geography
family and one income/program family so epoch tuning cannot overfit only the
headline national aggregates.
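
A group-wise holdout split might look like the following sketch. It assumes each target group carries a family label; the labels `"geography"` and `"income_program"`, the function names, and the 20% default are illustrative assumptions, not values from the plan.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_target_groups(
    group_family: Dict[str, str],
    required_families: Sequence[str] = ("geography", "income_program"),
    holdout_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Hold out whole target groups, with at least one per required family."""
    rng = random.Random(seed)
    heldout = set()
    for family in required_families:
        members = sorted(g for g, f in group_family.items() if f == family)
        if not members:
            raise ValueError(f"no target group in required family {family!r}")
        heldout.add(rng.choice(members))
    # Top up with random remaining groups until the fraction is reached.
    remaining = sorted(set(group_family) - heldout)
    rng.shuffle(remaining)
    wanted = max(len(heldout), round(holdout_fraction * len(group_family)))
    while len(heldout) < wanted and remaining:
        heldout.add(remaining.pop())
    train = sorted(set(group_family) - heldout)
    return train, sorted(heldout)


def partition_rows(rows, heldout_groups):
    """rows: (group, target_name) pairs; a group's rows always move together."""
    held = set(heldout_groups)
    train_rows = [r for r in rows if r[0] not in held]
    heldout_rows = [r for r in rows if r[0] in held]
    return train_rows, heldout_rows
```

Splitting on group names before touching rows is what guarantees that no target has some rows in training and others in the held-out set.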

Dashboard contract

The living dashboard should be the source of truth for replacement readiness.
It should show, at minimum:

  • latest mp-national candidate versus latest eCPS
  • latest mp-local-rich and mp-local-fast candidates versus PE small-L0 and
    big-L0
  • broad loss, local loss, PE-actual objective loss, and microsim benchmark
    deltas
  • record counts, positive weights, effective sample size, weight concentration,
    H5 size, and median microsim runtime
  • top target wins and losses by source family
  • whether each release gate is passing, failing, or unmeasured

Every serious run should write a machine-readable loss record that the dashboard
can index without scraping logs.

Dashboard/indexing is a release-blocking workstream. It needs:

  • a stable loss-result JSON schema
  • a run indexer that can discover completed local artifacts
  • dashboard cells for each release gate
  • a published "current candidate" artifact path
  • CI or scheduled refresh for static score files, where practical
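
As a starting point for the loss-result schema, a record could be modeled as a dataclass and serialized to JSON. The field names below are hypothetical, and the effective sample size uses Kish's formula, (sum of weights)^2 / (sum of squared weights), which is an assumption about the intended definition.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List

GATE_STATES = {"passing", "failing", "unmeasured"}


def kish_effective_sample_size(weights: List[float]) -> float:
    """Kish approximation: (sum w)^2 / sum(w^2); 0.0 if all weights are zero."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return (total * total) / sum_sq if sum_sq > 0 else 0.0


@dataclass
class LossRecord:
    run_id: str
    product: str  # e.g. "mp-national", "mp-local-rich", "mp-local-fast"
    broad_loss: float
    record_count: int
    positive_weights: int
    effective_sample_size: float
    gates: Dict[str, str]  # gate name -> one of GATE_STATES


def build_loss_record(run_id, product, broad_loss, weights, gates) -> LossRecord:
    unknown = set(gates.values()) - GATE_STATES
    if unknown:
        raise ValueError(f"unknown gate states: {sorted(unknown)}")
    return LossRecord(
        run_id=run_id,
        product=product,
        broad_loss=broad_loss,
        record_count=len(weights),
        positive_weights=sum(1 for w in weights if w > 0),
        effective_sample_size=kish_effective_sample_size(weights),
        gates=dict(gates),
    )


# Toy run with made-up numbers, serialized for the dashboard indexer.
record = build_loss_record(
    "example-run", "mp-national", 0.12, [2.0, 2.0, 0.0, 4.0],
    {"target_loss": "passing", "runtime": "unmeasured"},
)
payload = json.dumps(asdict(record), sort_keys=True)
```

Restricting gate values to passing, failing, or unmeasured keeps the record aligned with the three states the dashboard is expected to display.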

Cross-repo dependency graph

flowchart LR
  A["arch-data: target facts and semantic scope"] --> B["microplex-us: build, score, and H5 export"]
  B --> C["microplex-evals: microsim benchmark reports"]
  B --> D["policyengine-us-data: artifact publication and loader integration"]
  C --> D
  D --> E["policyengine-us: default dataset switch and fallback behavior"]

Required handoffs:

  • arch-data to microplex-us: source facts, target scopes, exclusions, and
    coverage reports are importable and pinned by content hash.
  • microplex-us to microplex-evals: candidate H5, manifest, score files, and
    source/target provenance are sufficient to run benchmarks without rebuilding.
  • microplex-us to policyengine-us-data: exported H5 and metadata satisfy the
    dataset publication contract.
  • policyengine-us-data to policyengine-us: loader names, default dataset
    selection, fallback behavior, and release notes are stable.
  • policyengine-us default switch: happens only after beta artifacts and
    benchmark reports are available for rollback comparison.
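
Pinning a handoff artifact by content hash can be done with the standard library alone. The sketch below streams a file through SHA-256 and compares it to a recorded pin; the function names and the placeholder file are hypothetical, and a real manifest would record additional fields such as commits and versions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large H5 artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, expected_sha256: str) -> bool:
    """True if the file's current content hash equals the pinned hash."""
    return sha256_of_file(path) == expected_sha256.lower()


# Demonstration with a placeholder file (not a real H5 artifact).
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "candidate.h5"
    artifact.write_bytes(b"placeholder bytes")
    pinned = sha256_of_file(artifact)
    pin_ok = matches_pin(artifact, pinned)
    artifact.write_bytes(b"changed bytes")
    pin_after_change = matches_pin(artifact, pinned)
```

A downstream repo that checks the pin before running benchmarks would detect a silently replaced artifact instead of scoring against a moving baseline.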

Epics and issue-sized tasks

Epic 1: National replacement candidate

Build and score the current best mp-national path.

Child issues:

  • microplex-us: produce small ASEC + ACS100k build with PUF, SIPP/SCF/Arch
    additions, Forbes fixed spine, and capital-gains lots enabled.
  • microplex-us: score candidate against pinned latest eCPS and write top
    target delta report.
  • microplex-evals: run microsimulation benchmark suite against candidate and
    eCPS.

Exit:

  • decision on whether this candidate is release-track or needs another modeling
    iteration

Epic 2: Target registry hardening

Make Arch the canonical source of target semantics used by both national and
local Microplex builds.

Child issues:

  • arch-data: add source/concept exclusions for misleading broad concepts.
  • arch-data: add explicit target scope labels: filer, full population,
    recipient, household, tax unit, SPM unit, state, CD, local.
  • microplex-us: consume importable target coverage reports by product.
  • arch-data and microplex-us: add tests that prevent known semantic
    regressions, including proprietors income and SSI recipient/value confusion.

Exit:

  • no material target in the release suite lacks source, scope, and entity
    provenance

Epic 3: Calibration simplification

Make the winning gradient-based weight path the standard Microplex path.

Child issues:

  • microplex-us: disable hard-concrete gates automatically when
    l0_lambda=0.
  • microplex-us: preserve L0 gates only for sparse local artifacts or explicit
    experiments.
  • microplex-us: write loss curves and held-out target curves for every run.
  • microplex-us: define epoch stopping rules from held-out target performance,
    not only training loss.

Exit:

  • one standard national calibration command and one standard sparse-local
    calibration command
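
Two of the child issues above lend themselves to a short sketch: disabling hard-concrete gates when `l0_lambda=0`, and choosing a stopping epoch from held-out loss. The function names, the patience rule, and the default values are illustrative assumptions; the plan does not specify a particular stopping rule.

```python
import math
from typing import Sequence


def use_hard_concrete_gates(l0_lambda: float) -> bool:
    """Gates only matter when the L0 penalty is active."""
    return l0_lambda > 0.0


def select_stopping_epoch(
    heldout_losses: Sequence[float], patience: int = 3, min_delta: float = 0.0
) -> int:
    """Return the index of the best held-out epoch seen before patience ran out.

    Scanning stops once `patience` consecutive epochs fail to improve the
    best held-out loss by more than `min_delta`.
    """
    if not heldout_losses:
        raise ValueError("need at least one held-out loss")
    best_epoch, best_loss = 0, math.inf
    for epoch, loss in enumerate(heldout_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

With held-out losses of 1.0, 0.8, 0.7, 0.75, 0.76, 0.9 and a patience of 3, this rule selects epoch 2, regardless of whether training loss kept falling afterward.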

Epic 4: Microsimulation benchmark suite

Codify policy outcomes that must be compared before replacement.

Child issues:

  • microplex-evals: add national benchmark suite covering SSI asset limits,
    CTC/EITC, capital-gains indexing, and the Tara Watson SSI asset-limit
    benchmark.
  • microplex-evals: freeze a benchmark manifest before judging any release
    candidate, including reform definitions, periods, expected output fields, and
    pinned baseline artifacts.
  • microplex-evals: report aggregate fiscal impact, household net income,
    winners/losers, poverty/SPM where applicable, and component deltas.
  • microplex-evals: enforce PolicyEngine MicroSeries operations throughout;
    no manual weight math.

Exit:

  • every candidate has a comparable benchmark report against eCPS

Epic 5: Dashboard and release readiness

Make replacement claims visible from a durable dashboard rather than ad hoc log
inspection.

Child issues:

  • microplex-us: define stable loss-result JSON schema.
  • microplex-us: add run indexer support for national, local-rich, and
    local-fast candidate artifacts.
  • microplex-us: add dashboard gate cells for target loss, protected families,
    microsim benchmarks, compatibility, runtime, and artifact size.
  • microplex-us: publish the current candidate artifact path and score bundle.

Exit:

  • replacement readiness is visible from one dashboard artifact

Epic 6: Compatibility and publication contract

Make the H5 and metadata contract explicit before any default switch.

Child issues:

  • microplex-us: add automated H5 compatibility check against
    policyengine-us.
  • policyengine-us-data: add loader/publication path for Microplex national
    beta artifact.
  • policyengine-us: define default dataset switch, feature flag, and rollback
    behavior.
  • policyengine-us-data: document the eCPS incumbent baseline and Microplex
    replacement status.

Exit:

  • PolicyEngine can load mp-national through the normal dataset interface with
    a documented rollback path

Epic 7: Local pipeline scalability

Make local builds practical and diagnosable.

Child issues:

  • microplex-us: add profiled stage timings and RSS for donor integration, PE
    table construction, calibration, scoring, and export.
  • microplex-us: implement chunked or vectorized PE table construction where
    needed.
  • microplex-us: choose sparse output strategy: stronger L0, post-fit pruning,
    geography shards, or a combination.
  • microplex-us: add disk guardrails and resumable checkpoints.

Exit:

  • mp-local-fast can run routine local microsims without a multi-day laptop
    pipeline

Initial milestones

| Milestone | Owner repo | Blocking work | Success criterion |
|---|---|---|---|
| M1: National candidate scored | microplex-us | finish current small build and score latest eCPS comparison | dashboard shows mp-national versus eCPS with target deltas |
| M2: National benchmark report | microplex-us, microplex-evals | microsim benchmark suite | report explains fiscal and distributional differences |
| M3: Target registry release gate | arch-data, microplex-us | target scope/provenance hardening | no known source-scope mismatches in release targets |
| M4: Calibration standard path | microplex-us | hard-concrete auto-disable, loss curves, held-out targets | documented default national and sparse-local calibration commands |
| M5: Dashboard readiness | microplex-us | loss schema, indexer, gate cells | replacement readiness is visible from one artifact |
| M6: Compatibility and beta publication | microplex-us, policyengine-us-data | H5 contract, publication path | mp-national loads through normal dataset interfaces |
| M7: Local fast artifact | microplex-us | sparse/sharded output and profiling | local microsim runtime is practical |
| M8: Default switch | policyengine-us-data, policyengine-us | benchmark pass and rollback path | mp-national can replace eCPS in a controlled release |

What not to do

  • Do not wait for the local pipeline to be perfect before shipping the national
    replacement candidate.
  • Do not treat PE eCPS calibration choices as truth just because they are the
    incumbent.
  • Do not add source-dataset diagnostic variables to the PolicyEngine model just
    to make Microplex easier to debug.
  • Do not let one-off artifacts become release claims without dashboard-indexed
    configs and scores.
  • Do not flatten relational data, such as capital gains lots, into the main H5
    unless PolicyEngine needs it at that entity level.

Open questions

  1. Should the default national runtime gate remain 1.25x eCPS, or should it be
    stricter before public default switch?
  2. For mp-local, should the primary public artifact be state shards, CD
    shards, or both?
  3. Which local target groups should be held out by default to tune epochs and
    prevent local overfit?
  4. Which high-wealth records beyond Forbes should be fixed-spine rather than
    donor-imputed?
