Centralize generalizable pipeline logic from microplex-us into microplex core #231

@MaxGhenis

Context

Per AGENTS.md ("keep the US pack thin; push shared abstractions upstream into core; if a seam is useful for both UK and US, move it to microplex"), several heavy US-local modules reimplement — or could move to — core surfaces. Core already owns sources, targets, fusion, calibration (Calibrator + microcalibrate adapter), reweighting (reweighting.py, targets/reweighting.py, targets/bundles.py), and an eval harness (eval/harness.py, eval/reweighting_benchmark.py).

Candidates ranked by leverage. Verify each US version genuinely duplicates core (vs intentional country-specific extension) before relocating.

1. eCPS-replacement comparison harness → core eval (biggest win)

src/microplex_us/pipelines/ecps_replacement_comparison.py (1,607 lines) is ~90% country-agnostic: ~28 of ~33 functions are matched-household sampling (_write_matched_dataset, _household_weights, _entity_*), symmetric refit (_fit_dense_refit, _objective), holdout (_build_holdout_target_mask, _validate_common_targets, _filter_loss_inputs_by_scope), scoring/diagnostics (_target_loss_diagnostics, _refit_matrix_score_*, _target_family_breakdown, _target_bucket_breakdown, _diagnostic_unweighted_msre, _protected_family_losses), and utils (_sha256, _dataset_descriptor, …).

The only US-specific seam is _extract_pe_native_loss_inputs (shells to the PE-US scorer + US target DB to build the loss matrix) plus the US bad-target list.

→ Move the harness to microplex.eval (likely merging with the existing, apparently-overlapping eval/reweighting_benchmark.py), parameterized by a loss-input extractor protocol + target provider + baseline resolver. US becomes a ~200-line provider implementing the PE-US extractor. Also unblocks #117 (CI eval).
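As a rough illustration of that parameterization, the three seams could be expressed as protocols that a country pack implements. Every name below (LossInputs, LossInputExtractor, TargetProvider, BaselineResolver, score_dataset, compare_to_baseline) is hypothetical and not taken from microplex or microplex-us, and the mean squared relative error is an assumed stand-in for the real loss:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class LossInputs:
    """Hypothetical container: one row per target, one column per household."""

    matrix: Sequence[Sequence[float]]  # target-by-household contributions
    targets: Sequence[float]           # target values, aligned with rows
    names: Sequence[str]               # target names, aligned with rows
    weights: Sequence[float]           # household weights, aligned with columns


class LossInputExtractor(Protocol):
    """Country seam: turns a dataset into loss inputs."""

    def extract(self, dataset_path: str) -> LossInputs: ...


class TargetProvider(Protocol):
    """Country seam: names the targets to exclude from scoring."""

    def bad_targets(self) -> frozenset[str]: ...


class BaselineResolver(Protocol):
    """Country seam: locates the baseline dataset to compare against."""

    def resolve(self) -> str: ...


def score_dataset(dataset_path: str, extractor: LossInputExtractor,
                  provider: TargetProvider) -> float:
    """Country-agnostic part: mean squared relative error over kept targets."""
    inputs = extractor.extract(dataset_path)
    excluded = provider.bad_targets()
    errors = []
    for row, target, name in zip(inputs.matrix, inputs.targets, inputs.names):
        if name in excluded or target == 0:
            continue  # skip excluded targets and undefined relative errors
        estimate = sum(w * x for w, x in zip(inputs.weights, row))
        errors.append((estimate / target - 1.0) ** 2)
    if not errors:
        raise ValueError("no scorable targets left after exclusions")
    return sum(errors) / len(errors)


def compare_to_baseline(candidate_path: str, extractor: LossInputExtractor,
                        provider: TargetProvider,
                        baseline: BaselineResolver) -> dict[str, float]:
    """Score a candidate and the resolved baseline with the same inputs."""
    return {
        "candidate": score_dataset(candidate_path, extractor, provider),
        "baseline": score_dataset(baseline.resolve(), extractor, provider),
    }
```

Under this shape the US pack would only supply concrete implementations of the three protocols, while sampling, refit, holdout, and scoring stay in core.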

2. PE-native refit solver → core reweighting

src/microplex_us/pipelines/pe_native_optimization.py holds optimize_pe_native_loss_weights (the monotone accelerated projected-gradient refit) and rewrite_policyengine_us_dataset_weights. AGENTS.md says reweighting/solver belongs in core and local code "should remain a thin adapter over core bundle/reweighting surfaces." The projected-gradient + simplex-projection solver is pure numerics (loss matrix → weights), so it should move to core. Keep only the PE-entity weight I/O (household → tax_unit/spm_unit/family/marital rewrite) local, parameterized by the entity list (the #221 empty-derived-weight-group guard generalizes).
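To show why the solver has no country dependency, here is a minimal pure-Python sketch of a monotone accelerated projected-gradient refit with a simplex projection. It is not the code in pe_native_optimization.py: the function names, the mean-squared-relative-error loss, the fixed-total constraint, and the step-halving fallback are all assumptions made for illustration.

```python
import math


def project_to_simplex(values, total):
    """Euclidean projection onto {w : w >= 0, sum(w) == total}."""
    ordered = sorted(values, reverse=True)
    running, shift = 0.0, 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        candidate = (running - total) / count
        if value - candidate > 0:
            shift = candidate  # last count that passes gives the threshold
    return [max(v - shift, 0.0) for v in values]


def relative_loss(matrix, targets, weights):
    """Mean squared relative error of weighted totals against targets."""
    residuals = [sum(a * w for a, w in zip(row, weights)) / t - 1.0
                 for row, t in zip(matrix, targets)]
    return sum(r * r for r in residuals) / len(residuals)


def relative_loss_gradient(matrix, targets, weights):
    """Gradient of relative_loss with respect to the weights."""
    grad = [0.0] * len(weights)
    for row, t in zip(matrix, targets):
        residual = sum(a * w for a, w in zip(row, weights)) / t - 1.0
        scale = 2.0 * residual / (t * len(targets))
        for i, a in enumerate(row):
            grad[i] += scale * a
    return grad


def refit_weights(matrix, targets, initial, steps=200, step_size=1.0):
    """Accelerated projected gradient that never accepts a worse loss."""
    total = sum(initial)  # assumed constraint: keep the weight total fixed
    weights = project_to_simplex(initial, total)
    previous, momentum = weights, 1.0
    best = relative_loss(matrix, targets, weights)

    def step_from(point, size):
        grad = relative_loss_gradient(matrix, targets, point)
        return project_to_simplex(
            [p - size * g for p, g in zip(point, grad)], total)

    for _ in range(steps):
        next_momentum = (1.0 + math.sqrt(1.0 + 4.0 * momentum ** 2)) / 2.0
        beta = (momentum - 1.0) / next_momentum
        lookahead = [w + beta * (w - p) for w, p in zip(weights, previous)]
        candidate = step_from(lookahead, step_size)
        if relative_loss(matrix, targets, candidate) > best:
            # Monotone safeguard: drop momentum and take a plain step,
            # halving the step size until the loss stops increasing.
            next_momentum = 1.0
            candidate = step_from(weights, step_size)
            while (relative_loss(matrix, targets, candidate) > best
                   and step_size > 1e-12):
                step_size *= 0.5
                candidate = step_from(weights, step_size)
            if relative_loss(matrix, targets, candidate) > best:
                break
        previous, weights, momentum = weights, candidate, next_momentum
        best = relative_loss(matrix, targets, weights)
    return weights
```

Nothing here touches PolicyEngine entities: the inputs are a loss matrix, targets, and starting weights, which is the boundary the issue proposes between the core solver and the local weight I/O.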

3. CPS-passthrough / income-split mechanism → core fusion/donor

(#226, #228) The splitter preserves survey-measured totals when collapsing donor clones onto a survey scaffold, derives component splits from the survey total, and imputes only donor-specific detail + clone records. It is identical for UK FRS + admin clones. US keeps only the variable specs + split fractions. This is the core-targeted version of #229.
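The total-preserving split at the heart of that mechanism can be sketched in a few lines. This is an assumed simplification, not the splitter from the referenced PRs: split_survey_total and the component names are hypothetical, and the fractions stand in for whatever split fractions a country pack supplies.

```python
def split_survey_total(total, fractions):
    """Split one survey-measured total into components that sum back to it.

    `fractions` maps component name -> non-negative share. Shares are
    normalized, so they do not need to sum to exactly 1.
    """
    share_sum = sum(fractions.values())
    if share_sum <= 0:
        raise ValueError("split fractions must have a positive sum")
    names = list(fractions)
    parts = {name: total * fractions[name] / share_sum for name in names[:-1]}
    # The last component absorbs any rounding residue, so the components
    # add back to the survey total up to floating-point rounding.
    parts[names[-1]] = total - sum(parts.values())
    return parts


# Country pack side: only the variable names and split fractions differ.
example = split_survey_total(1000.0, {"wages": 0.75, "tips": 0.25})
```

The survey total is never re-imputed here; only its decomposition depends on the supplied fractions.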

4. De-dup drifted modules

pe_targets.py, target_registry.py, unified_calibration.py, supabase_targets.py exist in both microplex core and microplex_us. Confirm whether the US copies are drifted duplicates and collapse to a single source of truth in core.
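One quick way to triage the four pairs is to diff each core module against its US copy and look at the similarity score before reading the code. The helper below is a hypothetical sketch using only the standard library; drift_report is not an existing function in either repo.

```python
import difflib


def drift_report(core_source: str, us_source: str) -> dict:
    """Compare two module sources line by line.

    Returns whether they are identical, a 0..1 similarity ratio, and a
    unified diff (empty when the sources match).
    """
    core_lines = core_source.splitlines()
    us_lines = us_source.splitlines()
    matcher = difflib.SequenceMatcher(None, core_lines, us_lines,
                                      autojunk=False)
    diff = list(difflib.unified_diff(core_lines, us_lines,
                                     fromfile="core", tofile="us",
                                     lineterm=""))
    return {
        "identical": core_lines == us_lines,
        "similarity": matcher.ratio(),
        "diff": diff,
    }
```

A ratio near 1.0 with a short diff suggests a drifted duplicate that can collapse into core; a low ratio suggests the US file is an intentional extension and needs a closer read. The sources can be loaded with pathlib.Path(...).read_text() from wherever the two copies live in each checkout.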

Refs: #229 (passthrough extraction — target should be core), #117 (CI eval).
