Adopt microunit for tax-unit construction (guard against regression-to-eCPS) #113

@MaxGhenis

Summary

Adopt the new microunit package for tax-unit, SPM-unit, family, and marital-unit construction in microplex-us, replacing the bespoke role-flag reconstruction currently in src/microplex_us/pipelines/us.py: _build_policyengine_tax_units*, _assign_family_and_spm_units, _assign_marital_units, and the role-flag coherence helpers (~1,500 lines in total).

microunit is the rules-based tax-unit engine extracted from policyengine-us-data (PRs #824/#890 there), now standalone. It applies real filing/dependency rules to partition people into units, with two modes (policyengine, census_documented). Public API: from microunit import construct_tax_units.

Why

The eCPS-replacement benchmark traces ~78% of microplex's (MP's) loss gap to tax-unit and SPM-unit under-splitting: MP averages ~1.16 tax units per household vs ~1.34 for eCPS, and SPM units come in at 1.00 vs 1.04. MP reimplemented entity construction from scratch, whereas eCPS uses the mature engine now in microunit. Reusing that engine removes a large maintenance surface and addresses a known accuracy gap.
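For reference, the units-per-household diagnostic quoted above can be sketched as follows. This is a minimal, unweighted illustration: the field names `household_id` and `tax_unit_id` are hypothetical, and the benchmark's actual figures may well be computed with survey weights.

```python
from collections import defaultdict


def units_per_household(persons, unit_key):
    """Mean number of distinct units per household (unweighted).

    `persons` is an iterable of dicts carrying a `household_id` key and
    a unit-id key. Both field names are hypothetical placeholders.
    """
    units = defaultdict(set)
    for person in persons:
        units[person["household_id"]].add(person[unit_key])
    if not units:
        raise ValueError("no person records")
    return sum(len(ids) for ids in units.values()) / len(units)


# Toy data: household 1 splits into two tax units, household 2 has one.
persons = [
    {"household_id": 1, "tax_unit_id": 10},
    {"household_id": 1, "tax_unit_id": 10},
    {"household_id": 1, "tax_unit_id": 11},
    {"household_id": 2, "tax_unit_id": 20},
]
ratio = units_per_household(persons, "tax_unit_id")
```

Running the same function on MP output before and after adoption gives a quick check on whether under-splitting actually changed.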

⚠️ Critical correctness guard (do not skip)

microunit IS eCPS's tax-unit construction logic, so adopting it makes microplex's tax units converge toward eCPS's. This creates two traps the implementation MUST guard against:

  1. Regression-to-eCPS masquerading as improvement. A drop in PE-native loss after adoption may simply mean MP's entities now match eCPS's, i.e. MP is becoming eCPS rather than getting independently better. Any benchmark gain from this change must be interpreted as "converged to the incumbent's structure," NOT "microplex improved."
  2. Benchmark circularity. The sound eCPS-replacement comparison derives its tax-unit-dependent targets partly from the baseline (eCPS). If MP and eCPS now share construction code, MP can score well on those targets because they share logic, not because MP's underlying records are better. Scoring MP-with-microunit against eCPS is partly self-referential on the entity dimension.

Required before claiming any win: isolate the effect. Run the sound comparison (matched-N, symmetric refit, holdout) before and after adoption on the SAME baseline and the SAME target surface, and report the change as an entity-convergence effect. Do NOT report "microplex beats eCPS" off the back of adopting eCPS's own construction code. The honest question this answers is "does using the canonical entity engine change MP's loss, and in which direction," not "is MP better."
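One way to make that gate hard to misreport is to have the comparison refuse to run unless both runs share a baseline and a target surface, and to label its output as a convergence effect. The sketch below is an assumption about how this could look; `Run`, its fields, and `entity_convergence_effect` are hypothetical names, and the loss values are placeholders rather than real results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str          # "before" or "after" adoption
    baseline_id: str    # identifier of the eCPS baseline that was scored against
    targets: frozenset  # names of the targets in the scoring surface
    loss: float         # loss from the sound comparison (matched-N, refit, holdout)


def entity_convergence_effect(before, after):
    """Report the before/after loss change, only if the runs are comparable."""
    if before.baseline_id != after.baseline_id:
        raise ValueError("runs use different baselines")
    if before.targets != after.targets:
        raise ValueError("runs use different target surfaces")
    delta = after.loss - before.loss
    direction = "lower" if delta < 0 else "higher" if delta > 0 else "unchanged"
    return {
        "effect": "entity-convergence",  # deliberately not "improvement"
        "loss_before": before.loss,
        "loss_after": after.loss,
        "delta": delta,
        "direction": direction,
    }


# Placeholder numbers for illustration only.
targets = frozenset({"target_a", "target_b"})
before = Run("before", "ecps-baseline", targets, 0.5)
after = Run("after", "ecps-baseline", targets, 0.25)
report = entity_convergence_effect(before, after)
```

The report carries the direction of the change but no claim about which dataset is better, which matches the question this issue says the comparison can answer.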

Scope

  • Add microunit dependency (pre-PyPI: git+https://github.com/PolicyEngine/microunit@<sha>).
  • Replace the us.py role-flag construction path with microunit.construct_tax_units, mapping microplex's person frame to the expected input schema (microunit is source-agnostic and takes a normalized CPS-like person frame; MP must supply the right columns).
  • Keep MP's existing tests green; ADD the before/after isolation comparison described above as the acceptance gate.
  • Sequencing: land the policyengine-us-data adoption first and verify it in production, THEN do this. Per maintainer instruction: fail one at a time.
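The schema-mapping step in the scope list could start from an explicit rename table with a fail-fast check for missing inputs. This is only a sketch: every column name on both sides of `COLUMN_MAP` is a hypothetical placeholder, not microunit's real input schema, and the data is held as a plain dict of columns to keep the example dependency-free. The call to `construct_tax_units` itself is omitted because its signature is not given in this issue.

```python
# Hypothetical mapping from microplex column names (keys) to the names
# the engine is assumed to expect (values). Neither side is confirmed.
COLUMN_MAP = {
    "person_id": "person_id",
    "hh_id": "household_id",
    "age_years": "age",
}


def to_engine_frame(columns, column_map=COLUMN_MAP):
    """Rename column-oriented person data, failing fast on missing inputs.

    `columns` maps a column name to a list of per-person values.
    Columns not listed in `column_map` are dropped.
    """
    missing = sorted(set(column_map) - set(columns))
    if missing:
        raise KeyError(f"person frame is missing columns: {missing}")
    return {dst: list(columns[src]) for src, dst in column_map.items()}


mp_frame = {
    "person_id": [1, 2],
    "hh_id": [100, 100],
    "age_years": [41, 9],
    "unused": ["x", "y"],
}
engine_frame = to_engine_frame(mp_frame)
```

Keeping the mapping in one table makes it easy to review against microunit's documented schema once that is pinned down.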

Blocked on
