Summary
Adopt the new microunit package for tax-unit / SPM-unit / family / marital-unit construction in microplex-us, replacing the bespoke role-flag reconstruction currently in src/microplex_us/pipelines/us.py (_build_policyengine_tax_units*, _assign_family_and_spm_units, _assign_marital_units, and the role-flag coherence helpers — ~1,500 lines).
microunit is the rules-based tax-unit engine extracted from policyengine-us-data (PRs #824/#890 there), now standalone. It applies real filing/dependency rules to partition people into units, with two modes (policyengine, census_documented). Public API: from microunit import construct_tax_units.
Why
The eCPS-replacement benchmark traces ~78% of microplex's loss gap to tax-unit/SPM under-splitting (MP ~1.16 tax units/hh vs eCPS ~1.34; SPM 1.00 vs 1.04). MP reimplemented entity construction from scratch; eCPS uses the mature engine now in microunit. Reusing it removes a large maintenance surface and a known accuracy gap.
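For reference, the units-per-household figures quoted above can be reproduced with a count of distinct unit ids per household. The sketch below is a minimal, unweighted version; the column names (household_id, tax_unit_id) and the toy rows are assumptions for illustration, and the benchmark itself may use survey weights.

```python
from collections import defaultdict


def units_per_household(person_rows, unit_key):
    """Return the mean number of distinct units per household (unweighted).

    person_rows: iterable of dicts, one per person, each carrying
    'household_id' and the unit id column named by unit_key.
    """
    units_by_household = defaultdict(set)
    for row in person_rows:
        units_by_household[row["household_id"]].add(row[unit_key])
    if not units_by_household:
        raise ValueError("no person rows supplied")
    total_units = sum(len(units) for units in units_by_household.values())
    return total_units / len(units_by_household)


# Toy person rows, made up for illustration only.
people = [
    {"household_id": 1, "tax_unit_id": 10},
    {"household_id": 1, "tax_unit_id": 10},
    {"household_id": 1, "tax_unit_id": 11},
    {"household_id": 2, "tax_unit_id": 20},
]
ratio = units_per_household(people, "tax_unit_id")
```

Running the same function with an SPM-unit id column would give the SPM ratio, so the before/after split rates can be tracked with one helper.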
⚠️ Critical correctness guard (do not skip)
microunit IS eCPS's tax-unit construction logic. So adopting it makes microplex's tax units converge toward eCPS's. This creates two traps the implementation MUST guard against:
- Regression-to-eCPS masquerading as improvement. A drop in PE-native loss after adoption may simply mean MP's entities now match eCPS's — i.e. MP is becoming eCPS, not getting independently better. Any benchmark gain from this change must be interpreted as "converged to the incumbent's structure," NOT "microplex improved."
- Benchmark circularity. The sound eCPS-replacement comparison derives its tax-unit-dependent targets partly from the baseline (eCPS). If MP and eCPS now share construction code, MP can score well on those targets because they share logic, not because MP's underlying records are better. Scoring MP-with-microunit against eCPS is partly self-referential on the entity dimension.
Required before claiming any win: isolate the effect. Run the sound comparison (matched-N, symmetric refit, holdout) before vs after adoption on the SAME baseline + SAME target surface, and report the change as an entity-convergence effect. Do NOT report "microplex beats eCPS" off the back of adopting eCPS's own construction code. The honest question this answers is "does using the canonical entity engine change MP's loss, and in which direction" — not "is MP better."
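One way to make the isolation requirement mechanical is to refuse to compute a delta unless both runs share the same baseline, target surface, and record count. The sketch below assumes a hypothetical BenchmarkRun record and made-up identifiers and loss values; it does not encode the symmetric refit itself, only the check that the two runs are comparable and the labelling of the result as an entity-convergence effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """Hypothetical summary of one sound-comparison run."""
    label: str
    baseline_id: str        # which eCPS baseline was used
    target_surface_id: str  # which target set was scored
    n_records: int          # matched-N record count
    holdout_loss: float     # loss on the holdout targets


def entity_convergence_effect(before, after):
    """Compare two runs only if they are matched on everything but entities."""
    for field in ("baseline_id", "target_surface_id", "n_records"):
        if getattr(before, field) != getattr(after, field):
            raise ValueError(f"runs differ on {field}; comparison is not isolated")
    delta = after.holdout_loss - before.holdout_loss
    if delta < 0:
        direction = "loss decreased"
    elif delta > 0:
        direction = "loss increased"
    else:
        direction = "no change"
    # Deliberately named as a convergence effect, not as an improvement.
    return {
        "effect": "entity convergence",
        "delta_holdout_loss": delta,
        "direction": direction,
    }


# Illustrative values only; not benchmark results.
before = BenchmarkRun("role-flag path", "ecps-baseline", "targets-v1", 1000, 0.5)
after = BenchmarkRun("microunit path", "ecps-baseline", "targets-v1", 1000, 0.25)
report = entity_convergence_effect(before, after)
```

The report carries a direction rather than a verdict, which keeps the output aligned with the question being asked: whether the canonical engine changes MP's loss, and which way.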
Scope
- Add microunit dependency (pre-PyPI: git+https://github.com/PolicyEngine/microunit@<sha>).
- Replace the us.py role-flag construction path with microunit.construct_tax_units, mapping microplex's person frame to the expected input schema (microunit is source-agnostic and takes a normalized CPS-like person frame; MP must supply the right columns).
- Keep MP's existing tests green; ADD the before/after isolation comparison described above as the acceptance gate.
- Sequencing: land the policyengine-us-data adoption first and verify it in production, THEN do this — per maintainer instruction, fail one at a time.
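The person-frame mapping named in the scope could start as an explicit rename table that fails loudly when a source column is absent. This is a sketch only: every column name below is hypothetical, the real required schema has to be taken from microunit itself, and the call into microunit.construct_tax_units is left out because its signature is not stated here.

```python
# Hypothetical mapping from microplex column names (keys) to the names a
# normalized CPS-like person frame might use (values). Placeholder names only.
COLUMN_MAP = {
    "hh_id": "household_id",
    "person_age": "age",
    "rel_to_head": "relationship_to_head",
}


def normalize_person_rows(rows, column_map):
    """Rename columns per column_map, dropping anything not in the map.

    Raises KeyError naming the missing source columns, so a schema gap
    surfaces before unit construction rather than inside it.
    """
    normalized = []
    for index, row in enumerate(rows):
        missing = [src for src in column_map if src not in row]
        if missing:
            raise KeyError(f"row {index} is missing source columns: {missing}")
        normalized.append({dst: row[src] for src, dst in column_map.items()})
    return normalized


# Toy rows, made up for illustration only.
microplex_rows = [
    {"hh_id": 1, "person_age": 41, "rel_to_head": "head", "extra": "ignored"},
    {"hh_id": 1, "person_age": 9, "rel_to_head": "child", "extra": "ignored"},
]
person_frame = normalize_person_rows(microplex_rows, COLUMN_MAP)
```

Keeping the mapping as data rather than scattered renames makes it easy to diff against the engine's expected columns when the dependency is bumped to a new sha.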
Blocked on