Adopt microunit for tax-unit construction (guard against regression-to-eCPS) #113

## Summary

Adopt the new [`microunit`](https://github.com/PolicyEngine/microunit) package for tax-unit / SPM-unit / family / marital-unit construction in microplex-us, replacing the bespoke role-flag reconstruction currently in `src/microplex_us/pipelines/us.py` (`_build_policyengine_tax_units*`, `_assign_family_and_spm_units`, `_assign_marital_units`, and the role-flag coherence helpers — ~1,500 lines).

`microunit` is the rules-based tax-unit engine **extracted from policyengine-us-data** (PRs #824/#890 there), now standalone. It applies real filing/dependency rules to partition people into units, with two modes (`policyengine`, `census_documented`). Public API: `from microunit import construct_tax_units`.

## Why

The eCPS-replacement benchmark traces ~78% of microplex's loss gap to tax-unit/SPM under-splitting: microplex (MP) has ~1.16 tax units per household vs ~1.34 for eCPS, and SPM units are 1.00 vs 1.04. MP reimplemented entity construction from scratch; eCPS uses the mature engine now in microunit. Reusing it removes a large maintenance surface and a known accuracy gap.
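
For reference, a minimal sketch of how a units-per-household ratio like the ones above could be computed from person records. The field names `household_id` and `tax_unit_id` are hypothetical, and this version is unweighted; the benchmark's own figures may use survey weights, so treat this as an illustration of the metric, not the benchmark's code.

```python
from collections import defaultdict


def units_per_household(person_rows, unit_key):
    """Unweighted mean number of distinct units per household.

    person_rows: iterable of dicts carrying 'household_id' and `unit_key`
    (both field names are hypothetical, chosen for this sketch).
    """
    units_by_household = defaultdict(set)
    for row in person_rows:
        units_by_household[row["household_id"]].add(row[unit_key])
    if not units_by_household:
        raise ValueError("no person rows supplied")
    total_units = sum(len(units) for units in units_by_household.values())
    return total_units / len(units_by_household)


# Toy data: household 1 splits into two tax units, household 2 has one.
people = [
    {"household_id": 1, "tax_unit_id": 10},
    {"household_id": 1, "tax_unit_id": 10},
    {"household_id": 1, "tax_unit_id": 11},
    {"household_id": 2, "tax_unit_id": 20},
]
ratio = units_per_household(people, "tax_unit_id")  # 3 units / 2 households
```

Under-splitting shows up as this ratio sitting closer to 1.0 than the comparison dataset's ratio.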

## ⚠️ Critical correctness guard (do not skip)

**`microunit` IS eCPS's tax-unit construction logic.** So adopting it makes microplex's tax units converge toward eCPS's. This creates two traps the implementation MUST guard against:

1. **Regression-to-eCPS masquerading as improvement.** A drop in PE-native loss after adoption may simply mean MP's entities now match eCPS's — i.e. MP is becoming eCPS, not getting independently better. Any benchmark gain from this change must be interpreted as "converged to the incumbent's structure," NOT "microplex improved."
2. **Benchmark circularity.** The sound eCPS-replacement comparison derives its tax-unit-dependent targets partly from the baseline (eCPS). If MP and eCPS now share construction code, MP can score well on those targets *because they share logic*, not because MP's underlying records are better. Scoring MP-with-microunit against eCPS is partly self-referential on the entity dimension.

**Required before claiming any win:** isolate the effect. Run the sound comparison (matched-N, symmetric refit, holdout) before vs after adoption on the SAME baseline + SAME target surface, and report the change as an entity-convergence effect. Do NOT report "microplex beats eCPS" off the back of adopting eCPS's own construction code. The honest question this answers is "does using the canonical entity engine change MP's loss, and in which direction" — not "is MP better."
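
A minimal sketch of how the acceptance gate could enforce this, assuming each sound-comparison run is summarized by a small record. Every name here (`SoundRun`, `baseline_id`, `target_surface_id`, `holdout_loss`, `entity_convergence_effect`) is hypothetical and not part of microplex-us or microunit; the point is only that the comparison refuses mismatched runs and labels the result as an entity-convergence effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundRun:
    """Summary of one sound-comparison run (all fields hypothetical)."""
    label: str              # e.g. "before_adoption" or "after_adoption"
    baseline_id: str        # identifies the eCPS baseline build used
    target_surface_id: str  # identifies the target surface used
    n_records: int          # matched-N record count
    holdout_loss: float     # loss on the holdout targets


def entity_convergence_effect(before: SoundRun, after: SoundRun) -> dict:
    """Report the before/after loss change, refusing mismatched runs."""
    if before.baseline_id != after.baseline_id:
        raise ValueError("runs must use the SAME baseline")
    if before.target_surface_id != after.target_surface_id:
        raise ValueError("runs must use the SAME target surface")
    if before.n_records != after.n_records:
        raise ValueError("runs must be matched-N")
    delta = after.holdout_loss - before.holdout_loss
    if delta < 0:
        direction = "lower"
    elif delta > 0:
        direction = "higher"
    else:
        direction = "unchanged"
    return {
        # Deliberately NOT labeled "improvement": see the two traps above.
        "effect": "entity_convergence",
        "delta_loss": delta,
        "direction": direction,
    }


# Toy numbers, chosen only to exercise the function.
before = SoundRun("before_adoption", "ecps-A", "surface-1", 1000, 0.50)
after = SoundRun("after_adoption", "ecps-A", "surface-1", 1000, 0.25)
report = entity_convergence_effect(before, after)
```

The report carries a direction and a magnitude but no "MP beats eCPS" verdict, which keeps the output aligned with the question the comparison can actually answer.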

## Scope

- Add `microunit` dependency (pre-PyPI: `git+https://github.com/PolicyEngine/microunit@<sha>`).
- Replace the `us.py` role-flag construction path with `microunit.construct_tax_units`, mapping microplex's person frame to the expected input schema (microunit is source-agnostic and takes a normalized CPS-like person frame; MP must supply the right columns).
- Keep MP's existing tests green; ADD the before/after isolation comparison described above as the acceptance gate.
- Sequencing: land the policyengine-us-data adoption first and verify it in production, THEN do this. Per maintainer instruction, the two adoptions should fail one at a time, not together.
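
The column-mapping step in the second bullet could look roughly like the sketch below. It is an assumption-heavy illustration: the column names in `COLUMN_MAP` are invented, rows are plain dicts standing in for a person frame, and the constructor is passed in as a callable because the real `microunit.construct_tax_units` signature and input schema are not specified in this issue. The stand-in constructor is a placeholder that applies no filing or dependency rules.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical mapping: microplex column name -> name the engine expects.
COLUMN_MAP = {
    "mp_person_id": "person_id",
    "mp_household_id": "household_id",
    "mp_age": "age",
}


def to_engine_rows(rows: Iterable[Mapping], column_map: Mapping[str, str]) -> list:
    """Rename columns and fail loudly if a required source column is absent."""
    converted = []
    for index, row in enumerate(rows):
        missing = [src for src in column_map if src not in row]
        if missing:
            raise KeyError(f"row {index} is missing required columns: {missing}")
        converted.append({dst: row[src] for src, dst in column_map.items()})
    return converted


def build_tax_units(rows, construct: Callable, column_map=COLUMN_MAP):
    """Normalize the person rows, then delegate to the injected constructor."""
    return construct(to_engine_rows(rows, column_map))


def one_unit_per_household(engine_rows):
    """Placeholder constructor for this sketch only (no real tax rules)."""
    return {row["person_id"]: row["household_id"] for row in engine_rows}


mp_people = [
    {"mp_person_id": 1, "mp_household_id": 100, "mp_age": 40},
    {"mp_person_id": 2, "mp_household_id": 100, "mp_age": 12},
]
assignments = build_tax_units(mp_people, one_unit_per_household)
```

Injecting the constructor keeps the mapping layer testable on its own, so a schema mismatch fails in MP's adapter before it reaches the engine.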

## Blocked on

- microunit PR #3 (the package itself) merged, and ideally a tagged release or PyPI publish, so the dependency is stable.
- policyengine-us-data adoption PR landed and verified.

