The calibration matrix builder is over-attributing dollars for constrained amount targets whose native entity is not household.
This is most visible on total_self_employment_income, but the issue is broader: it affects any non-household amount target where constraints are evaluated at person level and then promoted to household before the final amount is assigned.
Current behavior:
- The builder first pre-aggregates the target variable to household.
- It evaluates the target constraints at person level.
- It then promotes the constraint result to household with a household-level any().
- Finally, it assigns the full household amount whenever any qualifying person is present.
That means non-qualifying dollars can leak into the target stratum.
Concrete example:
- Person A has total_self_employment_income = 10,000 and is in a filing tax unit.
- Person B has total_self_employment_income = 5,000 and is not in a filing tax unit.
- The constrained target is intended to represent filer-only self-employment income.
- The current matrix logic attributes 15,000 to that stratum instead of 10,000.
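The arithmetic in this example can be reproduced with a minimal sketch. It is not the builder's code: the variable names and the single-household toy data are assumptions made for illustration, and plain Python lists stand in for the arrays the builder works with.

```python
# Hypothetical toy data: one household containing the two people from the example.
person_household = [0, 0]      # household index of Person A and Person B
sei = [10_000.0, 5_000.0]      # total_self_employment_income per person
is_filer = [True, False]       # tax_unit_is_filer as seen by each person
n_households = 1

# Person-level constraint:
# tax_unit_is_filer == 1 and total_self_employment_income > 0
person_mask = [filer and amount > 0 for filer, amount in zip(is_filer, sei)]

# Aggregate first, mask later: sum every person's amount to the household,
# then keep the whole total if any member satisfies the constraint.
hh_total = [0.0] * n_households
hh_any = [False] * n_households
for hh, amount, ok in zip(person_household, sei, person_mask):
    hh_total[hh] += amount
    hh_any[hh] = hh_any[hh] or ok
current = [total if ok else 0.0 for total, ok in zip(hh_total, hh_any)]

# Mask first, aggregate later: only qualifying people contribute.
expected = [0.0] * n_households
for hh, amount, ok in zip(person_household, sei, person_mask):
    if ok:
        expected[hh] += amount

print(current)   # [15000.0]
print(expected)  # [10000.0]
```

The two results differ only when a household mixes qualifying and non-qualifying amounts; a household made up entirely of filers gives the same value both ways.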
Why this matters for total_self_employment_income:
- The DB target is semantically correct.
- The target row is constrained to tax_unit_is_filer == 1 and total_self_employment_income > 0.
- After compat registration, total_self_employment_income is a person-level variable.
- So the intended target is filer-only Schedule C / self-employment income, not total household self-employment income whenever one person qualifies.
Affected code paths:
In policyengine_us_data/calibration/unified_matrix_builder.py:
- _compute_single_state and _compute_single_state_group_counties pre-aggregate amount targets to household with map_to="household".
- _evaluate_constraints_standalone evaluates constraints at person level and promotes them to household with .any().
- _calculate_target_values_standalone multiplies the household total by that promoted household mask.
- The same aggregate-first / mask-later pattern is also duplicated in the build loop paths used for sequential and worker-based matrix construction.
Expected behavior:
- For constrained amount targets whose native entity is person, tax_unit, or spm_unit, qualifying amounts should be selected at the target’s native entity before rolling up to household for the matrix.
- In other words, only qualifying entity amounts should contribute to the target row.
- Household-native amount targets should keep their current household semantics.
- Count targets should keep their current entity-aware behavior.
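One way to picture the expected behavior is a mask-then-roll-up helper applied at the target's native entity. The function name rollup_qualifying_amounts and the toy tax-unit data below are hypothetical and are not part of unified_matrix_builder.py; this is a sketch under those assumptions, not a proposed implementation.

```python
def rollup_qualifying_amounts(amounts, qualifies, entity_household, n_households):
    """Sum the amounts of qualifying native entities into their households.

    amounts, qualifies and entity_household hold one entry per native entity
    (person, tax unit, or SPM unit); entity_household gives each entity's
    household index.
    """
    household_amounts = [0.0] * n_households
    for amount, ok, hh in zip(amounts, qualifies, entity_household):
        if ok:
            household_amounts[hh] += amount
    return household_amounts


# Hypothetical tax-unit data: household 0 holds two tax units, household 1 holds one.
tax_unit_household = [0, 0, 1]
tax_unit_amount = [4_000.0, 6_000.0, 2_500.0]
tax_unit_is_filer = [True, False, True]

result = rollup_qualifying_amounts(
    tax_unit_amount, tax_unit_is_filer, tax_unit_household, n_households=2
)
print(result)  # [4000.0, 2500.0]
```

Because the inputs hold one entry per tax unit rather than one per person, each qualifying tax unit's amount is counted once, and the non-filing tax unit in household 0 contributes nothing. For a household-native target each household is its own entity, so selecting at the native entity is the same as masking the household total.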
Impact:
- total_self_employment_income is overstated in constrained filer-only strata whenever a household mixes qualifying and non-qualifying self-employment income.
- The same leakage can affect other constrained amount targets with non-household native entities.
- This can distort calibration weights even when the target definitions in the DB are correct.
Acceptance criteria:
- Constrained non-household amount targets include only qualifying entity amounts.
- total_self_employment_income filer-only targets no longer pick up non-filer household members’ self-employment income.
- Household-native amount targets are unchanged.
- Count targets are unchanged.
- Regression tests cover at least:
  - a person-entity amount target with mixed qualifying / non-qualifying members in one household
  - a tax-unit amount target counted once per qualifying tax unit
  - a household-native amount target preserving current behavior