Calibration matrix builder leaks non-qualifying entity amounts into constrained non-household amount targets #763

@juaristi22

Description

The calibration matrix builder over-attributes dollars to constrained amount targets whose native entity is not household.

This is most visible on total_self_employment_income, but the issue is broader: it affects any non-household amount target where constraints are evaluated at person level and then promoted to household before the final amount is assigned.

Current behavior:

  • The builder first pre-aggregates the target variable to household.
  • It evaluates the target constraints at person level.
  • It then promotes the constraint result to household with a household-level any().
  • Finally, it assigns the full household amount whenever any qualifying person is present.

As a result, amounts belonging to non-qualifying people in a household leak into the target stratum whenever at least one other member of that household qualifies.

Concrete example:

  • Person A has total_self_employment_income = 10,000 and is in a filing tax unit.
  • Person B has total_self_employment_income = 5,000 and is not in a filing tax unit.
  • The constrained target is intended to represent filer-only self-employment income.
  • The current matrix logic attributes 15,000 to that stratum instead of 10,000.
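
The example above can be reproduced with a small standalone sketch. It does not call the actual builder: the data and the helper name `aggregate_first_mask_later` are hypothetical and only mimic the aggregate-first / mask-later order described under current behavior.

```python
from collections import defaultdict

# Hypothetical person-level records mirroring the example (not real data).
person_household = [0, 0]           # both people live in household 0
person_sei = [10_000.0, 5_000.0]    # total_self_employment_income
person_is_filer = [True, False]     # tax_unit_is_filer, broadcast to person


def aggregate_first_mask_later(household_ids, amounts, qualifies):
    """Mimic the current order: sum to household, then apply an any() mask."""
    household_total = defaultdict(float)
    household_any = defaultdict(bool)
    for hh, amount, ok in zip(household_ids, amounts, qualifies):
        household_total[hh] += amount                 # pre-aggregate to household
        household_any[hh] = household_any[hh] or ok   # promote constraint with any()
    # The full household amount is kept if any member qualifies.
    return {hh: household_total[hh] * household_any[hh] for hh in household_total}


# Constraint: tax_unit_is_filer == 1 and total_self_employment_income > 0
qualifies = [f and a > 0 for f, a in zip(person_is_filer, person_sei)]
leaky = aggregate_first_mask_later(person_household, person_sei, qualifies)
print(leaky)  # {0: 15000.0}
```

Person B's 5,000 is included because the mask is applied only after the household sum has already mixed both people's income.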

Why this matters for total_self_employment_income:

  • The DB target is semantically correct.
  • The target row is constrained to tax_unit_is_filer == 1 and total_self_employment_income > 0.
  • After compat registration, total_self_employment_income is a person-level variable.
  • So the intended target is filer-only Schedule C / self-employment income, not total household self-employment income whenever one person qualifies.

Affected code paths:

  • policyengine_us_data/calibration/unified_matrix_builder.py
  • _compute_single_state and _compute_single_state_group_counties pre-aggregate amount targets to household with map_to="household".
  • _evaluate_constraints_standalone evaluates constraints at person level and promotes them to household with .any().
  • _calculate_target_values_standalone multiplies the household total by that promoted household mask.
  • The same aggregate-first / mask-later pattern is also duplicated in the build loop paths used for sequential and worker-based matrix construction.

Expected behavior:

  • For constrained amount targets whose native entity is person, tax_unit, or spm_unit, qualifying amounts should be selected at the target’s native entity before rolling up to household for the matrix.
  • In other words, only qualifying entity amounts should contribute to the target row.
  • Household-native amount targets should keep their current household semantics.
  • Count targets should keep their current entity-aware behavior.
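
A minimal sketch of the expected order, assuming person-level inputs: the constraint is applied at the native entity first, and only the surviving amounts are summed to household. The helper name `mask_then_aggregate` and the sample records are hypothetical and are not part of `unified_matrix_builder.py`.

```python
from collections import defaultdict


def mask_then_aggregate(household_ids, amounts, qualifies):
    """Zero out non-qualifying entity amounts, then sum to household."""
    household_total = defaultdict(float)
    for hh, amount, ok in zip(household_ids, amounts, qualifies):
        household_total[hh] += amount if ok else 0.0
    return dict(household_total)


# Hypothetical records: household 0 mixes a filer and a non-filer,
# household 1 contains only a non-filer.
person_household = [0, 0, 1]
person_sei = [10_000.0, 5_000.0, 2_000.0]
person_is_filer = [True, False, False]

# Constraint: tax_unit_is_filer == 1 and total_self_employment_income > 0
qualifies = [f and a > 0 for f, a in zip(person_is_filer, person_sei)]
result = mask_then_aggregate(person_household, person_sei, qualifies)
print(result)  # {0: 10000.0, 1: 0.0}
```

With this order, household 0 contributes only the filer's 10,000, and a household with no qualifying member contributes zero.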

Impact:

  • total_self_employment_income is overstated in constrained filer-only strata whenever a household mixes qualifying and non-qualifying self-employment income.
  • The same leakage can affect other constrained amount targets with non-household native entities.
  • This can distort calibration weights even when the target definitions in the DB are correct.

Acceptance criteria:

  • Constrained non-household amount targets include only qualifying entity amounts.
  • total_self_employment_income filer-only targets no longer pick up non-filer household members’ self-employment income.
  • Household-native amount targets are unchanged.
  • Count targets are unchanged.
  • Regression tests cover at least:
    • a person-entity amount target with mixed qualifying / non-qualifying members in one household
    • a tax-unit amount target counted once per qualifying tax unit
    • a household-native amount target preserving current behavior
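
The regression cases could take roughly the following shape. This is only a sketch against a hypothetical helper, `select_then_roll_up`, with made-up amounts; real tests would exercise the builder's own functions and fixtures, which are not shown in this issue.

```python
from collections import defaultdict


def select_then_roll_up(household_ids, amounts, qualifies):
    """Keep only qualifying entity amounts, then sum them to household."""
    total = defaultdict(float)
    for hh, amount, ok in zip(household_ids, amounts, qualifies):
        total[hh] += amount if ok else 0.0
    return dict(total)


def test_tax_unit_amount_counted_once_per_qualifying_unit():
    # Household 0 holds two tax units; only the first one files.
    tax_unit_household = [0, 0]
    tax_unit_amount = [3_000.0, 4_000.0]  # hypothetical tax-unit-native amount
    tax_unit_is_filer = [True, False]
    result = select_then_roll_up(
        tax_unit_household, tax_unit_amount, tax_unit_is_filer
    )
    # The filing unit's 3,000 appears once; the non-filing unit's 4,000 does not.
    assert result == {0: 3_000.0}


def test_household_native_amount_unchanged():
    # household_qualifies stands in for the any()-promoted household mask,
    # so a household-native amount is kept or dropped as a whole.
    household_ids = [0, 1]
    household_amount = [1_200.0, 800.0]
    household_qualifies = [True, False]
    result = select_then_roll_up(
        household_ids, household_amount, household_qualifies
    )
    assert result == {0: 1_200.0, 1: 0.0}


if __name__ == "__main__":
    test_tax_unit_amount_counted_once_per_qualifying_unit()
    test_household_native_amount_unchanged()
```

Evaluating the tax-unit case on tax-unit rows, rather than on person rows carrying a broadcast value, is what keeps the amount from being counted once per member.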
