
Gate releases on input columns stuck at the engine default #286

Open
daphnehanse11 wants to merge 1 commit into main from degenerate-input-gate

Conversation

@daphnehanse11

Collaborator

Summary

Closes #257.

Adds a release gate that fails when any persisted PolicyEngine input column is constant at the engine's default value for every record. A default-valued column carries zero information — the H5 behaves identically without it — while looking populated to anyone inspecting the artifact. That is how the constant-40 weekly_hours_worked_before_lsr, the all-True takes_up_snap_if_eligible, and the all-CITIZEN ssn_card_type shipped in the current main-line release.
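The core check can be sketched roughly as follows. This is a minimal illustration, not the gate's actual code: it assumes columns arrive as NumPy arrays keyed by name and engine defaults as a plain dict, and `find_default_valued_columns`, its arguments, and the sample data are all hypothetical.

```python
import numpy as np


def find_default_valued_columns(columns, defaults):
    """Return names of columns whose every value equals the engine default.

    Hypothetical helper: `columns` maps column name -> 1-D array and
    `defaults` maps column name -> default value. Columns the engine does
    not know about (no entry in `defaults`) and empty columns are skipped.
    """
    degenerate = []
    for name, values in columns.items():
        if name not in defaults:
            continue  # not an engine input, so there is nothing to compare
        values = np.asarray(values)
        if values.size == 0:
            continue
        if np.all(values == defaults[name]):
            degenerate.append(name)
    return sorted(degenerate)


# Illustrative data only; real columns come from the release H5.
columns = {
    "weekly_hours_worked_before_lsr": np.full(5, 40.0),  # stuck at default
    "takes_up_snap_if_eligible": np.array([True, True, True]),  # stuck at default
    "employment_income": np.array([0.0, 52_000.0, 18_500.0]),  # carries signal
    "raw_asec_column": np.array([1, 1, 1]),  # unknown to the engine, skipped
}
defaults = {
    "weekly_hours_worked_before_lsr": 40.0,
    "takes_up_snap_if_eligible": True,
    "employment_income": 0.0,
}
flagged = find_default_valued_columns(columns, defaults)
```

Note that the comparison needs no reference artifact: a column is judged only against the engine default, so an equally degenerate parent release cannot mask it.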

What the sweep found

Running the gate against the current main-line release (populace-us-2024-f0af251-703bd81a565c) flags 22 degenerate columns, including two previously untracked: s_corp_income (constant zero) and spm_unit_tenure_type (constant RENTER).

Changes

Verification

  • Full populace-build + populace-frame suites pass (698 passed, 2 skipped), including 11 new gate tests, 2 new adapter tests, and the builder flow test.
  • Gate run against the real cached main-line release H5: passes with exactly the 22 seeded exclusions in effect, zero stale, zero dormant.
  • ruff check / ruff format clean on changed files.

Draft status

  • Run a full fiscal refresh build end-to-end to confirm the gate result lands in build_manifest.json / calibration_diagnostics.json as expected on a real run.
  • Decide whether the narrower fiscal-refresh base line surfaces additional offenders (its raw ASEC columns are unknown to the engine and skip automatically; its PE-facing surface is smaller).

🤖 Generated with Claude Code

The pre-#266 main-line release ships persisted PolicyEngine input
columns that are constant at the engine default for every record —
weekly_hours_worked_before_lsr at the 40-hour default, SNAP/TANF/SSI
take-up flags at True, spm_unit_tenure_type at RENTER, s_corp_income
at zero. Such columns carry zero information while looking populated,
which is how a failed or missing imputation ships silently. The
existing nonconstant_columns_gate only checks a hand-picked ACA
allowlist, and input_mass_parity_gate passes when the parent artifact
is equally degenerate — the constant-40 hours column had full mass in
every ancestor.

Add default_valued_columns_gate: a sweep over every persisted input
column that fails when all observed values equal the engine default,
with no reference artifact needed. Constant-but-not-default columns
pass and are reported (an intentional broadcast is a modeling choice).
Reviewed exclusions accept known degenerate columns with a recorded
reason; a stale exclusion (column now carries signal) fails so the
list cannot rot, while a dormant one (column absent from this release
line's surface) is only reported.

Expose engine defaults through PolicyEngineUSEngine.default_values —
input variables only, enum defaults normalized to their stored member
name — and wire the gate into the US fiscal refresh builder across all
entity tables, threading the result into the release gate failures,
calibration diagnostics, and build manifest alongside the health input
and input-mass gates. The builder seeds reviewed exclusions for the 20
known offenders, each naming its tracking issue; ssn_card_type and
immigration_status_str are deliberately not excluded — #266 imputes
them now, so a base where they are still constant skipped that stage
and should fail.
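
The enum normalization mentioned above might look like the sketch below. It assumes stored columns hold enum member names as strings; `TenureType` is a stand-in enum with made-up member values and `normalize_default` is a hypothetical helper, not the actual PolicyEngineUSEngine.default_values implementation.

```python
from enum import Enum


class TenureType(Enum):
    """Stand-in for an engine enum; the member values are illustrative."""

    OWNER = "Owner"
    RENTER = "Renter"


def normalize_default(value):
    """Map an enum default to its member name; pass other values through.

    Stored columns are assumed to hold member names such as "RENTER", so
    the default has to be in the same form before it can be compared.
    """
    return value.name if isinstance(value, Enum) else value


normalized_enum = normalize_default(TenureType.RENTER)
normalized_number = normalize_default(40)
```

Without this step an enum column would never compare equal to its default, and the gate would silently miss it.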

Closes #257

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@daphnehanse11 daphnehanse11 force-pushed the degenerate-input-gate branch from 3513ff2 to 6cf3072 Compare July 2, 2026 14:47
@daphnehanse11 daphnehanse11 marked this pull request as ready for review July 2, 2026 16:53
@daphnehanse11

Collaborator Author

@PavelMakarchuk context for review: verified against the current main-line release artifact (populace-us-2024-f0af251) — the sweep checks 178 input columns and catches 22 stuck at their engine default, all seeded as reviewed exclusions with tracking issues. Two of those were previously unknown: s_corp_income constant zero (possibly relevant to the #212 income shortfall) and spm_unit_tenure_type constant RENTER (SNAP shelter deductions).

One design choice worth your judgment: a stale exclusion (column carries signal now, e.g. SSN after #266 lands in a new base) fails the build until the exclusion is removed — strict by design so the list can't rot, but it turns an upstream fix into a red build for one cycle. Happy to soften to a warning if you'd rather.
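
To make the trade-off concrete, here is a rough sketch of how exclusions could be classified and how a strict-versus-warning switch would change the outcome. Every name here (`classify_exclusions`, `evaluate_gate`, `strict_stale`) is hypothetical and the exclusion reasons are placeholders; it illustrates the behavior described above rather than the code in this PR.

```python
def classify_exclusions(exclusions, present_columns, degenerate_columns):
    """Sort reviewed exclusions into active, stale, and dormant.

    active:  column is present and still stuck at its default
    stale:   column is present but now carries signal
    dormant: column is absent from this release line's surface
    """
    status = {"active": [], "stale": [], "dormant": []}
    for name in sorted(exclusions):
        if name not in present_columns:
            status["dormant"].append(name)
        elif name in degenerate_columns:
            status["active"].append(name)
        else:
            status["stale"].append(name)
    return status


def evaluate_gate(exclusions, present_columns, degenerate_columns, strict_stale=True):
    """Return (passed, status, unreviewed) for one release."""
    status = classify_exclusions(exclusions, present_columns, degenerate_columns)
    # Degenerate columns nobody has reviewed always fail the gate.
    unreviewed = sorted(set(degenerate_columns) - set(exclusions))
    passed = not unreviewed and not (strict_stale and status["stale"])
    return passed, status, unreviewed


# Placeholder reasons; real entries would name their tracking issue.
exclusions = {
    "s_corp_income": "placeholder reason",
    "ssn_card_type": "placeholder reason",
    "column_not_in_this_line": "placeholder reason",
}
present = {"s_corp_income", "ssn_card_type", "employment_income"}
degenerate = {"s_corp_income"}  # ssn_card_type now carries signal

strict_result = evaluate_gate(exclusions, present, degenerate, strict_stale=True)
lenient_result = evaluate_gate(exclusions, present, degenerate, strict_stale=False)
```

In this example the strict run fails solely because of the stale `ssn_card_type` entry, while the lenient run passes and leaves the stale entry in the report; the dormant entry is reported either way.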


Development

Successfully merging this pull request may close these issues.

Gate releases on degenerate PE input columns (constant at the engine default)
