Gate releases on input columns stuck at the engine default #286
daphnehanse11 wants to merge 1 commit into
The pre-#266 main-line release ships persisted PolicyEngine input columns that are constant at the engine default for every record: weekly_hours_worked_before_lsr at the 40-hour default, SNAP/TANF/SSI take-up flags at True, spm_unit_tenure_type at RENTER, s_corp_income at zero. Such columns carry zero information while looking populated, which is how a failed or missing imputation ships silently. The existing nonconstant_columns_gate only checks a hand-picked ACA allowlist, and input_mass_parity_gate passes when the parent artifact is equally degenerate (the constant-40 hours column had full mass in every ancestor).

Add default_valued_columns_gate: a sweep over every persisted input column that fails when all observed values equal the engine default, with no reference artifact needed. Constant-but-not-default columns pass and are reported (an intentional broadcast is a modeling choice). Reviewed exclusions accept known degenerate columns with a recorded reason; a stale exclusion (column now carries signal) fails so the list cannot rot, while a dormant one (column absent from this release line's surface) is only reported.

Expose engine defaults through PolicyEngineUSEngine.default_values (input variables only, enum defaults normalized to their stored member name) and wire the gate into the US fiscal refresh builder across all entity tables, threading the result into the release gate failures, calibration diagnostics, and build manifest alongside the health input and input-mass gates. The builder seeds reviewed exclusions for the 20 known offenders, each naming its tracking issue; ssn_card_type and immigration_status_str are deliberately not excluded: #266 imputes them now, so a base where they are still constant skipped that stage and should fail.

Closes #257

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
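The sweep described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in gates.py: the function names `classify_column`, `_matches_default`, and `sweep` are hypothetical, the `example_*` columns are made up, columns are plain Python lists rather than the real entity tables, and skipping columns with no known default is an assumption.

```python
def _matches_default(value, default):
    # A bool never matches a numeric default (and vice versa), so that
    # True == 1 cannot excuse a stuck flag.
    if isinstance(value, bool) != isinstance(default, bool):
        return False
    return value == default


def classify_column(values, default):
    """Classify one column against its engine default."""
    values = list(values)
    if not values:
        return "empty"  # assumption: an empty column is not treated as stuck
    first = values[0]
    if any(v != first for v in values):
        return "varying"
    if _matches_default(first, default):
        return "default_valued"  # constant at the engine default: gate failure
    return "constant_not_default"  # intentional broadcast: passes, reported


def sweep(columns, defaults):
    """Classify every column that has a known engine default."""
    return {
        name: classify_column(values, defaults[name])
        for name, values in columns.items()
        if name in defaults  # assumption: columns without a default are skipped
    }


# Defaults for the first three columns are the ones named in the commit
# message; the example_* columns are hypothetical.
columns = {
    "weekly_hours_worked_before_lsr": [40.0, 40.0, 40.0],
    "takes_up_snap_if_eligible": [True, True, True],
    "spm_unit_tenure_type": ["RENTER", "RENTER", "RENTER"],
    "example_varying_column": [0.0, 52_000.0, 18_500.0],
    "example_broadcast_column": [6, 6, 6],
}
defaults = {
    "weekly_hours_worked_before_lsr": 40.0,
    "takes_up_snap_if_eligible": True,
    "spm_unit_tenure_type": "RENTER",
    "example_varying_column": 0.0,
    "example_broadcast_column": 0,
}

report = sweep(columns, defaults)
failures = sorted(name for name, status in report.items() if status == "default_valued")
print(failures)
```

No reference artifact appears anywhere in the sketch: each column is judged only against its own default, which is why an equally degenerate parent cannot mask the failure.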
3513ff2 to 6cf3072
@PavelMakarchuk context for review: verified against the current main-line release artifact (

One design choice worth your judgment: a stale exclusion (the column carries signal now, e.g. SSN after #266 lands in a new base) fails the build until the exclusion is removed. That is strict by design so the list can't rot, but it turns an upstream fix into a red build for one cycle. Happy to soften to a warning if you'd rather.
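To make the trade-off concrete, here is a rough sketch of how reviewed exclusions might be sorted into accepted, stale, and dormant, with a switch for the softer behaviour. The function `review_exclusions`, its parameters, and the `stale_is_failure` flag are hypothetical, as is the `example_absent_column` entry; treating any excluded column that is present but no longer stuck as stale is also an assumption.

```python
def review_exclusions(exclusions, stuck_columns, present_columns, stale_is_failure=True):
    """Sort reviewed exclusions into failures, warnings, and informational notes.

    exclusions: column name -> recorded reason (e.g. a tracking issue)
    stuck_columns: columns found constant at the engine default
    present_columns: columns persisted on this release line's surface
    """
    failures, warnings, notes = [], [], []
    for column, reason in sorted(exclusions.items()):
        if column not in present_columns:
            # Dormant: the column is absent from this release line, so only report.
            notes.append(f"dormant exclusion: {column} ({reason})")
        elif column in stuck_columns:
            # Accepted: still degenerate, and the debt is on record.
            notes.append(f"accepted exclusion: {column} ({reason})")
        else:
            # Stale: the column carries signal now, so the exclusion should go.
            message = f"stale exclusion: {column} now carries signal; remove it"
            (failures if stale_is_failure else warnings).append(message)
    return failures, warnings, notes


exclusions = {
    "weekly_hours_worked_before_lsr": "#242",
    "ssn_card_type": "#225",
    "example_absent_column": "hypothetical reason",
}
present = {"weekly_hours_worked_before_lsr", "ssn_card_type"}
stuck = {"weekly_hours_worked_before_lsr"}  # SSN now imputed in this scenario

strict = review_exclusions(exclusions, stuck, present)
soft = review_exclusions(exclusions, stuck, present, stale_is_failure=False)
print(strict[0])
print(soft[1])
```

In the strict mode the stale SSN exclusion lands in the failures list and turns the build red for one cycle; in the soft mode the same message is only a warning, and nothing forces the list to be cleaned up.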
Summary
Closes #257.
Adds a release gate that fails when any persisted PolicyEngine input column is constant at the engine's default value for every record. A default-valued column carries zero information (the H5 behaves identically without it) while looking populated to anyone inspecting the artifact. That is how the constant-40 weekly_hours_worked_before_lsr, the all-True takes_up_snap_if_eligible, and the all-CITIZEN ssn_card_type shipped in the current main-line release.

What the sweep found
Running the gate against the current main-line release (populace-us-2024-f0af251-703bd81a565c) flags 22 degenerate columns, including two previously untracked:

- s_corp_income constant zero (combined partnership/S-corp income carried in partnership_income in pre-PUF-support bases), possibly relevant to "Cross-check: Populace US 2024 net income & tax ~37–43% below Enhanced CPS (benefits match)" #212
- spm_unit_tenure_type constant RENTER, which misstates SNAP shelter deductions and SPM housing

Changes
- gates.py: default_valued_columns_gate sweeps every column it is handed against per-column engine defaults. Constant-but-not-default columns pass and are reported (intentional broadcasts are modeling choices, not masked imputations). Bool columns never match numeric defaults (True == 1 must not excuse a stuck flag). Reviewed exclusions accept known offenders with a recorded reason: a stale exclusion (column now carries signal) fails the gate so the list cannot rot; a dormant one (column absent from this release line's surface) is only reported, since different release lines persist different column sets.
- default_values(names): engine defaults for non-formula input variables, enum defaults normalized to their stored member name; unknown and formula-owned names silently omitted so callers can pass a whole export surface.
- _degenerate_input_signal_gate sweeps all entity tables (structural columns excluded), threaded into release gate failures, calibration diagnostics, and the build manifest alongside the health-input gate. Seeds US_DEGENERATE_INPUT_REVIEWED_EXCLUSIONS with the 22 known offenders, each naming its tracking issue ("Carry hours-worked inputs through Populace US outputs" #242 / "SNAP OBBBA work requirements: restore data inputs needed for household exposure analysis" #248: hours; "Carry SNAP reported amounts and take-up inputs through Populace US outputs" #243: SNAP take-up; "Dataset imputes 100% of the population as citizens with valid SSNs (breaks SSN/citizenship-conditioned policies, e.g. OBBBA CTC)" #225: SSN/citizenship; "Track SPM-specific input gaps: housing, WIC, school meals, child support, workers comp, and expense deductions" #32: tenure/WIC; "Use formula-constrained leaf imputation for deduction inputs" #186: QBI flags; "Track health input readiness for upcoming PolicyEngine-US model improvements" #98: health take-up), so existing releases keep building while any new degenerate column fails loudly, and the exclusion list is the visible debt register.

Verification
- populace-build + populace-frame suites pass (698 passed, 2 skipped), including 11 new gate tests, 2 new adapter tests, and the builder flow test.
- ruff check / ruff format clean on changed files.

Draft status
build_manifest.json / calibration_diagnostics.json as expected on a real run.

🤖 Generated with Claude Code