fix(napkin-math): saturated-gate note + DOOM severity wording + MARGINAL band bucketing #723
Merged
…ssing inputs ranking

ChatGPT v45 review: a downstream consumer reading the "Missing inputs ranked by impact" table sees inputs targeting one set of gates (whichever non-saturated gates have the largest sensitivity) while the worst declared gate is the SATURATED DOOM gate, which is absent from the ranking entirely (no input is its worst-affected gate, by construction, because no quartile movement shifts a saturated failure). When the consumer asks 'why are we ranking inputs for the second-worst gate?' the answer is structural, not a bug, but the table doesn't say so. Adding an explicit note removes the ambiguity:

> Note: the saturated DOOM gate `sponsor_profitability_window_margin_days` is absent from the ranking because no single missing-input restriction can lift its pass rate under current bounds. The inputs below target the next most decision-relevant non-saturated gates; the saturated gate needs a bounds or threshold-definition audit, not a single input fix.

New `saturated_doom_gates(mc, params)` helper detects them using the existing `is_saturated_failure` rule. `render_missing_inputs_ranked()` now takes `params` so it can run that detection; the note handles singular/plural grammar and lists all saturated gates by id. Section is omitted entirely when no `missing_value_priority` exists (unchanged).

Pure rendering; no schema bump. Smoke 9/9, unit 50/50.
…+ MARGINAL bucketing

ChatGPT v44 review: two wording fixes.

Suggested next actions item #1 previously said 'N gate(s) currently fail at the 50% pass-rate bar' regardless of whether the worst pass rate was 0% or 49%. That phrasing understated DOOM failures: '1 gate fails at the 50% bar' reads identically whether the gate is FRAGILE-48% or DOOM-0%. Now distinguishes DOOM vs FRAGILE counts and names the worst gate by id + pass rate: '1 declared gate in the DOOM band. Worst: sponsor_profitability_window_margin_days at 0.0% pass rate under current bounds.' (or '2 in the DOOM band; 3 in the FRAGILE band. Worst: ... at X.X% pass rate.' for mixed cases).

Decision implications MARGINAL wording was 'close enough to coin-flip' across the full 50-80% band. At 79.8% that reads as a misdiagnosis: the gate is one slip from ROBUST, not coin-flip. Bucketed at 70%: at-or-above 70% uses 'just below the ROBUST band. The gate passes in most runs, but downstream commitments should not treat it as secure.'; below 70% keeps the 'close to coin-flip' framing.

No schema bump (manifest unchanged; pure rendering). Smoke 9/9, unit 50/50.
Summary
Three rendering fixes to `summarize_assessment.py`, bundled together because ChatGPT's v46 review caught that the v46 assessment regressed two earlier wording fixes (those fixes lived on an unmerged branch, PR #722, so the v46 branch from `main` lacked them). Folding them in here so the next assessment iteration has all three.

1. Suggested next actions item #1: sharper severity
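Distinguishes DOOM count from FRAGILE count and names the worst gate by id + pass rate:

> 1 declared gate in the DOOM band. Worst: sponsor_profitability_window_margin_days at 0.0% pass rate under current bounds.
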
(or, for mixed cases: "2 declared gates in the DOOM band; 3 in the FRAGILE band. Worst: ... at X.X% pass rate.")
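
A minimal sketch of how the severity line could be assembled, for illustration only. The `Gate` tuple, `severity_line`, and the assumption that each gate already carries its band label are hypothetical; the actual code in `summarize_assessment.py` is not shown in this PR description and may be structured differently.

```python
from __future__ import annotations

from typing import NamedTuple


class Gate(NamedTuple):
    """Hypothetical gate record; the real data structure is not shown here."""
    gate_id: str
    pass_rate: float  # fraction in [0, 1]
    band: str         # e.g. "DOOM" or "FRAGILE", assumed to be assigned upstream


def _count_phrase(n: int, band: str, with_declared: bool) -> str:
    # Only the first clause spells out "declared gate(s)"; later clauses
    # shorten to "3 in the FRAGILE band", matching the mixed-case wording.
    noun = "gate" if n == 1 else "gates"
    prefix = f"{n} declared {noun}" if with_declared else str(n)
    return f"{prefix} in the {band} band"


def severity_line(gates: list[Gate]) -> str:
    """Build the 'Suggested next actions' severity sentence."""
    failing = [g for g in gates if g.band in ("DOOM", "FRAGILE")]
    if not failing:
        return ""
    parts: list[str] = []
    for band in ("DOOM", "FRAGILE"):
        n = sum(1 for g in failing if g.band == band)
        if n:
            parts.append(_count_phrase(n, band, with_declared=not parts))
    counts = "; ".join(parts)
    worst = min(failing, key=lambda g: g.pass_rate)
    return (
        f"{counts}. Worst: {worst.gate_id} at "
        f"{worst.pass_rate * 100:.1f}% pass rate under current bounds."
    )
```

Taking the band as an input keeps the sketch independent of the DOOM/FRAGILE boundary, which this description does not state.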
2. Decision implications MARGINAL bucketing
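MARGINAL spans 50–80%. Bucketed at 70%:

- At or above 70%: "just below the ROBUST band. The gate passes in most runs, but downstream commitments should not treat it as secure."
- Below 70%: keeps the "close to coin-flip" framing.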
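
The 70% split inside the MARGINAL band can be sketched roughly as follows. The function name, the constants, and the choice of inclusive/exclusive band edges are assumptions made for illustration; the below-70% return value is a placeholder for the existing "close to coin-flip" sentence, whose exact text is not quoted in this PR.

```python
MARGINAL_FLOOR = 0.50      # lower edge of MARGINAL (assumed inclusive)
ROBUST_FLOOR = 0.80        # upper edge of MARGINAL (assumed exclusive)
NEAR_ROBUST_CUTOFF = 0.70  # the new bucketing threshold


def marginal_wording(pass_rate: float) -> str:
    """Pick the Decision implications wording for a MARGINAL gate."""
    if not (MARGINAL_FLOOR <= pass_rate < ROBUST_FLOOR):
        raise ValueError(f"pass rate {pass_rate!r} is outside the MARGINAL band")
    if pass_rate >= NEAR_ROBUST_CUTOFF:
        return (
            "just below the ROBUST band. The gate passes in most runs, "
            "but downstream commitments should not treat it as secure."
        )
    # Placeholder for the unchanged lower-bucket wording.
    return "close to coin-flip"
```

With this split, a gate at 79.8% gets the "just below the ROBUST band" wording, while one at 55% keeps the coin-flip framing.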
3. Missing inputs ranking: explain why saturated DOOM gates are absent
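When `saturated_doom_gates(mc, params)` detects any saturated DOOM gate (DOOM band with no quartile sensitivity), a note now appears above the table:

> Note: the saturated DOOM gate `sponsor_profitability_window_margin_days` is absent from the ranking because no single missing-input restriction can lift its pass rate under current bounds. The inputs below target the next most decision-relevant non-saturated gates; the saturated gate needs a bounds or threshold-definition audit, not a single input fix.

Singular/plural handled. Section unchanged when no saturated DOOM gate exists.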
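
As a rough illustration of the singular/plural handling, the note could be rendered from a list of saturated gate ids like this. `saturated_note` is a hypothetical name, and the plural wording is an assumption extrapolated from the singular note; only the singular form is quoted in this PR. Detection itself (`saturated_doom_gates`, `is_saturated_failure`) is not reproduced here.

```python
from __future__ import annotations


def saturated_note(gate_ids: list[str]) -> str:
    """Render the note shown above the missing-inputs table.

    Returns an empty string when there are no saturated DOOM gates,
    so the section stays unchanged in that case.
    """
    if not gate_ids:
        return ""
    ids = ", ".join(f"`{g}`" for g in gate_ids)
    if len(gate_ids) == 1:
        noun, verb, poss, need = "gate", "is", "its pass rate", "needs"
    else:
        # Assumed plural phrasing; the real wording may differ.
        noun, verb, poss, need = "gates", "are", "their pass rates", "need"
    return (
        f"Note: the saturated DOOM {noun} {ids} {verb} absent from the ranking "
        f"because no single missing-input restriction can lift {poss} under "
        f"current bounds. The inputs below target the next most decision-relevant "
        f"non-saturated gates; the saturated {noun} {need} a bounds or "
        f"threshold-definition audit, not a single input fix."
    )
```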
Test plan
Pure rendering — no schema bump, no manifest field change. PR #722 (the older branch with only the first two fixes) will be closed as superseded.
🤖 Generated with Claude Code