feat(napkin-math): sharper severity wording in Suggested next actions + MARGINAL bucketing #722
Closed
neoneye wants to merge 1 commit into
…+ MARGINAL bucketing

ChatGPT v44 review: two wording fixes.

Suggested next actions item #1 previously said 'N gate(s) currently fail at the 50% pass-rate bar' regardless of whether the worst pass rate was 0% or 49%. That phrasing understated DOOM failures: '1 gate fails at the 50% bar' reads identically whether the gate is FRAGILE-48% or DOOM-0%. It now distinguishes DOOM vs FRAGILE counts and names the worst gate by id + pass rate: '1 declared gate in the DOOM band. Worst: sponsor_profitability_window_margin_days at 0.0% pass rate under current bounds.' (or '2 in the DOOM band; 3 in the FRAGILE band. Worst: ... at X.X% pass rate.' for mixed cases).

Decision implications MARGINAL wording was 'close enough to coin-flip' across the full 50-80% band. At 79.8% that reads as a misdiagnosis: the gate is one slip from ROBUST, not coin-flip. Bucketed at 70%: at-or-above 70% uses 'just below the ROBUST band. The gate passes in most runs, but downstream commitments should not treat it as secure.'; below 70% keeps the 'close to coin-flip' framing.

No schema bump (manifest unchanged; pure rendering). Smoke 9/9, unit 50/50.
Superseded by #723, which bundles these two wording fixes with the v45 saturated-gate-exclusion explainer.
Summary
ChatGPT review of the v44 casino_royale assessment flagged two wording issues, both in summarize_assessment.py, both pure rendering, no schema change.
1. Suggested next actions item #1 understated DOOM
Old: "N gate(s) currently fail at the 50% pass-rate bar"
That phrasing reads identically whether the worst gate is FRAGILE at 49% or DOOM at 0%. ChatGPT: "the worst gate has 0.0% pass rate, which is not just 'below 50%'; it is a structural failure under current bounds."
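New (mixed example): "2 in the DOOM band; 3 in the FRAGILE band. Worst: ... at X.X% pass rate."
New (single-DOOM example, matches casino_royale v44): "1 declared gate in the DOOM band. Worst: sponsor_profitability_window_margin_days at 0.0% pass rate under current bounds."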
The summary distinguishes DOOM count from FRAGILE count and names the worst gate by id and pass rate.
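A minimal sketch of how such a summary line could be rendered, not the actual code in summarize_assessment.py. The `Gate` dataclass, its field names, and `summarize_failing_gates` are hypothetical; the band label is assumed to be assigned upstream, because the DOOM/FRAGILE cutoff is not stated here. The mixed-case wording follows the abbreviated example above, with the "under current bounds" suffix assumed to carry over.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gate:
    # Hypothetical record; the real manifest fields may differ.
    gate_id: str
    pass_rate: float  # fraction in [0, 1]
    band: str  # "DOOM", "FRAGILE", "MARGINAL" or "ROBUST", assigned upstream


def summarize_failing_gates(gates: List[Gate]) -> Optional[str]:
    """Count DOOM and FRAGILE gates separately and name the worst one."""
    counts = {band: sum(1 for g in gates if g.band == band)
              for band in ("DOOM", "FRAGILE")}
    failing = [g for g in gates if g.band in counts]
    if not failing:
        return None  # nothing to report for this item

    # The worst gate is the failing gate with the lowest pass rate.
    worst = min(failing, key=lambda g: g.pass_rate)

    present = [(band, n) for band, n in counts.items() if n]
    if len(present) == 1:
        band, n = present[0]
        noun = "gate" if n == 1 else "gates"
        head = f"{n} declared {noun} in the {band} band"
    else:
        # Mixed case: DOOM count first, then FRAGILE.
        head = "; ".join(f"{n} in the {band} band" for band, n in present)

    return (f"{head}. Worst: {worst.gate_id} at "
            f"{worst.pass_rate:.1%} pass rate under current bounds.")
```

Keeping the two counts separate is what stops a 0% gate and a 48% gate from rendering as the same sentence.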
2. MARGINAL "coin-flip" wording over-fired at 79.8%
The old Decision implications wording for MARGINAL was a single template, "close enough to coin-flip", applied across the whole band.
ChatGPT: "At 79.8%, it barely misses ROBUST. Calling it 'close enough to coin-flip' is too harsh. It is not coin-flip; it is near the ROBUST threshold."
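The MARGINAL band is 50–80%. A 51% pass rate is genuinely coin-flip; a 79% pass rate is one slip from ROBUST. The wording is now bucketed at 70%: at or above 70% the row reads "just below the ROBUST band. The gate passes in most runs, but downstream commitments should not treat it as secure."; below 70% it keeps the "close to coin-flip" framing.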
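The 70% split can be sketched as a small lookup. This is an illustration under stated assumptions, not the code in summarize_assessment.py: the function name and constants are hypothetical, the band is assumed to include 50% and exclude 80%, and the below-70% text is only the fragment quoted in this PR, not the full template.

```python
MARGINAL_FLOOR = 0.50     # lower edge of MARGINAL (inclusive; assumed)
ROBUST_FLOOR = 0.80       # lower edge of ROBUST (assumed exclusive for MARGINAL)
NEAR_ROBUST_FLOOR = 0.70  # bucketing point introduced by this change


def marginal_implication(pass_rate: float) -> str:
    """Pick the Decision implications wording for a MARGINAL gate."""
    if not MARGINAL_FLOOR <= pass_rate < ROBUST_FLOOR:
        raise ValueError(f"pass rate {pass_rate:.1%} is outside the MARGINAL band")
    if pass_rate >= NEAR_ROBUST_FLOOR:
        # Upper bucket: the gate is close to ROBUST, not coin-flip.
        return ("just below the ROBUST band. The gate passes in most runs, "
                "but downstream commitments should not treat it as secure.")
    # Lower bucket keeps the original framing (fragment only; the full
    # sentence is not shown in this PR).
    return "close enough to coin-flip"
```

With these assumptions, a 79.8% gate falls in the upper bucket and a 51% gate keeps the coin-flip wording.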
Test plan
"1 declared gate in the DOOM band. Worst: sponsor_profitability_window_margin_days at 0.0% pass rate under current bounds." and the 79.8% AML-adjusted NOI gate's Decision implications row now reads "just below the ROBUST band. The gate passes in most runs, but downstream commitments should not treat it as secure."
🤖 Generated with Claude Code