fix(summarize-insights): drop reader-engagement filler, reframe audience as downstream AI by neoneye · Pull Request #708 · PlanExeOrg/PlanExe

neoneye · 2026-05-16T15:29:11Z

Summary

The user pointed out that the line "If you read nothing else, read this." in insights.md is wasted tokens:

The planexe report is big, and humans spend on average 7 seconds before they navigate away. It's for AI consumption, not humans. So the "If you read nothing else, read this." is wasting tokens.

The structural markers (## Bad news first, ### Likely deal-breakers, the verdict labels) already do the work that prefix tried to do. The reader-hook framing came from when we were optimising for a project-manager audience; the actual primary consumer is downstream AI.

What changed

summarize_insights.py: removed the prefix from render_bad_news_first. The substantive sentence that explains what items are in the section stays:

Every item below is a signal the plan does not survive its own assumptions.
Items are ordered by severity.
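The change is easier to see as code. What follows is a minimal sketch of a section renderer with the prefix gone. It is not the actual render_bad_news_first from summarize_insights.py, whose signature is not shown in this PR. The name render_bad_news_first_sketch, the (severity, text) item shape, and the bullet formatting are assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical sketch; the real render_bad_news_first in summarize_insights.py may differ.
HEADING = "## Bad news first"
# The substantive explanation stays: it says what the items are and how they are ordered.
INTRO = (
    "Every item below is a signal the plan does not survive its own assumptions.\n"
    "Items are ordered by severity."
)


def render_bad_news_first_sketch(items: list[tuple[int, str]]) -> str:
    """Render heading, intro, then items by descending severity (higher int = more severe)."""
    lines = [HEADING, "", INTRO, ""]
    # No reader-engagement prefix: the heading already marks the section as the one to read.
    for _severity, text in sorted(items, key=lambda item: item[0], reverse=True):
        lines.append(f"- {text}")
    return "\n".join(lines) + "\n"
```

The heading and the two explanatory sentences are the only framing. Nothing in the output exists just to motivate the reader to keep going.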

summarize-insights/SKILL.md: new ## Audience and tone section codifying the principle so it survives future edits:

No reader-engagement prefixes
No filler sentences whose only job is to motivate the next sentence
Keep substantive explanations (what a verdict label means, what a column shows) — those are signal, not filler
Don't apologise for the bad news

The no-sycophancy rule also had "the reader" changed to "the downstream consumer" to match the audience reframe.

Test plan

All three reference plans (Nuuk v31, Cross-Border Rail v33, Faraday v33) regenerate clean insights.md files with the prefix gone
Smoke 8/8 (tests/run_smoke.py)
Unittest 45/45 (tests/test_run_monte_carlo.py)
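The "prefix gone" check in the test plan can be automated. This is a minimal sketch. It assumes the regenerated reference plans sit under one directory tree and that each report is named insights.md. files_with_prefix is a hypothetical helper and is not part of tests/run_smoke.py.

```python
from __future__ import annotations

from pathlib import Path

# The reader-engagement prefix this PR removed from render_bad_news_first.
REMOVED_PREFIX = "If you read nothing else, read this."


def files_with_prefix(root: Path) -> list[Path]:
    """Return every insights.md under `root` (recursively) that still contains the prefix."""
    return sorted(
        path
        for path in root.rglob("insights.md")
        if REMOVED_PREFIX in path.read_text(encoding="utf-8")
    )
```

Run against the directory holding the regenerated Nuuk, Cross-Border Rail and Faraday outputs, it should return an empty list. A non-empty result names the files that still carry the hook.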

🤖 Generated with Claude Code

…nce as downstream AI

'If you read nothing else, read this.' was a reader hook for humans, but the primary consumer of insights.md is downstream AI (another agent, a planning loop, a follow-on extractor), so token density of useful signal matters more than engagement. The structural markers (## Bad news first, ### Likely deal-breakers, verdict labels) already do the work the prefix tried to do; restating it in prose burns tokens.

Removed the prefix from render_bad_news_first. The substantive sentences ('Every item below is a signal the plan does not survive its own assumptions. Items are ordered by severity.') stay because they explain what's in the section.

SKILL.md gains a new 'Audience and tone' section codifying the principle: no reader-engagement prefixes, no filler sentences whose only job is to motivate the next sentence, keep substantive explanations (signal, not filler). Replaced 'reader' with 'downstream consumer' in the no-sycophancy rule to reflect the audience reframe.

Verified: all three reference plans regenerate insights.md cleanly with the prefix gone. Smoke 8/8, unittest 45/45.

ChatGPT's review of v33 raised 15 items; this commit ships the five 'quick win' ones that fit on top of the existing per-run state. All five are runner-side analyses plus matching insights.md sections: no schema changes, no LLM prompt edits.

§14 binding-gate frequency tracking. For every min() aggregate, the runner records which dependency provided the min value in each run, then aggregates conditional on the aggregate failing its threshold. Faraday demonstration: the weakest_program_gate fails in 9,826 of 10,000 runs; mil_std_cert_funding is the binder in 67% of those, cash_flow_trigger in 32%, inventory_overhang in 0.5%. That tells the reader which sub-gate to fix first; the previous output only knew it failed.

§7 quartile pass-rates. For each threshold × driver, P(threshold passes | driver in bottom quartile) vs P(threshold passes | driver in top quartile). The delta in percentage points is much more actionable than Pearson r: 'P(coverage 99%) goes from 18% in worst-quartile satellite-failure runs to 74% in best-quartile' is a directly usable lever.

§13 required-input thresholds. For each FAILING gate (P < 80%), find the input-bound restriction that would lift conditional pass rate to >= 80%. Empty list means no single-input restriction is enough. Faraday's weakest_program_gate gets an empty list, correctly diagnosing it as structurally unreachable.

§8 missing-value priority. Rank missing_values entries by |delta_pp on worst gate| * (1 - pass_prob) * bound_width_ratio. The highest-scoring entries are the ones most worth replacing with real data instead of an assumed range.

§10 model confidence grades. Per output, grade HIGH/MEDIUM/LOW based on the fraction of upstream input bounds anchored in 'data' vs 'assumption' and the average bound-width-to-base ratio. Cutoffs: data >= 70% AND width < 0.5 -> HIGH; data < 30% OR width > 1.5 -> LOW; else MEDIUM. The reasons array names the specific evidence.

Five new render functions in summarize_insights.py emit these as separate sections after the existing verdict table. Five new unittest.TestCase methods (TestNewAnalysisBlocks) cover each block end-to-end against a small synthetic fixture. Smoke 8/8, unittest 50/50. Reference runs regenerated for Nuuk, Cross-Border Rail, Faraday, India Census.
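The three sample-conditional analyses (§14, §7, §13) can be sketched over plain per-run samples. This is a minimal sketch, not the runner's implementation. It assumes each Monte Carlo run is a dict from variable name to value, and that a gate passes when its value is >= its threshold (or when a given predicate holds). The function names, the decile cutoff search in required_input_thresholds, and the min_runs floor are all hypothetical choices.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Run = Dict[str, float]  # one Monte Carlo sample: variable name -> value
Predicate = Callable[[Run], bool]  # does a given gate pass in this run?


def binding_gate_frequency(runs: List[Run], dependencies: List[str], threshold: float) -> Dict[str, float]:
    """§14: for a min() aggregate over `dependencies`, return the share of failing runs
    in which each dependency supplied the minimum (ties go to the first listed)."""
    binders: Counter = Counter()
    failures = 0
    for run in runs:
        binder = min(dependencies, key=lambda name: run[name])
        if run[binder] < threshold:  # assumption: the aggregate passes at >= threshold
            failures += 1
            binders[binder] += 1
    return {name: count / failures for name, count in binders.items()} if failures else {}


def quartile_pass_rates(runs: List[Run], driver: str, passes: Predicate) -> Tuple[float, float, float]:
    """§7: pass rate (%) with the driver in its bottom vs top quartile, and the delta in pp."""
    ordered = sorted(runs, key=lambda run: run[driver])
    k = max(1, len(ordered) // 4)
    bottom, top = ordered[:k], ordered[-k:]
    p_bottom = 100.0 * sum(passes(run) for run in bottom) / k
    p_top = 100.0 * sum(passes(run) for run in top) / k
    return p_bottom, p_top, p_top - p_bottom


def required_input_thresholds(
    runs: List[Run],
    inputs: List[str],
    passes: Predicate,
    target: float = 0.80,
    min_runs: int = 20,
) -> List[Tuple[str, str, float, float]]:
    """§13: for a failing gate, find single-input restrictions (input, op, cutoff, rate)
    that lift the conditional pass rate to >= target. Empty list: none is enough."""
    if sum(passes(run) for run in runs) / len(runs) >= target:
        return []  # the gate is not failing
    found = []
    for name in inputs:
        values = sorted(run[name] for run in runs)
        best: Optional[Tuple[str, str, float, float]] = None
        best_kept = 0
        for i in range(1, 10):  # candidate cutoffs at the sample deciles
            cutoff = values[i * len(values) // 10]
            for op in (">=", "<="):
                kept = [run for run in runs if (run[name] >= cutoff if op == ">=" else run[name] <= cutoff)]
                if len(kept) < min_runs:
                    continue
                rate = sum(passes(run) for run in kept) / len(kept)
                # Prefer the least restrictive qualifying cutoff (the one that keeps most runs).
                if rate >= target and len(kept) > best_kept:
                    best, best_kept = (name, op, cutoff, rate), len(kept)
        if best is not None:
            found.append(best)
    return found
```

Preferring the cutoff that keeps the most runs reports the mildest restriction that clears the 80% bar. Other tie-breaks, such as the smallest change relative to the current bound, would be equally defensible.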
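The §8 score and the §10 cutoffs are simple enough to state directly in code. This is a minimal sketch with a few assumptions. pass_prob is taken as the relevant gate's pass probability, expressed as a fraction in [0, 1]. delta_pp is in percentage points. The data fraction and mean width ratio are assumed to be precomputed per output. The function and field names are hypothetical.

```python
from __future__ import annotations

from typing import Dict, List, Tuple


def missing_value_priority(entries: List[Dict]) -> List[Tuple[str, float]]:
    """§8: rank entries by |delta_pp on worst gate| * (1 - pass_prob) * bound_width_ratio.

    Each entry is a dict with 'name', 'delta_pp', 'pass_prob' and 'bound_width_ratio'.
    Returns (name, score) pairs, highest score first."""
    scored = [
        (e["name"], abs(e["delta_pp"]) * (1.0 - e["pass_prob"]) * e["bound_width_ratio"])
        for e in entries
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def confidence_grade(data_fraction: float, mean_width_ratio: float) -> Tuple[str, List[str]]:
    """§10: grade an output from the share of upstream bounds anchored in 'data' (0..1)
    and the average bound-width-to-base ratio, using the cutoffs from the commit message."""
    if data_fraction >= 0.70 and mean_width_ratio < 0.5:
        grade = "HIGH"
    elif data_fraction < 0.30 or mean_width_ratio > 1.5:
        grade = "LOW"
    else:
        grade = "MEDIUM"
    # Name the evidence behind the grade, mirroring the reasons array.
    reasons = [
        f"{data_fraction:.0%} of upstream bounds anchored in data",
        f"mean bound width / base = {mean_width_ratio:.2f}",
    ]
    return grade, reasons
```

HIGH requires data >= 70% and width < 0.5, so neither LOW condition can hold at the same time. The order of the two checks therefore does not change the result.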

* main: prompts: add Hauts-de-France hyperscale AI datacenter test case

Two small wording changes from ChatGPT's v38 feedback (the reviewer called the format 'production-candidate; freeze the structure', and these are the only follow-ups):

1) DOOM verdict band 'almost certainly fails' -> 'rarely passes under current bounds'. Avoids the epistemic overclaim of 'certainly' on a model-relative pass rate.
2) Decision implications intro line reworded from 'The actual plan revision is for human or LLM interpretation against the source report' to 'This section identifies the affected planning lever; concrete revisions should be derived by reading the source report and the relevant intermediary artifacts.' Less self-referential.

Status doc renamed 20260516_claude.md -> 20260517_claude.md and rewritten for the current state: PR #708 (merged, five analysis blocks) and PR #710 (in flight, this branch, the v34->v38 insights-format iteration). Schema v3 frozen. Cross-plan validation extended to five distinct domains. Open issues mostly carry over from 0516 plus a new manifest-regression-test gap.

neoneye added 3 commits May 16, 2026 17:28

Merge branch 'main' into fix/napkin-math-insights-no-filler

566e70f


neoneye merged commit 2b969bd into main May 16, 2026
3 checks passed

neoneye deleted the fix/napkin-math-insights-no-filler branch May 16, 2026 22:09

neoneye merged 3 commits into main from fix/napkin-math-insights-no-filler
