docs(napkin-math): proposal 141 - source-preservation audit #739
Merged
…ipeline

Two failure modes the existing depends-on audit cannot catch:

1. Source-stated thresholds (floors, caps, targets, deadlines) silently absent from parameters.json. The plan names the gate; the extractor recognises it but omits it; downstream stages cannot test it.
2. Prior-baseline variables silently dropped between vN-1 and vN. The drop may be a structural improvement (replaced by a better-named equivalent) or a silent regression; the depends-on audit passes either way.

Proposal specifies:

- A new optional 'dropped_signals' field on the extract artifact's JSON shape, with a closed enumeration of allowed reasons (replaced_by, cap_pressure, out_of_scope, unmodelled_external, redundant_with).
- A source-preservation rule added to the extract skill system prompts, requiring either preservation or explicit drop justification for every source-stated or prior-baseline signal.
- A deterministic Python audit script (no LLM call) that scans the digest for threshold patterns, diffs the prior-baseline id-set against the current one, and flags unjustified drops.
- Scope deliberately limited to the extract stage. Compress preservation is a different problem (the LLM is meant to drop content there), and downstream deterministic stages preserve by construction.

Corpus-agnostic by design: enumeration members are structural categories, regex patterns target comparison-word structures, and the rule applies to any plan in any domain. Plan-name references in the doc text itself are humans-only context, not prompt content.
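The prior-baseline half of the audit (failure mode 2) reduces to a set difference between the vN-1 and vN id-sets plus a lookup into `dropped_signals`. A minimal sketch, assuming each artifact is a dict with a `parameters` list of `{"id": ...}` entries and that each drop entry carries `id`, `reason` and, for `replaced_by`, a hypothetical `replacement_id` reference field; apart from `dropped_signals` and the reason values, these names are illustrative and not taken from the proposal. `ALLOWED_REASONS` uses the list in this commit message; the PR description below lists `moved_to_unmodelled_gate` instead of `unmodelled_external`, so the set should match whichever version is current.

```python
"""Prior-baseline drop check: a hypothetical sketch, not the proposal's script."""
from dataclasses import dataclass

# Closed enumeration as listed in this commit message (assumed to be the full set).
ALLOWED_REASONS = frozenset(
    {"replaced_by", "cap_pressure", "out_of_scope", "unmodelled_external", "redundant_with"}
)


@dataclass(frozen=True)
class Finding:
    signal_id: str
    problem: str


def audit_baseline_drops(prior: dict, current: dict) -> list[Finding]:
    """Flag ids present in vN-1 but absent from vN without a valid justification.

    Assumed shapes (hypothetical): {"parameters": [{"id": ...}, ...]} for both
    artifacts, plus an optional top-level "dropped_signals" list on `current`
    whose entries look like {"id": ..., "reason": ..., "replacement_id": ...}.
    """
    prior_ids = {p["id"] for p in prior.get("parameters", [])}
    current_ids = {p["id"] for p in current.get("parameters", [])}
    drops = {d.get("id"): d for d in current.get("dropped_signals", [])}

    findings = []
    for signal_id in sorted(prior_ids - current_ids):
        drop = drops.get(signal_id)
        if drop is None:
            findings.append(Finding(signal_id, "dropped without justification"))
            continue
        reason = drop.get("reason")
        if reason not in ALLOWED_REASONS:
            findings.append(Finding(signal_id, f"reason {reason!r} not in closed enumeration"))
        elif reason == "replaced_by" and drop.get("replacement_id") not in current_ids:
            # "replacement_id" is a hypothetical reference field: the replacement
            # must exist in the current artifact, or the drop is not justified.
            findings.append(Finding(signal_id, "replacement not present in current artifact"))
    return findings


if __name__ == "__main__":
    prior = {"parameters": [{"id": "a"}, {"id": "b"}, {"id": "c"}]}
    current = {
        "parameters": [{"id": "a"}, {"id": "b_renamed"}],
        "dropped_signals": [
            {"id": "b", "reason": "replaced_by", "replacement_id": "b_renamed"}
        ],
    }
    for finding in audit_baseline_drops(prior, current):
        print(finding.signal_id, "-", finding.problem)
    # prints: c - dropped without justification
```

Because the check is a pure function of two JSON documents, it can run in CI with no model call, which is the property the proposal asks for.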
neoneye added a commit that referenced this pull request on May 21, 2026
Replaces the prior 'Status as of 2026-05-21' content with three explicit subsections:

1. Landed on main: PR #737 (Phase 1 compress + initial extract threshold-pairing + OPTIMIZE_INSTRUCTIONS) and PR #739 (Proposal 141 design only).
2. Open for merge: PR #740 commit chain (4cda70b source-arithmetic + parity, 19f927b aggregate-sum tightening, 8f94c8c source_text truncation discipline). All edits applied symmetrically to both extract skills.
3. PR #740 verification posture: a same-LLM, same-session regression check, not proof of improvement. All six v51 parameters.json files validate clean. Behavioural verification of the rules on a different LLM is separate follow-up work, not part of PR #740.

The Known limitations section now explicitly names the clearest unresolved regression (paperclip OPC UA / p99 latency at the compress stage), the cap-pressure-without-recorded-rationale gap (yellowstone public_compliance trio), and the absence of a source-preservation audit implementation (proposal 141 design merged, code not). It lists four follow-up PRs in preferred order; bundling them would re-create the scope creep PR #740 was extracted from.
This was referenced May 21, 2026
Summary
Adds proposal 141 to `docs/proposals/`, covering a deterministic source-preservation audit for the napkin_math pipeline. Doc-only PR: no code or prompt changes.

What the proposal specifies
- `parameters.json` … `parameters.json` … `source_claim_ids` (per entry) and `dropped_signals` (top-level array)
- `source_claim_id` hashes for mechanical claim-to-declaration matching
- `reason` (`replaced_by`, `cap_pressure`, `out_of_scope`, `moved_to_unmodelled_gate`, `redundant_with`) with dedicated reference fields per reason
- `dropped_signals` itself (e.g. `cap_pressure` must actually match a capped array at cap)
- `experiments/napkin_math/audit_source_preservation.py` script (no LLM call)

Why now
Two failure modes the existing depends-on audit cannot catch:
Both predate this proposal and were surfaced during the PR #737 v50 prompt-cleanup work. The proposal commits to a design before implementation lands.
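For the threshold failure mode (item 1 in the commit message above), the audit has to find threshold-bearing sentences in the digest and confirm each one is either declared in `parameters.json` or explicitly dropped. A minimal sketch, assuming sentence-level matching, an illustrative comparison-word regex, and a hypothetical `source_claim_id` scheme (SHA-256 of the whitespace- and case-normalised sentence, truncated to 12 hex characters). The proposal calls for deterministic hashes and comparison-word patterns, but this document does not spell out either, and the `source_claim_id` field on drop entries is likewise an assumption.

```python
"""Threshold-preservation check: a hypothetical sketch, not the proposal's script."""
import hashlib
import re

# Illustrative comparison-word structures that often introduce a floor, cap,
# target or deadline, followed by a number in the same sentence. The real
# pattern list is not given here; this one favours precision over recall.
THRESHOLD_RE = re.compile(
    r"\b(at least|at most|no more than|no less than|not exceed|minimum|maximum"
    r"|no later than|before|within)\b[^.!?]*?\d",
    re.IGNORECASE,
)


def source_claim_id(sentence: str) -> str:
    """Deterministic id for a digest sentence (hypothetical scheme)."""
    # Collapse whitespace and case so cosmetic re-wrapping keeps the same id.
    normalised = " ".join(sentence.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]


def threshold_claims(digest: str) -> dict[str, str]:
    """Map source_claim_id -> sentence for each digest sentence that states a threshold."""
    sentences = re.split(r"(?<=[.!?])\s+", digest.strip())
    return {source_claim_id(s): s for s in sentences if THRESHOLD_RE.search(s)}


def unaccounted_thresholds(digest: str, params: dict) -> list[str]:
    """Threshold sentences neither declared via source_claim_ids nor listed in dropped_signals."""
    declared = {
        cid
        for entry in params.get("parameters", [])
        for cid in entry.get("source_claim_ids", [])
    }
    # Assumes each drop entry references its claim through a "source_claim_id" field.
    dropped = {d.get("source_claim_id") for d in params.get("dropped_signals", [])}
    return [
        sentence
        for cid, sentence in threshold_claims(digest).items()
        if cid not in declared and cid not in dropped
    ]


if __name__ == "__main__":
    digest = (
        "The site must hold at least 200 units. The team met on Tuesday. "
        "Delivery no later than 2027 is required."
    )
    floor_id = source_claim_id("The site must hold at least 200 units.")
    params = {"parameters": [{"id": "min_units", "source_claim_ids": [floor_id]}]}
    print(unaccounted_thresholds(digest, params))
    # prints: ['Delivery no later than 2027 is required.']
```

Hashing the normalised sentence rather than matching free text is what makes claim-to-declaration matching mechanical: the extractor records the id, and the audit only compares sets of ids.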
What this PR does NOT do
Commit chain
- `aaceee55`: Initial proposal draft
- `3c47a3ab`: ChatGPT-led restructuring (added Pitch, Feasibility, Implementation Phases, Success Metrics, Risks, Acceptance, Open Questions; deterministic `source_claim_id` hash; closed-enum `reason` with dedicated reference fields; validation rules on `dropped_signals`)

Test plan
- `docs/proposals/AGENTS.md` formatting
- `grep -nE "€[0-9]|km²|GW|GVA|RTE|DGSI|DREAL|OPC UA|paperclip|hyperscale|yellowstone|mars_gtld|euro_adoption|crate_recovery|datacenter" docs/proposals/141-source-preservation-audit.md` returns nothing)