napkin-math: prior-signal ledger orchestration for extract skill (proposal 141 PR 3) by neoneye · Pull Request #753 · PlanExeOrg/PlanExe

neoneye · 2026-05-21T19:07:50Z

Summary

Builds on PR #751 (Fork B audit) and PR #752 (dropped_signals schema + strict consumption) by closing the prior_baseline loop: the extract skill now has a way to see what the previous iteration emitted, so it can decide what to preserve and what to explain-drop.

Per review direction, the orchestration is intentionally narrow: the full prior parameters.json is not passed in. Instead, prepare_extract_input.py builds a compact Prior Signal Ledger and appends it to the combined digest at the end.

What the ledger contains (and doesn't)

Included — just enough for the LLM to recognise prior signals and reason about structural relationships:

Signal name (entry id or output_name)
Section (key_values, missing_values_to_estimate, ...) and kind (id or output_name)
formula_hint when present
depends_on when non-empty

Intentionally excluded — these would anchor the LLM on old phrasings/framings, undermining the "preservation budget, not target" posture:

label, source_text, comment, value
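
The include/exclude rule above can be sketched as follows. This is a minimal illustration, not the PR's build_prior_signal_ledger: the function names `sketch_ledger_rows` and `render_ledger`, the assumption that each section of parameters.json is a list of entry dicts, the choice to attach formula_hint and depends_on to output_name rows as well, and the empty-ledger placeholder text are all hypothetical.

```python
from typing import Any

# Fields the PR description says must never reach the ledger.
EXCLUDED_FIELDS = ("label", "source_text", "comment", "value")


def sketch_ledger_rows(prior: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect one row per prior signal name, keeping only structural fields."""
    rows: list[dict[str, Any]] = []
    seen: set[str] = set()
    # Two passes: all ids first, then output_names, so that a name used as
    # both an id and an output_name is recorded once with kind="id".
    for kind in ("id", "output_name"):
        for section, entries in prior.items():
            if not isinstance(entries, list):
                continue  # assumption: sections are lists of entry dicts
            for entry in entries:
                if not isinstance(entry, dict):
                    continue
                name = entry.get(kind)
                if not isinstance(name, str) or not name or name in seen:
                    continue
                seen.add(name)
                row: dict[str, Any] = {"name": name, "section": section, "kind": kind}
                if entry.get("formula_hint"):  # omitted when null or empty
                    row["formula_hint"] = entry["formula_hint"]
                if entry.get("depends_on"):  # omitted when empty
                    row["depends_on"] = list(entry["depends_on"])
                rows.append(row)  # EXCLUDED_FIELDS are never copied
    return rows


def render_ledger(rows: list[dict[str, Any]]) -> str:
    """Render the rows as a compact markdown section."""
    lines = ["## Prior Signal Ledger", ""]
    if not rows:
        lines.append("(no prior signals)")  # hypothetical placeholder text
    for row in rows:
        line = f"- `{row['name']}` ({row['section']}, {row['kind']})"
        if "formula_hint" in row:
            line += f" formula_hint: {row['formula_hint']}"
        if "depends_on" in row:
            line += f" depends_on: {', '.join(row['depends_on'])}"
        lines.append(line)
    return "\n".join(lines)
```

Because rows are built by copying an allow-list of fields rather than deleting a deny-list, a new free-text field added to parameters.json later would stay out of the ledger by default.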

Changes

prepare_extract_input.py gains a --prior CLI flag. When omitted (first-iteration extraction), no ledger is appended and behaviour is unchanged. When provided, build_prior_signal_ledger emits a compact markdown section appended after the bundle so the source remains authoritative.
Both extract prompts (from-digest and from-full) gain a Prior Signal Ledger subsection in the dropped_signals rules. Posture: ledger is advisory metadata, source remains authoritative; preserve when source-supported, record dropped_signals when not; do not invent dropped_signals for signals not in the ledger or source.
12 synthetic unit tests cover ledger construction (kv ids with section/kind, output_names tracked separately, formula_hint inclusion, formula_hint omission when null, id-equals-output_name dedupe, unmodelled_gates inclusion, first-iteration empty-ledger message, explicit exclusion of label/source_text/comment/value) plus end-to-end build_combined_digest with and without --prior.
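
A rough sketch of how an optional --prior flag can leave first-iteration behaviour untouched. Only the flag name --prior comes from this PR; the positional `bundle` argument and the helper names `parse_args` and `combine` are assumptions made for illustration, since the script's real argument list is not shown here.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of an optional --prior flag.")
    # Hypothetical positional argument, present only to make the sketch runnable.
    parser.add_argument("bundle", type=Path, help="path to the source bundle text")
    parser.add_argument(
        "--prior",
        type=Path,
        default=None,
        help="path to a prior parameters.json; omit on the first iteration",
    )
    return parser.parse_args(argv)


def combine(bundle_text: str, ledger_text: Optional[str]) -> str:
    """Return the bundle unchanged, or with the ledger appended after it."""
    if ledger_text is None:
        return bundle_text  # first iteration: output is byte-for-byte the bundle
    # The ledger goes last so the source material is read first.
    return bundle_text.rstrip("\n") + "\n\n" + ledger_text.rstrip("\n") + "\n"
```

Defaulting the flag to None, rather than to an empty ledger, keeps the no-flag code path identical to the pre-PR one.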

End-to-end empirical check

Ran prepare_extract_input.py --prior on paperclip's v49 parameters.json. The ledger lands at the end of the digest with all 18 prior signals — matching audit_source_preservation.build_signal_index one-for-one. The latency-tripwire trio (api_latency_p99_threshold_ms, api_latency_margin_ms, actual_api_p99_latency_ms) that v51 silently dropped per the v49→v51 audit is present in the ledger. Infrastructure is now in place for the LLM extract skill to see these prior signals and either preserve them OR record dropped_signals.
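
The one-for-one comparison described above amounts to a set equality check. The sketch below assumes both sides have already been reduced to plain lists of signal names; `compare_signal_sets` is a hypothetical helper and does not call audit_source_preservation.build_signal_index, whose signature is not shown in this PR.

```python
from typing import Any, Iterable


def compare_signal_sets(ledger_names: Iterable[str], audit_names: Iterable[str]) -> dict[str, Any]:
    """Report which names appear on only one side, and whether the sets match."""
    ledger, audit = set(ledger_names), set(audit_names)
    return {
        "only_in_ledger": sorted(ledger - audit),
        "only_in_audit": sorted(audit - ledger),
        "one_for_one": ledger == audit,
    }


# The latency-tripwire trio named in this PR, used here as sample input.
trio = [
    "api_latency_p99_threshold_ms",
    "api_latency_margin_ms",
    "actual_api_p99_latency_ms",
]
report = compare_signal_sets(trio, trio)
```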

What this PR explicitly does NOT do

Does not re-run the LLM extract skill end-to-end. That is the user's next step via the standard skill workflow: re-run extract-parameters-from-digest with the new digest, run audit_source_preservation.py against v49, then inspect whether absent_unexplained moves to explained_drop honestly (without hiding invalid drops via PR #752's strict consumption).
Does not pass the full prior parameters.json (per review direction — anchoring risk).
Does not change strict mode, CI gating, or Fork A scope.
Does not bundle Phase 5 verify-bounds-citations or different-LLM validation.
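
For the follow-up audit step, the distinction between absent_unexplained and explained_drop can be pictured roughly as below. This is a sketch under stated assumptions, not the logic of audit_source_preservation.py: the `classify_prior_signals` helper, the "preserved" status name, and the assumed dropped_signals entry shape (a dict with `name` and a non-empty `reason`) are all hypothetical.

```python
from typing import Any, Iterable


def classify_prior_signals(
    prior_names: Iterable[str],
    current_names: Iterable[str],
    dropped_signals: Iterable[dict[str, Any]],
) -> dict[str, str]:
    """Label each prior signal as preserved, explained_drop, or absent_unexplained."""
    current = set(current_names)
    # Assumed schema: only entries with both a name and a non-empty reason
    # count as an explanation; anything else is ignored.
    explained = {
        d["name"] for d in dropped_signals if d.get("name") and d.get("reason")
    }
    result: dict[str, str] = {}
    for name in prior_names:
        if name in current:
            result[name] = "preserved"
        elif name in explained:
            result[name] = "explained_drop"
        else:
            result[name] = "absent_unexplained"
    return result
```

Requiring a reason before a drop counts as explained is one way to keep an empty dropped_signals entry from masking a silent drop.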

Test plan

pytest experiments/napkin_math/tests/test_prepare_extract_input_prior_ledger.py (12 pass)
python3 experiments/napkin_math/tests/run_smoke.py (9/9 pass)
End-to-end smoke: prepare_extract_input.py --prior produces a digest with the ledger correctly appended (verified on paperclip v49 — 18 signals, matches the audit's signal universe one-for-one)
No label / source_text / comment / value leaks into the ledger (covered by test_ledger_does_not_include_source_text_or_label)
CI green on latest head

🤖 Generated with Claude Code

…posal 141 PR 3)

Builds on PR #751 (Fork B audit) and PR #752 (dropped_signals schema + strict consumption) by closing the prior_baseline loop: the extract skill now has a way to see what the previous iteration emitted, so it can decide what to preserve and what to explain-drop.

Per review direction, the orchestration is intentionally NARROW: the full prior parameters.json is NOT passed in. Instead, prepare_extract_input.py builds a compact Prior Signal Ledger and appends it to the combined digest at the end.

The ledger contains only:
- signal names (entry ids and output_names)
- section and kind (id or output_name)
- formula_hint when present
- depends_on when non-empty

Intentionally excluded: source_text, label, comment, value. These would anchor the LLM on old phrasings and old framings; the ledger is a preservation BUDGET, not a phrasing TARGET. The source digest above the ledger remains the authoritative input.

Changes:
(1) prepare_extract_input.py grows a --prior CLI flag pointing at a prior parameters.json. When omitted (first-iteration extraction), no ledger is appended and behavior is unchanged. When provided, build_prior_signal_ledger emits a compact markdown section appended after the bundle.
(2) Both extract prompts (from-digest and from-full) gain a 'Prior Signal Ledger' subsection in the dropped_signals area. Posture: ledger is advisory metadata, source remains authoritative; preserve when source-supported, record dropped_signals when not; do NOT invent dropped_signals entries for signals not in the ledger or source.
(3) 12 synthetic unit tests cover ledger construction: key_value ids with section/kind tags, output_names tracked separately when distinct from ids, formula_hint and depends_on inclusion, formula_hint omission when null, id-equals-output_name dedupe (kind=id wins), unmodelled_gates inclusion, first-iteration empty-ledger message, and explicit exclusion of label/source_text/comment/value. Plus 3 end-to-end tests covering build_combined_digest with and without --prior.

End-to-end empirical check: ran prepare_extract_input.py --prior on paperclip's v49 parameters.json. The ledger lands at the end of the digest with all 16 prior signals, including the latency-tripwire trio (api_latency_p99_threshold_ms, api_latency_margin_ms, actual_api_p99_latency_ms) that v51 silently dropped per the v49→v51 audit on main. The infrastructure is now in place for the LLM extract skill to see these prior signals and either preserve them OR record dropped_signals.

What this PR explicitly does NOT do:
- Does not re-run the LLM extract skill end-to-end (that is the user's next step via the standard skill workflow). The skill re-run plus audit comparison is the empirical validation of whether the ledger actually helps the LLM populate dropped_signals usefully.
- Does not pass the full prior parameters.json (per review direction: anchoring risk).
- Does not change strict mode, CI gating, or Fork A scope (those land in later PRs once this loop is proven useful).
- Does not bundle Phase 5 verify-bounds-citations or different-LLM validation.

Empirical posture: 12 new unit tests pass. 9/9 smoke checks pass. End-to-end smoke run on paperclip produced a clean digest with the ledger appended. No corpus literals introduced (the ledger emits the actual prior_baseline ids, but those are extracted ids from gitignored corpus outputs, not literals embedded in the prompt or code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three review fixes:

1. plan: update the stale 'No formal source-preservation audit implementation' bullet. Fork B shipped in PR #751/#752/#753; Fork A, orchestrator-side prior-baseline injection, and strict-mode are the actual still-pending follow-ups.
2. plan: bump the document title from 2026-05-20 to 2026-05-22; add an italicised note that the doc was originally drafted 2026-05-20 and renamed/refreshed for the post-#753 ship-set.
3. methology: stop overclaiming what the assessment Basis column exposes. summarize_assessment.py maps source:'data' → 'report_derived' and source:'assumption' → 'model_assumption', and that is what the column shows; the finer 'plan-internal gap forecast vs bare commitment' distinction lives in the rationale string, not the column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…hip-set

Updates two docs to reflect the post-#753 state of the napkin-math pipeline.

methology.md: describe the current pipeline behaviour:
- two-batch compress with paraphrase-tolerant quote match and cross-bucket promoter
- extract's source-arithmetic preservation, threshold-pairing, and dropped_signals field
- 19-check validator (added aggregate_not_bounded, requirement_has_margin, dropped_signals_schema)
- bounds' asymmetric source label on commitment defaults, calculation-output strip, reserved correlations block, reserved lognormal/pert disciplines with loud NotImplementedError
- advisory audit_source_preservation.py step

20260520_plan.md → 20260522_plan.md:
- bump status date
- mark PR #750 merged
- add PR #751/#752/#753 entries (proposal 141 implementation)
- update Phase status table (added 4.5 audit row, reclassified Phase 8 as partially done, Phase 10 marked done for current ship-set)
- add v58 14-plan empirical snapshot (1 viable / 5 fragile / 8 doom)
- reorder Next likely move now that proposal 141 has shipped: Phase 5 citation verifier promoted to #1, Phase 8 samplers added as #2 with v58 cases that bite now, Phase 9 composite-band cap as #3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

neoneye merged commit 4f085b3 into main May 21, 2026
3 checks passed

neoneye deleted the napkin-math/prior-signal-ledger-141-pr3 branch May 21, 2026 19:51

neoneye mentioned this pull request May 21, 2026
docs(napkin-math): refresh methodology + plan status for 2026-05-22 ship-set #754 (Merged, 4 tasks)

neoneye mentioned this pull request May 21, 2026
napkin-math: wire prior-baseline ledger into Stage 0 reruns #756 (Open, 4 tasks)

neoneye merged 1 commit into main from napkin-math/prior-signal-ledger-141-pr3
