napkin-math: prior-signal ledger orchestration for extract skill (proposal 141 PR 3) #753

Merged
neoneye merged 1 commit into main from napkin-math/prior-signal-ledger-141-pr3
May 21, 2026

@neoneye neoneye commented May 21, 2026

Summary

Builds on PR #751 (Fork B audit) and PR #752 (dropped_signals schema + strict consumption) by closing the prior_baseline loop: the extract skill now has a way to see what the previous iteration emitted, so it can decide what to preserve and what to explain-drop.

Per review direction, the orchestration is intentionally narrow: the full prior parameters.json is not passed in. Instead, prepare_extract_input.py builds a compact Prior Signal Ledger and appends it to the combined digest at the end.

What the ledger contains (and doesn't)

Included — just enough for the LLM to recognise prior signals and reason about structural relationships:

  • Signal name (entry id or output_name)
  • Section (key_values, missing_values_to_estimate, ...) and kind (id or output_name)
  • formula_hint when present
  • depends_on when non-empty

Intentionally excluded — these would anchor the LLM on old phrasings/framings, undermining the "preservation budget, not target" posture:

  • label, source_text, comment, value
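The include/exclude rules above can be sketched as a small allowlist-style builder. This is a minimal illustration, not the PR's build_prior_signal_ledger: the function name sketch_prior_signal_ledger, the assumed shape of parameters.json (top-level sections holding lists of entry dicts), and the sample entries are all hypothetical.

```python
from typing import Any

# Fields the PR says the ledger deliberately leaves out.
EXCLUDED_FIELDS = ("label", "source_text", "comment", "value")


def sketch_prior_signal_ledger(prior: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect one compact row per prior signal (assumed input layout)."""
    rows: list[dict[str, Any]] = []
    seen: set[str] = set()
    for section, entries in prior.items():
        if not isinstance(entries, list):
            continue
        for entry in entries:
            if not isinstance(entry, dict):
                continue
            # Within an entry the id is visited first, so kind=id wins
            # when id equals output_name.
            for kind in ("id", "output_name"):
                name = entry.get(kind)
                if not name or name in seen:
                    continue
                seen.add(name)
                # Only allowlisted keys are copied, so excluded fields never leak.
                row: dict[str, Any] = {"name": name, "section": section, "kind": kind}
                if entry.get("formula_hint"):
                    row["formula_hint"] = entry["formula_hint"]
                if entry.get("depends_on"):
                    row["depends_on"] = list(entry["depends_on"])
                rows.append(row)
    return rows


# Made-up prior data, for illustration only.
prior = {
    "key_values": [
        {"id": "threshold_ms", "label": "old label", "value": 250,
         "source_text": "old quote", "formula_hint": None, "depends_on": []},
    ],
    "missing_values_to_estimate": [
        {"id": "margin_ms", "output_name": "margin_ms",
         "formula_hint": "threshold_ms - observed_ms",
         "depends_on": ["threshold_ms", "observed_ms"],
         "comment": "old framing"},
    ],
}
rows = sketch_prior_signal_ledger(prior)
```

Copying only named keys, rather than deleting the excluded ones, means a field added to parameters.json later stays out of the ledger by default.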

Changes

  1. prepare_extract_input.py gains a --prior CLI flag. When omitted (first-iteration extraction), no ledger is appended and behaviour is unchanged. When provided, build_prior_signal_ledger emits a compact markdown section appended after the bundle so the source remains authoritative.
  2. Both extract prompts (from-digest and from-full) gain a Prior Signal Ledger subsection in the dropped_signals rules. Posture: ledger is advisory metadata, source remains authoritative; preserve when source-supported, record dropped_signals when not; do not invent dropped_signals for signals not in the ledger or source.
  3. 12 synthetic unit tests cover ledger construction (kv ids with section/kind, output_names tracked separately, formula_hint inclusion, formula_hint omission when null, id-equals-output_name dedupe, unmodelled_gates inclusion, first-iteration empty-ledger message, explicit exclusion of label/source_text/comment/value) plus end-to-end build_combined_digest with and without --prior.
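The optional-flag behaviour in change 1 might look roughly like the following. Only the --prior flag name comes from the PR; the helper names parse_args and combine, and the exact whitespace used when joining, are assumptions made for this sketch.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse an optional --prior path; None means first-iteration extraction."""
    parser = argparse.ArgumentParser(description="Sketch of an optional --prior flag")
    parser.add_argument("--prior", type=Path, default=None,
                        help="path to a prior parameters.json (optional)")
    return parser.parse_args(argv)


def combine(bundle_md: str, ledger_md: Optional[str]) -> str:
    """Append the ledger after the bundle; leave the bundle untouched without one."""
    if ledger_md is None:
        return bundle_md  # behaviour unchanged when --prior is omitted
    return bundle_md.rstrip("\n") + "\n\n" + ledger_md.rstrip("\n") + "\n"
```

Placing the ledger after the bundle keeps the source digest first in the combined text, which matches the stated posture that the source remains authoritative.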

End-to-end empirical check

Ran prepare_extract_input.py --prior on paperclip's v49 parameters.json. The ledger lands at the end of the digest with all 18 prior signals — matching audit_source_preservation.build_signal_index one-for-one. The latency-tripwire trio (api_latency_p99_threshold_ms, api_latency_margin_ms, actual_api_p99_latency_ms) that v51 silently dropped per the v49→v51 audit is present in the ledger. Infrastructure is now in place for the LLM extract skill to see these prior signals and either preserve them OR record dropped_signals.
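A one-for-one match of this kind can be checked with a set comparison. The sketch below assumes both the ledger and the audit's signal index have already been reduced to plain iterables of signal names; compare_signal_sets is a hypothetical helper, and the return shape of audit_source_preservation.build_signal_index is not shown in this PR.

```python
from typing import Iterable


def compare_signal_sets(ledger_names: Iterable[str], audit_names: Iterable[str]) -> dict:
    """Report whether two signal-name collections match, and where they differ."""
    ledger, audit = set(ledger_names), set(audit_names)
    return {
        "match": ledger == audit,
        "only_in_ledger": sorted(ledger - audit),
        "only_in_audit": sorted(audit - ledger),
    }


trio = [
    "api_latency_p99_threshold_ms",
    "api_latency_margin_ms",
    "actual_api_p99_latency_ms",
]
same = compare_signal_sets(trio, reversed(trio))
short = compare_signal_sets(trio[:2], trio)
```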

What this PR explicitly does NOT do

  • Does not re-run the LLM extract skill end-to-end. That is the user's next step via the standard skill workflow: re-run extract-parameters-from-digest with the new digest, run audit_source_preservation.py against v49, then inspect whether absent_unexplained moves to explained_drop honestly (without hiding invalid drops via the PR #752 strict consumption).
  • Does not pass the full prior parameters.json (per review direction — anchoring risk).
  • Does not change strict mode, CI gating, or Fork A scope.
  • Does not bundle Phase 5 verify-bounds-citations or different-LLM validation.
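The follow-up inspection described in the first bullet amounts to bucketing each prior signal by what the new extraction did with it. The sketch below is a simplified illustration, not the logic of audit_source_preservation.py: classify_prior_signals is a hypothetical helper, it works on names only, and it flags dropped_signals entries against the ledger alone, whereas the prompt rule also allows entries supported by the source.

```python
from typing import Iterable


def classify_prior_signals(
    ledger_names: Iterable[str],
    current_names: Iterable[str],
    dropped_names: Iterable[str],
) -> dict:
    """Bucket prior signals by their fate in the new extraction (names only)."""
    ledger, current, dropped = set(ledger_names), set(current_names), set(dropped_names)
    return {
        "preserved": sorted(ledger & current),
        "explained_drop": sorted((ledger - current) & dropped),
        "absent_unexplained": sorted(ledger - current - dropped),
        # Candidates for "invented" entries; a real check would also consult the source.
        "dropped_not_in_ledger": sorted(dropped - ledger),
    }


result = classify_prior_signals(
    ledger_names=[
        "api_latency_p99_threshold_ms",
        "api_latency_margin_ms",
        "actual_api_p99_latency_ms",
    ],
    current_names=["api_latency_p99_threshold_ms"],
    dropped_names=["api_latency_margin_ms", "made_up_signal"],
)
```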

Test plan

  • pytest experiments/napkin_math/tests/test_prepare_extract_input_prior_ledger.py (12 pass)
  • python3 experiments/napkin_math/tests/run_smoke.py (9/9 pass)
  • End-to-end smoke: prepare_extract_input.py --prior produces a digest with the ledger correctly appended (verified on paperclip v49 — 18 signals, matches the audit's signal universe one-for-one)
  • No label / source_text / comment / value leaks into the ledger (covered by test_ledger_does_not_include_source_text_or_label)
  • CI green on latest head

🤖 Generated with Claude Code

…posal 141 PR 3)

Builds on PR #751 (Fork B audit) and PR #752 (dropped_signals schema + strict consumption) by closing the prior_baseline loop: the extract skill now has a way to see what the previous iteration emitted, so it can decide what to preserve and what to explain-drop.

Per review direction, the orchestration is intentionally NARROW: the full prior parameters.json is NOT passed in. Instead, prepare_extract_input.py builds a compact Prior Signal Ledger and appends it to the combined digest at the end. The ledger contains only:

  - signal names (entry ids and output_names)

  - section and kind (id or output_name)

  - formula_hint when present

  - depends_on when non-empty

Intentionally excluded: source_text, label, comment, value. These would anchor the LLM on old phrasings and old framings — the ledger is a preservation BUDGET, not a phrasing TARGET. The source digest above the ledger remains the authoritative input.

Changes:

(1) prepare_extract_input.py grows a --prior CLI flag pointing at a prior parameters.json. When omitted (first-iteration extraction), no ledger is appended and behavior is unchanged. When provided, build_prior_signal_ledger emits a compact markdown section appended after the bundle.

(2) Both extract prompts (from-digest and from-full) gain a 'Prior Signal Ledger' subsection in the dropped_signals area. Posture: ledger is advisory metadata, source remains authoritative; preserve when source-supported, record dropped_signals when not; do NOT invent dropped_signals entries for signals not in the ledger or source.

(3) 12 synthetic unit tests cover ledger construction: key_value ids with section/kind tags, output_names tracked separately when distinct from ids, formula_hint and depends_on inclusion, formula_hint omission when null, id-equals-output_name dedupe (kind=id wins), unmodelled_gates inclusion, first-iteration empty-ledger message, and explicit exclusion of label/source_text/comment/value. Plus 3 end-to-end tests covering build_combined_digest with and without --prior.

End-to-end empirical check: ran prepare_extract_input.py --prior on paperclip's v49 parameters.json. The ledger lands at the end of the digest with all 16 prior signals — including the latency-tripwire trio (api_latency_p99_threshold_ms, api_latency_margin_ms, actual_api_p99_latency_ms) that v51 silently dropped per the v49→v51 audit on main. The infrastructure is now in place for the LLM extract skill to see these prior signals and either preserve them OR record dropped_signals.

What this PR explicitly does NOT do:

  - Does not re-run the LLM extract skill end-to-end (that is the user's next step via the standard skill workflow). The skill re-run plus audit comparison is the empirical validation of whether the ledger actually helps the LLM populate dropped_signals usefully.

  - Does not pass the full prior parameters.json (per review direction — anchoring risk).

  - Does not change strict mode, CI gating, or Fork A scope (those land in later PRs once this loop is proven useful).

  - Does not bundle Phase 5 verify-bounds-citations or different-LLM validation.

Empirical posture: 12 new unit tests pass. 9/9 smoke checks pass. End-to-end smoke run on paperclip produced a clean digest with the ledger appended. No corpus literals introduced (the ledger emits the actual prior_baseline ids, but those are extracted ids from gitignored corpus outputs, not literals embedded in the prompt or code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neoneye neoneye merged commit 4f085b3 into main May 21, 2026
3 checks passed
@neoneye neoneye deleted the napkin-math/prior-signal-ledger-141-pr3 branch May 21, 2026 19:51
neoneye added a commit that referenced this pull request May 21, 2026
Three review fixes:

1. plan: update the stale 'No formal source-preservation audit implementation' bullet — Fork B shipped in PR #751/#752/#753; Fork A, orchestrator-side prior-baseline injection, and strict-mode are the actual still-pending follow-ups.

2. plan: bump the document title from 2026-05-20 to 2026-05-22; add an italicised note that the doc was originally drafted 2026-05-20 and renamed/refreshed for the post-#753 ship-set.

3. methology: stop overclaiming what the assessment Basis column exposes. summarize_assessment.py maps source:'data' → 'report_derived' and source:'assumption' → 'model_assumption', and that is what the column shows; the finer 'plan-internal gap forecast vs bare commitment' distinction lives in the rationale string, not the column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
huangyingting pushed a commit to repomesh/PlanExe that referenced this pull request May 22, 2026
…hip-set

Updates two docs to reflect the post-PlanExeOrg#753 state of the napkin-math pipeline.

methology.md: describe the current pipeline behaviour — two-batch compress with paraphrase-tolerant quote match and cross-bucket promoter; extract's source-arithmetic preservation, threshold-pairing, and dropped_signals field; 19-check validator (added aggregate_not_bounded, requirement_has_margin, dropped_signals_schema); bounds' asymmetric source label on commitment defaults, calculation-output strip, reserved correlations block, reserved lognormal/pert disciplines with loud NotImplementedError; advisory audit_source_preservation.py step.

20260520_plan.md → 20260522_plan.md: bump status date; mark PR PlanExeOrg#750 merged; add PR PlanExeOrg#751/PlanExeOrg#752/PlanExeOrg#753 entries (proposal 141 implementation); update Phase status table (added 4.5 audit row, reclassified Phase 8 as partially done, Phase 10 marked done for current ship-set); add v58 14-plan empirical snapshot (1 viable / 5 fragile / 8 doom); reorder Next likely move now that proposal 141 has shipped: Phase 5 citation verifier promoted to #1, Phase 8 samplers added as #2 with v58 cases that bite now, Phase 9 composite-band cap as #3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>