Skip to content

napkin-math: wire prior-baseline ledger into Stage 0 reruns#756

Open
neoneye wants to merge 4 commits into
mainfrom
napkin-math/prior-ledger-wiring
Open

napkin-math: wire prior-baseline ledger into Stage 0 reruns#756
neoneye wants to merge 4 commits into
mainfrom
napkin-math/prior-ledger-wiring

Conversation

@neoneye
Copy link
Copy Markdown
Member

@neoneye neoneye commented May 21, 2026

Summary

Why

PR #753 told the extract LLM to record dropped_signals with origin: "prior_baseline" when a Prior Signal Ledger is present in the digest. The discipline existed, but no orchestration ever supplied the ledger:

$ grep -c 'Prior Signal Ledger' experiments/napkin_math/output/v58/*/extract_parameters_input.md
…/20260516_datacenter_in_france/extract_parameters_input.md:0
…/20260201_yellowstone_evacuation/extract_parameters_input.md:0
…/20251114_paperclip_automation/extract_parameters_input.md:0
… (every v58 plan: 0)
…/v57a_paperclip_priorledger/extract_parameters_input.md:1   ← the one experimental run

That regression — 124 unexplained absent signals in the v49 → v58 source-preservation audit — is the load-bearing observation in the 2026-05-22 plan reset.

Changes (5 files, +169/-3)

File What changed
.claude/skills/run-napkin-math-pipeline/SKILL.md Stage 0 dispatch row + command block show --prior; new "Prior-baseline context for reruns" subsection makes it mandatory when a prior version exists for the slug; Inputs scenario (a) and resume-mode decision table updated.
.claude/skills/extract-parameters-from-digest/SKILL.md One-paragraph pointer so a standalone invocation knows the ledger is upstream-controlled by --prior.
.claude/skills/extract-parameters-from-full/SKILL.md Same pointer for the full-report extract path.
prepare_extract_input.py Module docstring documents --prior as the wiring point. No behaviour change.
tests/test_prepare_extract_input_prior_ledger.py Three new CLI-level tests: main() with --prior (ledger lands in digest), without --prior (digest stays clean), and with a non-existent prior path (loud SystemExit). Locks the orchestration contract.

Acceptance evidence

CLI demo (out-of-tree, with run_compress monkey-patched so it doesn't need worker_plan):

=== ACCEPTANCE EVIDENCE ===
With --prior:    '# Prior Signal Ledger' appears 1x
With --prior:    'demo_signal' appears        1x
Without --prior: 'Prior Signal Ledger' appears 0x

Test suite:

146 passed, 6 subtests passed in 0.41s

Out of scope

Documented for follow-up, not implemented here:

  • Path-Forward item 2 — strict-mode source-preservation audit gate. The wiring this PR ships is the necessary precondition; a strict policy that fails v49 → v58 is the next PR.
  • Auto-discovery of the most-recent prior in prepare_extract_input.py (e.g. a --auto-prior flag scanning output/v*/<slug>/). Considered and rejected for this PR: explicit is clearer than implicit, and the orchestrator can already resolve the prior path. If future runs keep forgetting --prior, revisit.
  • Items 3–6 (deterministic gate selection, corpus regression report, regenerated snapshot, methodology backlog) — orthogonal to this wiring.

Test plan

  • pytest experiments/napkin_math/tests/ — 146 passed
  • pytest experiments/napkin_math/tests/test_prepare_extract_input_prior_ledger.py -v — 15 passed (12 existing + 3 new CLI-level)
  • CLI demo above: ledger present with --prior, absent without it, loud failure on bad path
  • Reviewer to spot-check the run skill's "Prior-baseline context for reruns" subsection for clarity (this is the load-bearing doc; a clear rule that agents will actually follow is the whole point)

🤖 Generated with Claude Code

neoneye and others added 2 commits May 22, 2026 01:28
PR #753 added prompt rules so the extract LLM consumes a # Prior Signal Ledger when one is present in the digest, but the orchestration that produces the digest never instructed agents to pass --prior. Empirical confirmation: every v58 extract_parameters_input.md has 0 occurrences of 'Prior Signal Ledger'; the discipline existed, the run did not use it.

Path-Forward item 1 from docs/20260522_plan.md, scoped to a single PR-sized change. Items 2–6 (strict audit, deterministic gate selection, corpus regression report, regenerated snapshot, methodology backlog) are explicitly out of scope.

Changes:

- run-napkin-math-pipeline SKILL.md: Stage 0 dispatch row and command block now show --prior; new 'Prior-baseline context for reruns' subsection states it is mandatory whenever a prior version exists for the slug, omitted only on first-iteration plans, and explains prior-version resolution. Inputs scenario (a) and resume-mode decision table updated to match.

- extract-parameters-from-{digest,full} SKILL.md: one-paragraph pointer so an agent invoking the skill standalone knows the ledger is upstream-controlled by --prior.

- prepare_extract_input.py: module docstring documents --prior as the wiring point for pipeline reruns. No behavior change; --prior was already implemented.

- tests/test_prepare_extract_input_prior_ledger.py: three new CLI-level tests covering main() with --prior (ledger appears), main() without --prior (digest stays clean), and main() with a non-existent --prior path (loud SystemExit). These lock the orchestration contract so a future refactor cannot silently drop the wiring.

Acceptance evidence: out-of-tree CLI demo shows the digest contains '# Prior Signal Ledger' and the supplied prior signal name when --prior is passed, and contains 0 occurrences when it is not. 146 napkin_math tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review of PR #756 surfaced two issues in the orchestration rules:

P1 (blocking). The 'Resume mode' carve-out exempted pinned digests from --prior wiring, which preserved the v58 failure mode: a digest produced earlier without --prior could flow straight into Stage 1 untouched while extract-from-digest treated the absent ledger as a first-iteration baseline. A target dir with a digest, an available prior, and no ledger could extract as if no prior existed.

Open Question on prior resolution. 'Most-recent earlier version' was a wrong default because output/ contains experimental letter-suffixed dirs (v52a/b/c, v57a_<topic>, …) alongside integer-numbered accepted baselines. Comparability against accepted snapshots is what the 2026-05-22 plan reset cares about.

Changes:

- run-napkin-math-pipeline SKILL.md: new mandatory Stage 1 preflight (deterministic 'grep -q # Prior Signal Ledger' against the digest) that applies regardless of whether Stage 0 just ran or is being resumed. When a prior baseline exists and the ledger is absent, the orchestrator must STOP and offer two resolutions: regenerate Stage 0 with --prior (the pinned-digest cardinal rule explicitly does not apply when repairing for ledger), or record an explicit '$D/.no_prior_waiver' file naming the reason. Replaces the 'Selecting the prior' default — accepted-baseline-named-by-user, integer-numbered versions are the typical shape, letter-suffixed dirs are probes and are never auto-selected. Resume-mode decision table row updated to point at the preflight.

- extract-parameters-from-{digest,full} SKILL.md: retract the 'when absent, treat as first-iteration baseline' rule that contradicted the preflight. Absent-ledger is now declared ambiguous on its own; the orchestrator disambiguates via the Stage 1 preflight. Standalone invocations must confirm no prior accepted baseline exists for the slug, or stop and ask.

Verification: 146 napkin_math tests pass. No production-Python behaviour change; the preflight is a Bash check the orchestrator runs, plus rule updates in three SKILL.md files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neoneye
Copy link
Copy Markdown
Member Author

neoneye commented May 21, 2026

Addressed both review items in 1cab128.

P1 (blocking) — closed. Replaced the "Resume mode" carve-out with a mandatory Stage 1 preflight that applies regardless of whether Stage 0 just ran or is being resumed:

grep -q '# Prior Signal Ledger' $D/extract_parameters_input.md

When a prior baseline exists and that grep returns non-zero, the orchestrator MUST stop and offer two resolutions:

  1. Delete the digest and re-run Stage 0 with --prior <baseline>/<slug>/parameters.json. The pinned-digest cardinal rule explicitly does not apply when the digest is being repaired for missing ledger.
  2. Write $D/.no_prior_waiver with a one-line reason. The waiver file is the only path that lets Stage 1 proceed with absent ledger when a prior exists.

I also retracted the contradicting "treat absent-ledger as first-iteration" line from both extract SKILL.md files. Absent-ledger is now declared ambiguous on its own; the orchestrator disambiguates via the preflight. Standalone invocations of the extract skill must confirm no prior accepted baseline exists for the slug, or stop and ask.

Open Question — addressed. Replaced the "most-recent earlier version" default with "accepted baseline named by the user." The skill now explicitly says integer-numbered versions are the typical accepted shape (v(N-1) for a v(N) → v(N+1) bump); letter-suffixed dirs (v52a, v53b, v57a_<topic>, …) are experimental probes and are never auto-selected. If the user has not named the baseline, the orchestrator asks rather than guessing or using mtime.

Verification: 146 napkin_math tests still pass. No production-Python behaviour change — the preflight is a Bash check the orchestrator runs plus prose updates in three SKILL.md files.

Note on test coverage. The preflight is a doc-level orchestration rule, not Python. A unit test of an orchestrator instruction is reviewing prose. If you'd prefer a deterministic enforcement point (e.g. a validate_parameters.py rule, or a small check_prior_ledger.py invoked by the orchestrator), say so and I'll add it in a follow-up — but the deterministic grep already gives the agent a one-liner to run, which is the same enforcement surface the script-based checks would have.

neoneye and others added 2 commits May 22, 2026 01:45
…h-start rule

Review of PR #756 surfaced two P2 issues:

1. The fresh-start instructions in 'Inputs scenario (a)' still said to 'resolve the most-recent prior parameters.json' when a prior exists, contradicting the accepted-baseline-named-by-user rule earlier in the same skill. Agents following the Inputs section directly could still auto-select a probe dir (v52a/v53b-style) as the prior.

2. The Stage 1 preflight grep checked ledger presence but not ledger correctness. A digest containing a ledger built from the wrong prior baseline would still pass, defeating the comparability goal.

Changes:

- prepare_extract_input.py: new render_prior_source helper renders the prior path relative to NAPKIN_MATH_DIR when applicable (so the stamp is stable across worktrees), else absolute. build_prior_signal_ledger and build_combined_digest accept an optional prior_path argument; when supplied, the ledger header now includes a 'Prior source: `<path>`' line. main() threads the resolved prior path through. Backwards compatible: callers passing only prior_params get the previous behaviour with no stamp.

- run-napkin-math-pipeline SKILL.md: Stage 1 preflight extended with a second grep that verifies 'Prior source:.*$BASELINE/$SLUG/parameters.json' against the named accepted baseline. Both checks must pass; either failure stops Stage 1 with the same two-resolution choice (regenerate Stage 0 or write $D/.no_prior_waiver). Resume-mode decision table and the 'Prior-baseline context for reruns' intro updated to mention the stamp. Inputs scenario (a) rewritten — ask the user to name/confirm the accepted baseline, never auto-select letter-suffixed probe dirs.

- tests: 4 new tests covering the stamp — stamp omitted when no path supplied (backwards-compat), relative rendering under NAPKIN_MATH_DIR, absolute rendering outside it, and render_prior_source standalone. The existing CLI-level test extended to assert the stamp lands in the digest end-to-end.

Acceptance evidence: CLI demo shows the digest contains 'Prior source: `output/v58/demo_slug/parameters.json`'; orchestrator preflight grep against the named baseline (v58/demo_slug) returns True; grep against a wrong baseline (v52a/demo_slug) returns False. 150 napkin_math tests pass (was 146).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neoneye
Copy link
Copy Markdown
Member Author

neoneye commented May 21, 2026

Both P2 items addressed in 85f7459.

P2 #1 — fresh-start contradiction fixed. "Inputs scenario (a)" no longer says "resolve the most-recent prior." It now reads: "ask the user to name (or confirm) the accepted baseline — typically the immediately-preceding integer-numbered version — and pass it as --prior. Never auto-select a letter-suffixed dir (v52a, v53b, v57a_<topic>, …) as the prior; those are experimental probes." Consistent with the rule at lines 112–120.

P2 #2 — preflight now checks correctness, not just presence. Implemented the "better" option you suggested: prepare_extract_input.py now stamps the ledger header with a Prior source: \`line. The path is rendered relative toexperiments/napkin_math/` when the prior lives under that tree (so the stamp is stable across worktrees), else absolute. Stage 1 preflight extended to two greps:

# (a) Ledger present?
grep -q '# Prior Signal Ledger' $D/extract_parameters_input.md

# (b) Ledger built from the named accepted baseline?
grep -q "Prior source:.*\$BASELINE/\$SLUG/parameters.json" \
  $D/extract_parameters_input.md

Both must pass. Either failure stops Stage 1 with the same two-resolution choice (regenerate Stage 0 or write \$D/.no_prior_waiver).

Implementation details on the stamp:

  • build_prior_signal_ledger and build_combined_digest accept an optional prior_path. When supplied, the stamp line lands between the header text and the ## Signals block. When not supplied, no stamp — backwards compatible with programmatic callers.
  • New render_prior_source(prior_path) helper does the relative/absolute rendering. Covered by tests.
  • 4 new tests: stamp omitted when no path is supplied, relative rendering under NAPKIN_MATH_DIR, absolute rendering outside it, and render_prior_source standalone. Existing CLI-level test extended to assert the stamp lands end-to-end through main().

CLI acceptance evidence:

=== STAMP ACCEPTANCE EVIDENCE ===
  Prior source: \`output/v58/demo_slug/parameters.json\`
  (a) ledger present?                   True
  (b) source matches v58/demo_slug?         True
  (b') source matches wrong baseline?    False

A digest built against the named baseline (v58/demo_slug) passes both checks; a preflight against a wrong baseline (v52a/demo_slug) returns False — the differentiation you asked for.

Verification: 150 napkin_math tests pass (was 146).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant