napkin-math: wire prior-baseline ledger into Stage 0 reruns#756
napkin-math: wire prior-baseline ledger into Stage 0 reruns#756neoneye wants to merge 4 commits into
Conversation
PR #753 added prompt rules so the extract LLM consumes a # Prior Signal Ledger when one is present in the digest, but the orchestration that produces the digest never instructed agents to pass --prior. Empirical confirmation: every v58 extract_parameters_input.md has 0 occurrences of 'Prior Signal Ledger'; the discipline existed, the run did not use it. Path-Forward item 1 from docs/20260522_plan.md, scoped to a single PR-sized change. Items 2–6 (strict audit, deterministic gate selection, corpus regression report, regenerated snapshot, methodology backlog) are explicitly out of scope. Changes: - run-napkin-math-pipeline SKILL.md: Stage 0 dispatch row and command block now show --prior; new 'Prior-baseline context for reruns' subsection states it is mandatory whenever a prior version exists for the slug, omitted only on first-iteration plans, and explains prior-version resolution. Inputs scenario (a) and resume-mode decision table updated to match. - extract-parameters-from-{digest,full} SKILL.md: one-paragraph pointer so an agent invoking the skill standalone knows the ledger is upstream-controlled by --prior. - prepare_extract_input.py: module docstring documents --prior as the wiring point for pipeline reruns. No behavior change; --prior was already implemented. - tests/test_prepare_extract_input_prior_ledger.py: three new CLI-level tests covering main() with --prior (ledger appears), main() without --prior (digest stays clean), and main() with a non-existent --prior path (loud SystemExit). These lock the orchestration contract so a future refactor cannot silently drop the wiring. Acceptance evidence: out-of-tree CLI demo shows the digest contains '# Prior Signal Ledger' and the supplied prior signal name when --prior is passed, and contains 0 occurrences when it is not. 146 napkin_math tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review of PR #756 surfaced two issues in the orchestration rules: P1 (blocking). The 'Resume mode' carve-out exempted pinned digests from --prior wiring, which preserved the v58 failure mode: a digest produced earlier without --prior could flow straight into Stage 1 untouched while extract-from-digest treated the absent ledger as a first-iteration baseline. A target dir with a digest, an available prior, and no ledger could extract as if no prior existed. Open Question on prior resolution. 'Most-recent earlier version' was a wrong default because output/ contains experimental letter-suffixed dirs (v52a/b/c, v57a_<topic>, …) alongside integer-numbered accepted baselines. Comparability against accepted snapshots is what the 2026-05-22 plan reset cares about. Changes: - run-napkin-math-pipeline SKILL.md: new mandatory Stage 1 preflight (deterministic 'grep -q # Prior Signal Ledger' against the digest) that applies regardless of whether Stage 0 just ran or is being resumed. When a prior baseline exists and the ledger is absent, the orchestrator must STOP and offer two resolutions: regenerate Stage 0 with --prior (the pinned-digest cardinal rule explicitly does not apply when repairing for ledger), or record an explicit '$D/.no_prior_waiver' file naming the reason. Replaces the 'Selecting the prior' default — accepted-baseline-named-by-user, integer-numbered versions are the typical shape, letter-suffixed dirs are probes and are never auto-selected. Resume-mode decision table row updated to point at the preflight. - extract-parameters-from-{digest,full} SKILL.md: retract the 'when absent, treat as first-iteration baseline' rule that contradicted the preflight. Absent-ledger is now declared ambiguous on its own; the orchestrator disambiguates via the Stage 1 preflight. Standalone invocations must confirm no prior accepted baseline exists for the slug, or stop and ask. Verification: 146 napkin_math tests pass. No production-Python behaviour change; the preflight is a Bash check the orchestrator runs, plus rule updates in three SKILL.md files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed both review items in 1cab128. P1 (blocking) — closed. Replaced the "Resume mode" carve-out with a mandatory Stage 1 preflight that applies regardless of whether Stage 0 just ran or is being resumed: grep -q '# Prior Signal Ledger' $D/extract_parameters_input.mdWhen a prior baseline exists and that grep returns non-zero, the orchestrator MUST stop and offer two resolutions:
I also retracted the contradicting "treat absent-ledger as first-iteration" line from both extract SKILL.md files. Absent-ledger is now declared ambiguous on its own; the orchestrator disambiguates via the preflight. Standalone invocations of the extract skill must confirm no prior accepted baseline exists for the slug, or stop and ask. Open Question — addressed. Replaced the "most-recent earlier version" default with "accepted baseline named by the user." The skill now explicitly says integer-numbered versions are the typical accepted shape (v(N-1) for a v(N) → v(N+1) bump); letter-suffixed dirs ( Verification: 146 napkin_math tests still pass. No production-Python behaviour change — the preflight is a Note on test coverage. The preflight is a doc-level orchestration rule, not Python. A unit test of an orchestrator instruction is reviewing prose. If you'd prefer a deterministic enforcement point (e.g. a |
…h-start rule Review of PR #756 surfaced two P2 issues: 1. The fresh-start instructions in 'Inputs scenario (a)' still said to 'resolve the most-recent prior parameters.json' when a prior exists, contradicting the accepted-baseline-named-by-user rule earlier in the same skill. Agents following the Inputs section directly could still auto-select a probe dir (v52a/v53b-style) as the prior. 2. The Stage 1 preflight grep checked ledger presence but not ledger correctness. A digest containing a ledger built from the wrong prior baseline would still pass, defeating the comparability goal. Changes: - prepare_extract_input.py: new render_prior_source helper renders the prior path relative to NAPKIN_MATH_DIR when applicable (so the stamp is stable across worktrees), else absolute. build_prior_signal_ledger and build_combined_digest accept an optional prior_path argument; when supplied, the ledger header now includes a 'Prior source: `<path>`' line. main() threads the resolved prior path through. Backwards compatible: callers passing only prior_params get the previous behaviour with no stamp. - run-napkin-math-pipeline SKILL.md: Stage 1 preflight extended with a second grep that verifies 'Prior source:.*$BASELINE/$SLUG/parameters.json' against the named accepted baseline. Both checks must pass; either failure stops Stage 1 with the same two-resolution choice (regenerate Stage 0 or write $D/.no_prior_waiver). Resume-mode decision table and the 'Prior-baseline context for reruns' intro updated to mention the stamp. Inputs scenario (a) rewritten — ask the user to name/confirm the accepted baseline, never auto-select letter-suffixed probe dirs. - tests: 4 new tests covering the stamp — stamp omitted when no path supplied (backwards-compat), relative rendering under NAPKIN_MATH_DIR, absolute rendering outside it, and render_prior_source standalone. The existing CLI-level test extended to assert the stamp lands in the digest end-to-end. Acceptance evidence: CLI demo shows the digest contains 'Prior source: `output/v58/demo_slug/parameters.json`'; orchestrator preflight grep against the named baseline (v58/demo_slug) returns True; grep against a wrong baseline (v52a/demo_slug) returns False. 150 napkin_math tests pass (was 146). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Both P2 items addressed in 85f7459. P2 #1 — fresh-start contradiction fixed. "Inputs scenario (a)" no longer says "resolve the most-recent prior." It now reads: "ask the user to name (or confirm) the accepted baseline — typically the immediately-preceding integer-numbered version — and pass it as P2 #2 — preflight now checks correctness, not just presence. Implemented the "better" option you suggested: # (a) Ledger present?
grep -q '# Prior Signal Ledger' $D/extract_parameters_input.md
# (b) Ledger built from the named accepted baseline?
grep -q "Prior source:.*\$BASELINE/\$SLUG/parameters.json" \
$D/extract_parameters_input.mdBoth must pass. Either failure stops Stage 1 with the same two-resolution choice (regenerate Stage 0 or write Implementation details on the stamp:
CLI acceptance evidence: A digest built against the named baseline ( Verification: 150 napkin_math tests pass (was 146). |
Summary
parameters.jsonintoprepare_extract_input.py --priorso the extract LLM actually sees the# Prior Signal Ledgerthat PR napkin-math: prior-signal ledger orchestration for extract skill (proposal 141 PR 3) #753 (napkin-math: prior-signal ledger orchestration for extract skill (proposal 141 PR 3) #753) added rules for.prepare_extract_input.py --prioralready existed; the wiring change is teaching the orchestrator (and any agent invoking the extract skills standalone) to use it.experiments/napkin_math/docs/20260522_plan.md. Items 2–6 explicitly out of scope.Why
PR #753 told the extract LLM to record
dropped_signalswithorigin: "prior_baseline"when a Prior Signal Ledger is present in the digest. The discipline existed, but no orchestration ever supplied the ledger:That regression — 124 unexplained absent signals in the v49 → v58 source-preservation audit — is the load-bearing observation in the 2026-05-22 plan reset.
Changes (5 files, +169/-3)
.claude/skills/run-napkin-math-pipeline/SKILL.md--prior; new "Prior-baseline context for reruns" subsection makes it mandatory when a prior version exists for the slug; Inputs scenario (a) and resume-mode decision table updated..claude/skills/extract-parameters-from-digest/SKILL.md--prior..claude/skills/extract-parameters-from-full/SKILL.mdprepare_extract_input.py--prioras the wiring point. No behaviour change.tests/test_prepare_extract_input_prior_ledger.pymain()with--prior(ledger lands in digest), without--prior(digest stays clean), and with a non-existent prior path (loudSystemExit). Locks the orchestration contract.Acceptance evidence
CLI demo (out-of-tree, with
run_compressmonkey-patched so it doesn't need worker_plan):Test suite:
Out of scope
Documented for follow-up, not implemented here:
prepare_extract_input.py(e.g. a--auto-priorflag scanningoutput/v*/<slug>/). Considered and rejected for this PR: explicit is clearer than implicit, and the orchestrator can already resolve the prior path. If future runs keep forgetting--prior, revisit.Test plan
pytest experiments/napkin_math/tests/— 146 passedpytest experiments/napkin_math/tests/test_prepare_extract_input_prior_ledger.py -v— 15 passed (12 existing + 3 new CLI-level)--prior, absent without it, loud failure on bad path🤖 Generated with Claude Code