Batch 4 (phase-1) + v2 prompt overlay + COMPILER_STRATEGY by anantham · Pull Request #49 · anantham/LexiconForge

anantham · 2026-05-13T19:42:34Z

Summary

Opens batch 4 of MN10 hand-curation with phase-1 (the first teaching-content phase), ships the v2 prompt overlay codifying all session learnings as a ready-to-wire compiler amendment, and adds the strategic-economic analysis document that emerged from this session.

Base note: This PR is stacked on #48 (batch-3). GitHub will automatically retarget this PR's base to main once #48 merges and its branch is deleted. Review the 3 batch-4 commits independently.

Commits

110f1d0 — phase-1 curation: Ekāyano ayaṁ Bhikkhave maggo
First teaching-content phase after 8 framing phases. The famously-contested compound ekāyano gets a 5-sense translator-debate cycle (direct / one-way / solitary / convergent / only) with per-tradition notes citing Sujato, Bhikkhu Bodhi, and the Chinese parallel EA 12.1. New curation log at docs/sutta-studio/curation/phase-1.md.
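The 5-sense cycle can be pictured as plain data. The TypeScript sketch below uses a simplified, hypothetical shape (the `Sense` interface and the `orderForCycle` helper are illustrative, not the packet's actual schema); the readings and confidence labels come from the phase-1 commit message. It shows one possible way a renderer could order the cycle from the highest-confidence reading down.

```typescript
// Simplified, hypothetical sense shape. Field names follow the v2 overlay's
// SENSE METADATA block; the real demoPacket.json schema may differ.
type Confidence = "high" | "medium" | "low";

interface Sense {
  english: string;
  epistemicBasis: "curatorial";
  confidence: Confidence;
  notes: string;
}

// The five ekāyano readings, deliberately listed alphabetically here.
const ekayanoSenses: Sense[] = [
  { english: "convergent", epistemicBasis: "curatorial", confidence: "low", notes: "Interpretive: going to one." },
  { english: "direct", epistemicBasis: "curatorial", confidence: "high", notes: "Sujato." },
  { english: "one-way", epistemicBasis: "curatorial", confidence: "medium", notes: "Older rendering." },
  { english: "only", epistemicBasis: "curatorial", confidence: "low", notes: "Doctrinally controversial." },
  { english: "solitary", epistemicBasis: "curatorial", confidence: "medium", notes: "Bhikkhu Bodhi: going alone." },
];

const confidenceRank: Record<Confidence, number> = { high: 0, medium: 1, low: 2 };

// Order the cycle by confidence. Array.prototype.sort is stable (ES2019+),
// so senses with equal confidence keep their input order.
function orderForCycle(senses: Sense[]): Sense[] {
  return [...senses].sort((a, b) => confidenceRank[a.confidence] - confidenceRank[b.confidence]);
}

console.log(orderForCycle(ekayanoSenses).map((s) => s.english).join(" / "));
// direct / one-way / solitary / convergent / only
```

Copying before sorting keeps the curated input untouched, so the same data can also be shown in its authored order.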
2d198f6 — v2 prompt context amendments (config/suttaStudioPromptContextV2.ts, 280 lines new)
Six amendment blocks codifying MN10 batches 1-4 learnings:
- TOOLTIP REGISTER (pay-rent rule, strengthens v1's jargon-with-explanation)
- ARROW-EARNING RULE (relations encode Pāli case-quirks English lacks)
- SENSE METADATA (epistemicBasis, sourceCitationIds, confidence, notes)
- ANCHOR SELECTION (one isAnchor per phase, semantic centerpiece)
- TRANSLATOR-DEBATE AWARENESS (multi-sense curatorial pattern for contested words)
- CROSS-PHASE AWARENESS (recurring-lemma cross-references when envelope available)
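A minimal sketch of the CROSS-PHASE AWARENESS lookback, assuming the phase-state envelope can be reduced to an oldest-first list of per-phase lemma lists. `PhaseSummary`, `findCrossReferences`, and `MAX_LOOKBACK` are illustrative names, not the overlay's actual API; the four-phase limit comes from this PR's description of the block, and the sample lemma lists only mirror the refrain status reported here (bhikkhu in e-h, bhagavā in b/e/g/h).

```typescript
// Hypothetical envelope entry: one per prior phase, oldest first.
interface PhaseSummary {
  phaseId: string;
  lemmas: string[];
}

// "≤4 phases back", per the v2 overlay description.
const MAX_LOOKBACK = 4;

// For each lemma in the current phase, list the prior phases (newest first)
// inside the lookback window that contain the same lemma.
function findCrossReferences(priorPhases: PhaseSummary[], currentLemmas: string[]): Map<string, string[]> {
  // slice() copies, so reverse() does not mutate the caller's array.
  const recent = priorPhases.slice(-MAX_LOOKBACK).reverse();
  const refs = new Map<string, string[]>();
  for (const lemma of currentLemmas) {
    const hits = recent.filter((p) => p.lemmas.includes(lemma)).map((p) => p.phaseId);
    if (hits.length > 0) refs.set(lemma, hits);
  }
  return refs;
}

// Illustrative lemma lists: only the refrain lemmas are shown.
const priorPhases: PhaseSummary[] = [
  { phaseId: "a", lemmas: [] },
  { phaseId: "b", lemmas: ["bhagavā"] },
  { phaseId: "c", lemmas: [] },
  { phaseId: "d", lemmas: [] },
  { phaseId: "e", lemmas: ["bhikkhu", "bhagavā"] },
  { phaseId: "f", lemmas: ["bhikkhu"] },
  { phaseId: "g", lemmas: ["bhikkhu", "bhagavā"] },
  { phaseId: "h", lemmas: ["bhikkhu", "bhagavā"] },
];

console.log(findCrossReferences(priorPhases, ["bhikkhu", "magga"]));
```

With a window of four, the bhagavā occurrence in phase b falls outside the lookback and produces no cross-reference facet.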
Standalone module — not yet wired into prompts.ts. Companion PR will wire + run on phase-2.
ea528d9 — COMPILER_STRATEGY.md + HANDOVER.md replacement
- docs/sutta-studio/COMPILER_STRATEGY.md (289 lines, new): economic-strategic analysis of pipeline vs hand-curation. Quality bands (35% v1 / 65% v2 / 85% +post-passes / 100% hand), Pareto distribution (10-15 phases pedagogically critical, 35-40 routine), per-compile cost estimates, scaling roadmap, irreducibly-human gaps.
- docs/HANDOVER.md (180 lines, replaces prior): 17-commit inventory across 3 branches, schema-tensions status, protocol-amendment summary, refrain progression, 10 pending threads, strategic-pivot decision flagged for next session.

Strategic context

Per COMPILER_STRATEGY.md, this PR closes batch 4 at one phase (phase-1) and recommends pivoting from linear hand-curation to wiring v2 and building 4 deterministic post-passes (morph-from-POS, citation-linker, cross-phase-facet detector, §3.4 linter). The pivot is estimated at ~7 hr cheaper for MN10 alone, and the savings multiply across every future sutta. The decision is deferred to a follow-up PR.
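Of the four proposed post-passes, the §3.4 linter is the most mechanical. The sketch below assumes it checks only the three concrete items the v2 TOOLTIP REGISTER block names (bracketed grammar prefixes, a bare √ root symbol without prose, and emoji); the regexes and the `lintTooltip` name are illustrative assumptions, not the project's implementation, and the "pay rent" judgment itself is left to a human.

```typescript
type LintFinding = "bracketed-grammar-prefix" | "bare-root-symbol" | "emoji";

// Illustrative heuristics only; the real §3.4 linter's rules may differ.
function lintTooltip(text: string): LintFinding[] {
  const findings: LintFinding[] = [];
  // A leading bracketed label such as "[Nom. sg.]" instead of prose.
  if (/^\s*\[[^\]]*\]/.test(text)) findings.push("bracketed-grammar-prefix");
  // A root symbol standing alone, e.g. "√i", with no explanation around it.
  if (/^\s*√\S+\s*$/u.test(text)) findings.push("bare-root-symbol");
  // Any pictographic emoji (Unicode property escapes need an ES2018+ target).
  if (/\p{Extended_Pictographic}/u.test(text)) findings.push("emoji");
  return findings;
}

for (const tip of ["[Nom. sg.] this", "√i", "👣 path", "From the root i, to go: a going."]) {
  console.log(JSON.stringify(tip), lintTooltip(tip));
}
```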

Test plan

- npm test services/providers/dpd.test.ts (regression coverage from the batch-3 DPD fix carries forward)
- Visual smoke: open /sutta/demo, hover phase-1 words, and confirm the 5-sense ekāyano cycle renders with translator-tradition notes in the audit modal
- Confirm config/suttaStudioPromptContextV2.ts exports the overlay but is not yet imported by services/compiler/prompts.ts (intentional: the overlay is staged for a follow-up wiring PR)
- Read docs/sutta-studio/COMPILER_STRATEGY.md end-to-end before the next compiler work; it explains the pivot rationale
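The "not yet imported" check can be automated with a plain scan of static import specifiers. This is a minimal sketch under the assumption that a text scan is enough: it does not follow dynamic `import()` calls or re-exports through other modules, and `importsModule` is a hypothetical helper, not part of the repository.

```typescript
// True if `source` has a static import (or `export ... from`) whose specifier
// names `moduleName`, with or without a .ts extension.
function importsModule(source: string, moduleName: string): boolean {
  // Alternative 1: `import ... from "x"` / `export ... from "x"`.
  // Alternative 2: side-effect `import "x"`.
  const specifier = /(?:import|export)\s[^'"]*?from\s*['"]([^'"]+)['"]|import\s*['"]([^'"]+)['"]/g;
  for (const match of source.matchAll(specifier)) {
    const spec = (match[1] ?? match[2] ?? "").replace(/\.ts$/, "");
    if (spec === moduleName || spec.endsWith(`/${moduleName}`)) return true;
  }
  return false;
}

const staged = `// overlay staged: suttaStudioPromptContextV2\nimport { foo } from "./foo";`;
console.log(importsModule(staged, "suttaStudioPromptContextV2")); // false
```

In practice the caller would read services/compiler/prompts.ts (for example with `readFileSync` from node:fs) and expect `false` until the wiring PR lands; a comment that merely mentions the module name does not count as an import.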

🤖 Generated with Claude Code

…e maggo (batch 4 opens; first teaching content)

After 8 phases of narrative framing (a-h: "Thus I have heard… the Buddha addressed the monks… the monks replied… the Buddha said this"), phase-1 opens the satipaṭṭhāna teaching proper with its famous declaratory claim: 'This, monks, is the direct path.' 4 words / 8 segments.

Centerpiece: 'ekāyano' is one of the most-debated words in the entire Pāli canon. ek + āyana (from the root i, "to go") can read as 'one-going / direct' (Sujato), 'going to one / convergent', 'going alone / solitary' (Bhikkhu Bodhi), 'one and only' (doctrinally controversial), or 'unified'. The packet ALREADY had 5 senses encoding this debate; this commit grounds each sense with a curatorial basis and per-sense notes citing the tradition.

Changes:
- p1 Ekāyano: isAnchor=true. morph nom/sg/m on p1s3. DROPPED the "Way TO" type=ownership relation (it didn't fit the case-quirk palette under the arrow-earning rule just ratified in 9830ef1). 5 senses, all curatorial, with per-sense notes: direct (Sujato, high) / one-way (older, medium) / solitary (Bodhi, medium) / convergent (interpretive, low) / only (controversial, low). Plain-first explanations of the ek- and āyan- elements plus the translator-debate framing.
- p2 ayaṁ: color-explanation facet plus cross-references to etad (phase-h) and te (phase-g): same demonstrative system, different cases. morph nom/sg/m. DROPPED the "This IS" type=direction relation (universal grammar; no case-quirk earns the arrow). 1 sense, lexical + dpd:8757.
- p3 Bhikkhave: REFRAIN HIT #5. A refrain-explanation facet on p3s2 references all prior bhikkhu appearances. morph voc/pl/m. 5 senses: 2 lexical (Mendicants/Monks + dpd:49868), 2 etymological (Sharers, from bhaj "to share"; Seekers, from bhī-kkh "danger-seer"), 1 curatorial (Friends, a Thanissaro-style relational rendering).
- p4 maggo: plain-first rewrite. morph nom/sg/m. 3 lexical senses: path/road (dpd:50495) + method (dpd:50496, the abstract sense that frames satipaṭṭhāna as a METHOD, not just a road).
- 4 new packet.citations: dpd:8757 ayaṁ, dpd:49868 bhikkhave, dpd:50495 magga-road, dpd:50496 magga-method. Total 28 → 32.

Schema tension #12 (arrow-earning rule): STABLE post-codification. Both of phase-1's pre-existing relations failed the rule and were dropped.

Schema tension #1 (DPD stripper): STAYS RESOLVED. Ekāyono is the 5th Lookup-gap surface across batches 3+4 (Bhikkhavo, Bhadante, etad, avoca, Ekāyono). Pattern: certain inflected/compound forms fall outside DPD's enumeration even when the lemma is attested. Upstream action is deferred; revisit after phase-2/3 to see whether a morphology-generator fallback is warranted.

NEW PATTERN observed: translator-debate as first-class curation. When a word has multiple legitimate scholarly readings (Ekāyano: 5 readings), surface them as distinct senses with 'curatorial' basis, per-sense notes citing the tradition, and a confidence ranking. The reader cycles through the debate rather than receiving an authorial verdict. Proposed as a §3.4.2 amendment for the next docs commit cycle.

Refrain status (fully mature):
- bhikkhu: 5/9 phases (e/f/g/h/1)
- bhagavā: 4/9 phases across 3 forms (b/e/g/h)
- viharati: 1/9 (expected to recur in phase-2's satipaṭṭhāna formula)

Curation log: docs/sutta-studio/curation/phase-1.md (§0-§8), including the proposed §3.4.2 translator-debate cycle rule.

Tests: the worktree sandbox blocked test execution for this commit (an environment issue, not packet-related). The JSON validates, and structural assertions confirm integrity (isAnchor, morph hints, citations, basis distribution).

Batch 4 opens. 9/51 phases curated (a-h + phase-1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
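The structural assertions listed in this commit (isAnchor, citations, basis distribution) can be sketched as a single pass over a phase. The shapes below are hypothetical simplifications (`PhaseLite` and `checkPhase` are illustrative names), and the sample data only mirrors the per-word sense counts reported above; which individual sense carries which DPD id is an assumption. Morph hints are not checked here.

```typescript
// Hypothetical, simplified packet shapes for illustration only.
interface SenseLite { epistemicBasis: string; sourceCitationIds?: string[] }
interface WordLite { surface: string; isAnchor?: boolean; senses: SenseLite[] }
interface PhaseLite { id: string; words: WordLite[] }
interface PhaseReport {
  anchorCount: number;
  unresolvedCitations: string[];
  basisCounts: Record<string, number>;
}

// Count anchors, collect citation ids missing from the packet's citation
// table, and tally senses per epistemic basis.
function checkPhase(phase: PhaseLite, citationIds: ReadonlySet<string>): PhaseReport {
  const report: PhaseReport = { anchorCount: 0, unresolvedCitations: [], basisCounts: {} };
  for (const word of phase.words) {
    if (word.isAnchor) report.anchorCount++;
    for (const sense of word.senses) {
      report.basisCounts[sense.epistemicBasis] = (report.basisCounts[sense.epistemicBasis] ?? 0) + 1;
      for (const id of sense.sourceCitationIds ?? []) {
        if (!citationIds.has(id)) report.unresolvedCitations.push(id);
      }
    }
  }
  return report;
}

const lex = (id: string): SenseLite => ({ epistemicBasis: "lexical", sourceCitationIds: [id] });
const basis = (b: string): SenseLite => ({ epistemicBasis: b });

const phase1: PhaseLite = {
  id: "phase-1",
  words: [
    { surface: "Ekāyano", isAnchor: true, senses: Array.from({ length: 5 }, () => basis("curatorial")) },
    { surface: "ayaṁ", senses: [lex("dpd:8757")] },
    { surface: "Bhikkhave", senses: [lex("dpd:49868"), lex("dpd:49868"), basis("etymological"), basis("etymological"), basis("curatorial")] },
    { surface: "maggo", senses: [lex("dpd:50495"), lex("dpd:50495"), lex("dpd:50496")] },
  ],
};

const report = checkPhase(phase1, new Set(["dpd:8757", "dpd:49868", "dpd:50495", "dpd:50496"]));
console.log(report);
```

A phase passes when `anchorCount` is exactly 1 and `unresolvedCitations` is empty; the basis tally is informational.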

…on learnings as compiler-prompt overlay

Adds config/suttaStudioPromptContextV2.ts: 6 amendment blocks codifying protocol learnings from MN10 hand-curation (batches 1-4):

1. TOOLTIP REGISTER: strengthens v1's "JARGON-WITH-EXPLANATION" to the full §3.4 pay-rent rule. Drop bracketed grammar prefixes, √ symbols without prose, and emoji defaults; keep technical terms only when they pay rent (precision required + glossed inline).
2. ARROW-EARNING RULE: refines v1's relations guidance with the rule ratified in FEATURES.md §1.3, under which relations earn their arrow only when Pāli's case marker does work that has no English analog. NOT for subject-of-active-verb, direct-object-of-verb, or demonstrative agreement. Includes earned/not-earned examples from curated phases.
3. SENSE METADATA: new fields v1 doesn't mention: epistemicBasis (lexical/grammatical/curatorial/etymological/commentarial/contextual/comparative), sourceCitationIds (DPD wiring), confidence (high/medium/low), notes (translator-tradition references).
4. ANCHOR SELECTION: exactly one isAnchor per phase, the semantic centerpiece. Heuristics for verb-anchor / contested-word-anchor / proper-noun-anchor / framing-anchor (from phase-a/c/d/e/g/h/1).
5. TRANSLATOR-DEBATE AWARENESS: for famously contested words (ekāyano, ātāpī, sampajāno, etc.), generate multiple senses representing distinct scholarly readings, each with curatorial epistemicBasis + per-tradition notes + a confidence ranking. Worked example from phase-1's Ekāyono 5-sense cycle.
6. CROSS-PHASE AWARENESS: when the phase-state envelope provides prior-phase context, recurring lemmas should get cross-reference facets (≤4 phases back). Three pattern categories: same-lemma-new-form, same-lemma-new-role, parallel-structures.
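The ARROW-EARNING RULE reduces to a predicate over a candidate relation. A minimal sketch, assuming each relation is tagged with a role label and a curator's yes/no on whether the case marker does work English lacks; only the three excluded roles are taken from the rule as described here, while `RelationRole`, `earnsArrow`, and the modeling of phase-1's two dropped relations are illustrative assumptions.

```typescript
// Role labels are illustrative; only the three excluded roles come from the
// rule's description.
type RelationRole =
  | "subject-of-active-verb"
  | "direct-object-of-verb"
  | "demonstrative-agreement"
  | "other";

interface CandidateRelation {
  label: string;
  role: RelationRole;
  // Curator's judgment: does the Pāli case marker do work English has no analog for?
  caseMarkerDoesUnmatchedWork: boolean;
}

// Roles that never earn an arrow, whatever the case marker.
const NEVER_EARNS: ReadonlySet<RelationRole> = new Set<RelationRole>([
  "subject-of-active-verb",
  "direct-object-of-verb",
  "demonstrative-agreement",
]);

function earnsArrow(rel: CandidateRelation): boolean {
  return !NEVER_EARNS.has(rel.role) && rel.caseMarkerDoesUnmatchedWork;
}

// Phase-1's two pre-existing relations, modeled under the assumptions above.
const phase1Relations: CandidateRelation[] = [
  { label: "Way TO (ownership)", role: "other", caseMarkerDoesUnmatchedWork: false },
  { label: "This IS (direction)", role: "demonstrative-agreement", caseMarkerDoesUnmatchedWork: false },
];

console.log(phase1Relations.filter(earnsArrow).length); // 0: both are dropped
```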
These amendments are NOT yet wired into the compiler. This commit ships the overlay as a standalone module that buildPhasePrompt (and the relevant Anatomist/Lexico passes) can import, either behind a feature flag or unconditionally in a future refactor.

Companion to:
- docs/sutta-studio/CURATION_PROTOCOL.md §3.4 + §9.1 + §3.4.1
- docs/sutta-studio/FEATURES.md §1.3 arrow-earning rule
- 9 hand-curated phases in components/sutta-studio/demoPacket.json + curation/phase-{a,b,c,d,e,f,g,h,1}.md

Next step: wire v2 into prompts.ts and re-run the compiler on phase-2 to test. The companion analysis "what v2 would change about phase-2" lives in this commit's PR description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
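One way the "behind a feature flag" option could look, as a minimal sketch: the overlay is appended after the v1 prompt only when a flag is on, so compiler output is unchanged until the flag flips. `AmendmentBlock`, `AMENDMENTS_V2`, and `buildPhasePromptSketch` are hypothetical names; the real module's exports and block text are not shown in this PR, and the one-line block summaries below paraphrase the commit message.

```typescript
// Hypothetical shape for the overlay's six blocks.
interface AmendmentBlock {
  id: string;
  text: string;
}

const AMENDMENTS_V2: readonly AmendmentBlock[] = [
  { id: "TOOLTIP_REGISTER", text: "Keep technical terms only when they pay rent: precise and glossed inline." },
  { id: "ARROW_EARNING_RULE", text: "Add a relation only when the Pāli case marker does work English lacks." },
  { id: "SENSE_METADATA", text: "Give each sense epistemicBasis, sourceCitationIds, confidence, and notes." },
  { id: "ANCHOR_SELECTION", text: "Mark exactly one isAnchor per phase: the semantic centerpiece." },
  { id: "TRANSLATOR_DEBATE_AWARENESS", text: "For contested words, emit one curatorial sense per scholarly reading." },
  { id: "CROSS_PHASE_AWARENESS", text: "Cross-reference recurring lemmas from up to 4 prior phases." },
];

interface PromptOptions {
  useV2Overlay: boolean;
}

// With the flag off, the v1 prompt is returned unchanged.
function buildPhasePromptSketch(v1Prompt: string, opts: PromptOptions): string {
  if (!opts.useV2Overlay) return v1Prompt;
  const overlay = AMENDMENTS_V2.map((b) => `## ${b.id}\n${b.text}`).join("\n\n");
  return `${v1Prompt}\n\n${overlay}`;
}

console.log(buildPhasePromptSketch("Compile phase-2.", { useV2Overlay: true }));
```

Appending rather than interleaving keeps the v1 prompt intact, which makes a before/after comparison on phase-2 straightforward.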

Captures the strategic-economic analysis that emerged from MN10 batches 1-4 hand-curation. The conversation surfaced ~14 distinct insights about pipeline vs hand-curation economics, cost telemetry, scaling, and the strategic pivot, none of which lived in the codebase until this commit.

docs/sutta-studio/COMPILER_STRATEGY.md (289 lines, new):
- §1 The economic shape: quality bands (35% v1 / 65% v2 / 85% +post-passes / 100% hand), Pareto distribution (10-15 phases pedagogically critical, 35-40 routine recurrences), per-compile cost estimates ($0.10-0.30 Gemini Flash / $1-3 Sonnet / $3-10 Opus per MN10).
- §2 What the pipeline does today vs what it could do: an 11-row matrix classifying each hand-curation move as learnable by prompt / deterministically post-processable / irreducibly human.
- §3 What's irreducibly human: translator-tradition citations, pedagogical taste, curation-log narrative. ~5-8 phases per sutta fall in this bucket.
- §4 Cost telemetry: the surprising discovery that services/apiMetricsService.ts already records every API call with tokens+cost+apiType=sutta_studio to IndexedDB. Missing: phaseId attribution, UI, prompt caching, and a local-vs-LLM split beyond DPD. Includes a 3-step plan to close these gaps, plus ccusage for Claude-Code-side conversation cost.
- §5 Scaling roadmap, 5 stages: hand-curate the MN10 exemplar (in progress) → wire the v2 overlay → build 4 deterministic post-passes → run on DN22 with selective polish (~5-6 hr vs ~30 hr from scratch) → satipaṭṭhāna sub-corpus → cross-pattern (~20-30 patterns covering most of the canon).
- §6 Open questions: translator-tradition DB, DPD Lookup-gap pattern resolution, prompt-caching tradeoff, when to wire v2, pedagogical-fidelity floor for routine phases.
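Closing the phaseId-attribution gap mostly means grouping existing records by one more field. A minimal sketch, assuming a simplified record shape with an optional `phaseId`; the real apiMetricsService schema and field names may differ, `costByPhase` is hypothetical, and reading the records out of IndexedDB is left out.

```typescript
// Hypothetical record shape; `phaseId` is the attribution field the strategy
// doc lists as missing today.
interface ApiCallRecord {
  apiType: string;
  totalTokens: number;
  costUsd: number;
  phaseId?: string;
}

interface PhaseCost {
  calls: number;
  tokens: number;
  costUsd: number;
}

// Sum sutta_studio calls per phase; records without a phaseId are grouped
// under "(unattributed)" so missing attribution stays visible.
function costByPhase(records: ApiCallRecord[]): Map<string, PhaseCost> {
  const totals = new Map<string, PhaseCost>();
  for (const r of records) {
    if (r.apiType !== "sutta_studio") continue;
    const key = r.phaseId ?? "(unattributed)";
    const t = totals.get(key) ?? { calls: 0, tokens: 0, costUsd: 0 };
    t.calls += 1;
    t.tokens += r.totalTokens;
    t.costUsd += r.costUsd;
    totals.set(key, t);
  }
  return totals;
}
```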
docs/HANDOVER.md (180 lines, replaces the prior 2026-05-12 handover):
- Full 17-commit inventory across 3 branches (PR #47 from the prior session, PR #48 batch-3 from today, the batch-4 branch from today)
- DPD root-cause fix details (coverage 86.9% → 89.5%; 458 sqlite-lookup vs 20 heuristic-fallback vs 56 unmatched; better-sqlite3 dependency; one-time 168 MB download)
- Schema tensions status (#1 RESOLVED at root, #7 RESOLVED prior, #12 RESOLVED via documentation, Lookup-gap as a new observation)
- 3 protocol amendments codified (§9.1, §3.4.1, FEATURES §1.3)
- 5 phase logs added (e/f/g/h/1)
- Refrain status (bhikkhu 5/9, bhagavā 4/9, viharati 1/9)
- 10 pending threads in priority order with effort estimates
- The pending strategic-pivot decision, flagged for the next session
- Worktree convention, bash sandbox quirk, and 3-branch base structure documented as non-obvious context
- Resume instructions that branch on the pivot decision

Both docs were written by parallel subagents with full context briefings, then reviewed and committed by the main session. Companion to the v2 prompt overlay (2d198f6) and the protocol amendments (c6b150f + 9830ef1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
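The 89.5% coverage figure appears consistent with counting both resolution paths as covered: (458 + 20) / (458 + 20 + 56) = 478 / 534 ≈ 0.895. The sketch below reproduces that arithmetic under that assumed definition (`dpdCoverage` and `LookupCounts` are hypothetical names); the earlier 86.9% figure cannot be rederived from the counts given here.

```typescript
interface LookupCounts {
  sqliteLookup: number;
  heuristicFallback: number;
  unmatched: number;
}

// Assumed definition: coverage = share of words resolved by either path
// (SQLite lookup or heuristic fallback) out of all words looked up.
function dpdCoverage(c: LookupCounts): number {
  const resolved = c.sqliteLookup + c.heuristicFallback;
  const total = resolved + c.unmatched;
  return total === 0 ? 0 : resolved / total;
}

const coverage = dpdCoverage({ sqliteLookup: 458, heuristicFallback: 20, unmatched: 56 });
console.log(`${(coverage * 100).toFixed(1)}%`); // 89.5%
```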

vercel · 2026-05-13T19:42:40Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lexicon-forge	Ready	Preview, Comment	May 13, 2026 7:42pm

anantham and others added 3 commits May 13, 2026 14:49

This was referenced May 13, 2026:
- Wire V2 amendments into live compiler (activates 2d198f6) #50 (Closed)
- Compiler consolidation Phase 0+1 — design doc + canonical prompts #51 (Merged)

anantham changed the base branch from feat/opus-batch3-curation to main May 13, 2026 22:31

anantham merged commit c17924d into main May 13, 2026
3 checks passed

anantham merged 3 commits into main from feat/opus-batch4-curation
