Batch 4 (phase-1) + v2 prompt overlay + COMPILER_STRATEGY #49

Merged
anantham merged 3 commits into main from feat/opus-batch4-curation
May 13, 2026

@anantham (Owner)

Summary

Opens batch 4 of MN10 hand-curation with phase-1 (the first teaching-content phase), ships the v2 prompt overlay codifying all session learnings as a ready-to-wire compiler amendment, and adds the strategic-economic analysis document that emerged from this session.

Base note: This PR is stacked on #48 (batch-3). GitHub will retarget the base to main when #48 merges. Review the 3 batch-4 commits independently.

Commits

  1. 110f1d0 phase-1 curation: Ekāyano ayaṁ Bhikkhave maggo
    First teaching-content phase after 8 framing phases. The famously-contested compound ekāyano gets a 5-sense translator-debate cycle (direct / one-way / solitary / convergent / only) with per-tradition notes citing Sujato, Bhikkhu Bodhi, and the Chinese parallel EA 12.1. New curation log at docs/sutta-studio/curation/phase-1.md.

  2. 2d198f6 v2 prompt context amendments (config/suttaStudioPromptContextV2.ts, 280 lines new)
    Six amendment blocks codifying MN10 batches 1-4 learnings:

    • TOOLTIP REGISTER (pay-rent rule, strengthens v1's jargon-with-explanation)
    • ARROW-EARNING RULE (relations encode Pāli case-quirks English lacks)
    • SENSE METADATA (epistemicBasis, sourceCitationIds, confidence, notes)
    • ANCHOR SELECTION (one isAnchor per phase, semantic centerpiece)
    • TRANSLATOR-DEBATE AWARENESS (multi-sense curatorial pattern for contested words)
    • CROSS-PHASE AWARENESS (recurring-lemma cross-references when envelope available)

    Standalone module — not yet wired into prompts.ts. Companion PR will wire + run on phase-2.

  3. ea528d9 COMPILER_STRATEGY.md + HANDOVER.md replacement

    • docs/sutta-studio/COMPILER_STRATEGY.md (289 lines, new): economic-strategic analysis of pipeline vs hand-curation. Quality bands (35% v1 / 65% v2 / 85% +post-passes / 100% hand), Pareto distribution (10-15 phases pedagogically critical, 35-40 routine), per-compile cost estimates, scaling roadmap, irreducibly-human gaps.
    • docs/HANDOVER.md (180 lines, replaces prior): 17-commit inventory across 3 branches, schema-tensions status, protocol-amendment summary, refrain progression, 10 pending threads, strategic-pivot decision flagged for next session.
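The SENSE METADATA and ANCHOR SELECTION blocks from commit 2 can be pictured as a small type sketch plus one check. This is a minimal illustration, not the repository's actual schema: the field names `epistemicBasis`, `sourceCitationIds`, `confidence`, `notes`, and `isAnchor` come from this PR, while `Sense`, `Word`, `english`, `surface`, and `anchorErrors` are hypothetical names chosen for the example.

```typescript
// Hypothetical shapes: field names follow the PR text, but the real
// packet schema in the repository may differ.
type EpistemicBasis =
  | "lexical" | "grammatical" | "curatorial" | "etymological"
  | "commentarial" | "contextual" | "comparative";

type Confidence = "high" | "medium" | "low";

interface Sense {
  english: string;               // assumed name for the rendered gloss
  epistemicBasis: EpistemicBasis;
  sourceCitationIds?: string[];  // e.g. ["dpd:8757"]
  confidence?: Confidence;
  notes?: string;                // translator-tradition reference
}

interface Word {
  id: string;
  surface: string;
  isAnchor?: boolean;
  senses: Sense[];
}

// ANCHOR SELECTION: a phase should carry exactly one isAnchor word.
// Returns an empty list when the rule holds, otherwise one message.
function anchorErrors(phaseId: string, words: Word[]): string[] {
  const anchors = words.filter((w) => w.isAnchor === true);
  if (anchors.length === 1) return [];
  return [`${phaseId}: expected exactly 1 anchor, found ${anchors.length}`];
}

// Abbreviated phase-1 sample (two of the four words, one sense each).
const phase1: Word[] = [
  {
    id: "p1",
    surface: "Ekāyano",
    isAnchor: true,
    senses: [{ english: "direct", epistemicBasis: "curatorial", confidence: "high" }],
  },
  {
    id: "p2",
    surface: "ayaṁ",
    senses: [{ english: "this", epistemicBasis: "lexical", sourceCitationIds: ["dpd:8757"] }],
  },
];

console.log(anchorErrors("phase-1", phase1));
```

Returning a list of messages rather than throwing lets a check like this run across all phases and report every violation in one pass.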

Strategic context

Per COMPILER_STRATEGY.md, this PR closes batch 4 at one phase (phase-1) and recommends pivoting from linear hand-curation to wiring v2 + building 4 deterministic post-passes (morph-from-POS, citation-linker, cross-phase-facet detector, §3.4 linter). The pivot is ~7 hr cheaper for MN10 alone and inherits a multiplier for every future sutta. Decision is deferred to a follow-up PR.

Test plan

  • npm test services/providers/dpd.test.ts (regression coverage from batch-3 DPD fix carries forward)
  • Visual smoke: open /sutta/demo, hover phase-1 words, confirm 5-sense ekāyano cycle renders with translator-tradition notes in the audit modal
  • Confirm config/suttaStudioPromptContextV2.ts is exported but not yet imported by services/compiler/prompts.ts (intentional — overlay is staged for follow-up wiring PR)
  • Read docs/sutta-studio/COMPILER_STRATEGY.md end-to-end before next compiler work; it explains the pivot rationale
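The third test-plan item (overlay exported but not yet imported by prompts.ts) could be checked mechanically. The sketch below is one possible approach, assuming the file contents are read elsewhere and passed in as a string; `importsModule` is a hypothetical helper, not something that exists in the repository.

```typescript
// Returns true when `source` contains an import or require whose module
// specifier ends with `moduleBaseName` (optionally followed by .ts/.js).
// Limitation: a commented-out import line would still match.
function importsModule(source: string, moduleBaseName: string): boolean {
  // Escape regex metacharacters in the module name.
  const escaped = moduleBaseName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `(?:from\\s+|import\\s+|import\\(\\s*|require\\(\\s*)["'][^"']*${escaped}(?:\\.ts|\\.js)?["']`
  );
  return pattern.test(source);
}

// Illustrative inputs, not the real contents of prompts.ts.
const wired = 'import { V2 } from "../../config/suttaStudioPromptContextV2";';
const staged = "// suttaStudioPromptContextV2 is staged for a follow-up PR";

console.log(importsModule(wired, "suttaStudioPromptContextV2"));
console.log(importsModule(staged, "suttaStudioPromptContextV2"));
```

For this PR the expected state is that the check returns false for services/compiler/prompts.ts; the follow-up wiring PR would flip it.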

🤖 Generated with Claude Code

anantham and others added 3 commits May 13, 2026 14:49
…e maggo (batch 4 opens; first teaching content)

After 8 phases of narrative framing (a-h: "Thus I have heard… the Buddha
addressed the monks… the monks replied… the Buddha said this"), phase-1
opens the satipaṭṭhāna teaching proper with its famous declaratory claim:
'This, monks, is the direct path.' 4 words / 8 segments.

Centerpiece: 'ekāyano' is one of the most-debated words in the entire
Pāli canon. ek + āyana (from root i, to go) can read as 'one-going /
direct' (Sujato), 'going to one / convergent', 'going alone / solitary'
(Bhikkhu Bodhi), 'one and only' (doctrinally controversial), or 'unified'.
The packet ALREADY had 5 senses encoding this debate; this commit grounds
each sense with curatorial basis + per-sense notes citing the tradition.

Changes:
  - p1 Ekāyano: isAnchor=true. morph nom/sg/m on p1s3. DROPPED "Way TO"
    type=ownership relation (didn't fit case-quirk palette per the
    arrow-earning rule we just ratified in 9830ef1). 5 senses all
    curatorial with per-sense notes: direct (Sujato, high) / one-way
    (older, medium) / solitary (Bodhi, medium) / convergent (interpretive,
    low) / only (controversial, low). Plain-first explanations of the
    ek- and āyan- elements + the translator-debate framing.
  - p2 ayaṁ: color-explanation facet + cross-references to etad (phase-h)
    and te (phase-g) — same demonstrative system, different cases. morph
    nom/sg/m. DROPPED "This IS" type=direction relation (universal grammar,
    no case-quirk earns the arrow). 1 sense lexical + dpd:8757.
  - p3 Bhikkhave: REFRAIN HIT #5 — refrain-explanation facet on p3s2
    references all prior bhikkhu appearances. morph voc/pl/m. 5 senses:
    2 lexical (Mendicants/Monks + dpd:49868), 2 etymological (Sharers
    from bhaj-share, Seekers from bhī-kkh danger-seer), 1 curatorial
    (Friends — Thanissaro-style relational rendering).
  - p4 maggo: plain-first rewrite. morph nom/sg/m. 3 senses lexical:
    path/road (dpd:50495) + method (dpd:50496 — the abstract sense that
    frames satipaṭṭhāna as METHOD, not just road).

  - 4 new packet.citations: dpd:8757 ayaṁ, dpd:49868 bhikkhave,
    dpd:50495 magga-road, dpd:50496 magga-method. Total 28 → 32.

Schema tension #12 (arrow-earning rule): STABLE post-codification. Phase-1's
two pre-existing relations both failed the rule and were dropped.

Schema tension #1 (DPD stripper): STAYS RESOLVED. Ekāyano is the 5th
Lookup-gap surface across batch 3+4 (Bhikkhavo, Bhadante, etad, avoca,
Ekāyano). Pattern: certain inflected/compound forms fall outside DPD's
enumeration even when the lemma is attested. Defer upstream action;
revisit after phase-2/3 to see if morphology-generator fallback is
warranted.

NEW PATTERN observed: translator-debate as first-class curation. When a
word has multiple legitimate scholarly readings (Ekāyano: 5 readings),
surface them as distinct senses with 'curatorial' basis, per-sense notes
citing the tradition, and confidence ranking. Reader cycles through the
debate rather than receiving an authorial verdict. Proposed §3.4.2
amendment for next docs commit cycle.
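The translator-debate pattern can be sketched as data plus an ordering step. The five senses and their confidence levels below are the ones listed in this commit; the `DebateSense` shape, the short `notes` strings, and `orderForCycle` are hypothetical and only illustrate the idea of ranking readings by confidence without picking a winner.

```typescript
type Confidence = "high" | "medium" | "low";

interface DebateSense {
  english: string;
  epistemicBasis: "curatorial";
  confidence: Confidence;
  notes: string; // placeholder wording, not the packet's actual notes
}

// Deliberately unsorted to show the ordering step doing work.
const ekayanoSenses: DebateSense[] = [
  { english: "one-way", epistemicBasis: "curatorial", confidence: "medium", notes: "older renderings" },
  { english: "solitary", epistemicBasis: "curatorial", confidence: "medium", notes: "Bhikkhu Bodhi" },
  { english: "convergent", epistemicBasis: "curatorial", confidence: "low", notes: "interpretive" },
  { english: "only", epistemicBasis: "curatorial", confidence: "low", notes: "doctrinally controversial" },
  { english: "direct", epistemicBasis: "curatorial", confidence: "high", notes: "Sujato" },
];

const rank: Record<Confidence, number> = { high: 0, medium: 1, low: 2 };

// Orders senses from highest to lowest confidence. Array.prototype.sort is
// stable, so senses with equal confidence keep their original order.
function orderForCycle(senses: DebateSense[]): DebateSense[] {
  return [...senses].sort((a, b) => rank[a.confidence] - rank[b.confidence]);
}

console.log(orderForCycle(ekayanoSenses).map((s) => s.english));
```

Every reading stays in the cycle; confidence only decides which one the reader meets first.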

Refrain status — fully mature:
  - bhikkhu: 5/9 phases (e/f/g/h/1)
  - bhagavā: 4/9 phases across 3 forms (b/e/g/h)
  - viharati: 1/9 (expected to recur in phase-2's satipaṭṭhāna formula)

Curation log: docs/sutta-studio/curation/phase-1.md (§0-§8). Includes
proposed §3.4.2 translator-debate cycle rule.

Tests: worktree sandbox restricted test execution this commit (env issue,
not packet-related). JSON validates; structural assertions confirm
integrity (isAnchor, morph hints, citations, basis distribution).
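One of the structural assertions mentioned above (citation integrity) might look like the following. It is a sketch under assumptions: the real layout of demoPacket.json is not shown in this PR, so `Packet`, `PacketWord`, `SenseRef`, and `danglingCitations` are hypothetical names, and only the citation ids are taken from the commit.

```typescript
interface Citation { id: string }
interface SenseRef { english: string; sourceCitationIds?: string[] }
interface PacketWord { id: string; senses: SenseRef[] }
interface Packet { citations: Citation[]; words: PacketWord[] }

// Lists every sourceCitationId that has no matching entry in
// packet.citations. An empty result means all references resolve.
function danglingCitations(packet: Packet): string[] {
  const known = new Set(packet.citations.map((c) => c.id));
  const problems: string[] = [];
  for (const word of packet.words) {
    for (const sense of word.senses) {
      for (const id of sense.sourceCitationIds ?? []) {
        if (!known.has(id)) {
          problems.push(`${word.id}/${sense.english}: unknown citation ${id}`);
        }
      }
    }
  }
  return problems;
}

// Abbreviated sample using the four citation ids added in this commit.
const samplePacket: Packet = {
  citations: [{ id: "dpd:8757" }, { id: "dpd:49868" }, { id: "dpd:50495" }, { id: "dpd:50496" }],
  words: [
    { id: "p2", senses: [{ english: "this", sourceCitationIds: ["dpd:8757"] }] },
    {
      id: "p4",
      senses: [
        { english: "path", sourceCitationIds: ["dpd:50495"] },
        { english: "method", sourceCitationIds: ["dpd:50496"] },
      ],
    },
  ],
};

console.log(danglingCitations(samplePacket));
```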

Batch 4 opens. 9/51 phases curated (a-h + phase-1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on learnings as compiler-prompt overlay

Adds config/suttaStudioPromptContextV2.ts — 6 amendment blocks codifying
protocol learnings from MN10 hand-curation (batches 1-4):

  1. TOOLTIP REGISTER — strengthens v1's "JARGON-WITH-EXPLANATION" to the
     full §3.4 pay-rent rule. Drop bracketed grammar prefixes, √ symbols
     without prose, and emoji defaults; keep technical terms only when
     they pay rent (precision required + glossed inline).
  2. ARROW-EARNING RULE — refines v1 relations guidance with the rule
     ratified in FEATURES.md §1.3: relations earn their arrow when Pāli's
     case-marker does work English doesn't have an analog for. NOT for
     subject-of-active-verb, direct-object-of-verb, or demonstrative
     agreement. Includes earned/not-earned examples from curated phases.
  3. SENSE METADATA — new fields v1 doesn't mention: epistemicBasis
     (lexical/grammatical/curatorial/etymological/commentarial/contextual/
     comparative), sourceCitationIds (DPD wiring), confidence
     (high/medium/low), notes (translator tradition references).
  4. ANCHOR SELECTION — exactly one isAnchor per phase, semantic
     centerpiece. Heuristics for verb-anchor / contested-word-anchor /
     proper-noun-anchor / framing-anchor (from phase-a/c/d/e/g/h/1).
  5. TRANSLATOR-DEBATE AWARENESS — for famously-contested words
     (ekāyano, ātāpī, sampajāno, etc.), generate multiple senses
     representing distinct scholarly readings, each with curatorial
     epistemicBasis + per-tradition notes + confidence ranking.
     Worked example from phase-1's Ekāyano 5-sense cycle.
  6. CROSS-PHASE AWARENESS — when phase-state envelope provides prior-
     phase context, recurring lemmas should get cross-reference facets
     (≤4 phases back). Three pattern categories: same-lemma-new-form,
     same-lemma-new-role, parallel-structures.
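The CROSS-PHASE AWARENESS window in item 6 can be sketched as a lookup over recent phases. This reads "≤4 phases back" as "the four phases immediately preceding the current one", which is an assumption; `PhaseLemmas` and `crossReferences` are hypothetical names, and the lemma sets are a simplified sample drawn from the refrain status in this PR with all other lemmas omitted.

```typescript
interface PhaseLemmas { phaseId: string; lemmas: string[] }

// For each lemma in the current phase, collects the ids of earlier phases
// (within the last `maxBack` phases of `history`) that contain the same lemma.
function crossReferences(
  history: PhaseLemmas[],
  current: PhaseLemmas,
  maxBack = 4
): Map<string, string[]> {
  // slice(-0) would return the whole array, so guard the zero case.
  const recent = maxBack > 0 ? history.slice(-maxBack) : [];
  const refs = new Map<string, string[]>();
  for (const lemma of current.lemmas) {
    const hits = recent.filter((p) => p.lemmas.includes(lemma)).map((p) => p.phaseId);
    if (hits.length > 0) refs.set(lemma, hits);
  }
  return refs;
}

// Simplified history: only the recurring lemmas named in the refrain status.
const history: PhaseLemmas[] = [
  { phaseId: "phase-a", lemmas: [] },
  { phaseId: "phase-b", lemmas: ["bhagavā"] },
  { phaseId: "phase-c", lemmas: [] },
  { phaseId: "phase-d", lemmas: [] },
  { phaseId: "phase-e", lemmas: ["bhikkhu", "bhagavā"] },
  { phaseId: "phase-f", lemmas: ["bhikkhu"] },
  { phaseId: "phase-g", lemmas: ["bhikkhu", "bhagavā"] },
  { phaseId: "phase-h", lemmas: ["bhikkhu", "bhagavā"] },
];
const phase1Lemmas: PhaseLemmas = { phaseId: "phase-1", lemmas: ["bhikkhu", "magga"] };

console.log(crossReferences(history, phase1Lemmas));
```

A first-occurrence lemma such as magga produces no entry, so the compiler would attach cross-reference facets only to lemmas that actually recur inside the window.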

These amendments are NOT yet wired into the compiler — this commit ships
the overlay as a standalone module that can be imported by buildPhasePrompt
(and the relevant Anatomist/Lexico passes) behind a feature flag or
unconditionally in a future refactor.

Companion to:
  - docs/sutta-studio/CURATION_PROTOCOL.md §3.4 + §9.1 + §3.4.1
  - docs/sutta-studio/FEATURES.md §1.3 arrow-earning rule
  - 9 hand-curated phases in components/sutta-studio/demoPacket.json
    + curation/phase-{a,b,c,d,e,f,g,h,1}.md

Next step: wire v2 into prompts.ts and re-run compiler on phase-2 to
test. The companion analysis "what v2 would change about phase-2"
lives in this commit's PR description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the strategic-economic analysis that emerged from MN10 batches
1-4 hand-curation. The conversation surfaced ~14 distinct insights about
pipeline vs hand-curation economics, cost telemetry, scaling, and the
strategic pivot — none of which lived in the codebase until this commit.

docs/sutta-studio/COMPILER_STRATEGY.md (289 lines, new):
  §1 The economic shape — quality bands (35% v1 / 65% v2 / 85% +post-passes
     / 100% hand), Pareto distribution (10-15 phases pedagogically critical,
     35-40 are routine recurrences), per-compile cost estimates ($0.10-0.30
     Gemini Flash / $1-3 Sonnet / $3-10 Opus per MN10).
  §2 What the pipeline does today vs what it could do — 11-row matrix
     classifying each hand-curation move as: learnable by prompt /
     deterministic post-processable / irreducibly human.
  §3 What's irreducibly human — translator-tradition citations,
     pedagogical taste, curation-log narrative. ~5-8 phases per sutta
     fall in this bucket.
  §4 Cost telemetry — surprise discovery that services/apiMetricsService.ts
     already records every API call with tokens+cost+apiType=sutta_studio
     to IndexedDB. Missing: phaseId attribution, UI, prompt caching,
     local-vs-LLM split beyond DPD. 3-step plan to close gaps. ccusage
     for Claude-Code-side conversation cost.
  §5 Scaling roadmap — 5 stages: hand-curate MN10 exemplar (in progress)
     → wire v2 overlay → build 4 deterministic post-passes → run on DN22
     with selective polish (~5-6 hr vs ~30 hr from scratch) → satipaṭṭhāna
     sub-corpus → cross-pattern (~20-30 patterns covering most of the canon).
  §6 Open questions — translator-tradition DB, DPD Lookup-gap pattern
     resolution, prompt-caching tradeoff, when to wire v2, pedagogical-
     fidelity floor for routine phases.

docs/HANDOVER.md (180 lines, replaces prior 2026-05-12 handover):
  - Full 17-commit inventory across 3 branches (PR #47 from prior session,
    PR #48 batch-3 from today, batch-4 branch from today)
  - DPD root-cause fix details (coverage 86.9% → 89.5%, 458 sqlite-lookup
    vs 20 heuristic-fallback vs 56 unmatched, better-sqlite3 dep,
    one-time 168 MB download)
  - Schema tensions status (#1 RESOLVED at root, #7 RESOLVED prior,
    #12 RESOLVED via documentation, Lookup-gap as new observation)
  - 3 protocol amendments codified (§9.1, §3.4.1, FEATURES §1.3)
  - 5 phase logs added (e/f/g/h/1)
  - Refrain status (bhikkhu 5/9, bhagavā 4/9, viharati 1/9)
  - 10 pending threads in priority order with effort estimates
  - The pending strategic pivot decision flagged for next session
  - Worktree convention + bash sandbox quirk + 3-branch base structure
    documented as non-obvious context
  - Resume instructions branch on pivot decision
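The DPD coverage figure in the root-cause-fix bullet can be cross-checked with a little arithmetic. The sketch assumes coverage means (sqlite-lookup + heuristic-fallback) divided by all surfaces, which this PR does not state explicitly; under that assumption the three counts reproduce the reported 89.5%. `coveragePercent` is a hypothetical helper.

```typescript
// Counts quoted in the HANDOVER.md summary.
const sqliteLookup: number = 458;
const heuristicFallback: number = 20;
const unmatched: number = 56;

// Percentage of matched items, rounded to one decimal place.
function coveragePercent(matched: number, total: number): number {
  if (total <= 0) throw new Error("total must be positive");
  return Math.round((matched / total) * 1000) / 10;
}

const matched = sqliteLookup + heuristicFallback;  // 478
const total = matched + unmatched;                 // 534

console.log(coveragePercent(matched, total)); // 89.5
```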

Both docs written by parallel subagents with full context briefings;
reviewed and committed by main session. Companion to the v2 prompt
overlay (2d198f6) and the protocol amendments (c6b150f + 9830ef1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 13, 2026

Project: lexicon-forge | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): May 13, 2026 7:42pm

@anantham anantham changed the base branch from feat/opus-batch3-curation to main May 13, 2026 22:31
@anantham anantham merged commit c17924d into main May 13, 2026
3 checks passed