Compiler consolidation Phase 0+1 — design doc + canonical prompts by anantham · Pull Request #51 · anantham/LexiconForge

anantham · 2026-05-13T20:20:08Z

Summary

Multi-session refactor in progress. This PR ships Phase 0 (design doc) and Phase 1 (canonical prompt builders) of the compiler consolidation work agreed during the 2026-05-13 session.

The dual-stack issue surfaced today: two parallel compiler implementations (`services/compiler/` and `services/suttaStudio*`) had drifted apart over 4 months. PR #50 responded by wiring V2 into both stacks, which patches the symptom. This PR fixes the architecture: a single source of truth for prompts, with both stacks consuming it via re-export shims.

Base note: Stacked on #49 (batch-4). PR #50 (V2 wire into both) is PAUSED in favor of this work — once consolidation lands, PR #50's commits become redundant.

Commits

`dc57a63` — Phase 0: CONSOLIDATION.md design doc

313 lines. Captures:

- Why this exists (dual-stack drift since 2026-01-30)
- Target file structure (`services/sutta-studio/{prompts,passes}/` + orchestrator/llm/schemas)
- Feature-by-feature migration map (production-only features, benchmark-only features)
- 4-phase plan with risk assessment per phase
- Backward-compat strategy (re-export shims; consumers unchanged)
- Test strategy (typecheck + vitest + UI smoke + benchmark smoke)
- 6 open design questions

`8be501f` — Phase 1: canonical prompt builders

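Establishes a single source of truth in `services/sutta-studio/prompts/` (7 files + barrel, 422 lines total). The old builder files become shims: `services/compiler/prompts.ts` shrinks from 382 to 24 lines, and `services/suttaStudioPassPrompts.ts` from 725 to 528 lines (its schemas, types, and parseJsonResponse stay inline until Phase 2).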

Merge decisions per pass (best-of-both):

- Anatomist: 3 examples from benchmark (PHASE_A + PHASE_B + REFRAIN), V2 amendments
- Lexicographer: DPD context from production + ripples instructions from benchmark + V2 amendments
- Weaver: ANTI_PATTERN + duplicate-mapping guard from benchmark
- Typesetter: optional logger from benchmark (production was spamming the console on every call)
- Skeleton / Morphology / Phase: byte-identical between stacks; just moved
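The typesetter merge replaces production's always-on inline `log()` with an injected logger that defaults to a no-op. A minimal sketch of that pattern, assuming a plain string logger; `buildTypesetterPromptSketch`, `TypesetterInput`, and the prompt wording are hypothetical stand-ins, not the repository's actual `buildTypesetterPrompt` signature:

```typescript
// Hypothetical sketch: optional logger parameter that defaults to a no-op,
// so callers that pass nothing produce no console output.
type Logger = (message: string) => void;

const noopLogger: Logger = () => {};

interface TypesetterInput {
  phaseId: string;
  segmentCount: number;
}

// Illustrative stand-in for buildTypesetterPrompt; the real signature may differ.
function buildTypesetterPromptSketch(
  input: TypesetterInput,
  log: Logger = noopLogger,
): string {
  log(`typesetter: building prompt for ${input.phaseId} (${input.segmentCount} segments)`);
  return `Lay out ${input.segmentCount} segments for phase ${input.phaseId}.`;
}

// Silent by default; opt-in logging for debugging sessions.
const silent = buildTypesetterPromptSketch({ phaseId: "phase-a", segmentCount: 3 });
const captured: string[] = [];
const verbose = buildTypesetterPromptSketch(
  { phaseId: "phase-a", segmentCount: 3 },
  (m) => captured.push(m),
);
```

Because the logger only observes, the returned prompt string is the same with or without it, which is what keeps this change invisible to the LLM.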

V2 amendments wired into ONE place now. Future protocol amendments cost 1 file edit instead of 2.
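The "one file edit instead of 2" claim follows from the amendments living in a single shared block that every affected builder appends. A minimal sketch, assuming the amendments are a text block appended to each base prompt; `V2_AMENDMENTS_SKETCH`, `withV2Amendments`, and the three builder stand-ins are hypothetical, and the amendment text is placeholder:

```typescript
// Hypothetical sketch: one shared amendments block applied by every builder
// that needs it, so a protocol change is a single edit.
const V2_AMENDMENTS_SKETCH = [
  "V2-1: (placeholder amendment text)",
  "V2-2: (placeholder amendment text)",
].join("\n");

function withV2Amendments(basePrompt: string): string {
  return `${basePrompt}\n\n## Protocol amendments (V2)\n${V2_AMENDMENTS_SKETCH}`;
}

// Illustrative builders for the three passes that carry the amendments
// (Anatomist, Lexicographer, Phase).
const buildAnatomist = (text: string): string => withV2Amendments(`Anatomist: segment "${text}"`);
const buildLexicographer = (text: string): string => withV2Amendments(`Lexicographer: gloss "${text}"`);
const buildPhase = (text: string): string => withV2Amendments(`Phase: compile "${text}"`);

const prompts = [buildAnatomist, buildLexicographer, buildPhase].map((build) =>
  build("evam me sutam"),
);
```

Editing `V2_AMENDMENTS_SKETCH` changes all three prompts at once; with two divergent stacks, the same change had to be repeated in each stack's builders.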

What this PR does NOT change

Per Phase 1 scope:

- Schemas still differ between the two stacks (Phase 2)
- Duplicated buildPhaseStateEnvelope + buildBoundaryContext still exist (Phase 2)
- parseJsonResponse helper still duplicated (Phase 2)
- LLM-visible prompt content is BYTE-IDENTICAL to PR #50's output. Benchmark/production output should not change.
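One way to back the byte-identical claim is to fingerprint every builder's output for fixed inputs before the refactor and compare afterwards. A minimal sketch using Node's `crypto.createHash`; `promptDigest`, `snapshotDigests`, `changedPasses`, and the pass-keyed record shape are hypothetical helpers, not part of the repo:

```typescript
import { createHash } from "node:crypto";

// SHA-256 of a prompt's UTF-8 bytes: equal digests mean byte-identical prompts.
function promptDigest(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

// Fingerprint each pass's prompt so a baseline can be stored compactly
// (e.g. captured from the pre-refactor builders with fixed inputs).
function snapshotDigests(prompts: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [pass, prompt] of Object.entries(prompts)) {
    out[pass] = promptDigest(prompt);
  }
  return out;
}

// Passes whose prompt changed, appeared, or disappeared relative to the baseline.
function changedPasses(
  baseline: Record<string, string>,
  current: Record<string, string>,
): string[] {
  const passes = Array.from(new Set([...Object.keys(baseline), ...Object.keys(current)]));
  return passes.filter((pass) => baseline[pass] !== current[pass]).sort();
}
```

An empty `changedPasses` result for every fixture input is the concrete form of "benchmark/production output should not change".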

Verification

✅ `npx tsc --noEmit -p .` — zero errors in touched files
✅ `npx vitest run` — 1332 passed, 16 skipped, 1 failure (build-dpd.test.ts — better-sqlite3 missing from main's node_modules; unrelated to refactor)
⏸ Manual UI smoke — recommend before merge

Next steps after merge

- Phase 2: canonical pass functions (`services/sutta-studio/passes/`), DPD context moved into lexicographer pass
- Phase 3: single LLM caller
- Phase 4: cleanup + close PR #50 (its V2 wiring becomes redundant)

Test plan

- CI typecheck passes
- CI test suite passes (excluding build-dpd.test.ts pre-existing failure)
- Manual UI smoke: open /sutta/demo, trigger compile, confirm output structure unchanged
- Benchmark CLI smoke: `tsx scripts/sutta-studio/benchmark.ts` runs to completion (no API call needed for the prompt-shape check)
- Confirm `services/compiler/prompts.ts` and `services/suttaStudioPassPrompts.ts` still export the same names (for backward compat shim verification)
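The last test-plan item can be automated. A minimal sketch, assuming the check compares a list of expected names against the shim's module namespace, and also confirms each value is the same reference as the canonical export (a true re-export rather than a stale copy). `checkShim` is a hypothetical helper, and the in-file objects stand in for the real modules:

```typescript
// Hypothetical helper: a correct shim exposes every expected name, and each
// value is the same reference as the canonical export.
function checkShim(
  expectedNames: readonly string[],
  shim: Record<string, unknown>,
  canonical: Record<string, unknown>,
): string[] {
  const problems: string[] = [];
  for (const name of expectedNames) {
    if (!(name in shim)) {
      problems.push(`missing: ${name}`);
    } else if (name in canonical && shim[name] !== canonical[name]) {
      problems.push(`not re-exported: ${name}`);
    }
  }
  return problems;
}

// In-file stand-ins for the module namespaces. Against the real repo you would
// load them with dynamic imports, e.g.
//   const shim = await import("./services/compiler/prompts");
//   const canonical = await import("./services/sutta-studio/prompts");
const buildSkeletonPrompt = (): string => "skeleton";
const buildAnatomistPrompt = (): string => "anatomist";
const canonical = { buildSkeletonPrompt, buildAnatomistPrompt };
const goodShim = { ...canonical };
// A stale shim: a copied function instead of a re-export, and one name dropped.
const staleShim = { buildSkeletonPrompt: (): string => "skeleton" };

const names = ["buildSkeletonPrompt", "buildAnatomistPrompt"] as const;
```

The reference-equality check matters because a shim that re-declares a builder would pass a names-only check while silently drifting again.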

🤖 Generated with Claude Code

Captures the principled refactor plan agreed during the 2026-05-13 session. The Sutta Studio codebase has two parallel compiler stacks (production at services/compiler/, benchmark at services/suttaStudio*) that have drifted in opposite directions over 4 months. This doc plans their consolidation into a single canonical core under services/sutta-studio/.

Architecture (Option C from the design conversation):

    services/sutta-studio/
      prompts/{anatomist,lexicographer,phase,skeleton,morphology,weaver,typesetter}.ts
      passes/{anatomist,lexicographer,phase,skeleton,morphology,weaver,typesetter}.ts
      schemas.ts, llm.ts, utils.ts, dictionary.ts, segments.ts
      orchestrator.ts  ← compileSuttaStudioPacket (production wrapper)
      index.ts         ← barrel

Existing public paths become re-export shims for backward compat; SuttaStudioApp.tsx and the benchmark scripts don't change in this refactor. Shims are deleted in Phase 4 cleanup.

Phasing (~10-15 hr over 2-3 sessions):

- Phase 0: Design doc (this commit) [DONE]
- Phase 1: Single prompts module [NEXT]
- Phase 2: Canonical pass functions
- Phase 3: Single LLM caller
- Phase 4: Cleanup + rework PR #50

Sequencing decision: PR #50 (V2 wire into both builders) is PAUSED until this refactor lands, so V2 amendments wire into ONE canonical builder instead of two divergent ones.

Companion to docs/sutta-studio/COMPILER_STRATEGY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…o services/sutta-studio/prompts/

Per CONSOLIDATION.md (dc57a63). The two parallel compiler stacks (services/compiler/ and services/suttaStudio*) had drifted: each had its own prompt builders, which diverged on examples, parameters, and which features were supported. As of PR #50 they were briefly aligned on V2 amendments; this commit locks in the alignment by establishing a SINGLE SOURCE OF TRUTH.

NEW canonical location: services/sutta-studio/prompts/

- skeleton.ts: buildSkeletonPrompt
- anatomist.ts: buildAnatomistPrompt (3 examples, V2 amendments)
- lexicographer.ts: buildLexicographerPrompt (DPD context + ripples + V2)
- weaver.ts: buildWeaverPrompt (with ANTI_PATTERN guard)
- typesetter.ts: buildTypesetterPrompt (optional logger injection)
- morphology.ts: buildMorphologyPrompt
- phase.ts: buildPhasePrompt (full V2 overlay)
- index.ts: barrel

Merge decisions (best-of-both for each pass):

- Anatomist: take benchmark's 3 examples (PHASE_A + PHASE_B + REFRAIN) rather than production's single example. The phase-b morph-data example and the refrain-formula example are pedagogically critical.
- Lexicographer: KEEP production's DPD context + renderDpdBlock helper AND benchmark's RIPPLES instruction block + ripple example. Neither stack had both; merged here.
- Weaver: take benchmark's version with the DUPLICATE-MAPPING rule + the SUTTA_STUDIO_WEAVER_ANTI_PATTERN reference. Production lacked these.
- Typesetter: take benchmark's optional logger param (vs. production's inline log() spamming every call). Defaults to no-op.
- Skeleton / Morphology / Phase: byte-identical between stacks; no merge needed beyond moving.

V2 amendments are wired into THREE passes (Anatomist, Lexicographer, Phase) in one place, not two. Future protocol amendments land in one place.
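The Lexicographer merge combines an optional, production-only block (DPD context) with an always-present, benchmark-only block (ripples). A minimal sketch of that composition, assuming the DPD block is emitted only when lookups exist; `DpdEntrySketch`, `renderDpdBlockSketch`, `RIPPLES_BLOCK_SKETCH`, and the section order are hypothetical and do not reproduce the real `renderDpdBlock` or prompt text:

```typescript
// Hypothetical shape of one dictionary lookup result.
interface DpdEntrySketch {
  lemma: string;
  meaning: string;
}

// Illustrative stand-in for the production renderDpdBlock helper.
function renderDpdBlockSketch(entries: readonly DpdEntrySketch[]): string {
  return ["## Dictionary context (DPD)", ...entries.map((e) => `- ${e.lemma}: ${e.meaning}`)].join("\n");
}

// Placeholder for the benchmark's ripples instructions.
const RIPPLES_BLOCK_SKETCH = "## Ripples\n(placeholder: ripple instructions + example)";

function buildLexicographerPromptSketch(
  segmentText: string,
  dpd?: readonly DpdEntrySketch[],
): string {
  const sections = [`Gloss each word in: ${segmentText}`];
  // Production feature: only included when dictionary lookups are available.
  if (dpd && dpd.length > 0) sections.push(renderDpdBlockSketch(dpd));
  // Benchmark feature: always included.
  sections.push(RIPPLES_BLOCK_SKETCH);
  return sections.join("\n\n");
}

const plain = buildLexicographerPromptSketch("evam me sutam");
const withDpd = buildLexicographerPromptSketch("evam me sutam", [{ lemma: "suta", meaning: "heard" }]);
```

Making the DPD input optional lets the benchmark stack (which had no DPD context) call the same builder as production without passing anything new.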
OLD files become re-export shims:

- services/compiler/prompts.ts: 382L → 24L re-export shim
- services/suttaStudioPassPrompts.ts: 725L → 528L (schemas/types/parseJsonResponse kept inline pending Phase 2; only the 7 builder functions removed, replaced by re-exports from the new canonical path)

Consumers don't change:

- services/compiler/index.ts continues to import from './prompts'
- services/suttaStudioPassRunners.ts continues to import from './suttaStudioPassPrompts'
- scripts/sutta-studio/benchmark.ts continues to import from suttaStudioPassPrompts

All those imports now resolve through the shims to the canonical builders.

Verification:

- npx tsc --noEmit -p .: ZERO errors in the touched files (all reported errors are pre-existing TS strictness issues in unrelated files: AboutThisText.tsx React namespace, build-dpd.ts better-sqlite3 type, smoke-real-fojin.ts playwright export, etc.)
- npx vitest run: 1332 passed, 16 skipped, 0 functional failures. The single failure is build-dpd.test.ts, because main's node_modules (symlinked for the test run) lacks better-sqlite3; that package was added on the not-yet-merged batch-3 branch. Unrelated to refactor.

What this commit does NOT do (deferred per CONSOLIDATION.md):

- DOES NOT consolidate schemas. compiler/schemas.ts and the schemas in suttaStudioPassPrompts.ts still differ on the skeleton.wordRange and anatomist.refrainId fields (Phase 2).
- DOES NOT consolidate the duplicated buildPhaseStateEnvelope or buildBoundaryContext functions (still byte-identical duplicates; Phase 2 cleans them up alongside pass runner consolidation).
- DOES NOT consolidate the parseJsonResponse helper (also duplicated; Phase 2).
- DOES NOT change LLM-visible prompt content. The V2-amended Anatomist / Lexicographer / Phase prompts produce the exact same string as PR #50 (currently paused) would. Benchmark output should be byte-identical pre/post this commit.
- DOES NOT bump SUTTA_STUDIO_PROMPT_VERSION. Content unchanged means no benchmark cache invalidation. Phase 4 will bump if needed.

Companion to:

- docs/sutta-studio/CONSOLIDATION.md (dc57a63): full design doc
- PR #50 paused: its V2-wire commits become redundant once this lands

Next: Phase 2, canonical pass functions in services/sutta-studio/passes/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
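Why leaving SUTTA_STUDIO_PROMPT_VERSION alone avoids cache invalidation can be shown with a toy cache, assuming the benchmark cache key includes the prompt version. The key layout, `benchmarkCacheKey`, and the version strings below are hypothetical:

```typescript
import { createHash } from "node:crypto";

// Assumed cache-key layout: version + pass + input digest. Bumping the version
// changes every key (full invalidation); leaving it alone keeps old entries valid.
function benchmarkCacheKey(promptVersion: string, pass: string, inputDigest: string): string {
  return createHash("sha256")
    .update(`${promptVersion}|${pass}|${inputDigest}`)
    .digest("hex")
    .slice(0, 16);
}

const cache = new Map<string, string>();
cache.set(benchmarkCacheKey("v1", "anatomist", "abc123"), "cached anatomist output");

const hitSameVersion = cache.has(benchmarkCacheKey("v1", "anatomist", "abc123"));
const hitBumpedVersion = cache.has(benchmarkCacheKey("v2", "anatomist", "abc123"));
```

The flip side of this design is that a prompt-content change without a version bump would serve stale cached results, which is why skipping the bump is tied to the prompts being byte-identical.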

vercel · 2026-05-13T20:20:14Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lexicon-forge	Ready	Preview, Comment	May 13, 2026 10:26pm

…tions

Per CONSOLIDATION.md. Continues the consolidation started in Phase 1 (8be501f).

Phase 2a, utilities: Moves services/compiler/utils.ts content (verbatim, byte-identical) to services/sutta-studio/utils.ts. Drops the inline duplicates of buildPhaseStateEnvelope + buildBoundaryContext + parseJsonResponse + stripCodeFences + the BoundaryNote/SkeletonPhase/PhaseStageKey types that lived in services/suttaStudioPassPrompts.ts (those were byte-identical duplicates of the compiler/utils.ts versions, confirmed via diff).

Single canonical implementation of:

- parseJsonResponse + stripCodeFences (JSON parsing with fallback)
- buildPhaseStateEnvelope (phase-state context block)
- buildBoundaryContext (cross-chapter boundary marker)
- getTimeoutSignal + waitFor (AbortSignal helpers)
- createCompilerThrottle (rate-limit gating)
- buildSourceRefs + computeSourceDigest (provenance helpers)
- applyWordRangeToSegments (sub-segment splitting)
- chunkPhases (skeleton fallback chunker)
- BoundaryNote + SkeletonPhase + PhaseStageKey types

Existing services/compiler/utils.ts becomes a 1-line re-export shim. Existing services/suttaStudioPassPrompts.ts re-exports the utilities from the canonical location.
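The commit describes parseJsonResponse + stripCodeFences only as "JSON parsing with fallback". A minimal sketch of what such a pair typically does, assuming models sometimes wrap JSON in Markdown fences or surrounding prose; the `*Sketch` names are hypothetical, and the outermost-braces fallback is an assumed strategy, not necessarily the one the repo uses:

```typescript
// Three backticks, built at runtime so this example nests safely in Markdown.
const FENCE = "`".repeat(3);

function stripCodeFencesSketch(raw: string): string {
  const trimmed = raw.trim();
  if (!trimmed.startsWith(FENCE)) return trimmed;
  // Drop the opening fence line (e.g. a json-tagged fence) and a trailing closing fence.
  const firstNewline = trimmed.indexOf("\n");
  let body = firstNewline === -1 ? "" : trimmed.slice(firstNewline + 1);
  if (body.trimEnd().endsWith(FENCE)) body = body.trimEnd().slice(0, -FENCE.length);
  return body.trim();
}

function parseJsonResponseSketch<T>(raw: string): T {
  const cleaned = stripCodeFencesSketch(raw);
  try {
    return JSON.parse(cleaned) as T;
  } catch (firstError) {
    // Assumed fallback: parse the outermost {...} span if the model added prose.
    const start = cleaned.indexOf("{");
    const end = cleaned.lastIndexOf("}");
    if (start !== -1 && end > start) {
      return JSON.parse(cleaned.slice(start, end + 1)) as T;
    }
    throw firstError;
  }
}
```

Keeping one copy of this logic matters because two diverging parsers can make the same LLM response succeed in the benchmark and fail in production.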
Phase 2b, pass functions: Creates services/sutta-studio/passes/ with one file per pass:

- skeleton.ts: runSkeletonPass (with chunking + fallback)
- anatomist.ts: runAnatomistPass
- lexicographer.ts: runLexicographerPass (now accepts optional dpdLookups; previously DPD context was constructed only in the compiler/index.ts orchestrator, and moving it onto the pass declares the dependency where it belongs)
- weaver.ts: runWeaverPass
- typesetter.ts: runTypesetterPass (optional logger injection)
- morphology.ts: runMorphologyPass
- types.ts: PassName + LLMCaller + PassCallResult + Skeleton*
- _defaultCaller.ts: default LLMCaller (dynamic-imports suttaStudioLLM to avoid a circular dependency; Phase 3 consolidates the LLM module proper)
- index.ts: barrel re-export

Existing services/suttaStudioPassRunners.ts becomes a re-export shim. All consumers are unchanged: suttaStudioPassRunners.ts is imported by scripts/sutta-studio/benchmark.ts, benchmark-config.ts, and generate-new-phases.ts, all of which continue to work transparently.

Backward compat preserved:

- services/suttaStudioCompiler.ts (re-export of compileSuttaStudioPacket): unchanged
- services/suttaStudioPassRunners.ts: now a shim re-exporting from sutta-studio/passes
- services/suttaStudioPassPrompts.ts: keeps schemas inline (Phase 4 reconciles), re-exports builders + utilities
- services/compiler/utils.ts: 1-line shim
- services/compiler/prompts.ts: shim from Phase 1
- services/compiler/index.ts: UNCHANGED. The production orchestrator continues to import build*Prompt from compiler/prompts (which re-exports from sutta-studio/prompts). Phase 2c will refactor compileSuttaStudioPacket to compose pass calls.

What this commit does NOT do (deferred):

- DOES NOT touch services/compiler/index.ts (the 675-line orchestrator). Phase 2c will rewrite compileSuttaStudioPacket as a composition of runXPass calls + production orchestration concerns. That is the risky part of the refactor, saved for a focused future session.
- DOES NOT consolidate the 7 response schemas. Production schemas (compiler/schemas.ts) and benchmark schemas (in suttaStudioPassPrompts.ts) still differ on the skeleton.wordRange + anatomist.refrainId fields. Phase 4 reconciles them once consumer impact is clearer.
- DOES NOT consolidate compiler/llm.ts vs suttaStudioLLM.ts (Phase 3).
- DOES NOT delete the re-export shims (Phase 4 cleanup).

Verification:

- npx vitest run: 1332 passed, 16 skipped, 1 pre-existing failure (build-dpd.test.ts: better-sqlite3 missing from main's node_modules). Same numbers as Phase 1 (8be501f).
- npx tsc --noEmit -p .: zero new errors in touched files. All errors surfaced are pre-existing TS strictness issues in unrelated files.

Companion to:

- 8be501f Phase 1 (prompt builders canonicalized)
- dc57a63 CONSOLIDATION.md design doc
- docs/sutta-studio/CONSOLIDATION.md §Phase 2 (full migration plan)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
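The pass functions take an injectable `LLMCaller`, with a default that loads the LLM module lazily. A minimal sketch of that shape; `LLMRequest`, the `PassCallResult` fields, `loadLLMModuleSketch`, and `runAnatomistPassSketch` are hypothetical, and the stub loader stands in for the real dynamic import of suttaStudioLLM:

```typescript
// Hypothetical shapes; the repo's actual LLMCaller / PassCallResult types may differ.
interface LLMRequest {
  pass: string;
  prompt: string;
}
interface PassCallResult<T> {
  pass: string;
  output: T;
}
type LLMCaller = (req: LLMRequest) => Promise<string>;

// Stand-in for `await import(...)` of the LLM module. A real dynamic import runs
// only at call time, so loading the passes module does not evaluate the LLM
// module eagerly, which is how the circular dependency is sidestepped.
async function loadLLMModuleSketch(): Promise<{ callModel: LLMCaller }> {
  return { callModel: async (req) => `{"pass":"${req.pass}","echo":true}` };
}

const defaultCallerSketch: LLMCaller = async (req) => {
  const mod = await loadLLMModuleSketch();
  return mod.callModel(req);
};

// Callers (tests, benchmark) can inject their own LLMCaller; production uses the default.
async function runAnatomistPassSketch(
  segmentText: string,
  caller: LLMCaller = defaultCallerSketch,
): Promise<PassCallResult<unknown>> {
  const raw = await caller({ pass: "anatomist", prompt: `Segment: ${segmentText}` });
  return { pass: "anatomist", output: JSON.parse(raw) };
}
```

Injecting the caller is also what makes a single LLM module in Phase 3 a local change: only the default needs to move.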

anantham and others added 2 commits May 13, 2026 16:09

vercel Bot deployed to Preview May 13, 2026 20:20

vercel Bot deployed to Preview May 13, 2026 22:26

anantham changed the base branch from feat/opus-batch4-curation to main May 13, 2026 22:31

anantham merged commit e23c190 into main May 13, 2026
4 checks passed

This was referenced May 13, 2026:

- Wire V2 amendments into live compiler (activates 2d198f6) #50 (Closed)
- feat(sutta-studio): phase-2 hand-curation + A2 experiment scaffolding #52 (Open)

anantham merged 3 commits into main from feat/opus-compiler-consolidation
