Compiler consolidation Phase 0+1: design doc + canonical prompts #51
Merged
Captures the principled refactor plan agreed during the 2026-05-13 session.
The Sutta Studio codebase has two parallel compiler stacks (production at
services/compiler/, benchmark at services/suttaStudio*) that have drifted
in opposite directions over 4 months. This doc plans their consolidation
into a single canonical core under services/sutta-studio/.
Architecture (Option C from the design conversation):

  services/sutta-studio/
    prompts/{anatomist,lexicographer,phase,skeleton,morphology,weaver,typesetter}.ts
    passes/{anatomist,lexicographer,phase,skeleton,morphology,weaver,typesetter}.ts
    schemas.ts, llm.ts, utils.ts, dictionary.ts, segments.ts
    orchestrator.ts   ← compileSuttaStudioPacket (production wrapper)
    index.ts          ← barrel

Existing public paths become re-export shims for backward compat;
SuttaStudioApp.tsx + benchmark scripts don't change in this refactor.
Shims deleted in Phase 4 cleanup.

Phasing (~10-15 hr over 2-3 sessions):

  Phase 0  Design doc (this commit): DONE
  Phase 1  Single prompts module: NEXT
  Phase 2  Canonical pass functions
  Phase 3  Single LLM caller
  Phase 4  Cleanup + rework PR #50

Sequencing decision: PR #50 (V2 wire into both builders) is PAUSED until
this refactor lands, so V2 amendments wire into ONE canonical builder
instead of two divergent ones.
Companion to docs/sutta-studio/COMPILER_STRATEGY.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o services/sutta-studio/prompts/

Per CONSOLIDATION.md (dc57a63). The two parallel compiler stacks (services/compiler/ and services/suttaStudio*) had drifted: each had its own prompt builders that diverged on examples, parameters, and which features were supported. As of PR #50 they were briefly aligned on V2 amendments; this commit locks in the alignment by establishing a SINGLE SOURCE OF TRUTH.

NEW canonical location: services/sutta-studio/prompts/

  skeleton.ts        buildSkeletonPrompt
  anatomist.ts       buildAnatomistPrompt (3 examples, V2 amendments)
  lexicographer.ts   buildLexicographerPrompt (DPD context + ripples + V2)
  weaver.ts          buildWeaverPrompt (with ANTI_PATTERN guard)
  typesetter.ts      buildTypesetterPrompt (optional logger injection)
  morphology.ts      buildMorphologyPrompt
  phase.ts           buildPhasePrompt (full V2 overlay)
  index.ts           barrel

Merge decisions (best-of-both for each pass):

- Anatomist: take benchmark's 3 examples (PHASE_A + PHASE_B + REFRAIN) rather than production's single example. The phase-b morph-data example and the refrain-formula example are pedagogically critical.
- Lexicographer: KEEP production's DPD context + renderDpdBlock helper AND benchmark's RIPPLES instruction block + ripple example. Neither stack had both. Merged here.
- Weaver: take benchmark's version with DUPLICATE-MAPPING rule + the SUTTA_STUDIO_WEAVER_ANTI_PATTERN reference. Production lacked these.
- Typesetter: take benchmark's optional logger param (vs production's inline log() spamming every call). Defaults to no-op.
- Skeleton / Morphology / Phase: byte-identical between stacks; no merge needed beyond moving.

V2 amendments wired into THREE passes (Anatomist, Lexicographer, Phase): one place, not two. Future protocol amendments land in one place.

OLD files become re-export shims:

- services/compiler/prompts.ts: 382L → 24L re-export shim
- services/suttaStudioPassPrompts.ts: 725L → 528L (schemas/types/parseJsonResponse kept inline pending Phase 2; only the 7 builder functions removed, replaced by re-exports from the new canonical path)

Consumers don't change:

- services/compiler/index.ts continues to import from './prompts'
- services/suttaStudioPassRunners.ts continues to import from './suttaStudioPassPrompts'
- scripts/sutta-studio/benchmark.ts continues to import from suttaStudioPassPrompts

All those imports now resolve through the shims to the canonical builders.

Verification:

- npx tsc --noEmit -p . : ZERO errors in the touched files (all reported errors are pre-existing TS strictness issues in unrelated files: AboutThisText.tsx React namespace, build-dpd.ts better-sqlite3 type, smoke-real-fojin.ts playwright export, etc.)
- npx vitest run : 1332 passed, 16 skipped, 0 functional failures. Single failure is build-dpd.test.ts because main's node_modules (symlinked for the test run) lacks better-sqlite3; that package was added on the not-yet-merged batch-3 branch. Unrelated to refactor.

What this commit does NOT do (deferred per CONSOLIDATION.md):

- DOES NOT consolidate schemas. compiler/schemas.ts and the schemas in suttaStudioPassPrompts.ts still differ on skeleton.wordRange and anatomist.refrainId fields (Phase 2).
- DOES NOT consolidate the duplicated buildPhaseStateEnvelope or buildBoundaryContext functions (still byte-identical duplicates, Phase 2 cleans up alongside pass runner consolidation).
- DOES NOT consolidate the parseJsonResponse helper (also duplicated; Phase 2).
- DOES NOT change LLM-visible prompt content. The V2-amended Anatomist / Lexicographer / Phase prompts produce the exact same string as PR #50 (currently paused) would. Benchmark output should be byte-identical pre/post this commit.
- DOES NOT bump SUTTA_STUDIO_PROMPT_VERSION. Content unchanged means no benchmark cache invalidation. Phase 4 will bump if needed.

Companion to:

- docs/sutta-studio/CONSOLIDATION.md (dc57a63): full design doc
- PR #50 paused: its V2-wire commits become redundant once this lands

Next: Phase 2, canonical pass functions in services/sutta-studio/passes/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
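The Typesetter merge decision in the commit above (an optional logger that defaults to a no-op) could look roughly like the sketch below. Only the name `buildTypesetterPrompt` comes from the commit; the `TypesetterInput` shape, the `PromptLogger` type, and the prompt text are assumptions invented for illustration, not the real builder.

```typescript
// Hypothetical types: the real builder's parameters are not shown in this PR.
type PromptLogger = (message: string) => void;

interface TypesetterInput {
  phaseId: string;
  words: string[];
}

// Default logger does nothing, so callers that pass no logger produce no log output.
const noopLogger: PromptLogger = () => {};

function buildTypesetterPrompt(
  input: TypesetterInput,
  logger: PromptLogger = noopLogger,
): string {
  logger(`typesetter prompt for ${input.phaseId}: ${input.words.length} words`);
  // Placeholder prompt text; the real prompt content is not reproduced here.
  return [`Phase: ${input.phaseId}`, `Words: ${input.words.join(" ")}`].join("\n");
}

// Usage: silent by default, verbose only when a logger is injected.
const captured: string[] = [];
const sample: TypesetterInput = { phaseId: "phase-a", words: ["evaṃ", "me", "sutaṃ"] };
const quiet = buildTypesetterPrompt(sample);
const verbose = buildTypesetterPrompt(sample, (m) => captured.push(m));
```

Because logging is a side channel, the prompt string is the same with or without a logger, which fits the commit's claim that LLM-visible content does not change.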
…tions

Per CONSOLIDATION.md. Continues the consolidation started in Phase 1 (8be501f).

Phase 2a (utilities):

Moves services/compiler/utils.ts content (verbatim, byte-identical) to services/sutta-studio/utils.ts. Drops the inline duplicates of buildPhaseStateEnvelope + buildBoundaryContext + parseJsonResponse + stripCodeFences + BoundaryNote/SkeletonPhase/PhaseStageKey types that lived in services/suttaStudioPassPrompts.ts (those were byte-identical duplicates of the compiler/utils.ts versions, confirmed via diff).

Single canonical implementation of:

- parseJsonResponse + stripCodeFences (JSON parsing with fallback)
- buildPhaseStateEnvelope (phase-state context block)
- buildBoundaryContext (cross-chapter boundary marker)
- getTimeoutSignal + waitFor (AbortSignal helpers)
- createCompilerThrottle (rate-limit gating)
- buildSourceRefs + computeSourceDigest (provenance helpers)
- applyWordRangeToSegments (sub-segment splitting)
- chunkPhases (skeleton fallback chunker)
- BoundaryNote + SkeletonPhase + PhaseStageKey types

Existing services/compiler/utils.ts becomes a 1-line re-export shim. Existing services/suttaStudioPassPrompts.ts re-exports the utilities from the canonical location.
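To make "JSON parsing with fallback" concrete, here is a minimal sketch of what a `stripCodeFences` / `parseJsonResponse` pair might do. The function names come from the commit, but the bodies are assumptions: in particular, the fallback shown here (retrying on the outermost `{...}` span) is a guess at the strategy, not a copy of services/sutta-studio/utils.ts.

```typescript
// Sketch only: the real helpers may differ in signature and fallback strategy.

// A Markdown fence marker (three backticks), built programmatically.
const FENCE = "`".repeat(3);
const OPENING_FENCE = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*");
const CLOSING_FENCE = new RegExp("\\s*" + FENCE + "$");

// Remove a leading fence (with optional language tag) and a trailing fence, if present.
function stripCodeFences(text: string): string {
  return text.trim().replace(OPENING_FENCE, "").replace(CLOSING_FENCE, "").trim();
}

// Try strict JSON first; on failure, retry on the outermost {...} span (assumed fallback).
function parseJsonResponse<T>(raw: string): T {
  const cleaned = stripCodeFences(raw);
  try {
    return JSON.parse(cleaned) as T;
  } catch {
    const start = cleaned.indexOf("{");
    const end = cleaned.lastIndexOf("}");
    if (start === -1 || end <= start) {
      throw new Error("No JSON object found in LLM response");
    }
    return JSON.parse(cleaned.slice(start, end + 1)) as T;
  }
}

// Usage: a fenced response and a chatty response both yield an object.
const fenced = parseJsonResponse<{ a: number }>(FENCE + 'json\n{"a": 1}\n' + FENCE);
const chatty = parseJsonResponse<{ b: number }>('Here is the result: {"b": 2} Hope that helps.');
```

Keeping one copy of this logic matters because both stacks parse the same LLM output, and any divergence in fallback behavior would show up as benchmark and production disagreeing on identical responses.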

Phase 2b (pass functions):

Creates services/sutta-studio/passes/ with one file per pass:

- skeleton.ts: runSkeletonPass (with chunking + fallback)
- anatomist.ts: runAnatomistPass
- lexicographer.ts: runLexicographerPass (now accepts optional dpdLookups; previously DPD context was constructed only in the compiler/index.ts orchestrator, and moving it onto the pass declares the dependency where it belongs)
- weaver.ts: runWeaverPass
- typesetter.ts: runTypesetterPass (optional logger injection)
- morphology.ts: runMorphologyPass
- types.ts: PassName + LLMCaller + PassCallResult + Skeleton*
- _defaultCaller.ts: default LLMCaller (dynamic-imports suttaStudioLLM to avoid circular dep; Phase 3 consolidates the LLM module proper)
- index.ts: barrel re-export

Existing services/suttaStudioPassRunners.ts becomes a re-export shim. All consumers unchanged (suttaStudioPassRunners.ts is imported by scripts/sutta-studio/benchmark.ts, benchmark-config.ts, and generate-new-phases.ts; all continue to work transparently).

Backward compat preserved:

- services/suttaStudioCompiler.ts (re-export of compileSuttaStudioPacket): unchanged
- services/suttaStudioPassRunners.ts: now a shim re-exporting from sutta-studio/passes
- services/suttaStudioPassPrompts.ts: kept schemas inline (Phase 4 reconciles), re-exports builders + utilities
- services/compiler/utils.ts: 1-line shim
- services/compiler/prompts.ts: shim from Phase 1
- services/compiler/index.ts: UNCHANGED. Production orchestrator continues to import build*Prompt from compiler/prompts (which re-exports from sutta-studio/prompts). Phase 2c will refactor compileSuttaStudioPacket to compose pass calls.

What this commit does NOT do (deferred):

- DOES NOT touch services/compiler/index.ts (the 675-line orchestrator). Phase 2c will rewrite compileSuttaStudioPacket as a composition of runXPass calls + production orchestration concerns. That's the risky part of the refactor; saved for a focused future session.
- DOES NOT consolidate the 7 response schemas. Production schemas (compiler/schemas.ts) and benchmark schemas (in suttaStudioPassPrompts.ts) still differ on skeleton.wordRange + anatomist.refrainId fields. Phase 4 reconciles when consumer impact is clearer.
- DOES NOT consolidate compiler/llm.ts vs suttaStudioLLM.ts (Phase 3).
- DOES NOT delete the re-export shims (Phase 4 cleanup).

Verification:

- npx vitest run : 1332 passed, 16 skipped, 1 pre-existing failure (build-dpd.test.ts: better-sqlite3 missing from main's node_modules). Same numbers as Phase 1 (8be501f).
- npx tsc --noEmit -p . : zero new errors in touched files. All errors surfaced are pre-existing TS strictness issues in unrelated files.

Companion to:

- 8be501f Phase 1 (prompt builders canonicalized)
- dc57a63 CONSOLIDATION.md design doc
- docs/sutta-studio/CONSOLIDATION.md §Phase 2 (full migration plan)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
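The Phase 2b commit above describes pass functions that take an injectable `LLMCaller`, a lazily resolved default caller, and an optional `dpdLookups` argument on `runLexicographerPass`. The sketch below illustrates that shape under heavy assumptions: the names are from the commit, but every type definition, parameter list, and prompt string here is hypothetical. The real `_defaultCaller.ts` is described as using a dynamic import of suttaStudioLLM; a lazy async factory stands in for it so the example stays self-contained.

```typescript
// Hypothetical shapes: the real definitions in passes/types.ts are not shown in this PR.
type PassName = "skeleton" | "anatomist" | "lexicographer" | "weaver" | "typesetter" | "morphology";

interface PassCallResult {
  text: string;
}

type LLMCaller = (pass: PassName, prompt: string) => Promise<PassCallResult>;

// Stand-in for _defaultCaller.ts: resolved on first use rather than at module load,
// which is the property a dynamic import provides for breaking a circular dependency.
let cachedDefault: LLMCaller | undefined;
async function getDefaultCaller(): Promise<LLMCaller> {
  if (!cachedDefault) {
    cachedDefault = async (pass, prompt) => ({ text: `[${pass}] ${prompt.length} chars` });
  }
  return cachedDefault;
}

interface LexicographerOptions {
  caller?: LLMCaller; // injected by benchmark or tests
  dpdLookups?: Record<string, string>; // optional DPD context, declared on the pass itself
}

async function runLexicographerPass(
  words: string[],
  options: LexicographerOptions = {},
): Promise<PassCallResult> {
  const caller = options.caller ?? (await getDefaultCaller());
  const dpd = options.dpdLookups ?? {};
  // Placeholder prompt assembly; the real pass would use buildLexicographerPrompt.
  const lines = words.map((w) => (w in dpd ? `${w}: ${dpd[w]}` : `${w}: (no DPD entry)`));
  return caller("lexicographer", lines.join("\n"));
}
```

With this shape, a benchmark script can pass a recording or mock caller while production relies on the default, and neither needs to know how the other supplies DPD data.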
This was referenced May 13, 2026
Summary
Multi-session refactor in progress. This PR ships Phase 0 (design doc) and Phase 1 (canonical prompt builders) of the compiler consolidation work agreed during the 2026-05-13 session.
The dual-stack issue surfaced today: two parallel compiler implementations (services/compiler/ and services/suttaStudio*) had drifted over 4 months. PR #50's response was "wire V2 into both" — patching the symptom. This PR fixes the architecture: single source of truth for prompts, with both stacks consuming via re-export shims.
Base note: Stacked on #49 (batch-4). PR #50 (V2 wire into both) is PAUSED in favor of this work — once consolidation lands, PR #50's commits become redundant.
Commits
dc57a63 (Phase 0: CONSOLIDATION.md design doc). 313 lines. Captures:
8be501f (Phase 1: canonical prompt builders). Establishes single source of truth in `services/sutta-studio/prompts/` (7 files + barrel, 422 lines total). Old builder files become shims (382L → 24L; 725L → 528L).
Merge decisions per pass (best-of-both):
V2 amendments wired into ONE place now. Future protocol amendments cost 1 file edit instead of 2.
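As a rough illustration of why a single location reduces amendment cost, the sketch below has three builders share one amendment block. The builder names match the commits, but the `V2_AMENDMENTS` constant, the `withV2` helper, the single-string parameters, and all prompt text are hypothetical simplifications, not the actual module.

```typescript
// Hypothetical constant: the real amendment text and where it is stored are not shown in this PR.
const V2_AMENDMENTS = "V2: (amendment text goes here)";

// Hypothetical helper so every amended pass appends the same block.
function withV2(base: string): string {
  return `${base}\n\n${V2_AMENDMENTS}`;
}

// Simplified stand-ins for the three V2-amended builders named in the commits.
const buildAnatomistPrompt = (segment: string): string =>
  withV2(`Anatomist task for: ${segment}`);
const buildLexicographerPrompt = (segment: string): string =>
  withV2(`Lexicographer task for: ${segment}`);
const buildPhasePrompt = (segment: string): string =>
  withV2(`Phase task for: ${segment}`);

// A pass that is not V2-amended simply does not call withV2.
const buildWeaverPrompt = (segment: string): string => `Weaver task for: ${segment}`;

const amended = [buildAnatomistPrompt, buildLexicographerPrompt, buildPhasePrompt].map((build) =>
  build("evaṃ me sutaṃ"),
);
```

Editing `V2_AMENDMENTS` changes all three prompts at once; with two stacks, the same change would have needed a matching edit in each.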
What this PR does NOT change
Per Phase 1 scope:
Verification
Next steps after merge
Test plan
🤖 Generated with Claude Code