
Compiler consolidation Phase 0+1 — design doc + canonical prompts#51

Merged
anantham merged 3 commits into
mainfrom
feat/opus-compiler-consolidation
May 13, 2026

@anantham
Owner

Summary

Multi-session refactor in progress. This PR ships Phase 0 (design doc) and Phase 1 (canonical prompt builders) of the compiler consolidation work agreed during the 2026-05-13 session.

The dual-stack issue surfaced today: two parallel compiler implementations (services/compiler/ and services/suttaStudio*) had drifted over 4 months. PR #50's response was "wire V2 into both" — patching the symptom. This PR fixes the architecture: single source of truth for prompts, with both stacks consuming via re-export shims.

Base note: Stacked on #49 (batch-4). PR #50 (V2 wire into both) is PAUSED in favor of this work — once consolidation lands, PR #50's commits become redundant.

Commits

dc57a63 — Phase 0: CONSOLIDATION.md design doc

313 lines. Captures:

  • Why this exists (dual-stack drift since 2026-01-30)
  • Target file structure (`services/sutta-studio/{prompts,passes}/` + orchestrator/llm/schemas)
  • Feature-by-feature migration map (production-only features, benchmark-only features)
  • 4-phase plan with risk assessment per phase
  • Backward-compat strategy (re-export shims; consumers unchanged)
  • Test strategy (typecheck + vitest + UI smoke + benchmark smoke)
  • 6 open design questions

8be501f — Phase 1: canonical prompt builders

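Establishes a single source of truth in `services/sutta-studio/prompts/` (7 files + barrel, 422 lines total). The old builder files become shims: `services/compiler/prompts.ts` shrinks from 382 to 24 lines, and `services/suttaStudioPassPrompts.ts` from 725 to 528 lines (it keeps its schemas, types, and parseJsonResponse inline until Phase 2).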

Merge decisions per pass (best-of-both):

  • Anatomist: 3 examples from benchmark (PHASE_A + PHASE_B + REFRAIN), V2 amendments
  • Lexicographer: DPD context from production + ripples instructions from benchmark + V2 amendments
  • Weaver: ANTI_PATTERN + duplicate-mapping guard from benchmark
  • Typesetter: optional logger from benchmark (production was spamming console)
  • Skeleton / Morphology / Phase: byte-identical between stacks; just moved

V2 amendments wired into ONE place now. Future protocol amendments cost 1 file edit instead of 2.
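
The shim pattern can be sketched in a single file. This is an illustration under stated assumptions, not the repository's code: the builder body, the `SkeletonInput` shape, and the objects standing in for modules are hypothetical. Only the name `buildSkeletonPrompt` and the file paths in the comments come from this PR.

```typescript
// Hypothetical input shape; the real builder's parameters are not shown in this PR.
interface SkeletonInput {
  suttaId: string;
  segmentCount: number;
}

// Stand-in for services/sutta-studio/prompts/skeleton.ts (the canonical builder).
// The prompt text is invented for illustration.
function buildSkeletonPrompt(input: SkeletonInput): string {
  return `Skeleton pass for ${input.suttaId} (${input.segmentCount} segments)`;
}

// Stand-in for the canonical barrel, services/sutta-studio/prompts/index.ts.
const canonicalPrompts = { buildSkeletonPrompt };

// Stand-ins for the two shims. In real files each would be a plain re-export,
// for example (relative path assumed from the directory layout):
//   export * from '../sutta-studio/prompts';
const compilerPromptsShim = { ...canonicalPrompts };  // services/compiler/prompts.ts
const benchmarkPromptsShim = { ...canonicalPrompts }; // services/suttaStudioPassPrompts.ts

// Both stacks resolve to the same function, so their prompts cannot drift apart.
const input: SkeletonInput = { suttaId: "demo", segmentCount: 12 };
const fromProduction = compilerPromptsShim.buildSkeletonPrompt(input);
const fromBenchmark = benchmarkPromptsShim.buildSkeletonPrompt(input);
console.log(fromProduction === fromBenchmark); // true
```

Because the shims hold no logic of their own, a future amendment edits only the canonical builder and both consumers pick it up.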

What this PR does NOT change

Per Phase 1 scope:

  • Schemas still differ between the two stacks (Phase 2)
  • Duplicated buildPhaseStateEnvelope + buildBoundaryContext still exist (Phase 2)
  • parseJsonResponse helper still duplicated (Phase 2)
  • LLM-visible prompt content is byte-identical to the output of PR #50 ("Wire V2 amendments into live compiler (activates 2d198f6)"). Benchmark and production output should not change.
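
One way to check a byte-identity claim like this is to compare the UTF-8 encodings of the prompt strings produced before and after the refactor. The sketch below is a hypothetical helper, not something this PR ships; `firstByteDifference` and the sample strings are invented for illustration.

```typescript
// Returns the index of the first differing UTF-8 byte, or -1 when the two
// strings are byte-identical.
function firstByteDifference(a: string, b: string): number {
  const encoder = new TextEncoder();
  const bytesA = encoder.encode(a);
  const bytesB = encoder.encode(b);
  const shared = Math.min(bytesA.length, bytesB.length);
  for (let i = 0; i < shared; i++) {
    if (bytesA[i] !== bytesB[i]) return i;
  }
  // If one string is a prefix of the other, they differ where the shorter ends.
  return bytesA.length === bytesB.length ? -1 : shared;
}

// Placeholder strings standing in for builder output before and after the move.
const promptBefore = "phase: A";
const promptAfter = "phase: A";
const promptDrifted = "phase: B";

console.log(firstByteDifference(promptBefore, promptAfter));   // -1
console.log(firstByteDifference(promptBefore, promptDrifted)); // 7
```

Reporting the offset rather than a plain boolean makes it easier to locate a stray whitespace or example change when the check fails.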

Verification

  • ✅ `npx tsc --noEmit -p .` — zero errors in touched files
  • ✅ `npx vitest run` — 1332 passed, 16 skipped, 1 failure (build-dpd.test.ts — better-sqlite3 missing from main's node_modules; unrelated to refactor)
  • ⏸ Manual UI smoke — recommend before merge

Next steps after merge

Test plan

  • CI typecheck passes
  • CI test suite passes (excluding build-dpd.test.ts pre-existing failure)
  • Manual UI smoke: open /sutta/demo, trigger compile, confirm output structure unchanged
  • Benchmark CLI smoke: `tsx scripts/sutta-studio/benchmark.ts` runs to completion (no API call needed for the prompt-shape check)
  • Confirm `services/compiler/prompts.ts` and `services/suttaStudioPassPrompts.ts` still export the same names (for backward compat shim verification)
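
The export-name check in the last item could be automated along these lines. This is a minimal sketch: the seven builder names are taken from the Phase 1 commit message, while `missingBuilders` and the stand-in module objects are hypothetical. In a real check the argument would be the namespace obtained from `import * as shim from ...`.

```typescript
// The seven builder names listed in the Phase 1 commit message.
const EXPECTED_BUILDERS = [
  "buildSkeletonPrompt",
  "buildAnatomistPrompt",
  "buildLexicographerPrompt",
  "buildWeaverPrompt",
  "buildTypesetterPrompt",
  "buildMorphologyPrompt",
  "buildPhasePrompt",
] as const;

// Returns the expected names that a module namespace does not export as functions.
function missingBuilders(moduleNamespace: Record<string, unknown>): string[] {
  return EXPECTED_BUILDERS.filter(
    (name) => typeof moduleNamespace[name] !== "function",
  );
}

// Hypothetical stand-ins for imported shim modules.
const completeShim: Record<string, unknown> = {};
for (const name of EXPECTED_BUILDERS) {
  completeShim[name] = () => "";
}
const brokenShim: Record<string, unknown> = {
  ...completeShim,
  buildWeaverPrompt: undefined, // simulates a builder dropped from the shim
};

console.log(missingBuilders(completeShim)); // []
console.log(missingBuilders(brokenShim));   // [ 'buildWeaverPrompt' ]
```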

🤖 Generated with Claude Code

anantham and others added 2 commits May 13, 2026 16:09
Captures the principled refactor plan agreed during 2026-05-13 session.
The Sutta Studio codebase has two parallel compiler stacks (production at
services/compiler/, benchmark at services/suttaStudio*) that have drifted
in opposite directions over 4 months. This doc plans their consolidation
into a single canonical core under services/sutta-studio/.

Architecture (Option C from the design conversation):
  services/sutta-studio/
    prompts/{anatomist,lexicographer,phase,skeleton,morphology,weaver,typesetter}.ts
    passes/{anatomist,lexicographer,phase,skeleton,morphology,weaver,typesetter}.ts
    schemas.ts, llm.ts, utils.ts, dictionary.ts, segments.ts
    orchestrator.ts   ← compileSuttaStudioPacket — production wrapper
    index.ts          ← barrel

Existing public paths become re-export shims for backward compat;
SuttaStudioApp.tsx + benchmark scripts don't change in this refactor.
Shims deleted in Phase 4 cleanup.

Phasing (~10-15 hr over 2-3 sessions):
  Phase 0  Design doc (this commit) — DONE
  Phase 1  Single prompts module    — NEXT
  Phase 2  Canonical pass functions
  Phase 3  Single LLM caller
  Phase 4  Cleanup + rework PR #50

Sequencing decision: PR #50 (V2 wire into both builders) is PAUSED until
this refactor lands, so V2 amendments wire into ONE canonical builder
instead of two divergent ones.

Companion to docs/sutta-studio/COMPILER_STRATEGY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o services/sutta-studio/prompts/

Per CONSOLIDATION.md (dc57a63). The two parallel compiler stacks
(services/compiler/ and services/suttaStudio*) had drifted: each had its
own prompt builders that diverged on examples, parameters, and which
features were supported. As of PR #50 they were briefly aligned on V2
amendments — this commit locks in the alignment by establishing a SINGLE
SOURCE OF TRUTH.

NEW canonical location:
  services/sutta-studio/prompts/
    skeleton.ts        — buildSkeletonPrompt
    anatomist.ts       — buildAnatomistPrompt (3 examples, V2 amendments)
    lexicographer.ts   — buildLexicographerPrompt (DPD context + ripples + V2)
    weaver.ts          — buildWeaverPrompt (with ANTI_PATTERN guard)
    typesetter.ts      — buildTypesetterPrompt (optional logger injection)
    morphology.ts      — buildMorphologyPrompt
    phase.ts           — buildPhasePrompt (full V2 overlay)
    index.ts           — barrel

Merge decisions (best-of-both for each pass):
  Anatomist: take benchmark's 3 examples (PHASE_A + PHASE_B + REFRAIN)
    rather than production's single example. The phase-b morph-data
    example and the refrain-formula example are pedagogically critical.
  Lexicographer: KEEP production's DPD context + renderDpdBlock helper
    AND benchmark's RIPPLES instruction block + ripple example. Neither
    stack had both. Merged here.
  Weaver: take benchmark's version with DUPLICATE-MAPPING rule + the
    SUTTA_STUDIO_WEAVER_ANTI_PATTERN reference. Production lacked these.
  Typesetter: take benchmark's optional logger param (vs production's
    inline log() spamming every call). Defaults to no-op.
  Skeleton / Morphology / Phase: byte-identical between stacks; no
    merge needed beyond moving.
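
The Typesetter decision (optional logger, no-op by default) can be sketched as follows. The function signature, the `phaseId` parameter, and the prompt text are hypothetical; only the optional logger parameter with a no-op default reflects the commit.

```typescript
type Logger = (message: string) => void;

// Default logger: does nothing, so callers that pass no logger stay silent.
const noopLogger: Logger = () => {};

// Hypothetical signature; the real builder's other parameters are not shown here.
function buildTypesetterPrompt(phaseId: string, logger: Logger = noopLogger): string {
  logger(`building typesetter prompt for ${phaseId}`);
  return `Typeset phase ${phaseId}`;
}

// Call without a logger: nothing is written anywhere.
const quietPrompt = buildTypesetterPrompt("phase-1");

// Call with a logger: the caller decides where messages go.
const captured: string[] = [];
const loggedPrompt = buildTypesetterPrompt("phase-1", (m) => captured.push(m));

console.log(quietPrompt === loggedPrompt); // true
console.log(captured.length);              // 1
```

Injecting the logger keeps the builder free of console side effects while still letting a caller opt in to diagnostics.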

V2 amendments are wired into THREE passes (Anatomist, Lexicographer, Phase),
each through its single canonical builder rather than one copy per stack.
Future protocol amendments land in one place.

OLD files become re-export shims:
  services/compiler/prompts.ts        382L → 24L re-export shim
  services/suttaStudioPassPrompts.ts  725L → 528L (schemas/types/parseJsonResponse
                                       kept inline pending Phase 2; only the 7
                                       builder functions removed, replaced by
                                       re-exports from the new canonical path)

Consumers don't change:
  - services/compiler/index.ts continues to import from './prompts'
  - services/suttaStudioPassRunners.ts continues to import from './suttaStudioPassPrompts'
  - scripts/sutta-studio/benchmark.ts continues to import from suttaStudioPassPrompts
  All those imports now resolve through the shims to the canonical builders.

Verification:
  - npx tsc --noEmit -p . — ZERO errors in the touched files (all reported
    errors are pre-existing TS strictness issues in unrelated files:
    AboutThisText.tsx React namespace, build-dpd.ts better-sqlite3 type,
    smoke-real-fojin.ts playwright export, etc.)
  - npx vitest run: 1332 passed, 16 skipped, no failures caused by this
    change. The single failing test is build-dpd.test.ts, because main's
    node_modules (symlinked for the test run) lacks better-sqlite3; that
    package was added on the not-yet-merged batch-3 branch. Unrelated to
    the refactor.

What this commit does NOT do (deferred per CONSOLIDATION.md):
  - DOES NOT consolidate schemas. compiler/schemas.ts and the schemas in
    suttaStudioPassPrompts.ts still differ on skeleton.wordRange and
    anatomist.refrainId fields (Phase 2).
  - DOES NOT consolidate the duplicated buildPhaseStateEnvelope or
    buildBoundaryContext functions (still byte-identical duplicates,
    Phase 2 cleans up alongside pass runner consolidation).
  - DOES NOT consolidate the parseJsonResponse helper (also duplicated;
    Phase 2).
  - DOES NOT change LLM-visible prompt content. The V2-amended Anatomist /
    Lexicographer / Phase prompts produce the exact same string as PR #50
    (currently paused) would. Benchmark output should be byte-identical
    pre/post this commit.
  - DOES NOT bump SUTTA_STUDIO_PROMPT_VERSION. Content unchanged means
    no benchmark cache invalidation. Phase 4 will bump if needed.

Companion to:
  - docs/sutta-studio/CONSOLIDATION.md (dc57a63) — full design doc
  - PR #50 paused — its V2-wire commits become redundant once this lands

Next: Phase 2 — canonical pass functions in services/sutta-studio/passes/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lexicon-forge | Ready | Preview, Comment | May 13, 2026 10:26pm |

…tions

Per CONSOLIDATION.md. Continues the consolidation started in Phase 1 (8be501f).

Phase 2a — utilities:
  Moves services/compiler/utils.ts content (verbatim, byte-identical) to
  services/sutta-studio/utils.ts. Drops the inline duplicates of
  buildPhaseStateEnvelope + buildBoundaryContext + parseJsonResponse +
  stripCodeFences + BoundaryNote/SkeletonPhase/PhaseStageKey types that
  lived in services/suttaStudioPassPrompts.ts (those were byte-identical
  duplicates of the compiler/utils.ts versions — confirmed via diff).

  Single canonical implementation of:
    - parseJsonResponse + stripCodeFences      (JSON parsing with fallback)
    - buildPhaseStateEnvelope                  (phase-state context block)
    - buildBoundaryContext                     (cross-chapter boundary marker)
    - getTimeoutSignal + waitFor               (AbortSignal helpers)
    - createCompilerThrottle                   (rate-limit gating)
    - buildSourceRefs + computeSourceDigest    (provenance helpers)
    - applyWordRangeToSegments                 (sub-segment splitting)
    - chunkPhases                              (skeleton fallback chunker)
    - BoundaryNote + SkeletonPhase + PhaseStageKey types
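
As an illustration of "JSON parsing with fallback", here is a minimal sketch of the two parsing helpers. The function names come from the list above, but the bodies are assumptions. In particular, the fallback strategy (retrying on the outermost brace-delimited span) is a guess at one reasonable approach, not the repository's implementation.

```typescript
// Three backticks, built at runtime so this example contains no literal fence.
const FENCE = "`".repeat(3);
const OPENING_FENCE = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*");
const CLOSING_FENCE = new RegExp("\\s*" + FENCE + "$");

// Removes a leading fence (with optional language tag) and a trailing fence.
function stripCodeFences(text: string): string {
  return text.trim().replace(OPENING_FENCE, "").replace(CLOSING_FENCE, "").trim();
}

// Parses a model response as JSON. Assumed fallback: if direct parsing fails,
// retry on the span from the first "{" to the last "}".
function parseJsonResponse<T>(raw: string): T {
  const cleaned = stripCodeFences(raw);
  try {
    return JSON.parse(cleaned) as T;
  } catch {
    const start = cleaned.indexOf("{");
    const end = cleaned.lastIndexOf("}");
    if (start === -1 || end <= start) {
      throw new Error("No JSON object found in model response");
    }
    return JSON.parse(cleaned.slice(start, end + 1)) as T;
  }
}

// Placeholder responses: one fenced, one wrapped in chatty prose.
const fenced = FENCE + 'json\n{"phases": 3}\n' + FENCE;
const chatty = 'Here is the result: {"phases": 3} Hope that helps.';

console.log(parseJsonResponse<{ phases: number }>(fenced).phases); // 3
console.log(parseJsonResponse<{ phases: number }>(chatty).phases); // 3
```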

  Existing services/compiler/utils.ts becomes a 1-line re-export shim.
  Existing services/suttaStudioPassPrompts.ts re-exports the utilities
  from the canonical location.

Phase 2b — pass functions:
  Creates services/sutta-studio/passes/ with one file per pass:
    - skeleton.ts       runSkeletonPass (with chunking + fallback)
    - anatomist.ts      runAnatomistPass
    - lexicographer.ts  runLexicographerPass (now accepts optional dpdLookups
                        — previously DPD context was constructed only in
                        compiler/index.ts orchestrator; moving it onto the
                        pass declares the dependency where it belongs)
    - weaver.ts         runWeaverPass
    - typesetter.ts     runTypesetterPass (optional logger injection)
    - morphology.ts     runMorphologyPass
    - types.ts          PassName + LLMCaller + PassCallResult + Skeleton*
    - _defaultCaller.ts default LLMCaller (dynamic-imports suttaStudioLLM
                        to avoid circular dep; Phase 3 consolidates the LLM
                        module proper)
    - index.ts          barrel re-export
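
The pass layout above, with an injectable caller and optional DPD lookups, might look roughly like this. All shapes here are hypothetical: the commit names `PassName`, `LLMCaller`, and `runLexicographerPass` but does not show their definitions. A lazily resolved stub stands in for the dynamic import of suttaStudioLLM.

```typescript
// Hypothetical shapes; field and parameter choices are assumptions.
type PassName =
  | "skeleton" | "anatomist" | "lexicographer"
  | "weaver" | "typesetter" | "morphology";
type LLMCaller = (pass: PassName, prompt: string) => Promise<string>;

interface LexicographerOptions {
  callLLM?: LLMCaller;                 // injected by benchmarks or tests
  dpdLookups?: Record<string, string>; // optional dictionary context
}

// Stub loader standing in for a dynamic import of the LLM module.
async function loadLLM() {
  return {
    complete: async (pass: PassName, prompt: string) =>
      `[stub ${pass}] ${prompt.length} chars`,
  };
}

// Stand-in for _defaultCaller.ts: resolves the LLM module only on first use.
const defaultCaller: LLMCaller = async (pass, prompt) => {
  const llm = await loadLLM();
  return llm.complete(pass, prompt);
};

async function runLexicographerPass(
  words: string[],
  options: LexicographerOptions = {},
): Promise<string> {
  const callLLM = options.callLLM ?? defaultCaller;
  const lookups = options.dpdLookups ?? {};
  const dpdBlock = Object.keys(lookups)
    .map((word) => `${word}: ${lookups[word]}`)
    .join("\n");
  // The DPD block is appended only when lookups were supplied.
  const parts = [`Gloss these words: ${words.join(", ")}`];
  if (dpdBlock) parts.push(dpdBlock);
  return callLLM("lexicographer", parts.join("\n\n"));
}

// Usage with placeholder data and the default (stub) caller.
runLexicographerPass(["example"], { dpdLookups: { example: "placeholder gloss" } })
  .then((result) => console.log(result));
```

Taking the caller as a parameter lets benchmark scripts and tests substitute their own without the pass importing a specific LLM module, which is also what keeps the import graph acyclic.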

  Existing services/suttaStudioPassRunners.ts becomes a re-export shim.
  All consumers unchanged (suttaStudioPassRunners.ts is imported by
  scripts/sutta-studio/benchmark.ts, benchmark-config.ts, and
  generate-new-phases.ts — all continue to work transparently).

Backward compat preserved:
  - services/suttaStudioCompiler.ts (re-export of compileSuttaStudioPacket) — unchanged
  - services/suttaStudioPassRunners.ts — now a shim re-exporting from sutta-studio/passes
  - services/suttaStudioPassPrompts.ts — kept schemas inline (Phase 4 reconciles), re-exports builders + utilities
  - services/compiler/utils.ts — 1-line shim
  - services/compiler/prompts.ts — shim from Phase 1
  - services/compiler/index.ts — UNCHANGED. Production orchestrator continues to import build*Prompt from compiler/prompts (which re-exports from sutta-studio/prompts). Phase 2c will refactor compileSuttaStudioPacket to compose pass calls.

What this commit does NOT do (deferred):
  - DOES NOT touch services/compiler/index.ts (the 675-line orchestrator).
    Phase 2c will rewrite compileSuttaStudioPacket as a composition of
    runXPass calls + production orchestration concerns. That's the risky
    part of the refactor; saved for a focused future session.
  - DOES NOT consolidate the 7 response schemas. Production schemas
    (compiler/schemas.ts) and benchmark schemas (in suttaStudioPassPrompts.ts)
    still differ on skeleton.wordRange + anatomist.refrainId fields.
    Phase 4 reconciles when consumer impact is clearer.
  - DOES NOT consolidate compiler/llm.ts vs suttaStudioLLM.ts (Phase 3).
  - DOES NOT delete the re-export shims (Phase 4 cleanup).

Verification:
  - npx vitest run — 1332 passed, 16 skipped, 1 pre-existing failure
    (build-dpd.test.ts: better-sqlite3 missing from main's node_modules).
    Same numbers as Phase 1 (8be501f).
  - npx tsc --noEmit -p . — zero new errors in touched files. All errors
    surfaced are pre-existing TS strictness issues in unrelated files.

Companion to:
  - 8be501f Phase 1 (prompt builders canonicalized)
  - dc57a63 CONSOLIDATION.md design doc
  - docs/sutta-studio/CONSOLIDATION.md §Phase 2 (full migration plan)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anantham anantham changed the base branch from feat/opus-batch4-curation to main May 13, 2026 22:31
@anantham anantham merged commit e23c190 into main May 13, 2026
4 checks passed