feat(sutta-studio): phase-2 hand-curation + A2 experiment scaffolding#52
Open
anantham wants to merge 20 commits into
…r A2 validation
This commit prepares the empirical experiment that validates the V2 prompt
amendments landed by PR #51 (compiler consolidation). The three-way
comparison documented in docs/sutta-studio/experiments/README.md compares
v10 baseline output vs v11 compiler output vs hand-curated gold standard.
Changes:
1. Bump SUTTA_STUDIO_PROMPT_VERSION → 'sutta-studio-v11-mn10-amendments'.
V2 amendments have been active in the canonical prompt builders since
PR #51 merged. This version key lets the benchmark / quality-scorer
distinguish v10 and v11 outputs in the leaderboard. (Reverts the version
state from v10 back to v11; PR #50 had attempted this bump in the
now-closed approach.)
2. Hand-curate phase-2 (sattānaṁ visuddhiyā) following CURATION_PROTOCOL
§3.4 + §3.4.1 with all 6 V2 amendments applied. Replaces the v10
LLM-generated entry in components/sutta-studio/demoPacket.json:
- p5 sattānaṁ: 3 senses with epistemicBasis (lexical/etymological),
confidence, and per-sense notes. morph field added on -ānaṁ suffix
(gen, pl). Cross-phase tooltip notes the bhikkhū-vs-sattā
audience-vs-beneficiary contrast.
- p6 visuddhiyā: 6 senses (purification/purity/clarity/cleansing/
brightening/refinement) with curatorial epistemicBasis, confidence
rankings, and per-tradition notes (Sujato, Ñāṇamoli-Bodhi, Thanissaro).
morph field added on -yā suffix (dat, sg, f). isAnchor: true (semantic
centerpiece: what the path is FOR).
- All tooltips rewritten in plain-first §3.4 prose. No bracketed grammar
prefixes ([Genitive Plural], [Dative]), no decorative emoji (🔗 ✨ 🎯).
- Relation p5s2 → p6 (genitive-of-possession) preserved; it earns its
arrow per the V2 arrow-earning rule.
3. Add docs/sutta-studio/experiments/ scaffolding:
- README.md explains the three-way comparison and follow-up plan.
- phase-2-v10-baseline.json snapshots the pre-curation LLM output for
diffing (read-only reference).
- phase-2-hand-curated.json is the canonical source-of-truth for the new
live entry; the demoPacket.json change above is generated from this file.
- phase-2-v11-output.json (TBD) will land here after the v11 compile run.
4. Add docs/sutta-studio/curation/phase-2.md: full curation log following
the established phase-{a..h,1}.md format. Documents every curation
decision, predicts what v11 should produce, and identifies which gaps a
deterministic post-pass (the proposed A3 work) could close.
How the experiment closes:
After this commit lands, the user runs the compiler on phase-2 with v11
prompts active (script edit of scripts/sutta-studio/generate-new-phases.ts
with phase-2's mn10:2.1 wordRange, or via the live UI) and saves the output
to docs/sutta-studio/experiments/phase-2-v11-output.json. Then the three-way
diff (v10 / v11 / hand) is written up at phase-2-analysis.md. Decision flows
from there: invest in A3 post-passes if v11 hits 65%+ quality; iterate on
prompts if it doesn't.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical-validation script for the V2-amended prompts. Targets a single
phase from demoPacket.json, runs the V2-active passes (anatomist +
lexicographer + phase composition) on its Pali words, and saves the output
for diffing against the hand-curated entry.
Skips intentionally:
- skeleton: phase grouping is already known from the packet
- weaver, typesetter: orthogonal to V2 (V2 amendments don't touch token
mapping or layout blocks)
- morphology: refinement pass; not core to V2 quality
Usage:
tsx scripts/sutta-studio/run-phase-experiment.ts \
--phase phase-2 \
[--model google/gemini-3-flash-preview] \
[--out docs/sutta-studio/experiments/phase-2-v11-output.json]
Reuses the OpenRouter LLM caller pattern from generate-new-phases.ts (same
provider, same fetch shape, same pricing lookup). One-command run for any
phase, generalised so phase-3 / phase-4 / DN22 phases can reuse without
modification. Output includes _meta with tokens, cost, prompt version,
model — supports cross-model comparison.
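The usage line above can be read as a small flag-parsing contract. The following is a minimal sketch of that contract, not the script's actual parser: `parseExperimentArgs` and `ExperimentArgs` are hypothetical names, and treating the bracketed values in the usage text as the defaults is an assumption.

```typescript
// Hypothetical shape of the parsed CLI flags (assumed, not taken from the script).
interface ExperimentArgs {
  phase: string;
  model: string;
  out: string;
}

// Parses "--flag value" pairs. Assumption: the bracketed values in the usage
// text are the defaults applied when a flag is omitted.
function parseExperimentArgs(argv: string[]): ExperimentArgs {
  const flags = new Map<string, string>();
  for (let i = 0; i < argv.length; i += 2) {
    const name = argv[i];
    const value = argv[i + 1];
    if (!name.startsWith("--") || value === undefined) {
      throw new Error(`Expected "--flag value" pairs, got: ${name}`);
    }
    flags.set(name.slice(2), value);
  }
  const phase = flags.get("phase");
  if (!phase) throw new Error("--phase is required");
  return {
    phase,
    model: flags.get("model") ?? "google/gemini-3-flash-preview",
    out: flags.get("out") ?? `docs/sutta-studio/experiments/${phase}-v11-output.json`,
  };
}
```

Deriving the output path from the phase id is what would let phase-3 / phase-4 runs reuse the command without extra flags.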
Companion to:
- docs/sutta-studio/experiments/README.md
- docs/sutta-studio/curation/phase-2.md
- PR #52 (the experiment scaffolding this script populates)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the A2 experiment loop opened in PR #52 (phase-2 hand-curation +
scaffolding).
Adds:
1. v11 outputs from 2 successful frontier models (Gemini Flash + Pro) and
failure modes documented for 4 others (Sonnet, Grok, GPT-4o, GPT-5 all fail
on our structured-output pipeline). Gemini-only is a known limitation worth
noting but not blocking.
2. Phase-3 hand-curation in pipeline+polish mode. First demonstration of the
workflow COMPILER_STRATEGY.md §5 predicted: run the v11 pipeline ($0.019),
open the draft, add metadata + cross-phase + extra polysemy. Total ~22 min
vs ~45 min from scratch. ~2x speedup.
3. phase-2-analysis.md: comprehensive three-way diff (v10 / v11 Flash /
hand-curated) + cross-model comparison + failure mode notes + A3 post-pass
priority ranking.
Empirical signal:
- V2 amendments lift STRUCTURAL fields strongly (tooltip register, anchor,
morph, relations). v10 had bracketed grammar prefixes and emoji; v11 has
plain-first prose throughout.
- V2 amendments DO NOT lift METADATA fields. epistemicBasis, confidence,
notes citing traditions: LLMs ignore these regardless of model tier
(Flash, Pro, Sonnet all skip them).
- Polysemy count goes BACKWARDS in some cases (visuddhi: v10 had 5 senses,
v11 has 3). Hand-curation re-adds the missing senses.
- Cost: Gemini Flash $0.018 per phase; full MN10 re-compile $0.92 total.
Pipeline+polish workflow takes ~3-4 hours curator time vs ~25-30 hours
from-scratch. ~85% time reduction on routine phases.
A3 post-pass priorities (revised based on data):
1. citation-linker (HIGH): closes sourceCitationIds gap
2. morph-from-POS (HIGH): closes gender-on-morph gap
3. epistemicBasis inference (HIGH, new; wasn't in original A3 list)
4. cross-phase facet detector (MEDIUM)
5. §3.4 linter (LOW; LLM already does this well)
Together these form a "metadata-filler" post-pass module (~5-7 hrs of
engineering). After this lands, v11 output should hit ~75% of hand quality
automatically.
Files changed:
- components/sutta-studio/demoPacket.json: phase-3 swapped to hand-curated
(line-based splice; other phases untouched)
- docs/sutta-studio/curation/phase-3.md: 90-line log of the polish workflow
- docs/sutta-studio/experiments/phase-2-analysis.md: 130-line empirical
analysis
- docs/sutta-studio/experiments/phase-2-v11-output.json: Gemini Flash output
- docs/sutta-studio/experiments/phase-2-v11-output-geminipro.json: Gemini Pro
output
- docs/sutta-studio/experiments/phase-3-v11-output.json: Gemini Flash output
for phase-3
- docs/sutta-studio/experiments/phase-3-hand-curated.json: hand-polished gold
standard
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Path 1 of the data-fields-evaluation work: makes the V2 metadata fields
visible in the UI so you can empirically decide which ones earn their
packet bulk vs which can be dropped.
Wires LensPanel.tsx into SuttaStudioView. The panel was defined but never
imported anywhere — orphaned audit drawer with rich content (Senses /
Grammar / Relations tabs) that nothing rendered. Now it docks right when
settings.auditPanel is on and a word/segment is hovered. Stays pinned to
the most-recent hover target until user clicks ✕ or hovers a new target.
Adds 5 new settings toggles under "Audit fields (V2)" section:
- Audit panel — open/close the side drawer (default OFF)
- Anchor emphasis — toggle the subtle amber underline on PaliWord.isAnchor
- Sense notes — Sense.notes prose in the panel
- Citation chips — Sense.citationIds chips in the panel
- Confidence + basis — NEW per-sense confidence ('high'/'medium'/'low') and
epistemicBasis ('lexical'/'curatorial'/'etymological'/...)
badges, previously not rendered anywhere
LensPanel signature extended with showNotes / showCitationChips /
showConfidenceBadges optional props (default true for back-compat).
PaliWord.tsx: isAnchor underline now gated by settings.anchorEmphasis
(was unconditional).
Empirical use: load /sutta/demo, scroll to phase-1 (Ekāyono with 5
translator-debate senses) or phase-2 (sattānaṁ visuddhiyā with the V2
metadata hand-curated in this session). Hover a word. Toggle the audit
panel ON in the gear menu. See the senses with their confidence/basis
tags, notes citing scholars, and DPD citation chips. Toggle each field
on/off to feel which ones add value.
Decision criterion: if after using the UI the metadata feels "useful" —
keep generating it. If it feels "noise" — drop SENSE_METADATA from V2
amendments and ship a leaner pipeline. Path 2 (~30 min cleanup) is one
commit away if that's the call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mentary from tooltips
10 facets across 6 phases were using tooltip[0] to explain the UI's color
palette instead of providing a gloss: meta-commentary about the renderer,
not about the word.
Pure removal: no rewrites, no replacement content. After deletion, facet[0]
in each affected segment is now the actual gloss that was sitting at
index [1].
The color palette explanation belongs in a Legend panel (future commit), not
repeated per-word in every tooltip.
Reverse-direction cleanup; see ~/.claude/CLAUDE.md "Lean toward the reverse
direction" principle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…affordance on English words
Three small UX changes triggered by user feedback after reloading the demo:
1. Trim evaṁ (a1 'eva' segment): drop the "Don't confuse with bare eva"
facet. The distinction is already in the Sense.notes for the audit panel;
it doesn't belong in a hover tooltip. (Reverse-direction subtraction.)
2. Replace ṁ (a1 'ṁ' segment) tooltip from "humming dot-m sound" to a
concrete English-pronunciation analog: "Pronounced as a soft nasal close —
like the 'um' in English 'hum' or 'sum'." Information that teaches by
example, not by Pāli-internal naming.
3. EnglishWord: render small dot affordance under English words linked to
multi-sense Pāli words. One filled dot per sense, current at bold-grey,
others at slate-700. Subtle visual cue that the word is clickable AND that
there are N alternative renderings available. Only shows when
senseCount > 1; ghost words stay clean.
The dot affordance is the user's design; it replaces my earlier failure mode
of leaving clickability hidden.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
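The dot rule in item 3 (one dot per sense, current one highlighted, nothing when there is a single sense) can be sketched as a pure function. This is an illustration only: `cycleDotClasses` is a hypothetical helper, and `bg-slate-400` is an assumed stand-in for the "bold-grey" of the current dot; only `slate-700` comes from the commit text.

```typescript
// Returns one CSS class per dot; an empty array means "render no dots".
// Class names are illustrative assumptions, not the component's real classes.
function cycleDotClasses(senseCount: number, currentIndex: number): string[] {
  if (senseCount <= 1) return []; // single-sense and ghost words stay clean
  // Normalise the index so repeated clicking can wrap around safely.
  const active = ((currentIndex % senseCount) + senseCount) % senseCount;
  return Array.from({ length: senseCount }, (_, i) =>
    i === active ? "bg-slate-400" : "bg-slate-700",
  );
}
```

Keeping the rule in a pure function would make the "only when senseCount > 1" condition easy to unit-test apart from the React component.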
…UX upgrades
Three coordinated changes addressing user feedback after the initial audit-panel
shipped:
1. PaliWord.pronunciation field — Pāli is oral; same Roman letters carry
different sounds based on syllable position, vowel length, compound
boundaries. Pronunciation can't be lemma-derived; it's per-word data.
Added optional field on PaliWord type with curator-written syllable
breakdown + optional English rhyme analog. Populated for 46 words
across the 11 hand-curated phases (a, b, c, d, e, f, g, h, 1, 2, 3).
Rendered in LensPanel header below the Pāli text.
Note: demoPacket.json diff is bigger than the semantic content adds
because json.dump reformatted single-element arrays to multi-line.
Accepted the churn rather than break JSON with a line-based splice
that mis-matched word ids vs English-token ids.
2. Legend panel (new — components/sutta-studio/Legend.tsx) — visual
reference showing the color/symbol vocabulary ONCE: word colors
(content/function/vocative), emphasis (anchor/refrain/ghost),
diacritics (ā ē ī ō ū / ṁ / ṭ ḍ ṇ ḷ / ñ), relation arrows,
cycle dots, audit panel reminder. Toggle via "Legend" in settings.
Replaces the deleted "Colored differently because…" tooltip
meta-commentary — register-correct location for color teaching.
3. LensPanel UX upgrades:
- Header now renders pronunciation under the Pāli text in mono font
- Copy Pāli + Copy English moved OUT of the tab row, into a separate
footer bar at the bottom of the panel. Added Copy Pron. button
when pronunciation exists.
- Panel now draggable via framer-motion `drag`. Position persists
to localStorage (key: sutta-studio-lens-panel-pos) so it survives
reloads.
- Panel wider (360px → 440px) to show more text without scrolling.
- Header has cursor-move + drag affordance hint.
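The position persistence described for the draggable panel can be sketched as a load/save pair. This is a minimal sketch under stated assumptions: only the key `sutta-studio-lens-panel-pos` comes from the commit text, while the `{ x, y }` JSON shape, the `KeyValueStore` interface (a stand-in for `localStorage`), and both function names are hypothetical.

```typescript
// Minimal subset of the Web Storage API, so the sketch runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface PanelPos {
  x: number;
  y: number;
}

const LENS_PANEL_POS_KEY = "sutta-studio-lens-panel-pos";

// Reads the saved position; falls back when the entry is missing or malformed.
function loadPanelPos(store: KeyValueStore, fallback: PanelPos): PanelPos {
  const raw = store.getItem(LENS_PANEL_POS_KEY);
  if (raw === null) return fallback;
  try {
    const parsed = JSON.parse(raw) as Partial<PanelPos>;
    if (
      typeof parsed.x === "number" && Number.isFinite(parsed.x) &&
      typeof parsed.y === "number" && Number.isFinite(parsed.y)
    ) {
      return { x: parsed.x, y: parsed.y };
    }
  } catch {
    // Corrupt JSON (or a stored null): ignore and use the fallback.
  }
  return fallback;
}

// Would be called at the end of a drag gesture.
function savePanelPos(store: KeyValueStore, pos: PanelPos): void {
  store.setItem(LENS_PANEL_POS_KEY, JSON.stringify(pos));
}
```

Validating on read matters here because a stale or hand-edited storage entry would otherwise place the panel off-screen after a reload.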
All three changes are positive-additive but each replaces something subtractive:
pronunciation replaces lossy lemma-guessing, Legend replaces per-word color
meta-commentary, audit-panel UX replaces the cramped header-bar copy buttons.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three UX fixes after first reload-and-look review:
1. SettingsPanel z-index raised from z-50 to z-[100] — was being covered
by the audit panel (z-[90]), which made the gear menu unusable while
the audit drawer was open.
2. New "Cycle dots" toggle in settings. Lets readers turn off the small
dot affordance under multi-sense English words. Clicking still cycles;
only the visual indicator hides. Plumbed through to EnglishWordEngine
via the showCycleDots prop.
3. Legend panel rewrites:
- Refrain & anchor underlines now hug the text instead of spanning
the w-20 column (was extending visibly past 'bhikkhū' / 'visuddhi').
- Ghost-word example "have" opacity dropped from CSS class opacity-60
to inline 0.3 — matches the actual renderer's ghostOpacity default
so readers see the real thing in the legend.
- Diacritics section rewritten example-first:
• Drops "palatal", "niggahīta", "retroflex" from visible prose
— they're technical labels that don't help a default reader.
• Each long vowel gets its own English-word analog:
ā = 'father', ē = 'they', ī = 'machine', ō = 'boat', ū = 'rule'.
Reader knows the sound from English; doesn't need to hear "long
vowel" first.
• ṁ unchanged ("the 'um' in 'hum' or 'sum'") — already example-first.
• ñ → "the 'ny' in 'canyon' or 'señor'" (was: "palatal n").
• ṭ ḍ ṇ ḷ → "the soft 'd' in American 'water' or 'butter'" + honest
acknowledgment that English lacks a clean equivalent.
User feedback: "the example words need to be common words that people
actually know... the most important thing is explaining the example
English word where naturally you end up doing a long vowel."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… scale
Builds the Tier-1 pronunciation post-pass agreed in the architecture
discussion: rule-based syllabifier + Latin-penult stress placer that
produces pronunciation hints for any Pāli word from spelling alone.
Zero LLM cost; <1ms per word; runs on every word in the packet.
services/sutta-studio/postPasses/syllabify.ts (235 lines)
- tokenizePaliPhonemes: groups aspirated digraphs (kh, gh, ch, jh,
ṭh, ḍh, th, dh, ph, bh) as single phonemes
- syllabify: produces CV(C) syllables per Pāli prosody rules
(single C between vowels → next syllable; CC → split; CCC → first
closes preceding, rest onset the following; ṁ always closes)
- isHeavySyllable: long vowel OR closed = heavy
- pickStressIndex: Latin penult rule (heavy penult → penult, else
antepenult; disyllabic → initial)
- syllabifyPaliWord: full pipeline → "vi · SUD · dhi · yā" format
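The rules listed above can be sketched end to end as follows. This is a reconstruction from the commit message, not the contents of syllabify.ts: the function names mirror the ones listed, but the bodies are assumptions. It lowercases its input (so the capitalization preservation mentioned for the real module is not modelled) and treats e and o as long vowels.

```typescript
const VOWELS = new Set(["a", "ā", "i", "ī", "u", "ū", "e", "o"]);
const LONG_VOWELS = new Set(["ā", "ī", "ū", "e", "o"]); // assumption: e/o count as long
const ASPIRATED = new Set(["kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"]);

// Groups aspirated digraphs into single phonemes; everything else is one char.
function tokenizePaliPhonemes(word: string): string[] {
  const s = word.normalize("NFC").toLowerCase();
  const out: string[] = [];
  let i = 0;
  while (i < s.length) {
    const pair = s.slice(i, i + 2);
    if (ASPIRATED.has(pair)) { out.push(pair); i += 2; }
    else { out.push(s.charAt(i)); i += 1; }
  }
  return out;
}

// Single C between vowels opens the next syllable; in a cluster the first
// consonant closes the preceding syllable; ṁ always closes.
function syllabify(word: string): string[] {
  const ph = tokenizePaliPhonemes(word);
  const syllables: string[] = [];
  let current = "";
  let hasVowel = false;
  let i = 0;
  while (i < ph.length) {
    if (VOWELS.has(ph[i])) {
      if (hasVowel) { syllables.push(current); current = ""; } // vowel hiatus
      current += ph[i];
      hasVowel = true;
      i += 1;
      continue;
    }
    let j = i;
    while (j < ph.length && !VOWELS.has(ph[j])) j += 1;
    const cluster = ph.slice(i, j);
    if (!hasVowel || j === ph.length) {
      current += cluster.join(""); // word-initial onset or word-final coda
    } else {
      const codaLen = cluster.length === 1 && cluster[0] !== "ṁ" ? 0 : 1;
      syllables.push(current + cluster.slice(0, codaLen).join(""));
      current = cluster.slice(codaLen).join("");
      hasVowel = false;
    }
    i = j;
  }
  if (current !== "") syllables.push(current);
  return syllables;
}

// Heavy = contains a long vowel OR ends in a consonant (closed).
function isHeavySyllable(syl: string): boolean {
  const closed = !VOWELS.has(syl.charAt(syl.length - 1));
  return closed || syl.split("").some((ch) => LONG_VOWELS.has(ch));
}

// Latin penult rule: heavy penult takes stress, else antepenult;
// words of one or two syllables stress the first.
function pickStressIndex(syllables: string[]): number {
  const n = syllables.length;
  if (n <= 2) return 0;
  return isHeavySyllable(syllables[n - 2]) ? n - 2 : n - 3;
}

function syllabifyPaliWord(word: string): string {
  const syllables = syllabify(word);
  const stress = pickStressIndex(syllables);
  return syllables.map((s, i) => (i === stress ? s.toUpperCase() : s)).join(" · ");
}
```

For visuddhiyā the penult "dhi" is light, so stress falls back to the antepenult, which reproduces the "vi · SUD · dhi · yā" format quoted above.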
services/sutta-studio/postPasses/syllabify.test.ts (29 tests)
Worked examples + edge cases: aspirated digraphs, geminate splits,
niggahīta closing, stress placement, capitalization preservation.
scripts/sutta-studio/backfill-pronunciation.ts
One-shot script that populates pronunciation for any PaliWord that
doesn't already have one. Idempotent — hand-curated words (the 46
from the earlier commit) are left alone. New words use algorithm.
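The idempotent backfill described above amounts to "fill only what is empty". A minimal sketch, assuming a word record with `surface` and optional `pronunciation` fields (the `surface` field name, `PaliWordLike`, and `backfillPronunciation` are hypothetical; the real script's types may differ):

```typescript
// Assumed minimal word shape; only `pronunciation` is named in the commit text.
interface PaliWordLike {
  id: string;
  surface: string;
  pronunciation?: string;
}

// Fills pronunciation only where it is missing, so hand-curated values survive
// and a second run changes nothing. `derive` stands in for the syllabifier.
function backfillPronunciation<T extends PaliWordLike>(
  words: T[],
  derive: (surface: string) => string,
): { words: T[]; filled: number } {
  let filled = 0;
  const out = words.map((w) => {
    if (w.pronunciation !== undefined && w.pronunciation.trim() !== "") return w;
    filled += 1;
    return { ...w, pronunciation: derive(w.surface) };
  });
  return { words: out, filled };
}
```

Returning the `filled` count gives a cheap idempotency check: a re-run over already-populated data should report zero.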
scripts/sutta-studio/syllabify-compare.ts
Debug utility: shows algorithm output vs hand-curated pronunciation
for every word that has both. Useful for the curator to audit
where algorithm and hand-curation diverge.
Effect: demoPacket.json now has pronunciation on ALL 269 Pāli words.
46 are hand-curated (with optional "rhymes with X" suffixes for
famously-tricky cases like visuddhiyā); 223 are algorithm-populated.
Future packet generation can call syllabifyPaliWord() directly as a
post-pass, with zero per-word curator effort.
Architecture sibling to A3's metadata-filler module:
citation-linker → sourceCitationIds
morph-from-POS → case/number/gender
epistemicBasis-infer → epistemicBasis
syllabify → pronunciation (THIS)
Known limitations (documented in syllabify.ts):
- Stress traditions vary by region/school; we use the standard
Latin penult rule. Curator can override per-word via the
PaliWord.pronunciation field.
- The "rhymes with English-word X" English-analog hint is judgment;
not produced by this pass. Curator-added for famously-tricky words.
- Sandhi alternations across word boundaries are out of scope.
Note: demoPacket.json diff is large because json.dump reformatted
single-element arrays. Semantic content added: 223 pronunciation
strings. Same accept-the-churn tradeoff as the earlier pronunciation
commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or audit panel
Addressing concrete user feedback after first reload of audit panel on mobile
+ landscape desktop. Five coordinated changes, mostly subtractive:
1. confidenceBadges → default OFF.
The 'high/medium/low' confidence + 'DPD-attested/curatorial/...'
epistemicBasis badges are useful for curators auditing data, confusing
for end readers ("what does 'high' mean next to 'DPD-attested'?").
Settings toggle preserved for opt-in curator review.
2. Inline copy icons replace the footer Copy bar.
Removed: the 3-button footer (Copy Pāli / Copy English / Copy Pron.)
that occupied vertical space across the panel bottom even when senses
tab had plenty of room above it.
Added: small clipboard SVG icon next to each copyable item —
• Pāli surface in header
• Pronunciation (when present)
• Each English sense in the Senses tab
Net: less wasted space, contextual per-item copy that scales naturally
to future language alignments (Tibetan/Chinese/Japanese rows would
each get their own icon).
3. Toast on copy success.
Bottom-center pill that fades in ("Copied Pāli: sutaṁ") and disappears
after 1.6s. Was missing — no feedback that copy worked.
4. Mobile bottom-sheet layout.
On <640px viewports, the panel becomes a bottom drawer:
• Position: fixed bottom-0, full-width, max-h-65vh
• Rounded top corners only
• Drag-handle bar visible at top as dismiss-affordance cue
• Drag-to-reposition disabled (via motion's `drag={!isMobile}`)
Reader content stays visible above the panel. Was full-screen-overlay
side panel before, which was unusable on phones.
`useIsMobile` hook polls window.innerWidth on resize.
5. Settings gear / About chip collision.
AboutThisText container gets `pr-16 md:pr-6` so the about text doesn't
wrap under the absolute-positioned gear icon on narrow viewports.
Tooltips on the (now opt-in) badges explain what they mean — was missing
context. Confidence: "Curator's confidence in this rendering — independent
of source." Epistemic basis: per-value explanation of where the sense came
from (DPD-attested = from the Digital Pāli Dictionary, etc).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plain-language label that says what the toggle is FOR (curator metadata) rather than what fields it contains. User couldn't find this when looking to switch off the 'DPD-attested + high/medium' badges that confused them. Reverse-direction edit: just rename, no new toggle, no new logic.
…obile
The centered-on-parent tooltip overflowed the viewport when the parent word
sat near a screen edge (visible on mobile where viewport is ~400px and
tooltip max-width is 28rem = 448px). Measured: word "evaṁ" near the left
edge, tooltip extended past the left side of the screen.
Fix: useLayoutEffect now measures the tooltip rect after render. If the
default centered position would overflow either viewport edge by more than
8px margin, set an absolute leftPx in offsetParent coords that pins the
appropriate edge to viewport - 8px. Tailwind centering classes are swapped
out for an inline style.left only when the clamp kicks in; desktop tooltips
that fit naturally are unchanged.
Also tightened max-width from min(28rem, 90vw) to
min(28rem, calc(100vw - 1rem)) so the tooltip can never be wider than
viewport - 16px; even before the JS clamp runs, the rendering is bounded.
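The clamp arithmetic can be isolated from the DOM measurement. The sketch below is an illustration of the idea, not the component's code: `clampTooltipLeft` is a hypothetical helper that takes the measured tooltip rect (viewport coordinates), the viewport width, and the offsetParent's left edge, and returns either `null` (keep the centered layout) or a left value in offsetParent coordinates.

```typescript
// Returns null when the centered tooltip already fits inside the margins;
// otherwise the `left` (in offsetParent coords) that pins the overflowing
// edge to the viewport margin. All parameter names are assumptions.
function clampTooltipLeft(
  rectLeft: number,
  rectWidth: number,
  viewportWidth: number,
  offsetParentLeft: number,
  margin = 8,
): number | null {
  if (rectLeft < margin) {
    // Overflows on the left: pin the left edge at the margin.
    return margin - offsetParentLeft;
  }
  if (rectLeft + rectWidth > viewportWidth - margin) {
    // Overflows on the right: pin the right edge at viewport - margin.
    return viewportWidth - margin - rectWidth - offsetParentLeft;
  }
  return null;
}
```

In a component this would presumably be fed from `getBoundingClientRect()` inside the layout effect, with the result applied as an inline `style.left` only when it is non-null.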
… nested min(...)
Previous tooltip max-width was set via Tailwind arbitrary value:
max-w-[min(28rem,calc(100vw-1rem))]
Tailwind's JIT parser appears to choke on the nested CSS functions inside an
arbitrary value bracket. Result: max-width was effectively unset, tooltip
rendered at content's natural width (unwrapped), and on narrow viewports the
tooltip extended way past the right edge.
Moved to inline style.maxWidth which always works. Same effective rule: on
desktop, capped at 28rem (~448px); on viewport < 28rem, capped at
viewport - 1rem (16px margin).
Also set inline style.width = 'max-content' so the tooltip is as wide as
needed up to maxWidth; without this, the box would shrink to whatever
natural width Tailwind decides, often too narrow.
…dered
Was: two visually-separated sections ("Settings" + "Audit fields (V2)")
that artificially split toggles. The "V2" label leaked internal architecture
language into UX; readers don't need to know what V2 means.
Now: one flat list, no extra header, ordered by what most affects the
reader's experience:
1. Audit panel — opens large side/bottom drawer
2. Legend — opens visual reference panel
3. Tooltips — fundamental hover info
4. Grammar arrows — relation visualization
5. Alignment lines — Pāli↔English connections
6. Refrain colors — structural recurrence rhythm
7. Anchor emphasis — semantic-centerpiece underline
8. Cycle dots — multi-sense affordance under English words
9. Ghost words — English scaffolding visibility
10. Sense notes — audit panel detail
11. Citation chips — audit panel detail
12. Curator badges — audit panel detail (was hidden under "V2")
13. Emoji in tooltips — style preference
14. Grammar terms — jargon level
Reverse-direction edit: removes the section header + container wrapper
that conditionally subdivided. Same number of toggles, fewer dividers.
…tips
Pure-subtraction data cleanup. The 11 V2-curated phases (a-h + 1-3) were
already clean; the other 40 phases carried v10-style tooltips with emoji
markers (🎯 Purpose marker, 💭 Mindfulness, etc) and bracketed grammar
prefixes ([Genitive Plural], [Dative], etc). Two settings toggles
("Emoji in tooltips", "Grammar terms") were compensating at render time
by running stripEmoji() / stripGrammarTerms() over every tooltip.
This commit moves the strip from render-time to source-of-truth:
- 261 segments with tooltips modified
- 82 tooltips that became empty after strip were removed entirely
- 0 emoji remain (across full Unicode emoji ranges incl. supplemental
pictographs U+1FA70-U+1FAFF — the wood-log emoji escaped first pass)
- 0 bracketed grammar prefixes remain
Net effect:
- Tooltips are now V2-clean at the data layer across all 51 phases
- The render-time strip toggles become no-ops (no cruft left to strip)
- Both toggles can be safely removed in a follow-up subtractive PR
Companion: scripts/sutta-studio/strip-tooltip-cruft.ts. Idempotent —
re-running on clean data is a no-op.
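The strip described above can be sketched as a pure, idempotent function over a segment's tooltip list. This is an assumption-laden illustration rather than the contents of strip-tooltip-cruft.ts: `stripTooltipCruft` is a hypothetical name, and the emoji ranges below are a simplified subset (they do cover U+1FA70-U+1FAFF, the block the commit says escaped the first pass).

```typescript
// Simplified emoji coverage (assumption): misc symbols/dingbats, the main
// pictograph blocks through U+1FAFF, and the variation selector U+FE0F.
const EMOJI = /[\u{2600}-\u{27BF}\u{1F300}-\u{1FAFF}\u{FE0F}]/gu;
// A leading bracketed grammar label such as "[Genitive Plural] ".
const GRAMMAR_PREFIX = /^\s*\[[^\]]+\]\s*/;

// Cleans each tooltip and drops the ones that end up empty.
function stripTooltipCruft(tooltips: string[]): string[] {
  return tooltips
    .map((t) =>
      t
        .replace(EMOJI, "")
        .trim()
        .replace(GRAMMAR_PREFIX, "")
        .replace(/\s{2,}/g, " ")
        .trim(),
    )
    .filter((t) => t !== "");
}
```

Because clean text contains neither pattern, applying the function to its own output returns the same list, which is the no-op-on-rerun property the commit relies on.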
Note: large diff because Python json.dump reformatted single-element
arrays to multi-line. Semantic content removed: emoji characters +
bracketed grammar labels. No new content.
…ases
Bash wrapper around run-phase-experiment.ts that iterates through the 40
phases not yet V2-curated (phase-4-7, x/y/z, aa-bg) and produces v11 output
JSON for each in docs/sutta-studio/experiments/. Loads OPENROUTER_API_KEY
from .env.local. Sequential: ~$0.02 × 40 ≈ $0.80 with Gemini Flash. Wall
time ~25 min. Idempotent: re-run with SKIP_EXISTING=1 to skip phases whose
output file already exists.
… repo Worktrees don't carry dotfiles from the main repo, so `source .env.local` failed when the script ran from the worktree. Now falls back to the canonical main-repo path if .env.local isn't present locally.
Empirical artifacts from the bulk v11 pipeline run on the 40 un-V2-curated
phases of MN10 (phase-4-7, x/y/z, aa-bg). Total cost $0.96, 1.12M tokens,
40/40 succeeded. Each output is a self-contained anatomist + lexicographer
+ phaseView trio with _meta showing model, tokens, cost, prompt version.
Why commit these:
- They're the raw evidence for the pipeline+polish thesis. Future
sessions (human or agent) reviewing the strategy can compare v11
output against the hand-curated phases and see exactly what V2
amendments produce vs what gets skipped.
- They are read-only research artifacts, not runtime data. The live
app reads demoPacket.json — these files don't affect rendering.
- Conversation JSONL is local; the repo is the only durable record.
Anyone trying to understand "why did we decide pipeline+polish was
the right scaling path?" can read phase-2-analysis.md alongside
these output files.
These outputs are the input to the Path B work that follows: per-phase
hand-polish of v11 drafts, splicing into demoPacket.json with curation
logs.
…ked example #1)
Third dative-of-purpose in the satipaṭṭhāna formula chain
(dukkhadomanassānaṁ atthaṅgamāya, "for the disappearance of pain and
dejection"). First worked example of the Path B workflow.
Kept from v11:
- Segmentation (dukkha + domanass + ānaṁ; attha + ṅ + gam + āya)
- isAnchor on p10 atthaṅgamāya (the verb-noun action)
- Relation arrow p9 → p10 (genitive of purpose)
- morph fields (gen-pl on ānaṁ, dat-sg on āya; added gender m)
- Plain-first tooltip register
- Etymological tooltips (axle-hole metaphor for dukkha, sun-setting for
attha)
Added in polish:
- epistemicBasis + confidence + notes on all 6 senses
- Cross-phase note on -ānaṁ: connects to phase-2's sattānaṁ + phase-3's
sokaparidevānaṁ (genitive-of-purpose spine)
- Cross-phase note on -āya: explicit "third of five datives" framing
- Gender m on -āya morph
- Caveat on the ṅ segment (traditional grammar treats it as sandhi-trace,
not a separate morpheme; v11's 4-segment parse is a pedagogical choice)
- Distinction note on dukkha vs soka/parideva/domanassa vocabulary cluster
Time spent: ~18 min. Expected to converge to ~10-12 min as rhythm settles.
Concerns surfaced in docs/sutta-studio/curation/phase-4.md:
1. Translator-tradition citations are claimed from training memory, not from
a verified database. The F task (tradition DB) would ground them.
2. ~5 of the 8 minutes writing hand-curated JSON went to metadata fields that
the A3 metadata-filler post-pass would produce deterministically.
Continuing Path B vs pausing to build A3 first is a sequencing trade-off the
curator should decide: both paths hit ~10 hr total but A3 leaves reusable
infrastructure.
…ps clickable
User empirical-audit finding: most of the V2 SENSE_METADATA payload is
"trust me bro" — confidence levels are LLM hallucinations, DPD-attested
tags can't be verified, and the fields are mostly hidden behind a
curator-only toggle that's off by default. Verifiable evidence trails
(clickable citation links) > asserted confidence levels.
WHAT WAS PURGED
Sense.epistemicBasis — LLM-classified "lexical/curatorial/etymological/…"
tags. Cannot be self-verified by the model.
Sense.confidence — "high/medium/low" labels. Hallucinated levels.
Sense.sourceCitationIds — Never rendered. Redundant with `citationIds`.
Segment.morph — Never rendered anywhere in the UI. Was study-mode
infrastructure waiting for a study-mode consumer
that doesn't exist.
- Removed from V2 prompt amendments (SUTTA_STUDIO_V2_AMENDMENTS no
longer includes SUTTA_STUDIO_V2_SENSE_METADATA). The named export
is kept for historical reference but unused.
- Stripped from demoPacket.json (101 senses, 37 segments).
- Stripped from experiment hand-curated files (phases 2, 3, 4).
- "Curator badges" toggle removed from settings.
- confidenceBadges flag removed from StudioSettings type and defaults.
- LensPanel no longer renders confidence/epistemicBasis badges or
accepts showConfidenceBadges prop.
- CONFIDENCE_COLORS + EPISTEMIC_BASIS_LABELS consts deleted.
WHAT WAS ADDED (verifiable evidence, not asserted certainty)
- LensPanel accepts `citations?: Citation[]` prop, indexed by id.
- citationIds chips become clickable links when their citation has a
`url` — opens in new tab, target=_blank, rel=noopener.
- Falls back to non-clickable span when no url is available (chip
still visible; just not navigable yet).
- Chip label uses `citation.short` when available (e.g., "DPD s.v.
me (pron — by me)") instead of the raw id.
- SuttaStudioView now passes `packet.citations` into LensPanel.
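The chip behaviour in the list above (label from `citation.short`, link only when a `url` exists, plain span otherwise) reduces to a small lookup. A minimal sketch, assuming a `Citation` with `id`, optional `short`, and optional `url`; the `Chip` type and `resolveCitationChips` are hypothetical, and the URL used in any example call would be a placeholder.

```typescript
// Assumed minimal citation shape; the packet's real Citation type may carry more.
interface Citation {
  id: string;
  short?: string;
  url?: string;
}

// What the renderer needs per chip: a label, plus an href when navigable.
interface Chip {
  label: string;
  href?: string;
}

function resolveCitationChips(citationIds: string[], citations: Citation[] = []): Chip[] {
  const byId = new Map<string, Citation>();
  for (const c of citations) byId.set(c.id, c);
  return citationIds.map((id) => {
    const citation = byId.get(id);
    // Prefer the human-readable short label; fall back to the raw id.
    const chip: Chip = { label: citation?.short ?? id };
    // Only chips with a url become links; the rest render as plain spans.
    if (citation?.url) chip.href = citation.url;
    return chip;
  });
}
```

A renderer could then emit an anchor with `target="_blank"` and `rel="noopener"` when `href` is present and a span otherwise, matching the fallback described above.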
Result: when DPD entries are minted with URLs (via citationHelpers.ts
which already supports the `url` field on MaterializeOptions), the
chips become a "go check the source" affordance. The architecture
was always there; the rendering wires it in.
SUBTLER CYCLE DOTS
EnglishWord cycle-affordance dots default to opacity-30, brighten to
opacity-100 on parent-word hover. They were too loud as primary
visuals; now they're present-but-quiet, lighting up when the reader
approaches.
NET EFFECT
- demoPacket.json shrinks (phantom data gone).
- v11 prompt shrinks (LLM stops generating fields nobody reads).
- Settings list drops one toggle.
- Audit panel has one less code path.
- "Trust me bro" badges replaced by "go check the source" links.
The schema definitions in types/suttaStudio.ts for the stripped fields
are kept — separate concern, can be cleaned up in a follow-up if
confirmed permanently unused. Removing data is the leveraged subtraction;
removing schema declarations is cosmetic.
The retired V2 amendment + the strip script + the citation-click affordance
together signal the principle to future agents: don't carry data ahead of
consumer demand, and don't render asserted-confidence when verifiable
evidence is available.
Summary
Prepares the empirical experiment that validates PR #51's V2 prompt amendments. Phase-2 (`sattānaṁ visuddhiyā` — "for the purification of beings") is now hand-curated following all V2 amendments, with the v10 baseline snapshotted for diffing and scaffolding ready for the v11 compile run.
What's in this PR
Prompt version bump: `SUTTA_STUDIO_PROMPT_VERSION → 'sutta-studio-v11-mn10-amendments'`. The change intended by PR #50 ("Wire V2 amendments into live compiler (activates 2d198f6)") landed transitively via the consolidation in PR #51 ("Compiler consolidation Phase 0+1 - design doc + canonical prompts"), but the version key itself was never bumped. Doing it now so benchmark/leaderboard runs distinguish v10 vs v11.
Hand-curated phase-2 — replaces the v10 LLM-generated phase-2 entry in `components/sutta-studio/demoPacket.json`. Full V2 amendments applied: morph fields on suffixes, epistemicBasis + confidence + notes on all senses, isAnchor on visuddhiyā, plain-first tooltip prose (no bracketed grammar prefixes or decorative emoji), translator-tradition citations for the 6 visuddhi senses (Sujato / Ñāṇamoli-Bodhi / Thanissaro), and cross-phase notes connecting sattā to bhikkhū (phases e-g) and visuddhiyā to samatikkamāya (phase-3).
Experiment scaffolding at `docs/sutta-studio/experiments/`:
Curation log at `docs/sutta-studio/curation/phase-2.md` — 185 lines following the established phase-{a..h,1}.md format. Documents every decision, predicts what v11 should produce, and identifies which gaps deterministic post-passes (A3) could close.
What this PR does NOT do
Next steps after merge
The user runs the v11 compile. Two paths:
CLI path:
```bash
# 1. Edit scripts/sutta-studio/generate-new-phases.ts:
#      SEGMENT_RANGE = ['mn10:2.1']
#      add wordRange [4, 6] for phase-2's words (sattānaṁ visuddhiyā)
# 2. Then run the generator (replace ... with your key):
OPENROUTER_API_KEY=... tsx scripts/sutta-studio/generate-new-phases.ts
```
UI path: open /sutta/demo, trigger compile, save phase-2 output.
Save the result to `docs/sutta-studio/experiments/phase-2-v11-output.json`. Then write the analysis at `phase-2-analysis.md`.
Test plan
🤖 Generated with Claude Code