feat(sutta-studio): provider abstraction + Citation extension (Tier-1 commit A of 5) by anantham · Pull Request #38 · anantham/LexiconForge

anantham · 2026-05-11T16:13:22Z

Summary

First of five Tier-1 commits implementing the grounded curation data layer per ADR SUTTA-008. Lands the provider abstraction + Citation provider-attribution fields. Subsequent commits add DPD, VRI edition, Aṭṭhakathā commentary, SC bilara/suttaplex, and curation helper.

Draft because B-E are coming on this branch. Convert to ready-for-review when the Tier-1 set is complete (or when we decide to merge A independently).

What's in

services/providers/types.ts — interfaces: LexiconProvider, MorphologyProvider, CommentaryProvider, EditionProvider, WitnessProvider, ParallelProvider. Every response carries sourceId + deterministic citationId per ADR amendment #2.
services/providers/citationHelpers.ts — citationIdFor (cite:{providerId}:{sourceId} or cite:{providerId}:q:{query}) and materializeCitation. Citation minting is mechanical, not hand-glued.
services/providers/lexiconRegistry.ts — LexiconProviderRegistry runs providers in parallel; preserves per-provider entries in entriesBySource (powers the source-disagreement inspector in ADR UI vision #7); isolates one provider's failure from the others.
services/providers/suttaCentralDictionary.ts — SuttaCentralDictionaryProvider wraps the existing /api/dictionary_full/{lemma} endpoint as the first citizen of the registry. Per-session cache; preserves raw payload as rawExcerpt.
services/providers/index.ts — barrel + defaultLexiconRegistry singleton.
types/suttaStudio.ts — Citation extended with provenance / query / excerpt / license / fetchedAt; new CitationProvenance enum (15 sources).
42 new tests across the three modules + Citation round-trip.
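
The two citation ID shapes listed above (`cite:{providerId}:{sourceId}` and `cite:{providerId}:q:{query}`) could be minted roughly as follows. This is a minimal sketch, not the code in citationHelpers.ts: the `CitationHandleSketch` shape, the preference for `sourceId` over `query`, and the thrown error are all assumptions made for illustration.

```typescript
// Hypothetical input shape; the PR does not show the real signature.
interface CitationHandleSketch {
  providerId: string;
  sourceId?: string; // source-local handle, when the provider supplies one
  query?: string; // assumed fallback: the query that produced the response
}

// Sketch of deterministic ID minting: the same handle always yields the same ID.
function citationIdForSketch(handle: CitationHandleSketch): string {
  if (handle.sourceId !== undefined && handle.sourceId !== "") {
    return `cite:${handle.providerId}:${handle.sourceId}`;
  }
  if (handle.query !== undefined && handle.query !== "") {
    return `cite:${handle.providerId}:q:${handle.query}`;
  }
  // Assumed behaviour: refuse to mint an ID with nothing to anchor it.
  throw new Error("citationIdForSketch needs a sourceId or a query");
}
```

With `providerId: "dpd"` and `sourceId: "dpd:18051"` the sketch yields `cite:dpd:dpd:18051`, the same shape as the IDs printed by the curation helper later in this PR.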

What's not in (intentionally)

services/compiler/index.ts:387 (the fetchDictionaryEntry callsite) is unchanged. Compiler wiring lands with DpdProvider in commit B as one coherent unit — we change the lexicographer prompt builder once, not twice.
DPD / VRI / bilara / suttaplex providers — they'll register into the same defaultLexiconRegistry in commits B-D.

Test plan

42 new tests pass (citationHelpers 11, lexiconRegistry 8, suttaCentralDictionary 10, Citation round-trip 13)
65 pre-existing sutta-studio tests still pass (suttaStudioRehydrator, sutta-studio-utils, model-override, suttacentral adapter)
Typecheck: only the 5 pre-existing baseline errors unchanged from main (none in services/providers/)
When B-E land, full test suite re-runs
Hand-curation calls defaultLexiconRegistry.lookup during phase-a re-curation (task #14) — first real usage exercise

ADR alignment

ADR Section	This PR
§Decision · Principle (amended)	Citation extension carries the attestation trail
§Provider abstraction	All six interfaces sketched; `LexiconProvider` instantiated
§Provider abstraction · Source-local handles	`sourceId` + deterministic `citationId`
§Citation schema extension	`provenance` / `query` / `excerpt` / `license` / `fetchedAt`
§Build order step 1+2	This commit
§UI Vision #7 source disagreement	`entriesBySource` preserves per-provider entries; test demonstrates grouping by `query`
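
The registry behaviour referenced in the table (parallel lookups, per-provider grouping in `entriesBySource`, failure isolation) can be illustrated with a small sketch. The types and the `failures` field below are hypothetical; the real LexiconProviderRegistry in this PR may be shaped differently.

```typescript
// Hypothetical minimal shapes for illustration only.
interface SketchEntry {
  lemma: string;
  gloss: string;
  sourceId: string;
}
interface SketchProvider {
  id: string;
  lookup(lemma: string): Promise<SketchEntry[]>;
}
interface SketchLookupResult {
  entriesBySource: Record<string, SketchEntry[]>; // one bucket per provider
  failures: Record<string, string>; // assumed: providerId -> error message
}
type ProviderOutcome =
  | { id: string; entries: SketchEntry[] }
  | { id: string; error: string };

// Runs every provider concurrently; a rejected lookup is recorded, not rethrown.
async function lookupAllSketch(
  providers: SketchProvider[],
  lemma: string,
): Promise<SketchLookupResult> {
  const outcomes = await Promise.all(
    providers.map(async (p): Promise<ProviderOutcome> => {
      try {
        return { id: p.id, entries: await p.lookup(lemma) };
      } catch (err) {
        return { id: p.id, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
  const result: SketchLookupResult = { entriesBySource: {}, failures: {} };
  for (const o of outcomes) {
    if ("entries" in o) result.entriesBySource[o.id] = o.entries;
    else result.failures[o.id] = o.error;
  }
  return result;
}
```

Keeping each provider's entries in its own bucket, rather than merging them into one list, is what lets a later UI show where two sources disagree about the same lemma.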

🤖 Generated with Claude Code

…ion fields

Tier-1 commit A per ADR SUTTA-008 §Build order. Lands the data-layer plumbing that subsequent commits attach DPD, VRI, bilara, and suttaplex providers to. No behaviour change for the live compiler in this commit; hand-curation tooling can already call the registry.

New: services/providers/
- types.ts — LexiconProvider, MorphologyProvider, CommentaryProvider, EditionProvider, WitnessProvider, ParallelProvider interfaces. Every response carries `sourceId` + deterministic `citationId` (`cite:{providerId}:{sourceId}` or `cite:{providerId}:q:{query}`) so citation materialisation is mechanical, not hand-glued.
- citationHelpers.ts — citationIdFor + materializeCitation.
- lexiconRegistry.ts — LexiconProviderRegistry runs providers in parallel, preserves per-provider entries in `entriesBySource` (powers the source-disagreement inspector in ADR UI vision §7), isolates one provider's failure from the others.
- suttaCentralDictionary.ts — SuttaCentralDictionaryProvider wraps `/api/dictionary_full/{lemma}`; per-session cache; preserves raw payload as `rawExcerpt` so the LLM prompt + UI see unmodified attestations. First citizen of the registry.
- index.ts — barrel + defaultLexiconRegistry singleton (SC only for now).
- Tests: 29 across the three modules (citationHelpers 11, lexiconRegistry 8, suttaCentralDictionary 10).

Modified: types/suttaStudio.ts
- Citation extended with provenance / query / excerpt / license / fetchedAt. Excerpt is baked into the packet so the renderer doesn't re-fetch.
- CitationProvenance enum: sc-dictionary-full, dpd, ms-dpd, ped-dsal, cpd, vri-attha, vri-cscd, sc-bilara, sc-suttaplex, buddhanexus, bdrc, cbeta, gretil, 84000, manual.
- 3 new round-trip tests + 1 disagreement-grouping test in types/suttaStudio.test.ts.

Compiler path (services/compiler/index.ts:387) is intentionally unchanged. The existing fetchDictionaryEntry callsite keeps working; provider wiring will land alongside DpdProvider in commit B as a single coherent unit.

Verified: 42 new tests pass; 65 existing sutta-studio tests still pass; typecheck shows only the 5 pre-existing baseline errors unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
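
To make the Citation extension above concrete, here is a sketch of the five new fields and the 15 provenance values. It is written as a string-literal union rather than a TypeScript `enum`, which is an assumption; the `id` field, the optionality of each field, the license string on the example, and the ISO timestamp format for `fetchedAt` are likewise guesses rather than the definitions in types/suttaStudio.ts.

```typescript
// The 15 provenance values named in the commit message.
const CITATION_PROVENANCES = [
  "sc-dictionary-full", "dpd", "ms-dpd", "ped-dsal", "cpd",
  "vri-attha", "vri-cscd", "sc-bilara", "sc-suttaplex", "buddhanexus",
  "bdrc", "cbeta", "gretil", "84000", "manual",
] as const;
type CitationProvenanceSketch = (typeof CITATION_PROVENANCES)[number];

// Hypothetical shape: only the five extension fields come from the PR.
interface CitationSketch {
  id: string;
  provenance: CitationProvenanceSketch;
  query?: string;
  excerpt?: string; // baked in so a renderer need not re-fetch
  license?: string;
  fetchedAt?: string; // assumed to be an ISO 8601 timestamp
}

// Runtime guard for values read back from JSON.
function isCitationProvenance(value: string): value is CitationProvenanceSketch {
  return (CITATION_PROVENANCES as readonly string[]).indexOf(value) !== -1;
}

// JSON round-trip that rejects unknown provenance values.
function roundTripCitation(c: CitationSketch): CitationSketch {
  const parsed = JSON.parse(JSON.stringify(c)) as CitationSketch;
  if (!isCitationProvenance(parsed.provenance)) {
    throw new Error(`unknown provenance: ${String(parsed.provenance)}`);
  }
  return parsed;
}

const exampleCitation: CitationSketch = {
  id: "cite:dpd:dpd:18051",
  provenance: "dpd",
  excerpt: "eva [ind]: only; just; merely; exclusively",
  license: "CC BY-NC-SA 4.0", // illustrative value
  fetchedAt: "2026-05-11T00:00:00Z", // invented timestamp for the example
};
```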

vercel · 2026-05-11T16:13:28Z

Project	Deployment	Actions	Updated (UTC)
lexicon-forge	Ready	Preview, Comment	May 12, 2026 6:34pm

…1 commit B.1)

Per ADR SUTTA-008 §Build order step 3, lands the DPD data layer for MN10. Provider impl follows in B.2; compiler wiring in B.3.

ms-dpd vs full dpd-db decision: ms-dpd is verb-blind — its inflection table has zero verb conjugation rows, only declensions. For the assasati verb family central to MN10 this kills it. Using full dpd-db.

Storage strategy (resolved during this spike — ADR §Open Questions #2): Per-sutta subsets, not full corpus. Full DPD export is 80-120MB JSON; committing that for one sutta is disproportionate. The script extracts only headwords referenced by surface forms in the target sutta. Total committed for MN10: ~656KB. Each new sutta adds its own subset directory.

Surface→lemma resolution: Heuristic stem-stripping over dpd.txt (the 4MB human-readable release; no SQLite required). Initial pass: 34%. After parser fix for single-digit homonyms (DPD uses both "me 1" and "a 1.1" styles): 81.6% coverage on MN10 (436/534 surface forms). Remaining 18% are mostly compounds (sammāsambuddhassa, ajjhattikabāhiresu) and inflected verb forms that live in DPD's SQLite inflection table. Documented as unmatchedSurfaces in manifest.json; SQLite escalation is a future commit if curation needs higher coverage.

Files:
- scripts/build-dpd.ts — Node TS, no native deps. Downloads pinned DPD release dpd-txt.zip (4MB) on first run, caches in data/_raw/ (gitignored), parses to structured DpdRecord, fetches bilara MN10 Pāli root, extracts surface forms, resolves via stem-stripping + quotative marker handling + locative→stem restoration. Projects to LexiconEntry shape per the providers types added in commit A.
- data/dpd/mn10/headwords.json (618KB) — lemma → LexiconEntry[]
- data/dpd/mn10/forms.json (20KB) — surface → lemma candidates
- data/dpd/mn10/manifest.json — coverage stats + unmatched surfaces
- data/LICENSE-DATA.md — CC BY-NC-SA 4.0 with DPD attribution + placeholders for VRI / bilara / future providers
- .gitignore — data/_raw/ added (upstream zip + extracted txt)
- package.json — `npm run build:dpd` script entry

Pinned release: dpd-db v0.4.20260501 (May 2026). Re-ingest with `npm run build:dpd -- --force` after bumping DPD_RELEASE_TAG in the script. Monthly cadence upstream.

Verified: 42 pre-existing provider tests still pass. No app code changed in this commit; the DpdProvider that consumes this data lands in B.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
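
The homonym parser fix and the coverage figure in the commit above can be illustrated with a short sketch. It assumes a headword is a lemma optionally followed by a space and a number such as `1` or `1.1`; the actual dpd.txt layout and the parser in scripts/build-dpd.ts are not shown in this PR, so `parseHeadwordSketch` and `coveragePercent` are hypothetical names.

```typescript
interface ParsedHeadwordSketch {
  lemma: string;
  homonym: string | null; // "1", "1.1", or null when the headword is unnumbered
}

// Splits "me 1" and "a 1.1" into lemma + homonym number; leaves "sati" untouched.
function parseHeadwordSketch(raw: string): ParsedHeadwordSketch {
  const trimmed = raw.trim();
  const m = /^(.+?)\s+(\d+(?:\.\d+)?)$/.exec(trimmed);
  const lemma = m?.[1];
  const homonym = m?.[2];
  if (lemma !== undefined && homonym !== undefined) {
    return { lemma, homonym };
  }
  return { lemma: trimmed, homonym: null };
}

// Coverage as a percentage string with one decimal place.
function coveragePercent(matched: number, total: number): string {
  return total === 0 ? "0.0" : ((matched / total) * 100).toFixed(1);
}
```

Applied to the numbers in the commit message, `coveragePercent(436, 534)` gives `"81.6"`, which matches the reported MN10 coverage.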

Wires the MN10 DPD dataset from B.1 into a LexiconProvider that hand-curation scripts and (after B.3) the live LLM compiler will both call.

services/providers/dpd.ts (isomorphic — no Node imports)
- DpdProvider implements LexiconProvider
- Lookup strategy:
  1. Direct lemma match in headwords map
  2. Surface-form match via forms map → resolve candidate lemmas, merge entries deduped by sourceId
- Direct match is preferred over surface-form match
- Lemma normalisation (trim + lowercase) at lookup time
- mergeDpdData helper for combining multiple per-sutta subsets

services/providers/dpd-loader-fs.ts (Node-only, separate from dpd.ts so browser bundles don't accidentally pull node:fs)
- loadDpdSubsetFromFs(suttaUid, dataRoot?) — single subset
- loadAllDpdSubsetsFromFs(dataRoot?) — merge every subset under data/dpd/, silently skipping siblings without headwords.json

services/providers/dpd.test.ts (20 tests)
- Synthetic data: direct match, surface→lemma, normalisation, multi-candidate, missing-lemma, empty-input, direct-vs-surface preference, provider contract (id/label/license)
- mergeDpdData: conflict resolution, empty sources, sources without forms
- Real MN10 integration: 5 tests that load data/dpd/mn10 and verify common lemmas (sati, viharati, bhikkhu) resolve; locative surface (kāye) routes via forms to kāya; DPD POS=fem projects to MorphHint.gender='f'; absent lemmas return empty

services/providers/index.ts
- Re-exports DpdProvider + types
- Does NOT register DPD into defaultLexiconRegistry — registration requires environment-specific loaders (Vite glob in browser, fs.readFileSync in Node) and is wired in commit B.3. Hand-curation scripts construct directly via the FS loader.

Verified: 62 provider tests pass (42 from commit A + 20 new); 65 existing sutta-studio tests still pass. No app code touched — compiler wiring lands in B.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
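
The two-step lookup strategy described in this commit (direct lemma match first, then surface form via the forms map, deduplicated by `sourceId`) might look like the sketch below. The `Map`-based data shape and the function name `lookupDpdSketch` are assumptions; the real DpdProvider is a class implementing LexiconProvider and is not reproduced here.

```typescript
// Hypothetical data shapes mirroring headwords.json and forms.json.
interface DpdEntrySketch {
  sourceId: string;
  lemma: string;
  gloss: string;
}
interface DpdDataSketch {
  headwords: Map<string, DpdEntrySketch[]>; // lemma -> entries
  forms: Map<string, string[]>; // surface form -> candidate lemmas
}

function lookupDpdSketch(data: DpdDataSketch, raw: string): DpdEntrySketch[] {
  // Normalise at lookup time: trim + lowercase.
  const key = raw.trim().toLowerCase();
  if (key === "") return [];

  // Step 1: a direct lemma match wins over any surface-form route.
  const direct = data.headwords.get(key);
  if (direct !== undefined && direct.length > 0) return direct;

  // Step 2: resolve candidate lemmas, merging entries deduped by sourceId.
  const seen = new Set<string>();
  const merged: DpdEntrySketch[] = [];
  for (const lemma of data.forms.get(key) ?? []) {
    for (const entry of data.headwords.get(lemma) ?? []) {
      if (!seen.has(entry.sourceId)) {
        seen.add(entry.sourceId);
        merged.push(entry);
      }
    }
  }
  return merged;
}
```

Under this sketch a locative surface such as `kāye` has no headword of its own, so it falls through to step 2 and resolves via the forms map to the entries for `kāya`.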

…mit E)

Per ADR SUTTA-008 §Build order step 7. Unblocks phase-by-phase hand re-curation of demoPacket.json (task #14) by making every lemma lookup grounded in real attestation rather than memory.

Usage:
  npm run sutta:lookup -- --phase phase-a
  npm run sutta:lookup -- --lemmas evaṁ,me,sutaṁ
  npm run sutta:lookup -- --phase phase-a --json > /tmp/out.json

For a phase, the script reads demoPacket.json, extracts every paliWord surface form (concatenated segment text), calls every registered provider in parallel, and prints per-source blocks:

  [ 1] evaṁ (function) — wordId=a1
    ✓ SC dictionary_full (3 entries): ...
    ✓ DPD (5 entries):
      • eva [ind]: only; just; merely; exclusively
        citationId: cite:dpd:dpd:18051
      ...

The citationIds shown are deterministic — the curator pastes them into Sense.sourceCitationIds and the Citation entry materialises via the helpers from commit A.

Implementation notes:
- Constructs a fresh LexiconProviderRegistry (never mutates the default). Registers SC + DPD (the DPD subset for the sutta is loaded via dpd-loader-fs.ts).
- --json mode emits a structured blob for programmatic consumption.
- Errors in one provider don't poison the others (registry isolation behaviour from commit A).
- Network calls happen via SC's existing fetchJsonViaProxies path.

Smoke-tested against phase-a (evaṁ, me, sutaṁ):
- DPD returns 5 senses for evaṁ, 6 case-by-case senses for me, 6 entries for sutaṁ (past participle, neuter noun, masc/fem homonyms suta/sutā) — each with deterministic citationIds and structured MorphHint where the POS maps.
- SC dictionary_full returns 3 entries per lemma; their first-sense extraction shows "(no sense)" for shapes my SC parser doesn't fully handle (raw payload is still preserved in rawExcerpt for the curator to read). Known refinement.

Verified: 62 provider tests still pass; no app code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… commit B.3)

Closes the loop on ADR SUTTA-008's keystone principle — hand-curation and the LLM compiler now share the same data layer. Every Sutta Studio compilation at /sutta/<uid> consults DPD alongside SC dictionary_full, giving the lexicographer LLM structured morphology + multiple attested senses per Pāli word with deterministic citationIds.

Additive, low-risk wiring:
- DictionaryCache (IndexedDB) unchanged; cached SC payloads still load
- SC fetchDictionaryEntry path unchanged
- DPD is consulted *in parallel* as a separate data source
- Old callers of buildLexicographerPrompt without dpdLookups get exactly the prior prompt (optional parameter)
- If DPD loading or lookup fails for any reason, the lexicographer pass logs a warning and falls through to SC-only behaviour

New: services/providers/dpd-loader-vite.ts
- Uses Vite's `import.meta.glob` to eager-load every bundled data/dpd/<sutta>/{headwords,forms}.json at build time
- Merges into one DpdData singleton via getBundledDpdData()
- Wrapped in try/catch so any glob resolution issue degrades gracefully rather than breaking the compiler

Modified: services/compiler/prompts.ts
- buildLexicographerPrompt accepts optional dpdLookups param
- When present, renders a structured "DPD attestations" block:
    • lemma [pos]: first-sense gloss {morphHint} cite=cite:dpd:dpd:N
- Caps at 5 entries per word so compound-heavy phases stay bounded

Modified: services/compiler/index.ts
- After SC dictionary fetch + cache work, runs DpdProvider.lookup for every content word in parallel against the bundled subset
- Logs DPD hit rate: "DPD attestations: 14/15 words matched"
- Passes dpdLookups to buildLexicographerPrompt

Verified:
- 157 tests pass across 9 suites (no regressions in provider, compiler, sutta-studio-utils, or rehydrator tests)
- Typecheck clean for changed files (5 pre-existing baseline errors elsewhere unchanged)
- `npx vite build` succeeds; DPD JSON shards bundle in (~656KB contribution from data/dpd/mn10/, well under chunk-size limits already triggered by pre-existing chunks)

This is the final commit of Tier-1 B. Tier-1 sequence so far:
  A    9168b5a  provider abstraction + Citation extension
  B.1  82fae37  DPD ingestion + MN10 subset (81.6% coverage)
  B.2  49d3eba  DpdProvider impl + tests
  E    5ff46c0  curation helper (npm run sutta:lookup)
  B.3  (this)   compiler wired to DPD via Vite-bundled subsets

Tier-1 C (VRI providers) and D (SC bilara + suttaplex providers) remain. Task #14 (MN10 phase-a re-curation) is now fully unblocked — both manual lookup (E) and automated compilation (B.3) consult DPD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
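
The "DPD attestations" prompt block described in this commit, including the cap of 5 entries per word and the no-op path when `dpdLookups` is absent, can be sketched as below. The entry shape, the per-word header line, and the name `renderDpdBlockSketch` are assumptions; only the bullet format and the cap come from the commit message.

```typescript
// Hypothetical per-entry shape for the prompt block.
interface DpdAttestationSketch {
  lemma: string;
  pos: string;
  gloss: string; // first-sense gloss
  morphHint?: string;
  citationId: string;
}

const MAX_ENTRIES_PER_WORD = 5;

// Returns "" when there is nothing to add, so older callers get the prior prompt.
function renderDpdBlockSketch(
  lookups: Map<string, DpdAttestationSketch[]> | undefined,
): string {
  if (lookups === undefined || lookups.size === 0) return "";
  const lines: string[] = ["DPD attestations:"];
  lookups.forEach((entries, surface) => {
    lines.push(`${surface}:`); // assumed per-word header
    for (const e of entries.slice(0, MAX_ENTRIES_PER_WORD)) {
      const hint = e.morphHint !== undefined ? ` {${e.morphHint}}` : "";
      lines.push(`  • ${e.lemma} [${e.pos}]: ${e.gloss}${hint} cite=${e.citationId}`);
    }
  });
  return lines.join("\n");
}
```

Returning an empty string for a missing or empty lookup map is one simple way to keep the parameter optional without changing the prompt seen by callers that never pass it.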

…s (Tier-1 commit D)

Closes Tier-1 of ADR SUTTA-008 §Build order alongside the deferred C (Aṭṭhakathā provider — to land as a follow-up). Phase-a re-curation now has every Tier-1 data source feeding the curation helper.

services/providers/scBilaraVariants.ts
- SuttaCentralBilaraVariantsProvider fetches raw.githubusercontent.com/suttacentral/bilara-data/published/variant/pli/ms/sutta/<basket>/<uid>_variant-pli-ms.json
- Parses bilara's free-form "original → reading (witness1, witness2)" notation; multi-variant entries split on `|`
- Returns VariantReading[] per canonical segment id, compatible with DeepLoomPacket.provenance.segmentVariants
- Per-sutta cache; missing variant files (common for stable openings like mn10:1.1) cached as empty so we don't refetch
- License: CC BY-SA 4.0

services/providers/scSuttaplex.ts
- SuttaCentralSuttaplexParallelProvider implements ParallelProvider
- Calls https://suttacentral.net/api/parallels/{uid}
- Handles work-level keys (top-level "mn10") and segment-level keys ("mn10#44.1", with # separator) — the latter projected to canonicalSegmentId form ("mn10:44.1") for consistency with bilara
- Returns ParallelRef[] with deterministic citationIds + sourceIds

services/providers/index.ts
- Re-exports both providers + their result types

scripts/sutta-studio/lookup-phase.ts
- At phase start, prints "Phase-level evidence":
    • Parallels for the sutta (top 8 of N) — workId, type, title, acronym
    • Variant readings for the phase's canonicalSegmentIds, or "(none)" when stable across witnesses
- --json mode includes the parallels + variants in the structured blob

Verified end-to-end against phase-a / mn10:1.1:

  ## Phase-level evidence
  Parallels for mn10 (top 8 of 16):
    → dn22: full · The Long Discourse about the Ways of Attending to Mindfulness · DN 22
    → ea12.1: full · The One Way In Sūtra · EA 12.1
    → ma98: full · 念處 · MA 98
    → mn119: full · Mindfulness of the Body · MN 119
    ...
  Variant readings: (none for mn10:1.1 — stable across witnesses)

Tests:
- 7 tests for SuttaCentralBilaraVariantsProvider (parse, cache, network failure, malformed entries)
- 9 tests for SuttaCentralSuttaplexParallelProvider (work-level, segment-level, # → : conversion, merging, fallbacks)
- 173 total tests pass across 11 suites; no regressions

Tier-1 status:
  A    9168b5a  provider abstraction + Citation extension
  B.1  82fae37  DPD ingestion + MN10 subset (81.6%)
  B.2  49d3eba  DpdProvider impl + tests
  E    5ff46c0  curation helper script
  B.3  bc46e47  compiler wired to DPD via Vite bundle
  D    (this)   SC bilara variants + suttaplex parallels providers
  C    pending  VRI edition + Aṭṭhakathā commentary (deferred per ADR §Open Questions #4; alignment unknown; not blocking phase-a re-curation)

Task #14 MN10 phase-a re-curation is now fully unblocked with every Tier-1 data source flowing through the curation helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
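
The two parsing steps this commit describes, splitting bilara's "original → reading (witness1, witness2)" notation on `|` and projecting `#`-separated suttaplex keys to the `:` form, are sketched below. The regular expression, the decision to skip malformed parts silently, and both function names are assumptions; the sample note in the usage is invented and not taken from bilara-data.

```typescript
interface VariantReadingSketch {
  original: string;
  reading: string;
  witnesses: string[];
}

// Parses one free-form note; parts without an arrow are skipped (assumed behaviour).
function parseVariantNoteSketch(note: string): VariantReadingSketch[] {
  const out: VariantReadingSketch[] = [];
  for (const part of note.split("|")) {
    const m = /^\s*(.+?)\s*→\s*(.+?)\s*(?:\(([^)]*)\))?\s*$/.exec(part);
    const original = m?.[1];
    const reading = m?.[2];
    if (original === undefined || reading === undefined) continue;
    const witnesses = (m?.[3] ?? "")
      .split(",")
      .map((w) => w.trim())
      .filter((w) => w !== "");
    out.push({ original, reading, witnesses });
  }
  return out;
}

// "mn10#44.1" -> "mn10:44.1"; work-level keys such as "mn10" pass through unchanged.
function toCanonicalSegmentIdSketch(key: string): string {
  return key.replace("#", ":");
}
```

For example, a made-up note `"aaa → bbb (w1, w2) | ccc → ddd"` would parse into two readings, the first with witnesses `["w1", "w2"]` and the second with none.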

…log skeleton Locks the curation process before phase-a re-curation begins. Operationalises task #14 (MN10 phase-by-phase re-curation) with explicit gates, artifact shapes, and human-approval moments. The discipline this protocol enforces: Schema and UI insights are extracted *after* the packet diff, not allowed to hijack the packet diff. Two new docs: docs/sutta-studio/CURATION_PROTOCOL.md (327 lines) - §0 Five invariant questions (always-load-bearing) - §1 The 12-step loop (brief → evidence → … → commit → issue extraction) - §2 Artifact shapes: phaseBrief (with required `tension` field), evidenceBundle (with inline excerpts — gate-check must be semantic, not syntactic), alignment scaffold, epistemic classification table, curation log entry - §3 Three gates: Evidence Gate, Ghost Gate, Affordance Gate. `required` GhostKind is discouraged as default; curator must name a specific kind from the expanded set. - §4 Role locks for future multi-agent runs (curator can edit packet + curation/, builder can edit services/types/tests, observer reads .runs/ ledger only, human at semantic gates) - §5 Eight specific human-approval moments - §6 Batching recommendation for MN10 first 15 phases (a alone, then b-d, e-h, i-o; re-evaluate before 16-51) - §7 Machine-observability deferred (L1 logs → L2 tmux → L3 events.jsonl), with sutta_curation_conductor angel sketched but not built; per the "earn-the-externalization" principle, protocol stabilises first - §8 Where each kind of content lives (curation log vs commit message vs FEATURES.md vs ADR vs new issue) - §9 Known failure modes + remedies - §10 Four open protocol questions for the next refinement cycle docs/sutta-studio/curation/phase-a.md (141 lines) - Skeleton with section-by-section to-fill markers - Committed empty so the protocol structure is in place before work begins; filled during the actual phase-a run Refinements absorbed into this protocol: From Aditya's draft Grounded Curation Loop proposal: - Phase brief comes 
first - Evidence bundle as curated artifact, not terminal spam - Epistemic classification before tooltips - Curation log as durable why-behind-what record - Three gates (evidence, ghost, affordance) - Batch sizing (a, b-d, e-h, i-o) From my refinement pass: - Evidence bundle includes inline `excerpt` + `decisionRelevance`, not just citationIds. Closes the audit loop in one pass. - Phase brief includes `tension` field. Every UI affordance must resolve a named tension or it's decorative. From Aditya's multi-agent observability sketch: - Role locks codified (curator / builder / observer / human) - Three escalation levels (L1 logs → L2 tmux → L3 ledger) - Curation conductor angel deferred — script-then-protocol sequencing inverted; iterate protocol first Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ce (steps 0-2) Per CURATION_PROTOCOL.md §1, fills the first three steps of the Grounded Curation Loop for phase-a before any packet edit. Awaits human gate at §evidenceBundle before proceeding to alignment. §0 Phase brief - 3 tensions named (primary: grammar-bridge — oblique 'me' + past- participle 'sutaṁ' + ghost 'I') - Secondary tensions: evaṁ polysemy, transmission-frame - Plain-language summary: 4 English words conceal 3 grammatical facts + 1 transmission-frame claim §1 Current packet snapshot - 3 paliWords, 4 englishTokens, 1 relation - Strengths: dual-register tooltips already in place, content/function split respected - Gaps from new schema: MorphHint empty (case/form/gender across all segments), epistemicBasis + sourceCitationIds empty across all senses, isAnchor unset on candidate (sutaṁ), ghostKind on 'have' using catch-all 'required' instead of specific 'auxiliary', refrainId open question (cross-sutta scope) §2 Evidence bundle (7 usable citations from DPD) - eva: dpd:18054 (emphatic narrative-opener, PRIMARY) + dpd:18051 (restrictive sense, SECONDARY for polysemy tooltip) - me: dpd:53164 ('by me', PRIMARY) + dpd:53163 ('myself/me-object', SECONDARY for case polysemy) - suta: dpd:63769 (pp 'heard', PRIMARY) + dpd:63771 (nt 'what is heard', PRIMARY for sense 2) + dpd:63770 (literal 'learned', SECONDARY) - Parallels: 16 work-level (DN22, MN119, MA98...) — flagged as schema tension in §10 (where do work-level parallels live?) - Variants: ZERO for mn10:1.1 — opening stable across all witnesses - Gaps: SC dictionary_full low-info (parser limitation), Aṭṭhakathā not wired (commit C deferred), comparative basis not wired Next step (after human review): §3 alignment scaffold. No demoPacket edit yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…3-§10 Gate verdict from Aditya: approve §0-§2 conditionally with 6 amendments + clean deferral. All applied: 1. transmission-frame tension marked as packet-level (resolutionSurface: provenance.attribution + narrator frame + recited-speech span); phase-a introduces but does not resolve. 2. DPD evidence for evaṁ DOWNGRADED. Verified directly: dpd.txt has NO evaṁ/evaṃ headword, only eva (the bare particle) + evam-prefixed sandhi compounds. The stem-stripper in build-dpd.ts conflates evaṁ → eva mechanically, but the senses of eva (only/just/merely/ indeed/still) do NOT include the 'thus / in this way' reading required for the opening formula. This is derivational, not inflectional. Provider issue logged in §10 as fix-targets. 3. evaṁ sense upgraded to "Thus" + nuance "Narrative-opening deictic ('in this way') — points forward to what is about to be recited." Grounded in Pāli grammar (Geiger §66 / Warder Ch.13), not DPD. 4. sutaṁ confirmed as anchor candidate. Both pp 'heard' and nt 'what is heard' DPD citations preserved. Tooltips reframed per amendment: the -ṁ ending marks neuter nom/acc sg declension; it does NOT nominalize. Substantive use is a syntactic possibility of past participles in Pāli generally. 5. 16 work-level Suttaplex parallels NOT placed in phase-a.parallels (option c). Logged in §10 as schema gap — DeepLoomPacket needs a packet-level workParallels field separate from PhaseView.parallels. 6. Filled §3 alignment, §4 linguistic, §5 bridge, §6 pedagogy, §7 epistemic audit, §8 decisions, §9 open questions, §10 schema/UI tensions surfaced. 
§7 epistemic audit table covers every claim: - 5 lexical claims with citationIds - 5 etymological claims (Pāli grammar) - 5 curator-inferred claims (explicitly marked) - 0 naked authoritative claims §10 surfaces 6 schema/UI tensions for separate follow-up commits: - DPD stem-stripper conflates derivational forms - Packet-level workParallels schema gap - Cross-sutta formula recurrence (extend Span.kind) - Implicit-subject "I" rendering affordance - SC dictionary_full parser limitation - First-class curator-inference marker No demoPacket.json edit in this commit. The proposed JSON diff follows as a chat artifact for the second gate; demoPacket.json edits land only after gate approval. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

First MN10 phase re-curated via the Grounded Curation Loop protocol (docs/sutta-studio/CURATION_PROTOCOL.md). Seven localized changes in demoPacket.json phase-a + four new packet.citations entries. Second-gate verdict from Aditya: "Approve to apply, with small wording/schema-basis amendments before commit." All seven amendments applied. See docs/sutta-studio/curation/phase-a.md §12 for the amendment list. Packet changes (phase-a): 1. a1 (evaṁ) sense upgraded to grounded deictic: "Thus" + nuance "Narrative-opening deictic ('in this way') — points forward to what is about to be recited" + notes contrasting with bare-eva senses + epistemicBasis: 'etymological' (placeholder — should be 'grammatical' once enum extends, §10.7). 2. a1.s1 tooltip softened: "Do not confuse evaṁ with bare eva, whose common senses include 'only', 'just', or 'indeed'. Here evaṁ functions as the adverbial deictic: 'thus; in this way.'" 3. a1.s2 tooltip rewritten: "[Niggahīta -ṁ] Marks the surface form evaṁ, the adverbial deictic 'thus; in this way.'" 4. a2 (me): morph: { case: 'gen' } added; sense gets sourceCitationIds + epistemicBasis: 'lexical' + confidence: 'high' tied to cite:dpd:dpd:53164. Relation (a2→a3 "Heard BY") gets confidence + epistemicBasis: 'etymological' (placeholder for 'grammatical', §10.7). 5. a3 (sutaṁ) marked isAnchor: true. Three existing senses retained, each gets sourceCitationIds + epistemicBasis: 'lexical' grounded in cite:dpd:dpd:63769 (pp 'heard'), :63771 (nt 'what is heard'), :63770 (pp 'learned'). a3.s2 (ta) gets morph: { form: 'participle' }; a3.s3 (ṁ) gets morph: { gender: 'n', number: 'sg' } AND its tooltips reframed to avoid the false "nominalizes" claim — now says "Declensional ending; supports either reading; English perfect collapses them." 6. ea2g.ghostKind: 'required' → 'auxiliary' (specific kind from FEATURES.md §2.3 expanded set). 7. 
packet.citations populated with 4 entries (the cited DPD records) carrying provenance + excerpt + license + fetchedAt — baked in so the future audit/disagreement UI renders without re-fetch. Curation log (docs/sutta-studio/curation/phase-a.md): - All 12 sections filled - §10 surfaces 9 schema/UI tensions (was 6; added 7-9 from the gate): grammatical/curatorial EpistemicBasis values, function on MorphHint, finer ghostKinds - §11 Outcome documents test + build results - §12 captures all 7 amendments from Aditya's second-gate review Provider issue logged for follow-up commit (§10.1): DPD stem-stripper conflates derivational forms (evaṁ → eva). Fix targets: scripts/build-dpd.ts. Verified: - JSON parses cleanly; field-level readback confirms every edit - 173/173 tests pass across 11 suites - npx vite build succeeds (23.30s) Tier-1 + first phase = first real grounding event. The live /sutta/demo phase-a will improve once main updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Kind + arrow calm-default)

First renderer chunk after the phase-a re-curation. Per Aditya's "implicit visual grammar first, explicit explanation second" discipline: most schema fields should become FELT structure, not labels. This commit makes three of them visible without prose.

components/sutta-studio/PaliWord.tsx
Anchor styling on PaliWord.isAnchor=true:
- subtle warm underline (border-amber-700/30, only when no refrain underline already present)
- slight weight increase (font-medium) on the word's serif text
Implicit cue — no badge, no "★", no label. A felt difference only. Phase-a's `sutaṁ` is the first word in MN10 with isAnchor=true.

components/sutta-studio/EnglishWord.tsx
Per-kind ghost styling — implicit visual grammar, not labels:
- auxiliary → soft solid underline (border-slate-700/50)
- pronoun_from_verb → faint dotted line (border-slate-700/70 dotted)
- interpretive → italics only (already from isGhost styles)
- required → dotted underline (existing behavior preserved)
- article/copula/preposition_from_case/punctuation/quote_marker → default ghost styles only
Phase-a's "have" ghost (now ghostKind: 'auxiliary') visibly distinguishes from generic 'required' ghosts.

components/sutta-studio/SuttaStudioView.tsx
Arrow default opacity: 0.4 → 0.2. Arrows are interactions, not furniture. Default state is quieter so the hover-summon feels like the arrow appears FOR the reader, not always-on clutter. Focused state unchanged at 0.9.

This is chunk 1 of the renderer arc:
1. ✓ visual hints + arrow calm-default (this commit)
2. About-this-text provenance panel
3. Tooltip restructure (plain/grammar split, click-Pāli cycles facets)
4. Per-sense citation chips + DPD modal

Phase-b curation paused until visual hints land and we look at phase-a in the browser to test the implicit-first principle empirically.

Verified: 109 provider + sutta-studio component tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
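
The per-kind ghost styling in this commit amounts to a mapping from ghost kind to extra CSS classes. A sketch of that mapping follows. The kind names and the two `border-slate-700/…` colours come from the commit message; the `border-b`, `border-solid`, and `border-dotted` utilities and the function name are assumptions about how the underline might be expressed in Tailwind, not the contents of EnglishWord.tsx.

```typescript
type GhostKindSketch =
  | "auxiliary"
  | "pronoun_from_verb"
  | "interpretive"
  | "required"
  | "article"
  | "copula"
  | "preposition_from_case"
  | "punctuation"
  | "quote_marker";

// Extra classes layered on top of the base ghost styles (italics etc.).
function ghostKindClassesSketch(kind: GhostKindSketch): string {
  switch (kind) {
    case "auxiliary":
      return "border-b border-solid border-slate-700/50"; // soft solid underline
    case "pronoun_from_verb":
      return "border-b border-dotted border-slate-700/70"; // faint dotted line
    case "required":
      return "border-b border-dotted"; // existing dotted underline (assumed classes)
    default:
      return ""; // interpretive and the remaining kinds use the base ghost styles only
  }
}
```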

…ooltip Aditya observed that clicking a Pāli segment pins its tooltip, but the visual state doesn't communicate "this is pinned now" vs "this is just on hover" — the cursor stays as help-? and the styling is identical. Confusing. Two small changes give the pin state a felt signal: components/sutta-studio/PaliWord.tsx Separate isPinned styling from isHovered: - hovered: white underline + bg (existing) - pinned: emerald-600/70 underline + bg + thin emerald ring around the segment (ring-1 ring-emerald-700/40) + cursor switches to 'pointer' (signals "click target") Hover and pin no longer share visual state. components/sutta-studio/Tooltip.tsx Accept optional pinned: boolean prop: - border becomes emerald-700/70 (vs slate) - small '×' glyph in the top-right corner indicating the tooltip is pinned and the segment can be clicked to unpin. - The × is decorative (pointer-events-none); actual unpinning happens by clicking the segment, same as before. This is the A piece of "A then B" — quick cosmetic fix to make the existing click-pin behavior legible. B (Chunk 3, tooltip restructure with click-cycles-facets) replaces the pin model entirely; this distinction will be repurposed for facet-state. Verified: 44 tests pass (SuttaStudioView + sutta-studio-utils). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the click-toggles-pin model on Pāli segments with the grounded-curation click model: each click on a Pāli segment pins the tooltip AND advances to the next tooltip facet. To dismiss the pinned tooltip, click the × glyph (now interactive).

Per the user's curation rhythm: Pāli stays put; click on Pāli cycles the tooltip's facets (Meaning / What English hides / Example / etc.). Click on English continues to cycle senses (separate mechanism). The asymmetry maps to what each side is for — Pāli is canonical, English is plural.

SuttaStudioView.tsx
- New state: tooltipFacetIndices: Record<`${phaseId}-${segId}`, number>
- New function: cycleSegmentTooltipFacet(phaseId, segId) advances the facet index, wrapping at seg.tooltips.length. No-op for segments with ≤1 tooltip.
- Plumbs both to PaliWordEngine.

PaliWord.tsx
- Accepts tooltipFacetIndices + cycleSegmentTooltipFacet props
- Picks tooltip from seg.tooltips[facetIdx] (was: implicitly coupled to word sense index — buggy when sense count ≠ tooltip count)
- onClick: always pins the segment (no more toggle-unpin on same-segment click) + calls cycleSegmentTooltipFacet + still cycles segment senses if present
- Passes facetIndex/facetTotal/onUnpin to Tooltip

Tooltip.tsx
- Accepts facetIndex + facetTotal: shows a small "1/3"-style indicator in the top-left when there are multiple facets so the reader knows clicking again advances
- Accepts onUnpin: makes the × glyph an actual <button> (pointer-events-auto on the button only; the tooltip body stays pointer-events-none so it doesn't intercept hovers from the segment below)

Verified: 109 tests pass (SuttaStudioView + sutta-studio-utils + all provider suites).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
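
The facet-cycling rule in this commit (advance the index, wrap at the tooltip count, do nothing for segments with at most one tooltip) can be written as a pure state update. This is a sketch under assumptions: the real cycleSegmentTooltipFacet takes only `phaseId` and `segId` and reads the tooltip count from the segment, whereas the hypothetical `cycleFacetSketch` below receives the count as a parameter so it stays self-contained.

```typescript
type FacetIndicesSketch = Record<string, number>;

// Returns the next state; the input object is never mutated.
function cycleFacetSketch(
  indices: FacetIndicesSketch,
  phaseId: string,
  segId: string,
  tooltipCount: number,
): FacetIndicesSketch {
  if (tooltipCount <= 1) return indices; // nothing to cycle through
  const key = `${phaseId}-${segId}`;
  const current = indices[key] ?? 0; // an unseen segment starts at facet 0
  return { ...indices, [key]: (current + 1) % tooltipCount };
}
```

Keeping the facet index in its own keyed record, separate from the sense index, is what removes the coupling the commit calls buggy when the sense count and the tooltip count differ.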

… of viewport Two tooltip layout bugs surfaced via Playwright verification of phase-a: 1. Horizontal overflow: long facet text on right-edge segments clipped past viewport — caused by `whitespace-nowrap` forcing single line. Fix: `whitespace-normal break-words` + `max-w-[min(28rem,90vw)]` + `w-max`. Tooltip wraps to multi-line for long text but never wider than 90% of viewport. 2. Vertical overflow at top: phase-a sits at the top of the scroll container; with the default "above segment" position (-top-14) the tooltip rendered off-screen at y=-14. Fix: useLayoutEffect measures the *segment*'s position (offsetParent) — not the tooltip's — and flips to below-segment positioning (top-full mt-3) when there isn't room above. Measuring the segment is invariant under flip; measuring the tooltip caused an infinite loop (above → top<8 → flip below → top>8 → flip back). Also added: - leading-relaxed for readable line-height in multi-line tooltips - text-left so wrapped tooltips read left-to-right cleanly - tooltip position uses bottom-full/top-full (dynamic) instead of fixed -top-14 — adapts to tooltip's actual height Trade-off: when the tooltip flips below for phase-a, it overlaps part of the English row ("by me" partially obscured). Acceptable for now because the tooltip is dismissible via the × glyph or clicking elsewhere. Future improvement: tooltip could be semi-transparent over English, or a higher z-index dimmer over the English row when tooltip pinned. Verified via Playwright: - sutaṁ ṁ segment click → tooltip at y=150 (below segment), all overflow checks false (top, bottom, left, right) - Zero console errors after the infinite-loop fix - Multi-line wrap working (height=77px = 3 lines for the long "In the formula 'me sutaṁ'..." facet text) - 44 component tests still pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
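The flip logic and the loop it replaced can be illustrated with two small pure functions. This is a sketch under stated assumptions: the 8px threshold comes from the "top<8" note above, but the function names, the pixel parameters, and the way "room above" is computed are hypothetical and not taken from Tooltip.tsx.

```typescript
type Placement = "above" | "below";

// Assumed threshold, taken from the "top<8" check mentioned in the commit.
const MIN_TOP_PX = 8;

// Stable version: the decision depends only on the segment's position,
// which does not change when the tooltip flips.
function choosePlacement(segmentTopPx: number, tooltipHeightPx: number): Placement {
  const tooltipTopIfAbove = segmentTopPx - tooltipHeightPx;
  return tooltipTopIfAbove < MIN_TOP_PX ? "below" : "above";
}

// Unstable version, for contrast: the decision depends on where the tooltip
// currently is, so each flip changes the input to the next decision.
function naiveNextPlacement(
  current: Placement,
  segmentTopPx: number,
  segmentHeightPx: number,
  tooltipHeightPx: number,
): Placement {
  const tooltipTop =
    current === "above" ? segmentTopPx - tooltipHeightPx : segmentTopPx + segmentHeightPx;
  return tooltipTop < MIN_TOP_PX ? "below" : "above";
}
```

For a segment at the very top of the container, `choosePlacement` gives the same answer on every render, while `naiveNextPlacement` alternates between "below" and "above", which is the oscillation described above.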

…provenance Lands Level 3 audit per ADR SUTTA-008 §UI Vision and Aditya's "the audit you are most hungry for" framing — what historical/textual object am I reading? — work, expression, edition, transmission, translator, annotation layer, and crucially: visible unknowns. components/sutta-studio/AboutThisText.tsx (new) - Compact chip mounted at top of content: "▶ MN10 · Pāli · Theravāda · tr. Bhikkhu Sujato about" - Click expands a structured panel with sections: Work · Expression · Edition · Translation · Traditional Attribution · Annotation Layer · External References · Unknowns - Traditional Attribution surfaces Provenance.attribution.confidence ("traditional" / "attested" / "disputed") — keeps tradition from laundering itself as fact (the user's "make beliefs pay rent") - Unknowns section makes gaps visible as honest prose, not blank fields: - "No single manuscript witness is attached to this packet" - "First written attestation date is not recorded" - "mn10:1.1 has no recorded variants" (when relevant) - "Buddhaghosa's commentary not yet wired" - Annotation Layer is rendered as static prose (LexiconForge curator, DPD-backed). A first-class `Provenance.annotationLayer` schema field is a follow-up — flagged in §10.7 of phase-a curation log. components/sutta-studio/SuttaStudioView.tsx - Mount <AboutThisText packet={packet} /> after ScrollProgressBar, before the phases. The chip is the first thing the reader sees; the body remains collapsed by default. components/sutta-studio/demoPacket.json - Populate packet.provenance with what we honestly know about MN10: attribution: speaker, audience, legendary place/date, confidence='traditional' oralLineage: Theravāda, Pāli, 5th–1st c. 
BCE, bhāṇaka recitation edition: Mahāsaṅgīti Tipiṭaka 1956, Sixth Council, VRI digital source translation: Bhikkhu Sujato (2018, CC0, via SuttaCentral) external: link to suttacentral.net/mn10 - Deliberately NOT populated: firstWritten, manuscripts (we don't have witness data for this specific packet — surfaces as Unknowns prose) Verified via Playwright: - Chip renders with all 4 breadcrumb parts (workId, language, tradition, translator) - Click expands → 8 sections render with correct field plumbing - "Certainty: traditional" appears on the attribution row - "No single manuscript witness..." appears in Unknowns - Zero console errors Tier-1 + first phase + renderer arc + Level 3 audit affordance ✓ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Second MN10 phase re-curated via the Grounded Curation Loop. Six localized changes + 5 new packet.citations. Aditya's gate-2 verdict: approve with 2 amendments. Both applied. Packet changes (phase-b): 1. b1 (ekaṁ): morph case=acc, number=sg on stem; case=acc on ṁ suffix. sourceCitationIds on the 'one' sense → cite:dpd:dpd:17376 (card 'one') + cite:dpd:dpd:17382 (ind 'once'). confidence: high, lexical basis. 2. b2 (samayaṁ): **isAnchor: true** (amendment from Aditya — moved from b3 to b2 because phase-b's bridge-learning point is 'ekaṁ samayaṁ → At one time', not 'bhagavā'). Morph on segments: gender=m on aya root; case=acc, gender=m, number=sg on ṁ suffix. All 3 senses get citationIds (cite:dpd:dpd:59451 for 'occasion' + 'time'; cite:dpd:dpd:59452 for 'opportunity'). Relation b2s3 → b3 'Time WHEN' gets confidence: high + epistemicBasis: 'etymological' (placeholder for 'grammatical' once enum extends — phase-a §10.7). 3. b3 (bhagavā): **NOT marked isAnchor** per amendment. Retains refrainId: 'bhagava' (sufficient marker for recurring Buddha-epithet). Morph case=nom, gender=m, number=sg on vā segment. All 3 senses get cite:dpd:dpd:49147 (DPD packages all 3 readings — 'Sublime One' / 'Blessed One' / 'Fortunate One' — into one entry tagged as Buddha-epithet). 4. eb1g 'At' ghost: ghostKind 'required' → **'preposition_from_case'**. Specific reason: Pāli accusative-of-time-when (the -ṁ on ekaṁ and samayaṁ) surfaces as English preposition. Per ADR ghost gate, the specific kind is mandated over the catch-all. 5. packet.citations: 5 new DPD entries (eka×2, samaya×2, bhagavā×1). Citations now total 9 (4 phase-a + 5 phase-b). Curation log (docs/sutta-studio/curation/phase-b.md): - All sections filled with gate-2 amendments recorded in §8. - §7.10 new schema tension: RelationType lacks 'temporal' value. Phase-b's b2→b3 'Time WHEN' is logically temporal, currently encoded as 'location' as a hack. Proposed: add 'temporal' enum value + renderer palette color. Issue to file. 
- §7.11 cross-references phase-a §10.7 (EpistemicBasis missing 'grammatical' enum value — same gap surfaces here for the relation's basis). Verified: - JSON parses; field-level readback confirms every amendment landed - b3.isAnchor is None (NOT set); b2.isAnchor is True - Citations array has 9 entries - 44 sutta-studio component tests + utils tests pass - npx vite build succeeds (built in 18.65s — including packet bundling, no chunk-size regression) Tier-1 + first 2 phases curated + renderer arc (chunks 1+A+B+about panel + tooltip overflow fix) all on feat/opus-grounded-data-layer. PR #38 now has 11 commits since branching from main at cfdc48c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… principled tooltip-register check Aditya's pushback (paraphrased): "Why are those words forbidden? It feels fragile to have such blacklists. Can't we have some more principled approach?" The forbidden-words list (adverbial, deictic, cataphoric, niggahīta, neuter nom/acc singular, declensional ending, past participle, genitive, oblique, …) was a useful first heuristic but it has real failure modes: - Doesn't scale (new phases add new jargon never on the list) - Pattern-matches WORDS, not CONCEPTS the reader stumbles on - Forces euphemism that's sometimes worse than the original term - Has arbitrary cutoffs (accusative isn't on it; adverbial is) - Risks mode collapse: writers avoid the word rather than teach the concept Replaced with §3.4 Plain-Register Check — three criteria applied per-tooltip, contextually, not globally: 1. Reader profile (default): a thoughtful adult, no Pāli training, possibly familiar with popular Buddhism. Plain prose stands alone for this reader; other readers get the structured registers (grammar chip, audit modal). 2. Pay-rent rule: every technical term must answer (a) what concept does it label that the reader needs precision about and (b) why is precision needed here. If yes to both, keep it AND gloss it in the same sentence. If no, replace with plain English. Example: "accusative" pays rent (recurs across many phases, names a specific case-form). Example: "in the genitive, functioning adverbially" doesn't pay rent — replace with "the form 'of me' here works like 'by me'". 3. Register layering: plain prose is for the default reader; grammar chip allows any term (that's its job); audit modal carries provider-level technical language by definition. Plain prose doing the work of all three registers IS the failure mode. The old word list is preserved as DIAGNOSTIC EXAMPLES of the failure tone — not a rule. 
If the curator finds themselves reaching for one of those words in plain prose, it's a signal to pause and check whether plain English carries the load. Quick self-check before approving a tooltip: 1. Read aloud. Does the plain prose make sense to the default reader without the bracketed chip? 2. For each technical term, can you answer the two pay-rent questions? 3. Is anything in plain prose that would be better in the grammar chip or audit modal? Slots after §3.3 Affordance Gate as a 4th gate. Cross-references FEATURES.md §6 (format prescription). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eferences Per Aditya's gratitude framing: every named source in About-this-text should resolve to a real public page where the reader can encounter it directly. Audit + acknowledgment meet in the link. Additive schema changes to Provenance: edition: + url? (canonical edition page) + digitalSourceUrl? (the digital pipeline, e.g., bilara-data GitHub) + licenseUrl? (CC deed) translation: + url? (publisher / SC sutta page) + licenseUrl? references?: new top-level Array<{label, url, note?, category?}> — a curated list of works this packet rests on, framed as acknowledgments. Category enum for renderer grouping: 'dictionary' | 'translation' | 'edition' | 'manuscript-archive' | 'scholarly-reference' | 'commentary' | 'other' All fields optional → no migration; old packets still validate. Distinct from: - external[] — per-text registry links (BDRC, GRETIL, CBETA, …) - per-sense Citation rows — attest specific glosses with excerpts The renderer (next commit) will surface references with a gratitude-register header ("What this packet rests on") and link through to each upstream source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
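The additive schema shape described above might look roughly like the following. This is a sketch, not the contents of types/suttaStudio.ts: only the fields named in this commit are shown, the interface name `ProvenanceLinks` is hypothetical, and the fallback of uncategorized references into 'other' is an assumption about how the renderer could group them.

```typescript
type ReferenceCategory =
  | "dictionary"
  | "translation"
  | "edition"
  | "manuscript-archive"
  | "scholarly-reference"
  | "commentary"
  | "other";

interface ProvenanceReference {
  label: string;
  url: string;
  note?: string;
  category?: ReferenceCategory;
}

// Hypothetical name; covers only the fields this commit adds to Provenance.
interface ProvenanceLinks {
  edition?: { url?: string; digitalSourceUrl?: string; licenseUrl?: string };
  translation?: { url?: string; licenseUrl?: string };
  references?: ProvenanceReference[];
}

// Groups references by category for rendering. Every field is optional, so a
// packet written before this change yields an empty map instead of an error.
function groupReferences(
  provenance: ProvenanceLinks,
): Map<ReferenceCategory, ProvenanceReference[]> {
  const groups = new Map<ReferenceCategory, ProvenanceReference[]>();
  for (const ref of provenance.references ?? []) {
    const category = ref.category ?? "other"; // assumed fallback
    const bucket = groups.get(category) ?? [];
    bucket.push(ref);
    groups.set(category, bucket);
  }
  return groups;
}
```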

… named source becomes a place to visit Per Aditya's gratitude framing: the About panel was naming sources without showing their proofs. Each unlinked claim was a missed opportunity for both audit and acknowledgement. This commit gives every named source a real link the reader can follow. components/sutta-studio/demoPacket.json — packet.provenance: edition: + url (tipitaka.org), + digitalSourceUrl (github.com/ suttacentral/bilara-data), + license string translation: + url (suttacentral.net/mn10/en/sujato), + licenseUrl (creativecommons.org/publicdomain/zero/1.0/) references: 7 new entries with full attribution prose — 1. Digital Pāli Dictionary (DPD) — dpdict.net, Bryan Levman et al., CC BY-NC-SA 4.0, release v0.4.20260501 2. DPD source repository — github 3. SuttaCentral aggregated dictionary — /define 4. SuttaCentral bilara-data repo 5. Bhikkhu Sujato MN10 translation — direct sutta page 6. Vipassana Research Institute Tipiṭaka — tipitaka.org 7. SuttaCentral MN10 sutta page components/sutta-studio/AboutThisText.tsx: - New ExternalLink helper component with consistent styling - Edition section: name becomes a link (when url set); digital source line becomes a link; new license line with linked terms - Translation section: translator becomes a link; license becomes a link to its terms (CC0 deed) - New "What this packet rests on" section rendering references[]: - Italic gratitude-register intro: "Listed here as acknowledgement, not just citation. Each link goes to a real place the source lives — where you can meet the work on its own terms." - Per-reference: name as link + small mono-tag category chip (dictionary/translation/edition/etc.) 
+ multi-line note in muted text - External references section unchanged in shape, now uses ExternalLink helper for consistency Verified via Playwright: - 12 outbound links rendering in the expanded panel - All hrefs reach the expected upstream pages (tipitaka.org, suttacentral.net, github.com/digitalpalidictionary/dpd-db, creativecommons.org, etc.) - 21 type-related tests + provider tests still pass Two-commit acknowledgment arc complete: 9b5b59c (schema extension for URLs + references) + this commit (content + renderer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d diff (awaiting gate) Compressed two-gate format: brief + snapshot + evidence + proposed diff in one log. Awaits gate verdict before packet edit. §0 Phase brief — Kurūsu viharati ("[was] dwelling among the Kurus") 3 tensions: locative-plural-as-preposition (primary; no ghost needed unlike phase-b's 'At'), historical-present (Pāli pres-tense rendered as English past), place-vs-people-vs-region (3 readings of the locative) §2 Evidence bundle - viharati: clean DPD coverage (cite:dpd:dpd:69661 + 69662) — both pr=present, finite. EXCELLENT structured morphology data. - kurūsu: **DPD STEM-STRIPPER BUG HITS AGAIN.** Resolved to 'kura [nt]: rice' (cite:dpd:dpd:22496) — totally wrong; kurūsu is locative plural of 'kuru' (the Kuru people), not the unrelated noun 'rice'. Second hit of the same conflation bug from phase-a (evaṁ → eva). DO NOT cite the kura entry. - Variants: zero for mn10:1.2 (line stable). §3 Proposed packet diff - c1 kurūsu: morph (number=pl on kurū stem; case=loc, number=pl on su suffix); relation 'Dwelling IN' gets confidence + basis; senses get nuance + basis but NO sourceCitationIds (provider bug). Honest grounding: 'etymological' (Pāli grammar inference). - c2 viharati: **isAnchor: true** (the action verb of the geographical-frame clause); morph on ti suffix (person=3, number=sg, tenseAspect=present, form=finite); 3 senses get cite:dpd:dpd:69661/69662 with confidence: high. - 2 new packet.citations (viharati's DPD entries). - No ghost upgrades (phase-c has zero English ghosts — locative case absorbs into sense gloss). §5 Schema tensions - Tension #1 (DPD stem-stripper conflation) hits 2nd time: phase-a evaṁ→eva, phase-c kurūsu→kura. 2 of 3 phases. Suggested: if phase-d hits a 3rd time, fix is overwhelming. Patch is small (~10 LOC in scripts/build-dpd.ts). - Tension #7 (EpistemicBasis 'grammatical' gap) hits 3rd time: phase-a (evaṁ + relation), phase-b (relation), phase-c (c1 senses + relation). 3 of 3 phases. Very strong signal. 
- No new tensions from phase-c. §6 Plain-register flag (not in diff): c2s3's only tooltip '[Thematic vowel] Class I verb marker' strips to empty when grammar-terms is off. Fails §3.4 check. Defer to plain-first tooltip rewrite chunk. §7 Open questions: - File the DPD stripper fix before phase-d, or batch later? - At what tension-hit count do we cut a small fix commit? Suggested heuristic: 5 hits OR after batch 2 complete. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ration

Phase-a (evaṁ→eva) and phase-c (kurūsu→kura) both surfaced the same class of provider bug. Aditya: "more data before deciding." Two of three phases hit a flavour of this, so investigating now.

The root cause was THREE distinct bugs, all in scripts/build-dpd.ts:

Bug 1: Niggahīta diacritic mismatch (root cause for evaṁ). DPD uses ṃ (U+1E43, m-with-dot-BELOW; IAST). SuttaCentral bilara uses ṁ (U+1E41, m-with-dot-ABOVE; ISO 15919). Same Pāli sound, different Unicode code points. Direct lookup of bilara's 'evaṁ' against DPD's 'evaṃ' headword failed; the stem-stripper then fell through to the unrelated particle `eva` ('only/just/indeed'), which has entirely different semantics from the adverbial evaṁ ('thus; in this way').
Fix: normalize DPD's ṃ → bilara's ṁ during parsing AND during surface-form extraction. Single source of truth.

Bug 2: Over-greedy 3-char endings 'ūsu' / 'ūhi'. These were listed as single morphological endings, but the locative-plural ending is actually -su (with the long ū belonging to a lengthened stem: kuru → kurū before -su). The over-greedy strip collapsed kurūsu → 'kur', then tried 'kur'+a = 'kura' (the noun 'rice', totally unrelated to the Kuru people).
Fix: removed 'ūsu' and 'ūhi' from PALI_ENDINGS.

Bug 3: Missing bare 'su' / 'hi' endings + missing vowel-shortening. Once 'ūsu' was removed, kurūsu still didn't match because:
- 'su' wasn't in the endings list at all (only 'esu' for a-stems)
- even when stripped, kurū → kuru required vowel-shortening (the locative-plural rule lengthens stem-final vowels)
Fix: added 'su' / 'hi' to endings. Added vowel-shortening logic: after stripping any case ending, if the stem ends in long ā/ī/ū, also try the short variant (kurū → kuru, bhikkhū → bhikkhu).

Verified end-to-end re-ingestion:
- evaṁ → ['evaṁ', 'eva']
  PRIMARY: 'thus; this; like this; similarly; in the same manner; just as; such' (DPD's evaṃ, now normalized). This is exactly the sense phase-a's curation correctly identified as the right reading.
  SECONDARY: bare 'eva' (still derivationally related; surfaced for transparency rather than hidden).
- kurūsu → ['kurū', 'kuru']
  PRIMARY: 'name of the people of Kuru; Kurus' (DPD's kurū entry, long-vowel stem)
  SECONDARY: 'name of a country' (DPD's kuru entry)
  Both are real Kuru entries; the unrelated 'kura' (rice) is GONE.
- Coverage: 81.6% → 86.5% on MN10 (462/534 surface forms; 26 newly-matched surfaces).
- 72 surfaces still unmatched (was 98); remaining gaps are compounds the stem-stripper can't decompose.

Implications for curation:
- phase-a (8e7b197) intentionally did NOT cite DPD for evaṁ because the conflation made the citation misleading. Those citations are now available and HONEST. Backfill is optional enrichment work; not done in this commit.
- phase-c gate-pending diff (b46aa64) similarly assumed no DPD for kurūsu. The kurū / kuru entries are now citable. Backfill is part of phase-c gate-2 amendments.

Tension #1 (DPD stem-stripper conflation) is fixed. Cumulative hit count was 2/3 phases; future phases should be cleaner.

Verified: 78 tests pass, Vite build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
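The three fixes can be sketched together as follows. This is an illustration of the technique, not the code in scripts/build-dpd.ts: `ENDINGS` is a deliberately tiny stand-in for the real PALI_ENDINGS list, `stemCandidates` is a hypothetical name (the commit's own helper is not shown in this message), and the NFC normalization step is an added assumption so that long vowels are single code points.

```typescript
// Fix 1: map ṃ (U+1E43, dot below) to ṁ (U+1E41, dot above).
function normalizeNiggahita(text: string): string {
  return text.replace(/\u1E43/g, "\u1E41");
}

// Illustrative subset only. Note there is no 'ūsu' or 'ūhi' (fix 2),
// and bare 'su' / 'hi' are present (fix 3).
const ENDINGS: readonly string[] = ["esu", "su", "hi"];

// Long vowel → short vowel: ā → a, ī → i, ū → u.
const LONG_TO_SHORT: Record<string, string> = {
  "\u0101": "a",
  "\u012B": "i",
  "\u016B": "u",
};

// Returns candidate stems for a surface form, in discovery order: the stem
// left after stripping an ending, then its vowel-shortened variant if any.
function stemCandidates(surface: string): string[] {
  const word = normalizeNiggahita(surface.normalize("NFC"));
  const candidates: string[] = [];
  const add = (stem: string): void => {
    if (stem.length > 0 && !candidates.includes(stem)) candidates.push(stem);
  };
  for (const ending of ENDINGS) {
    if (!word.endsWith(ending) || word.length <= ending.length) continue;
    const stem = word.slice(0, -ending.length);
    add(stem);
    const short = LONG_TO_SHORT[stem.slice(-1)];
    if (short !== undefined) add(stem.slice(0, -1) + short);
  }
  return candidates;
}
```

With these rules kurūsu strips only the bare -su, giving kurū and then kuru through vowel-shortening; there is no path to 'kur', so the 'kura' lookup can no longer happen.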

Third MN10 phase re-curated. 5 localized packet changes + 4 new citations. Aditya's gate-1 surfaced the DPD stripper bug; gate-2 became "fix the bug, then apply with the now-honest citations." Mid-curation provider fix shipped as c33b115 (three stripper bugs: niggahīta normalization, over-greedy -ūsu/-ūhi endings, missing bare -su/-hi + vowel-shortening). Coverage 81.6% → 86.5%. Packet changes (phase-c): 1. c1 (kurūsu): morph (number=pl on kurū stem; case=loc, number=pl on su suffix). Relation c1s2 → c2 "Dwelling IN" gets confidence high + epistemicBasis etymological (placeholder for 'grammatical' once enum extends — Tension #7). **DPD citations now real**: kurūsu correctly resolves to two distinct DPD entries (no longer the kura/rice misfire): - "among the Kurus" → cite:dpd:dpd:22524 (kurū masc — "name of the people of Kuru; Kurus") - "in Kuru territory" → cite:dpd:dpd:22502 (kuru masc — "name of a country") - "with the Kuru people" → cite:dpd:dpd:22524 (secondary use) All 3 senses: epistemicBasis 'lexical', confidence high/medium. 2. c2 (viharati): **isAnchor: true** (the action verb of the geographical-frame clause). Morph on ti suffix (person=3, number=sg, tenseAspect=present, form=finite) — exactly what DPD's pos=pr declares. 3 senses get cite:dpd:dpd:69661/69662 with confidence high: - "was dwelling" → :69661 (primary) - "was staying" → :69661 + :69662 (both senses contain this) - "was abiding" → :69662 (primary) 3. packet.citations: 4 new entries (kurū, kuru, viharati x2). Total now 13 (4 phase-a + 5 phase-b + 4 phase-c). Curation log: §8 records the mid-flight provider fix and how it changed the citation landscape (kurūsu went from "no DPD citation" to two real ones in the same draft). §9 Outcome filled. Tension #1 (DPD stripper) marked RESOLVED. 
Verified: - JSON parses; spot-checks confirm every field landed - c2.isAnchor = True (viharati anchor); c1 morph cases correct - 21 component + type tests pass - Vite build green Tier-1 done + first 3 MN10 phases curated + first provider quality fix shipped. Stack: feat/opus-grounded-data-layer on PR #38. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… link honest Phase-a (8e7b197) deliberately left a1.senses[0] (evaṁ "Thus") without a DPD citation because the stem-stripper at the time conflated evaṁ with the unrelated bare particle `eva`. The conflation was Tension #1 on the schema-tensions list — fixed in commit c33b115 (niggahīta normalization + endings-list fixes + vowel-shortening). DPD's evaṃ headword (now normalized to evaṁ in our index) carries exactly the sense phase-a's curation correctly identified: "thus; this; like this; similarly; in the same manner; just as; such" (cite:dpd:dpd:18134, ind). Backfill applied: a1.senses[0]: - epistemicBasis: 'etymological' → 'lexical' - sourceCitationIds: + ['cite:dpd:dpd:18134'] - confidence: 'high' (new) - notes: updated to reference DPD's own treatment of evaṁ/eva as distinct headwords (the curation's "do not confuse evaṁ with bare eva" framing is VALIDATED by DPD, not contradicted by it) packet.citations: + 1 new entry (cite:dpd:dpd:18134) Total packet.citations: 13 → 14 phase-a curation log §13 records the backfill. Verified: - JSON parses; a1 has lexical basis + citation + confidence - 21 tests still pass - Vite build will be triggered with next renderer change The "Do not confuse evaṁ with bare eva" tooltip on a1.s1 remains correct; DPD's separate headword treatment makes it MORE accurate post-fix than at original curation time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…curatorial' to EpistemicBasis Tension #7 surfaced in phase-a/b/c curation: claims grounded in syntactic/morphological rules were being labeled 'etymological' as the closest enum fit. But etymology is word-history (sandhi, cognate); these claims are GRAMMATICAL (agent-in-genitive-of-passive- participle, accusative-of-time-when, locative-as-location). Hit 3/3 phases — strong signal per the user's "more data first" rule. types/suttaStudio.ts - EpistemicBasis enum: added 'grammatical' + 'curatorial'. Now 7 values: etymological / grammatical / commentarial / contextual / lexical / comparative / curatorial. - Doc block explains the resolution history and what each new value covers ('grammatical' for syntactic rules; 'curatorial' for explicit inference grammatically grounded but not from a single attestation). components/sutta-studio/demoPacket.json - 3 placeholder usages migrated from 'etymological' → 'grammatical': * phase-a a2.s1.relation 'Heard BY' (agent-genitive-of-passive- participle pattern) * phase-b b2.s3.relation 'Time WHEN' (accusative-of-time-when) * phase-c c1.s2.relation 'Dwelling IN' (locative-as-location) - All 3 are syntactic rules, not etymology. Migration is honest; new enum value makes the basis accurate. - Zero remaining 'etymological' values in packet (all migrated; no legitimate word-history claims in phase-a/b/c yet). types/suttaStudio.test.ts - New test: EpistemicBasis enum round-trips all 7 values. Tension #7 closed at 3/3 phase hits. Cumulative schema-tensions resolved by today's work: #1 DPD stem-stripper conflation (commit c33b115) #7 EpistemicBasis 'grammatical'/'curatorial' (this commit) Verified: 14 types tests pass; types compile clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
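The seven-value set and its round-trip check can be sketched as below. This is an assumption-laden illustration: the commit calls EpistemicBasis an enum, but whether types/suttaStudio.ts declares it as a TypeScript `enum` or a string-literal union is not stated here, so the sketch uses a union derived from a constant array. `EPISTEMIC_BASES` and `isEpistemicBasis` are hypothetical names.

```typescript
// The seven values listed in this commit, in the order given.
const EPISTEMIC_BASES = [
  "etymological",
  "grammatical",
  "commentarial",
  "contextual",
  "lexical",
  "comparative",
  "curatorial",
] as const;

type EpistemicBasis = (typeof EPISTEMIC_BASES)[number];

// Runtime guard, useful when reading a packet from JSON.
function isEpistemicBasis(value: unknown): value is EpistemicBasis {
  return (
    typeof value === "string" &&
    (EPISTEMIC_BASES as readonly string[]).includes(value)
  );
}

// Serializes a basis through JSON and reports whether it survives unchanged.
function roundTrips(basis: EpistemicBasis): boolean {
  const parsed: unknown = JSON.parse(JSON.stringify({ epistemicBasis: basis }));
  const value = (parsed as { epistemicBasis?: unknown }).epistemicBasis;
  return isEpistemicBasis(value) && value === basis;
}
```

A guard of this kind would also reject a value such as 'temporal', which phase-b proposed for RelationType rather than for EpistemicBasis.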

…ed citations visible ADR SUTTA-008 §UI Vision #4 ("Why does this gloss say X?") promised a way to surface Sense.sourceCitationIds to the reader. 14 DPD citations have accumulated in packet.citations across phase-a/b/c, all invisible until now. Design choice — show only when pinned, not on hover: - Hover = reading flow → minimal noise - Pin = audit moment → source revealed Matches the existing pin-as-engagement model. components/sutta-studio/SuttaStudioView.tsx citationsById = useMemo lookup table from packet.citations. Threaded down to PaliWordEngine. components/sutta-studio/PaliWord.tsx - New citationsById prop. - In the segment render loop, resolve activeSense (segment-level senses take priority over word-level for compounds). - When activeSense.sourceCitationIds is non-empty, resolve via citationsById and pass as citations[] to Tooltip. - Imports Citation type. components/sutta-studio/Tooltip.tsx - New citations?: Citation[] prop. - When pinned AND citations.length > 0, render a footer below the main tooltip body: * "SOURCES" header in emerald-500/70 uppercase tracking-widest * Per-citation: short ref in slate-300 + italic excerpt in slate-500 (quoted) - Hidden entirely on hover (matches the audit-on-pin discipline). Verified via Playwright on /sutta/demo: - Click sutaṁ's ṁ segment → tooltip shows facet 2/2 ("In the formula 'me sutaṁ'...") plus SOURCES footer: DPD s.v. suta (pp — heard) "suta [pp]: heard" - Click eva segment → tooltip shows evaṁ deictic prose plus SOURCES footer with cite:dpd:dpd:18134 (the backfilled evaṁ entry). - Zero console errors. Net effect: the entire grounded-curation arc is now READABLE in the UI. Pin any Pāli segment whose word/segment senses have citations, see exactly which DPD entries support the gloss + the excerpt. 58 tests pass across providers, sutta-studio, types. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
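The lookup-and-resolve flow above can be sketched without React. This is a minimal illustration under assumptions: the `Citation` and `Sense` shapes are cut down to hypothetical fields (`id`, `excerpt`, `sourceCitationIds`), the function names are invented for the example, and silently skipping IDs that are missing from the packet is an assumed behaviour, not one the commit describes.

```typescript
// Hypothetical, trimmed shapes; the real types live in types/suttaStudio.ts.
interface Citation {
  id: string;
  excerpt?: string;
}

interface Sense {
  english: string;
  sourceCitationIds?: string[];
}

// Build the lookup table once per packet (the commit memoizes this with useMemo).
function indexCitations(citations: Citation[]): Map<string, Citation> {
  return new Map(citations.map((c) => [c.id, c] as const));
}

// Resolve a sense's citation IDs to Citation rows, skipping unknown IDs.
function resolveCitations(
  sense: Sense | undefined,
  byId: Map<string, Citation>,
): Citation[] {
  const ids = sense?.sourceCitationIds ?? [];
  return ids
    .map((id) => byId.get(id))
    .filter((c): c is Citation => c !== undefined);
}

// Audit-on-pin rule: the SOURCES footer only has content when pinned.
function footerCitations(pinned: boolean, citations: Citation[]): Citation[] {
  return pinned ? citations : [];
}
```

Keeping the pin check in one place makes the "hover is reading, pin is audit" rule easy to test independently of the tooltip markup.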

…c session Captures the entire session arc — Tier-1 grounded data layer, 3 MN10 phases curated via the Grounded Curation Loop protocol, full renderer arc (anchor styling, pin model, click-cycles-facets, tooltip overflow flip, About-this-text panel with linked acknowledgments, citation chips), DPD bug fix, EpistemicBasis enum extension. WORKLOG.md - New top entry for 2026-05-12 with status, sources of truth, 25-commit arc summary, deferred-list, resume instructions. - Previous 2026-05-11 entries preserved. HANDOVER.md - Replaces prior handover (preserved in git history). - Sections: session summary, 25 commits categorized by arc, what landed (6 categories), pending threads (priority/effort ordered with context-cost notes), key context the next instance needs (curation rhythm, DPD bug pattern, pinned- tooltip discipline, schema-tension hit-count discipline, gratitude register house style), resume instructions. The "Pending threads" section is the load-bearing one — sorted high context-value first so a fresh agent can pick tasks aligned with their available context. Last commit before handover-and-stop per session-boundary discipline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…gressions

Adds 27 unit tests for the three root causes fixed in c33b115:

1. normalizeNiggahita: ṃ (U+1E43) → ṁ (U+1E41) conversion; idempotency; mixed-form handling; direct codepoint assertions.
2. PALI_ENDINGS: assertions that 'ūsu'/'ūhi' are absent (the over-greedy endings that conflated kurūsu → kura/rice) and that bare 'su'/'hi' are present (paired with vowel-shortening).
3. tryStemStrips: kurūsu produces 'kuru' (via vowel-shortening) and does NOT produce 'kur' or 'kura' (the bug-path); same pattern verified for bhikkhūsu; coverage of all three long vowels (ā→a, ī→i, ū→u); evaṁ direct-match preservation; quotative-tail handling.

Refactor: build-dpd.ts now exports the pure helpers (normalizeNiggahita, PALI_ENDINGS, QUOTATIVE_TAILS, stripQuotative, tryStemStrips) and gates main() with a standard ESM Node entrypoint guard so importing the module during tests is side-effect-free. Verified: `npm run build:dpd` still runs correctly.

Per HANDOVER §Pending #4: c33b115 shipped with only end-to-end verification. These unit tests catch regressions before they surface in coverage drops on the next phase's curation.

Tests: 112 passed under `services/providers` + `scripts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ings Phase-d surfaced the same stem-stripper conflation pattern that c33b115 fixed for -ūsu/-ūhi: kurūnaṁ (gen pl of Kuru) over-stripped to 'kur' and conflated with 'kura' (rice). Hit #3 in 3/3 batch-2 phases — crosses the threshold phase-c §5 set for "overwhelming case to fix the stripper." Parallel fix mirroring c33b115: - Removed 'ūnaṁ' and 'unaṁ' from PALI_ENDINGS (u-stem gen pl is vowel-lengthening + bare -naṁ, not a single 4-char ending) - Added bare 'naṁ' paired with the existing vowel-shortening rule - Kept 'ānaṁ' (a-stem gen pl IS a real single ending in standard analyses — dhammānaṁ = dhamm + ānaṁ, not dhammā + naṁ) Coverage: 86.5% → 86.9% on MN10 (+2 surface forms now resolved). kurūnaṁ now resolves to ['kurū', 'kuru'] — identical to phase-c's kurūsu, the correct DPD entries (dpd:22524 "name of the people of Kuru" + dpd:22502 "name of a country"). Tests: added 10 regression cases (PALI_ENDINGS membership assertions for ūnaṁ/unaṁ/naṁ/ānaṁ, tryStemStrips coverage for kurūnaṁ + bhikkhūnaṁ, a-stem gen pl preservation regression net). 37 pass under build-dpd.test; 20 pass under dpd.test against the rebuilt dataset. Resolves schema tension #1 — DPD stripper conflation. Both -su/-hi and -naṁ patterns now closed for u-stems. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…urūnaṁ nigamo Closes batch 2 of CURATION_PROTOCOL §6 (phase-b, phase-c, phase-d). Changes (one phase, one commit, per protocol): - d1 Kammāsadhammaṁ: isAnchor=true; d1s3 morph {number:sg, gender:n} (per gate-2 amendment: don't overclaim case — neuter sg has identical nom/acc forms; ambiguity noted in tooltip). 3 senses with separated epistemic basis: lexical (DPD dpd:20396) / curatorial (Jātaka-derived "Spotted-One Tamed" — note softened per amendment to make traditional derivation honest, not lexical-asserted) / etymological (compound parse). - d2 nāma: sense lexical + dpd:36427 (the naming-particle DPD entry, selected from 7 nāma homonyms). - d3 kurūnaṁ: d3s1 morph {number:pl}; d3s2 morph {case:gen, number:pl}; relation extended with confidence + epistemicBasis=grammatical ("Town OF" is case-derived). 3 senses lexical + REUSES phase-c citations (dpd:22524 kurū + dpd:22502 kuru) — same stem, new case. - d4 nigamo: d4s3 morph {case:nom, number:sg, gender:m}. 3 senses: market-town/township lexical (dpd:36863 + dpd:74785); "trading center" curatorial + low confidence per amendment (DPD doesn't attest it). - packet.citations: 14 → 18 (added dpd:20396, dpd:36427, dpd:36863, dpd:74785). 2 phase-c entries reused. Methodological win (per Aditya's framing): phase-d forces the system to separate four kinds of claim — lexical attestation, grammatical inference, traditional/commentarial etymology, curatorial pedagogical expansion. First phase to exercise the new 'curatorial' EpistemicBasis value (added in 4323310) for real, on the Jātaka derivation + trading-center expansion. Schema tensions: - #1 (DPD stripper conflation) — RESOLVED across both -su/-hi (c33b115) and -naṁ (be2b141, this session). All u-stem oblique plurals now correctly handled. - #7 (EpistemicBasis enum) — first real load on 'curatorial'; no laundering of curator inference as etymology. - No new tensions surfaced from phase-d. 
Curation log at docs/sutta-studio/curation/phase-d.md (gates, amendments, plain-register deferrals, open questions). Tests: 321/322 pass (1 flake in SessionInfo unrelated to phase-d; passes in isolation). Batch 2 complete → re-evaluate protocol before starting batch 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves docs/WORKLOG.md conflict: 1242e43's temporary "claim" entry (landed on main yesterday signalling this branch's work was in progress) is superseded by the comprehensive "done" entry produced this session. Also updates the done entry to reflect today's three additional commits: - b1b7fdb (DPD bug-fix unit tests — 37 regression cases) - be2b141 (DPD parallel fix — -ūnaṁ/-unaṁ removal, +0.4pp coverage) - b5a52a9 (phase-d re-curation, closes CURATION_PROTOCOL §6 batch 2) Total: 28 commits on the branch (29 including this merge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 11, 2026 16:32 View deployment

vercel Bot deployed to Preview May 11, 2026 16:39 View deployment

vercel Bot deployed to Preview May 11, 2026 16:46 View deployment

vercel Bot deployed to Preview May 11, 2026 16:50 View deployment

vercel Bot deployed to Preview May 11, 2026 17:00 View deployment

vercel Bot deployed to Preview May 11, 2026 17:07 View deployment

vercel Bot deployed to Preview May 11, 2026 20:25 View deployment

vercel Bot deployed to Preview May 11, 2026 20:33 View deployment

vercel Bot deployed to Preview May 11, 2026 20:51 View deployment

vercel Bot deployed to Preview May 11, 2026 21:24 View deployment

vercel Bot deployed to Preview May 11, 2026 23:36 View deployment

vercel Bot deployed to Preview May 11, 2026 23:41 View deployment

vercel Bot deployed to Preview May 11, 2026 23:59 View deployment

vercel Bot deployed to Preview May 12, 2026 00:05 View deployment

vercel Bot deployed to Preview May 12, 2026 01:17 View deployment

vercel Bot deployed to Preview May 12, 2026 14:21 View deployment

vercel Bot deployed to Preview May 12, 2026 14:23 View deployment

vercel Bot deployed to Preview May 12, 2026 14:27 View deployment

vercel Bot deployed to Preview May 12, 2026 15:59 View deployment

vercel Bot deployed to Preview May 12, 2026 16:53 View deployment

vercel Bot deployed to Preview May 12, 2026 17:10 View deployment

vercel Bot deployed to Preview May 12, 2026 17:21 View deployment

vercel Bot deployed to Preview May 12, 2026 17:28 View deployment

vercel Bot deployed to Preview May 12, 2026 17:33 View deployment

vercel Bot deployed to Preview May 12, 2026 17:36 View deployment

anantham and others added 4 commits May 12, 2026 13:53

anantham marked this pull request as ready for review May 12, 2026 18:33

vercel Bot deployed to Preview May 12, 2026 18:34 View deployment

anantham merged commit 3880174 into main May 12, 2026
4 checks passed

anantham deleted the feat/opus-grounded-data-layer branch May 12, 2026 21:21

anantham merged 31 commits into main from feat/opus-grounded-data-layer
