refactor(db): remove legacy indexedDB facade #2

Merged
anantham merged 22 commits into main from feature/import-improvements-and-flaggable-ops
Nov 22, 2025

@anantham anantham commented Nov 18, 2025

Summary

  • Decompose ChapterView/SettingsModal into modular components and hooks with dedicated tests.
  • Normalize image version deletion, auto-hydrate image state on navigation, and backfill translation metadata snapshots.
  • Add Gemini 3 Pro image preview pricing and expose strict XHTML sanitizer; polish SessionInfo layout and log work.

Root Cause (if bug fix)

Missing translation metadata snapshots and absent image hydration after navigation caused delete controls to vanish and retranslation prompts to misfire; legacy image version shapes also broke deletion across refreshes.

Changes

  • components/chapter/, hooks/use for reader extraction; tests/components/chapter/; tests/hooks/
  • services/navigationService.ts, services/db/operations/maintenance.ts, store/bootstrap/initializeStore.ts, store/slices/chaptersSlice.ts
  • services/db/operations/imageVersions.ts, store/slices/imageSlice.ts, tests/services/db/ImageOps.test.ts
  • components/settings/*, hooks/useAdvancedPanelStore.ts, useAudioPanelStore.ts, useExportPanelStore.ts, useProvidersPanelStore.ts, useNovelMetadata.ts
  • config/constants.ts, config/costs.ts, services/translate/HtmlSanitizer.ts, components/SessionInfo.tsx, docs/WORKLOG.md

Testing

  • npx tsc --noEmit
  • npm run test -- run tests/components/chapter
  • npm run test -- run tests/hooks/useTranslationTokens.test.tsx tests/hooks/useFootnoteNavigation.test.tsx tests/hooks/useChapterTelemetry.test.tsx
  • npm run test -- run tests/components/diff/ChapterView.mapMarker.test.tsx
  • npm run test -- run tests/services/db/ImageOps.test.ts
  • npm run test -- run tests/store/bootstrap/bootstrapHelpers.test.ts
  • npm run test -- run components/settings

Review Checklist

  • No direct commits to main
  • Follows commit format
  • Ready for Codex review
  • No unrelated changes mixed in

Aditya A P and others added 9 commits October 28, 2025 08:29
MOTIVATION:
- Manual browser testing was time-consuming and error-prone
- Previous work fixed a deadlock issue in ensureChapterSummaries()
- Need automated testing to prevent regression of initialization issues
- Schema drift errors and re-entrant openDatabase() calls were causing hangs
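
One common way to keep re-entrant openDatabase() calls from hanging is to let every caller share a single in-flight promise. The sketch below only illustrates that idea; it is not taken from services/indexeddb.ts, and `singleFlight` and `fakeOpen` are hypothetical names standing in for the real open logic.

```typescript
// Hypothetical sketch: share one in-flight open() promise between callers.
type Opener<T> = () => Promise<T>;

function singleFlight<T>(open: Opener<T>): Opener<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight !== null) {
      return inFlight; // a re-entrant call reuses the pending open
    }
    const attempt = open().catch((err) => {
      // Reset on failure so a later call can retry.
      inFlight = null;
      throw err;
    });
    inFlight = attempt;
    return attempt;
  };
}

// Stand-in for the real IndexedDB open call; counts how often it runs.
let openCount = 0;
const fakeOpen = async (): Promise<{ name: string }> => {
  openCount += 1;
  return { name: "lexicon-forge" };
};

const openDatabase = singleFlight(fakeOpen);
```

With this wrapper, two calls made before the first one settles receive the same promise, so the underlying open runs once.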

APPROACH:
- Installed Playwright for browser automation testing
- Created comprehensive E2E test suite covering:
  * Fresh install initialization (clearing IndexedDB)
  * Database schema verification (all 10 stores present)
  * Deadlock detection (re-entrant openDatabase calls)
  * Prompt template initialization
  * Existing database upgrades
- Added debug logging throughout indexeddb.ts initialization sequence
- Fixed database name mismatch (LexiconForge → lexicon-forge)
- Updated page.reload() to use waitUntil: 'domcontentloaded'

CHANGES:
- playwright.config.ts: New Playwright configuration for E2E tests
- tsconfig.playwright.json: TypeScript config for Playwright
- tests/e2e/initialization.spec.ts: 5 comprehensive initialization tests
- tests/e2e/debug-console.spec.ts: Debug test for console log capture
- tests/e2e/novel-library-flow.test.tsx: Moved to tests/ (was Vitest, not Playwright)
- package.json: Added Playwright scripts (test:e2e, test:e2e:ui, test:e2e:debug)
- package-lock.json: Added @playwright/test@^1.56.1
- services/indexeddb.ts: Added [DEBUG:*] console logging to trace initialization
- config/constants.ts: Fixed JSON import with 'with { type: json }' attribute

IMPACT:
- E2E tests can now automatically verify database initialization
- Debug logging reveals exact initialization sequence and timing
- Tests confirmed app initializes successfully in ~5 seconds
- Tests currently failing due to console message capture timing issues
- Foundation laid for automated regression testing

TESTING:
- Manual verification: App loads successfully with debug logs visible
- Debug test (debug-console.spec.ts): PASSES - shows full init sequence
- 5 initialization tests: Currently FAILING (console capture issue)
- Next: Fix console message capture timing to make tests reliable

KNOWN ISSUES:
- Tests set up console listeners after page navigation, missing early logs
- Console message array remains empty or incomplete
- Need to investigate React StrictMode double-rendering impact
- Tests timeout waiting for messages that may have already been logged

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
MOTIVATION:
- New Playwright E2E testing infrastructure needs documentation
- Developers need guidance on running and debugging tests
- Document known issues and workarounds for future reference

APPROACH:
- Created docs/E2E-TESTING.md with complete testing guide
- Documented setup, configuration, and usage
- Listed all test suites and their purposes
- Explained debug logging system
- Documented known issues and future improvements

CHANGES:
- docs/E2E-TESTING.md: New comprehensive E2E testing documentation

IMPACT:
- Developers can quickly understand and use E2E tests
- Clear documentation of current status (working vs broken)
- Reference for debugging test failures
- Foundation for improving test infrastructure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

vercel Bot commented Nov 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| lexicon-forge | Ready | Preview | Comment | Nov 22, 2025 8:33am |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +365 to 373
static async storeEnhanced(enhanced: any): Promise<void> {
  const chapter: Chapter = {
    title: enhanced.title,
    content: enhanced.content,
    originalUrl: enhanced.canonicalUrl || enhanced.originalUrl,
    canonicalUrl: enhanced.canonicalUrl,
    nextUrl: enhanced.nextUrl,
    prevUrl: enhanced.prevUrl,
    chapterNumber: enhanced.chapterNumber,

P1: Preserve raw URLs when storing enhanced chapters

The modern backend’s storeEnhanced path assigns originalUrl as enhanced.canonicalUrl || enhanced.originalUrl. When both properties are present (the common case when normalizing a fetched chapter), this stores the canonical URL as the primary key and storeChapterModern only generates URL mappings for that value. The original/raw URL is never persisted, so subsequent lookups by the raw URL (e.g. findChapterByUrl, navigation to the original link after a reload, or imports that reference the raw URL) will fail because no mapping exists. The memory implementation still prefers enhanced.originalUrl, so the inconsistency is easy to miss. Prefer the raw originalUrl when available and fall back to the canonical URL to keep both URLs mapped and maintain backward compatibility.
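
The suggested preference order can be sketched as follows. This is a minimal illustration under assumptions, not the code that was merged: `EnhancedChapterUrls`, `resolveOriginalUrl`, and `urlsToMap` are hypothetical names, and only the two URL fields mentioned in the review are modelled.

```typescript
// Hypothetical shape; only the URL fields named in the review are modelled.
interface EnhancedChapterUrls {
  originalUrl?: string;
  canonicalUrl?: string;
}

// Prefer the raw URL as the stored originalUrl, falling back to the canonical one.
function resolveOriginalUrl(enhanced: EnhancedChapterUrls): string | undefined {
  return enhanced.originalUrl || enhanced.canonicalUrl;
}

// Collect every distinct URL that should map to the stored chapter,
// so lookups by either the raw or the canonical URL can succeed.
function urlsToMap(enhanced: EnhancedChapterUrls): string[] {
  const urls = [enhanced.originalUrl, enhanced.canonicalUrl].filter(
    (u): u is string => typeof u === "string" && u.length > 0
  );
  return Array.from(new Set(urls));
}
```

When both fields are present, `resolveOriginalUrl` returns the raw URL and `urlsToMap` yields both values; when only the canonical URL exists, both helpers fall back to it.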

…-for-chapters

fix(db): preserve raw enhanced chapter urls
@anantham anantham merged commit 5508c01 into main Nov 22, 2025
2 of 4 checks passed
@anantham anantham deleted the feature/import-improvements-and-flaggable-ops branch November 22, 2025 08:33
anantham added a commit that referenced this pull request May 8, 2026
…E-008 draft

Continues the investigation framework with two more issues investigated
end-to-end and the first proposed-ADR draft.

#11 (comparison panel follows chapter) — verdict: already fixed.
Commit 0c5162b (2026-04-10) added a useEffect in useComparisonPortal.ts
that dismisses on currentChapterId change. Authored manually by Aditya
with no agent transcript captured (a useful data point about archaeology
coverage: it works for agent-driven changes, blind to human-direct edits).
Test gap remains — no regression test for the dismissal behavior.

#2 (fan toggle restarts translation) — verdict: matrix prediction
partially refuted. The codebase has dual-layer in-flight guards (mediator's
shouldAutoTranslate check + handleTranslate's pendingTranslations.has
entry guard), so the completion-only-guards theme does NOT instance here.
The jit-vs-precompute theme partially instances at the settings-fingerprint
axis (any settings change forks the version tree, costing API calls).
Paused on user repro — without specific reproduction steps, can't tell if
a real bug exists or if user perceived a settings-side-effect. The theme
docs are updated to record the falsification.
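
For readers unfamiliar with the distinction, an entry guard of the kind described above might look like the sketch below. Only the names `pendingTranslations` and `handleTranslate` come from the commit text; the signature and body are assumptions for illustration, not the project's implementation.

```typescript
// Chapter IDs with a translation currently running (name taken from the commit text).
const pendingTranslations = new Set<string>();

// Hypothetical entry guard: returns false without starting work when the
// chapter already has a translation in flight.
async function handleTranslate(
  chapterId: string,
  translate: (id: string) => Promise<void>
): Promise<boolean> {
  if (pendingTranslations.has(chapterId)) {
    return false; // duplicate request (for example after a settings toggle) is ignored
  }
  pendingTranslations.add(chapterId);
  try {
    await translate(chapterId);
    return true;
  } finally {
    // Always clear the marker, even when the translation throws.
    pendingTranslations.delete(chapterId);
  }
}
```

Because the check happens on entry rather than only on completion, a second request for the same chapter never starts a parallel translation.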

CORE-008 draft (Derived Views Are Recomputed, Not Stored) lands at
issues/_themes/proposed-adrs/. Working draft only — not in docs/adr/
until Aditya ratifies. Names the umbrella principle that CORE-006,
FEAT-001, and FEAT-003 already apply at narrower scopes. Identifies the
v1-composite-as-raw-vs-derived question as the load-bearing decision
needed before drafting can complete.

Theme updates:
- completion-only-guards: removes #2 from instances; N=4 (was 5);
  documents the falsification as evidence the matrix is doing real work
- jit-vs-precompute: links to the new CORE-008 draft as leverage point
- _themes/README.md: adds a "Proposed ADRs" table tracking draft status

ADR audit findings (from earlier in this session) integrated into:
- issues/README.md status table — #1, #6, #9, #12 upgraded from spec-gap
  to ADR-vs-code drift (CORE-006 commits to "render shell immediately",
  FEAT-001 commits to "ensure *a* translation is available")
- issue #1's section 4 promoted from (A2,B2,C2) to (A1*,B2,C2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anantham added a commit that referenced this pull request May 8, 2026
…no push

Captures session state for the next instance. Four phases per the
expansion:handover skill:

  Phase 1 — Commit checkpoint: all session work in 16 commits on main,
            none pushed (per multi-agent CLAUDE.md rule). Pre-existing
            dirty files (Issues.md, WORKLOG.md, etc.) explicitly NOT
            committed — they predate this session.

  Phase 2 — Thread inventory: active threads (manual validation, #1,
            #16, skill-update Patch 6), blocked threads (#2 awaiting
            user repro, CORE-008 awaiting ratification, skill push
            awaiting authorization), deferred (useAsyncAction design,
            8 un-investigated issues).

  Phase 3 — Session learnings: project-level (investigation framework
            location, bridge service path), cross-project (the
            investigation pipeline pattern, useAsyncAction 3-shape
            taxonomy), skill update candidates (Patch 6, Patch 7),
            ADR candidates (CORE-008/009/010/011).

  Phase 4 — Handover document at docs/HANDOVER.md with resume
            instructions and 6 calibration moments.

Lives at docs/HANDOVER.md so next instance can read on session start.
Git-tracked so it survives compaction and persists across sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anantham added a commit that referenced this pull request May 8, 2026
User reported (2026-05-06 screenshot): two "Continue Reading" cards for
the same novel (Forty Millenniums of Cultivation), one showing Chapter
338 and the other Chapter 2; chapter count rendered as 6528 when the
registry declares 3521. Both root causes are duplicate IDB records that
the V2 dedup migration (repairScopedStableIdDuplicates) was supposed to
clean — but its one-time flag is set on existing databases, so any
duplicates created after V2 ran sit there forever.

Three layers of fix, defense in depth:

1. NovelLibrary.tsx — render-side dedup of Continue Reading entries.
   Group by novelId, keep the most recent lastReadAtIso. Catches any
   duplicates that slip past the migration or appear after it runs.

2. fetchNovelChapterCounts (services/db/operations/summaries.ts) —
   dedupe by (novelId, chapterNumber) before counting. OR translation
   status across duplicate rows. Falls back to stableId for summaries
   without chapterNumber. Pre-fix: FMC's 3521 chapters → 6528 displayed
   count due to 1.85x duplication. Post-fix: counts unique chapters.

3. MaintenanceOps.consolidateBookshelfDuplicates (V3) — new boot-time
   migration gated on bookshelfDedupedV3 flag. Reads bookshelf-state,
   groups by novelId, keeps the most-recent entry per novel, re-keys
   under buildLibraryScopeKey so legacy unscoped winners get pulled
   forward to scoped form using a sibling's versionId when available.
   Idempotent on clean state; runs once per database.

The render-side fixes (#1, #2) take effect immediately for ANY user.
The migration (#3) cleans the underlying IDB on next boot. Together
they cover both "fix the symptom now" and "prevent re-display."
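
The render-side dedup in item 1 can be sketched like this. It is a minimal illustration, assuming a simplified `BookshelfEntry` shape; the actual fields and the `dedupedBookshelfEntries` code in NovelLibrary.tsx may differ, and `dedupeBookshelfEntries` is a hypothetical helper name.

```typescript
// Hypothetical, simplified entry shape for illustration.
interface BookshelfEntry {
  novelId: string;
  chapterNumber: number;
  lastReadAtIso: string;
}

// Group by novelId and keep only the most recently read entry per novel.
function dedupeBookshelfEntries(entries: BookshelfEntry[]): BookshelfEntry[] {
  const latestByNovel = new Map<string, BookshelfEntry>();
  for (const entry of entries) {
    const current = latestByNovel.get(entry.novelId);
    if (
      current === undefined ||
      Date.parse(entry.lastReadAtIso) > Date.parse(current.lastReadAtIso)
    ) {
      latestByNovel.set(entry.novelId, entry);
    }
  }
  return Array.from(latestByNovel.values());
}
```

Given two cards for the same novel, only the one with the later `lastReadAtIso` survives, so a single Continue Reading card is rendered per novel.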

Files:
- components/NovelLibrary.tsx (+~20 lines): dedupedBookshelfEntries
  derived map keyed by novelId, fed into continueReadingEntries.
- services/db/operations/summaries.ts: fetchNovelChapterCounts
  rewritten with seenByNovel Map<novelId, Map<dedupKey, ...>>.
- services/db/operations/maintenance.ts:
  - SETTINGS.BOOKSHELF_DEDUPED_V3 flag added
  - MaintenanceOps.consolidateBookshelfDuplicates static method
  - ~150 lines including JSDoc explaining why this is V3 and not
    inside the existing repairScopedStableIdDuplicates path
- store/bootstrap/initializeStore.ts: wired into the bootRepairs list
  after syncSummaries (last in chain so all prior backfills land first).
- tests/current-system/bookshelf-dedup.test.ts: 9 regression tests
  covering both new pieces (5 for the migration, 4 for the count dedup).
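
The count dedup described for fetchNovelChapterCounts can be sketched as a nested map keyed first by novelId and then by a dedup key. The `ChapterSummary` shape, the key prefixes, and the `countUniqueChapters` name are assumptions for illustration; only the (novelId, chapterNumber) key, the stableId fallback, and the OR over translation status come from the commit message.

```typescript
// Hypothetical, simplified summary shape for illustration.
interface ChapterSummary {
  novelId: string;
  stableId: string;
  chapterNumber?: number;
  hasTranslation: boolean;
}

interface NovelCounts {
  total: number;
  translated: number;
}

function countUniqueChapters(summaries: ChapterSummary[]): Map<string, NovelCounts> {
  // novelId -> (dedupKey -> translated on any duplicate row)
  const seenByNovel = new Map<string, Map<string, boolean>>();
  for (const s of summaries) {
    // Prefer chapterNumber; fall back to stableId when it is missing.
    const dedupKey =
      s.chapterNumber !== undefined ? `n:${s.chapterNumber}` : `s:${s.stableId}`;
    let seen = seenByNovel.get(s.novelId);
    if (seen === undefined) {
      seen = new Map<string, boolean>();
      seenByNovel.set(s.novelId, seen);
    }
    // OR the translation status across duplicate rows.
    seen.set(dedupKey, (seen.get(dedupKey) ?? false) || s.hasTranslation);
  }
  const counts = new Map<string, NovelCounts>();
  seenByNovel.forEach((seen, novelId) => {
    let translated = 0;
    seen.forEach((isTranslated) => {
      if (isTranslated) translated += 1;
    });
    counts.set(novelId, { total: seen.size, translated });
  });
  return counts;
}
```

Duplicate rows for the same chapter collapse into one key, so the displayed total tracks unique chapters even while the duplicate records remain in IDB.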

Verified:
- npx vitest run → 1227 pass, 16 skip (1218 baseline + 9 new tests, no
  regressions)
- npx tsc --noEmit clean for changed files (3 pre-existing errors on
  main unchanged)

What this does NOT do:
- The duplicate chapter records themselves (the source of the inflated
  count) are NOT removed from IDB. That's a heavier migration that
  needs careful canonical-row selection (which version's stableId
  wins?) and reference repair (translations, feedback, amendment
  logs all key by stableId). Defer until a separate investigation.
  This fix counts correctly even with the duplicates present.
- Same for the duplicate "Chapter 339" entries with different
  capitalization in the dropdown — separate stableId-generation
  cleanliness bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anantham added a commit that referenced this pull request May 11, 2026
…1 commit B.1)

Per ADR SUTTA-008 §Build order step 3, lands the DPD data layer for
MN10. Provider impl follows in B.2; compiler wiring in B.3.

ms-dpd vs full dpd-db decision:
  ms-dpd is verb-blind — its inflection table has zero verb conjugation
  rows, only declensions. For the assasati verb family central to MN10
  this kills it. Using full dpd-db.

Storage strategy (resolved during this spike — ADR §Open Questions #2):
  Per-sutta subsets, not full corpus. Full DPD export is 80-120MB JSON;
  committing that for one sutta is disproportionate. The script
  extracts only headwords referenced by surface forms in the target
  sutta. Total committed for MN10: ~656KB. Each new sutta adds its own
  subset directory.

Surface→lemma resolution:
  Heuristic stem-stripping over dpd.txt (the 4MB human-readable
  release; no SQLite required). Initial pass: 34%. After parser fix
  for single-digit homonyms (DPD uses both "me 1" and "a 1.1" styles):
  81.6% coverage on MN10 (436/534 surface forms). Remaining 18% are
  mostly compounds (sammāsambuddhassa, ajjhattikabāhiresu) and
  inflected verb forms that live in DPD's SQLite inflection table.
  Documented as unmatchedSurfaces in manifest.json; SQLite escalation
  is a future commit if curation needs higher coverage.
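
  As an illustration of the homonym handling and the coverage figure above, the following sketch parses a headword in either the "me 1" or the "a 1.1" style and recomputes the percentage. It is not the parser in scripts/build-dpd.ts; `parseHeadword`, `Headword`, and `coveragePercent` are hypothetical names, and the regular expression is an assumption about the headword format.

```typescript
interface Headword {
  lemma: string;
  homonym: string | null; // "1", "1.1", or null when no number is present
}

// Split a trailing homonym number (single-digit or dotted) from the lemma.
function parseHeadword(raw: string): Headword {
  const trimmed = raw.trim();
  const match = /^(.*\S)\s+(\d+(?:\.\d+)?)$/.exec(trimmed);
  if (match === null) {
    return { lemma: trimmed, homonym: null };
  }
  return { lemma: match[1], homonym: match[2] };
}

// Coverage as a percentage rounded to one decimal place.
function coveragePercent(matched: number, total: number): number {
  return Math.round((matched / total) * 1000) / 10;
}

const mn10Coverage = coveragePercent(436, 534); // 81.6
```

  Accepting both numbering styles matters because a parser that only recognizes the dotted form would leave entries such as "me 1" with the number fused into the lemma.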

Files:
  - scripts/build-dpd.ts — Node TS, no native deps. Downloads pinned
    DPD release dpd-txt.zip (4MB) on first run, caches in data/_raw/
    (gitignored), parses to structured DpdRecord, fetches bilara MN10
    Pāli root, extracts surface forms, resolves via stem-stripping +
    quotative marker handling + locative→stem restoration. Projects
    to LexiconEntry shape per the providers types added in commit A.
  - data/dpd/mn10/headwords.json (618KB) — lemma → LexiconEntry[]
  - data/dpd/mn10/forms.json (20KB) — surface → lemma candidates
  - data/dpd/mn10/manifest.json — coverage stats + unmatched surfaces
  - data/LICENSE-DATA.md — CC BY-NC-SA 4.0 with DPD attribution +
    placeholders for VRI / bilara / future providers
  - .gitignore — data/_raw/ added (upstream zip + extracted txt)
  - package.json — `npm run build:dpd` script entry

Pinned release: dpd-db v0.4.20260501 (May 2026). Re-ingest with
`npm run build:dpd -- --force` after bumping DPD_RELEASE_TAG in the
script. Monthly cadence upstream.

Verified: 42 pre-existing provider tests still pass. No app code
changed in this commit; the DpdProvider that consumes this data
lands in B.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anantham added a commit that referenced this pull request May 13, 2026
…ū bhagavato paccassosuṁ

The monks' response to the Buddha's call — 6 words, 10 segments. Three
pedagogical firsts surfaced:

  - Same surface 'bhikkhū' in its THIRD grammatical role across e/f/g
    (acc-pl / voc-pl-via-Bhikkhavo / nom-pl). Same noun, three contexts
    disambiguate. g4s2's cross-reference facet teaches the pattern
    explicitly.
  - bhagavā stem in GENITIVE (1st gen-bhagavā vs phase-b/e's nom-sg).
    g5s2's case-contrast facet teaches nom→gen by reference. Genitive
    functions as dative for verbs of speaking — Pāli quirk worth the
    arrow.
  - Aorist 3PL (paccassosuṁ -suṁ) vs phase-e's aorist 3sg (āmantesi
    -si). Same tense, different number. Verb-side number marking.

Changes:
  - g1 Bhadante: morph voc/sg/m. Senses curatorial + etymological
    (Bhadante isn't in DPD Lookup — coverage gap like Bhikkhavo phase-f;
    grounded in bhadanta lemma).
  - g2 ti: cross-reference to phase-f's ti; lexical + dpd:30431 REUSED.
  - g3 te: color-explanation facet + morph nom/pl/m + cross-reference to
    phase-e's tatra (same demonstrative system); lexical + dpd:31134.
  - g4 bhikkhū: morph nom/pl/m. 3-CASE CROSS-REFERENCE FACET on g4s2.
    DROPPED "Replied BY" relation (tension #12 — subject-of-verb doesn't
    fit the case-quirk-only palette). 3 senses lexical + dpd:49885 REUSED.
  - g5 bhagavato: morph gen/sg/m. KEPT "Replied TO" relation with
    epistemicBasis='grammatical' (legitimate dative-recipient reading —
    Pāli gen-as-dative for verbs of speaking; DPD attests both readings).
    Case-contrast facet on g5s2. 3 senses: lexical (dpd:49136 dative +
    dpd:49137 genitive).
  - g6 paccassosuṁ: isAnchor=true (verb of the response). morph
    aor/3pl/finite on g6s2. 3 senses lexical + dpd:39413. Plain-first
    explanation of paṭi+su compound logic ("reply IS hearing-back").

Schema tension #12 (S-V-O palette gap) — HIT #2 across batch 3:
  Phase-e dropped two relations (both mis-shaped); phase-g drops one
  (subject-of-verb) and KEEPS one (dative-recipient).

  ARROW-EARNING RULE clarified through curation: relations earn their
  arrow when the Pāli case-marker does work English doesn't have an
  analog for. Genitive-as-dative ✓ (Pāli quirk). Subject-of-verb ✗
  (universal). Hit count 2/3 batch-3 phases qualifies per §3.3 to file
  as GH issue / FEATURES.md §1.3 clarification (separate commit).

Refrain status update:
  - bhikkhu: 4/6 phases (definitively recurring)
  - bhagavā: 3/6 phases (definitively recurring, multiple cases now)
  - viharati: 1/6 (single appearance)

EpistemicBasis distribution this phase: 13 lexical / 1 etymological /
1 curatorial = 86% lexical (highest yet — confirms the DPD root-cause
fix is delivering clean evidence in subsequent curation).

Curation log: docs/sutta-studio/curation/phase-g.md (§0-§8). Includes
proposed §3.4.X cross-phase facet rule and the arrow-earning rule
clarification.

Batch 3 progress: 3/4 phases complete (e ✓ / f ✓ / g ✓ / phase-h).

Tests: 220 component pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>