Skip to content

refactor: DB lifecycle hardening during OTA content updates#1532

Closed
CraigBuckmaster wants to merge 4 commits into
masterfrom
refactor/db-lifecycle-during-updates
Closed

refactor: DB lifecycle hardening during OTA content updates#1532
CraigBuckmaster wants to merge 4 commits into
masterfrom
refactor/db-lifecycle-during-updates

Conversation

@CraigBuckmaster
Copy link
Copy Markdown
Owner

@CraigBuckmaster CraigBuckmaster commented Apr 19, 2026

What

Hardens DB lifecycle handling during OTA content updates. Four commits: three cherry-picked from the original PR #1531 (after #1533's defer idea was evaluated and deferred to phase 2), plus a phase-1 safety valve.

Why this is a separate PR from #1529 / #1534

#1529 and #1534 address the first-launch crash. This PR addresses bugs that bite users on subsequent launches during OTA content updates (delta apply, full-DB swap). Different code path, different failure mode, separate review.

Commits

  1. 8fbcfba5 — Reuse live DB handle in getInstalledVersion()
    Adds getDbIfInitialized() to database.ts. ContentUpdater.getInstalledVersion() now reuses the live handle instead of opening a second connection, avoiding potential handle invalidation.

  2. a8e2a92f — Close live DB connection before OTA swaps
    Adds closeDatabaseConnection() to database.ts. ContentUpdater closes the live handle before delta-apply and full-DB-swap paths, so file replacement doesn't happen while SQLite still has the old file open.

  3. 0ff1e019 — Reopen DB after updater closes live handle
    Pairs with Merge pull request #1 from CraigBuckmaster/codex/review-recent-master… #2. Tracks whether a live handle was closed, then reloadDatabase() after success OR as best-effort recovery on failure.

  4. 74174ec0 — Close-failure safety valve + Option B TODO (NEW)
    Hardens Merge pull request #1 from CraigBuckmaster/codex/review-recent-master… #2 against the case where closeAsync itself throws. The original implementation swallowed close errors silently; this PR makes closeDatabaseConnection() return a boolean, and updates ContentUpdater to abort the swap (returning up_to_date) if close failed. Also adds a doc block describing Option B (defer-to-cold-start) as the long-term architectural target.

Why the safety valve matters

If closeAsync throws, the native SQLite handle may or may not be released. Proceeding to move a temp file over a DB file that SQLite still has open can cause lock errors — or, on iOS 26 with FTS5 tables, the teardown segfault from PR #1534. Better to fail an OTA update and try again next launch than to corrupt the DB file.

Long-term direction (Option B)

The close-and-reopen approach this PR implements is phase 1. Phase 2 (a separate future project) is to defer OTA applies to cold-start, doing the file swap before initDatabase() ever opens the DB. That eliminates the concurrent-access problem by design. Documented inline above ContentUpdaterService with full context on the tradeoff.

PR #1533's original defer-OTA idea informed this — but it cannot be implemented as-is because the DB is always open after startup, so a naive defer-on-live-db check would mean updates never fire.

Tests

  • 41 ContentUpdater tests pass (39 + 2 new safety-valve tests)
  • Broader sweep: 30 suites / 454 tests pass
  • Typecheck clean, lint 0 errors

Sequencing

Recommend merging after build 21 from #1534 has been validated. Current priority is confirming the FTS5 teardown crash is resolved before layering more DB lifecycle changes.

Closed as obsoleted

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 19, 2026

Test Results

✅ All tests passed

Passed Failed Total
Tests ✅ 3436 ❌ 0 3436
Suites ✅ 465 ❌ 0 465

Coverage

Statements Branches Functions Lines

⏱️ Duration: 77.6s

…tion

## Safety valve
Previously, closeDatabaseConnection() swallowed errors from closeAsync
and always set db = null, reporting success. If the native SQLite close
actually failed (e.g., a statement still in flight), the subsequent
file-swap would then move a temp file over a DB that SQLite still held
open — leading to lock errors or, on iOS 26 with FTS5 tables, the
teardown crash that PR #1534 just fixed for a different code path.

Now closeDatabaseConnection() returns Promise<boolean>:
  - true  : close succeeded OR db was already null (safe to swap)
  - false : closeAsync threw (NOT safe to swap)

ContentUpdater.applyDelta and .downloadFullDb both check the return
value. On false, they:
  1. Log a warning
  2. Cleanup any temp files (full-DB path)
  3. Return { status: 'up_to_date' }

The next OTA check will try again. Failing an update is strictly
better than corrupting the DB file.

## Option B TODO
Added a 30-line documentation block above ContentUpdaterService
explaining the long-term architectural direction: defer-to-cold-start
instead of close-and-reopen. Rationale:
  - The ONLY fully-safe time to swap the DB file is when no SQLite
    connection exists at all.
  - That condition naturally occurs during app cold-start, before
    initDatabase() runs.
  - Current close-and-reopen creates that condition artificially mid-
    session, which is fragile (stale getDb() refs, in-flight query
    errors).
  - Defer-to-cold-start trades 'updates land immediately' for
    'updates land on next launch', which is acceptable for scholarly
    content updates.

This PR ships phase 1 (close-and-reopen with safety valve). Phase 2
(full cold-start-swap architecture) is a separate project tracked as
a TODO comment. PR #1533's original 'defer OTA when live DB open'
idea informed this direction but cannot be implemented as-is (the
DB is always open after startup, so defer-on-live-db would mean
updates never fire).

## Tests
- Added 2 focused tests covering the safety valve:
  - aborts delta apply when live DB close fails
  - aborts full DB swap when live DB close fails
  Both assert status=up_to_date AND that the swap/backup was NOT
  attempted (mockFileOps.move / copy).
- Updated mock default return values for closeDatabaseConnection
  from undefined to true (matches new signature).
- 41 ContentUpdater tests pass (was 39 + 2 new).
- Broader sweep: 30 suites / 454 tests pass.
- Typecheck clean, lint 0 errors.

## Not in scope
Cold-start-swap implementation is out of scope for this PR — that's
a separate project. This PR keeps #1532's original close-and-reopen
strategy and hardens it; it does not replace it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant