Skip to content

fix + diagnostic: native DB download, correct startup ordering, JS error handler#1529

Merged
CraigBuckmaster merged 4 commits into
masterfrom
diagnostic/js-error-handler
Apr 19, 2026
Merged

fix + diagnostic: native DB download, correct startup ordering, JS error handler#1529
CraigBuckmaster merged 4 commits into
masterfrom
diagnostic/js-error-handler

Conversation

@CraigBuckmaster
Copy link
Copy Markdown
Owner

@CraigBuckmaster CraigBuckmaster commented Apr 19, 2026

Combined build — three crash hypothesis fixes + diagnostic safety net

This PR now bundles four logical changes that together test three strong crash hypotheses in a single build, with a JS-error diagnostic as a safety net if none of the fixes resolve the crash.

The four commits

1. Global JS error handler diagnostic (058c01c7)

Hooks ErrorUtils.setGlobalHandler to capture JS errors upstream of ExceptionsManager, writes to Documents/last-js-error.json synchronously, chains to the prior handler so RN default + Sentry still run. On next launch, App.tsx fires displayLastCrashIfAny() which pops a native Alert.alert with the captured error + stack preview. Does NOT gate render tree, does NOT do async work before first paint — structurally incapable of producing the build-19 black-screen regression.

2. Post-download init error surfacing (71e1a8ae)

App.tsx's post-download onComplete now throws when initDatabase() returns non-'ready' or throws. DbDownloadScreen awaits onComplete and surfaces errors through its existing retry UI. Previously, a failed post-download init silently transitioned to the main app tree with a broken DB, guaranteeing downstream crashes.

3. Native download + PRAGMA integrity_check for large DBs (cherry-picked from #1531)

Likely the biggest contributor to the crash. Two memory eliminations rolled into one:

  • XHR responseType='arraybuffer' kept the entire ~100MB payload in JS heap even after the chunked write finished. Now File.downloadFileAsync streams directly to disk for payloads ≥ 25MB. XHR path retained only for small payloads where fine-grained progress is useful.
  • verifyChecksum() was loading the downloaded file via base64atobUint8Array — ~400MB peak JS heap for a 100MB DB. Replaced with db_meta.content_hash + PRAGMA integrity_check — kilobytes, not hundreds of megabytes.

Build 18 crashed ~15s after launch with a signature matching OOM-during-XHR-response processing. This fix addresses both memory sources.

4. Startup hydration ordering (cherry-picked from #1531)

App.tsx previously ran initUserDatabase(), Sentry binding, and all three store hydrations PLUS pruneEvents() even when scripture.db wasn't yet present (needs_download status). That guarantees downstream errors (pruneEvents runs SQL against the missing scripture.db). The catch-all try/catch swallowed those errors silently but left stores in partial-hydration state — which could manifest as a crash later.

Now: on needs_download, app short-circuits to DbDownloadScreen immediately. Full hydration runs ONCE — either during normal init (when DB was already ready) OR inside the post-download onComplete callback (after first-launch download). Extracted to hydrateAppState() at module scope so both paths share the same sequence.

What's NOT included from #1531

Three commits from PR #1531 were held back to keep this PR's scope reviewable:

  • c367db92 — reuse live DB handle in getInstalledVersion() (legit optimisation but doesn't address first-launch crash)
  • 5b0c070d — close live DB before OTA swaps
  • bf17eaee — reopen DB after swap succeeds/fails

The last two together are correct but introduce DB lifecycle complexity. Every caller of getDb() becomes a potential stale-reference bug if anything caches the reference across an update. Worth doing — in a follow-up PR with time for proper review. They address the OTA update path, not the first-launch crash we're chasing.

The hypothesis test

If build 20 launches cleanly → one or more of commits 2-4 fixed it. Diagnostic never fires.

If build 20 still crashes → the diagnostic captures the actual JS error and the next launch shows a native Alert. We finally see what's throwing. At that point we know whether to chase the OTA path, native-layer issues, or something else entirely.

Either way: one build, definitive answer.

Files changed

  • app/src/utils/crashHandler.ts (new, ~150 LOC)
  • app/src/services/ContentUpdater.ts (native download path, integrity_check verification)
  • app/src/screens/DbDownloadScreen.tsx (async-aware onComplete, surfaces errors)
  • app/App.tsx (crash-handler call, hoisted hydrateAppState, correct status gating, full hydration in post-download)
  • app/index.ts (crash handler install)
  • app/__tests__/services/ContentUpdater.test.ts (PRAGMA integrity_check mock chain added to 6 tests)

Verification

  • Typecheck: clean
  • Lint: 0 errors (7 pre-existing/intentional warnings)
  • ContentUpdater tests: 36 pass (including 3 chunked-write regression tests)
  • Broader suite: 30 suites / 449 tests pass (DB, stores, services)

Supersedes

After crash is fixed

Revert crashHandler.ts and its two call sites. The other three changes (native download, hydration ordering, onComplete hardening) stay — they're improvements regardless.

## Why
Builds 15/16/18 crash ~15s after launch with a signature that shows
the terminate path (RCTFatal → RCTExceptionsManager.reportException
→ void TurboModule NSException → abort) but NOT the originating JS
error. Sentry is installed but has captured zero events from
production iOS across all crashing builds — the 'smoke test' event
visible in the Sentry dashboard was a manual DSN-verification from
outside the app, not real telemetry.

PR #1525's startup-probe approach failed (reverted in #1528): it
gated the render tree and produced a worse symptom (black screen,
no crash log) than the crash it was trying to diagnose.

## Approach
Hook React Native's ErrorUtils.setGlobalHandler and write a tiny
JSON blob to disk on any JS error, upstream of ExceptionsManager.
On next launch, display via native Alert.alert.

ErrorUtils is installed by React Native's JS bootstrap before any
module loads — see Libraries/vendor/core/ErrorUtils.js comment on
line 41: 'this ErrorUtils must be defined (and the handler set)
globally before requiring anything.' So installing from the first
import in index.ts captures errors during module resolution too.

## Why this won't repeat the probe mistake
- Does NOT wrap App or any other component
- Does NOT gate the render tree
- Does NOT do any async work before first paint
- Display via native Alert.alert (renders on top of any screen)
- fire-and-forget from existing useEffect, no new render paths

## Chain design
When a JS error fires:
  1. Our handler runs first (captures error to disk, sync + small)
  2. We call the prior handler (RN default → ExceptionsManager)
  3. ExceptionsManager does whatever it does (likely what causes
     the NSException crash in the first place)
  4. Process dies — but our file is already on disk

On next launch: App.tsx init useEffect fires displayLastCrashIfAny()
in parallel with init, which pops an Alert if a file exists.

## What this build will tell us
Three possible outcomes:
  - App crashes, next launch shows Alert with JS error → we know
    the cause, fix it, done
  - App crashes, next launch shows no Alert → error is happening
    before our handler installs (i.e. during crashHandler's own
    require graph resolution). Rare but possible.
  - App doesn't crash → something incidentally fixed it, or the
    bug is heisen-style

## Files
- app/src/utils/crashHandler.ts (new, ~150 LOC)
- app/index.ts (+2 lines, 1 new import block)
- app/App.tsx (+1 import, +5 lines in existing init useEffect)

## Verified
- typecheck: clean
- lint: 0 errors, 7 pre-existing warnings + 1 intentional import
  ordering warning (crashHandler must load first)
- tests: 18 suites / 326 tests pass (DB, stores, ContentUpdater)

## Remove when done
This is a diagnostic module. Once we know the JS error and fix it,
revert this commit.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 19, 2026

Test Results

✅ All tests passed

Passed Failed Total
Tests ✅ 3432 ❌ 0 3432
Suites ✅ 465 ❌ 0 465

Coverage

Statements Branches Functions Lines

⏱️ Duration: 84.1s

## Bug
App.tsx's onComplete callback for DbDownloadScreen previously:

  try { await initDatabase(); }
  catch (e) { console.error('Post-download init error:', e); }
  setDbStatus('ready');

If initDatabase() threw, the catch logged it and then STILL transitioned
the app to 'ready'. A downstream DB read against the uninitialised DB
would then throw — and on iOS 26 that's exactly the kind of JS error
that manifests as the ExceptionsManager / NSException crash we've been
chasing on builds 15/16/18.

The return value was also ignored: even if initDatabase() returned
'needs_download' (indicating a corrupt/invalid downloaded file), the app
would still transition to 'ready'.

## Fix
- App.tsx onComplete: only transition to 'ready' when initDatabase()
  explicitly returns 'ready'. If it throws or returns 'needs_download',
  re-throw so DbDownloadScreen can display the error and offer Retry
  via its existing error UI.
- DbDownloadScreen:
  - Widen onComplete type to () => void | Promise<void>
  - Await onComplete in startDownload and catch any thrown error, routing
    it through the existing setStatus('error') + setError(...) flow.

## Relationship to the startup crash
This is a latent bug independent of the main build-18 crash, but it
could contribute to or mask that crash in some scenarios. Fixing it
eliminates one class of 'crash happens right after onComplete' as a
candidate explanation, so the diagnostic crashHandler added in this
same PR sees a cleaner picture when it fires.

## Credit
The bug was identified and the fix pattern proposed in code review.
Implementation uses the reviewer's cleaner unified try/catch structure
preserving the original error instance via .

## Verified
- typecheck clean
- 14 test suites / 246 tests pass (DB + ContentUpdater)
- No new dedicated DbDownloadScreen tests added in this PR — would
  require mocking ContentUpdater in a new helper; deferred as scope
  creep on a diagnostic branch. Typecheck enforces the interface.
@CraigBuckmaster
Copy link
Copy Markdown
Owner Author

Added commit 71e1a8a — onComplete post-download init fix

In code review, a separate bug was spotted in App.tsx's DbDownloadScreen#onComplete callback: if initDatabase() threw after download, the catch logged the error and still transitioned the app to 'ready', leading to a downstream broken-DB crash. The return value was also ignored — 'needs_download' would still set 'ready'.

Fix now on this branch:

  • App.tsx: only transition to 'ready' when initDatabase() returns 'ready'. Throw otherwise.
  • DbDownloadScreen: widen onComplete type to () => void | Promise<void>, await it, route any thrown error through the existing setStatus('error') + setError(...) flow so the user sees it and can Retry.

Relationship to the crash investigation: Latent bug independent of the main crash, but could contribute to or mask it. Fixing eliminates one candidate so the diagnostic crashHandler added in this PR captures a cleaner picture.

Tests: 14 suites / 246 tests pass (DB + ContentUpdater). No new DbDownloadScreen-specific tests — that would require mocking ContentUpdater in a new helper, scope creep on a diagnostic branch that needs to ship.

Credit: Bug identified + cleaner fix pattern proposed in review. Commit uses the reviewer's unified try/catch structure.


PR now contains 2 commits:

  • 058c01c7 — original crashHandler diagnostic
  • 71e1a8ae — onComplete post-download fix

Ready to merge + build.

@CraigBuckmaster CraigBuckmaster changed the title diagnostic: global JS error handler for startup crash investigation fix + diagnostic: avoid full-DB checksum memory spike + JS error handler Apr 19, 2026
@CraigBuckmaster CraigBuckmaster force-pushed the diagnostic/js-error-handler branch from f697b94 to 955be7f Compare April 19, 2026 22:58
@CraigBuckmaster CraigBuckmaster changed the title fix + diagnostic: avoid full-DB checksum memory spike + JS error handler fix + diagnostic: native DB download, correct startup ordering, JS error handler Apr 19, 2026
@CraigBuckmaster CraigBuckmaster merged commit 50a4f61 into master Apr 19, 2026
6 checks passed
@CraigBuckmaster CraigBuckmaster deleted the diagnostic/js-error-handler branch April 19, 2026 23:01
CraigBuckmaster pushed a commit that referenced this pull request Apr 20, 2026
…S5 teardown crash)

## The crash
TestFlight 1.0.7(20) segfaulted 3.5s after launch — EXC_BAD_ACCESS
on Thread 7, triggered by:

  exsqlite3_finalize
  sqlite3Fts5IndexClose
  fts5DisconnectMethod
  sqlite3VtabUnlock
  disconnectAllVtab
  sqlite3Close
  SQLiteModule.closeDatabase (expo-sqlite iOS, SQLiteModule.swift:489)

The call chain: SDK 54 expo-sqlite's closeAsync → sqlite3Close →
disconnectAllVtab → iterates every FTS5 vtab → fts5DisconnectMethod
→ fts5FreeVtab → sqlite3Fts5IndexClose → exsqlite3_finalize on a
stale/freed statement pointer → crash.

This is a teardown-path bug in FTS5 virtual-table cleanup that
manifests specifically when:
  1. The DB has FTS5 tables (scripture.db has them for verse/people search)
  2. PRAGMA integrity_check was run on the connection (which loads
     and validates every FTS5 index, leaving them in a state that
     doesn't clean up correctly)
  3. closeAsync is called on that connection

Our full-DB download verification path added by #1529 (cherry-pick
from #1531 commit 1) did exactly that: open temp DB, run content_hash
check, run integrity_check, close. The integrity_check was the
trigger.

## The fix
Remove the PRAGMA integrity_check from the full-DB post-download
verification path. Keep the content_hash check — it's sufficient
validation: the hash was computed at build time over the full file
contents, baked into db_meta at build time, and uploaded with the
DB to R2. A corrupt/truncated download is overwhelmingly likely to
fail at SQLite open (exception thrown before our read) or to miss
the db_meta row entirely.

This is less rigorous than integrity_check but avoids the teardown
crash. TODO: re-add proper integrity verification once expo-sqlite's
FTS5 teardown bug is resolved upstream, or migrate to streaming
SHA-256 during the chunked write.

## Scope
INTENTIONALLY LIMITED. The delta-apply path in applyDelta() has
the same latent PRAGMA integrity_check → close pattern (~line 234).
Left in place because:
  - Delta updates don't fire for first-launch users (the crash we're
    chasing is build-20 first-launch)
  - Delta integrity_check runs AFTER transactional SQL mutation, so
    it's more valuable as a corruption check than the post-download
    one was
  - Touching it expands scope beyond the confirmed crash site
  - If it ever DOES crash for a delta user, same fix applies; we'll
    know when we see the crash log

## Tests
- Removed 'returns failed when integrity_check fails after download'
  (asserted behavior that no longer exists)
- Removed 5 dead integrity_check mocks from full-DB tests where they
  were set up but would never be consumed now
- Delta-path tests untouched (still exercise integrity_check)
- 35 ContentUpdater tests pass (was 36, minus 1 removed)
- tsc clean, lint 0 errors

## Build budget
Build 21 will be the 2nd build on master since #1529 shipped. With
no other changes, failure mode cascade from the cascade in #1529:
  - If Alert shows on relaunch → a JS error got past the fixes.
    Handler captured it. We see it next launch.
  - If app reaches home screen → all three crash causes fixed.
    We revert the diagnostic and ship.
  - If app still crashes silently → exceptionally rare; a different
    native-layer issue we haven't hit yet.

## References
- Crash incident: 8C94684A-D8BE-45F5-B9A7-412450196AC4
- iOS Analytics log: CompanionStudy-2026-04-19-..-1.0.7(20)
- Supersedes PR #1533 (which kept the integrity_check call in place)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant