fix(sqlite): forward worker boot-debug logs to the main thread#288
Merged
Conversation
Contributor
|
✓ Changelog fragment found — thanks! |
The SQLite Web Worker's boot-debug diagnostics (`[xNet] sqlite op` per-op queue/exec timing and `[xNet] db stats @ open`) were emitted only in the dedicated worker's console, which the in-app Logs panel — a main-thread console tap — never captures. Every boot-stall capture/export therefore came back missing exactly the lines needed to localize the stall (exploration 0229). The worker now postMessages each boot-debug line under a dedicated discriminator key (no collision with Comlink RPC), and WebSQLiteProxy re-emits it on the main console where the Logs panel captures it. Gated by `xnet:boot:debug`; no effect on normal operation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
|
Preview removed for PR #288. |
18836d5 to
b0cd77c
Compare
crs48
added a commit
that referenced
this pull request
Jun 26, 2026
Implements **Wave 1** of exploration [0233](docs/explorations/0233_[_]_THE_15_SECOND_COLD_FIRST_QUERY_OPFS_PAGE_IN_AND_DB_BLOAT.md) — the cheap, safe cold-read mitigations. The boot-debug trace (enabled by #288) finally named the culprit in one line: ``` [xNet] sqlite op query {"lane":"interactive","queueMs":0,"execMs":15858} ``` One landing query, **15.8 s of real execution**, `queueMs:0` — and every op after it `execMs:0`. That's a cold OPFS page-in of the first query's working set on the single SQLite worker; the 256 MB page cache only helps *re-reads*, so it never touched the first read. The file is large/fragmented (a 424k-row append-only change log + 1.3M-row scalar index grown interleaved), so the cold read faults pages scattered across it. ## What's in this PR - **B1 — `PRAGMA mmap_size`** ([web.ts](packages/sqlite/src/adapters/web.ts)): memory-map reads so the first cold query faults via the OS instead of thousands of synchronous 8 KiB reads. Guarded — a no-op where `opfs-sahpool` doesn't support it. This is the lever for the first read. - **A3 — one-time idle `VACUUM`** ([db-vacuum.ts](apps/web/src/lib/db-vacuum.ts)): defragments the OPFS file so subsequent cold boots fault a smaller, denser working set. Mirrors the 0229 presence cleanup (one-shot, localStorage-gated, idle-scheduled, off the critical path, never throws). Logs file size before/after — which also captures the `db stats` measurement the doc asked for. - **C2 — defer the prewarm fan-out** ([WorkingSetPrewarm.tsx](apps/web/src/components/WorkingSetPrewarm.tsx)): mounts the five prewarm subscriptions one animation frame late so the active route's own query reaches the worker first and is the one that pays the cold read and paints. ## Deliberately deferred (separate, focused PRs) The doc is a multi-wave plan that **gates later waves on measurement** and touches the protocol kernel — implementing them blind here would be reckless: - **Capture `db stats`** / **instrument the ~4.4 s secondary gap** — runtime observations; A3 now logs the file size, and the rest needs a live capture against the user's OPFS data. - **C1 instant-shell snapshot** — a new persistence + write-through + render path; its own feature. - **A1 change-log compaction** + **A2 split `changes` DB** — `changes` is the signed, hash-chained protocol kernel (0200, golden vectors). Pruning it needs a spec + cross-impl test updates. - **D1 WASM stream-compile** (~1 s, secondary) and **D2 hub redeploy** (ops; clears the orthogonal `INVALID_HASH`). The 0233 doc stays `[_]` (3/10 implementation items done) — Wave 1 lands here; the architectural waves follow with measurements in hand. ## Test plan - New `db-vacuum.test.ts` (3 cases: opfs vacuum+latch, memory-skip-no-latch, flag no-op). - `@xnetjs/sqlite` + `apps/web`: typecheck clean, full suites green (677 tests in touched areas). - Pre-push full-suite "failures" were a `core.bare` flip from the hook's own `pnpm install` breaking devkit git tests — all pass with healthy git; none are in touched code. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
A returning user with a fully populated local cache still waits ~18–20s before any data renders. Exploration [0229] proved this is one operation monopolizing the single SQLite worker thread at boot — every landing query queues behind it and drains together (identical
candidateQueryDurationMs ≈ 19765). The cache is fine; the hub is fine (connects in ~350ms).0229 shipped the instrumentation to finally name the culprit op — a per-op
queueMs/execMstracer and one-shotdb stats @ open(file size / page_count / freelist) in the SQLite worker, gated byxnet:boot:debug. But those lines are emitted viaconsole.*inside the dedicated worker, and the in-app Logs panel only taps the main-thread console (useLogsPanel.ts). So every capture/export came back missing exactly the lines needed to localize the stall — a big reason it's taken multiple attempts.What
boot-log-bridge.ts: purebootLogMessage(args)/readBootLogArgs(data)helpers using a dedicated discriminator key (__xnetSqliteBootLog) that can't collide with a Comlink RPC message on the same port.web-worker.ts: anemitBootLog(...)helper logs to the worker console andpostMessages the line to the main thread. The two boot-debug sites ([xNet] sqlite op,[xNet] db stats @ open) now route through it.web-proxy.ts: the proxy listens for forwarded messages and re-emits them on the main console, where the Logs-panel tap captures them. Non-boot-log messages are left untouched for Comlink.Gated entirely by
xnet:boot:debug; no effect on normal operation. After this, enabling boot debug and reloading putsdb stats @ openand thesqlite op {queueMs, execMs}lines into the in-app Logs panel and its export — so the next capture can finally name the ~19.7s op and confirm whether the file is still bloated.Test plan
boot-log-bridge.test.ts(7 cases): round-trip, arg order/types, structured-clone (postMessage) survival, ignores Comlink-shaped + malformed messages.@xnetjs/sqlite: typecheck clean, full package suite green (98 passed).dashboard-runtime.test.tsx(awaitForlive-query flake, unrelated to sqlite) — passes on retry.🤖 Generated with Claude Code