Skip to content

fix(sqlite): forward worker boot-debug logs to the main thread#288

Merged
crs48 merged 2 commits into
mainfrom
claude/objective-brattain-1c1a7f
Jun 26, 2026
Merged

fix(sqlite): forward worker boot-debug logs to the main thread#288
crs48 merged 2 commits into
mainfrom
claude/objective-brattain-1c1a7f

Conversation

@crs48

@crs48 crs48 commented Jun 26, 2026

Copy link
Copy Markdown
Owner

Why

A returning user with a fully populated local cache still waits ~18–20s before any data renders. Exploration [0229] proved this is one operation monopolizing the single SQLite worker thread at boot — every landing query queues behind it and drains together (identical candidateQueryDurationMs ≈ 19765). The cache is fine; the hub is fine (connects in ~350ms).

0229 shipped the instrumentation to finally name the culprit op — a per-op queueMs/execMs tracer and one-shot db stats @ open (file size / page_count / freelist) in the SQLite worker, gated by xnet:boot:debug. But those lines are emitted via console.* inside the dedicated worker, and the in-app Logs panel only taps the main-thread console (useLogsPanel.ts). So every capture/export came back missing exactly the lines needed to localize the stall — a big reason it's taken multiple attempts.

What

  • New boot-log-bridge.ts: pure bootLogMessage(args) / readBootLogArgs(data) helpers using a dedicated discriminator key (__xnetSqliteBootLog) that can't collide with a Comlink RPC message on the same port.
  • web-worker.ts: an emitBootLog(...) helper logs to the worker console and postMessages the line to the main thread. The two boot-debug sites ([xNet] sqlite op, [xNet] db stats @ open) now route through it.
  • web-proxy.ts: the proxy listens for forwarded messages and re-emits them on the main console, where the Logs-panel tap captures them. Non-boot-log messages are left untouched for Comlink.

Gated entirely by xnet:boot:debug; no effect on normal operation. After this, enabling boot debug and reloading puts db stats @ open and the sqlite op {queueMs, execMs} lines into the in-app Logs panel and its export — so the next capture can finally name the ~19.7s op and confirm whether the file is still bloated.

Test plan

  • New boot-log-bridge.test.ts (7 cases): round-trip, arg order/types, structured-clone (postMessage) survival, ignores Comlink-shaped + malformed messages.
  • @xnetjs/sqlite: typecheck clean, full package suite green (98 passed).
  • The one full-suite failure was dashboard-runtime.test.tsx (a waitFor live-query flake, unrelated to sqlite) — passes on retry.

🤖 Generated with Claude Code

@crs48 crs48 temporarily deployed to pr-288 June 26, 2026 21:14 — with GitHub Actions Inactive
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

✓ Changelog fragment found — thanks!

@crs48 crs48 added the skip-changelog Exclude this PR from the changelog label Jun 26, 2026
The SQLite Web Worker's boot-debug diagnostics (`[xNet] sqlite op` per-op
queue/exec timing and `[xNet] db stats @ open`) were emitted only in the
dedicated worker's console, which the in-app Logs panel — a main-thread
console tap — never captures. Every boot-stall capture/export therefore came
back missing exactly the lines needed to localize the stall (exploration 0229).

The worker now postMessages each boot-debug line under a dedicated discriminator
key (no collision with Comlink RPC), and WebSQLiteProxy re-emits it on the main
console where the Logs panel captures it. Gated by `xnet:boot:debug`; no effect
on normal operation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions Bot added a commit that referenced this pull request Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Preview removed for PR #288.

@crs48 crs48 force-pushed the claude/objective-brattain-1c1a7f branch from 18836d5 to b0cd77c Compare June 26, 2026 21:20
@crs48 crs48 temporarily deployed to pr-288 June 26, 2026 21:20 — with GitHub Actions Inactive
@crs48 crs48 temporarily deployed to pr-288 June 26, 2026 21:27 — with GitHub Actions Inactive
github-actions Bot added a commit that referenced this pull request Jun 26, 2026
@crs48 crs48 merged commit 027de00 into main Jun 26, 2026
13 checks passed
@crs48 crs48 deleted the claude/objective-brattain-1c1a7f branch June 26, 2026 21:34
github-actions Bot added a commit that referenced this pull request Jun 26, 2026
crs48 added a commit that referenced this pull request Jun 26, 2026
Implements **Wave 1** of exploration
[0233](docs/explorations/0233_[_]_THE_15_SECOND_COLD_FIRST_QUERY_OPFS_PAGE_IN_AND_DB_BLOAT.md)
— the cheap, safe cold-read mitigations. The boot-debug trace (enabled
by #288) finally named the culprit in one line:

```
[xNet] sqlite op query {"lane":"interactive","queueMs":0,"execMs":15858}
```

One landing query, **15.8 s of real execution**, `queueMs:0` — and every
op after it `execMs:0`. That's a cold OPFS page-in of the first query's
working set on the single SQLite worker; the 256 MB page cache only
helps *re-reads*, so it never touched the first read. The file is
large/fragmented (a 424k-row append-only change log + 1.3M-row scalar
index grown interleaved), so the cold read faults pages scattered across
it.

## What's in this PR

- **B1 — `PRAGMA mmap_size`**
([web.ts](packages/sqlite/src/adapters/web.ts)): memory-map reads so the
first cold query faults via the OS instead of thousands of synchronous 8
KiB reads. Guarded — a no-op where `opfs-sahpool` doesn't support it.
This is the lever for the first read.
- **A3 — one-time idle `VACUUM`**
([db-vacuum.ts](apps/web/src/lib/db-vacuum.ts)): defragments the OPFS
file so subsequent cold boots fault a smaller, denser working set.
Mirrors the 0229 presence cleanup (one-shot, localStorage-gated,
idle-scheduled, off the critical path, never throws). Logs file size
before/after — which also captures the `db stats` measurement the doc
asked for.
- **C2 — defer the prewarm fan-out**
([WorkingSetPrewarm.tsx](apps/web/src/components/WorkingSetPrewarm.tsx)):
mounts the five prewarm subscriptions one animation frame late so the
active route's own query reaches the worker first and is the one that
pays the cold read and paints.

## Deliberately deferred (separate, focused PRs)

The doc is a multi-wave plan that **gates later waves on measurement**
and touches the protocol kernel — implementing them blind here would be
reckless:

- **Capture `db stats`** / **instrument the ~4.4 s secondary gap** —
runtime observations; A3 now logs the file size, and the rest needs a
live capture against the user's OPFS data.
- **C1 instant-shell snapshot** — a new persistence + write-through +
render path; its own feature.
- **A1 change-log compaction** + **A2 split `changes` DB** — `changes`
is the signed, hash-chained protocol kernel (0200, golden vectors).
Pruning it needs a spec + cross-impl test updates.
- **D1 WASM stream-compile** (~1 s, secondary) and **D2 hub redeploy**
(ops; clears the orthogonal `INVALID_HASH`).

The 0233 doc stays `[_]` (3/10 implementation items done) — Wave 1 lands
here; the architectural waves follow with measurements in hand.

## Test plan

- New `db-vacuum.test.ts` (3 cases: opfs vacuum+latch,
memory-skip-no-latch, flag no-op).
- `@xnetjs/sqlite` + `apps/web`: typecheck clean, full suites green (677
tests in touched areas).
- Pre-push full-suite "failures" were a `core.bare` flip from the hook's
own `pnpm install` breaking devkit git tests — all pass with healthy
git; none are in touched code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-changelog Exclude this PR from the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant