fix(sync): eliminate the 18s blank-page boot stall + registry persistence (0227) #278
Merged
crs48 merged 7 commits on Jun 26, 2026
…orker head-of-line blocking) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iming + large-blob guardrail Presence docs (presence-*) are now never cold-loaded from yjs_state nor persisted back, removing the 18s presence-doc read that head-of-line blocked every landing query on the single SQLite worker at boot (exploration 0227). Adds a boot-debug loadDoc timing log, a >5MiB blob guardrail, and a one-shot xnet:docpool:first-acquire performance mark for the boot timeline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
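A minimal sketch of the ephemeral-doc skip and the large-blob guardrail described in this commit. Only the field names `isEphemeral` and `largeDocWarnBytes` come from the PR description; their types, the `presence-` prefix check, `loadDocSketch`, and the `ReadBlob` callback are assumptions made for illustration, not the actual `node-pool.ts` code.

```typescript
// Sketch only: field names are from the PR text, everything else is assumed.
interface NodePoolConfigSketch {
  isEphemeral: (docId: string) => boolean
  largeDocWarnBytes: number
}

// Hypothetical stand-in for the storage read that would hit yjs_state.
type ReadBlob = (docId: string) => Uint8Array | null

const sketchConfig: NodePoolConfigSketch = {
  isEphemeral: (docId) => docId.startsWith('presence-'),
  largeDocWarnBytes: 5 * 1024 * 1024 // 5 MiB
}

function loadDocSketch(
  docId: string,
  readBlob: ReadBlob,
  warn: (message: string) => void,
  config: NodePoolConfigSketch = sketchConfig
): Uint8Array | null {
  // Ephemeral docs start empty in memory and never enqueue a storage read.
  if (config.isEphemeral(docId)) return null

  const blob = readBlob(docId)
  // Guardrail: flag blobs above the threshold so unbounded growth is visible.
  if (blob !== null && blob.byteLength > config.largeDocWarnBytes) {
    warn(`doc ${docId} blob is ${blob.byteLength} bytes (> ${config.largeDocWarnBytes})`)
  }
  return blob
}
```

Because the ephemeral check runs before the read is issued, a presence doc never occupies a slot on the storage worker's queue in this sketch.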
Joins the workspace presence room on requestIdleCallback instead of during the initial render burst, so presence-doc warming never competes with the landing read queries on the single SQLite worker (exploration 0227). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
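One way the idle-time join could look, as a sketch. `joinWhenIdle`, its `timeoutMs` parameter, and the `setTimeout` fallback are hypothetical and not taken from `CommsContext.tsx`; only the use of `requestIdleCallback` is from the commit message.

```typescript
// Hypothetical helper: defers `join` until the browser reports idle time.
type Cancel = () => void

function joinWhenIdle(join: () => void, timeoutMs = 2000): Cancel {
  const g = globalThis as unknown as {
    requestIdleCallback?: (cb: () => void, opts?: { timeout: number }) => number
    cancelIdleCallback?: (handle: number) => void
  }

  if (typeof g.requestIdleCallback === 'function') {
    // `timeout` bounds the wait so the join still happens on a busy main thread.
    const handle = g.requestIdleCallback(() => join(), { timeout: timeoutMs })
    return () => g.cancelIdleCallback?.(handle)
  }

  // Fallback where requestIdleCallback is unavailable: yield at least one task.
  const timer = setTimeout(join, 0)
  return () => clearTimeout(timer)
}
```

Returning a cancel function lets a caller (for example a React effect cleanup) drop the pending join if the component unmounts first.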
…contention Splits the first doc-warm out of the connect window: BootTimelineProbe observes the runtime's xnet:docpool:first-acquire mark and records a docwarm phase (store:ready -> first doc acquired), so a storage stall is attributed to storage rather than mislabelled as network connect time (exploration 0227). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
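The attribution this commit describes can be sketched as a pure function over mark timestamps. The `MarkTimes` shape, the `splitBootPhases` name, and the clamping behaviour are assumptions for illustration; in a browser the timestamps would presumably come from entries such as the `xnet:docpool:first-acquire` mark.

```typescript
// Timestamps in ms. Field names are hypothetical; only the idea of a
// store:ready -> first-acquire docwarm phase comes from the commit message.
interface MarkTimes {
  storeReady: number
  firstAcquire?: number // absent if the mark was never observed
  connected: number
}

interface PhaseDurations {
  docwarm: number
  connect: number
}

function splitBootPhases(t: MarkTimes): PhaseDurations {
  // Without the first-acquire mark the whole window is labelled connect.
  if (t.firstAcquire === undefined) {
    return { docwarm: 0, connect: t.connected - t.storeReady }
  }
  // Clamp into the window so an out-of-order mark cannot yield negative phases.
  const acquire = Math.min(Math.max(t.firstAcquire, t.storeReady), t.connected)
  return { docwarm: acquire - t.storeReady, connect: t.connected - acquire }
}
```

With illustrative values `storeReady: 100`, `firstAcquire: 18500`, `connected: 18700`, the sketch reports 18400 ms of `docwarm` and 200 ms of `connect`, instead of 18600 ms of `connect`.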
The registry stored its tracked-node set under the synthetic key _xnet_tracked_nodes via setDocumentContent, which writes yjs_state and violates its node_id -> nodes(id) foreign key (SQLITE_CONSTRAINT_FOREIGNKEY 787) — so the registry silently never persisted. Adds FK-free getAppState/setAppState (backed by sync_state) to the storage adapter and routes the registry through it (exploration 0227). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
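To make the failure mode concrete, here is an in-memory model of the two storage paths. It is not the real SQLite adapter: the table and method names follow the commit message, but the signatures, the `StorageModel` class, and the error type are assumptions.

```typescript
// In-memory model only; the real adapter talks to SQLite.
class ForeignKeyError extends Error {}

class StorageModel {
  private nodes = new Set<string>()
  private yjsState = new Map<string, Uint8Array>() // node_id must exist in nodes
  private syncState = new Map<string, string>() // plain key/value, no FK

  addNode(id: string): void {
    this.nodes.add(id)
  }

  setDocumentContent(nodeId: string, content: Uint8Array): void {
    // Models the yjs_state.node_id -> nodes(id) foreign key.
    if (!this.nodes.has(nodeId)) {
      throw new ForeignKeyError(`FOREIGN KEY constraint failed for node_id=${nodeId}`)
    }
    this.yjsState.set(nodeId, content)
  }

  setAppState(key: string, value: string): void {
    this.syncState.set(key, value)
  }

  getAppState(key: string): string | null {
    return this.syncState.get(key) ?? null
  }
}

const TRACKED_KEY = '_xnet_tracked_nodes'

// Old path: the synthetic key is not a node, so the write fails; swallowing
// the error is what makes the loss silent.
function persistTrackedOld(store: StorageModel, ids: string[]): boolean {
  try {
    store.setDocumentContent(TRACKED_KEY, new TextEncoder().encode(JSON.stringify(ids)))
    return true
  } catch {
    return false
  }
}

// New path: app state has no foreign key, so the synthetic key is fine.
function persistTrackedNew(store: StorageModel, ids: string[]): void {
  store.setAppState(TRACKED_KEY, JSON.stringify(ids))
}
```

In this model the old path returns `false` for every call, while the new path round-trips the tracked set through `getAppState`.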
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…l-sqlite-worker-head-of-line-blocking

Implements exploration 0227: the ~18s blank-page cold-start stall, plus two latent sync bugs surfaced in the same logs.
Root cause
Every storage op (landing read queries and Yjs document I/O) goes through one FIFO SQLite worker. At boot the workspace presence doc (`presence-main`) was cold-loaded first; that one read sat at the head of the queue and head-of-line blocked every landing query for ~18s (all queries reported ~18403ms yet drained within a 56ms burst, which points to queue wait, not SQL time). Presence is `gc:false` and persisted on every tick, so its `yjs_state` blob grew without bound.

What landed (code-complete: 6/8 checklist items)
- `node-pool.ts`: `presence-*` docs are never cold-loaded from `yjs_state` nor persisted back. Removes the head-of-line read at its source and bounds the blob. New `NodePoolConfig.isEphemeral` / `largeDocWarnBytes`.
- `CommsContext.tsx`: workspace presence joins on `requestIdleCallback`, so landing reads paint first.
- `node-pool.ts`: boot-debug `loadDoc` timing log + a >5MiB blob warning tripwire.
- `boot-timeline.ts` / `BootTimelineProbe.tsx`: a new `docwarm` phase (observes the runtime's `xnet:docpool:first-acquire` mark) so a storage stall is attributed to storage, not mislabelled as network `connect` time.
- `sqlite-adapter.ts`, `types.ts`, `sync-manager.ts`: the registry persisted its tracked-node set under the synthetic key `_xnet_tracked_nodes` via `setDocumentContent`, hitting `yjs_state`'s `node_id → nodes(id)` FK (`SQLITE_CONSTRAINT_FOREIGNKEY` 787), so it silently never persisted. Adds FK-free `getAppState`/`setAppState` (backed by `sync_state`) and routes the registry through it.

Tests
New: `node-pool.test.ts` (5), `boot-timeline` docwarm (2), sqlite-adapter `app state` (4). All touched suites green; `data`/`runtime`/`apps/web` typecheck clean.

Deliberately out of scope (the other 2 checklist items)
- `INVALID_HASH` flood is the tenant hub running an incompatible `@xnetjs/sync` (the client circuit breaker already handles it correctly). That's an ops/redeploy action, not a repo change.
- Validation items in the doc are field/manual QA (real cold-boot timing, throttled CPU, hub redeploy) pending a deployed build.
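For readers unfamiliar with head-of-line blocking, the queueing effect described under Root cause can be sketched with a toy single-worker FIFO. The service times below are illustrative values, not measurements from the exploration, and `runFifo` is a hypothetical helper rather than anything in the repo.

```typescript
interface Job {
  name: string
  serviceMs: number
}

interface Completion {
  name: string
  finishedAtMs: number
}

// All jobs are enqueued at t=0 and served strictly in order by one worker,
// so each job's observed latency equals its finish time.
function runFifo(jobs: Job[]): Completion[] {
  let clock = 0
  return jobs.map((job) => {
    clock += job.serviceMs
    return { name: job.name, finishedAtMs: clock }
  })
}

// Illustrative numbers only.
const presenceRead: Job = { name: 'load presence-main', serviceMs: 18000 }
const landingQueries: Job[] = [
  { name: 'landing query A', serviceMs: 15 },
  { name: 'landing query B', serviceMs: 20 },
  { name: 'landing query C', serviceMs: 25 }
]

// Presence read at the head: every query waits behind it.
const blocked = runFifo([presenceRead, ...landingQueries])
// Presence read removed from the queue: queries finish in tens of ms.
const unblocked = runFifo(landingQueries)
```

In the blocked run the three queries finish at 18015, 18035 and 18060 ms, all within a 60 ms window after the head job completes; in the unblocked run they finish at 15, 35 and 60 ms. Near-identical large latencies that drain in a short burst are the signature of queue wait rather than slow SQL.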
🤖 Generated with Claude Code