fix(edge,shareddoc): ws auth close via message listener + flush save deadlock#157
Merged
Conversation
- wsAuthFailureResponse now defers close() until the client sends its first message, preventing the CF Workers "Network connection lost." exception that occurred when close() was called before the 101 response was fully established. - flushSave no longer returns early when a save is in-flight; external callers now correctly await savingPromise. The deadlock that could occur when persistence.update → closeConn → flushSave re-entered the save is prevented by skipping flushSave in closeConn when doc.saving is true. - Reduced log noise for expected auth failures in persistence.update: 401 → console.warn (message only), 403 → console.log (message only), other errors → console.error (full Error object). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5 tasks
- wsAuthFailureResponse: arm a 5s safety timeout so clients that never send a first message still get closed; clear it from message/error/close listeners. - closeConn: replace ydoc.saving duplicate-state guard with an isReentrant flag passed explicitly from persistence.update's closeAll loop, eliminating the dual-state drift risk and making the intent explicit at the call site. - persistence.update error logging: guard err?.message?.startsWith against non-Error throws; unify 401 and 403 to console.warn with full Error object; keep console.error for all other failures. - Tests: extract testWsUpgradeAuthFailure helper; assert triggerMessage is defined before calling; add safety-timeout test; add re-entrancy deadlock regression test; update log-level assertions to expect Error object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
401 → console.warn (message only), 403 → console.log (message only), other errors → console.error (full Error). Uses safe optional chaining to guard against non-Error throws. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator
|
🎉 This PR is included in version 1.5.3 🎉 The release is available on:
Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow up of #156 and #149.
The
Network connectivity Errorswas first to the closing of the server which was too early and then, when reverted, the deadlock introduced by the flushSave guard (if 401 or 403 during a save, close connections, which triggers a flushSave, which waits for the initial pending save).WS auth close:
wsAuthFailureResponsenow defersclose()until the client sends its first message (viaaddEventListener('message', ...)). Callingclose()immediately afteraccept()— before the 101 response is established — caused a CF Workers "Network connection lost." runtime exception.flushSave deadlock: The
if (saving) { return; }guard influshSavewas too broad — it caused external callers (e.g. flush-on-disconnect, flush-request messages) to return immediately instead of waiting for the in-flight save. Fixed by removing the guard fromflushSave(external callers now correctlyawait savingPromise) and instead skippingflushSaveinsidecloseConnwhendoc.savingis true, which is the only path that could deadlock (persistence.update → closeConn → flushSave).Log noise: Reduced severity for expected auth failures in
persistence.update: 401 →console.warn(message only), 403 →console.log(message only), other errors →console.error(full Error object).Test plan
npm test— 126 tests passingflushSavewaits for in-flight PUT before resolving🤖 Generated with Claude Code