fix(desktop): actually close relay sockets — plugin:websocket|disconnect does not exist#1481
Conversation
…ect does not exist tauri-plugin-websocket registers only `connect` and `send`, but the relay clients invoked `plugin:websocket|disconnect` to tear down sockets. Every call rejected with "command not found" and was swallowed, so no socket was ever closed: each workspace switch leaked the previous authenticated relay connection, the workspace-rail unread observer leaked one socket per inactive workspace per 30s poll, and every reconnect cycle leaked one more. The zombie pile (observed: 59 sockets to one relay edge IP, growing) then triggered edge resets/throttling — the "frequent relay disconnects when switching workspaces" report. Close the socket the way the plugin's own JS API does: send a Close frame through `plugin:websocket|send`, which completes the close handshake and drops the connection from the plugin's manager. The e2e bridge mocked a working `plugin:websocket|disconnect`, so tests passed while production silently no-opped. Remove the mock handler and add a regression test that fails if any source file ever invokes the nonexistent command again. Co-authored-by: Brain <21994759fc7a6fa6b965551d35cfd7897d262f2495467f2d78694ddcfa6a5c7e@sprout-oss.stage.blox.sqprod.co> Signed-off-by: Wes <wesbillman@users.noreply.github.com>
wpfleger96
left a comment
There was a problem hiding this comment.
Approve. Verified the root cause against the crate source (tauri-plugin-websocket 2.4.2 in Cargo.lock): src/lib.rs:291 registers generate_handler![connect, send] only — there is genuinely no disconnect command, so every plugin:websocket|disconnect invoke was rejecting and getting swallowed. Nothing was closing in production.
The Close-frame fix is correct end-to-end:
- The
{ type: "Close", data: { code, reason } }payload matches the plugin'sWebSocketMessage::Close(CloseFrame { code: u16, reason: String })deserialization (lib.rs:139-141,256-258). - The plugin drops the socket from its manager map when the close handshake completes (
lib.rs:205-207), exactly as the doc comment states. - Fire-and-forget (
void, unawaited) is well-justified:ConnectionManageris a singleMutex<HashMap>(lib.rs:72) andsendholds that lock across its.await(lib.rs:249-261), so awaiting a Close on a wedged socket could stall other sockets' sends behind the global lock.
The e2e bridge change is consistent — __BUZZ_E2E_COMMAND_PAYLOADS__ is populated at e2eBridge.ts:7338 with the {command, payload} shape the onboarding spec now polls for. The regression guard is a nice touch. Two optional nits inline; neither blocks.
| * tauri-plugin-websocket 2.4.2 registers only `connect` and `send` — there is | ||
| * no `disconnect` command, so invoking one rejects and the socket leaks. Close | ||
| * the way the plugin's own JS API does: send a Close frame; the plugin's read | ||
| * loop drops the connection when the close handshake completes. |
There was a problem hiding this comment.
Nit (🟢): The comment says the plugin "drops the connection when the close handshake completes." Worth being precise — the plugin removes the writer entry only when its read loop receives a Message::Close back (lib.rs:205-207), i.e. when the peer echoes our Close. If a relay never echoes it (half-open / dead peer), the entry lingers until the underlying TCP read stream errors out and terminates read.for_each. That's the plugin's own behavior and strictly better than the old code (which closed nothing), but "leak-free" ultimately depends on the peer or TCP layer eventually terminating the read stream, not on our send returning. Not a blocker — just flagging so the comment doesn't over-promise.
| type: "Close", | ||
| data: { code: 1000, reason }, | ||
| }, | ||
| }).then( |
There was a problem hiding this comment.
Nit (🟢): The rejection handler silently swallows all errors. That's the right default for ConnectionNotFound (socket already gone — nothing to release), but it also hides genuinely unexpected failures (e.g. a malformed payload if the plugin's schema ever changes). Consider a console.debug in the catch so it's greppable during future debugging without adding noise. Optional.
…rejections Co-authored-by: Brain <21994759fc7a6fa6b965551d35cfd7897d262f2495467f2d78694ddcfa6a5c7e@sprout-oss.stage.blox.sqprod.co> Signed-off-by: Wes <wesbillman@users.noreply.github.com>
Problem
Frequent relay disconnects when switching workspaces via the workspace rail (reported by Wes on staging).
Root cause:
tauri-plugin-websocket2.4.2 registers onlyconnectandsendcommands, but the relay clients invokedplugin:websocket|disconnectin three places to tear down sockets. Every one of those calls rejected with "command not found" and was swallowed by.catch(() => {})— no relay socket was ever actually closed in production:Observed live on a running app: 92 → 142 established TLS sockets over 65s, including 59 connections to a single relay edge IP. The zombie pile triggered CDN edge resets/throttling (
Connection reset without closing handshake,HTTP error: 302 Foundon reconnect) — the "frequent relay disconnects" symptom.Why tests never caught it: the e2e bridge mocked a working
plugin:websocket|disconnecthandler, so tests exercised a disconnect that doesn't exist in production.Fix
closeWebSocket()helper that closes the socket the way the plugin's own JS API does: send aCloseframe throughplugin:websocket|send, which completes the close handshake and drops the connection from the plugin's manager. Fire-and-forget (not awaited) so a hung socket can't block callers via the plugin's global send mutex.relayClientSession.ts×2,readOnlyRelayClient.ts).disconnecthandler and updated the onboarding e2e assertion to expect the Close frame.plugin:websocket|disconnectagain.Testing
channels.spec.ts:867,scroll-history.spec.ts:289) reproduce identically on a pristineorigin/mainworktree, i.e. pre-existing flakes unrelated to this change.