fix(opencode): prevent unbounded memory growth from stuck SSE streams #24058
autopilotgrowth wants to merge 1 commit into anomalyco:dev from
When an SSE peer goes into TCP CLOSE_WAIT, Hono's stream.onAbort may never fire on Bun and writeSSE can neither resolve nor reject on a half-closed socket. The existing AsyncQueue had no size limit, so heartbeats and Bus events accumulated at ~14 MB/sec per zombie connection (see anomalyco#22198).
Three guards:
- AsyncQueue.size + AsyncQueue.close(sentinel) so producers can detect drain failure and tear the stream down immediately.
- writeSSEWithTimeout wraps each write in a 30s race so half-closed sockets no longer block the consumer forever.
- On timeout or any write error, stop() clears the heartbeat, unsubs from the Bus, and closes the queue.
Applied to both /event and /global/event.
Fixes anomalyco#22198
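For readers following along, the 30s write race described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names writeWithTimeout and WriteTimeoutError are hypothetical, and the real writeSSEWithTimeout may differ in signature and error handling. The idea is only that a write which never settles still lets the consumer loop move on to teardown.

```typescript
// Hypothetical sketch; NOT the PR's actual writeSSEWithTimeout helper.
// WriteTimeoutError and writeWithTimeout are illustrative names.
class WriteTimeoutError extends Error {}

// Races a write against a timer. If the write never settles (for example on a
// half-closed socket), the returned promise rejects after `ms` milliseconds.
async function writeWithTimeout<T>(write: () => Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new WriteTimeoutError(`write timed out after ${ms}ms`)), ms)
  })
  try {
    // Whichever settles first wins; write errors propagate unchanged.
    return await Promise.race([write(), timeout])
  } finally {
    // Always clear the timer so a successful write leaves nothing pending.
    clearTimeout(timer)
  }
}
```

A caller would treat a rejection from this helper, whether a timeout or an ordinary write error, as the signal to run its teardown routine. Note that the race does not cancel the underlying write; it only stops the consumer from waiting on it.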
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
The following comment was made by an LLM, it may be inaccurate: Based on my search, I found one potentially related PR: Related PR:
This PR appears to address the same issue of memory leaks from zombie SSE connections. Since the current PR (#24058) fixes issue #22198 with similar symptoms (unbounded memory growth from stuck SSE streams), PR #22552 may be a related attempt to fix similar SSE connection issues and should be reviewed for potential overlap or to check whether the issue has already been resolved.
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Fixes #22198.
Problem
When an SSE peer disconnects via TCP half-close (CLOSE_WAIT), Hono's stream.onAbort never fires on Bun, and writeSSE can neither resolve nor reject on the half-closed socket. The existing AsyncQueue had no size limit, so heartbeats and Bus events accumulated at ~14 MB/sec per zombie connection (Windows desktop peaks reported at 24.5 GB).
Fix
Three independent guards so the stream tears down no matter which failure mode hits:
- AsyncQueue gets a size getter and a close(sentinel) that drops buffered items and unblocks the consumer immediately, so producers can detect drain failure and stop.
- writeSSEWithTimeout helper races each writeSSE against a 30s timeout so half-closed sockets no longer block the consumer forever.
- stop() clears the heartbeat, unsubscribes from the Bus, and closes the queue.
Applied symmetrically to /event (per-instance) and /global/event.
Verification
- packages/opencode/test/util/queue.test.ts: 8 tests covering size, close, buffered-drop, idempotency
- packages/opencode/test/util/sse.test.ts: 4 tests covering timeout, success, error propagation
- bun typecheck passes.
- bun test test/util: 200/200 pass.
- bun test test/server test/control-plane/sse.test.ts: 27/27 pass.
Repro from the issue on Windows 11:
Before: RAM grows ~14 MB/sec with stable zombie count. After: queue either caps at 10k entries or write times out at 30s; either way the stream self-terminates and memory returns to baseline.
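To make the queue-cap path concrete, here is a minimal sketch of a queue with a size getter and a close(sentinel), plus a producer-side check against a cap. It assumes a simple array-backed queue; ClosableQueue and offer are hypothetical names, and the actual AsyncQueue in packages/opencode may be structured differently. The 10k figure in the usage note comes from the repro result above.

```typescript
// Hypothetical sketch; NOT the PR's actual AsyncQueue implementation.
class ClosableQueue<T> {
  private items: T[] = []
  private waiters: Array<(item: T) => void> = []
  private closedWith: { value: T } | undefined

  // Number of buffered items not yet consumed.
  get size(): number {
    return this.items.length
  }

  push(item: T): void {
    if (this.closedWith) return // pushes after close are dropped
    const waiter = this.waiters.shift()
    if (waiter) waiter(item) // hand straight to a blocked consumer
    else this.items.push(item)
  }

  next(): Promise<T> {
    if (this.closedWith) return Promise.resolve(this.closedWith.value)
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T)
    return new Promise<T>((resolve) => this.waiters.push(resolve))
  }

  // Drops buffered items and wakes any blocked consumer with the sentinel.
  close(sentinel: T): void {
    if (this.closedWith) return // idempotent
    this.closedWith = { value: sentinel }
    this.items.length = 0
    for (const wake of this.waiters.splice(0)) wake(sentinel)
  }
}

// Producer-side guard: if the consumer has stopped draining and the backlog
// reaches `maxSize`, call `stop` instead of buffering further.
function offer<T>(queue: ClosableQueue<T>, item: T, stop: () => void, maxSize: number): boolean {
  if (queue.size >= maxSize) {
    stop()
    return false
  }
  queue.push(item)
  return true
}
```

In this sketch a heartbeat or Bus handler would call offer(queue, event, stop, 10_000), with stop clearing the heartbeat, unsubscribing, and calling queue.close(null). Once closed, the buffered items are released and the consumer loop sees the sentinel on its next read.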