fix(opencode): prevent unbounded memory growth from stuck SSE streams #24058

Closed
autopilotgrowth wants to merge 1 commit into anomalyco:dev from autopilotgrowth:fix/sse-close-wait

@autopilotgrowth

Fixes #22198.

Problem

When an SSE peer disconnects via TCP half-close (leaving the server socket in CLOSE_WAIT), Hono's stream.onAbort never fires on Bun, and the pending writeSSE promise neither resolves nor rejects on the half-closed socket. The existing AsyncQueue had no size limit, so heartbeats and Bus events accumulated at ~14 MB/sec per zombie connection (peaks of 24.5 GB were reported on Windows desktop).

Fix

Three independent guards so the stream tears down no matter which failure mode hits:

  • AsyncQueue gets a size getter and a close(sentinel) that drops buffered items and unblocks the consumer immediately, so producers can detect drain failure and stop.
  • A shared writeSSEWithTimeout helper races each writeSSE against a 30s timeout so half-closed sockets no longer block the consumer forever.
  • On timeout or any write error, the existing stop() clears the heartbeat, unsubscribes from the Bus, and closes the queue.
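
The first guard can be illustrated with a minimal sketch of a queue that exposes size and close(sentinel). This is not the PR's actual AsyncQueue: the field names, the push/next method names, and the choice to ignore pushes after close are assumptions made for illustration.

```typescript
// Hypothetical sketch of a closable async queue; the real AsyncQueue may differ.
class AsyncQueue<T> {
  private items: T[] = []
  private waiters: Array<(value: T) => void> = []
  private closed = false
  private sentinel: T | undefined

  // Number of buffered items that no consumer has taken yet.
  get size(): number {
    return this.items.length
  }

  // Hand the item to a waiting consumer, or buffer it.
  // Assumption: pushes after close() are silently ignored.
  push(item: T): void {
    if (this.closed) return
    const waiter = this.waiters.shift()
    if (waiter) waiter(item)
    else this.items.push(item)
  }

  // Resolve with the next item; after close() always resolve with the sentinel.
  next(): Promise<T> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T)
    if (this.closed) return Promise.resolve(this.sentinel as T)
    return new Promise<T>((resolve) => this.waiters.push(resolve))
  }

  // Drop everything buffered and unblock any waiting consumer with the sentinel.
  // Calling close() more than once has no further effect.
  close(sentinel: T): void {
    if (this.closed) return
    this.closed = true
    this.sentinel = sentinel
    this.items.length = 0
    for (const waiter of this.waiters.splice(0)) waiter(sentinel)
  }
}
```

A consumer loop under these assumptions would call next() repeatedly and exit as soon as it receives the sentinel, while a producer can read size before each push to notice that nothing is draining.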

Applied symmetrically to /event (per-instance) and /global/event.
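
The second guard, racing each write against a timer, might look roughly like the sketch below. The helper name and the 30s default come from the description above; the SSEWriter interface is a stand-in that mirrors the shape of Hono's writeSSE, and the error message and parameter order are assumptions rather than the PR's actual code.

```typescript
// Minimal stand-in for the stream object; only the writeSSE shape is assumed.
interface SSEWriter {
  writeSSE(message: { data: string; event?: string; id?: string }): Promise<void>
}

// Hypothetical sketch: reject if a single write does not settle within timeoutMs.
async function writeSSEWithTimeout(
  stream: SSEWriter,
  message: { data: string; event?: string; id?: string },
  timeoutMs: number = 30_000,
): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`SSE write timed out after ${timeoutMs}ms`)),
      timeoutMs,
    )
  })
  try {
    // Whichever settles first wins; a write error propagates unchanged.
    await Promise.race([stream.writeSSE(message), timeout])
  } finally {
    // Always clear the timer so completed writes do not leave timers behind.
    clearTimeout(timer)
  }
}
```

Note that Promise.race does not cancel the stuck write; it only unblocks the caller, which is why the teardown step still has to close the queue and release the other resources.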

Verification

  • packages/opencode/test/util/queue.test.ts — 8 tests covering size, close, buffered-drop, idempotency
  • packages/opencode/test/util/sse.test.ts — 4 tests covering timeout, success, error propagation
  • Full bun typecheck passes. bun test test/util 200/200 pass. bun test test/server test/control-plane/sse.test.ts 27/27 pass.

Repro from the issue on Windows 11:

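# Look up the opencode-cli process ID (assumes a single running instance).
$id = (Get-Process -Name 'opencode-cli').Id
# Keep only netstat rows whose trailing PID column matches, not rows that merely contain the digits.
$conns = netstat -ano | Select-String "\s$id\s*$"
# Report working-set size plus counts of live (ESTABLISHED) and zombie (CLOSE_WAIT) connections.
"RAM: $([math]::Round((Get-Process -Id $id).WorkingSet64/1GB,2))GB | Alive: $(($conns|Select-String ESTABLISHED).Count) | Zombie: $(($conns|Select-String CLOSE_WAIT).Count)"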

Before: RAM grows ~14 MB/sec while the zombie count stays stable. After: either the queue caps at 10k entries or the write times out after 30s; either way the stream self-terminates and memory returns to baseline.
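
The self-termination path can be sketched as a producer-side guard that checks the queue size before each push and runs an idempotent stop() once the cap is hit. The 10k cap comes from the text above; createStreamGuard, the option names, and the null sentinel are hypothetical and only illustrate the teardown order (heartbeat, Bus subscription, queue).

```typescript
// Only the queue surface used by the guard; mirrors size and close(sentinel).
interface ClosableQueue<T> {
  readonly size: number
  push(item: T): void
  close(sentinel: T): void
}

// Hypothetical option bag; names are assumptions for this sketch.
interface GuardOptions<T> {
  queue: ClosableQueue<T | null>
  clearHeartbeat: () => void // e.g. wraps clearInterval for the heartbeat timer
  unsubscribe: () => void // e.g. removes the Bus listener
  maxQueued?: number
}

function createStreamGuard<T>(opts: GuardOptions<T>) {
  const maxQueued = opts.maxQueued ?? 10_000
  let stopped = false

  // Tear everything down exactly once, no matter which guard triggered it.
  const stop = (): void => {
    if (stopped) return
    stopped = true
    opts.clearHeartbeat()
    opts.unsubscribe()
    opts.queue.close(null)
  }

  // Returns false when the item was rejected because the stream is dead or stuck.
  const enqueue = (item: T): boolean => {
    if (stopped) return false
    if (opts.queue.size >= maxQueued) {
      // The consumer is not draining; treat this as a dead connection.
      stop()
      return false
    }
    opts.queue.push(item)
    return true
  }

  return {
    enqueue,
    stop,
    get stopped() {
      return stopped
    },
  }
}
```

In this arrangement the consumer loop would also call stop() from its catch block when a write fails or times out, so both failure modes converge on the same cleanup.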

When an SSE peer goes into TCP CLOSE_WAIT, Hono's stream.onAbort may never
fire on Bun and writeSSE can neither resolve nor reject on a half-closed
socket. The existing AsyncQueue had no size limit, so heartbeats and Bus
events accumulated at ~14 MB/sec per zombie connection (see anomalyco#22198).

Three guards:
- AsyncQueue.size + AsyncQueue.close(sentinel) so producers can detect
  drain failure and tear the stream down immediately.
- writeSSEWithTimeout wraps each write in a 30s race so half-closed
  sockets no longer block the consumer forever.
- On timeout or any write error, stop() clears the heartbeat, unsubs
  from the Bus, and closes the queue.

Applied to both /event and /global/event.

Fixes anomalyco#22198

@github-actions (bot) added the needs:compliance label ("This means the issue will auto-close after 2 hours.") on Apr 23, 2026

@github-actions
Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search, I found one potentially related PR:

Related PR:

This PR appears to address the same issue of memory leaks from zombie SSE connections. Since the current PR (24058) fixes issue #22198 with similar symptoms (unbounded memory growth from stuck SSE streams), PR #22552 may be a related attempt to fix similar SSE connection issues and should be reviewed for potential overlap or if it has already been resolved.

@github-actions
Contributor

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

Development

Successfully merging this pull request may close these issues.

Memory leak: SSE connections stuck in CLOSE_WAIT cause unbounded AsyncQueue growth (~14 MB/sec)

1 participant