[Bug] SSE connections leak — server never cleans up disconnected clients #580
Description
Problem
Tower accumulates SSE connections that never get cleaned up when clients disconnect. Over time this exhausts resources and makes the dashboard unresponsive.
Evidence
From a 25-day Tower uptime session:
- 2932 SSE connections opened
- 462 SSE disconnections logged
- ~2470 leaked connections (84% leak rate)
- Today alone: 234 connects vs 6 disconnects
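For reference, the leak figures above follow directly from the two logged counts. A small sketch of the arithmetic (the variable names are illustrative, not taken from Tower's code):

```typescript
// Counts taken from the 25-day session described above.
const opened = 2932;
const closed = 462;

// Connections that were opened but never logged as disconnected.
const leaked = opened - closed;
const leakRatePercent = Math.round((leaked / opened) * 100);

console.log(`${leaked} leaked connections, ${leakRatePercent}% leak rate`);
// prints "2470 leaked connections, 84% leak rate"
```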
Eventually the dashboard became completely unresponsive — buttons stopped working, hard refresh showed a spinner, but curl to the API still worked (short-lived connections). Tower restart fixed it immediately.
Root Cause (suspected)
The SSE endpoint isn't detecting client disconnections reliably. When a browser tab closes, refreshes, or loses connectivity (e.g. ERR_NETWORK_CHANGED), the server-side SSE response object stays open. Need to:
- Listen for the `close` event on the response/request object
- Implement a heartbeat/keepalive to detect dead connections
- Clean up the SSE client set when connections go stale
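A rough sketch of what those three steps might look like, assuming the server is Node-based and keeps its SSE clients in a set. `SseResponse` and `SseClients` are hypothetical names, not Tower's actual types; the interface is only modelled on the parts of Node's `http.ServerResponse` that the sketch touches.

```typescript
// Hypothetical minimal shape of an SSE response, modelled on Node's
// http.ServerResponse so the sketch stays framework-independent.
interface SseResponse {
  readonly destroyed: boolean;
  readonly writableEnded: boolean;
  write(chunk: string): boolean;
  end(): void;
  on(event: "close", listener: () => void): unknown;
}

// Hypothetical registry of connected SSE clients.
class SseClients {
  private clients = new Set<SseResponse>();

  add(res: SseResponse): void {
    this.clients.add(res);
    // Drop the client as soon as the connection reports that it closed
    // (tab closed, page refresh, network change).
    res.on("close", () => this.clients.delete(res));
  }

  // Meant to be called on a timer. Sweeps responses that are already
  // finished and writes an SSE comment line to the rest, so a dead
  // socket surfaces as a write failure or a later "close" event.
  heartbeat(): void {
    for (const res of this.clients) {
      if (res.destroyed || res.writableEnded) {
        this.clients.delete(res);
        continue;
      }
      try {
        res.write(": ping\n\n"); // comment lines are ignored by EventSource
      } catch {
        this.clients.delete(res);
      }
    }
  }

  // Current number of tracked clients, useful for periodic logging.
  get size(): number {
    return this.clients.size;
  }
}
```

In a real server, `heartbeat()` would presumably run from something like `setInterval(() => clients.heartbeat(), 30_000)`, and `clients.size` could be logged on the same timer. The 30-second interval is an assumption, not a measured value.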
Aggravating Factor
The annotation mtime polling (2311 errors from a single cleaned-up builder worktree) was generating ~1 request/second of error traffic, which may have accelerated resource exhaustion.
Fix Scope
- Keep SSE architecture as-is
- Add proper connection cleanup (close event listener, periodic sweep of dead connections)
- Consider logging SSE client count periodically for monitoring
- Bonus: stop annotation polling when the target file doesn't exist after N failures
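For the bonus item, one possible shape is a small failure counter per polled file. This is only a sketch: `PollGuard` is a hypothetical helper, and it assumes "N failures" means N consecutive failures, which the issue does not specify.

```typescript
// Hypothetical helper: tracks consecutive failures for one polled file.
class PollGuard {
  private failures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures: number) {
    this.maxFailures = maxFailures;
  }

  // Record the outcome of one poll.
  // Returns true while polling should continue, false once the
  // consecutive-failure limit has been reached.
  record(ok: boolean): boolean {
    this.failures = ok ? 0 : this.failures + 1;
    return this.failures < this.maxFailures;
  }
}

// Usage sketch: one success, then the file disappears.
const guard = new PollGuard(3);
const outcomes = [true, false, false, false, false];
let polls = 0;
for (const ok of outcomes) {
  polls++;
  if (!guard.record(ok)) break; // stop polling this file
}
console.log(polls); // prints 4: polling stops on the third consecutive failure
```

Resetting the counter on success keeps a transient error from permanently disabling polling for a file that still exists.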