[Bug] SSE connections leak — server never cleans up disconnected clients #580

@waleedkadous

Problem

Tower accumulates SSE connections that never get cleaned up when clients disconnect. Over time this exhausts resources and makes the dashboard unresponsive.

Evidence

From a 25-day Tower uptime session:

  • 2932 SSE connections opened
  • 462 SSE disconnections logged
  • ~2470 leaked connections (84% leak rate)
  • Today alone: 234 connects vs 6 disconnects

Eventually the dashboard became completely unresponsive — buttons stopped working, hard refresh showed a spinner, but curl to the API still worked (short-lived connections). Tower restart fixed it immediately.

Root Cause (suspected)

The SSE endpoint doesn't reliably detect client disconnections. When a browser tab closes, refreshes, or loses connectivity (e.g. ERR_NETWORK_CHANGED), the server-side SSE response object stays open. A fix would need to:

  1. Listen for the close event on the response/request object
  2. Implement a heartbeat/keepalive to detect dead connections
  3. Clean up the SSE client set when connections go stale
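
A minimal sketch of the three steps above, not Tower's actual code. `SseResponse` and `SseClientRegistry` are hypothetical names introduced here; the interface is meant to be structurally compatible with Node's `http.ServerResponse`, which is an assumption about how Tower serves SSE.

```typescript
// Hypothetical minimal shape of an SSE response object (assumption: Tower's
// responses look roughly like Node's http.ServerResponse).
interface SseResponse {
  write(chunk: string): boolean;
  end(): void;
  on(event: "close" | "error", listener: () => void): unknown;
  readonly writableEnded?: boolean;
  readonly destroyed?: boolean;
}

class SseClientRegistry {
  private readonly clients = new Set<SseResponse>();

  // Step 1: register the client and drop it as soon as the connection closes or errors.
  add(res: SseResponse): void {
    this.clients.add(res);
    const drop = () => this.remove(res);
    res.on("close", drop);
    res.on("error", drop);
  }

  // Step 3: remove a client from the set, ending the response if it is still open.
  remove(res: SseResponse): void {
    if (!this.clients.delete(res)) return; // already removed
    if (!res.writableEnded && !res.destroyed) res.end();
  }

  // Step 2: sweep dead clients and send an SSE comment line to the rest.
  // Writing to a dead peer is what eventually surfaces a close/error event.
  // Returns the number of clients still tracked, which can be logged.
  heartbeat(): number {
    for (const res of Array.from(this.clients)) {
      if (res.destroyed || res.writableEnded) {
        this.remove(res);
        continue;
      }
      try {
        res.write(": ping\n\n"); // lines starting with ":" are ignored by SSE clients
      } catch {
        this.remove(res);
      }
    }
    return this.clients.size;
  }

  get size(): number {
    return this.clients.size;
  }
}
```

In a server this could be driven by a timer, for example `setInterval(() => console.log("sse clients:", registry.heartbeat()), 30_000)`, where the 30-second interval is an arbitrary choice.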

Aggravating Factor

The annotation mtime polling (2311 errors from a single cleaned-up builder worktree) was generating ~1 request/second of error traffic, which may have accelerated resource exhaustion.

Fix Scope

  • Keep SSE architecture as-is
  • Add proper connection cleanup (close event listener, periodic sweep of dead connections)
  • Consider logging SSE client count periodically for monitoring
  • Bonus: stop annotation polling after N failures when the target file doesn't exist
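
For the bonus item, one possible shape is a small failure counter that the polling loop consults before each request. This is a sketch only: `PollFailureTracker` and `pollOnce` are hypothetical names, and counting consecutive failures (resetting on success) is an assumption about what "N failures" should mean.

```typescript
// Tracks consecutive polling failures and signals when to give up.
class PollFailureTracker {
  private consecutiveFailures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures: number) {
    this.maxFailures = maxFailures;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // any success resets the count (assumption)
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  // False once maxFailures consecutive failures have been recorded.
  get shouldPoll(): boolean {
    return this.consecutiveFailures < this.maxFailures;
  }
}

// Hypothetical single poll step: readMtime stands in for whatever fetches the
// annotation file's mtime and is expected to throw when the file is missing.
function pollOnce(
  tracker: PollFailureTracker,
  readMtime: () => number,
): number | undefined {
  if (!tracker.shouldPoll) return undefined; // polling has been stopped
  try {
    const mtime = readMtime();
    tracker.recordSuccess();
    return mtime;
  } catch {
    tracker.recordFailure();
    return undefined;
  }
}
```

With this in place, a deleted worktree would produce at most N failed requests instead of a steady stream of errors.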
