Skip to content

fix(sdk): close WS before HTTP disconnect to prevent presence race#105

Merged
khaliqgant merged 2 commits intomainfrom
fix/disconnect-presence-race
Mar 24, 2026
Merged

fix(sdk): close WS before HTTP disconnect to prevent presence race#105
khaliqgant merged 2 commits intomainfrom
fix/disconnect-presence-race

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

@khaliqgant khaliqgant commented Mar 24, 2026

Summary

  • Fixes e2e smoke test failure where BackendAgent stays "online" after disconnect()
  • Root cause: Server-side ping handler fired a heartbeat to PresenceDO without await. If the heartbeat landed at PresenceDO after the client's HTTP disconnect, the agent was re-added as online.
  • Fix (server): await the heartbeat fetch in AgentDO's ping handler. The DO runtime serializes webSocketMessage and webSocketClose, so awaiting guarantees the heartbeat settles before any subsequent disconnect is processed.
  • Fix (client): Close the WebSocket before sending the HTTP disconnect. This stops the ping timer immediately. The HTTP call then arrives last as the authoritative presence update.
  • Together these guarantee deterministic ordering at PresenceDO: heartbeat → server disconnect → client HTTP disconnect

Test plan

  • All 628 SDK + server unit tests pass
  • e2e smoke test should now reliably pass without BackendAgent staying online

🤖 Generated with Claude Code

The disconnect() method was calling markOffline() (HTTP) before closing
the WebSocket. Server-side, each WS ping triggers a fire-and-forget
heartbeat to PresenceDO. If a ping's heartbeat landed at PresenceDO
*after* the HTTP disconnect, the agent was re-added as online — causing
the e2e smoke test to fail with BackendAgent stuck "online".

Fix: close the WebSocket first (stopping the ping timer), pause briefly
for any in-flight server-side heartbeat to settle, then send the HTTP
disconnect as the authoritative final presence update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

@github-actions
Copy link
Copy Markdown

Preview deployed!

Environment URL
API https://pr105-api.relaycast.dev
Health https://pr105-api.relaycast.dev/health
Observer https://pr105-observer.relaycast.dev/observer

This preview shares the staging database and will be cleaned up when the PR is merged or closed.

Run E2E tests

npm run e2e -- https://pr105-api.relaycast.dev --ci

Open observer dashboard

https://pr105-observer.relaycast.dev/observer

Server: await the heartbeat in the AgentDO ping handler. The DO runtime
serializes message handlers and webSocketClose, so awaiting ensures the
heartbeat settles at PresenceDO before any subsequent disconnect.

Client: close the WebSocket before sending the HTTP disconnect. This
stops the ping timer immediately, and the HTTP call serves as the
authoritative final presence update.

Together these guarantee deterministic ordering at PresenceDO:
heartbeat (from last ping) → server disconnect (webSocketClose) →
client HTTP disconnect. The agent reliably ends up offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit f5b0b82 into main Mar 24, 2026
4 checks passed
@khaliqgant khaliqgant deleted the fix/disconnect-presence-race branch March 24, 2026 20:35
khaliqgant added a commit that referenced this pull request Mar 24, 2026
- Merge origin/main (includes #103 schema + #105 disconnect fix)
- Resolve schema.test.ts conflicts: take main's 24-table version with
  directory/routing tables
- Fix PATCH /v1/agents/:name test: mock getAgentByName so the route's
  existence check passes before calling updateAgent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant