fix(sdk): markOffline stops auto-heartbeat to prevent re-registration #74
Merged
khaliqgant merged 5 commits into main on Mar 10, 2026
When an agent called `markOffline()` while connected via WebSocket, the auto-heartbeat timer kept firing and immediately re-registered the agent as online, making `markOffline()` effectively a no-op. Now `markOffline()` stops the auto-heartbeat timer before posting the disconnect request. This fixes the e2e 'markOffline transitions agent to offline' test that was failing on staging.

Added tests:
- markOffline stops auto-heartbeat timer
- markOffline prevents auto-heartbeat from re-registering agent
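A minimal TypeScript sketch of the SDK-side change, assuming the client holds its heartbeat as a `setInterval` handle and reaches the API through an injected `post` function. `AgentClientSketch`, `startAutoHeartbeat()`, `isHeartbeating`, the `Post` type, and the 30-second default interval are hypothetical; `markOffline()`, `stopAutoHeartbeat()`, and the two endpoint paths come from this PR.

```typescript
// Hypothetical transport; the real SDK issues HTTP requests to the API.
type Post = (path: string) => Promise<void>;

class AgentClientSketch {
  private heartbeatTimer: ReturnType<typeof setInterval> | null = null;
  private readonly post: Post;
  private readonly intervalMs: number;

  constructor(post: Post, intervalMs = 30_000) {
    this.post = post;
    this.intervalMs = intervalMs;
  }

  // Hypothetical name for the hook that runs when the WebSocket opens.
  startAutoHeartbeat(): void {
    if (this.heartbeatTimer !== null) return; // already running
    this.heartbeatTimer = setInterval(() => {
      // Each tick re-registers the agent as online.
      this.post("/v1/agents/heartbeat").catch(() => {
        /* errors ignored in this sketch */
      });
    }, this.intervalMs);
  }

  stopAutoHeartbeat(): void {
    if (this.heartbeatTimer !== null) {
      clearInterval(this.heartbeatTimer);
      this.heartbeatTimer = null;
    }
  }

  get isHeartbeating(): boolean {
    return this.heartbeatTimer !== null;
  }

  async markOffline(): Promise<void> {
    // Clear the timer before sending the disconnect, so no later tick
    // can flip the agent back to online.
    this.stopAutoHeartbeat();
    await this.post("/v1/agents/disconnect");
  }
}
```

Because the interval is cleared synchronously at the top of `markOffline()`, it is already gone before the disconnect request is sent; that is the property the two new tests are named after.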
The SDK fix alone wasn't enough: the server-side AgentDO refreshes presence in PresenceDO on every WS ping, which re-registered the agent as online even after REST `/v1/agents/disconnect` was called.

Changes:
- AgentDO: add `presenceSuppressed` flag; skip the presence heartbeat on pings when suppressed
- POST /agents/disconnect: notify AgentDO to suppress presence
- POST /agents/heartbeat: notify AgentDO to unsuppress presence
- New WS connections clear the suppression flag

This ensures `markOffline()` actually transitions the agent to offline even with an active WebSocket connection.
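The server-side rules can be sketched the same way. This assumes the AgentDO exposes one method per notification; `AgentDOSketch`, `onPing()`, `onWebSocketConnect()`, `suppressPresence()`, `unsuppressPresence()`, and the injected `refreshPresence` callback are hypothetical names. Only the `presenceSuppressed` flag and the rules it follows come from the commit message. The flag lives in memory here, matching this commit; a later commit persists it.

```typescript
// Hypothetical hook standing in for the PresenceDO refresh done on each WS ping.
type RefreshPresence = (agentId: string) => void;

class AgentDOSketch {
  private presenceSuppressed = false;
  private readonly agentId: string;
  private readonly refreshPresence: RefreshPresence;

  constructor(agentId: string, refreshPresence: RefreshPresence) {
    this.agentId = agentId;
    this.refreshPresence = refreshPresence;
  }

  // Invoked when POST /agents/disconnect notifies this agent.
  suppressPresence(): void {
    this.presenceSuppressed = true;
  }

  // Invoked when POST /agents/heartbeat notifies this agent.
  unsuppressPresence(): void {
    this.presenceSuppressed = false;
  }

  // A new WebSocket connection clears any earlier suppression.
  onWebSocketConnect(): void {
    this.presenceSuppressed = false;
  }

  // Each WS ping refreshes presence unless the agent was marked offline.
  onPing(): void {
    if (!this.presenceSuppressed) {
      this.refreshPresence(this.agentId);
    }
  }
}
```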
Preview deployed!
This preview shares the staging database and will be cleaned up when the PR is merged or closed.
Run E2E tests: `npm run e2e -- https://pr74-api.relaycast.dev --ci`
Addresses Devin review: the fire-and-forget suppress call left a window in which a WS ping could re-register the agent before suppression took effect. The call is now awaited, so the AgentDO processes the suppression before the disconnect response returns.
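A sketch of the awaited notification in the `POST /agents/disconnect` handler, assuming each Durable Object is reached through a stub whose methods return promises. `AgentPresenceControl`, `PresenceRegistry`, `handleDisconnect()`, and the `{ ok: boolean }` return value are hypothetical stand-ins for the real stubs and HTTP response. It uses the final call order from the follow-up commit: suppress first, then the PresenceDO disconnect.

```typescript
// Hypothetical minimal interfaces for the two Durable Object stubs.
interface AgentPresenceControl {
  suppressPresence(): Promise<void>;
}
interface PresenceRegistry {
  disconnect(agentId: string): Promise<void>;
}

// Sketch of the POST /agents/disconnect handler body.
async function handleDisconnect(
  agentId: string,
  agent: AgentPresenceControl,
  presence: PresenceRegistry,
): Promise<{ ok: boolean }> {
  // Awaited rather than fire-and-forget: the response is not sent until
  // the AgentDO has recorded the suppression.
  await agent.suppressPresence();
  // Only then remove the agent from PresenceDO, so a ping arriving in
  // between cannot re-register it.
  await presence.disconnect(agentId);
  return { ok: true };
}
```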
…nnect
Addresses two Devin review comments:
1. `presenceSuppressed` was in-memory only, so it was lost on DO hibernation, allowing WS pings to re-register the agent after eviction. It is now persisted to DO storage, following the same pattern as `agentSeq`.
2. Suppress was called AFTER the PresenceDO disconnect, leaving a race window. Reordered: suppress in AgentDO first, then disconnect from PresenceDO.
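A sketch of persisting the flag, assuming a key/value storage shaped like a subset of Durable Object storage (`get` and `put` returning promises). `KeyValueStorage`, `PersistentPresenceFlag`, `MapStorage`, and the `presenceSuppressed` storage key are hypothetical, and this is not necessarily how the `agentSeq` pattern is implemented. The in-memory value is only a cache, reloaded from storage the first time it is read after the object wakes up.

```typescript
// Minimal subset of a Durable Object-style storage API (assumption).
interface KeyValueStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

const SUPPRESSED_KEY = "presenceSuppressed"; // hypothetical storage key

class PersistentPresenceFlag {
  private cached: boolean | null = null; // null = not loaded since wake-up
  private readonly storage: KeyValueStorage;

  constructor(storage: KeyValueStorage) {
    this.storage = storage;
  }

  async get(): Promise<boolean> {
    if (this.cached === null) {
      // After hibernation or eviction the in-memory copy is gone; reload it.
      this.cached = (await this.storage.get<boolean>(SUPPRESSED_KEY)) ?? false;
    }
    return this.cached;
  }

  async set(value: boolean): Promise<void> {
    // Update memory first so pings handled while the write is in flight
    // already see the new value, then persist it.
    this.cached = value;
    await this.storage.put(SUPPRESSED_KEY, value);
  }
}

// In-memory storage used only to exercise the sketch.
class MapStorage implements KeyValueStorage {
  private readonly data = new Map<string, unknown>();
  async get<T>(key: string): Promise<T | undefined> {
    return this.data.get(key) as T | undefined;
  }
  async put<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }
}
```

Constructing a second `PersistentPresenceFlag` over the same storage simulates a DO that was evicted and recreated: it reads `true` back from storage instead of defaulting to `false`.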
Problem
The e2e staging test `markOffline transitions agent to offline` was failing.

Root cause: when an agent with an active WebSocket called `markOffline()`, it only POSTed to `/v1/agents/disconnect`. But the auto-heartbeat timer (started on WS open) kept running and immediately re-registered the agent as online via `/v1/agents/heartbeat`, making `markOffline()` effectively a no-op.

Fix
`markOffline()` now calls `stopAutoHeartbeat()` before posting the disconnect request. This ensures the heartbeat timer doesn't race against the offline transition.

The Rust SDK already handled this correctly: `disconnect()` closes the WS (which stops the heartbeat) along with making the REST call.

Tests
All 240 SDK tests pass.
Failed CI run
https://github.com/AgentWorkforce/relaycast/actions/runs/22896829279