Skip to content

feat: Communicate Mode SDK (on_relay)#565

Merged
khaliqgant merged 31 commits intomainfrom
feature/ts-communicate-core-pi
Mar 16, 2026
Merged

feat: Communicate Mode SDK (on_relay)#565
khaliqgant merged 31 commits intomainfrom
feature/ts-communicate-core-pi

Conversation

@khaliqgant
Copy link
Member

@khaliqgant khaliqgant commented Mar 13, 2026

Summary

  • New Communicate Mode: on_relay() puts any framework agent on Relaycast with a single call — no broker needed
  • Python SDK: Relay core + transport + 6 adapters (OpenAI Agents, Claude SDK, Google ADK, Agno, Swarms, CrewAI) with 96 tests
  • TypeScript SDK: Relay core + transport + 2 adapters (Pi, Claude SDK) with 16 tests
  • Docs: Overview page, 7 per-framework guides, updated introduction — all with .mdx/.md sync

Architecture

Two tiers of framework integration:

  • Tier 1 (Push): Claude SDK, Pi, Google ADK — messages injected mid-execution via hooks/callbacks
  • Tier 2 (Poll): OpenAI Agents, Agno, Swarms, CrewAI — messages surfaced at tool-call boundaries

Brokerless transport: HTTP + WebSocket directly to Relaycast API with auto-reconnect and exponential backoff.

Test plan

  • Python unit tests: core, transport, types (56 tests)
  • Python adapter tests: all 6 frameworks (40 tests)
  • Python integration tests: cross-framework + e2e (8 tests)
  • TypeScript unit tests: core, adapters (14 tests)
  • TypeScript integration tests: cross-framework (2 tests)
  • TypeScript build: tsc -p tsconfig.build.json passes clean
  • Docs: .mdx and .md files in sync per docs-sync rule

Spec: specs/connect-sdk.md

🤖 Generated with Claude Code


Open with Devin

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Copy link
Member

@willwashburn willwashburn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Communicate Mode SDK (on_relay)

Substantial and well-architected addition. The core Relay and transport layers are solid. However, there are two blocking issues plus several high-priority items to address before merge.

What's Good

  • Clean layered architecture: types → transport → core → adapters
  • Thorough MockRelayServer in Python conftest
  • Good test coverage (96 Python + 16 TS tests)
  • Lazy connection pattern — no network calls until first use
  • WebSocket reconnect with exponential backoff in both languages
  • Frozen Message dataclass is a solid immutability choice
  • Docs sync between .mdx and .md is generally correct

devin-ai-integration[bot]

This comment was marked as resolved.

@willwashburn
Copy link
Member

Medium & Low Priority Items

Medium

8. HTTP retry doesn't retry on aiohttp.ClientError (transport.py)
Network-level errors (aiohttp.ClientError) are caught and immediately re-raised as RelayConnectionError without retrying. These are typically more transient than 5xx errors and should also get retry attempts.

9. @agent-relay/config version mismatch (packages/sdk/package.json)
References "@agent-relay/config": "3.1.23" while the root package-lock.json and other packages use 3.2.1.

10. Google ADK adapter appends raw string to llm_request.contents (google_adk.py)
ADK expects Content objects (with parts and role), not raw strings. This will likely cause a runtime error when ADK tries to serialize.

11. autoCleanup uses process.once('beforeExit') (core.ts)
This doesn't fire on SIGTERM, SIGINT, or process.exit() — cleanup is best-effort. Consider process.on('exit', ...) for sync cleanup or document the limitation.

12. _run_sync raises in async contexts (core.py)
The atexit-registered close_sync will fail if there's a running event loop at exit time (common in frameworks like CrewAI/Agno). Consider nest_asyncio or a thread-based fallback.

Low / Suggestions

13. MockRelayServer is duplicated between two TS test files (~300 lines each) — extract to a shared fixture.

14. Tool definitions (relay_send, relay_inbox, relay_post, relay_agents) are copy-pasted across all 6 Python adapters — extract a shared create_relay_tools(relay) helper (TS already does this in pi.ts).

15. Spec (specs/connect-sdk.md) references withRelay but the implementation uses onRelay — update the spec.

16. export * from './transport.js' in index.ts exposes RelayTransport as public API — this is an implementation detail.

Docs Sync Gap

docs/markdown/communicate.md is missing the per-framework guide links that appear in the <CardGroup> section of docs/communicate.mdx.

khaliqgant added a commit that referenced this pull request Mar 13, 2026
- Fix claude_sdk adapter argument order: on_relay(options, relay, *, name)
  to match other adapters' (agent, relay) pattern
- Fix return type annotation: callable -> Callable[[], None] in core.py
- URL-encode WebSocket token in transport.py to prevent connection failures
- Add retry for aiohttp.ClientError in HTTP transport (was only retrying 5xx)
- Add Relay.peek() method and use it in OpenAI Agents/Agno instructions
  wrappers to avoid draining inbox and starving the relay_inbox tool
- Remove transport re-export from TS communicate/index.ts (internal detail)
- Improve error message for TS onRelay() auto-detection failures
- Add SIGTERM/SIGINT handlers for autoCleanup in TS core.ts
- Use thread-based fallback in _run_sync for async context compatibility
- Fix @agent-relay/config version: 3.1.23 -> 3.2.1 in SDK package.json
- Update spec: withRelay references -> onRelay
- Add per-framework guide links to docs/markdown/communicate.md (docs sync)
- Update all tests for new signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant added a commit that referenced this pull request Mar 13, 2026
- Fix claude_sdk adapter argument order: on_relay(options, relay, *, name)
  to match other adapters' (agent, relay) pattern
- Fix return type annotation: callable -> Callable[[], None] in core.py
- URL-encode WebSocket token in transport.py to prevent connection failures
- Add retry for aiohttp.ClientError in HTTP transport (was only retrying 5xx)
- Add Relay.peek() method and use it in OpenAI Agents/Agno instructions
  wrappers to avoid draining inbox and starving the relay_inbox tool
- Remove transport re-export from TS communicate/index.ts (internal detail)
- Improve error message for TS onRelay() auto-detection failures
- Add SIGTERM/SIGINT handlers for autoCleanup in TS core.ts
- Use thread-based fallback in _run_sync for async context compatibility
- Fix @agent-relay/config version: 3.1.23 -> 3.2.1 in SDK package.json
- Update spec: withRelay references -> onRelay
- Add per-framework guide links to docs/markdown/communicate.md (docs sync)
- Update all tests for new signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant added a commit that referenced this pull request Mar 13, 2026
- Fix claude_sdk adapter argument order: on_relay(options, relay, *, name)
  to match other adapters' (agent, relay) pattern
- Fix return type annotation: callable -> Callable[[], None] in core.py
- URL-encode WebSocket token in transport.py to prevent connection failures
- Add retry for aiohttp.ClientError in HTTP transport (was only retrying 5xx)
- Add Relay.peek() method and use it in OpenAI Agents/Agno instructions
  wrappers to avoid draining inbox and starving the relay_inbox tool
- Remove transport re-export from TS communicate/index.ts (internal detail)
- Improve error message for TS onRelay() auto-detection failures
- Add SIGTERM/SIGINT handlers for autoCleanup in TS core.ts
- Use thread-based fallback in _run_sync for async context compatibility
- Fix @agent-relay/config version: 3.1.23 -> 3.2.1 in SDK package.json
- Update spec: withRelay references -> onRelay
- Add per-framework guide links to docs/markdown/communicate.md (docs sync)
- Update all tests for new signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@khaliqgant
Copy link
Member Author

Devin Review Response

Reviewed all findings. Status:

Already fixed in prior commits:

  • ✅ Claude SDK argument order — tests already use correct on_relay(options, relay=relay) order
  • ✅ aiohttp import guard — both __init__.py files have try/except wrapper
  • ✅ Google ADK sync callback — already uses inspect.isawaitable()
  • ✅ HTTP retry on ClientError — transport.py already retries on aiohttp.ClientError
  • @agent-relay/config version — already 3.2.1 in both places
  • ✅ Python connect dedup — _connect_future already handles concurrent callers
  • ✅ Spec withRelay/onRelay — spec uses on_relay, code has withRelay as alias (backward compat)

Acknowledged (low priority, will address in follow-up):

  • MockRelayServer duplication across 2 test files — extract to shared fixture
  • Tool definitions copy-pasted across 6 Python adapters — extract shared helper
  • autoCleanup beforeExit limitation — document in README

New in this push: A2A protocol transport layer (Python 89 tests ✅ + TypeScript)

Copy link
Collaborator

@barryonthecape barryonthecape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the huge push here — there is a lot of good work, but I'm still seeing a few blocking issues that need to be fixed before merge:

  1. TS A2A parser bug (runtime crash): packages/sdk/src/communicate/a2a-types.ts

    • a2aMessageFromDict() currently uses:
      const parts = (d.parts as Record<string, unknown>[] | undefined ?? []).map(a2aPartFromDict);
    • Because of the expression form, this is not safely defaulting in the intended way and can leave parts undefined at runtime before .map(...), causing a crash.
    • Please change to explicit grouping:
      const parts = ((d.parts as Record<string, unknown>[] | undefined) ?? []).map(a2aPartFromDict);
  2. TS connect retry dead-end: packages/sdk/src/communicate/core.ts

    • ensureConnected() caches connectPromise, but if transport.connect() rejects once, the rejected promise is retained and every future call rethrows forever.
    • Need to clear/reset connectPromise on failure so a subsequent call can retry.
  3. Python connect future can deadlock: packages/sdk-py/src/agent_relay/communicate/core.py

    • _ensure_connected() sets _connect_future before connect attempt. If transport.connect() fails and fallback register_agent() also fails, _connect_future is never resolved/rejected.
    • Later calls await that stale pending future forever.
    • Please make sure _connect_future is always completed (set_exception) and reset on failure paths.
  4. Version skew in this PR: packages/sdk-py/pyproject.toml

    • Still version = "3.1.23" while this PR bumps the JS workspace packages to 3.2.1.
    • Please align Python package version with the release version strategy used in this PR.

Also: PR currently shows merge conflicts with main, so please rebase/merge main before final review pass.

khaliqgant added a commit that referenced this pull request Mar 16, 2026
…DoS regex

1. Remove onRelay() auto-detect that threw on empty config {}; only
   export onPiRelay/onClaudeRelay (Will's suggestion).
2. core.ts handleTransportMessage already buffers AND dispatches — verified.
3. transport.ts sendHttp already has retry with exponential backoff — verified.
4. Replace polynomial regex /\/+$/ with iterative trimTrailingSlashes()
   to eliminate ReDoS risk.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 39 additional findings in Devin Review.

Open in Devin Review

@@ -0,0 +1,5 @@
"""Framework-specific adapters for on_relay()."""

from .pi import on_relay as on_pi_relay
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Python adapters __init__.py imports Pi adapter at module load, breaking without subprocess dependency

The adapters __init__.py (packages/sdk-py/src/agent_relay/communicate/adapters/__init__.py:3) unconditionally imports from .pi import on_relay as on_pi_relay. The Pi adapter (pi.py) imports subprocess and threading (which are stdlib so that's fine), but it also defines PiRpcSession that expects a real subprocess.Popen process. More importantly, this top-level import means any from agent_relay.communicate.adapters import ... will execute the Pi adapter module. This is inconsistent with all other adapters which use lazy imports — the other adapters (google_adk, openai_agents, etc.) are NOT imported in __init__.py and are only loaded when specifically requested. If the Pi adapter is the only one exported, it should match the pattern of being lazily imported, or all adapters should be exported consistently.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

khaliqgant and others added 2 commits March 16, 2026 20:50
Implement the Connect SDK spec — a new "Communicate Mode" that lets
any framework agent join Relaycast with a single on_relay() call.

Python SDK:
- Relay core (lazy WebSocket, send/post/reply/inbox/agents/close)
- RelayTransport (HTTP + WS, auto-reconnect, exponential backoff)
- 6 framework adapters: OpenAI Agents, Claude SDK, Google ADK, Agno, Swarms, CrewAI
- Tier 1 (Push) adapters inject messages mid-execution via hooks/callbacks
- Tier 2 (Poll) adapters surface messages at tool-call boundaries
- 96 tests (unit + integration + cross-framework)

TypeScript SDK:
- Relay core, transport, types mirroring Python API
- 2 framework adapters: Pi (session.steer), Claude SDK (PostToolUse hook)
- 16 tests (unit + integration + cross-framework)

Docs:
- communicate.mdx overview + 7 per-framework guides
- Plain markdown mirrors (docs-sync rule)
- Updated introduction.mdx with Communicate Mode section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves npm ci failure due to missing @sinclair/typebox@0.34.48 in lock file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
khaliqgant and others added 10 commits March 16, 2026 20:52
Endpoints changed:
- POST /v1/agents/register → POST /v1/agents (returns {ok, data: {id, token}})
- DELETE /v1/agents/{id} → POST /v1/agents/disconnect (agent token auth)
- POST /v1/messages/dm → POST /v1/dm (agent token auth)
- POST /v1/messages/channel → POST /v1/channels/{name}/messages (agent token auth)
- POST /v1/messages/reply → POST /v1/messages/{id}/replies (agent token auth)
- GET /v1/inbox/{agent_id} → GET /v1/inbox (agent token auth, returns unread_dms/mentions)
- GET /v1/ws/{agent_id}?token=... → GET /v1/ws?token=...

Key fixes:
- Added _send_http_as_agent() for per-agent token auth (vs workspace key)
- Inbox now fetches DM conversation messages from /v1/dm/{conv_id}/messages
- WS dispatch handles Relaycast event types (message.created, dm.received, etc.)
- Error message extraction handles {ok, error: {message}} envelope
- Updated mock server + unit tests + e2e tests

All 64 unit tests + 2 real e2e tests against live Relaycast pass.
…CrewAI + review fixes

- openai-agents.ts: tool injection via Agent tools array, message routing via instructions
- langgraph.ts: relay tools as LangGraph tool nodes, message routing via graph.invoke
- google-adk.ts: FunctionTool-compatible relay tools, message routing via runner.runAsync
- crewai.ts: CrewAI tool interface, step_callback routing, onCrewRelay for full crews
- Review fixes: Pi cleanup exposed, CrewAI dead code removed, onCrewRelay exported, LangGraph naming
- All 39 adapter tests pass
… Relaycast

TS e2e (5): Pi, OpenAI Agents, LangGraph, Google ADK, CrewAI
Python e2e (8): Pi, OpenAI Agents, CrewAI, LangGraph, Google ADK, Swarms, Agno, Claude SDK

All test real agent registration, relay tool execution (send/inbox/post/agents),
message routing, and cleanup against live api.relaycast.dev
Adds Agent2Agent (A2A) protocol support as an alternative transport:
- a2a_types: AgentCard, Message, Task, Part data model
- a2a_transport: JSON-RPC 2.0 client for sending to A2A agents
- a2a_server: HTTP server exposing Relay agents as A2A endpoints
- a2a_bridge: Bidirectional bridge between A2A and Relay workspaces

Python: 4 source files (1,040 lines) + 4 test files (1,612 lines), 89/89 passing
TypeScript: 4 source files + 4 test files (parity with Python)

Spec: A2A_TRANSPORT_SPEC.md with 3-phase implementation plan
…DoS regex

1. Remove onRelay() auto-detect that threw on empty config {}; only
   export onPiRelay/onClaudeRelay (Will's suggestion).
2. core.ts handleTransportMessage already buffers AND dispatches — verified.
3. transport.ts sendHttp already has retry with exponential backoff — verified.
4. Replace polynomial regex /\/+$/ with iterative trimTrailingSlashes()
   to eliminate ReDoS risk.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Claude SDK adapter: fix argument order to on_relay(name, options, relay), add hooks null check
- Google ADK adapter: return before_model_callback result instead of discarding
- core.py: fix return type callable → Callable[[], None], connect-dedup in _ensure_connected
- Updated test call sites to match new claude_sdk signature
- Issues 1,4,5,6,8,9,10,11,12 already fixed in prior commit (1e2d591)
- Fix A2A parser operator precedence bug in a2aMessageFromDict (TS)
- Clear connectPromise on rejection so retries work (TS)
- Ensure _connect_future is always resolved/rejected to prevent deadlock (Python)
- Align Python SDK version with workspace (3.2.3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khaliqgant khaliqgant force-pushed the feature/ts-communicate-core-pi branch from b795a50 to 58649a1 Compare March 16, 2026 19:54
Comment on lines +234 to +238
JSON.stringify({
jsonrpc: '2.0',
error: { code: -32602, message: String(err) },
id: rpcId,
}),

Check warning

Code scanning / CodeQL

Information exposure through a stack trace Medium

This information exposed to the user depends on
stack trace information
.

Copilot Autofix

AI 7 days ago

In general, the fix is to stop sending the raw error string derived from the caught exception back to the client and instead send a generic, non-sensitive message while logging the detailed error on the server. This preserves debuggability for developers but prevents attackers from learning about internal implementation details via error text or stack traces.

The best targeted fix here is to modify the catch (err) block in _handleJsonRpc (lines 231–240). Specifically:

  • Replace message: String(err) with a generic message such as "Internal error" or a stable, non-sensitive description of the failure.
  • Optionally, log the error (including its stack) to the server console or another logging system so developers can still diagnose issues.
  • Keep the existing JSON-RPC structure, error code (-32602), id, and HTTP status code behavior unchanged to avoid breaking existing clients’ expectations, unless the project has a specific error contract that suggests a different generic message.

No new external dependencies are strictly necessary; Node’s built-in console.error can be used for logging. All changes are confined to the catch block in _handleJsonRpc within packages/sdk/src/communicate/a2a-server.ts.

Suggested changeset 1
packages/sdk/src/communicate/a2a-server.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/packages/sdk/src/communicate/a2a-server.ts b/packages/sdk/src/communicate/a2a-server.ts
--- a/packages/sdk/src/communicate/a2a-server.ts
+++ b/packages/sdk/src/communicate/a2a-server.ts
@@ -229,11 +229,13 @@
       res.writeHead(200, { 'Content-Type': 'application/json' });
       res.end(JSON.stringify({ jsonrpc: '2.0', result, id: rpcId }));
     } catch (err) {
+      // Log detailed error information on the server, but do not expose it to the client.
+      console.error('Error handling JSON-RPC request:', err);
       res.writeHead(404, { 'Content-Type': 'application/json' });
       res.end(
         JSON.stringify({
           jsonrpc: '2.0',
-          error: { code: -32602, message: String(err) },
+          error: { code: -32602, message: 'Internal error' },
           id: rpcId,
         }),
       );
EOF
@@ -229,11 +229,13 @@
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ jsonrpc: '2.0', result, id: rpcId }));
} catch (err) {
// Log detailed error information on the server, but do not expose it to the client.
console.error('Error handling JSON-RPC request:', err);
res.writeHead(404, { 'Content-Type': 'application/json' });
res.end(
JSON.stringify({
jsonrpc: '2.0',
error: { code: -32602, message: String(err) },
error: { code: -32602, message: 'Internal error' },
id: rpcId,
}),
);
Copilot is powered by AI and may make mistakes. Always verify output.
devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant and others added 2 commits March 16, 2026 21:15
- Fix A2AServer constructor arg order and void return in bridge test
- Fix openai-agents test type annotations for callable instructions
- Exclude communicate test files from build tsconfig
- Guard Pi adapter import in adapters __init__.py with try/except
- Clear _connect_future on successful connect to avoid stale reference
- Guard send_dm/post_message/reply against None payload on 204 responses
- Switch OpenAI/Agno instructions wrapper from peek() to inbox() to prevent unbounded buffer
- Fix CrewAI TS adapter step_callback composition to preserve routing
- Fix CrewAI Python _resolve_sync to poll inbox even when event loop is running

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace individual test file exclusions with a single glob pattern
to prevent vitest type resolution errors in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

The tsconfig.json (used by check/build:full) overrides the exclude
from tsconfig.build.json. Apply the same src/__tests__/** glob
exclusion to both configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

- Add 2s timeout to TS WebSocket disconnect to prevent hanging
- Remove zod dependency from Google ADK TS adapter, use plain JSON schemas
- Fix Claude SDK e2e test call signatures to match on_relay(name, options, relay)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

View 38 additional findings in Devin Review.

Open in Devin Review

text = m.get("text", "")
channel = m.get("channel") or m.get("channel_name") or m.get("channelName") or payload.get("channel") or payload.get("channel_name")
thread_id = m.get("thread_id") or m.get("threadId") or m.get("conversation_id") or m.get("conversationId") or payload.get("thread_id")
timestamp = m.get("timestamp") or m.get("created_at") or m.get("createdAt") or payload.get("timestamp")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Python _message_from_payload uses or instead of None-checks, storing string timestamps in float | None field

The _message_from_payload method at packages/sdk-py/src/agent_relay/communicate/transport.py:476 uses Python or chains to resolve the timestamp field. This has two problems:

  1. If m.get("timestamp") returns 0 (valid Unix epoch), the or treats it as falsy and falls through to created_at, which is an ISO date string — silently losing the numeric timestamp.
  2. If timestamp is absent but created_at is present (the common API case), a string like "2024-01-01T00:00:00Z" is stored in Message.timestamp which is typed float | None. This violates the type contract and will break any downstream code doing arithmetic or comparisons on the timestamp.

The equivalent TypeScript version at packages/sdk/src/communicate/transport.ts:468-469 correctly uses nullish coalescing (??) and filters with typeof rawTs === 'number' ? rawTs : undefined.

Suggested change
timestamp = m.get("timestamp") or m.get("created_at") or m.get("createdAt") or payload.get("timestamp")
raw_ts = m.get("timestamp") if m.get("timestamp") is not None else m.get("created_at") if m.get("created_at") is not None else m.get("createdAt") if m.get("createdAt") is not None else payload.get("timestamp")
timestamp = raw_ts if isinstance(raw_ts, (int, float)) else None
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

text=item.get("text", ""),
channel=None,
thread_id=conv_id,
timestamp=item.get("created_at"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Python check_inbox stores string created_at values in Message.timestamp (typed float | None)

The check_inbox method constructs Message objects with timestamp=item.get("created_at") at lines 260, 272, and 283. The Relaycast API returns created_at as an ISO 8601 string (e.g., "2024-01-01T00:00:00Z"), but Message.timestamp is typed float | None. This stores a string in a float field, violating the type contract.

The equivalent TypeScript code at packages/sdk/src/communicate/transport.ts:177 correctly handles this: typeof last.created_at === 'string' ? undefined : (last.created_at as number | undefined).

All three occurrences in check_inbox have this issue — the created_at string value should either be converted to a float (epoch seconds) or set to None.

Prompt for agents
In packages/sdk-py/src/agent_relay/communicate/transport.py, the check_inbox method at lines 260, 272, and 283 passes raw created_at string values into Message.timestamp which is typed float | None. Fix all three occurrences by either:

1. Setting timestamp=None when the value is a string (matching the TS behavior), or
2. Parsing the ISO 8601 string to a float epoch timestamp.

Specifically:
- Line 260: Change timestamp=item.get("created_at") to filter out non-numeric values
- Line 272: Change timestamp=last.get("created_at") to filter out non-numeric values  
- Line 283: Change timestamp=mention.get("created_at") to filter out non-numeric values

A helper like `_numeric_timestamp(val)` that returns `val if isinstance(val, (int, float)) else None` would reduce duplication.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

- Align mock routes with transport.ts endpoints (POST /v1/agents,
  POST /v1/dm, POST /v1/channels/{ch}/messages, etc.)
- Switch agent-authenticated routes to validate per-agent token
- Update WebSocket upgrade path to /v1/ws?token={token}
- Wrap responses in {ok: true, data: {...}} format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…encies

@google/adk, @langchain/langgraph, @openai/agents, and crewai are only
needed when using their respective adapters. Move them to peerDependencies
with optional: true to avoid bloating the global install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

- Fix Message.timestamp type to accept str | float | None (API returns ISO strings)
- Fix TS errorMessage double-consuming response body — read text first, then parse
- Use local onMessage buffer in OpenAI/Agno instructions wrapper instead of
  inbox()/peek() to avoid starving the relay_inbox tool
- Store CrewAI on_message unsubscribe handle on agent to prevent leak

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

View 43 additional findings in Devin Review.

Open in Devin Review

Comment on lines +163 to +167
except Exception:
# WebSocket failed — register agent via HTTP and fall back to polling
await self.transport.register_agent()
self._ws_connected = False
self._start_poll_loop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Python _ensure_connected catches config/auth errors in WebSocket fallback path

The except Exception on line 163 is intended to catch WebSocket connection failures and fall back to HTTP polling, but it catches all exceptions from transport.connect(), including RelayConfigError (missing API key/workspace) and RelayAuthError (401). When a config or auth error occurs, transport.connect() fails, the code falls through to transport.register_agent() on line 165 which also calls _require_config() (or hits the same 401), causing the error to be raised from the fallback path instead of the original call. This doubles HTTP requests for auth failures and produces a confusing error origin.

Trace of the wrong-path execution
  1. transport.connect() calls _require_config() which raises RelayConfigError (or register_agent() raises RelayAuthError)
  2. except Exception on line 163 catches it
  3. transport.register_agent() on line 165 also fails with the same error
  4. Exception propagates to outer except at line 177
  5. User sees the error from the fallback path, not the original connect() call

The fix should re-raise RelayConfigError and RelayAuthError (or at minimum non-transient errors) before falling back to polling.

Suggested change
except Exception:
# WebSocket failed — register agent via HTTP and fall back to polling
await self.transport.register_agent()
self._ws_connected = False
self._start_poll_loop()
except (RelayConfigError, RelayAuthError):
raise
except Exception:
# WebSocket failed — register agent via HTTP and fall back to polling
await self.transport.register_agent()
self._ws_connected = False
self._start_poll_loop()
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +47 to +62
def _resolve_sync(self) -> str:
messages = self._drain_buffer()
try:
loop = asyncio.get_running_loop()
except RuntimeError:
loop = None

if loop is None:
messages.extend(self._relay.inbox_sync())
else:
# Running inside an event loop — use a thread to avoid blocking
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
polled = pool.submit(asyncio.run, self._relay.inbox()).result()
messages.extend(polled)
return _format_backstory(self._dedupe(messages), self._base_backstory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CrewAI _RelayBackstory drains relay inbox and triggers network I/O on every string operation

Every __str__, __repr__, __contains__, __eq__, or __getattr__ call on _RelayBackstory invokes _resolve_sync() which calls self._relay.inbox_sync() (packages/sdk-py/src/agent_relay/communicate/adapters/crewai.py:55). This has two problems:

  1. Side effect: inbox_sync() drains the relay's _pending buffer (core.py:59-60), so any other consumer (other adapters, user code calling relay.inbox()) will see empty results after the backstory is accessed as a string.
  2. Performance: Simple operations like "text" in agent.backstory or str(agent.backstory) trigger a synchronous HTTP round-trip to the relay server (via inbox_sync()), which is surprising for what looks like a property access.

CrewAI may access backstory multiple times during a single task execution (for prompt construction, logging, etc.), each time draining the inbox and making network calls.

Prompt for agents
In packages/sdk-py/src/agent_relay/communicate/adapters/crewai.py, the _RelayBackstory class's _resolve_sync() method (lines 47-62) calls self._relay.inbox_sync() on every string operation (__str__, __repr__, __contains__, __eq__, __getattr__). This drains the relay's pending message buffer as a side effect and triggers network I/O on simple property accesses.

The fix should:
1. Remove the inbox_sync()/inbox() call from _resolve_sync() and _resolve_async(). The on_message callback buffer (self._buffer) already receives all messages via the subscription at line 139.
2. Only use self._drain_buffer() to get messages, since the on_message subscription at line 139 already populates the buffer with all incoming messages.
3. This eliminates the side effect of draining the relay's _pending buffer and removes the network I/O from string operations.

The key insight is that the _buffer_message callback (line 136-137) already captures all messages via relay.on_message(), so there's no need to also poll inbox(). The _dedupe call can also be removed since messages only come from one source.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@khaliqgant khaliqgant merged commit c38eab3 into main Mar 16, 2026
30 of 32 checks passed
@khaliqgant khaliqgant deleted the feature/ts-communicate-core-pi branch March 16, 2026 22:14
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 40 additional findings in Devin Review.

Open in Devin Review

Comment on lines +246 to +255
def _run_sync(awaitable: Any) -> Any:
try:
asyncio.get_running_loop()
except RuntimeError:
return asyncio.run(awaitable)
# Running inside an active event loop — execute in a separate thread
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
return pool.submit(asyncio.run, awaitable).result()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Python _run_sync / CrewAI _resolve_sync reuses aiohttp session across incompatible event loops

Both Relay._run_sync() (core.py:246-255) and _RelayBackstory._resolve_sync() (crewai.py:47-62) use pool.submit(asyncio.run, coroutine).result() to run async code from a sync context. asyncio.run() creates a new event loop each time. If the Relay was previously connected (creating an aiohttp.ClientSession bound to event loop A), and then a sync method is called that needs to make HTTP requests, the existing session bound to the now-closed loop A is reused in loop B.

aiohttp sessions are bound to the event loop they were created in — using one in a different loop raises RuntimeError. The _ensure_session() check at transport.py:362-364 only checks self._session.closed, which is False for a session whose loop was closed (since session.close() was never called).

This manifests in the CrewAI adapter when: (1) the Relay connects asynchronously, then (2) CrewAI accesses agent.backstory synchronously, triggering _resolve_sync() which calls relay.inbox() in a new event loop. If the inbox path needs an HTTP call (e.g., WebSocket is disconnected and _ws_connected is False at core.py:54), it would use the stale session and crash.

Prompt for agents
In packages/sdk-py/src/agent_relay/communicate/core.py, the _run_sync static method (lines 246-255) and the CrewAI adapter's _resolve_sync (packages/sdk-py/src/agent_relay/communicate/adapters/crewai.py lines 47-62) both call asyncio.run() in a ThreadPoolExecutor, which creates a new event loop. If the RelayTransport already has an aiohttp.ClientSession bound to a different (now-closed) event loop, reusing it will fail.

To fix this, either:
1. In RelayTransport._ensure_session() (transport.py lines 362-364), also check whether the session's loop is still running (not just session.closed). If the loop is closed, recreate the session.
2. Or, in _run_sync, close the existing transport session before calling asyncio.run so it gets recreated in the new loop.
3. Or, track which event loop the session was created in and force recreation if it differs from the current running loop.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants