feat(sdk-py): add direct spawn/message API #473
Add AgentRelay facade, AgentRelayClient, Agent handles, and Models constants so the Python SDK can spawn agents and exchange messages directly via the broker binary, matching the TypeScript SDK's API.

- protocol.py: wire protocol types (AgentSpec, ProtocolEnvelope, etc.)
- client.py: async broker subprocess client with JSON protocol
- relay.py: high-level facade with event hooks and agent spawners
- models.py: Claude, Codex, Gemini model constants
- Updated __init__.py with new primary exports + backward compat
- Legacy agent_relay/ shim re-exports from src/agent_relay/
- Version bump to 0.3.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use agent.name (from broker response) instead of input name for state cleanup in spawn() and spawn_and_wait()
- Auto-download broker binary from GitHub releases on first use if not found at ~/.agent-relay/bin/ or on PATH. Includes platform detection, macOS codesigning, and binary verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
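The lookup order this commit describes (per-user install directory first, then PATH, then download) might look roughly like the sketch below. It is an illustration only, not the SDK's code: the function name `find_broker_binary`, its parameters, and the default binary name `agent-relay-broker` are assumptions, and the download step is left out.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_broker_binary(
    name: str = "agent-relay-broker",  # hypothetical binary name
    install_dir: Optional[Path] = None,
) -> Optional[str]:
    """Return the path of an existing broker binary, or None if a download is needed."""
    install_dir = install_dir or Path.home() / ".agent-relay" / "bin"
    candidate = install_dir / name
    # 1. Prefer the per-user install directory.
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate)
    # 2. Fall back to whatever is on PATH (None when nothing is found).
    return shutil.which(name)
```

A caller would trigger the download only when this returns `None`, so an already installed binary is never fetched again.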
Add runtime auto-download fallback to resolveDefaultBinaryPath(). If the broker binary isn't found at any of the existing locations (Cargo build, bundled npm, ~/.agent-relay/bin/, PATH), automatically download it from GitHub releases. Matches the Python SDK behavior. Includes platform detection, macOS codesigning, and binary verification. Falls back to a helpful manual install message on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lead with the new AgentRelay API showing Claude + Codex collaboration, move workflow builder to an advanced section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix concurrent wait_for_exit/wait_for_idle overwrite bug: use list of futures per agent so multiple callers all resolve correctly
- Remove dead exception handler in send_message: client already catches unsupported_operation, check result dict sentinel instead
- Skip on_message_sent hook for unsupported operations
- Fix version placeholder to match latest published version (3.0.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
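The first fix in this commit, keeping a list of futures per agent instead of a single slot, can be sketched as follows. The class and method names here (`ExitWaiters`, `notify_exit`) are hypothetical and only illustrate the pattern; the real SDK internals may differ.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List


class ExitWaiters:
    """Tracks every pending waiter per agent instead of a single future."""

    def __init__(self) -> None:
        self._waiters: DefaultDict[str, List[asyncio.Future]] = defaultdict(list)

    async def wait_for_exit(self, agent_name: str, timeout: float = 5.0) -> str:
        fut: asyncio.Future = asyncio.get_running_loop().create_future()
        self._waiters[agent_name].append(fut)  # append, never overwrite
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            # Drop this waiter whether it resolved, timed out, or was cancelled.
            waiters = self._waiters.get(agent_name, [])
            if fut in waiters:
                waiters.remove(fut)

    def notify_exit(self, agent_name: str, reason: str) -> None:
        # Resolve every caller waiting on this agent, not just the latest one.
        for fut in self._waiters.pop(agent_name, []):
            if not fut.done():
                fut.set_result(reason)


async def demo() -> list:
    waiters = ExitWaiters()
    first = asyncio.create_task(waiters.wait_for_exit("worker"))
    second = asyncio.create_task(waiters.wait_for_exit("worker"))
    await asyncio.sleep(0)  # let both tasks register their futures
    waiters.notify_exit("worker", "exited")
    return await asyncio.gather(first, second)
```

With a single future per agent, the second `wait_for_exit` call would replace the first caller's future and leave that caller waiting until its timeout; with a list, both callers resolve.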
- Add Python install/examples to introduction and quickstart using CodeGroup tabs
- Create full Python SDK reference page (reference/sdk-py.mdx)
- Add sdk-py to mint.json navigation
- Cross-link between TypeScript and Python SDK reference pages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    try {
      fs.mkdirSync(installDir, { recursive: true });
      execSync(`curl -fsSL "${downloadUrl}" -o "${targetPath}"`, {
Check warning
Code scanning / CodeQL
Indirect uncontrolled command line Medium
Copilot Autofix
AI about 2 months ago
General approach: Avoid passing untrusted or environment-derived values into a shell-processed command string. Use child_process.execFileSync (or spawnSync) with an argument array instead of execSync with a concatenated string. That way, targetPath and downloadUrl become plain arguments to curl, not subject to shell parsing.
Best specific fix here:
- Replace the `execSync` call that runs `curl` with `execFileSync('curl', [...])`, passing `downloadUrl` and `targetPath` as separate arguments.
- Similarly, replace the `codesign` invocation on macOS with `execFileSync('codesign', [...])`.
- Replace the verification step `execSync(`"${targetPath}" --help`, ...)` with `execFileSync(targetPath, ['--help'], ...)`.
- Keep timeouts and stdio options as close as possible to the original behavior.
- To implement this, we only need to:
  - Import `execFileSync` from `node:child_process` alongside the existing imports.
  - Update the three `execSync` usages inside `installBrokerBinary` to `execFileSync` with argument arrays.
All changes are confined to packages/sdk/src/client.ts in the shown regions.
    @@ -1,5 +1,5 @@
     import { once } from 'node:events';
    -import { execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process';
    +import { execFileSync, execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process';
     import { createInterface, type Interface as ReadlineInterface } from 'node:readline';
     import fs from 'node:fs';
     import os from 'node:os';
    @@ -695,7 +695,7 @@

       try {
         fs.mkdirSync(installDir, { recursive: true });
    -    execSync(`curl -fsSL "${downloadUrl}" -o "${targetPath}"`, {
    +    execFileSync('curl', ['-fsSL', downloadUrl, '-o', targetPath], {
           timeout: 60_000,
           stdio: ['pipe', 'pipe', 'pipe'],
         });
    @@ -704,7 +704,7 @@
         // macOS: re-sign to avoid Gatekeeper issues
         if (process.platform === 'darwin') {
           try {
    -        execSync(`codesign --force --sign - "${targetPath}"`, {
    +        execFileSync('codesign', ['--force', '--sign', '-', targetPath], {
               timeout: 10_000,
               stdio: ['pipe', 'pipe', 'pipe'],
             });
    @@ -714,7 +714,10 @@
         }

         // Verify
    -    execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
    +    execFileSync(targetPath, ['--help'], {
    +      timeout: 10_000,
    +      stdio: ['pipe', 'pipe', 'pipe'],
    +    });
       } catch (err) {
         try { fs.unlinkSync(targetPath); } catch { /* ignore */ }
         const message = err instanceof Error ? err.message : String(err);
    // macOS: re-sign to avoid Gatekeeper issues
    if (process.platform === 'darwin') {
      try {
        execSync(`codesign --force --sign - "${targetPath}"`, {
Check warning
Code scanning / CodeQL
Indirect uncontrolled command line Medium
Copilot Autofix
AI about 2 months ago
In general, the safest way to fix this kind of issue is to avoid spawning a shell when executing external commands, and instead use APIs that accept the program name and its arguments as separate parameters (for Node.js, child_process.execFile / execFileSync or spawn). This way, any untrusted strings are passed directly as arguments to the target program and are not interpreted by a shell, eliminating shell injection concerns.
For this specific code, installBrokerBinary already imports execSync from node:child_process and uses it to run curl, codesign, and the newly installed broker binary. The only flagged command is the codesign invocation on macOS:
    execSync(`codesign --force --sign - "${targetPath}"`, {
      timeout: 10_000,
      stdio: ['pipe', 'pipe', 'pipe'],
    });
The minimal, behavior-preserving fix is to replace this with a call to `execFileSync` and pass the arguments as an array: `['--force', '--sign', '-', targetPath]`. This avoids using a shell entirely, so any contents of `targetPath` (even if influenced by environment variables) are treated purely as a path argument by `codesign`. To do this:
- Update the import from `node:child_process` at the top of `packages/sdk/src/client.ts` to include `execFileSync` alongside the existing `execSync` and `spawn`.
- Replace the `execSync` invocation within the macOS `codesign` block with `execFileSync('codesign', ['--force', '--sign', '-', targetPath], { ... })`, preserving the existing timeout and stdio options.
No other behavior changes are needed, and the rest of the function can remain untouched. This single change will also address any additional alert variants that refer to the same sink.
    @@ -1,5 +1,5 @@
     import { once } from 'node:events';
    -import { execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process';
    +import { execSync, execFileSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process';
     import { createInterface, type Interface as ReadlineInterface } from 'node:readline';
     import fs from 'node:fs';
     import os from 'node:os';
    @@ -704,7 +704,7 @@
         // macOS: re-sign to avoid Gatekeeper issues
         if (process.platform === 'darwin') {
           try {
    -        execSync(`codesign --force --sign - "${targetPath}"`, {
    +        execFileSync('codesign', ['--force', '--sign', '-', targetPath], {
               timeout: 10_000,
               stdio: ['pipe', 'pipe', 'pipe'],
             });
    }

    // Verify
    execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
Check warning
Code scanning / CodeQL
Indirect uncontrolled command line Medium
Copilot Autofix
AI about 2 months ago
General approach: Avoid passing any value derived from untrusted input (including environment variables) into a shell command string. Prefer child_process.execFileSync (or spawn with shell: false) so arguments are passed as an array and not interpreted by a shell. This removes shell metacharacter interpretation regardless of the content of targetPath.
Best concrete fix here: change the verification step on line 717 from execSync with a constructed command string to execFileSync with targetPath as the executable and '--help' as an argument. That way, even if HOME/USERPROFILE are maliciously set, targetPath is treated purely as an executable path, not parsed by a shell.
Details:
- In `packages/sdk/src/client.ts`, add `execFileSync` to the existing import from `node:child_process`.
- Replace:
    execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
  with:
    execFileSync(targetPath, ['--help'], { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
- This keeps behavior the same (run the just-installed broker with `--help` to verify it works) but avoids shell interpretation.
- No additional helper methods or complex sanitization are needed.
    @@ -1,5 +1,5 @@
     import { once } from 'node:events';
    -import { execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process';
    +import { execSync, execFileSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process';
     import { createInterface, type Interface as ReadlineInterface } from 'node:readline';
     import fs from 'node:fs';
     import os from 'node:os';
    @@ -714,7 +714,7 @@
         }

         // Verify
    -    execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
    +    execFileSync(targetPath, ['--help'], { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
       } catch (err) {
         try { fs.unlinkSync(targetPath); } catch { /* ignore */ }
         const message = err instanceof Error ? err.message : String(err);
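The three CodeQL findings above share one remedy: pass the program and its arguments as a list so no shell parses them. Since this PR is mainly about the Python SDK, here is the same idea in Python terms. This is an illustrative sketch, not code from the PR; the helper name `run_without_shell` is hypothetical.

```python
import subprocess
import sys


def run_without_shell(program: str, *args: str, timeout: float = 10.0) -> str:
    """Run a program with an argument list; no shell ever parses the arguments."""
    result = subprocess.run(
        [program, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# A path-like value containing shell metacharacters is passed through verbatim.
hostile = 'bin"; echo injected; "'
echoed = run_without_shell(
    sys.executable, "-c", "import sys; print(sys.argv[1])", hostile
)
```

Because `subprocess.run` receives a list and `shell` is left at its default of `False`, the child process sees `hostile` as a single argument rather than as shell syntax, which is the property `execFileSync` provides on the Node.js side.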
Address Devin review comment - the Python SDK's wait_for_agent_message was missing handling for the agent_released event. If an agent was released before sending its first message, the method would hang until timeout instead of immediately rejecting with a clear error. This brings parity with the TypeScript SDK which handles all three cases: relay_inbound, agent_exited, and agent_released. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
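A rough sketch of the three-way handling described above follows. The event names `relay_inbound`, `agent_exited`, and `agent_released` come from the commit message; the event dict shape (`kind`, `name`), the function name, and the error type are assumptions made for illustration.

```python
import asyncio
from typing import Any, Dict


class AgentGoneError(RuntimeError):
    """Hypothetical error for an agent that went away before replying."""


def settle_first_message(
    fut: "asyncio.Future[Dict[str, Any]]", agent: str, event: Dict[str, Any]
) -> None:
    """Resolve or reject `fut` based on one broker event (assumed dict shape)."""
    if fut.done() or event.get("name") != agent:
        return
    kind = event.get("kind")
    if kind == "relay_inbound":
        fut.set_result(event)
    elif kind in ("agent_exited", "agent_released"):
        # Fail fast instead of letting the caller wait for the timeout.
        fut.set_exception(
            AgentGoneError(f"agent {agent!r}: {kind} before sending a message")
        )


async def demo(kind: str) -> str:
    fut = asyncio.get_running_loop().create_future()
    settle_first_message(fut, "worker", {"kind": kind, "name": "worker"})
    try:
        await asyncio.wait_for(fut, timeout=1.0)
    except AgentGoneError as exc:
        return str(exc)
    return "message received"
```

Without the `agent_released` branch, the future in `demo("agent_released")` would stay pending and the caller would only learn about the problem when `wait_for` times out.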
Address Devin review comment - when user provides a custom env dict
to AgentRelay(env={...}), RELAY_API_KEY from os.environ was not being
injected. This would silently break Relaycast workspace connection.
Now matches TypeScript SDK behavior at packages/sdk/src/relay.ts:790-806
which explicitly handles injecting RELAY_API_KEY into custom env dicts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
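The injection described in this commit could look something like the sketch below. The function name `build_broker_env` and the rule that an explicitly supplied `RELAY_API_KEY` takes precedence are assumptions; the commit message only states that the key from `os.environ` is injected into custom env dicts.

```python
import os
from typing import Dict, Mapping, Optional


def build_broker_env(
    custom_env: Optional[Mapping[str, str]] = None,
    process_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the environment to pass to the broker subprocess."""
    process_env = os.environ if process_env is None else process_env
    if custom_env is None:
        return dict(process_env)  # no override: inherit everything
    env = dict(custom_env)
    api_key = process_env.get("RELAY_API_KEY")
    # Assumption: inject the key only when the caller did not set one explicitly.
    if api_key and "RELAY_API_KEY" not in env:
        env["RELAY_API_KEY"] = api_key
    return env
```

Taking `process_env` as a parameter keeps the function easy to test without mutating the real process environment.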
Changed `pip install agent-relay` to `pip install agent-relay-sdk` to match the package name in pyproject.toml. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
    for listener in self._event_listeners:
        listener(event)
🔴 Uncaught exception in event listener kills the stdout reader, silently breaking all event processing
If any event hook callback (e.g., on_message_received, on_agent_ready) raises an exception, it propagates through _handle_stdout_line → _read_stdout, terminating the reader coroutine. After this, no more broker events or responses are processed, causing all pending and future requests to hang until their timeouts.
Root Cause and Impact
In client.py:407, event listeners are called in a bare loop with no try/except:
    for listener in self._event_listeners:
        listener(event)
The listeners include the relay's _wire_events handler (relay.py:636), which calls user-provided hooks like self.on_message_received(msg) at relay.py:653. If the user's hook raises (e.g., relay.on_message_received = lambda msg: 1/0), the exception propagates up through _handle_stdout_line at client.py:388, out of the while True loop in _read_stdout at client.py:367-373, killing the reader task.
Since the reader task is fire-and-forget (asyncio.create_task at client.py:345), the exception is silently lost ("Task exception was never retrieved"). All subsequent stdout from the broker is ignored. Pending requests in self._pending hang until their individual timeouts. The broker process continues running but the SDK is effectively dead.
The TypeScript equivalent uses Node.js readline's on('line', ...) event handler (client.ts:403-414), where an exception in a handler doesn't prevent subsequent line events from being processed — making the TS version naturally more resilient to callback errors.
Impact: A single user callback error permanently breaks the SDK instance, with no visible error or recovery path.
    -for listener in self._event_listeners:
    -    listener(event)
    +for listener in self._event_listeners:
    +    try:
    +        listener(event)
    +    except Exception:
    +        pass  # Don't let listener errors kill the reader
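The suggested fix swallows listener errors silently. A variant that isolates each listener but still records the failure might look like the sketch below; `dispatch_event` and the logger name are hypothetical and not part of the SDK.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("agent_relay.events")  # hypothetical logger name

Listener = Callable[[Dict[str, Any]], None]


def dispatch_event(listeners: List[Listener], event: Dict[str, Any]) -> int:
    """Call every listener with `event`; return how many of them raised."""
    failures = 0
    for listener in list(listeners):  # copy: a listener may unsubscribe mid-loop
        try:
            listener(event)
        except Exception:
            failures += 1
            # Log with traceback, then keep going so the reader loop survives.
            logger.exception("event listener %r failed", listener)
    return failures
```

Logging keeps the reader task alive, as in the suggestion, while giving users a visible trace when one of their hooks raises.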
Summary
Adds `AgentRelay` facade, `AgentRelayClient`, `Agent` handles, and `Models` constants to the Python SDK so it can spawn agents and exchange messages directly via the broker binary, matching the TypeScript SDK's API.
New files: `protocol.py`, `client.py`, `relay.py`, `models.py` in `packages/sdk-py/src/agent_relay/`
Features:
`workflow()`, `fan_out()`, `pipeline()`, `dag()`) still works unchanged
Target usage
Split from PR #472
This PR contains only the Python SDK changes, split out from #472 as requested. The `openclaw-relaycast` package will be in a separate PR.
Test plan
🤖 Generated with Claude Code