feat(sdk-py): add direct spawn/message API#472
Conversation
Add AgentRelay facade, AgentRelayClient, Agent handles, and Models constants so the Python SDK can spawn agents and exchange messages directly via the broker binary, matching the TypeScript SDK's API. - protocol.py: wire protocol types (AgentSpec, ProtocolEnvelope, etc.) - client.py: async broker subprocess client with JSON protocol - relay.py: high-level facade with event hooks and agent spawners - models.py: Claude, Codex, Gemini model constants - Updated __init__.py with new primary exports + backward compat - Legacy agent_relay/ shim re-exports from src/agent_relay/ - Version bump to 0.3.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use agent.name (from broker response) instead of input name for state cleanup in spawn() and spawn_and_wait() - Auto-download broker binary from GitHub releases on first use if not found at ~/.agent-relay/bin/ or on PATH. Includes platform detection, macOS codesigning, and binary verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ff17586 to
b794bd4
Compare
Add runtime auto-download fallback to resolveDefaultBinaryPath(). If the broker binary isn't found at any of the existing locations (Cargo build, bundled npm, ~/.agent-relay/bin/, PATH), automatically download it from GitHub releases. Matches the Python SDK behavior. Includes platform detection, macOS codesigning, and binary verification. Falls back to a helpful manual install message on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
||
| try { | ||
| fs.mkdirSync(installDir, { recursive: true }); | ||
| execSync(`curl -fsSL "${downloadUrl}" -o "${targetPath}"`, { |
Check warning
Code scanning / CodeQL
Indirect uncontrolled command line Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, the fix is to avoid executing shell commands constructed as a single string that embeds untrusted data. Instead, use child_process.execFileSync (or spawnSync) with an explicit program and an array of arguments, so that arguments are not parsed by a shell and cannot change the command semantics via shell metacharacters.
For this code, the best fix without changing functionality is:
- Replace
execSyncuses that build a full command string withexecFileSyncand pass each argument as a separate string element in an array. - Keep using the same binaries (
curl,codesign, and the downloaded broker executable), timeouts, and stdio options. - Import
execFileSyncfromnode:child_processin addition to the existing imports.
Concretely in packages/sdk/src/client.ts:
-
Update the import on line 2 to also import
execFileSync. -
Change the
curlinvocation:From:
execSync(`curl -fsSL "${downloadUrl}" -o "${targetPath}"`, { ... });
To:
execFileSync('curl', ['-fsSL', downloadUrl, '-o', targetPath], { ... });
This removes shell parsing entirely;
targetPathis passed as a literal argument. -
Change the
codesigninvocation similarly:From:
execSync(`codesign --force --sign - "${targetPath}"`, { ... });
To:
execFileSync('codesign', ['--force', '--sign', '-', targetPath], { ... });
-
Change the verification step:
From:
execSync(`"${targetPath}" --help`, { ... });
To:
execFileSync(targetPath, ['--help'], { ... });
These changes preserve behavior (same commands, same arguments) while eliminating the shell and addressing all the variants of the alert that involve targetPath in a command string.
| @@ -1,5 +1,5 @@ | ||
| import { once } from 'node:events'; | ||
| import { execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'; | ||
| import { execSync, execFileSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'; | ||
| import { createInterface, type Interface as ReadlineInterface } from 'node:readline'; | ||
| import fs from 'node:fs'; | ||
| import os from 'node:os'; | ||
| @@ -695,7 +695,7 @@ | ||
|
|
||
| try { | ||
| fs.mkdirSync(installDir, { recursive: true }); | ||
| execSync(`curl -fsSL "${downloadUrl}" -o "${targetPath}"`, { | ||
| execFileSync('curl', ['-fsSL', downloadUrl, '-o', targetPath], { | ||
| timeout: 60_000, | ||
| stdio: ['pipe', 'pipe', 'pipe'], | ||
| }); | ||
| @@ -704,7 +704,7 @@ | ||
| // macOS: re-sign to avoid Gatekeeper issues | ||
| if (process.platform === 'darwin') { | ||
| try { | ||
| execSync(`codesign --force --sign - "${targetPath}"`, { | ||
| execFileSync('codesign', ['--force', '--sign', '-', targetPath], { | ||
| timeout: 10_000, | ||
| stdio: ['pipe', 'pipe', 'pipe'], | ||
| }); | ||
| @@ -714,7 +714,7 @@ | ||
| } | ||
|
|
||
| // Verify | ||
| execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] }); | ||
| execFileSync(targetPath, ['--help'], { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] }); | ||
| } catch (err) { | ||
| try { fs.unlinkSync(targetPath); } catch { /* ignore */ } | ||
| const message = err instanceof Error ? err.message : String(err); |
| // macOS: re-sign to avoid Gatekeeper issues | ||
| if (process.platform === 'darwin') { | ||
| try { | ||
| execSync(`codesign --force --sign - "${targetPath}"`, { |
Check warning
Code scanning / CodeQL
Indirect uncontrolled command line Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
General approach: Avoid passing a concatenated command string containing targetPath to a shell. Instead, invoke codesign with execFileSync (or spawnSync) and pass arguments as an array so the path is not interpreted by a shell. This directly addresses any taint flowing from environment variables into shell parsing.
Best concrete fix here:
- Replace the
execSynccall on line 707, which currently uses a template string and shell, with anexecFileSynccall that:- Uses
'codesign'as the command. - Provides
['--force', '--sign', '-', targetPath]as the argument array. - Keeps the same options (
timeout,stdio).
- Uses
- To do this safely without changing other behavior:
- Import
execFileSyncfromnode:child_processalongside the existingexecSyncandspawnimport. - Only change the
codesigninvocation; the otherexecSynccalls (curl,targetPath --help) are not part of this particular alert and should stay as-is unless separately reported.
- Import
File/region details:
- File:
packages/sdk/src/client.ts - At the top of the file, update the import on line 2 to include
execFileSync. - In
installBrokerBinary():- Around lines 705–710, replace:
with:
execSync(`codesign --force --sign - "${targetPath}"`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'], });
execFileSync('codesign', ['--force', '--sign', '-', targetPath], { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'], });
- Around lines 705–710, replace:
No additional helper methods are needed; just the new import and updated call.
| @@ -1,5 +1,5 @@ | ||
| import { once } from 'node:events'; | ||
| import { execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'; | ||
| import { execSync, execFileSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'; | ||
| import { createInterface, type Interface as ReadlineInterface } from 'node:readline'; | ||
| import fs from 'node:fs'; | ||
| import os from 'node:os'; | ||
| @@ -704,7 +704,7 @@ | ||
| // macOS: re-sign to avoid Gatekeeper issues | ||
| if (process.platform === 'darwin') { | ||
| try { | ||
| execSync(`codesign --force --sign - "${targetPath}"`, { | ||
| execFileSync('codesign', ['--force', '--sign', '-', targetPath], { | ||
| timeout: 10_000, | ||
| stdio: ['pipe', 'pipe', 'pipe'], | ||
| }); |
| } | ||
|
|
||
| // Verify | ||
| execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] }); |
Check warning
Code scanning / CodeQL
Indirect uncontrolled command line Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
General approach: Avoid invoking a shell with a single concatenated command string when any part of that string can be influenced by environment variables. Instead, use execFileSync (or spawn) with the executable path and arguments passed as an array. This prevents the shell from interpreting metacharacters present in paths or arguments.
Best concrete fix here: On line 717, replace the execSync call that runs "${targetPath}" --help as a shell command with a call to execFileSync, passing targetPath as the executable and ['--help'] as the argument list. This removes shell interpretation entirely while preserving the functional intent: verify that the just-installed broker binary runs and responds to --help. To implement this, we need to import execFileSync from node:child_process alongside the existing imports, and then adjust only this call site (the other execSync uses are not part of the tainted path in this alert).
Concretely:
-
Edit
packages/sdk/src/client.tsimport on line 2 to addexecFileSyncto the destructuring fromnode:child_process. -
Edit the block around line 717 to replace:
execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
with:
execFileSync(targetPath, ['--help'], { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] });
No other changes are needed to preserve behavior.
| @@ -1,5 +1,5 @@ | ||
| import { once } from 'node:events'; | ||
| import { execSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'; | ||
| import { execSync, execFileSync, spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'; | ||
| import { createInterface, type Interface as ReadlineInterface } from 'node:readline'; | ||
| import fs from 'node:fs'; | ||
| import os from 'node:os'; | ||
| @@ -714,7 +714,7 @@ | ||
| } | ||
|
|
||
| // Verify | ||
| execSync(`"${targetPath}" --help`, { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] }); | ||
| execFileSync(targetPath, ['--help'], { timeout: 10_000, stdio: ['pipe', 'pipe', 'pipe'] }); | ||
| } catch (err) { | ||
| try { fs.unlinkSync(targetPath); } catch { /* ignore */ } | ||
| const message = err instanceof Error ? err.message : String(err); |
Lead with the new AgentRelay API showing Claude + Codex collaboration, move workflow builder to an advanced section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix concurrent wait_for_exit/wait_for_idle overwrite bug: use list of futures per agent so multiple callers all resolve correctly - Remove dead exception handler in send_message: client already catches unsupported_operation, check result dict sentinel instead - Skip on_message_sent hook for unsupported operations - Fix version placeholder to match latest published version (3.0.2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| env = self._client_kwargs.get("env") | ||
| if env is None: | ||
| env_key = os.environ.get("RELAY_API_KEY") | ||
| if env_key: | ||
| self._client_kwargs["env"] = {**os.environ, "RELAY_API_KEY": env_key} | ||
| else: | ||
| self._client_kwargs["env"] = dict(os.environ) |
There was a problem hiding this comment.
🟡 Missing RELAY_API_KEY injection when user provides custom env dict
When a user provides a custom env dict to AgentRelay(env={...}), the _ensure_started method skips injecting RELAY_API_KEY from os.environ into that dict. This diverges from the TypeScript SDK's behavior, which explicitly handles this case.
Root Cause: Missing `elif` branch matching TypeScript SDK's `ensureRelaycastApiKey`
In relay.py:383-390, the env setup only handles the env is None case:
env = self._client_kwargs.get("env")
if env is None:
env_key = os.environ.get("RELAY_API_KEY")
if env_key:
self._client_kwargs["env"] = {**os.environ, "RELAY_API_KEY": env_key}
else:
self._client_kwargs["env"] = dict(os.environ)But the TypeScript SDK at packages/sdk/src/relay.ts:801-804 also handles the case where env exists but lacks the key:
if (!this.clientOptions.env) {
this.clientOptions.env = { ...process.env, RELAY_API_KEY: envKey };
} else if (!this.clientOptions.env.RELAY_API_KEY) {
this.clientOptions.env.RELAY_API_KEY = envKey;
}Impact: If a user creates AgentRelay(env={"SOME_VAR": "value"}) and has RELAY_API_KEY set in their system environment, the Python SDK will not propagate it to the broker subprocess, potentially causing broker authentication or Relaycast connection failures. The TypeScript SDK correctly injects the key in this scenario.
| env = self._client_kwargs.get("env") | |
| if env is None: | |
| env_key = os.environ.get("RELAY_API_KEY") | |
| if env_key: | |
| self._client_kwargs["env"] = {**os.environ, "RELAY_API_KEY": env_key} | |
| else: | |
| self._client_kwargs["env"] = dict(os.environ) | |
| env = self._client_kwargs.get("env") | |
| if env is None: | |
| env_key = os.environ.get("RELAY_API_KEY") | |
| if env_key: | |
| self._client_kwargs["env"] = {**os.environ, "RELAY_API_KEY": env_key} | |
| else: | |
| self._client_kwargs["env"] = dict(os.environ) | |
| else: | |
| env_key = os.environ.get("RELAY_API_KEY") | |
| if env_key and "RELAY_API_KEY" not in env: | |
| env["RELAY_API_KEY"] = env_key |
Was this helpful? React with 👍 or 👎 to provide feedback.
…-message-api # Conflicts: # packages/sdk-py/README.md
- Add Python install/examples to introduction and quickstart using CodeGroup tabs - Create full Python SDK reference page (reference/sdk-py.mdx) - Add sdk-py to mint.json navigation - Cross-link between TypeScript and Python SDK reference pages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
||
| if (result.ok) { | ||
| console.log(result.message); | ||
| console.log(`\nWorkspace key: ${result.apiKey}`); |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, the fix is to stop logging secrets such as API keys or workspace keys in clear text. Either do not output the key at all, or replace it with a redacted form or an instruction that tells the user where to find or store it, without including the secret itself.
For this specific code, the minimal fix that doesn’t change application behavior elsewhere is to remove or alter the line that prints result.apiKey. We should keep the surrounding status and instruction messages but avoid including the key value. A straightforward approach is:
- Keep
console.log(result.message);as-is. - Replace
console.log(\\nWorkspace key: ${result.apiKey}`);` with a message that tells the user that a workspace key was created or is available, but omits the actual key string. If the rest of the codebase depends on the user seeing the key, a safer alternative would later be to write it to a secure location (config file, keyring), but since we can’t change other files, we’ll only change this logging line.
Concretely, in packages/openclaw-relaycast/src/cli.ts, within runSetup, change line 92 to a non-sensitive informational message. No new imports or helper methods are required; we only modify the single console.log call.
| @@ -89,7 +89,7 @@ | ||
|
|
||
| if (result.ok) { | ||
| console.log(result.message); | ||
| console.log(`\nWorkspace key: ${result.apiKey}`); | ||
| console.log('\nA workspace key has been created for this setup.'); | ||
| console.log('Share this key with other claws to join the same workspace.'); | ||
| } else { | ||
| console.error(`Setup failed: ${result.message}`); |
| console.log(`Claw name: ${config.clawName}`); | ||
| console.log(`Channels: ${config.channels.join(', ')}`); | ||
| console.log(`Base URL: ${config.baseUrl}`); | ||
| console.log(`API key: ${config.apiKey.slice(0, 12)}...`); |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, the fix is to avoid logging any part of sensitive data (like API keys) and instead log only non-sensitive metadata, such as whether an API key is configured or the length of the key, or a fixed masked representation (e.g., ********). The goal is to preserve the diagnostic value of the status output while ensuring that no portion of the secret itself is printed.
The best fix here without changing existing functionality much is to replace the line that prints a prefix of the apiKey with a line that prints a masked representation and maybe the key length, but no actual characters from the key. For example, log something like API key: [configured, length=32] or API key: ******** (configured); this still tells the user that a key is configured and roughly how it looks structurally, without revealing any part of the value. Concretely, in packages/openclaw-relaycast/src/cli.ts, in runStatus(), change line 143 from interpolating config.apiKey.slice(0, 12) to a string that does not include config.apiKey contents. No new imports or helper functions are needed; simple string formatting with existing variables is sufficient.
| @@ -140,7 +140,7 @@ | ||
| console.log(`Claw name: ${config.clawName}`); | ||
| console.log(`Channels: ${config.channels.join(', ')}`); | ||
| console.log(`Base URL: ${config.baseUrl}`); | ||
| console.log(`API key: ${config.apiKey.slice(0, 12)}...`); | ||
| console.log('API key: [configured, value hidden]'); | ||
|
|
||
| // Try to check connectivity | ||
| try { |
| // Write a minimal openclaw.json so MCP servers can be registered | ||
| const configPath = join(detection.homeDir, 'openclaw.json'); | ||
| if (!existsSync(configPath)) { | ||
| await writeFile(configPath, JSON.stringify({ mcpServers: {} }, null, 2) + '\n', 'utf-8'); |
Check failure
Code scanning / CodeQL
Potential file system race condition High
| } catch (err) { | ||
| // mcporter not installed — non-fatal, print manual instructions | ||
| console.warn('mcporter not found. Install MCP servers manually:'); | ||
| console.warn(` mcporter config add relaycast --command npx --arg @relaycast/mcp --env RELAY_API_KEY=${apiKey} --scope home`); |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general: avoid including secrets (API keys, tokens, passwords) in log messages. If you need to show an example command, use a placeholder such as <YOUR_API_KEY> or a partially redacted value (e.g., last 4 characters) rather than the full key. This preserves usability while protecting confidentiality.
Best concrete fix here: change the two console.warn lines that embed RELAY_API_KEY=${apiKey} to instead show a placeholder like RELAY_API_KEY=<YOUR_API_KEY>. This keeps the instructions while ensuring the real apiKey value is never written to logs on this mcporter-not-installed path. No new imports or helper methods are necessary; we only alter the string literals in packages/openclaw-relaycast/src/setup.ts around lines 255–257.
| @@ -253,8 +253,8 @@ | ||
| } catch (err) { | ||
| // mcporter not installed — non-fatal, print manual instructions | ||
| console.warn('mcporter not found. Install MCP servers manually:'); | ||
| console.warn(` mcporter config add relaycast --command npx --arg @relaycast/mcp --env RELAY_API_KEY=${apiKey} --scope home`); | ||
| console.warn(` mcporter config add openclaw-spawner --command npx --arg openclaw-relaycast --arg mcp-server --env RELAY_API_KEY=${apiKey} --scope home`); | ||
| console.warn(' mcporter config add relaycast --command npx --arg @relaycast/mcp --env RELAY_API_KEY=<YOUR_API_KEY> --scope home'); | ||
| console.warn(' mcporter config add openclaw-spawner --command npx --arg openclaw-relaycast --arg mcp-server --env RELAY_API_KEY=<YOUR_API_KEY> --scope home'); | ||
| } | ||
| } | ||
|
|
| // mcporter not installed — non-fatal, print manual instructions | ||
| console.warn('mcporter not found. Install MCP servers manually:'); | ||
| console.warn(` mcporter config add relaycast --command npx --arg @relaycast/mcp --env RELAY_API_KEY=${apiKey} --scope home`); | ||
| console.warn(` mcporter config add openclaw-spawner --command npx --arg openclaw-relaycast --arg mcp-server --env RELAY_API_KEY=${apiKey} --scope home`); |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to fix clear-text logging of sensitive information, you must avoid interpolating secrets (API keys, tokens, passwords) directly into log messages. Either: (a) omit them from logs entirely, (b) redact them (e.g., show only a short prefix), or (c) instruct the user to supply them manually without echoing the actual value.
Here, the only use of apiKey in logging is in the “mcporter not found” instructions on lines 256–257:
console.warn(` mcporter config add relaycast ... --env RELAY_API_KEY=${apiKey} ...`);
console.warn(` mcporter config add openclaw-spawner ... --env RELAY_API_KEY=${apiKey} ...`);We can preserve functionality by keeping the instructions but asking the user to insert their key themselves. The safest fix is to replace ${apiKey} with a neutral placeholder such as <YOUR_RELAY_API_KEY>. This keeps the commands accurate while preventing the real key from being displayed in logs.
Concretely:
- In
packages/openclaw-relaycast/src/setup.ts, update the twoconsole.warntemplate literals in thecatchblock at lines 254–258. - Replace
--env RELAY_API_KEY=${apiKey}with--env RELAY_API_KEY=<YOUR_RELAY_API_KEY>in both strings. - No new imports or helper functions are needed; we just change the string contents.
| @@ -253,8 +253,8 @@ | ||
| } catch (err) { | ||
| // mcporter not installed — non-fatal, print manual instructions | ||
| console.warn('mcporter not found. Install MCP servers manually:'); | ||
| console.warn(` mcporter config add relaycast --command npx --arg @relaycast/mcp --env RELAY_API_KEY=${apiKey} --scope home`); | ||
| console.warn(` mcporter config add openclaw-spawner --command npx --arg openclaw-relaycast --arg mcp-server --env RELAY_API_KEY=${apiKey} --scope home`); | ||
| console.warn(' mcporter config add relaycast --command npx --arg @relaycast/mcp --env RELAY_API_KEY=<YOUR_RELAY_API_KEY> --scope home'); | ||
| console.warn(' mcporter config add openclaw-spawner --command npx --arg openclaw-relaycast --arg mcp-server --env RELAY_API_KEY=<YOUR_RELAY_API_KEY> --scope home'); | ||
| } | ||
| } | ||
|
|
|
|
||
| // Try to check connectivity | ||
| try { | ||
| const res = await fetch(`${config.baseUrl}/v1/health`); |
Check warning
Code scanning / CodeQL
File data in outbound network request Medium
| '', | ||
| ].join('\n'); | ||
|
|
||
| await writeFile(join(relaycastDir, '.env'), env, 'utf-8'); |
Check warning
Code scanning / CodeQL
Network data written to file Medium
| pending.resolve(false); | ||
| } else { | ||
| const result = msg.payload as Record<string, unknown> | undefined; | ||
| console.log(`[openclaw-ws] RPC ${id} ok: runId=${result?.runId ?? 'n/a'} status=${result?.status ?? 'n/a'}`); |
Check warning
Code scanning / CodeQL
Log injection Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to prevent log injection when logging user-controlled values, you should sanitize those values before interpolation. For plain-text logs, this typically means removing or replacing newline and carriage return characters (and sometimes other control characters) and clearly delimiting user input. In TypeScript/JavaScript, a small helper like sanitizeLogValue that converts arbitrary values to strings and strips \r and \n is usually sufficient and minimally invasive.
For this specific code, the minimal, non-breaking fix is to sanitize id, result.runId, and result.status before including them in the console.log message on line 275. To avoid repeated inline regexes and to keep behaviour consistent, we can add a small helper method inside the same class that takes an unknown value, coerces it to string, and removes \r and \n. Then, in the RPC-response logging block, we call this helper on each potentially tainted value before constructing the log message. This preserves existing functionality (the log still shows the same semantic information), while preventing an attacker from injecting line breaks into the logs.
Concretely:
- Add a private method
sanitizeLogValue(value: unknown): stringto the class that containsdoConnectandhandleMessage. It should:- Convert
valueto string (e.g.,String(value)). - Replace
\rand\nwith empty strings usingreplace(/[\r\n]/g, '').
- Convert
- Update the RPC success logging line to use this helper for
id,runId, andstatus. For example:- Compute
const safeId = this.sanitizeLogValue(id); - Compute
const safeRunId = this.sanitizeLogValue(result?.runId ?? 'n/a'); - Compute
const safeStatus = this.sanitizeLogValue(result?.status ?? 'n/a'); - Log using those sanitized variables instead of the raw values.
- Compute
No new imports are needed; we only use built-in string and regex capabilities.
| @@ -136,13 +136,23 @@ | ||
| return this.connectPromise; | ||
| } | ||
|
|
||
| /** | ||
| * Sanitize values before logging to prevent log injection. | ||
| * Removes newline and carriage return characters from the string representation. | ||
| */ | ||
| private sanitizeLogValue(value: unknown): string { | ||
| return String(value).replace(/[\r\n]/g, ''); | ||
| } | ||
|
|
||
| private doConnect(): void { | ||
| if (this.stopped) return; | ||
|
|
||
| try { | ||
| this.ws = new WebSocket(`ws://127.0.0.1:${this.port}`); | ||
| } catch (err) { | ||
| console.warn(`[openclaw-ws] Connection failed: ${err instanceof Error ? err.message : String(err)}`); | ||
| console.warn( | ||
| `[openclaw-ws] Connection failed: ${err instanceof Error ? err.message : String(err)}` | ||
| ); | ||
| this.scheduleReconnect(); | ||
| return; | ||
| } | ||
| @@ -272,7 +276,12 @@ | ||
| pending.resolve(false); | ||
| } else { | ||
| const result = msg.payload as Record<string, unknown> | undefined; | ||
| console.log(`[openclaw-ws] RPC ${id} ok: runId=${result?.runId ?? 'n/a'} status=${result?.status ?? 'n/a'}`); | ||
| const safeId = this.sanitizeLogValue(id); | ||
| const safeRunId = this.sanitizeLogValue(result?.runId ?? 'n/a'); | ||
| const safeStatus = this.sanitizeLogValue(result?.status ?? 'n/a'); | ||
| console.log( | ||
| `[openclaw-ws] RPC ${safeId} ok: runId=${safeRunId} status=${safeStatus}` | ||
| ); | ||
| pending.resolve(true); | ||
| } | ||
| return; |
| })); | ||
| } catch (err) { | ||
| res.writeHead(500); | ||
| res.end(JSON.stringify({ ok: false, error: err instanceof Error ? err.message : String(err) })); |
Check warning
Code scanning / CodeQL
Information exposure through a stack trace Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, the fix is to stop including raw exception messages (or other stack-derived information) in HTTP responses and instead return a generic, stable error string while logging the detailed information on the server. The server logs are where developers can access stack traces and diagnostic data; external callers should only see high-level error text and, optionally, a correlation ID.
Concretely for gateway.ts, we should change the /spawn handler’s catch block to:
- Log the full error (including stack) to
stderror a logger. - Return a 500 JSON body with a generic error like
"Internal server error while spawning worker"that does not depend onerr.message.
We do not need to change the status code or the basic JSON structure; just make the error field generic and add server-side logging.
For process.ts, the main leak path is via the error ultimately thrown out of waitForGateway. We can keep throwing the original error so call sites can decide how to handle it, but we should:
- Log the failure (including stack) locally in
process.tsto aid debugging. - Ensure we do not introduce any new user-facing messages here. The actual exposure fix is on the HTTP side; in this snippet we only add server-side logging and keep behavior otherwise identical.
No new external dependencies are required; basic process.stderr.write is sufficient. The edits are limited to the shown regions: the /spawn handler catch block in gateway.ts and the waitForGateway try/catch in process.ts.
| @@ -785,8 +785,20 @@ | ||
| active: this.spawnManager.size, | ||
| })); | ||
| } catch (err) { | ||
| // Log detailed error information server-side, but do not expose it to the client. | ||
| if (err instanceof Error) { | ||
| process.stderr.write(`[gateway:/spawn] Failed to spawn process: ${err.message}\n`); | ||
| if (err.stack) { | ||
| process.stderr.write(`${err.stack}\n`); | ||
| } | ||
| } else { | ||
| process.stderr.write(`[gateway:/spawn] Failed to spawn process: ${String(err)}\n`); | ||
| } | ||
| res.writeHead(500); | ||
| res.end(JSON.stringify({ ok: false, error: err instanceof Error ? err.message : String(err) })); | ||
| res.end(JSON.stringify({ | ||
| ok: false, | ||
| error: 'Internal server error while spawning process', | ||
| })); | ||
| } | ||
| return; | ||
| } |
| @@ -143,6 +143,15 @@ | ||
| await waitForGateway(port, 30); | ||
| } catch (err) { | ||
| gatewayProcess.kill('SIGTERM'); | ||
| // Log the error locally for diagnostics; callers should map this to a generic user-facing error. | ||
| if (err instanceof Error) { | ||
| process.stderr.write(`[spawn:${options.name}:gateway] Failed to become healthy: ${err.message}\n`); | ||
| if (err.stack) { | ||
| process.stderr.write(`${err.stack}\n`); | ||
| } | ||
| } else { | ||
| process.stderr.write(`[spawn:${options.name}:gateway] Failed to become healthy: ${String(err)}\n`); | ||
| } | ||
| throw err; | ||
| } | ||
|
|
| res.end(JSON.stringify({ ok: released, active: this.spawnManager.size })); | ||
| } catch (err) { | ||
| res.writeHead(500); | ||
| res.end(JSON.stringify({ ok: false, error: err instanceof Error ? err.message : String(err) })); |
Check warning
Code scanning / CodeQL
Information exposure through a stack trace Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general terms, the fix is to stop sending the raw error message derived from the caught exception back to the client, and instead return a generic error message while logging the detailed error on the server. This maintains debuggability while preventing information exposure.
Concretely for this code, we should change the catch (err) block in handleControlRequest for the /release path so that:
- The response body contains only a generic error string (for example,
"Internal server error"or"Failed to release claw"), and - The full error (including message and possibly stack) is logged using
console.erroron the server side.
We will:
- Edit the
catch (err)block around lines 832–835. - Add a
console.errorcall that logs a clear message plus theerrobject. - Replace the current
error: err instanceof Error ? err.message : String(err)with a constant, non-sensitive message that does not depend onerr.
No new imports are required; console.error is available globally in Node.js.
| @@ -830,8 +830,9 @@ | ||
| res.writeHead(200); | ||
| res.end(JSON.stringify({ ok: released, active: this.spawnManager.size })); | ||
| } catch (err) { | ||
| console.error('Error handling /release control request:', err); | ||
| res.writeHead(500); | ||
| res.end(JSON.stringify({ ok: false, error: err instanceof Error ? err.message : String(err) })); | ||
| res.end(JSON.stringify({ ok: false, error: 'Internal server error' })); | ||
| } | ||
| return; | ||
| } |
| async def wait_for_idle(self, timeout_ms: Optional[int] = None) -> str: | ||
| """Wait for agent to go idle. Returns 'idle', 'exited', or 'timeout'.""" | ||
| if self._name not in self._relay._known_agents: | ||
| return "exited" | ||
| if timeout_ms == 0: | ||
| return "timeout" | ||
|
|
||
| future: asyncio.Future[str] = asyncio.get_running_loop().create_future() | ||
| self._relay._idle_resolvers.setdefault(self._name, []).append(future) |
There was a problem hiding this comment.
🔴 Agent.wait_for_idle hangs when agent is already idle
wait_for_idle does not check whether the agent is already in the _idle_agents set before registering a future on _idle_resolvers. If the agent is already idle when wait_for_idle is called, no new agent_idle event will fire from the broker, so the future will never be resolved.
Root Cause and Impact
At packages/sdk-py/src/agent_relay/relay.py:129-137, the method checks if the agent has exited (self._name not in self._relay._known_agents) and if timeout_ms == 0, but it never checks self._relay._idle_agents:
async def wait_for_idle(self, timeout_ms: Optional[int] = None) -> str:
if self._name not in self._relay._known_agents:
return "exited"
if timeout_ms == 0:
return "timeout"
future = asyncio.get_running_loop().create_future()
self._relay._idle_resolvers.setdefault(self._name, []).append(future)The _idle_resolvers futures are only resolved when the agent_idle event fires in _wire_events at packages/sdk-py/src/agent_relay/relay.py:718-727. But if the agent is already idle, the broker won't emit another agent_idle event, so the future never completes.
This differs from wait_for_exit which works correctly because exited agents are removed from _known_agents (line 682), so the early-return check on line 106 catches the already-exited case. Idle agents remain in _known_agents, so there is no equivalent safety net.
Impact: Calling wait_for_idle() on an already-idle agent will block indefinitely (no timeout) or return "timeout" instead of the expected "idle". This is a common usage pattern — e.g. polling for idle after a period of activity.
| async def wait_for_idle(self, timeout_ms: Optional[int] = None) -> str: | |
| """Wait for agent to go idle. Returns 'idle', 'exited', or 'timeout'.""" | |
| if self._name not in self._relay._known_agents: | |
| return "exited" | |
| if timeout_ms == 0: | |
| return "timeout" | |
| future: asyncio.Future[str] = asyncio.get_running_loop().create_future() | |
| self._relay._idle_resolvers.setdefault(self._name, []).append(future) | |
| async def wait_for_idle(self, timeout_ms: Optional[int] = None) -> str: | |
| """Wait for agent to go idle. Returns 'idle', 'exited', or 'timeout'.""" | |
| if self._name not in self._relay._known_agents: | |
| return "exited" | |
| if self._name in self._relay._idle_agents: | |
| return "idle" | |
| if timeout_ms == 0: | |
| return "timeout" | |
| future: asyncio.Future[str] = asyncio.get_running_loop().create_future() |
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Closing in favor of #473 |
Summary
AgentRelayfacade,AgentRelayClient,Agenthandles, andModelsconstants to the Python SDK so it can spawn agents and exchange messages directly via the broker binary — matching the TypeScript SDK's APIprotocol.py,client.py,relay.py,models.pyinpackages/sdk-py/src/agent_relay/workflow(),fan_out(),pipeline(),dag()) still works unchangedTarget usage
Test plan
pytest tests/)relay-a2a-python/main.py🤖 Generated with Claude Code