feat: add Rust broker and sdk-ts (additive, no removals) by khaliqgant · Pull Request #424 · AgentWorkforce/relay

khaliqgant · 2026-02-15T06:27:39Z

Summary

Add relay-broker Rust implementation and TypeScript SDK alongside existing codebase. This is an additive-only change for incremental migration - no existing packages or code removed.

This PR extracts only the additions from #412, enabling safe merge while preserving all existing functionality.

What's Added

Cargo.toml/Cargo.lock - Rust project configuration
src/*.rs - Rust broker implementation (21 files)
packages/sdk-ts/ - TypeScript SDK for broker (12 files)
.github/workflows/rust-ci.yml - Rust CI workflow
.claude/rules/rust.md - Rust coding guidelines

What's NOT Changed

All existing @agent-relay/* packages remain intact
All existing TypeScript code unchanged
All existing workflows preserved
No breaking changes

Migration Strategy

✅ This PR: Add broker/SDK alongside existing code
🔜 Next: Gradually integrate new broker into existing flows
🔜 Later: Remove legacy code only after migration complete

Test Plan

Rust CI passes (cargo test, cargo clippy)
Existing TypeScript tests still pass
No impact on current npm package

🤖 Generated with Claude Code

Related: #412 (original migration PR - this is the safe additive subset)

Add relay-broker Rust implementation and TypeScript SDK alongside existing codebase. This is an additive-only change for incremental migration - no existing packages or code removed. Added: - Cargo.toml/Cargo.lock for Rust project - src/*.rs - Rust broker implementation (21 files) - packages/sdk-ts - TypeScript SDK for broker - .github/workflows/rust-ci.yml - Rust CI workflow - .claude/rules/rust.md - Rust coding guidelines Existing TypeScript packages remain unchanged and functional. Integration will happen gradually over time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+
+function runQualityChecks(): { passed: boolean; output: string } {
+  try {
+    const output = execSync(QUALITY_CMD, { encoding: "utf-8", stdio: "pipe" });


In general, to fix this kind of issue you want to avoid passing untrusted data as a single shell command string. Prefer APIs that either (a) do not use a shell or (b) accept an argument vector ([cmd, arg1, arg2, ...]) instead of a concatenated string, and/or restrict the allowed commands to a safe set.

In this file, QUALITY_CMD is used only to run quality checks. The safest, least-invasive fix is:

Parse QUALITY_CMD into a program name and arguments without invoking a shell.

Use child_process.execFileSync (or spawnSync) with that parsed array, which does not perform shell expansion.

Keep the default QUALITY_CMD = "npm run check" working, and also support simple overridable commands like "pnpm lint".

To achieve this while avoiding writing our own shell parser, we can use the well-known shell-quote library (as suggested in the background). shell-quote.parse will convert a string like "npm run check" into ["npm", "run", "check"] while honoring quotes. Then we can call:

const parts = shellQuote.parse(QUALITY_CMD).map(String); const [cmd, ...args] = parts; execFileSync(cmd, args, { encoding: "utf-8", stdio: "pipe" });

This removes the shell from the picture but preserves functional behavior for reasonable QUALITY_CMD values. If QUALITY_CMD is empty or malformed we can fall back to a safe default (npm, ["run", "check"]).

Concretely:

Add an import for shell-quote at the top of packages/sdk-ts/src/examples/ralph-loop.ts.

Replace the execSync(QUALITY_CMD, ...) call in runQualityChecks with:

parsing QUALITY_CMD using shellQuote.parse,

deriving cmd and args,

using execFileSync instead of execSync with those arguments.

Keep error handling the same (capture stdout from the error when possible).

This change removes the uncontrolled shell command issue without altering the semantics for normal use and respects the existing configuration mechanism.

Remove `cargo test --test 'stress_*'` since there are no stress tests in the codebase yet. This was causing CI failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The sdk-ts package doesn't have a package-lock.json since it's newly added and not yet integrated into root workspace. Use npm install instead of npm ci, and set working-directory. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The existing packages/sdk already has the name @agent-relay/sdk. Renamed new package to @agent-relay/broker-sdk to avoid conflict. Also updated package-lock.json to include the new workspace. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The build script was trying to run `cargo build` but Rust isn't installed in the standard Node.js CI jobs. Split the build: - `build` - TypeScript compilation only (no Rust required) - `build:full` - TypeScript + Rust binary bundling Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

The expression &mapped.text[..120] at src/main.rs:1462 uses byte-based indexing to truncate the relay message text for display. If the 120th byte falls in the middle of a multi-byte UTF-8 character (e.g., emoji, CJK, accented characters), this will panic at runtime with byte index 120 is not a char boundary.

Root Cause and Impact

The code checks mapped.text.len() > 120 (which is a byte length check) and then slices at byte position 120 with &mapped.text[..120]. For ASCII-only text this works fine, but any multi-byte character spanning the boundary will cause a panic.

The codebase already has a floor_char_boundary helper function (defined at src/main.rs:2910) that safely finds the nearest valid character boundary. The fix is to use it:

let boundary = floor_char_boundary(&mapped.text, 120);
format!("{}…", &mapped.text[..boundary])

Impact: The broker process crashes when a relay message containing multi-byte characters is received in listen mode, killing all managed agent sessions.

View 13 additional findings in Devin Review.

The broker-sdk tests use Node.js built-in test runner, not vitest. Exclude packages/sdk-ts from vitest config to prevent test failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

The expression &mapped.text[..120] at src/main.rs:1462 uses byte-based indexing to truncate the relay message text for display. If the 120th byte falls in the middle of a multi-byte UTF-8 character (e.g., emoji, CJK, accented characters), this will panic at runtime with byte index 120 is not a char boundary.

Root Cause and Impact

The code checks mapped.text.len() > 120 (which is a byte length check) and then slices at byte position 120 with &mapped.text[..120]. For ASCII-only text this works fine, but any multi-byte character spanning the boundary will cause a panic.

The codebase already has a floor_char_boundary helper function (defined at src/main.rs:2910) that safely finds the nearest valid character boundary. The fix is to use it:

let boundary = floor_char_boundary(&mapped.text, 120);
format!("{}…", &mapped.text[..boundary])

Impact: The broker process crashes when a relay message containing multi-byte characters is received in listen mode, killing all managed agent sessions.

View 13 additional findings in Devin Review.

Applied rustfmt formatting: - telemetry.rs: Break long method chains across lines - main.rs: Format function signature across multiple lines Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

The expression &mapped.text[..120] at src/main.rs:1462 uses byte-based indexing to truncate the relay message text for display. If the 120th byte falls in the middle of a multi-byte UTF-8 character (e.g., emoji, CJK, accented characters), this will panic at runtime with byte index 120 is not a char boundary.

Root Cause and Impact

The code checks mapped.text.len() > 120 (which is a byte length check) and then slices at byte position 120 with &mapped.text[..120]. For ASCII-only text this works fine, but any multi-byte character spanning the boundary will cause a panic.

The codebase already has a floor_char_boundary helper function (defined at src/main.rs:2910) that safely finds the nearest valid character boundary. The fix is to use it:

let boundary = floor_char_boundary(&mapped.text, 120);
format!("{}…", &mapped.text[..boundary])

Impact: The broker process crashes when a relay message containing multi-byte characters is received in listen mode, killing all managed agent sessions.

View 13 additional findings in Devin Review.

Adds a workflow that automatically runs `cargo fmt` on PRs that touch Rust files. If formatting changes are needed, it commits them directly to the PR branch. This ensures Rust code is always properly formatted without manual intervention. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

The expression &mapped.text[..120] at src/main.rs:1462 uses byte-based indexing to truncate the relay message text for display. If the 120th byte falls in the middle of a multi-byte UTF-8 character (e.g., emoji, CJK, accented characters), this will panic at runtime with byte index 120 is not a char boundary.

Root Cause and Impact

The code checks mapped.text.len() > 120 (which is a byte length check) and then slices at byte position 120 with &mapped.text[..120]. For ASCII-only text this works fine, but any multi-byte character spanning the boundary will cause a panic.

The codebase already has a floor_char_boundary helper function (defined at src/main.rs:2910) that safely finds the nearest valid character boundary. The fix is to use it:

let boundary = floor_char_boundary(&mapped.text, 120);
format!("{}…", &mapped.text[..boundary])

Impact: The broker process crashes when a relay message containing multi-byte characters is received in listen mode, killing all managed agent sessions.

View 15 additional findings in Devin Review.

Fixed potential panic when truncating relay message text for display. The code was using byte-index slicing which can panic if the index falls in the middle of a multi-byte UTF-8 character. Now uses floor_char_boundary helper to find safe character boundary. Reported by: Devin Review Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fixes from DeepThinker-2 code review: HIGH priority: - Added floor_char_boundary helper to conversation_log.rs - Fixed 3 UTF-8 panic bugs in truncate_text, pad_or_truncate, short_id - Same class of bug as main.rs fix LOW priority: - Updated README.md to reference correct package name @agent-relay/broker-sdk Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a Rust-based broker implementation and TypeScript SDK to the existing Agent Relay codebase. The changes are additive-only with no removals, enabling incremental migration from the existing TypeScript implementation to Rust. The PR includes 21 Rust source files implementing core broker functionality (authentication, WebSocket handling, process management, telemetry), a TypeScript SDK package with protocol client and high-level API facade, CI workflows for Rust testing and formatting, and Rust coding conventions documentation.

Changes:

Adds complete Rust broker implementation with authentication, WebSocket client, process spawning, message scheduling, and telemetry
Adds TypeScript SDK (@agent-relay/broker-sdk) with protocol client, high-level API, and integration tests
Adds GitHub Actions workflows for Rust CI (test, clippy, fmt) and auto-formatting

Reviewed changes

Copilot reviewed 42 out of 45 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
Cargo.toml	Rust project configuration with dependencies
src/*.rs (21 files)	Core broker modules: auth, config, spawner, telemetry, WebSocket, etc.
packages/sdk-ts/src/*.ts	TypeScript SDK with client, protocol definitions, and high-level API
packages/sdk-ts/package.json	SDK package configuration
.github/workflows/rust-ci.yml	CI workflow for Rust tests and linting
.github/workflows/rust-fmt-fix.yml	Auto-format workflow for Rust code
.claude/rules/rust.md	Rust coding conventions and guidelines
vitest.config.ts	Excludes sdk-ts from vitest (uses Node.js test runner)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-15T09:09:36Z

+      if (!currentPath.split(path.delimiter).includes(binDir)) {
+        env.PATH = `${binDir}${path.delimiter}${currentPath}`;


The PATH manipulation assumes relay_send is in the same directory as the agent-relay binary, but the comment mentions "relay_send" which is not present in the added files. If relay_send is a separate binary that hasn't been added yet, this could cause runtime failures when workers try to find it.

Suggested change

if (!currentPath.split(path.delimiter).includes(binDir)) {

env.PATH = `${binDir}${path.delimiter}${currentPath}`;

const relaySendPath = path.join(binDir, "relay_send");

if (fs.existsSync(relaySendPath) && !currentPath.split(path.delimiter).includes(binDir)) {

env.PATH = currentPath ? `${binDir}${path.delimiter}${currentPath}` : binDir;

Copilot · 2026-02-15T09:09:36Z

+use sha2::{Digest, Sha256};
+use tokio::sync::mpsc;
+
+const POSTHOG_API_KEY: &str = "phc_2uDu01GtnLABJpVkWw4ri1OgScLU90aEmXmDjufGdqr";


The PostHog API key is hardcoded as a public constant. While this is a write-only key for telemetry ingestion, consider documenting that this is intentional and expected to be public. If this key is exposed and abused, it could result in data pollution or quota exhaustion.

Copilot · 2026-02-15T09:09:36Z

+        cmd.arg("wrap").arg(cli);
+        for arg in extra_args {
+            cmd.arg(arg);
+        }


The cli and extra_args parameters are passed directly to Command without validation. This could allow command injection if these values come from untrusted input. Consider validating or sanitizing these parameters, especially if they originate from network requests or user input.

Copilot · 2026-02-15T09:09:36Z

+    exclude: [
+      '**/node_modules/**',
+      '**/dist/**',
+      'packages/sdk-ts/**', // Uses Node.js test runner, not vitest
+    ],


The packages/sdk-ts/** exclusion is correct for the stated reason (Node.js test runner), but consider verifying that no other TypeScript tests in the repo are being inadvertently excluded. The pattern uses a glob that matches any depth within sdk-ts.

Copilot · 2026-02-15T09:09:37Z

+#[cfg(not(unix))]
+fn set_owner_only_permissions(_path: &Path) -> Result<()> {
+    Ok(())
+}


On Windows (non-Unix), credential files are saved without permission restrictions. This could expose API keys and tokens to other users on the same system. Consider using Windows-specific APIs (e.g., file encryption or ACLs) to restrict access on Windows platforms.

Copilot · 2026-02-15T09:09:37Z

+    "demo": "node dist/examples/demo.js",
+    "ralph": "node dist/examples/ralph-loop.js",
+    "example": "node dist/examples/example.js",
+    "prepack": "npm run build"


The prepack script only runs npm run build, not npm run build:full. This means the binary won't be bundled when users install the package via npm. If the binary is intended to be included in the published package, change prepack to run build:full instead of build.

Suggested change

"prepack": "npm run build"

"prepack": "npm run build:full"

…on script Comprehensive plan to migrate CLI from daemon-based TS stack to Rust broker: - Migration plan with 7-layer delivery guarantees exceeding old orchestrator - Wave-based execution plan (9 waves, 36 short-lived agents) - Executable orchestration script using broker-sdk for agent coordination - 14 beads (agent-relay-549 through 562) tracking all work items Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Extract helpers.rs, pty_worker.rs, wrap.rs from main.rs (~1100 lines moved) (done by Worker-W0-Decompose via broker orchestration) - Add atomic state persistence using tempfile + persist pattern - Add flock guard to prevent concurrent broker instances - Update broker-migration script with binaryPath, comms instructions, exit detection - Add tempfile dep and nix "fs" feature to Cargo.toml All 135 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…erence The 7 auth tests were failing because the real RELAY_API_KEY env var leaked into mock-server tests, causing 404s against httpmock servers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…harness Fix broken agent→orchestrator communication: remove hardcoded RELAY_AGENT_NAME from .mcp.json so each spawned agent registers with its own identity via env var. Add onWorkerOutput hook to SDK for PTY output scanning, fix waitForExit(0) bug. Rewrite orchestration completion detection with event-driven exits + idle timeout. Wave 1 deliverables: broker-harness.ts, assert-helpers.ts, lifecycle/messaging/ events test suites (6 files, ~1300 lines of integration test infrastructure). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The gate was failing with "cargo: command not found" because ~/.cargo/bin is not in PATH for subshells. Also gracefully skip integration phase check when run-phase.ts hasn't been created yet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Codex workers need full file paths and concrete code snippets to know what to modify. Previous runs produced no code because prompts said "Read the Lead's plan" without giving actionable file locations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 41 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T14:17:10Z

+            if workers.has_worker(&payload.to) {
+                queue_and_try_delivery_raw(
+                    workers,
+                    pending_deliveries,
+                    &payload.to,
+                    &event_id,
+                    &from,
+                    &payload.to,
+                    &payload.text,
+                    payload.thread_id,
+                    priority,
+                    delivery_retry_interval(),
+                )
+                .await?;
+
+                send_ok(
+                    out_tx,
+                    frame.request_id,
+                    json!({
+                        "delivered": true,
+                        "to": payload.to,
+                        "event_id": event_id,
+                    }),
+                )
+                .await?;
+            } else if let Some(http) = relaycast_http {
+                // Target is not a local worker — forward via Relaycast REST API
+                let to = payload.to.clone();
+                let eid = event_id.clone();
+                match http.send(&to, &payload.text).await {


🟡 Broker send_message only recognizes exact local worker names, unlike other delivery paths (case/@Prefix)

The broker’s send_message local-delivery branch uses workers.has_worker(&payload.to), which only matches an exact worker name key.

Elsewhere, inbound WS direct-target delivery supports case-insensitive matching and @name aliasing via worker_names_for_direct_target().

Root Cause / Impact

In send_message, a request like { to: "@Worker1" } or { to: "worker1" } will not be treated as a local target even if the worker exists, and will instead fall through to Relaycast forwarding (src/main.rs:1639-1664). This is inconsistent with WS-driven direct message delivery (src/main.rs:841-853 uses worker_names_for_direct_target).

Impact: local agent-to-agent messaging via the SDK becomes unexpectedly slower and/or fails if Relaycast credentials/connectivity are missing, despite the worker being local.

Prompt for agents

Make send_message resolve local targets using the same logic as WS inbound delivery. In src/main.rs:1639-1664, replace the `workers.has_worker(&payload.to)` / single-worker delivery with: - `let targets = workers.worker_names_for_direct_target(&payload.to, &from);` - if targets not empty, deliver to each target This will add support for case-insensitive matching and `@name` syntax, and aligns SDK send_message with existing WS delivery behavior.

Was this helpful? React with 👍 or 👎 to provide feedback.

Wave 4 completed. Verdict: nt�[15;9HREVIEW:PASS�[15;21Hor�[15;24HREVIEW:FAIL�[15;36Hto�[15;39Hstdout.�[18;3H�[2mtab to queue message�[18;62H100% context left�[39m�[49m�[0m�[?25h�[16;3H�[?2026l Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Codex workers need full absolute paths and concrete code snippets to produce output. Updated Waves 5-8 worker, reviewer prompts with exact file locations, struct definitions, and line number hints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add delivery_queued and delivery_injected events to pty_worker.rs. Fix wrap.rs send_frame errors by replacing with tracing (wrap mode runs as a standalone process without protocol channel access). Update SDK protocol types and add onDeliveryUpdate hook. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New src/spawner.rs with session leader spawning (setsid), child process tracking, graceful termination (SIGTERM→SIGKILL), reap_exited, and 5 unit tests including session leader verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Delivery persistence (Wave 6 gap): - Add save_pending_deliveries/load_pending_deliveries with atomic writes - Load pending deliveries on broker startup for crash recovery - Save pending deliveries periodically in reap tick - Clean up pending.json on graceful shutdown Tilde expansion fix: - Add expandTilde() to SDK client.ts - Expand ~ to home directory before fs.existsSync and spawn calls - Fixes "binary not found" for paths like ~/bin/agent-relay Also fix libc::setsid -> nix::libc::setsid in spawner.rs for CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Benchmarks: latency, throughput, reliability, overhead, scale-out Parity tests: broadcast, multi-worker, orch-to-worker, continuity-handoff, stability-soak All parity tests passing, 164 Rust tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 52 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T15:16:01Z

+  /**
+   * Start the broker process, wait for hello_ack.
+   * Creates both a low-level client and a high-level facade.
+   */
+  async start(): Promise<void> {
+    if (this.started) return;
+
+    const clientOpts: AgentRelayClientOptions = {
+      binaryPath: this.opts.binaryPath,
+      channels: this.opts.channels,
+      requestTimeoutMs: this.opts.requestTimeoutMs,
+      shutdownTimeoutMs: this.opts.shutdownTimeoutMs,
+      env: this.opts.env,
+    };
+
+    // Start the low-level client (spawns broker process)
+    this.client = await AgentRelayClient.start(clientOpts);
+
+    // Wire event collection
+    this.unsubEvent = this.client.onEvent((event: BrokerEvent) => {
+      this.events.push(event);
+      for (const listener of this.eventListeners) {
+        listener(event);
+      }
+    });
+
+    // Create a high-level facade sharing the same binary/options
+    this.relay = new AgentRelay({
+      binaryPath: this.opts.binaryPath,
+      channels: this.opts.channels,
+      requestTimeoutMs: this.opts.requestTimeoutMs,
+      shutdownTimeoutMs: this.opts.shutdownTimeoutMs,
+      env: this.opts.env,
+    });
+
+    this.started = true;
+  }


🟡 Integration BrokerHarness can inadvertently start a second broker process (double-lock / port conflicts)

The integration test harness constructs both a low-level AgentRelayClient (which spawns a broker process) and a high-level AgentRelay facade (which can also spawn its own broker process) but does not share a single underlying process between them.

BrokerHarness.start() always starts this.client = await AgentRelayClient.start(...) (spawns a broker) (/home/ubuntu/repos/repo-0f9455b42bf349aca671d3e83bb52228/tests/integration/broker/utils/broker-harness.ts:75-107).

It then creates this.relay = new AgentRelay({...}) with the same binaryPath etc. (/home/ubuntu/repos/repo-0f9455b42bf349aca671d3e83bb52228/tests/integration/broker/utils/broker-harness.ts:97-105). If any test later calls harness.relay.*, AgentRelay will call ensureStarted() and spawn a second broker process (packages/sdk-ts/src/relay.ts:267-279).

Actual vs Expected / Impact

Actual:

Some tests may end up with two broker processes in the same working directory. The Rust broker uses a directory lock (.agent-relay/broker.lock), so the second broker may fail to start or may cause flaky behavior depending on timing.

Expected:

The harness should either:

only use one API (client or facade) per harness instance, or

allow the facade to reuse the already-started client/process (not currently supported by the facade API).

Impact:

Flaky integration tests and potential spurious failures that look like broker bugs.

Prompt for agents

In /home/ubuntu/repos/repo-0f9455b42bf349aca671d3e83bb52228/tests/integration/broker/utils/broker-harness.ts, avoid creating both a standalone AgentRelayClient and AgentRelay facade unless you can guarantee only one starts a broker. The simplest fix: remove the AgentRelay facade from BrokerHarness entirely (and update tests to use only AgentRelayClient methods), or lazily construct relay only when needed and document that it must not be used in the same harness instance. Alternatively, enhance packages/sdk-ts/src/relay.ts to accept an existing AgentRelayClient instance (shared stdio) so no second process is spawned, then update BrokerHarness to pass it through.

Was this helpful? React with 👍 or 👎 to provide feedback.

- Delete 7 spec/proposal/migration planning docs (superseded by code) - Delete orphaned activity.rs and throttle.rs (never compiled, stress tests confirm simple ThrottleState is sufficient at 100% success rate) - Delete broken cold-start.ts (uses old relay_inbound event) - Remove pub mod spawner from lib.rs (only used by binary crate) - Remove dead delivery_lifecycle_event_series function - Remove unused request_id field from PendingActivity - Add stress test suite (burst, contention, steady-state, spawn/release) - Zero compiler warnings, 158 tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- ensureStarted now clears startPromise on failure, allowing retry - send_message response includes targets array matching SDK type expectation - Both local delivery and relaycast publish paths return targets Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add 8 missing BrokerEvent variants to protocol.ts (delivery_verified, delivery_failed, delivery_active, delivery_ack, worker_ready, worker_error, relaycast_published, relaycast_publish_failed) - Make reap_exited non-fatal: log warning and skip cycle instead of crashing entire broker on transient OS error - Resolve pending exitResolvers on shutdown to prevent promise leaks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix Clippy lint errors: - src/pty_worker.rs:254 - use is_some() instead of if let Some(_) - src/wrap.rs:490 - use is_some() instead of if let Some(_) This resolves the clippy::redundant-pattern-matching warnings. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 54 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T18:16:39Z

+            let body2: Value = res2.json().await?;
+            body2
+                .pointer("/data/token")
+                .or_else(|| body2.get("token"))
+                .and_then(Value::as_str)
+                .map(String::from)
+                .ok_or_else(|| anyhow::anyhow!("no token in register response"))?


🟡 Missing HTTP status check on 409-conflict retry in ensure_token hides real errors

When ensure_token in RelaycastHttpClient gets a 409 Conflict on the first agent registration, it retries with a suffixed name. However, the retry response res2 is never checked for HTTP error status.

Detailed Explanation

If the retry registration also fails (e.g., returns 401 Unauthorized or 500 Internal Server Error), the code at relaycast_ws.rs:291 attempts res2.json().await? on a potentially non-JSON error body. If the body is JSON but has no data.token field, the error message is the misleading "no token in register response" instead of the actual HTTP status code. If the body is not valid JSON (e.g., an HTML error page), the error is an opaque deserialization failure.

Compare this to the success path at line 298 (else if !status.is_success()) which correctly surfaces the HTTP status and response body.

Actual: Retry path on 409 conflict silently tries to parse any response body as a token, producing confusing error messages when the retry itself fails.
Expected: The retry should check res2.status().is_success() and bail with a clear error if it's not, similar to the primary registration path.

Impact: Users debugging Relaycast connectivity issues on the retry path would see misleading errors like "no token in register response" or JSON parse errors instead of the actual HTTP status, making diagnosis harder.

Suggested change

let body2: Value = res2.json().await?;

body2

.pointer("/data/token")

.or_else(|| body2.get("token"))

.and_then(Value::as_str)

.map(String::from)

.ok_or_else(|| anyhow::anyhow!("no token in register response"))?

let body2: Value = res2.json().await?;

if !status.is_success() {

anyhow::bail!(

"relaycast fallback register failed ({}): {}",

status,

serde_json::to_string(&body2).unwrap_or_default()

);

}

body2

.pointer("/data/token")

.or_else(|| body2.get("token"))

.and_then(Value::as_str)

.map(String::from)

.ok_or_else(|| anyhow::anyhow!("no token in register response"))?

Was this helpful? React with 👍 or 👎 to provide feedback.

Add unit tests for helpers.rs (30 tests: codex model prompt, gemini action, CLI readiness, terminal query parser, throttle, strip_ansi, format_injection) and main.rs (12 bypass flag tests). Add real CLI integration tests: stress (5 tests), edge cases (7 tests), functionality (8 tests), and extended cli-spawn (4 tests). Add utility helpers: firstAvailableCli, skipUnlessAnyCli, eventsForAgent. Add head-to-head benchmark comparing broker PTY vs relay-pty across latency, throughput, reliability, multi-agent scaling, and feature matrix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 57 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T21:03:28Z

+    try {
+      await wait;
+    } finally {
+      clearTimeout(timeout);
+      if (this.child) {
+        this.child.kill("SIGKILL");
+      }
+    }


🟡 SIGKILL escalation in SDK client shutdown is dead code — shutdown can hang indefinitely

In the AgentRelayClient.shutdown() method, the SIGKILL sent in the finally block at line 243-244 can never actually fire on a live process because this.child is always undefined by the time the finally block runs.

Root Cause

The exitPromise (line 302-316 in packages/sdk-ts/src/client.ts) calls this.disposeProcessHandles() which sets this.child = undefined before calling resolve(). So when await wait at line 240 completes, this.child is already undefined, making the if (this.child) check at line 243 always false.

This means if the broker process ignores SIGTERM (sent at line 235), there is no escalation to SIGKILL. The await wait at line 240 will hang forever because exitPromise never resolves, and the SIGKILL that was supposed to be the safety net can only run after the promise resolves (i.e., after the process has already exited).

Actual behavior: If the broker process hangs after receiving the shutdown request and SIGTERM, shutdown() blocks indefinitely with no SIGKILL escalation.

Expected behavior: After shutdownTimeoutMs, the process should receive SIGTERM, and if it still hasn't exited after a further grace period, it should be forcefully killed with SIGKILL.

Impact: Callers of shutdown() can hang indefinitely when the broker process is unresponsive to SIGTERM, potentially causing the parent process (SDK consumer) to also hang.

Prompt for agents

In packages/sdk-ts/src/client.ts, the shutdown() method at lines 220-247 needs to be restructured so that SIGKILL is sent as a timed escalation rather than in a finally block that only runs after the process has already exited. The fix should: 1. After sending SIGTERM (line 235), set a second timeout (e.g. shutdownTimeoutMs * 2 or shutdownTimeoutMs + 2000) that sends SIGKILL to the captured `child` reference if the process still hasn't exited. 2. Both timeouts should be cleared in the finally block. 3. The finally block should still call child.kill('SIGKILL') as a last resort, but use the captured `child` reference (line 231) instead of `this.child` (which gets nulled by disposeProcessHandles). Example approach: const killTimeout = setTimeout(() => { child.kill('SIGKILL'); }, this.options.shutdownTimeoutMs + 2000); try { await wait; } finally { clearTimeout(timeout); clearTimeout(killTimeout); }

Was this helpful? React with 👍 or 👎 to provide feedback.

…cho_in_output pty_worker.rs and wrap.rs import PendingVerification, check_echo_in_output, MAX_VERIFICATION_ATTEMPTS, and VERIFICATION_WINDOW from helpers.rs but they were either missing or gated behind #[cfg(test)]. This caused compilation failure. Also adds stderr logging to head-to-head benchmark for debugging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 56 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T21:22:11Z

+                        chars.next();
+                        if nc.is_ascii_alphabetic() || nc == '@' || nc == '`' {
+                            break;
+                        }
+                    }


🟡 strip_ansi missing ~ as CSI terminator causes character consumption past escape sequences

The strip_ansi function in helpers.rs does not recognize ~ (0x7E) as a valid CSI final byte. Per the ECMA-48 standard, CSI sequences terminate with bytes in the range 0x40–0x7E, but the code only checks for is_ascii_alphabetic() || nc == '@' || nc == '\', which misses ~, {, |, }, [, `, ], ^, and _.

Root Cause and Impact

The terminator check at src/helpers.rs:313 is:

if nc.is_ascii_alphabetic() || nc == '@' || nc == '`' { break; }

For a ~-terminated CSI sequence like \x1b[15~ (F5 key), the parser consumes [, 1, 5, and ~ — but since ~ is not recognized as a terminator, it does not break the loop. The loop continues, consuming the next real character after the escape sequence.

For example, if PTY output contains \x1b[15~Relay message from Alice [evt_1]: hello, the function strips \x1b[15~ plus the R, producing elay message from Alice [evt_1]: hello. This corrupted output is then checked by check_echo_in_output (src/helpers.rs:144), which calls strip_ansi internally — the expected echo "Relay message from Alice [evt_1]: hello" won't match, causing a false-negative verification failure.

Impact: In practice, AI CLI output rarely contains ~-terminated CSI sequences (those are mostly input sequences), so the impact is limited to edge cases. However, if such sequences appear in PTY output, delivery verification (check_echo_in_output), CLI readiness detection (detect_cli_ready), and activity detection could all produce incorrect results.

Suggested change

chars.next();

if nc.is_ascii_alphabetic() || nc == '@' || nc == '`' {

break;

}

}

while let Some(&nc) = chars.peek() {

chars.next();

if nc.is_ascii_alphabetic()

|| nc == '@'

|| nc == '`'

|| nc == '~'

|| nc == '{'

|| nc == '|'

|| nc == '}'

{

break;

}

}

Was this helpful? React with 👍 or 👎 to provide feedback.

- Gate CLI_READY_TIMEOUT and detect_cli_ready with #[cfg(test)] since they're only used in tests - Collapse identical gemini/aider/default prompt pattern branches into a single else clause (clippy::if_same_then_else) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 64 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T21:52:30Z

+  async shutdown(): Promise<void> {
+    if (this.unsubEvent) {
+      this.unsubEvent();
+      this.unsubEvent = undefined;
+    }
+    if (this.client) {
+      await this.client.shutdown();
+      this.client = undefined;
+    }
+    this.knownAgents.clear();
+    for (const resolve of this.exitResolvers.values()) {
+      resolve("released");
+    }
+    this.exitResolvers.clear();
+  }


🟡 AgentRelay.shutdown() leaks broker child process when startPromise is in-flight

When shutdown() is called on AgentRelay while ensureStarted() is still in-flight (the startPromise exists but hasn't resolved), the method checks this.client which is still undefined, skips the shutdown, and returns. When the startPromise later resolves, it sets this.client and calls wireEvents(c), leaving a running broker child process with no cleanup.

Root Cause

The shutdown() method at packages/sdk-ts/src/relay.ts:230-244 only checks this.client but does not check or await this.startPromise:

async shutdown(): Promise<void> { if (this.unsubEvent) { ... } if (this.client) { // <-- undefined while startPromise is pending await this.client.shutdown(); this.client = undefined; } ... }

Meanwhile, ensureStarted() at line 271-288 stores the in-flight promise:

this.startPromise = AgentRelayClient.start(this.clientOptions) .then((c) => { this.client = c; // sets client AFTER shutdown returned this.startPromise = undefined; this.wireEvents(c); // subscribes to events on zombie process return c; });

Trigger scenario: Any code path that calls shutdown() before ensureStarted() resolves — e.g., error handling during initialization, signal handlers, or concurrent operations.

Impact: The spawned agent-relay broker binary continues running as an orphan child process. The caller believes cleanup is complete, but the process persists until the parent Node.js process exits.

Suggested change

async shutdown(): Promise<void> {

if (this.unsubEvent) {

this.unsubEvent();

this.unsubEvent = undefined;

}

if (this.client) {

await this.client.shutdown();

this.client = undefined;

}

this.knownAgents.clear();

for (const resolve of this.exitResolvers.values()) {

resolve("released");

}

this.exitResolvers.clear();

}

async shutdown(): Promise<void> {

// If the client is still starting, await (or cancel) the in-flight start

// before proceeding with shutdown to avoid leaking the broker process.

if (this.startPromise) {

try {

await this.startPromise;

} catch {

// Start failed — nothing to shut down.

}

}

if (this.unsubEvent) {

this.unsubEvent();

this.unsubEvent = undefined;

}

if (this.client) {

await this.client.shutdown();

this.client = undefined;

}

this.knownAgents.clear();

for (const resolve of this.exitResolvers.values()) {

resolve("released");

}

this.exitResolvers.clear();

}

Was this helpful? React with 👍 or 👎 to provide feedback.

github-advanced-security AI found potential problems Feb 15, 2026

View reviewed changes

This comment was marked as resolved.

Sign in to view

Agent Relay and others added 4 commits February 15, 2026 06:48

fix: remove stress test target that doesn't exist

029df19

Remove `cargo test --test 'stress_*'` since there are no stress tests in the codebase yet. This was causing CI failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration Bot reviewed Feb 15, 2026

View reviewed changes

fix: exclude sdk-ts from vitest (uses Node test runner)

a92bdd6

The broker-sdk tests use Node.js built-in test runner, not vitest. Exclude packages/sdk-ts from vitest config to prevent test failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration Bot reviewed Feb 15, 2026

View reviewed changes

fix: apply cargo fmt formatting to Rust code

5800e86

Applied rustfmt formatting: - telemetry.rs: Break long method chains across lines - main.rs: Format function signature across multiple lines Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

devin-ai-integration Bot reviewed Feb 15, 2026

View reviewed changes

Agent Relay and others added 2 commits February 15, 2026 08:36

khaliqgant requested review from Copilot and willwashburn February 15, 2026 09:06

Copilot started reviewing on behalf of khaliqgant February 15, 2026 09:06 View session

Copilot AI reviewed Feb 15, 2026

View reviewed changes

khaliqgant and others added 5 commits February 16, 2026 10:35

chore: add root target/ to gitignore for Rust build artifacts

bd7c22d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

broker migration

2c5c319

style: auto-format Rust code with cargo fmt

c596a33

This comment was marked as resolved.

Sign in to view

khaliqgant and others added 2 commits February 16, 2026 11:26

fix: clear RELAY_API_KEY env var in auth tests to prevent mock interf…

a6255cb

…erence The 7 auth tests were failing because the real RELAY_API_KEY env var leaked into mock-server tests, causing 404s against httpmock servers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This comment was marked as resolved.

Sign in to view

devin-ai-integration Bot reviewed Feb 16, 2026

View reviewed changes

khaliqgant and others added 5 commits February 16, 2026 15:23

style: auto-format Rust code with cargo fmt

6c9731b

style: auto-format Rust code with cargo fmt

647965d

This comment was marked as resolved.

Sign in to view

khaliqgant and others added 4 commits February 16, 2026 15:42

chore: update migration state - Waves 7-8 complete

ea17614

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration Bot reviewed Feb 16, 2026

View reviewed changes

khaliqgant and others added 6 commits February 16, 2026 17:38

benchmark and lock file updates and config

672cd10

style: auto-format Rust code with cargo fmt

d1e8c6c

devin-ai-integration Bot reviewed Feb 16, 2026

View reviewed changes

khaliqgant and others added 2 commits February 16, 2026 21:55

style: auto-format Rust code with cargo fmt

bce15b4

devin-ai-integration Bot reviewed Feb 16, 2026

View reviewed changes

khaliqgant and others added 2 commits February 16, 2026 22:38

reviewers test

dfe9686

devin-ai-integration Bot reviewed Feb 16, 2026

View reviewed changes

willwashburn merged commit a38e45c into main Feb 17, 2026
38 checks passed

willwashburn deleted the feat/additive-broker-sdk branch February 17, 2026 05:04

@@ -26,7 +26,8 @@
              *   https://ghuntley.com/ralph/
              */
             import fs from "node:fs";
-            import { execSync } from "node:child_process";
+            import { execFileSync } from "node:child_process";
+            import shellQuote from "shell-quote";
             import { AgentRelay, type Agent, type Message } from "../relay.js";
             // ── Types ───────────────────────────────────────────────────────────────────
@@ -81,7 +82,9 @@
             function runQualityChecks(): { passed: boolean; output: string } {
               try {
-                const output = execSync(QUALITY_CMD, { encoding: "utf-8", stdio: "pipe" });
+                const parsed = shellQuote.parse(QUALITY_CMD).map(String);
+                const [cmd, ...args] = parsed.length > 0 ? parsed : ["npm", "run", "check"];
+                const output = execFileSync(cmd, args, { encoding: "utf-8", stdio: "pipe" });
                 return { passed: true, output };
               } catch (err: unknown) {
                 const output = (err as { stdout?: string }).stdout ?? String(err);

@@ -28,6 +28,7 @@
                 "typescript": "^5.7.3"
               },
               "dependencies": {
-                "@relaycast/sdk": "^0.2.1"
+                "@relaycast/sdk": "^0.2.1",
+                "shell-quote": "^1.8.3"
               }
             }

Package	Version	Security advisories
shell-quote (npm)	1.8.3	None

		if (!currentPath.split(path.delimiter).includes(binDir)) {
		env.PATH = `${binDir}${path.delimiter}${currentPath}`;

-      if (!currentPath.split(path.delimiter).includes(binDir)) {
-        env.PATH = `${binDir}${path.delimiter}${currentPath}`;
+      const relaySendPath = path.join(binDir, "relay_send");
+      if (fs.existsSync(relaySendPath) && !currentPath.split(path.delimiter).includes(binDir)) {
+        env.PATH = currentPath ? `${binDir}${path.delimiter}${currentPath}` : binDir;

Conversation

khaliqgant commented Feb 15, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's Added

What's NOT Changed

Migration Strategy

Test Plan

Uh oh!

Check warning

Uh oh!

Copilot Autofix

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

🐛 Byte-index slicing of mapped.text panics on multi-byte UTF-8 characters (src/main.rs:1462)

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

🐛 Byte-index slicing of mapped.text panics on multi-byte UTF-8 characters (src/main.rs:1462)

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

🐛 Byte-index slicing of mapped.text panics on multi-byte UTF-8 characters (src/main.rs:1462)

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

🐛 Byte-index slicing of mapped.text panics on multi-byte UTF-8 characters (src/main.rs:1462)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

khaliqgant commented Feb 15, 2026 •

edited by devin-ai-integration Bot

Loading

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)

🐛 Byte-index slicing of `mapped.text` panics on multi-byte UTF-8 characters (`src/main.rs:1462`)