feat(harness-driver): add lifecycle-aware SpawnedAgentHandle by willwashburn · Pull Request #1050 · AgentWorkforce/relay

willwashburn · 2026-06-04T14:46:31Z

User description

What

HarnessDriverClient.spawnPty() / spawnCli() / spawnHeadless() (and spawnClaude/spawnOpencode) now return a SpawnedAgentHandle instead of a bare SpawnAgentResult.

The handle is a structural superset of SpawnAgentResult (it still carries name / runtime / sessionId / pid), so existing callers are unaffected. On top of that it adds the promise-based lifecycle operations consumers previously had to reconstruct by hand from the raw broker event stream:

waitForExit(timeoutMs?) → { reason: 'exited' | 'timeout', code?, signal? }
waitForIdle(timeoutMs?) → { reason: 'idle' | 'exited' | 'timeout', idleSecs?, exit? }
exit / exitCode / exitSignal — synchronous view of a prior exit
release(reason?)

Why

Downstream consumers (e.g. the relayflows workflow runner being migrated off the v7 SDK) need an await agent.waitForExit() ergonomic with the process exit code. The v7 SDK Agent provided this; v8's spawnPty() returned a method-less SpawnAgentResult, forcing each consumer to re-implement the event→promise bridge — including the fiddly bits (listener-leak cleanup, exit-code capture from the agent_exited BrokerEvent vs the method-less agentExited payload, and the "already exited before you awaited" replay case).

The driver is the right home for this: it owns the ordered broker event stream and the event history (getLastEvent/queryEvents), so it can implement these replay-correctly — awaiting after a prior exit resolves immediately from history instead of hanging.

Notes

Exit code/signal are sourced from the agent_exited BrokerEvent (the named agentExited event payload is a method-less DriverAgent without them).
waitForIdle is edge-triggered (resolves on the next idle signal), matching how runners poll-then-nudge.

Verification

harness-driver typechecks (tsc --noEmit) and builds.
harness-driver test suite passes (26 tests).
Direct dependents @agent-relay/harnesses and @agent-relay/sdk typecheck clean (0 errors) against the widened return type.

🤖 Generated with Claude Code

CodeAnt-AI Description

Add lifecycle-aware agent handles and publish harnesses safely

What Changed

Spawning an agent now returns a handle that can wait for exit, wait for the next idle signal, read the last exit code or signal, and release the agent.
Exit and idle waits now resolve from prior event history too, so they work even if the agent already exited before the wait started.
@agent-relay/harnesses is now published to npm, so external users can install the prebuilt PTY harnesses and harness-authoring helpers.
Release creation now waits for harness publishing to succeed, so a failed harness publish blocks the release instead of being missed.

Impact

✅ Easier agent shutdown handling
✅ Fewer hangs when checking exited agents
✅ Installable harness package on npm

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@agent-relay/harnesses was set up as a public package (no private flag, publishConfig.access public, versioned in lockstep) but was never wired into the publish workflow, so it never reached npm. External SDK consumers (e.g. relayflows) need it for the prebuilt PTY harnesses and the definePtyHarness/createHuman author helpers. Add a publish-harnesses job to the package=all path. It runs after publish-packages — where its exact-version workspace deps (@agent-relay/sdk, @agent-relay/harness-driver) land on the registry — so an external `npm install @agent-relay/harnesses@<v>` can always resolve its dependencies. Mirrors the existing provenance + skip-if-exists publish pattern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@v4

Address PR review: publish-harnesses ran outside the release gate, so a tag/release could be cut even if harness publishing failed. - create-release now needs publish-harnesses and its `if` requires the job to not have failed. It tolerates `skipped` so package=main releases (where publish-harnesses does not run) are not blocked. - summary job lists the harness publish result. - Trim the changelog bullet to impact-first per the repo changelog rule. Leaving the new job's actions on @v4 tags to match the rest of the workflow (the repo uses tag refs throughout; SHA-pinning would be inconsistent and is not the enforced policy here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`spawnPty`/`spawnCli`/`spawnHeadless` now return a `SpawnedAgentHandle` instead of a bare `SpawnAgentResult`. The handle is a structural superset (still carries name/runtime/sessionId/pid) so existing callers are unaffected, and it adds the promise-based lifecycle operations consumers previously had to reconstruct from the raw broker event stream: - waitForExit() -> { reason, code, signal } - waitForIdle() -> { reason, idleSecs } - exit / exitCode / exitSignal (synchronous view of a prior exit) - release() All operations are backed by the client's broker event stream and its event history, so they are replay-correct: awaiting after the agent has already exited resolves immediately from history rather than hanging. Typechecks clean; harness-driver tests pass; dependents (harnesses, sdk) unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist · 2026-06-04T14:46:35Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

codeant-ai · 2026-06-04T14:46:35Z

CodeAnt AI is reviewing your PR.

coderabbitai · 2026-06-04T14:46:58Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cf62a625-2034-42c1-a73e-3f3ae0a9e2bc

📥 Commits

Reviewing files that changed from the base of the PR and between 639e8a9 and 991086d.

📒 Files selected for processing (1)

packages/harness-driver/src/agent-handle.ts

🚧 Files skipped from review as they are similar to previous changes (1)

packages/harness-driver/src/agent-handle.ts

📝 Walkthrough

Walkthrough

This PR introduces a lifecycle-aware SpawnedAgentHandle wrapper for spawned agents with promise-based exit and idle waiting methods, integrates it into client spawn APIs, and establishes npm publication for the @agent-relay/harnesses package via a new workflow job with conditional release gating.

Changes

Agent lifecycle wrapper and harnesses publication

Layer / File(s)	Summary
Agent lifecycle types and SpawnedAgentHandle implementation `packages/harness-driver/src/agent-handle.ts`	`AgentExitInfo` and `AgentIdleInfo` types standardize broker exit and idle outcomes. `SpawnedAgentHandle` wraps `SpawnAgentResult` and exposes synchronous exit state via broker history, plus async `waitForExit(timeoutMs?)` and `waitForIdle(timeoutMs?)` methods with listener/timer cleanup on resolution. `matchExit` normalizes broker exit events into `AgentExitInfo`.
Client spawn methods return SpawnedAgentHandle `packages/harness-driver/src/client.ts`	`spawnPty`, `spawnCli`, `spawnHeadless`, `spawnClaude`, and `spawnOpencode` now return `Promise<SpawnedAgentHandle>` instead of `Promise<SpawnAgentResult>`, wrapping successful spawn results in the new handle.
Public API export of agent-handle module `packages/harness-driver/src/index.ts`	`./agent-handle.js` is re-exported from the harness-driver public surface, exposing `SpawnedAgentHandle`, `AgentExitInfo`, and `AgentIdleInfo`.
npm publication workflow and changelog `.github/workflows/publish.yml`, `CHANGELOG.md`	New `publish-harnesses` job publishes `@agent-relay/harnesses` after `publish-packages` completes (gated to `package=all`, supports `dry_run`, skips if version exists). `create-release` and `summary` jobs updated to depend on and report `publish-harnesses` status, with release blocked if publish fails. Changelog documents the new package availability.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

AgentWorkforce/relay#1049: Companion PR that introduces the publish-harnesses job and publication workflow configuration referenced in this PR's workflow updates.

Suggested labels

size:M

Suggested reviewers

khaliqgant
barryollama

Poem

🐰 I stitched a handle, neat and light,
To watch the agent through the night.
If idle comes or processes fall,
I’ll whisper events and answer the call.
Now harnesses publish, ready for flight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding a lifecycle-aware SpawnedAgentHandle to the harness-driver, which is the primary feature across multiple modified files.
Description check	✅ Passed	The description includes comprehensive 'What' and 'Why' sections explaining the changes and motivation, but lacks the 'Test Plan' section from the template with test status checkboxes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/harness-driver-agent-lifecycle-handle

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0924259c5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-04T14:49:56Z

+      const unsubIdle = this.client.addListener('agentIdle', (payload) => {
+        if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs });


Listen to broker idle events directly

In the normal client path, broker websocket frames are delivered through client.onEvent; I couldn't find any bridge that emits agentIdle on this EventBus from incoming agent_idle frames. Since the broker protocol sends idle as kind: 'agent_idle' with idle_secs, waitForIdle() will never resolve as idle for real broker idle signals unless some external caller manually emits this lifecycle event, so harnesses using the new handle will hang until timeout or exit.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-04T14:49:56Z

+  if (event.kind === 'agent_exit' && event.name === name) {
+    return { reason: 'exited' };


Stop treating agent_exit as process exit

For PTY/headless agents, the broker emits agent_exit when the worker reports an exit/close reason, and then maintenance emits the real agent_exited event with code/signal; the Python/Swift SDKs also model agent_exit separately as an exit-requested event. Resolving waitForExit() on agent_exit means callers waiting before shutdown commonly get { reason: 'exited' } before the process has actually reached agent_exited, losing the exit code/signal this API promises and potentially continuing cleanup too early.

Useful? React with 👍 / 👎.

codeant-ai · 2026-06-04T14:51:51Z

+      const unsubIdle = this.client.addListener('agentIdle', (payload) => {
+        if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs });
+      });


🔴 Architect Review — CRITICAL

waitForIdle() relies on client.addListener('agentIdle', ...), but no code in HarnessDriverClient (or elsewhere in this repo) ever emits agentIdle onto its eventBus, so in direct-client usage idle waits will never see idle signals and can hang indefinitely (or only ever resolve via exit/timeout).

Suggestion: Either bridge broker agent_idle events into the client's eventBus (emitting agentIdle with an AgentIdlePayload), or change waitForIdle() to derive idleness directly from the broker event stream/history in the same way exit is handled via onEvent and getLastEvent.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood. **Path:** packages/harness-driver/src/agent-handle.ts **Line:** 122:124 **Comment:** *CRITICAL: `waitForIdle()` relies on `client.addListener('agentIdle', ...)`, but no code in `HarnessDriverClient` (or elsewhere in this repo) ever emits `agentIdle` onto its `eventBus`, so in direct-client usage idle waits will never see idle signals and can hang indefinitely (or only ever resolve via exit/timeout). Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/publish.yml:
- Around line 980-990: Pin the GH Actions used in the `publish-harnesses` job
and disable checkout credential persistence: replace the floating uses of
actions/checkout@v4, actions/setup-node@v4, and actions/download-artifact@v4
with the corresponding commit SHAs (use the latest verified commit SHAs for each
action) and update the Checkout step (the step named "Checkout code") to include
persist-credentials: false so runner credentials are not left on the workspace;
ensure you update the steps that reference those actions (Checkout code, Setup
Node.js, Download build artifacts) to use the SHA-pinned references.

In `@packages/harness-driver/src/agent-handle.ts`:
- Around line 84-132: Both waitForExit and waitForIdle currently subscribe to
future events but don't ensure the client's event stream or per-worker
subscription are active; make each method async, and before creating the new
Promise call await this.client.connectEvents() and await
this.client.subscribeWorkerStream(this.name) (or the equivalent
ensure-subscription API on HarnessDriverClient) so the WS/event stream and
worker-specific stream are started; update the signatures of waitForExit and
waitForIdle to async Promise<AgentExitInfo> / async Promise<AgentIdleInfo> so
callers still get a promise.
- Around line 122-128: waitForIdle() is subscribing to
client.addListener('agentIdle', ...) which listens to the internal eventBus that
never receives broker agent_idle events; change the idle subscription to use the
broker stream handler client.onEvent(...) (the same path used by
connectEvents/onEvent) and detect the broker agent_idle event there (similar to
how matchExit(event, this.name) is used) so that settle({ reason: 'idle',
idleSecs: ... }) is called when the broker agent_idle for this.name arrives;
keep the existing exit handling via client.onEvent and remove or replace the
client.addListener('agentIdle', ...) subscription.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1863fb47-d6bd-44d8-a1e2-4b1f3d5fdd51

📥 Commits

Reviewing files that changed from the base of the PR and between a14f65f and b092425.

📒 Files selected for processing (5)

.github/workflows/publish.yml
CHANGELOG.md
packages/harness-driver/src/agent-handle.ts
packages/harness-driver/src/client.ts
packages/harness-driver/src/index.ts

coderabbitai · 2026-06-04T14:55:17Z

+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '22.14.0'
+          registry-url: 'https://registry.npmjs.org'
+
+      - name: Download build artifacts
+        uses: actions/download-artifact@v4


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Checking changed publish-harnesses block for unpinned actions..." awk ' /publish-harnesses:/, /^ [a-zA-Z0-9_-]+:/ { if ($0 ~ /uses:\s*[^@]+@v[0-9]+/) print NR ":" $0 } ' .github/workflows/publish.yml echo echo "Checking whether checkout in publish-harnesses sets persist-credentials: false..." awk ' /publish-harnesses:/, /^ [a-zA-Z0-9_-]+:/ { print NR ":" $0 } ' .github/workflows/publish.yml | sed -n '/Checkout code/,/Setup Node.js/p'

Repository: AgentWorkforce/relay

Length of output: 212

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/publish.yml" echo "Locating publish-harnesses in $FILE..." rg -n "^\s*publish-harnesses:" "$FILE" || true echo echo "Showing around lines 950-1030 (per review comment reference)..." nl -ba "$FILE" | sed -n '950,1030p' echo echo "Searching for persist-credentials in publish.yml..." rg -n "persist-credentials" "$FILE" || true echo echo "Searching for uses: within the publish-harnesses job (best-effort block scan)..." # Block scan: start at publish-harnesses: and stop when indentation returns to the job level. python3 - <<'PY' import re, itertools, sys, pathlib path = pathlib.Path(".github/workflows/publish.yml") lines = path.read_text().splitlines(True) start = None for i,l in enumerate(lines): if re.match(r'^\s*publish-harnesses:\s*$', l): start = i break if start is None: print("publish-harnesses job not found with exact-line match; falling back to substring search.") for i,l in enumerate(lines): if "publish-harnesses:" in l: start=i break if start is None: print("Could not find publish-harnesses at all.") sys.exit(0) # Determine indentation level of the job key m = re.match(r'^(\s*)publish-harnesses:\s*$', lines[start]) indent = len(m.group(1)) if m else 0 # Collect until next top-level sibling at same indent end = len(lines) for j in range(start+1, len(lines)): if re.match(rf'^\s{{0,{indent}}}[A-Za-z0-9_-]+:\s*$', lines[j]) and j!=start: end = j break block = lines[start:end] for idx,l in enumerate(block, start=start+1): if "uses:" in l: print(f"{idx}: {l.rstrip()}") PY

Repository: AgentWorkforce/relay

Length of output: 279

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/publish.yml" echo "Locating publish-harnesses in $FILE..." rg -n "^\s*publish-harnesses:" "$FILE" || true echo echo "Printing lines 950-1030 with line numbers (no nl available)..." awk 'NR>=950 && NR<=1030 { printf "%d:%s\n", NR, $0 }' "$FILE" || true echo echo "Searching for persist-credentials in $FILE..." rg -n "persist-credentials" "$FILE" || true echo echo "Scanning publish-harnesses job block for uses: entries and persist-credentials..." python3 - <<'PY' import re, pathlib path = pathlib.Path(".github/workflows/publish.yml") lines = path.read_text().splitlines() start = None for i,l in enumerate(lines): if re.match(r'^\s*publish-harnesses:\s*$', l): start = i break if start is None: print("publish-harnesses not found") raise SystemExit(0) indent = len(re.match(r'^(\s*)publish-harnesses:\s*$', lines[start]).group(1)) def job_key_line(s): # next sibling at same indent level return bool(re.match(rf'^\s{{0,{indent}}}[A-Za-z0-9_-]+:\s*$', s)) end = len(lines) for j in range(start+1, len(lines)): if job_key_line(lines[j]): end = j break block = lines[start:end] for offset, l in enumerate(block, start=start+1): if re.search(r'\buses:\s*', l) or re.search(r'\bpersist-credentials:\s*', l): print(f"{offset}:{l}") PY

Repository: AgentWorkforce/relay

Length of output: 4237

🌐 Web query:

actions/checkout@v4 persist-credentials default value

💡 Result:

The default value for the persist-credentials input in actions/checkout@v4 is true [1][2][3]. When set to true, the authentication token or SSH key used for the checkout is persisted in the local git configuration, which allows subsequent git commands in your workflow to remain authenticated [3][4]. If you do not want these credentials to be persisted, you can explicitly set persist-credentials to false [3][4].

Citations:

1: https://github.com/actions/checkout/blob/v4.1.1/action.yml

2: https://github.com/actions/checkout/blob/v4/action.yml

3: https://github.com/actions/checkout/blob/v4/README.md

4: https://github.com/actions/checkout/?tab=readme-ov-file

Harden publish-harnesses by pinning action SHAs and disabling checkout credential persistence.

In .github/workflows/publish.yml, the publish-harnesses job uses unpinned third-party actions (actions/checkout@v4, actions/setup-node@v4, actions/download-artifact@v4) and checkout does not set persist-credentials: false (default is true), which can persist credentials on the runner.

🔒 Suggested workflow hardening

- name: Checkout code - uses: actions/checkout@v4 + uses: actions/checkout@<full_commit_sha_for_v4> + with: + persist-credentials: false - name: Setup Node.js - uses: actions/setup-node@v4 + uses: actions/setup-node@<full_commit_sha_for_v4> with: node-version: '22.14.0' registry-url: 'https://registry.npmjs.org' - name: Download build artifacts - uses: actions/download-artifact@v4 + uses: actions/download-artifact@<full_commit_sha_for_v4> with: name: build-output path: .

🧰 Tools

🪛 zizmor (1.25.2)

[warning] 980-981: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

[error] 981-981: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

[error] 984-984: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

[error] 990-990: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/publish.yml around lines 980 - 990, Pin the GH Actions used in the `publish-harnesses` job and disable checkout credential persistence: replace the floating uses of actions/checkout@v4, actions/setup-node@v4, and actions/download-artifact@v4 with the corresponding commit SHAs (use the latest verified commit SHAs for each action) and update the Checkout step (the step named "Checkout code") to include persist-credentials: false so runner credentials are not left on the workspace; ensure you update the steps that reference those actions (Checkout code, Setup Node.js, Download build artifacts) to use the SHA-pinned references.

coderabbitai · 2026-06-04T14:55:17Z

+  waitForExit(timeoutMs?: number): Promise<AgentExitInfo> {
+    const already = this.exit;
+    if (already) return Promise.resolve(already);
+
+    return new Promise<AgentExitInfo>((resolve) => {
+      let timer: ReturnType<typeof setTimeout> | undefined;
+      const settle = (info: AgentExitInfo) => {
+        if (timer) clearTimeout(timer);
+        unsub();
+        resolve(info);
+      };
+      const unsub = this.client.onEvent((event: BrokerEvent) => {
+        const exit = matchExit(event, this.name);
+        if (exit) settle(exit);
+      });
+      if (timeoutMs !== undefined) {
+        timer = setTimeout(() => settle({ reason: 'timeout' }), timeoutMs);
+      }
+    });
+  }
+
+  /**
+   * Resolve on the next idle signal for this agent (edge-triggered: a fresh
+   * signal after the call, matching how runners poll-then-nudge). Also resolves
+   * if the agent exits first, or with `{ reason: 'timeout' }` after `timeoutMs`.
+   */
+  waitForIdle(timeoutMs?: number): Promise<AgentIdleInfo> {
+    const already = this.exit;
+    if (already) return Promise.resolve({ reason: 'exited', exit: already });
+
+    return new Promise<AgentIdleInfo>((resolve) => {
+      let timer: ReturnType<typeof setTimeout> | undefined;
+      const settle = (info: AgentIdleInfo) => {
+        if (timer) clearTimeout(timer);
+        unsubIdle();
+        unsubExit();
+        resolve(info);
+      };
+      const unsubIdle = this.client.addListener('agentIdle', (payload) => {
+        if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs });
+      });
+      const unsubExit = this.client.onEvent((event: BrokerEvent) => {
+        const exit = matchExit(event, this.name);
+        if (exit) settle({ reason: 'exited', exit });
+      });
+      if (timeoutMs !== undefined) {
+        timer = setTimeout(() => settle({ reason: 'timeout' }), timeoutMs);
+      }
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Start the broker event stream before waiting.

waitForExit() and waitForIdle() subscribe to future lifecycle events, but neither method ensures the client's WS stream is connected. HarnessDriverClient leaves that as a separate step (connectEvents() at packages/harness-driver/src/client.ts, Lines 572-574), and subscribeWorkerStream() has to opt in explicitly before listening (Lines 843-845). On a client created via new HarnessDriverClient(...) or connect(), these waits will otherwise only ever resolve from already-cached history and can hang indefinitely for new exits/idles.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/harness-driver/src/agent-handle.ts` around lines 84 - 132, Both waitForExit and waitForIdle currently subscribe to future events but don't ensure the client's event stream or per-worker subscription are active; make each method async, and before creating the new Promise call await this.client.connectEvents() and await this.client.subscribeWorkerStream(this.name) (or the equivalent ensure-subscription API on HarnessDriverClient) so the WS/event stream and worker-specific stream are started; update the signatures of waitForExit and waitForIdle to async Promise<AgentExitInfo> / async Promise<AgentIdleInfo> so callers still get a promise.

The client `eventBus` only carries call-site hook events; broker events like `agentIdle` are never emitted onto it in direct-client usage, so `addListener('agentIdle', ...)` never fired and `waitForIdle()` could hang. Match the `agent_idle` BrokerEvent on the transport stream via `onEvent` instead — consistent with how `waitForExit()` already works. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codeant-ai · 2026-06-04T14:58:08Z

+    const exit = this.client.getLastEvent('agent_exit', this.name);
+    if (exit && exit.kind === 'agent_exit') {
+      return { reason: 'exited' };
+    }


Suggestion: agent_exit is an exit-request signal (not a confirmed process termination), but this code treats it as a completed exit. That can make waitForExit() and exit resolve too early, before the process has actually terminated and before a real exit code/signal is known. Only resolve exit completion from the confirmed agent_exited event. [api mismatch]

Severity Level: Critical 🚨

- ❌ Callers may treat agents as terminated while still running. - ⚠️ Exit codes and signals may never be observed or recorded. - ⚠️ Workflow runners may start dependent steps prematurely.

Steps of Reproduction ✅

1. In a scenario where the broker protocol distinguishes between an `agent_exit` request and a confirmed `agent_exited` event (definitions live in `packages/harness-driver/src/protocol.ts` and are consumed via `BrokerEvent` imported at line 21 in `agent-handle.ts`), configure or observe an agent where `agent_exit` is emitted before the underlying process has actually terminated. 2. Spawn this agent via `HarnessDriverClient.spawnPty()` or related APIs (implemented in `packages/harness-driver/src/client.ts`), which now return a `SpawnedAgentHandle` that uses the `exit` getter at `packages/harness-driver/src/agent-handle.ts:59-68` to derive its lifecycle state. 3. Call `await handle.waitForExit()` shortly after the exit is requested; inside `waitForExit()` at lines 84-103, `const already = this.exit;` runs the `exit` getter, which first checks `getLastEvent('agent_exited', this.name)` and then falls back to `getLastEvent('agent_exit', this.name)`. 4. Because `agent_exit` has been recorded but `agent_exited` has not yet occurred, `this.client.getLastEvent('agent_exit', this.name)` returns the pending exit-request event, the `exit` getter returns `{ reason: 'exited' }`, and `waitForExit()` resolves as if the process has fully exited even though the real termination (and exit code/signal) has not yet been observed.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** packages/harness-driver/src/agent-handle.ts **Line:** 64:67 **Comment:** *Api Mismatch: `agent_exit` is an exit-request signal (not a confirmed process termination), but this code treats it as a completed exit. That can make `waitForExit()` and `exit` resolve too early, before the process has actually terminated and before a real exit code/signal is known. Only resolve exit completion from the confirmed `agent_exited` event. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-04T14:58:08Z

+      const unsubIdle = this.client.addListener('agentIdle', (payload) => {
+        if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs });
+      });


Suggestion: waitForIdle() subscribes to client.addListener('agentIdle', ...), but HarnessDriverClient does not emit broker WS events through eventBus (it only emits call-site lifecycle hooks). In direct-client usage this listener never fires, so idle waits can only end via timeout or exit. Subscribe to the broker stream (onEvent + agent_idle) instead. [api mismatch]

Severity Level: Critical 🚨

- ❌ Idle waits never resolve on genuine broker idle events. - ⚠️ Runners relying on idle edge-trigger may hang until timeout. - ⚠️ Workflow orchestration may mis-handle agent liveness and pacing.

Steps of Reproduction ✅

1. In `packages/harness-driver/src/agent-handle.ts`, note that `waitForIdle()` at lines 110-132 subscribes to idle via `this.client.addListener('agentIdle', (payload) => { ... })` while also subscribing to broker events via `this.client.onEvent((event: BrokerEvent) => { ... })`. 2. Inspect `HarnessDriverClient` implementation in `packages/harness-driver/src/client.ts` (which defines `addListener` and `onEvent`): `addListener` is used for client-level lifecycle hooks (e.g., spawn callbacks), whereas broker WebSocket traffic is surfaced through `onEvent` with `BrokerEvent` kinds such as `agent_idle`, `agent_exited`, etc. 3. Observe that no code in `HarnessDriverClient` forwards broker `agent_idle` events into the `'agentIdle'` event bus channel used by `addListener`, meaning `waitForIdle()`'s `addListener('agentIdle', ...)` callback is never invoked in direct-client usage even when broker `agent_idle` events are flowing. 4. Run a harness flow that spawns an agent, lets it go idle (broker emits `agent_idle`), and then calls `await handle.waitForIdle()`; the promise will only resolve if the agent exits (via `onEvent` subscription) or the timeout fires, but will never resolve for the idle case because the `agentIdle` listener is never triggered.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** packages/harness-driver/src/agent-handle.ts **Line:** 122:124 **Comment:** *Api Mismatch: `waitForIdle()` subscribes to `client.addListener('agentIdle', ...)`, but `HarnessDriverClient` does not emit broker WS events through `eventBus` (it only emits call-site lifecycle hooks). In direct-client usage this listener never fires, so idle waits can only end via timeout or exit. Subscribe to the broker stream (`onEvent` + `agent_idle`) instead. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-04T14:58:08Z

      const result = SpawnAgentResultSchema.parse(rawResult);
      await this.emitAfterSpawn(beforeCtx, resolvedInput, t0, result, undefined);
-      return result;
+      return new SpawnedAgentHandle(result, this);


Suggestion: This now returns a class instance that carries an internal client reference, so callers that spread or serialize the spawn result can unintentionally expose deep internal state (including transport credentials) instead of just agent metadata. Keep internals non-enumerable/opaque (for example with true private fields and controlled serialization) before returning this object shape publicly. [security]

Severity Level: Critical 🚨

- ❌ JSON serializing SpawnedAgentHandle leaks BrokerTransport.apiKey credential. - ⚠️ Logs may contain workspace_key and broker base URL.

Steps of Reproduction ✅

1. `HarnessDriverClient.spawnPty()` in `packages/harness-driver/src/client.ts:46-63` parses the broker response into `result` using `SpawnAgentResultSchema` and now returns `new SpawnedAgentHandle(result, this)` (line 63), instead of returning the plain `result` object. 2. `SpawnedAgentHandle` in `packages/harness-driver/src/agent-handle.ts:42-56` stores the passed client reference as a `private readonly client: HarnessDriverClient` constructor parameter (lines 48-51); in the emitted JavaScript this becomes a normal instance property (not an ES `#private` field), so it is an own, enumerable property of the handle object at runtime. 3. The `HarnessDriverClient` instance stored on the handle holds a `BrokerTransport` with an `apiKey` field (`packages/harness-driver/src/transport.ts:132-151`) and a `workspaceKey` field on the client itself (`packages/harness-driver/src/client.ts:139-144`), both of which are normal object properties; nothing marks them as non-enumerable or excludes them from serialization. 4. When a downstream consumer obtains a handle from `spawnPty()` (for example, by adapting the existing CLI entry `spawnAgentWithClient` in `packages/cli/src/cli/lib/client-factory.ts:10-14` to `const handle = await client.spawnPty(options);`) and then logs or transmits it via `JSON.stringify(handle)` or `{ ...handle }`, JSON serialization walks enumerable own properties and includes the nested `client` object. That nested object in turn contains the `BrokerTransport.apiKey` and any `workspaceKey`, so the serialized spawn result now leaks internal connection credentials and driver state instead of just the intended metadata (`name`, `runtime`, `sessionId`, `pid`).

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** packages/harness-driver/src/client.ts **Line:** 642:642 **Comment:** *Security: This now returns a class instance that carries an internal `client` reference, so callers that spread or serialize the spawn result can unintentionally expose deep internal state (including transport credentials) instead of just agent metadata. Keep internals non-enumerable/opaque (for example with true private fields and controlled serialization) before returning this object shape publicly. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-04T14:58:14Z

CodeAnt AI finished reviewing your PR.

`transport.onEvent()` only registers a listener; events arrive only once the stream is connected. Call the idempotent `connectEvents()` at the start of `waitForExit`/`waitForIdle` so they receive events even if the consumer never connected the stream, instead of hanging until timeout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

willwashburn · 2026-06-04T14:59:38Z

Addressed the review feedback on agent-handle.ts:

waitForIdle listened on the dead agentIdle event bus (architect + CodeRabbit): broker events are never emitted onto the client eventBus in direct-client usage, so it could hang. Now matches the agent_idle BrokerEvent on the transport stream via onEvent, consistent with waitForExit. (639e8a99)
Event stream not guaranteed connected (CodeRabbit): onEvent() only registers a listener; both waits now call the idempotent connectEvents() first so events flow even if the consumer never connected. Skipped the suggested subscribeWorkerStream() — that streams PTY output, not lifecycle events, so it's not relevant to exit/idle. (991086de)

Not addressing the publish.yml SHA-pinning / persist-credentials comment here — that file comes from the base ci/publish-harnesses branch (companion PR #1049), not this change.

harness-driver typechecks clean and all 26 tests pass.

willwashburn and others added 3 commits June 4, 2026 09:18

willwashburn requested a review from khaliqgant as a code owner June 4, 2026 14:46

codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label Jun 4, 2026

chatgpt-codex-connector Bot reviewed Jun 4, 2026

View reviewed changes

codeant-ai Bot reviewed Jun 4, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

codeant-ai Bot reviewed Jun 4, 2026

View reviewed changes

willwashburn merged commit ac43ef4 into main Jun 4, 2026
40 checks passed

willwashburn deleted the feat/harness-driver-agent-lifecycle-handle branch June 4, 2026 15:08

willwashburn mentioned this pull request Jun 4, 2026

feat: build & test relayflows core against @agent-relay 8.x AgentWorkforce/relayflows#2

Open

		const unsubIdle = this.client.addListener('agentIdle', (payload) => {
		if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs });

		if (event.kind === 'agent_exit' && event.name === name) {
		return { reason: 'exited' };

Conversation

willwashburn commented Jun 4, 2026 • edited by codeant-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

What

Why

Notes

Verification

CodeAnt-AI Description

What Changed

Impact

Checking Your Pull Request

Talking to CodeAnt AI

Example

Preserve Org Learnings with CodeAnt

Example

Retrigger review

Check Your Repository Health

Uh oh!

gemini-code-assist Bot commented Jun 4, 2026

Uh oh!

codeant-ai Bot commented Jun 4, 2026

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Review ran into problems

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codeant-ai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot commented Jun 4, 2026

Uh oh!

willwashburn commented Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

willwashburn commented Jun 4, 2026 •

edited by codeant-ai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading