feat(harness-driver): add lifecycle-aware SpawnedAgentHandle#1050
Conversation
@agent-relay/harnesses was set up as a public package (no private flag, publishConfig.access public, versioned in lockstep) but was never wired into the publish workflow, so it never reached npm. External SDK consumers (e.g. relayflows) need it for the prebuilt PTY harnesses and the definePtyHarness/createHuman author helpers. Add a publish-harnesses job to the package=all path. It runs after publish-packages — where its exact-version workspace deps (@agent-relay/sdk, @agent-relay/harness-driver) land on the registry — so an external `npm install @agent-relay/harnesses@<v>` can always resolve its dependencies. Mirrors the existing provenance + skip-if-exists publish pattern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address PR review: publish-harnesses ran outside the release gate, so a tag/release could be cut even if harness publishing failed. - create-release now needs publish-harnesses and its `if` requires the job to not have failed. It tolerates `skipped` so package=main releases (where publish-harnesses does not run) are not blocked. - summary job lists the harness publish result. - Trim the changelog bullet to impact-first per the repo changelog rule. Leaving the new job's actions on @v4 tags to match the rest of the workflow (the repo uses tag refs throughout; SHA-pinning would be inconsistent and is not the enforced policy here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`spawnPty`/`spawnCli`/`spawnHeadless` now return a `SpawnedAgentHandle`
instead of a bare `SpawnAgentResult`. The handle is a structural superset
(still carries name/runtime/sessionId/pid) so existing callers are
unaffected, and it adds the promise-based lifecycle operations consumers
previously had to reconstruct from the raw broker event stream:
- waitForExit() -> { reason, code, signal }
- waitForIdle() -> { reason, idleSecs }
- exit / exitCode / exitSignal (synchronous view of a prior exit)
- release()
All operations are backed by the client's broker event stream and its
event history, so they are replay-correct: awaiting after the agent has
already exited resolves immediately from history rather than hanging.
Typechecks clean; harness-driver tests pass; dependents (harnesses, sdk)
unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
CodeAnt AI is reviewing your PR. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR introduces a lifecycle-aware ChangesAgent lifecycle wrapper and harnesses publication
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0924259c5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const unsubIdle = this.client.addListener('agentIdle', (payload) => { | ||
| if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs }); |
There was a problem hiding this comment.
Listen to broker idle events directly
In the normal client path, broker websocket frames are delivered through client.onEvent; I couldn't find any bridge that emits agentIdle on this EventBus from incoming agent_idle frames. Since the broker protocol sends idle as kind: 'agent_idle' with idle_secs, waitForIdle() will never resolve as idle for real broker idle signals unless some external caller manually emits this lifecycle event, so harnesses using the new handle will hang until timeout or exit.
Useful? React with 👍 / 👎.
| if (event.kind === 'agent_exit' && event.name === name) { | ||
| return { reason: 'exited' }; |
There was a problem hiding this comment.
Stop treating agent_exit as process exit
For PTY/headless agents, the broker emits agent_exit when the worker reports an exit/close reason, and then maintenance emits the real agent_exited event with code/signal; the Python/Swift SDKs also model agent_exit separately as an exit-requested event. Resolving waitForExit() on agent_exit means callers waiting before shutdown commonly get { reason: 'exited' } before the process has actually reached agent_exited, losing the exit code/signal this API promises and potentially continuing cleanup too early.
Useful? React with 👍 / 👎.
| const unsubIdle = this.client.addListener('agentIdle', (payload) => { | ||
| if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs }); | ||
| }); |
There was a problem hiding this comment.
🔴 Architect Review — CRITICAL
waitForIdle() relies on client.addListener('agentIdle', ...), but no code in HarnessDriverClient (or elsewhere in this repo) ever emits agentIdle onto its eventBus, so in direct-client usage idle waits will never see idle signals and can hang indefinitely (or only ever resolve via exit/timeout).
Suggestion: Either bridge broker agent_idle events into the client's eventBus (emitting agentIdle with an AgentIdlePayload), or change waitForIdle() to derive idleness directly from the broker event stream/history in the same way exit is handled via onEvent and getLastEvent.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.
**Path:** packages/harness-driver/src/agent-handle.ts
**Line:** 122:124
**Comment:**
*CRITICAL: `waitForIdle()` relies on `client.addListener('agentIdle', ...)`, but no code in `HarnessDriverClient` (or elsewhere in this repo) ever emits `agentIdle` onto its `eventBus`, so in direct-client usage idle waits will never see idle signals and can hang indefinitely (or only ever resolve via exit/timeout).
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/publish.yml:
- Around line 980-990: Pin the GH Actions used in the `publish-harnesses` job
and disable checkout credential persistence: replace the floating uses of
actions/checkout@v4, actions/setup-node@v4, and actions/download-artifact@v4
with the corresponding commit SHAs (use the latest verified commit SHAs for each
action) and update the Checkout step (the step named "Checkout code") to include
persist-credentials: false so runner credentials are not left on the workspace;
ensure you update the steps that reference those actions (Checkout code, Setup
Node.js, Download build artifacts) to use the SHA-pinned references.
In `@packages/harness-driver/src/agent-handle.ts`:
- Around line 84-132: Both waitForExit and waitForIdle currently subscribe to
future events but don't ensure the client's event stream or per-worker
subscription are active; make each method async, and before creating the new
Promise call await this.client.connectEvents() and await
this.client.subscribeWorkerStream(this.name) (or the equivalent
ensure-subscription API on HarnessDriverClient) so the WS/event stream and
worker-specific stream are started; update the signatures of waitForExit and
waitForIdle to async Promise<AgentExitInfo> / async Promise<AgentIdleInfo> so
callers still get a promise.
- Around line 122-128: waitForIdle() is subscribing to
client.addListener('agentIdle', ...) which listens to the internal eventBus that
never receives broker agent_idle events; change the idle subscription to use the
broker stream handler client.onEvent(...) (the same path used by
connectEvents/onEvent) and detect the broker agent_idle event there (similar to
how matchExit(event, this.name) is used) so that settle({ reason: 'idle',
idleSecs: ... }) is called when the broker agent_idle for this.name arrives;
keep the existing exit handling via client.onEvent and remove or replace the
client.addListener('agentIdle', ...) subscription.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1863fb47-d6bd-44d8-a1e2-4b1f3d5fdd51
📒 Files selected for processing (5)
.github/workflows/publish.ymlCHANGELOG.mdpackages/harness-driver/src/agent-handle.tspackages/harness-driver/src/client.tspackages/harness-driver/src/index.ts
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Setup Node.js | ||
| uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: '22.14.0' | ||
| registry-url: 'https://registry.npmjs.org' | ||
|
|
||
| - name: Download build artifacts | ||
| uses: actions/download-artifact@v4 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking changed publish-harnesses block for unpinned actions..."
awk '
/publish-harnesses:/, /^ [a-zA-Z0-9_-]+:/ {
if ($0 ~ /uses:\s*[^@]+@v[0-9]+/) print NR ":" $0
}
' .github/workflows/publish.yml
echo
echo "Checking whether checkout in publish-harnesses sets persist-credentials: false..."
awk '
/publish-harnesses:/, /^ [a-zA-Z0-9_-]+:/ { print NR ":" $0 }
' .github/workflows/publish.yml | sed -n '/Checkout code/,/Setup Node.js/p'Repository: AgentWorkforce/relay
Length of output: 212
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE=".github/workflows/publish.yml"
echo "Locating publish-harnesses in $FILE..."
rg -n "^\s*publish-harnesses:" "$FILE" || true
echo
echo "Showing around lines 950-1030 (per review comment reference)..."
nl -ba "$FILE" | sed -n '950,1030p'
echo
echo "Searching for persist-credentials in publish.yml..."
rg -n "persist-credentials" "$FILE" || true
echo
echo "Searching for uses: within the publish-harnesses job (best-effort block scan)..."
# Block scan: start at publish-harnesses: and stop when indentation returns to the job level.
python3 - <<'PY'
import re, itertools, sys, pathlib
path = pathlib.Path(".github/workflows/publish.yml")
lines = path.read_text().splitlines(True)
start = None
for i,l in enumerate(lines):
if re.match(r'^\s*publish-harnesses:\s*$', l):
start = i
break
if start is None:
print("publish-harnesses job not found with exact-line match; falling back to substring search.")
for i,l in enumerate(lines):
if "publish-harnesses:" in l:
start=i
break
if start is None:
print("Could not find publish-harnesses at all.")
sys.exit(0)
# Determine indentation level of the job key
m = re.match(r'^(\s*)publish-harnesses:\s*$', lines[start])
indent = len(m.group(1)) if m else 0
# Collect until next top-level sibling at same indent
end = len(lines)
for j in range(start+1, len(lines)):
if re.match(rf'^\s{{0,{indent}}}[A-Za-z0-9_-]+:\s*$', lines[j]) and j!=start:
end = j
break
block = lines[start:end]
for idx,l in enumerate(block, start=start+1):
if "uses:" in l:
print(f"{idx}: {l.rstrip()}")
PYRepository: AgentWorkforce/relay
Length of output: 279
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE=".github/workflows/publish.yml"
echo "Locating publish-harnesses in $FILE..."
rg -n "^\s*publish-harnesses:" "$FILE" || true
echo
echo "Printing lines 950-1030 with line numbers (no nl available)..."
awk 'NR>=950 && NR<=1030 { printf "%d:%s\n", NR, $0 }' "$FILE" || true
echo
echo "Searching for persist-credentials in $FILE..."
rg -n "persist-credentials" "$FILE" || true
echo
echo "Scanning publish-harnesses job block for uses: entries and persist-credentials..."
python3 - <<'PY'
import re, pathlib
path = pathlib.Path(".github/workflows/publish.yml")
lines = path.read_text().splitlines()
start = None
for i,l in enumerate(lines):
if re.match(r'^\s*publish-harnesses:\s*$', l):
start = i
break
if start is None:
print("publish-harnesses not found")
raise SystemExit(0)
indent = len(re.match(r'^(\s*)publish-harnesses:\s*$', lines[start]).group(1))
def job_key_line(s):
# next sibling at same indent level
return bool(re.match(rf'^\s{{0,{indent}}}[A-Za-z0-9_-]+:\s*$', s))
end = len(lines)
for j in range(start+1, len(lines)):
if job_key_line(lines[j]):
end = j
break
block = lines[start:end]
for offset, l in enumerate(block, start=start+1):
if re.search(r'\buses:\s*', l) or re.search(r'\bpersist-credentials:\s*', l):
print(f"{offset}:{l}")
PYRepository: AgentWorkforce/relay
Length of output: 4237
🌐 Web query:
actions/checkout@v4 persist-credentials default value
💡 Result:
The default value for the persist-credentials input in actions/checkout@v4 is true [1][2][3]. When set to true, the authentication token or SSH key used for the checkout is persisted in the local git configuration, which allows subsequent git commands in your workflow to remain authenticated [3][4]. If you do not want these credentials to be persisted, you can explicitly set persist-credentials to false [3][4].
Citations:
- 1: https://github.com/actions/checkout/blob/v4.1.1/action.yml
- 2: https://github.com/actions/checkout/blob/v4/action.yml
- 3: https://github.com/actions/checkout/blob/v4/README.md
- 4: https://github.com/actions/checkout/?tab=readme-ov-file
Harden publish-harnesses by pinning action SHAs and disabling checkout credential persistence.
In .github/workflows/publish.yml, the publish-harnesses job uses unpinned third-party actions (actions/checkout@v4, actions/setup-node@v4, actions/download-artifact@v4) and checkout does not set persist-credentials: false (default is true), which can persist credentials on the runner.
🔒 Suggested workflow hardening
- name: Checkout code
- uses: actions/checkout@v4
+ uses: actions/checkout@<full_commit_sha_for_v4>
+ with:
+ persist-credentials: false
- name: Setup Node.js
- uses: actions/setup-node@v4
+ uses: actions/setup-node@<full_commit_sha_for_v4>
with:
node-version: '22.14.0'
registry-url: 'https://registry.npmjs.org'
- name: Download build artifacts
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@<full_commit_sha_for_v4>
with:
name: build-output
path: .🧰 Tools
🪛 zizmor (1.25.2)
[warning] 980-981: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
[error] 981-981: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
[error] 984-984: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
[error] 990-990: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/publish.yml around lines 980 - 990, Pin the GH Actions
used in the `publish-harnesses` job and disable checkout credential persistence:
replace the floating uses of actions/checkout@v4, actions/setup-node@v4, and
actions/download-artifact@v4 with the corresponding commit SHAs (use the latest
verified commit SHAs for each action) and update the Checkout step (the step
named "Checkout code") to include persist-credentials: false so runner
credentials are not left on the workspace; ensure you update the steps that
reference those actions (Checkout code, Setup Node.js, Download build artifacts)
to use the SHA-pinned references.
| waitForExit(timeoutMs?: number): Promise<AgentExitInfo> { | ||
| const already = this.exit; | ||
| if (already) return Promise.resolve(already); | ||
|
|
||
| return new Promise<AgentExitInfo>((resolve) => { | ||
| let timer: ReturnType<typeof setTimeout> | undefined; | ||
| const settle = (info: AgentExitInfo) => { | ||
| if (timer) clearTimeout(timer); | ||
| unsub(); | ||
| resolve(info); | ||
| }; | ||
| const unsub = this.client.onEvent((event: BrokerEvent) => { | ||
| const exit = matchExit(event, this.name); | ||
| if (exit) settle(exit); | ||
| }); | ||
| if (timeoutMs !== undefined) { | ||
| timer = setTimeout(() => settle({ reason: 'timeout' }), timeoutMs); | ||
| } | ||
| }); | ||
| } | ||
|
|
||
| /** | ||
| * Resolve on the next idle signal for this agent (edge-triggered: a fresh | ||
| * signal after the call, matching how runners poll-then-nudge). Also resolves | ||
| * if the agent exits first, or with `{ reason: 'timeout' }` after `timeoutMs`. | ||
| */ | ||
| waitForIdle(timeoutMs?: number): Promise<AgentIdleInfo> { | ||
| const already = this.exit; | ||
| if (already) return Promise.resolve({ reason: 'exited', exit: already }); | ||
|
|
||
| return new Promise<AgentIdleInfo>((resolve) => { | ||
| let timer: ReturnType<typeof setTimeout> | undefined; | ||
| const settle = (info: AgentIdleInfo) => { | ||
| if (timer) clearTimeout(timer); | ||
| unsubIdle(); | ||
| unsubExit(); | ||
| resolve(info); | ||
| }; | ||
| const unsubIdle = this.client.addListener('agentIdle', (payload) => { | ||
| if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs }); | ||
| }); | ||
| const unsubExit = this.client.onEvent((event: BrokerEvent) => { | ||
| const exit = matchExit(event, this.name); | ||
| if (exit) settle({ reason: 'exited', exit }); | ||
| }); | ||
| if (timeoutMs !== undefined) { | ||
| timer = setTimeout(() => settle({ reason: 'timeout' }), timeoutMs); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Start the broker event stream before waiting.
waitForExit() and waitForIdle() subscribe to future lifecycle events, but neither method ensures the client's WS stream is connected. HarnessDriverClient leaves that as a separate step (connectEvents() at packages/harness-driver/src/client.ts, Lines 572-574), and subscribeWorkerStream() has to opt in explicitly before listening (Lines 843-845). On a client created via new HarnessDriverClient(...) or connect(), these waits will otherwise only ever resolve from already-cached history and can hang indefinitely for new exits/idles.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/harness-driver/src/agent-handle.ts` around lines 84 - 132, Both
waitForExit and waitForIdle currently subscribe to future events but don't
ensure the client's event stream or per-worker subscription are active; make
each method async, and before creating the new Promise call await
this.client.connectEvents() and await
this.client.subscribeWorkerStream(this.name) (or the equivalent
ensure-subscription API on HarnessDriverClient) so the WS/event stream and
worker-specific stream are started; update the signatures of waitForExit and
waitForIdle to async Promise<AgentExitInfo> / async Promise<AgentIdleInfo> so
callers still get a promise.
The client `eventBus` only carries call-site hook events; broker events
like `agentIdle` are never emitted onto it in direct-client usage, so
`addListener('agentIdle', ...)` never fired and `waitForIdle()` could
hang. Match the `agent_idle` BrokerEvent on the transport stream via
`onEvent` instead — consistent with how `waitForExit()` already works.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| const exit = this.client.getLastEvent('agent_exit', this.name); | ||
| if (exit && exit.kind === 'agent_exit') { | ||
| return { reason: 'exited' }; | ||
| } |
There was a problem hiding this comment.
Suggestion: agent_exit is an exit-request signal (not a confirmed process termination), but this code treats it as a completed exit. That can make waitForExit() and exit resolve too early, before the process has actually terminated and before a real exit code/signal is known. Only resolve exit completion from the confirmed agent_exited event. [api mismatch]
Severity Level: Critical 🚨
- ❌ Callers may treat agents as terminated while still running.
- ⚠️ Exit codes and signals may never be observed or recorded.
- ⚠️ Workflow runners may start dependent steps prematurely.Steps of Reproduction ✅
1. In a scenario where the broker protocol distinguishes between an `agent_exit` request
and a confirmed `agent_exited` event (definitions live in
`packages/harness-driver/src/protocol.ts` and are consumed via `BrokerEvent` imported at
line 21 in `agent-handle.ts`), configure or observe an agent where `agent_exit` is emitted
before the underlying process has actually terminated.
2. Spawn this agent via `HarnessDriverClient.spawnPty()` or related APIs (implemented in
`packages/harness-driver/src/client.ts`), which now return a `SpawnedAgentHandle` that
uses the `exit` getter at `packages/harness-driver/src/agent-handle.ts:59-68` to derive
its lifecycle state.
3. Call `await handle.waitForExit()` shortly after the exit is requested; inside
`waitForExit()` at lines 84-103, `const already = this.exit;` runs the `exit` getter,
which first checks `getLastEvent('agent_exited', this.name)` and then falls back to
`getLastEvent('agent_exit', this.name)`.
4. Because `agent_exit` has been recorded but `agent_exited` has not yet occurred,
`this.client.getLastEvent('agent_exit', this.name)` returns the pending exit-request
event, the `exit` getter returns `{ reason: 'exited' }`, and `waitForExit()` resolves as
if the process has fully exited even though the real termination (and exit code/signal)
has not yet been observed.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** packages/harness-driver/src/agent-handle.ts
**Line:** 64:67
**Comment:**
*Api Mismatch: `agent_exit` is an exit-request signal (not a confirmed process termination), but this code treats it as a completed exit. That can make `waitForExit()` and `exit` resolve too early, before the process has actually terminated and before a real exit code/signal is known. Only resolve exit completion from the confirmed `agent_exited` event.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| const unsubIdle = this.client.addListener('agentIdle', (payload) => { | ||
| if (payload.name === this.name) settle({ reason: 'idle', idleSecs: payload.idleSecs }); | ||
| }); |
There was a problem hiding this comment.
Suggestion: waitForIdle() subscribes to client.addListener('agentIdle', ...), but HarnessDriverClient does not emit broker WS events through eventBus (it only emits call-site lifecycle hooks). In direct-client usage this listener never fires, so idle waits can only end via timeout or exit. Subscribe to the broker stream (onEvent + agent_idle) instead. [api mismatch]
Severity Level: Critical 🚨
- ❌ Idle waits never resolve on genuine broker idle events.
- ⚠️ Runners relying on idle edge-trigger may hang until timeout.
- ⚠️ Workflow orchestration may mis-handle agent liveness and pacing.Steps of Reproduction ✅
1. In `packages/harness-driver/src/agent-handle.ts`, note that `waitForIdle()` at lines
110-132 subscribes to idle via `this.client.addListener('agentIdle', (payload) => { ...
})` while also subscribing to broker events via `this.client.onEvent((event: BrokerEvent)
=> { ... })`.
2. Inspect `HarnessDriverClient` implementation in `packages/harness-driver/src/client.ts`
(which defines `addListener` and `onEvent`): `addListener` is used for client-level
lifecycle hooks (e.g., spawn callbacks), whereas broker WebSocket traffic is surfaced
through `onEvent` with `BrokerEvent` kinds such as `agent_idle`, `agent_exited`, etc.
3. Observe that no code in `HarnessDriverClient` forwards broker `agent_idle` events into
the `'agentIdle'` event bus channel used by `addListener`, meaning `waitForIdle()`'s
`addListener('agentIdle', ...)` callback is never invoked in direct-client usage even when
broker `agent_idle` events are flowing.
4. Run a harness flow that spawns an agent, lets it go idle (broker emits `agent_idle`),
and then calls `await handle.waitForIdle()`; the promise will only resolve if the agent
exits (via `onEvent` subscription) or the timeout fires, but will never resolve for the
idle case because the `agentIdle` listener is never triggered.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** packages/harness-driver/src/agent-handle.ts
**Line:** 122:124
**Comment:**
*Api Mismatch: `waitForIdle()` subscribes to `client.addListener('agentIdle', ...)`, but `HarnessDriverClient` does not emit broker WS events through `eventBus` (it only emits call-site lifecycle hooks). In direct-client usage this listener never fires, so idle waits can only end via timeout or exit. Subscribe to the broker stream (`onEvent` + `agent_idle`) instead.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| const result = SpawnAgentResultSchema.parse(rawResult); | ||
| await this.emitAfterSpawn(beforeCtx, resolvedInput, t0, result, undefined); | ||
| return result; | ||
| return new SpawnedAgentHandle(result, this); |
There was a problem hiding this comment.
Suggestion: This now returns a class instance that carries an internal client reference, so callers that spread or serialize the spawn result can unintentionally expose deep internal state (including transport credentials) instead of just agent metadata. Keep internals non-enumerable/opaque (for example with true private fields and controlled serialization) before returning this object shape publicly. [security]
Severity Level: Critical 🚨
- ❌ JSON serializing SpawnedAgentHandle leaks BrokerTransport.apiKey credential.
- ⚠️ Logs may contain workspace_key and broker base URL.Steps of Reproduction ✅
1. `HarnessDriverClient.spawnPty()` in `packages/harness-driver/src/client.ts:46-63`
parses the broker response into `result` using `SpawnAgentResultSchema` and now returns
`new SpawnedAgentHandle(result, this)` (line 63), instead of returning the plain `result`
object.
2. `SpawnedAgentHandle` in `packages/harness-driver/src/agent-handle.ts:42-56` stores the
passed client reference as a `private readonly client: HarnessDriverClient` constructor
parameter (lines 48-51); in the emitted JavaScript this becomes a normal instance property
(not an ES `#private` field), so it is an own, enumerable property of the handle object at
runtime.
3. The `HarnessDriverClient` instance stored on the handle holds a `BrokerTransport` with
an `apiKey` field (`packages/harness-driver/src/transport.ts:132-151`) and a
`workspaceKey` field on the client itself
(`packages/harness-driver/src/client.ts:139-144`), both of which are normal object
properties; nothing marks them as non-enumerable or excludes them from serialization.
4. When a downstream consumer obtains a handle from `spawnPty()` (for example, by adapting
the existing CLI entry `spawnAgentWithClient` in
`packages/cli/src/cli/lib/client-factory.ts:10-14` to `const handle = await
client.spawnPty(options);`) and then logs or transmits it via `JSON.stringify(handle)` or
`{ ...handle }`, JSON serialization walks enumerable own properties and includes the
nested `client` object. That nested object in turn contains the `BrokerTransport.apiKey`
and any `workspaceKey`, so the serialized spawn result now leaks internal connection
credentials and driver state instead of just the intended metadata (`name`, `runtime`,
`sessionId`, `pid`).Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** packages/harness-driver/src/client.ts
**Line:** 642:642
**Comment:**
*Security: This now returns a class instance that carries an internal `client` reference, so callers that spread or serialize the spawn result can unintentionally expose deep internal state (including transport credentials) instead of just agent metadata. Keep internals non-enumerable/opaque (for example with true private fields and controlled serialization) before returning this object shape publicly.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
CodeAnt AI finished reviewing your PR. |
`transport.onEvent()` only registers a listener; events arrive only once the stream is connected. Call the idempotent `connectEvents()` at the start of `waitForExit`/`waitForIdle` so they receive events even if the consumer never connected the stream, instead of hanging until timeout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the review feedback on
Not addressing the harness-driver typechecks clean and all 26 tests pass. |
User description
What
HarnessDriverClient.spawnPty()/spawnCli()/spawnHeadless()(andspawnClaude/spawnOpencode) now return aSpawnedAgentHandleinstead of a bareSpawnAgentResult.The handle is a structural superset of
SpawnAgentResult(it still carriesname/runtime/sessionId/pid), so existing callers are unaffected. On top of that it adds the promise-based lifecycle operations consumers previously had to reconstruct by hand from the raw broker event stream:waitForExit(timeoutMs?)→{ reason: 'exited' | 'timeout', code?, signal? }waitForIdle(timeoutMs?)→{ reason: 'idle' | 'exited' | 'timeout', idleSecs?, exit? }exit/exitCode/exitSignal— synchronous view of a prior exitrelease(reason?)Why
Downstream consumers (e.g. the relayflows workflow runner being migrated off the v7 SDK) need an
await agent.waitForExit()ergonomic with the process exit code. The v7 SDKAgentprovided this; v8'sspawnPty()returned a method-lessSpawnAgentResult, forcing each consumer to re-implement the event→promise bridge — including the fiddly bits (listener-leak cleanup, exit-code capture from theagent_exitedBrokerEventvs the method-lessagentExitedpayload, and the "already exited before you awaited" replay case).The driver is the right home for this: it owns the ordered broker event stream and the event history (
getLastEvent/queryEvents), so it can implement these replay-correctly — awaiting after a prior exit resolves immediately from history instead of hanging.Notes
agent_exitedBrokerEvent(the namedagentExitedevent payload is a method-lessDriverAgentwithout them).waitForIdleis edge-triggered (resolves on the next idle signal), matching how runners poll-then-nudge.Verification
harness-drivertypechecks (tsc --noEmit) and builds.harness-drivertest suite passes (26 tests).@agent-relay/harnessesand@agent-relay/sdktypecheck clean (0 errors) against the widened return type.🤖 Generated with Claude Code
CodeAnt-AI Description
Add lifecycle-aware agent handles and publish harnesses safely
What Changed
@agent-relay/harnessesis now published to npm, so external users can install the prebuilt PTY harnesses and harness-authoring helpers.Impact
✅ Easier agent shutdown handling✅ Fewer hangs when checking exited agents✅ Installable harness package on npm💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.