
fix(agent-tools): forward proxied sub-agent tool events stuck at input-available (#1589) #1827

Merged
threepointone merged 3 commits into main from fix/1589-proxy-sub-agent-tool-events on Jun 28, 2026

Conversation

@threepointone

@threepointone threepointone commented Jun 28, 2026

Contributor

Summary

Fixes #1589. When an agent-tool child proxies a remote toUIMessageStreamResponse(), the sub-agent's tool invocations got stuck at input-available in the parent UI (useAgentToolEvents) — tool-output-available and other frames never arrived.

Root cause is a race in tailAgentToolRun (present in both AIChatAgent and Think):

  1. It drained the child's stored chunk backlog (getAgentToolChunks), then
  2. only afterwards attached its live forwarder,

with await boundaries in between. Any chunk the child stored AND broadcast during that drain↔register window was neither in the drained snapshot nor live-forwarded — it silently vanished from the parent's stream. A network-paced proxied remote stream lands chunks in this window constantly (a fast local child mostly skips through it), which is why this only reproduced for the remote-proxy setup in the issue.
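The lost-chunk window can be reproduced with a small synchronous model. This is a minimal sketch, not the code from the repository: `ChildRun`, `tailDrainThenAttach`, and the `duringWindow` callback (which stands in for the `await` boundary) are hypothetical names, and the chunk bodies are illustrative strings.

```typescript
type Chunk = { sequence: number; body: string };

// Hypothetical stand-in for the child run: every chunk is stored and also
// broadcast to whichever forwarders are attached at that moment.
class ChildRun {
  private stored: Chunk[] = [];
  private forwarders: Array<(chunk: Chunk) => void> = [];

  publish(body: string): void {
    const chunk: Chunk = { sequence: this.stored.length, body };
    this.stored.push(chunk);
    this.forwarders.forEach((forward) => forward(chunk));
  }

  snapshot(): Chunk[] {
    return this.stored.slice();
  }

  attach(forward: (chunk: Chunk) => void): void {
    this.forwarders.push(forward);
  }
}

// Drain first, attach second: the ordering described above.
function tailDrainThenAttach(child: ChildRun, duringWindow: () => void): string[] {
  const seen: string[] = [];
  child.snapshot().forEach((chunk) => seen.push(chunk.body));
  // Stands in for the await boundary between the drain and the registration.
  duringWindow();
  child.attach((chunk) => seen.push(chunk.body));
  return seen;
}

const child = new ChildRun();
child.publish("tool-input-available");
const seen = tailDrainThenAttach(child, () => child.publish("tool-output-available"));
child.publish("finish");

// "tool-output-available" was stored after the snapshot and broadcast before
// the forwarder existed, so the tail never sees it.
console.log(seen); // ["tool-input-available", "finish"]
```

In this model the parent receives the input chunk and the final chunk but never the output chunk, which matches the stuck `input-available` state reported in the issue.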

What changed

  • Register the live forwarder BEFORE draining the backlog. Live chunks that arrive during the drain are buffered and then flushed in order; the high-water sequence dedupes the replay→live handoff so each chunk is emitted exactly once. (packages/ai-chat/src/index.ts, packages/think/src/think.ts)
  • Think realigns its live sequence to the true high-water mark after the drain, gated on a terminal check, so a post-restart re-attach (cold in-memory counter) can't collide with already-emitted chunks or re-heat the broadcast idle-guard for a terminal run.
  • Hardened teardown: deleting the last forwarder/sequence map entry now removes the map key, so an empty set can't keep the broadcast idle-guard "hot" for the DO's lifetime; a drain/read failure detaches the forwarder before surfacing the error.
  • Documented the dual-counter design (packages/agents/src/chat/agent-tools.ts): the per-run live sequence counter is deliberately separate from the resumable store's chunk_index. reportProgress() progress/milestone frames are forwarded on the chat-response wire but are not durably stored, so they depend on the live counter. Sourcing forward sequencing from the stored count would collide these non-stored frames with the last stored chunk and the tail's dedupe would silently drop them. The comment exists to prevent a "simplification" that would reintroduce that regression.
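The register-before-drain ordering from the first bullet can be sketched as follows. This is an illustrative model under stated assumptions rather than the merged implementation: `ChildRun`, `tailAttachThenDrain`, and the two hook callbacks are hypothetical, and the hooks only simulate chunks arriving at specific points of the drain.

```typescript
type Chunk = { sequence: number; body: string };

// Hypothetical stand-in for the child run (stores and broadcasts each chunk).
class ChildRun {
  private stored: Chunk[] = [];
  private forwarders: Array<(chunk: Chunk) => void> = [];

  publish(body: string): void {
    const chunk: Chunk = { sequence: this.stored.length, body };
    this.stored.push(chunk);
    this.forwarders.forEach((forward) => forward(chunk));
  }

  snapshot(): Chunk[] {
    return this.stored.slice();
  }

  attach(forward: (chunk: Chunk) => void): void {
    this.forwarders.push(forward);
  }
}

function tailAttachThenDrain(
  child: ChildRun,
  hooks: { beforeSnapshot: () => void; afterSnapshot: () => void }
): string[] {
  const seen: string[] = [];
  const buffered: Chunk[] = [];
  let highWater = -1;
  let draining = true;

  // High-water dedupe: a sequence is emitted at most once.
  const emit = (chunk: Chunk): void => {
    if (chunk.sequence <= highWater) return;
    highWater = chunk.sequence;
    seen.push(chunk.body);
  };

  // 1. Register the live forwarder first; buffer while the drain is running.
  child.attach((chunk) => {
    if (draining) buffered.push(chunk);
    else emit(chunk);
  });

  // 2. Drain the stored backlog.
  hooks.beforeSnapshot(); // lands in both the buffer and the snapshot
  const backlog = child.snapshot();
  hooks.afterSnapshot(); // lands in the buffer only
  backlog.forEach(emit);

  // 3. Flush buffered live chunks in order; duplicates are dropped by emit.
  draining = false;
  buffered.forEach(emit);
  return seen;
}

const child = new ChildRun();
child.publish("a");
const seen = tailAttachThenDrain(child, {
  beforeSnapshot: () => child.publish("b"),
  afterSnapshot: () => child.publish("c"),
});
child.publish("d");

console.log(seen); // ["a", "b", "c", "d"]
```

Chunk `b` is present in both the snapshot and the buffer, and the high-water check is what keeps it from being emitted twice.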

Tests

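Added to both the @cloudflare/ai-chat and @cloudflare/think agent-tool suites: attach-window forwarding (#1589) and non-stored progress/milestone forwarding through the replay->live handoff.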

Test plan

  • ai-chat agent-tool suite (30 tests) passes
  • think agent-tool suite (33 tests) passes
  • Repo typecheck (113 projects) clean
  • oxlint + oxfmt clean on all touched files
  • CI green on PR

Made with Cursor



…t-available (#1589)

`tailAgentToolRun` (in both `AIChatAgent` and `Think`) drained the stored
chunk backlog and only afterwards attached its live forwarder, with `await`
boundaries in between. Any chunk the child stored AND broadcast during that
window was neither in the drained snapshot nor live-forwarded, so it silently
vanished from the parent's stream — leaving tool parts (notably
`tool-output-available`) stuck at `input-available` in `useAgentToolEvents`.
A network-paced proxied remote stream (a sub-agent returning a remote
`toUIMessageStreamResponse()`) hits this window constantly; a fast local child
mostly avoids it.

The forwarder is now registered BEFORE the backlog is drained, with live
chunks buffered and replayed in order and deduped by sequence. Think also
realigns its live sequence to the true high-water mark so a post-restart
re-attach can't collide. Hardened forwarder teardown so an empty forwarder/
sequence map entry can't keep the broadcast idle-guard hot for the DO's
lifetime, and a drain/read failure detaches before surfacing.

Also documents why the per-run live sequence counter is deliberately separate
from the resumable store's chunk_index: progress/milestone frames
(`reportProgress`) are forwarded but NOT durably stored, so they depend on the
live counter — sequencing forwards off the stored count would collide them and
the tail's dedupe would silently drop them.
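To illustrate the collision described above, here is a small model that compares the two ways of numbering forwarded frames. It is a sketch under assumptions: `ForwardedFrame`, `deliverFrames`, and the two mode names are hypothetical, and `storedCount` only stands in for the resumable store's `chunk_index`.

```typescript
type ForwardedFrame = { kind: "chunk" | "progress"; body: string };

function deliverFrames(
  input: ForwardedFrame[],
  mode: "live-counter" | "stored-count"
): string[] {
  const delivered: string[] = [];
  let storedCount = 0; // stands in for the store's chunk_index
  let liveCounter = 0; // per-run live sequence, advanced for every forwarded frame
  let highWater = -1; // the tail's dedupe mark

  for (const frame of input) {
    // Only real chunks are durably stored; progress frames are not.
    if (frame.kind === "chunk") storedCount += 1;

    const sequence = mode === "live-counter" ? liveCounter : storedCount - 1;
    liveCounter += 1;

    if (sequence <= highWater) continue; // dedupe drops anything not newer
    highWater = sequence;
    delivered.push(frame.body);
  }
  return delivered;
}

const runFrames: ForwardedFrame[] = [
  { kind: "chunk", body: "text-delta" },
  { kind: "progress", body: "progress: 50%" },
  { kind: "chunk", body: "tool-output-available" },
];

const withLiveCounter = deliverFrames(runFrames, "live-counter");
const withStoredCount = deliverFrames(runFrames, "stored-count");

console.log(withLiveCounter); // all three frames
console.log(withStoredCount); // the progress frame is missing
```

With the stored count as the source, the progress frame reuses the sequence of the last stored chunk and is discarded as a duplicate, which is the regression the comment is meant to guard against.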

Tests: attach-window forwarding (#1589) and non-stored progress/milestone
forwarding through the replay->live handoff, for both ai-chat and think.

Co-authored-by: Cursor <cursoragent@cursor.com>
@changeset-bot

changeset-bot Bot commented Jun 28, 2026


🦋 Changeset detected

Latest commit: 9e1a835

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
| Name | Type |
| --- | --- |
| @cloudflare/ai-chat | Patch |
| @cloudflare/think | Patch |


@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


Devin Review found 1 potential issue.


Comment thread packages/ai-chat/src/index.ts
@pkg-pr-new

pkg-pr-new Bot commented Jun 28, 2026


agents

npm i https://pkg.pr.new/agents@1827

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1827

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1827

create-think

npm i https://pkg.pr.new/create-think@1827

hono-agents

npm i https://pkg.pr.new/hono-agents@1827

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1827

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1827

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1827

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1827

commit: 9e1a835

threepointone and others added 2 commits June 28, 2026 08:23
…estart)

AIChatAgent's `tailAgentToolRun` drained the stored backlog but never
realigned the in-memory live sequence counter afterwards, unlike Think.
On a re-attach after the child's Durable Object restarts or wakes from
hibernation, `_agentToolLiveSequences` is cold (empty) while the durable
backlog sits at N, and chat-recovery resumes the turn via `tailAgentToolRun`
without re-running `startAgentToolRun` (which seeds the counter). The
broadcast snoop then handed the recovered turn's new chunks sequences from 0
— all <= N — and `emit`'s high-water dedupe silently dropped every one,
leaving the parent permanently stuck with no post-restart chunks.

Realign the counter to the stored high-water mark after the drain (matching
Think). On a warm attach the counter is already in lockstep, so this is a
no-op; it only bites the cold post-restart path. Adds a deterministic
regression test that seeds a running run with a backlog, wipes the live
counter, re-attaches, and asserts a subsequent broadcast forwards at N+1
instead of being dropped.
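The cold-counter path can be modelled in a few lines. This is a hedged sketch, not the actual agent code: `liveSequences` stands in for `_agentToolLiveSequences`, and `nextLiveSequence`, `reattach`, and the `realign` flag are hypothetical names used only to compare the two behaviours.

```typescript
// Stands in for the in-memory _agentToolLiveSequences map; empty after a restart.
const liveSequences = new Map<string, number>();

function nextLiveSequence(runId: string): number {
  const next = liveSequences.get(runId) ?? 0;
  liveSequences.set(runId, next + 1);
  return next;
}

function reattach(runId: string, storedBacklog: number[], realign: boolean): number[] {
  const delivered: number[] = [];
  let highWater = -1;
  const emit = (sequence: number): void => {
    if (sequence <= highWater) return; // high-water dedupe
    highWater = sequence;
    delivered.push(sequence);
  };

  // Replay the durable backlog (sequences 0..N).
  storedBacklog.forEach(emit);

  if (realign) {
    // Realign the live counter to the stored high-water mark.
    // On a warm attach the counter is already there, so this is a no-op.
    const current = liveSequences.get(runId) ?? 0;
    liveSequences.set(runId, Math.max(current, highWater + 1));
  }

  // A post-restart broadcast takes its sequence from the live counter.
  emit(nextLiveSequence(runId));
  return delivered;
}

liveSequences.clear(); // cold counter, as after a restart or hibernation
const withoutRealign = reattach("run-1", [0, 1, 2], false);

liveSequences.clear();
const withRealign = reattach("run-1", [0, 1, 2], true);

console.log(withoutRealign); // [0, 1, 2]    (the new chunk got sequence 0 and was dropped)
console.log(withRealign); // [0, 1, 2, 3] (the new chunk forwards at N+1)
```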

Caught by Devin review on #1827.

Co-authored-by: Cursor <cursoragent@cursor.com>
Three defensive gaps in AIChatAgent's `tailAgentToolRun` (all present in the
Think version) surfaced now that the forwarder is registered before the drain,
which widened the window in which a consumer can detach:

- Empty `cancel` handler left a zombie forwarder registered after a consumer
  cancelled its reader. `closed`/`forward`/cleanup are now hoisted out of
  `start` so `cancel` can reach them, and `cancel` marks the tail dead and
  detaches the forwarder.
- Unguarded `controller.close()` in `close()` (and the `emit` catch calling
  `close()`) could throw on an already-cancelled stream; that throw propagated
  out of `interceptAgentToolBroadcast`'s forward loop and starved the run's
  sibling tailers of the chunk. `controller.close()` is now guarded and the
  `emit` catch just detaches instead of closing.
- Unguarded `controller.error()` in the catch path could throw if the stream
  was already torn down; now wrapped.

Extracted a shared `detach()` (mirrors Think) so the forwarder-set cleanup is
no longer duplicated across close/error paths. Adds a regression test that
tails a run from two consumers, cancels one, and asserts the sibling still
receives a subsequent broadcast (verified it fails without the fix).
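The sibling-starvation failure and the guarded teardown can be sketched with a fake controller. Everything here is an assumption for illustration: `FakeController` only imitates a stream controller that throws once its consumer has cancelled, and `makeTail`, `broadcast`, and `siblingReceives` are hypothetical names, not the functions in the repository.

```typescript
type Forwarder = (body: string) => void;

// Hypothetical stand-in for a stream controller: it throws once the
// consumer has cancelled, as a torn-down stream controller would.
class FakeController {
  cancelled = false;
  received: string[] = [];

  enqueue(body: string): void {
    if (this.cancelled) throw new TypeError("stream already cancelled");
    this.received.push(body);
  }

  close(): void {
    if (this.cancelled) throw new TypeError("stream already cancelled");
  }
}

function makeTail(forwarders: Set<Forwarder>, controller: FakeController, guarded: boolean) {
  let closed = false;

  const forward: Forwarder = (body) => {
    if (closed) return;
    try {
      controller.enqueue(body);
    } catch {
      detach();
      if (guarded) return; // guarded: just detach
      controller.close(); // unguarded: throws again and escapes to the caller
    }
  };

  // Shared cleanup used by both the cancel and the error path.
  function detach(): void {
    closed = true;
    forwarders.delete(forward);
  }

  forwarders.add(forward);
  return {
    cancel(): void {
      controller.cancelled = true;
      if (guarded) detach(); // unguarded: empty handler leaves a zombie forwarder
    },
  };
}

function broadcast(forwarders: Set<Forwarder>, body: string): void {
  Array.from(forwarders).forEach((forward) => forward(body));
}

function siblingReceives(guarded: boolean): string[] {
  const forwarders = new Set<Forwarder>();
  const sibling = new FakeController();
  const firstTail = makeTail(forwarders, new FakeController(), guarded);
  makeTail(forwarders, sibling, guarded);

  firstTail.cancel();
  try {
    broadcast(forwarders, "tool-output-available");
  } catch {
    // Unguarded path: the throw from the cancelled tail aborts the loop.
  }
  return sibling.received;
}

console.log(siblingReceives(false)); // []
console.log(siblingReceives(true)); // ["tool-output-available"]
```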

Caught by Devin review on #1827.

Co-authored-by: Cursor <cursoragent@cursor.com>
@threepointone threepointone merged commit e5e6b57 into main Jun 28, 2026
7 checks passed
@threepointone threepointone deleted the fix/1589-proxy-sub-agent-tool-events branch June 28, 2026 17:34
@github-actions github-actions Bot mentioned this pull request Jun 28, 2026
@Suman085


Awesome, @threepointone, thanks!

I've filed one more issue: #1835.
If it looks right, I'm happy to create a PR.



Development

Successfully merging this pull request may close these issues.

Sub-agent tool invocation states stuck at input-available when proxying a remote toUIMessageStreamResponse()

2 participants