Skip to content

fix(think,ai-chat): wait for all parallel client-tool results before auto-continuing (#1649)#1651

Merged
threepointone merged 3 commits into
mainfrom
fix/parallel-client-tool-continuation-1649
Jun 2, 2026
Merged

fix(think,ai-chat): wait for all parallel client-tool results before auto-continuing (#1649)#1651
threepointone merged 3 commits into
mainfrom
fix/parallel-client-tool-continuation-1649

Conversation

@threepointone
Copy link
Copy Markdown
Contributor

@threepointone threepointone commented Jun 2, 2026

Fixes #1649.

Problem

When the model emits multiple tool calls in a single step and the client resolves them independently (each addToolOutput sends cf_agent_tool_result with autoContinue: true), a fast tool's result could trigger the next inference while a slower sibling was still input-available.

The server's auto-continuation fired on a short coalesce timer (think 50ms, ai-chat 10ms), not on "is the model's tool-call batch actually complete." So the first result's autoContinue would continue the turn with an incomplete tool-result set:

  • On a build without transcript repair → AI_MissingToolResultsError (the crash the reporter saw).
  • On current main (Think repairs + convertToModelMessages(..., { ignoreIncompleteToolCalls: true })) → arguably worse: the still-in-flight sibling is silently flipped to errored and a spurious extra continuation runs, then the real result lands and churns the transcript.

Repro shape (from the issue):

onToolCall: async ({ toolCall, addToolOutput }) => {
  if (toolCall.toolName === "fastTool") {
    addToolOutput({ toolCallId: toolCall.toolCallId, output: { ok: true } }); // sync
  }
  if (toolCall.toolName === "slowTool") {
    const result = await agent.call("someCallable", [toolCall.input]); // >50ms
    addToolOutput({ toolCallId: toolCall.toolCallId, output: result });
  }
}

Fix

Add a server-side barrier: auto-continuation now holds until the model's parallel tool batch is fully answered, then runs a single continuation — regardless of result arrival order. This uses the transcript state the framework already tracks, so it covers tool results and approvals and needs no counting.

Scoped to the leaf (the key design choice)

The barrier fires only when the latest assistant message is mid-batch — it carries both a settled tool result and a still-pending (input-available/approval-requested) sibling (_hasIncompleteToolBatch()). That is precisely the #1649 signature (the model fanned out parallel calls and only some are answered).

It is intentionally not "any pending interaction anywhere in the transcript." A broad barrier over-blocked a legitimate, pre-existing path — a follow-up continuation when a result lands in a separate assistant message before a stream starts (chat-turn-serialization test). Leaf-scoping fixes #1649 without disturbing that.

Mechanics

  • think (packages/think/src/think.ts): _scheduleAutoContinuation's timer now calls _fireAutoContinuationWhenStable, which gates _fireAutoContinuation behind _awaitToolBatch. The wait runs before the turn is enqueued, so it never occupies the turn queue, and is wrapped in keepAliveWhile so the (typically sub-second) remainder of the batch can land. It awaits the in-flight tool-result apply (_pendingInteractionPromise) so a sibling that lands mid-wait is observed promptly.
  • ai-chat (packages/ai-chat/src/index.ts): _awaitPendingInteractionBarrier runs inside the queued continuation (after _awaitPendingAutoContinuationPrerequisite), because that path runs in-queue and so can't reuse the queue-idle waitUntilStable (it would deadlock). Same _hasIncompleteToolBatch predicate.
  • Bounded by AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (60s): on timeout the continuation proceeds — Think's repair backstop errors the unanswered call, ai-chat surfaces the provider error — so a genuinely orphaned tool (client disconnected mid-batch) never pins the continuation open. A console.warn makes that fall-through visible.

Hardening found during review

  • Double-fire race: during a barrier wait, each sibling result re-arms the 50ms coalesce timer; when the barrier fired the continuation, that leftover timer could enqueue a second continuation in the window before pastCoalesce was set. _fireAutoContinuation now cancels any still-armed timer when it fires, and _fireAutoContinuationWhenStable bails if the continuation is already pastCoalesce. (ai-chat is structurally immune — a sibling during a pending continuation hits the merge branch, not a re-queue.)
  • Stale barrier on clear: resetTurnState now clears the barrier flag, and both wait loops bail when _continuation.pending is gone, so a chat clear / turn reset can't strand the wait holding the isolate.

Why server-side (not the client-side option from the issue)

The issue floated "client only sends autoContinue on the last result." That's fragile (tools resolve independently in onToolCall; the client isn't the transcript's source of truth) and wouldn't protect custom / non-useAgentChat clients. The server already owns the transcript and the completeness invariant, so the barrier belongs there.

Tests

  • New parallel fast/slow coverage in both packages: assert exactly one continuation fires after both results land, and the slow tool is never flipped to errored by a premature repair.
  • Updated the think deferred continuation test and the ai-chat continuation-stream-error test — both previously encoded the old partial-continuation behavior (continue after the first of two parallel tools).
  • Full suites green: think 483, ai-chat 492. Typecheck, oxlint, oxfmt clean.

Follow-up

#1650 tracks making the Think barrier event-driven (return-and-wait-for-the-next-result-event instead of a bounded timer) to avoid the 60s fire-through for human-in-the-loop tools (e.g. ask_user/display_ui emitted in parallel) and the orphan keepAlive hold. Kept symmetric (bounded-wait) with ai-chat here on purpose, since #1642 (N3) wants to unify the duplicated think↔ai-chat layer.

Test plan

Made with Cursor


Open in Devin Review

…auto-continuing (#1649)

When the model emitted multiple tool calls in one step and the client
resolved them independently via `addToolOutput`, a fast result's
`autoContinue` could trigger the next inference while a slower sibling was
still `input-available`. That fed the provider an incomplete tool-result
set (`MissingToolResultsError`) or — via the transcript-repair backstop —
silently flipped the in-flight sibling to errored and ran a spurious extra
continuation.

Auto-continuation now holds on a barrier until the model's parallel tool
batch is fully answered. The barrier is scoped to the *leaf* assistant
message — it fires only when that step is mid-batch (carries both a settled
tool result and a still-pending `input-available`/`approval-requested`
sibling), so an unrelated dangling tool in an earlier message can't block a
legitimate follow-up continuation. The wait awaits the in-flight tool-result
apply (so a sibling landing mid-wait is observed promptly) and is bounded by
`AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS` (60s) so a genuinely orphaned
tool call (client disconnected mid-batch) still falls through to the existing
backstop instead of pinning the continuation open.

- think: gate `_fireAutoContinuation` behind `_fireAutoContinuationWhenStable`
  (runs before enqueue, so it never occupies the turn queue) + `_awaitToolBatch`
  + `_hasIncompleteToolBatch`. Hardened against a double-fire race where a
  sibling result re-arms the coalesce timer during the wait: `_fireAutoContinuation`
  cancels any still-armed timer, and `_fireAutoContinuationWhenStable` bails if
  the continuation is already `pastCoalesce`. `resetTurnState` clears the barrier
  flag so a chat clear can't strand it.
- ai-chat: gate inside the queued continuation via `_awaitPendingInteractionBarrier`
  + `_hasIncompleteToolBatch` (it runs in-queue, so it can't reuse the
  queue-idle `waitUntilStable`).
- Both: a `console.warn` on timeout fall-through so a continuation proceeding
  with an unanswered tool isn't mysterious.

Tests: new parallel fast/slow coverage in both packages (exactly one
continuation after both results land, no premature repair of the slow tool);
updated the think "deferred continuation" and ai-chat stream-error tests,
which encoded the old partial-continuation behavior.

Follow-up tracked in #1650 (event-driven barrier to avoid the bounded
fire-through for human-in-the-loop tools + orphan keepAlive).

Co-authored-by: Cursor <cursoragent@cursor.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 2, 2026

🦋 Changeset detected

Latest commit: f7bb27c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
@cloudflare/think Patch
@cloudflare/ai-chat Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@threepointone
Copy link
Copy Markdown
Contributor Author

/bonk do a deep review here, look for edge cases and things we might've missed

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Jun 2, 2026

Open in StackBlitz

agents

npm i https://pkg.pr.new/agents@1651

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1651

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1651

hono-agents

npm i https://pkg.pr.new/hono-agents@1651

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1651

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1651

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1651

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1651

commit: f7bb27c

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review — Parallel Tool-Call Barrier (#1649 fix)

Went through the full diff, the surrounding implementation (tool-result application, _pendingInteractionPromise lifecycle, continuation state machine, ContinuationState, keepAliveWhile, turn queue), and analyzed edge cases. Overall this is a well-designed fix — the leaf-scoped barrier predicate is the right call, the bounded wait with timeout fallback is solid, and the double-fire hardening in Think is necessary and correct. Tests are thorough.

A few things I want to flag:


1. _pendingInteractionPromise is a single slot — N≥3 tool calls with near-simultaneous arrivals

Think path (_awaitToolBatch): When the barrier is in its 100ms poll sleep and two tool results arrive ~simultaneously (say medium and slow for a 3-tool batch), the second overwrites _pendingInteractionPromise before the first resolves. The barrier loop only awaits whichever value is current when it wakes up.

This is actually fine in practice because:

  • _applyToolResult is microtask-deferred (Promise.resolve().then(...)) so both results' apply promises are created synchronously during the same event-loop turn
  • The second one overwrites the first, but the first still runs and patches the cache
  • After awaiting the second, _hasIncompleteToolBatch() sees both updates since both patched _cachedMessages
  • The only risk is if the first apply throws and the barrier never observes it — but the error is irrelevant since we continue to re-check _hasIncompleteToolBatch() anyway

However, this is worth noting: the barrier's correctness relies on _hasIncompleteToolBatch() being the true gate, not on awaiting every individual _pendingInteractionPromise. The single-slot pattern is a "wake up sooner" optimization, not a correctness invariant. If that's the intent, a brief comment on the single-slot nature would help future readers understand why this isn't a bug.


2. _hasIncompleteToolBatch[...this.messages].reverse() allocation per poll tick

Both Think and ai-chat do:

const leaf = [...this.messages].reverse().find(m => m.role === "assistant");

This copies + reverses the entire message array on every 100ms poll tick and on every _pendingInteractionPromise resolution during the barrier wait. For long conversations this is wasteful. A backward for loop would be zero-allocation:

const msgs = this.messages;
let leaf: typeof msgs[number] | undefined;
for (let i = msgs.length - 1; i >= 0; i--) {
  if (msgs[i].role === "assistant") { leaf = msgs[i]; break; }
}

Not a correctness issue, but given this runs in a tight loop during a potentially latent barrier, it's worth optimizing — especially since this.messages in Think returns the raw _cachedMessages reference so the array can be large.


3. ai-chat: no _continuationBarrierActive guard — is double-entry possible?

Think has an explicit _continuationBarrierActive flag to prevent multiple concurrent barrier waits. ai-chat does not — its barrier runs inside the turn queue, so the turn queue serialization prevents concurrent entry.

But: the ai-chat barrier is called from _awaitPendingAutoContinuationPrerequisite, which is called inside _queueAutoContinuation's turn. If a second tool result arrives while the first continuation is waiting on the barrier (still in the turn queue), it hits the merge branch in _enqueueAutoContinuation (updating pending.prerequisite), not enqueueing a new turn. So the turn queue + merge semantics structurally prevent double-entry.

This is correct but subtle. The PR description mentions "ai-chat is structurally immune" — accurate, but a brief comment in _awaitPendingInteractionBarrier noting why a guard isn't needed (turn queue serialization) would make the asymmetry between Think and ai-chat more obvious.


4. Timeout + keepAliveWhile holds the isolate for up to 60 seconds on orphan

When a client disconnects mid-batch and a tool result never arrives, the barrier holds for AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (60s) with keepAliveWhile active. This pins the DO alive on the alarm cycle for a full minute before falling through to the repair backstop.

The PR description acknowledges this and #1650 tracks making it event-driven. Just confirming: is 60s a reasonable worst-case hold? For human-in-the-loop tools (mentioned in the follow-up), 60s is quite short — but those tools presumably don't use autoContinue: true from the client, so they shouldn't trigger the barrier at all. Worth verifying that assumption holds.


5. Edge case: all tool calls in the batch fail/error

If the model emits 2 tool calls and the client sends both back as state: "output-error", _hasIncompleteToolBatch correctly treats output-error as settled (it's in the settled set). Good — no issue here.

6. Edge case: tool result arrives after a resetTurnState / chat clear

Think: resetTurnState sets _continuationBarrierActive = false and _continuation.pending = null. The _awaitToolBatch loop checks !this._continuation.pending and bails. The .finally() in _fireAutoContinuationWhenStable checks pending before firing. All correct.

ai-chat: resetTurnState clears _continuation.pending. The barrier checks !this._continuation.pending and returns. The continuation then checks for connection and bails at the if (!connection) guard. Correct.

7. The updated test in client-tools-continuation.test.ts — both results sent ~synchronously

The updated stream-error test now sends both tool results back-to-back before waiting for the continuation. This is the correct behavior under the new barrier, but it also means the test no longer exercises the "second result arrives during an active continuation" path. The old test had a specific ordering (first result → continuation starts → second result arrives during stream). The new ordering (both results → single continuation) is what the fix mandates, but the "result during active stream" path is still a valid scenario for separate assistant messages. Worth confirming that path has test coverage elsewhere (the chat-turn-serialization test mentioned in the PR description).


Summary

The fix is correct. The leaf-scoped _hasIncompleteToolBatch() predicate is the right boundary — it precisely targets the #1649 signature without over-blocking. The double-fire hardening in Think (clearing the timer in _fireAutoContinuation, the pastCoalesce + _continuationBarrierActive guards) is necessary and well-implemented. The ai-chat path is structurally simpler because it runs in-queue.

The two minor suggestions (backward loop instead of spread+reverse, comments on single-slot _pendingInteractionPromise and ai-chat's implicit guard) are polish, not blockers.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment on lines +8797 to +8805
} else if (
typeof record.type === "string" &&
record.type.startsWith("tool-") &&
(state === "output-available" ||
state === "output-error" ||
state === "output-denied" ||
state === "approval-responded")
) {
hasSettled = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 _hasIncompleteToolBatch omits dynamic-tool from settled check, breaking barrier for dynamic tool batches

In Think's _hasIncompleteToolBatch, the settled-state check only matches record.type.startsWith("tool-"), but "dynamic-tool".startsWith("tool-") is false. Every other tool-part identification site in Think consistently checks record.type.startsWith("tool-") || record.type === "dynamic-tool" — see _incompleteToolCallIds at packages/think/src/think.ts:1663, _repairInterruptedMessages at packages/think/src/think.ts:1724, and _partialHasSettledToolResults at packages/think/src/think.ts:8030. Meanwhile, the pending side of _hasIncompleteToolBatch has no type filter at all, so a dynamic-tool in input-available IS counted as pending. This asymmetry means if a parallel batch contains a settled dynamic-tool result alongside a still-pending tool-* call, hasSettled stays false and the method returns false — the barrier doesn't hold and the continuation fires prematurely, which is exactly the #1649 scenario this PR is supposed to fix.

Suggested change
} else if (
typeof record.type === "string" &&
record.type.startsWith("tool-") &&
(state === "output-available" ||
state === "output-error" ||
state === "output-denied" ||
state === "approval-responded")
) {
hasSettled = true;
} else if (
typeof record.type === "string" &&
(record.type.startsWith("tool-") || record.type === "dynamic-tool") &&
(state === "output-available" ||
state === "output-error" ||
state === "output-denied" ||
state === "approval-responded")
) {
hasSettled = true;
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +2184 to +2192
} else if (
typeof record.type === "string" &&
record.type.startsWith("tool-") &&
(state === "output-available" ||
state === "output-error" ||
state === "output-denied" ||
state === "approval-responded")
) {
hasSettled = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 _hasIncompleteToolBatch in AIChatAgent also omits dynamic-tool from settled check

Same issue as in Think. AIChatAgent's _hasIncompleteToolBatch checks record.type.startsWith("tool-") for settled parts but "dynamic-tool" doesn't match this prefix. AIChatAgent recognizes dynamic-tool as a tool type elsewhere (see packages/ai-chat/src/index.ts:3337: type.startsWith("tool-") || type === "dynamic-tool"). If a dynamic-tool settles and triggers autoContinue while a sibling tool-* call is still input-available, the barrier won't detect the batch as incomplete and the continuation fires prematurely.

Suggested change
} else if (
typeof record.type === "string" &&
record.type.startsWith("tool-") &&
(state === "output-available" ||
state === "output-error" ||
state === "output-denied" ||
state === "approval-responded")
) {
hasSettled = true;
} else if (
typeof record.type === "string" &&
(record.type.startsWith("tool-") || record.type === "dynamic-tool") &&
(state === "output-available" ||
state === "output-error" ||
state === "output-denied" ||
state === "approval-responded")
) {
hasSettled = true;
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread packages/think/src/think.ts Outdated
*/
private _hasIncompleteToolBatch(): boolean {
const leaf = [...this.messages]
.reverse()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: [...this.messages].reverse().find(...) allocates a full copy + reverse on every 100ms poll tick during the barrier wait. For long conversations this is wasteful. A backward for-loop would be zero-allocation:

const msgs = this.messages;
let leaf: UIMessage | undefined;
for (let i = msgs.length - 1; i >= 0; i--) {
  if (msgs[i].role === "assistant") { leaf = msgs[i]; break; }
}

Not blocking, but this is a hot path during the barrier poll.

Comment thread packages/ai-chat/src/index.ts Outdated
*/
private _hasIncompleteToolBatch(): boolean {
const leaf = [...this.messages]
.reverse()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same spread+reverse nit as in Think — a backward for-loop avoids the allocation.

* continuation turn, so — unlike `waitUntilStable` — it must not wait on the
* turn queue (that would deadlock).
*/
private async _awaitPendingInteractionBarrier(): Promise<void> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a brief comment that this barrier does NOT need a _continuationBarrierActive guard (unlike Think's) because it runs inside the turn queue — the queue serialization structurally prevents concurrent entry. The asymmetry is correct but easy to miss when reading the two implementations side by side.

// The pending continuation was cleared (chat clear / turn reset) — nothing
// to wait for; bail so the isolate isn't held by a stale wait.
if (!this._continuation.pending) return;
const pending = this._pendingInteractionPromise;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_pendingInteractionPromise is a single slot — when N≥3 tool results arrive in quick succession, later arrivals overwrite earlier ones before the barrier observes them. This is correct because the true gate is _hasIncompleteToolBatch() (which reads the patched cache), and the single-slot await is a "wake up sooner" optimization, not a correctness invariant. A brief comment noting that would help future readers understand why the overwrite is not a bug.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Jun 2, 2026

Done. I posted a comprehensive review comment and 4 inline comments on PR #1651. Here's a summary of what I found:

Overall: The fix is correct and well-designed. The leaf-scoped _hasIncompleteToolBatch() predicate precisely targets the #1649 race without over-blocking legitimate continuation paths. The double-fire hardening in Think is necessary and correctly implemented.

Key findings:

  1. _pendingInteractionPromise single-slot with N≥3 tools — Later arrivals overwrite earlier ones, but this is fine because the true correctness gate is _hasIncompleteToolBatch() reading the patched message cache, not awaiting every promise. Suggested a clarifying comment.

  2. [...this.messages].reverse().find(...) allocation — Copies + reverses the full message array on every 100ms poll tick. A backward for-loop would be zero-allocation. Minor but easy fix since this is a hot path.

  3. Think vs ai-chat barrier guard asymmetry — Think needs _continuationBarrierActive because its barrier runs outside the turn queue; ai-chat doesn't because the turn queue serialization prevents concurrent entry. Correct but worth documenting the "why".

  4. 60s keepAliveWhile on orphaned tools — Acknowledged as a known trade-off; Auto-continuation parallel-tool barrier: make Think event-driven (avoid timeout fire-through for human-in-the-loop tools + orphan keepAlive) — follow-up to #1649 #1650 tracks the event-driven improvement. Confirmed human-in-the-loop tools shouldn't trigger this since they don't use autoContinue: true.

  5. Edge cases verified as handled: all-errored batches, resetTurnState during barrier wait, deferred continuation after barrier-gated continuation.

github run

threepointone and others added 2 commits June 2, 2026 09:40
- `_hasIncompleteToolBatch`: replace `[...this.messages].reverse().find(...)`
  with a zero-allocation backward scan — it runs on every barrier poll tick and
  `this.messages` can be large.
- Document that `_pendingInteractionPromise` is a single slot: awaiting it is a
  "wake sooner" optimization, not the correctness gate (`_hasIncompleteToolBatch`
  re-checked each loop is). Clarifies the N>=3 near-simultaneous-arrival case.
- ai-chat: note why no `_continuationBarrierActive`-style guard is needed (the
  barrier runs inside the exclusive continuation turn; a sibling result hits the
  merge branch of `_enqueueAutoContinuation`, so the turn queue serializes it).

No behavior change.

Co-authored-by: Cursor <cursoragent@cursor.com>
…barrier (#1649)

`_hasIncompleteToolBatch` only matched `type.startsWith("tool-")` on the
settled side, but the pending side has no type filter — so a `dynamic-tool`
in `input-available` counted as pending while a settled `dynamic-tool`
(`output-available`/etc.) did NOT count as settled. A parallel batch with a
settled `dynamic-tool` beside a pending `tool-*` would therefore read as
not-mid-batch and the continuation would fire prematurely — the exact #1649
failure, for dynamic-tool batches.

Match `dynamic-tool` alongside `tool-*` on the settled side, consistent with
every other tool-part check in both packages (`_incompleteToolCallIds`,
`_repairInterruptedMessages`, `_partialHasSettledToolResults`, ai-chat's
sanitizer). Regression test added in each package: a settled `dynamic-tool`
beside a pending `tool-*` holds the barrier until the regular tool resolves.

Co-authored-by: Cursor <cursoragent@cursor.com>
@threepointone
Copy link
Copy Markdown
Contributor Author

/bonk please review again

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — Parallel Tool-Call Barrier (#1649 fix)

Went through all three commits on the branch (e65bdf0 initial fix, bddfa14 review polish, f7bb27c dynamic-tool fix), the full diff against main, and verified the issues from the first review round.

Previous review items — status

Item Status
dynamic-tool omitted from settled check (Devin flagged) Fixed in f7bb27c — both packages now have record.type.startsWith("tool-") || record.type === "dynamic-tool" on the settled side, with regression tests
[...this.messages].reverse().find(...) allocation (bonk flagged) Fixed in bddfa14 — replaced with zero-allocation backward for loop in both packages
_pendingInteractionPromise single-slot comment Fixed — clear comment added in both _awaitToolBatch (Think) and _awaitPendingInteractionBarrier (ai-chat) explaining the single-slot is a wake-up optimization, not the correctness gate
ai-chat missing _continuationBarrierActive guard comment Fixed — JSDoc on _awaitPendingInteractionBarrier explains why no guard is needed (turn queue serialization)

All four items are resolved.

Architecture verification

Confirmed the following structural properties:

  1. No barrier bypass paths. Every entry into _fireAutoContinuation in Think goes through _fireAutoContinuationWhenStable — the timer callback (line ~8698) and _activateDeferredContinuation (line ~8897) both route through the stable wrapper. No direct calls exist outside the wrapper itself.

  2. Double-fire prevention in Think is correct. _fireAutoContinuationWhenStable has three guards: (a) !pending early return, (b) pastCoalesce check (continuation already running), (c) _continuationBarrierActive check (barrier already waiting). _fireAutoContinuation clears the timer synchronously before any async work. resetTurnState clears _continuationBarrierActive.

  3. ai-chat barrier placement is correct. _awaitPendingInteractionBarrier() is called after the prerequisite loop exits in _awaitPendingAutoContinuationPrerequisite, not inside it. The break change (line ~2101, ~2114) replaces return true to allow fall-through to the barrier. The return true after the barrier (line ~2127) preserves the method contract.

  4. resetTurnState teardown is complete in both packages. Think clears _continuationBarrierActive, _continuationTimer, _pendingInteractionPromise, and calls _continuation.clearAll(). ai-chat clears _pendingInteractionPromise, resets the turn queue, and calls _continuation.clearAll(). Both barrier loops check !this._continuation.pending at the top of each iteration, so a chat clear during a barrier wait causes immediate bail-out.

Test coverage

The new tests are well-structured:

  • Think: 3 new test cases — fast/slow parallel batch, dynamic-tool mixed batch, and deferred-continuation coalescing (updated from the old "deferred tool result runs after active continuation" test that encoded the pre-barrier behavior).
  • ai-chat: 2 new test cases — fast/slow parallel batch, dynamic-tool mixed batch. Plus the stream-error continuation test updated to send both results before expecting the continuation.

All tests verify: (a) zero continuations after the first result + delay, (b) exactly one continuation after the batch completes, (c) no spurious repair of the slow sibling.

Test coverage gaps (non-blocking)

Three scenarios lack test coverage. None are blocking for this PR but are worth noting for follow-up:

  1. 60s timeout path. No test exercises AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS firing when a tool result genuinely never arrives. The console.warn + fall-through-to-repair path is untested. Testing with a reduced timeout constant would be straightforward and would catch regressions where the timeout breaks (e.g., the barrier hangs forever on orphaned tools).

  2. resetTurnState / chat clear during active barrier wait. The barrier loop's !this._continuation.pending bail-out is the safety valve for this, but no test sends a chat clear while the barrier is actively polling between two tool results. A regression here would cause the barrier to hold the isolate after a clear.

  3. N >= 3 parallel tool calls. All barrier tests use exactly 2 tools. The barrier logic is a simple "any pending AND any settled = hold" scan so 2 tools is likely sufficient, but a 3-tool test would catch potential off-by-one issues with the settled/pending detection (e.g., "all tools must be settled" vs "at least one settled" confusion).

Verdict

The fix is correct and complete. The leaf-scoped _hasIncompleteToolBatch() predicate precisely targets the #1649 race without over-blocking. The three-layer guard in Think (pastCoalesce / barrierActive / timer clearing) prevents double-fires. The ai-chat design correctly relies on turn-queue serialization instead. The dynamic-tool fix and allocation optimization from review round 1 are both landed.

The test gaps above are minor — consider them for follow-up alongside #1650 (event-driven barrier).

Looks good.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Jun 2, 2026

Review posted on PR #1651. Here's the summary:

All four items from the first review round are resolved — the dynamic-tool fix, the allocation optimization, and the two documentation comments are all landed across the three commits.

The fix is correct. The leaf-scoped _hasIncompleteToolBatch() barrier precisely targets the #1649 race (fast tool result triggering continuation while slow sibling is still in-flight) without over-blocking legitimate continuation paths. The double-fire prevention in Think (three-layer guard + timer clearing) and ai-chat's reliance on turn-queue serialization are both sound.

Three minor test gaps noted (non-blocking): no coverage for the 60s timeout path, no resetTurnState during active barrier wait test, and no N>=3 parallel tool calls test. These are follow-up material alongside #1650.

github run

@threepointone threepointone merged commit d118d11 into main Jun 2, 2026
4 checks passed
@threepointone threepointone deleted the fix/parallel-client-tool-continuation-1649 branch June 2, 2026 08:55
threepointone added a commit that referenced this pull request Jun 2, 2026
…1649 follow-up) (#1657)

* fix(think): serialize parallel client-tool result/approval applies (#1649 follow-up)

## Problem

The auto-continuation barrier added in #1651 stopped *premature*
continuation, but a deeper data race remained in Think.

Each `cf_agent_tool_result` / `cf_agent_tool_approval` WebSocket message
triggers a read-modify-write of the whole assistant message via
`_applyToolUpdateToMessages`, which awaits a storage read
(`await this.session.getHistory()`) before writing. When the model fans
out parallel tool calls in a single assistant step, the per-result
applies run concurrently:

  1. apply A and apply B both `await` the storage read and resolve with
     the same snapshot — every sibling still `input-available`.
  2. A patches only its own part and writes the full message back.
  3. B (holding the stale snapshot) patches only *its* part and writes
     the full message back, clobbering A's result to `input-available`.

The clobber lands in both durable storage and the `this.messages` cache,
so `_hasIncompleteToolBatch()` keeps seeing an unanswered sibling. The
continuation barrier then times out and the transcript-repair backstop
errors the lost call with "The tool call was interrupted before a result
was recorded."

## Fix

Introduce a per-instance serialization tail (`_interactionApplyTail`)
and route every result/approval apply through `_enqueueInteractionApply`,
which chains each apply off the previous one. This makes each
read-modify-write atomic and ordered, so siblings can no longer clobber
each other.

- `.then(run, run)` is used so a rejected apply can't poison the rest of
  the batch.
- `_pendingInteractionPromise` still tracks the newest link; because the
  chain is serial, the barrier's single-slot wake-up transitively waits
  for every predecessor.
- `resetTurnState` drops the tail so a new turn never queues behind a
  stale (possibly hung) apply from a reset turn.

## ai-chat (defensive)

The same serialization is added to `@cloudflare/ai-chat` for symmetry.
ai-chat does **not** exhibit this bug today: `_findAndUpdateToolPart`
performs the read-modify-write synchronously (no await between reading
`this.messages` and the synchronous SQLite write), so concurrent applies
cannot interleave. The queue is a forward-guard that keeps the invariant
safe if that apply ever gains an await (e.g. async storage).

## Tests

- Added a deterministic serialization test in both packages
  (`testInteractionApplySerialization`) that drives two overlapping
  read-modify-writes through the queue. Without serialization the result
  is 1 (clobbered); serialized it is 2. Verified the Think test FAILS
  without the fix (`expected 1 to be 2`).
- Added an integration test in Think asserting three concurrently-sent
  parallel results all settle to `output-available`, none is repaired,
  and exactly one continuation runs.
- `_enqueueInteractionApply` is `protected` + `@internal` so the test
  agents can drive it deterministically (the vitest-workers harness
  can't reproduce the real storage interleave because DO SQLite is
  synchronous).

All checks pass: format, oxlint, typecheck (91/91), think (486) and
ai-chat (566) suites.

* fix(think): apply client-tool results/approvals that arrive mid-stream (#1649 follow-up)

## Problem

The serialization fix (prior commit) stopped parallel client-tool results
from clobbering each other, but the reporter still hit "The tool call was
interrupted before a result was recorded." on every client-resolved tool
call — even single (non-parallel) ones.

Root cause is an ordering inversion between streaming and persistence:

  - During a streaming turn the assistant message lives ONLY in the
    in-flight `StreamAccumulator`. It is not written to durable storage
    (or added to `this.messages`) until `_persistAssistantMessage` runs at
    the end of the stream.
  - The `tool-input-available` chunk, however, is broadcast to the client
    mid-stream. A fast client resolves the tool and sends
    `cf_agent_tool_result` BEFORE the message is ever persisted.
  - `_applyToolUpdateToMessages` only scanned durable storage, so the
    apply found no matching message and silently no-op'd.
  - The end-of-stream persist then wrote the tool part as
    `input-available`, and the auto-continuation's transcript-repair
    backstop errored it as "interrupted".

This is why "parallel sometimes works": a slow sibling (e.g. an RPC tool)
resolves AFTER the stream has finished and persisted, so its apply finds
the message in storage.

## Fix

`_applyToolUpdateToMessages` now applies the update to BOTH:

  1. the in-flight `StreamAccumulator` (in place), so the result rides into
     the eventual end-of-stream persist instead of being dropped; and
  2. durable storage (unchanged), which still handles already-persisted
     messages — including partials written mid-stream by stall recovery and
     cross-message tool results that target an earlier message.

Applying to both is deliberately monotonic: a later persist of
`accumulator.toMessage()` (e.g. stall-recovery partial persist) carries the
settled result and cannot downgrade it back to `input-available`.

The accumulator is exposed via a new `_streamingAssistant` field for the
duration of each streaming turn and cleared on every exit path
(`_streamResult` normal + stall-scheduled/exhausted early returns,
`_streamResultToRpcCallback` finally) and in `resetTurnState`. The field is
intentionally ephemeral and never persisted; it is only ever set while the
isolate is held alive by `keepAliveWhile`, so hibernation cannot strand it.

This mirrors `@cloudflare/ai-chat`'s existing `_streamingMessage` handling,
generalized to also cover the post-persist path. It fixes both client tool
RESULTS and APPROVALS, since both flow through `_applyToolUpdateToMessages`.

## Why correctness holds for the auto-continuation barrier

Mid-stream, the in-flight message is not yet in `this.messages`, so
`_hasIncompleteToolBatch()` inspects a stale (prior) leaf. This does NOT
break correctness: the continuation is enqueued behind the active streaming
turn on the turn queue, and that turn persists the now-settled result before
the continuation's transcript repair runs. The ordering guarantee is the
turn queue, not the barrier's cache view. Documented inline.

## Tests

- Deterministic reproduction (`simulateMidStreamClientToolResult`): a tool
  call exists only in the accumulator when the result is applied, then the
  message is persisted. Asserts the part ends `output-available`. Fails on
  the pre-fix code (persists `input-available`).
- Deterministic approval variant (`simulateMidStreamClientToolApproval`):
  an `approval-requested` part reaches `approval-responded` via the same
  path.
- Full-pipeline integration test over the real WS protocol: a new slow
  client-tool mock model finalizes the tool call (emits `tool-call`) then
  holds the stream open with trailing text. The test waits via
  `streamingToolCallState` until the call is exposed, sends
  `cf_agent_tool_result` while the stream is still open, then asserts the
  part is `output-available`, carries the output, and that NO
  `chat:transcript:repaired` fired for it.

E2E (wrangler-spawning) was assessed and deferred: the integration test
already exercises the full server pipeline in the real Workers runtime and
times the mid-stream send deterministically; the e2e harness here is built
for process-kill/eviction scenarios where a timing-based test would be
flaky and largely redundant.

## Verification

- format, oxlint (0/0), typecheck (91/91 projects)
- full think suite: 488 passed (pre-additions); client-tools: 61 passed
  including the 3 new tests
@github-actions github-actions Bot mentioned this pull request Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_scheduleAutoContinuation 50ms timer fires before all parallel client-tool results arrive → MissingToolResultsError

1 participant