fix(think,ai-chat): wait for all parallel client-tool results before auto-continuing (#1649)#1651
Conversation
…auto-continuing (#1649) When the model emitted multiple tool calls in one step and the client resolved them independently via `addToolOutput`, a fast result's `autoContinue` could trigger the next inference while a slower sibling was still `input-available`. That fed the provider an incomplete tool-result set (`MissingToolResultsError`) or — via the transcript-repair backstop — silently flipped the in-flight sibling to errored and ran a spurious extra continuation. Auto-continuation now holds on a barrier until the model's parallel tool batch is fully answered. The barrier is scoped to the *leaf* assistant message — it fires only when that step is mid-batch (carries both a settled tool result and a still-pending `input-available`/`approval-requested` sibling), so an unrelated dangling tool in an earlier message can't block a legitimate follow-up continuation. The wait awaits the in-flight tool-result apply (so a sibling landing mid-wait is observed promptly) and is bounded by `AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS` (60s) so a genuinely orphaned tool call (client disconnected mid-batch) still falls through to the existing backstop instead of pinning the continuation open. - think: gate `_fireAutoContinuation` behind `_fireAutoContinuationWhenStable` (runs before enqueue, so it never occupies the turn queue) + `_awaitToolBatch` + `_hasIncompleteToolBatch`. Hardened against a double-fire race where a sibling result re-arms the coalesce timer during the wait: `_fireAutoContinuation` cancels any still-armed timer, and `_fireAutoContinuationWhenStable` bails if the continuation is already `pastCoalesce`. `resetTurnState` clears the barrier flag so a chat clear can't strand it. - ai-chat: gate inside the queued continuation via `_awaitPendingInteractionBarrier` + `_hasIncompleteToolBatch` (it runs in-queue, so it can't reuse the queue-idle `waitUntilStable`). - Both: a `console.warn` on timeout fall-through so a continuation proceeding with an unanswered tool isn't mysterious. Tests: new parallel fast/slow coverage in both packages (exactly one continuation after both results land, no premature repair of the slow tool); updated the think "deferred continuation" and ai-chat stream-error tests, which encoded the old partial-continuation behavior. Follow-up tracked in #1650 (event-driven barrier to avoid the bounded fire-through for human-in-the-loop tools + orphan keepAlive). Co-authored-by: Cursor <cursoragent@cursor.com>
🦋 Changeset detectedLatest commit: f7bb27c The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
/bonk do a deep review here, look for edge cases and things we might've missed |
agents
@cloudflare/ai-chat
@cloudflare/codemode
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
commit: |
There was a problem hiding this comment.
Deep Review — Parallel Tool-Call Barrier (#1649 fix)
Went through the full diff, the surrounding implementation (tool-result application, _pendingInteractionPromise lifecycle, continuation state machine, ContinuationState, keepAliveWhile, turn queue), and analyzed edge cases. Overall this is a well-designed fix — the leaf-scoped barrier predicate is the right call, the bounded wait with timeout fallback is solid, and the double-fire hardening in Think is necessary and correct. Tests are thorough.
A few things I want to flag:
1. _pendingInteractionPromise is a single slot — N≥3 tool calls with near-simultaneous arrivals
Think path (_awaitToolBatch): When the barrier is in its 100ms poll sleep and two tool results arrive ~simultaneously (say medium and slow for a 3-tool batch), the second overwrites _pendingInteractionPromise before the first resolves. The barrier loop only awaits whichever value is current when it wakes up.
This is actually fine in practice because:
_applyToolResultis microtask-deferred (Promise.resolve().then(...)) so both results' apply promises are created synchronously during the same event-loop turn- The second one overwrites the first, but the first still runs and patches the cache
- After awaiting the second,
_hasIncompleteToolBatch()sees both updates since both patched_cachedMessages - The only risk is if the first apply throws and the barrier never observes it — but the error is irrelevant since we
continueto re-check_hasIncompleteToolBatch()anyway
However, this is worth noting: the barrier's correctness relies on _hasIncompleteToolBatch() being the true gate, not on awaiting every individual _pendingInteractionPromise. The single-slot pattern is a "wake up sooner" optimization, not a correctness invariant. If that's the intent, a brief comment on the single-slot nature would help future readers understand why this isn't a bug.
2. _hasIncompleteToolBatch — [...this.messages].reverse() allocation per poll tick
Both Think and ai-chat do:
const leaf = [...this.messages].reverse().find(m => m.role === "assistant");This copies + reverses the entire message array on every 100ms poll tick and on every _pendingInteractionPromise resolution during the barrier wait. For long conversations this is wasteful. A backward for loop would be zero-allocation:
const msgs = this.messages;
let leaf: typeof msgs[number] | undefined;
for (let i = msgs.length - 1; i >= 0; i--) {
if (msgs[i].role === "assistant") { leaf = msgs[i]; break; }
}Not a correctness issue, but given this runs in a tight loop during a potentially latent barrier, it's worth optimizing — especially since this.messages in Think returns the raw _cachedMessages reference so the array can be large.
3. ai-chat: no _continuationBarrierActive guard — is double-entry possible?
Think has an explicit _continuationBarrierActive flag to prevent multiple concurrent barrier waits. ai-chat does not — its barrier runs inside the turn queue, so the turn queue serialization prevents concurrent entry.
But: the ai-chat barrier is called from _awaitPendingAutoContinuationPrerequisite, which is called inside _queueAutoContinuation's turn. If a second tool result arrives while the first continuation is waiting on the barrier (still in the turn queue), it hits the merge branch in _enqueueAutoContinuation (updating pending.prerequisite), not enqueueing a new turn. So the turn queue + merge semantics structurally prevent double-entry.
This is correct but subtle. The PR description mentions "ai-chat is structurally immune" — accurate, but a brief comment in _awaitPendingInteractionBarrier noting why a guard isn't needed (turn queue serialization) would make the asymmetry between Think and ai-chat more obvious.
4. Timeout + keepAliveWhile holds the isolate for up to 60 seconds on orphan
When a client disconnects mid-batch and a tool result never arrives, the barrier holds for AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (60s) with keepAliveWhile active. This pins the DO alive on the alarm cycle for a full minute before falling through to the repair backstop.
The PR description acknowledges this and #1650 tracks making it event-driven. Just confirming: is 60s a reasonable worst-case hold? For human-in-the-loop tools (mentioned in the follow-up), 60s is quite short — but those tools presumably don't use autoContinue: true from the client, so they shouldn't trigger the barrier at all. Worth verifying that assumption holds.
5. Edge case: all tool calls in the batch fail/error
If the model emits 2 tool calls and the client sends both back as state: "output-error", _hasIncompleteToolBatch correctly treats output-error as settled (it's in the settled set). Good — no issue here.
6. Edge case: tool result arrives after a resetTurnState / chat clear
Think: resetTurnState sets _continuationBarrierActive = false and _continuation.pending = null. The _awaitToolBatch loop checks !this._continuation.pending and bails. The .finally() in _fireAutoContinuationWhenStable checks pending before firing. All correct.
ai-chat: resetTurnState clears _continuation.pending. The barrier checks !this._continuation.pending and returns. The continuation then checks for connection and bails at the if (!connection) guard. Correct.
7. The updated test in client-tools-continuation.test.ts — both results sent ~synchronously
The updated stream-error test now sends both tool results back-to-back before waiting for the continuation. This is the correct behavior under the new barrier, but it also means the test no longer exercises the "second result arrives during an active continuation" path. The old test had a specific ordering (first result → continuation starts → second result arrives during stream). The new ordering (both results → single continuation) is what the fix mandates, but the "result during active stream" path is still a valid scenario for separate assistant messages. Worth confirming that path has test coverage elsewhere (the chat-turn-serialization test mentioned in the PR description).
Summary
The fix is correct. The leaf-scoped _hasIncompleteToolBatch() predicate is the right boundary — it precisely targets the #1649 signature without over-blocking. The double-fire hardening in Think (clearing the timer in _fireAutoContinuation, the pastCoalesce + _continuationBarrierActive guards) is necessary and well-implemented. The ai-chat path is structurally simpler because it runs in-queue.
The two minor suggestions (backward loop instead of spread+reverse, comments on single-slot _pendingInteractionPromise and ai-chat's implicit guard) are polish, not blockers.
| } else if ( | ||
| typeof record.type === "string" && | ||
| record.type.startsWith("tool-") && | ||
| (state === "output-available" || | ||
| state === "output-error" || | ||
| state === "output-denied" || | ||
| state === "approval-responded") | ||
| ) { | ||
| hasSettled = true; |
There was a problem hiding this comment.
🟡 _hasIncompleteToolBatch omits dynamic-tool from settled check, breaking barrier for dynamic tool batches
In Think's _hasIncompleteToolBatch, the settled-state check only matches record.type.startsWith("tool-"), but "dynamic-tool".startsWith("tool-") is false. Every other tool-part identification site in Think consistently checks record.type.startsWith("tool-") || record.type === "dynamic-tool" — see _incompleteToolCallIds at packages/think/src/think.ts:1663, _repairInterruptedMessages at packages/think/src/think.ts:1724, and _partialHasSettledToolResults at packages/think/src/think.ts:8030. Meanwhile, the pending side of _hasIncompleteToolBatch has no type filter at all, so a dynamic-tool in input-available IS counted as pending. This asymmetry means if a parallel batch contains a settled dynamic-tool result alongside a still-pending tool-* call, hasSettled stays false and the method returns false — the barrier doesn't hold and the continuation fires prematurely, which is exactly the #1649 scenario this PR is supposed to fix.
| } else if ( | |
| typeof record.type === "string" && | |
| record.type.startsWith("tool-") && | |
| (state === "output-available" || | |
| state === "output-error" || | |
| state === "output-denied" || | |
| state === "approval-responded") | |
| ) { | |
| hasSettled = true; | |
| } else if ( | |
| typeof record.type === "string" && | |
| (record.type.startsWith("tool-") || record.type === "dynamic-tool") && | |
| (state === "output-available" || | |
| state === "output-error" || | |
| state === "output-denied" || | |
| state === "approval-responded") | |
| ) { | |
| hasSettled = true; |
Was this helpful? React with 👍 or 👎 to provide feedback.
| } else if ( | ||
| typeof record.type === "string" && | ||
| record.type.startsWith("tool-") && | ||
| (state === "output-available" || | ||
| state === "output-error" || | ||
| state === "output-denied" || | ||
| state === "approval-responded") | ||
| ) { | ||
| hasSettled = true; |
There was a problem hiding this comment.
🟡 _hasIncompleteToolBatch in AIChatAgent also omits dynamic-tool from settled check
Same issue as in Think. AIChatAgent's _hasIncompleteToolBatch checks record.type.startsWith("tool-") for settled parts but "dynamic-tool" doesn't match this prefix. AIChatAgent recognizes dynamic-tool as a tool type elsewhere (see packages/ai-chat/src/index.ts:3337: type.startsWith("tool-") || type === "dynamic-tool"). If a dynamic-tool settles and triggers autoContinue while a sibling tool-* call is still input-available, the barrier won't detect the batch as incomplete and the continuation fires prematurely.
| } else if ( | |
| typeof record.type === "string" && | |
| record.type.startsWith("tool-") && | |
| (state === "output-available" || | |
| state === "output-error" || | |
| state === "output-denied" || | |
| state === "approval-responded") | |
| ) { | |
| hasSettled = true; | |
| } else if ( | |
| typeof record.type === "string" && | |
| (record.type.startsWith("tool-") || record.type === "dynamic-tool") && | |
| (state === "output-available" || | |
| state === "output-error" || | |
| state === "output-denied" || | |
| state === "approval-responded") | |
| ) { | |
| hasSettled = true; |
Was this helpful? React with 👍 or 👎 to provide feedback.
| */ | ||
| private _hasIncompleteToolBatch(): boolean { | ||
| const leaf = [...this.messages] | ||
| .reverse() |
There was a problem hiding this comment.
Nit: [...this.messages].reverse().find(...) allocates a full copy + reverse on every 100ms poll tick during the barrier wait. For long conversations this is wasteful. A backward for-loop would be zero-allocation:
const msgs = this.messages;
let leaf: UIMessage | undefined;
for (let i = msgs.length - 1; i >= 0; i--) {
if (msgs[i].role === "assistant") { leaf = msgs[i]; break; }
}Not blocking, but this is a hot path during the barrier poll.
| */ | ||
| private _hasIncompleteToolBatch(): boolean { | ||
| const leaf = [...this.messages] | ||
| .reverse() |
There was a problem hiding this comment.
Same spread+reverse nit as in Think — a backward for-loop avoids the allocation.
| * continuation turn, so — unlike `waitUntilStable` — it must not wait on the | ||
| * turn queue (that would deadlock). | ||
| */ | ||
| private async _awaitPendingInteractionBarrier(): Promise<void> { |
There was a problem hiding this comment.
Worth a brief comment that this barrier does NOT need a _continuationBarrierActive guard (unlike Think's) because it runs inside the turn queue — the queue serialization structurally prevents concurrent entry. The asymmetry is correct but easy to miss when reading the two implementations side by side.
| // The pending continuation was cleared (chat clear / turn reset) — nothing | ||
| // to wait for; bail so the isolate isn't held by a stale wait. | ||
| if (!this._continuation.pending) return; | ||
| const pending = this._pendingInteractionPromise; |
There was a problem hiding this comment.
_pendingInteractionPromise is a single slot — when N≥3 tool results arrive in quick succession, later arrivals overwrite earlier ones before the barrier observes them. This is correct because the true gate is _hasIncompleteToolBatch() (which reads the patched cache), and the single-slot await is a "wake up sooner" optimization, not a correctness invariant. A brief comment noting that would help future readers understand why the overwrite is not a bug.
|
Done. I posted a comprehensive review comment and 4 inline comments on PR #1651. Here's a summary of what I found: Overall: The fix is correct and well-designed. The leaf-scoped Key findings:
|
- `_hasIncompleteToolBatch`: replace `[...this.messages].reverse().find(...)` with a zero-allocation backward scan — it runs on every barrier poll tick and `this.messages` can be large. - Document that `_pendingInteractionPromise` is a single slot: awaiting it is a "wake sooner" optimization, not the correctness gate (`_hasIncompleteToolBatch` re-checked each loop is). Clarifies the N>=3 near-simultaneous-arrival case. - ai-chat: note why no `_continuationBarrierActive`-style guard is needed (the barrier runs inside the exclusive continuation turn; a sibling result hits the merge branch of `_enqueueAutoContinuation`, so the turn queue serializes it). No behavior change. Co-authored-by: Cursor <cursoragent@cursor.com>
…barrier (#1649) `_hasIncompleteToolBatch` only matched `type.startsWith("tool-")` on the settled side, but the pending side has no type filter — so a `dynamic-tool` in `input-available` counted as pending while a settled `dynamic-tool` (`output-available`/etc.) did NOT count as settled. A parallel batch with a settled `dynamic-tool` beside a pending `tool-*` would therefore read as not-mid-batch and the continuation would fire prematurely — the exact #1649 failure, for dynamic-tool batches. Match `dynamic-tool` alongside `tool-*` on the settled side, consistent with every other tool-part check in both packages (`_incompleteToolCallIds`, `_repairInterruptedMessages`, `_partialHasSettledToolResults`, ai-chat's sanitizer). Regression test added in each package: a settled `dynamic-tool` beside a pending `tool-*` holds the barrier until the regular tool resolves. Co-authored-by: Cursor <cursoragent@cursor.com>
|
/bonk please review again |
There was a problem hiding this comment.
Re-review — Parallel Tool-Call Barrier (#1649 fix)
Went through all three commits on the branch (e65bdf0 initial fix, bddfa14 review polish, f7bb27c dynamic-tool fix), the full diff against main, and verified the issues from the first review round.
Previous review items — status
| Item | Status |
|---|---|
dynamic-tool omitted from settled check (Devin flagged) |
Fixed in f7bb27c — both packages now have record.type.startsWith("tool-") || record.type === "dynamic-tool" on the settled side, with regression tests |
[...this.messages].reverse().find(...) allocation (bonk flagged) |
Fixed in bddfa14 — replaced with zero-allocation backward for loop in both packages |
_pendingInteractionPromise single-slot comment |
Fixed — clear comment added in both _awaitToolBatch (Think) and _awaitPendingInteractionBarrier (ai-chat) explaining the single-slot is a wake-up optimization, not the correctness gate |
ai-chat missing _continuationBarrierActive guard comment |
Fixed — JSDoc on _awaitPendingInteractionBarrier explains why no guard is needed (turn queue serialization) |
All four items are resolved.
Architecture verification
Confirmed the following structural properties:
-
No barrier bypass paths. Every entry into
_fireAutoContinuationin Think goes through_fireAutoContinuationWhenStable— the timer callback (line ~8698) and_activateDeferredContinuation(line ~8897) both route through the stable wrapper. No direct calls exist outside the wrapper itself. -
Double-fire prevention in Think is correct.
_fireAutoContinuationWhenStablehas three guards: (a)!pendingearly return, (b)pastCoalescecheck (continuation already running), (c)_continuationBarrierActivecheck (barrier already waiting)._fireAutoContinuationclears the timer synchronously before any async work.resetTurnStateclears_continuationBarrierActive. -
ai-chat barrier placement is correct.
_awaitPendingInteractionBarrier()is called after the prerequisite loop exits in_awaitPendingAutoContinuationPrerequisite, not inside it. Thebreakchange (line ~2101, ~2114) replacesreturn trueto allow fall-through to the barrier. Thereturn trueafter the barrier (line ~2127) preserves the method contract. -
resetTurnStateteardown is complete in both packages. Think clears_continuationBarrierActive,_continuationTimer,_pendingInteractionPromise, and calls_continuation.clearAll(). ai-chat clears_pendingInteractionPromise, resets the turn queue, and calls_continuation.clearAll(). Both barrier loops check!this._continuation.pendingat the top of each iteration, so a chat clear during a barrier wait causes immediate bail-out.
Test coverage
The new tests are well-structured:
- Think: 3 new test cases — fast/slow parallel batch, dynamic-tool mixed batch, and deferred-continuation coalescing (updated from the old "deferred tool result runs after active continuation" test that encoded the pre-barrier behavior).
- ai-chat: 2 new test cases — fast/slow parallel batch, dynamic-tool mixed batch. Plus the stream-error continuation test updated to send both results before expecting the continuation.
All tests verify: (a) zero continuations after the first result + delay, (b) exactly one continuation after the batch completes, (c) no spurious repair of the slow sibling.
Test coverage gaps (non-blocking)
Three scenarios lack test coverage. None are blocking for this PR but are worth noting for follow-up:
-
60s timeout path. No test exercises
AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MSfiring when a tool result genuinely never arrives. Theconsole.warn+ fall-through-to-repair path is untested. Testing with a reduced timeout constant would be straightforward and would catch regressions where the timeout breaks (e.g., the barrier hangs forever on orphaned tools). -
resetTurnState/ chat clear during active barrier wait. The barrier loop's!this._continuation.pendingbail-out is the safety valve for this, but no test sends a chat clear while the barrier is actively polling between two tool results. A regression here would cause the barrier to hold the isolate after a clear. -
N >= 3 parallel tool calls. All barrier tests use exactly 2 tools. The barrier logic is a simple "any pending AND any settled = hold" scan so 2 tools is likely sufficient, but a 3-tool test would catch potential off-by-one issues with the settled/pending detection (e.g., "all tools must be settled" vs "at least one settled" confusion).
Verdict
The fix is correct and complete. The leaf-scoped _hasIncompleteToolBatch() predicate precisely targets the #1649 race without over-blocking. The three-layer guard in Think (pastCoalesce / barrierActive / timer clearing) prevents double-fires. The ai-chat design correctly relies on turn-queue serialization instead. The dynamic-tool fix and allocation optimization from review round 1 are both landed.
The test gaps above are minor — consider them for follow-up alongside #1650 (event-driven barrier).
Looks good.
|
Review posted on PR #1651. Here's the summary: All four items from the first review round are resolved — the dynamic-tool fix, the allocation optimization, and the two documentation comments are all landed across the three commits. The fix is correct. The leaf-scoped Three minor test gaps noted (non-blocking): no coverage for the 60s timeout path, no |
…1649 follow-up) (#1657) * fix(think): serialize parallel client-tool result/approval applies (#1649 follow-up) ## Problem The auto-continuation barrier added in #1651 stopped *premature* continuation, but a deeper data race remained in Think. Each `cf_agent_tool_result` / `cf_agent_tool_approval` WebSocket message triggers a read-modify-write of the whole assistant message via `_applyToolUpdateToMessages`, which awaits a storage read (`await this.session.getHistory()`) before writing. When the model fans out parallel tool calls in a single assistant step, the per-result applies run concurrently: 1. apply A and apply B both `await` the storage read and resolve with the same snapshot — every sibling still `input-available`. 2. A patches only its own part and writes the full message back. 3. B (holding the stale snapshot) patches only *its* part and writes the full message back, clobbering A's result to `input-available`. The clobber lands in both durable storage and the `this.messages` cache, so `_hasIncompleteToolBatch()` keeps seeing an unanswered sibling. The continuation barrier then times out and the transcript-repair backstop errors the lost call with "The tool call was interrupted before a result was recorded." ## Fix Introduce a per-instance serialization tail (`_interactionApplyTail`) and route every result/approval apply through `_enqueueInteractionApply`, which chains each apply off the previous one. This makes each read-modify-write atomic and ordered, so siblings can no longer clobber each other. - `.then(run, run)` is used so a rejected apply can't poison the rest of the batch. - `_pendingInteractionPromise` still tracks the newest link; because the chain is serial, the barrier's single-slot wake-up transitively waits for every predecessor. - `resetTurnState` drops the tail so a new turn never queues behind a stale (possibly hung) apply from a reset turn. ## ai-chat (defensive) The same serialization is added to `@cloudflare/ai-chat` for symmetry. ai-chat does **not** exhibit this bug today: `_findAndUpdateToolPart` performs the read-modify-write synchronously (no await between reading `this.messages` and the synchronous SQLite write), so concurrent applies cannot interleave. The queue is a forward-guard that keeps the invariant safe if that apply ever gains an await (e.g. async storage). ## Tests - Added a deterministic serialization test in both packages (`testInteractionApplySerialization`) that drives two overlapping read-modify-writes through the queue. Without serialization the result is 1 (clobbered); serialized it is 2. Verified the Think test FAILS without the fix (`expected 1 to be 2`). - Added an integration test in Think asserting three concurrently-sent parallel results all settle to `output-available`, none is repaired, and exactly one continuation runs. - `_enqueueInteractionApply` is `protected` + `@internal` so the test agents can drive it deterministically (the vitest-workers harness can't reproduce the real storage interleave because DO SQLite is synchronous). All checks pass: format, oxlint, typecheck (91/91), think (486) and ai-chat (566) suites. * fix(think): apply client-tool results/approvals that arrive mid-stream (#1649 follow-up) ## Problem The serialization fix (prior commit) stopped parallel client-tool results from clobbering each other, but the reporter still hit "The tool call was interrupted before a result was recorded." on every client-resolved tool call — even single (non-parallel) ones. Root cause is an ordering inversion between streaming and persistence: - During a streaming turn the assistant message lives ONLY in the in-flight `StreamAccumulator`. It is not written to durable storage (or added to `this.messages`) until `_persistAssistantMessage` runs at the end of the stream. - The `tool-input-available` chunk, however, is broadcast to the client mid-stream. A fast client resolves the tool and sends `cf_agent_tool_result` BEFORE the message is ever persisted. - `_applyToolUpdateToMessages` only scanned durable storage, so the apply found no matching message and silently no-op'd. - The end-of-stream persist then wrote the tool part as `input-available`, and the auto-continuation's transcript-repair backstop errored it as "interrupted". This is why "parallel sometimes works": a slow sibling (e.g. an RPC tool) resolves AFTER the stream has finished and persisted, so its apply finds the message in storage. ## Fix `_applyToolUpdateToMessages` now applies the update to BOTH: 1. the in-flight `StreamAccumulator` (in place), so the result rides into the eventual end-of-stream persist instead of being dropped; and 2. durable storage (unchanged), which still handles already-persisted messages — including partials written mid-stream by stall recovery and cross-message tool results that target an earlier message. Applying to both is deliberately monotonic: a later persist of `accumulator.toMessage()` (e.g. stall-recovery partial persist) carries the settled result and cannot downgrade it back to `input-available`. The accumulator is exposed via a new `_streamingAssistant` field for the duration of each streaming turn and cleared on every exit path (`_streamResult` normal + stall-scheduled/exhausted early returns, `_streamResultToRpcCallback` finally) and in `resetTurnState`. The field is intentionally ephemeral and never persisted; it is only ever set while the isolate is held alive by `keepAliveWhile`, so hibernation cannot strand it. This mirrors `@cloudflare/ai-chat`'s existing `_streamingMessage` handling, generalized to also cover the post-persist path. It fixes both client tool RESULTS and APPROVALS, since both flow through `_applyToolUpdateToMessages`. ## Why correctness holds for the auto-continuation barrier Mid-stream, the in-flight message is not yet in `this.messages`, so `_hasIncompleteToolBatch()` inspects a stale (prior) leaf. This does NOT break correctness: the continuation is enqueued behind the active streaming turn on the turn queue, and that turn persists the now-settled result before the continuation's transcript repair runs. The ordering guarantee is the turn queue, not the barrier's cache view. Documented inline. ## Tests - Deterministic reproduction (`simulateMidStreamClientToolResult`): a tool call exists only in the accumulator when the result is applied, then the message is persisted. Asserts the part ends `output-available`. Fails on the pre-fix code (persists `input-available`). - Deterministic approval variant (`simulateMidStreamClientToolApproval`): an `approval-requested` part reaches `approval-responded` via the same path. - Full-pipeline integration test over the real WS protocol: a new slow client-tool mock model finalizes the tool call (emits `tool-call`) then holds the stream open with trailing text. The test waits via `streamingToolCallState` until the call is exposed, sends `cf_agent_tool_result` while the stream is still open, then asserts the part is `output-available`, carries the output, and that NO `chat:transcript:repaired` fired for it. E2E (wrangler-spawning) was assessed and deferred: the integration test already exercises the full server pipeline in the real Workers runtime and times the mid-stream send deterministically; the e2e harness here is built for process-kill/eviction scenarios where a timing-based test would be flaky and largely redundant. ## Verification - format, oxlint (0/0), typecheck (91/91 projects) - full think suite: 488 passed (pre-additions); client-tools: 61 passed including the 3 new tests
Fixes #1649.
Problem
When the model emits multiple tool calls in a single step and the client resolves them independently (each
addToolOutputsendscf_agent_tool_resultwithautoContinue: true), a fast tool's result could trigger the next inference while a slower sibling was stillinput-available.The server's auto-continuation fired on a short coalesce timer (think
50ms, ai-chat10ms), not on "is the model's tool-call batch actually complete." So the first result'sautoContinuewould continue the turn with an incomplete tool-result set:AI_MissingToolResultsError(the crash the reporter saw).main(Think repairs +convertToModelMessages(..., { ignoreIncompleteToolCalls: true })) → arguably worse: the still-in-flight sibling is silently flipped to errored and a spurious extra continuation runs, then the real result lands and churns the transcript.Repro shape (from the issue):
Fix
Add a server-side barrier: auto-continuation now holds until the model's parallel tool batch is fully answered, then runs a single continuation — regardless of result arrival order. This uses the transcript state the framework already tracks, so it covers tool results and approvals and needs no counting.
Scoped to the leaf (the key design choice)
The barrier fires only when the latest assistant message is mid-batch — it carries both a settled tool result and a still-pending (
input-available/approval-requested) sibling (_hasIncompleteToolBatch()). That is precisely the #1649 signature (the model fanned out parallel calls and only some are answered).It is intentionally not "any pending interaction anywhere in the transcript." A broad barrier over-blocked a legitimate, pre-existing path — a follow-up continuation when a result lands in a separate assistant message before a stream starts (
chat-turn-serializationtest). Leaf-scoping fixes #1649 without disturbing that.Mechanics
packages/think/src/think.ts):_scheduleAutoContinuation's timer now calls_fireAutoContinuationWhenStable, which gates_fireAutoContinuationbehind_awaitToolBatch. The wait runs before the turn is enqueued, so it never occupies the turn queue, and is wrapped inkeepAliveWhileso the (typically sub-second) remainder of the batch can land. It awaits the in-flight tool-result apply (_pendingInteractionPromise) so a sibling that lands mid-wait is observed promptly.packages/ai-chat/src/index.ts):_awaitPendingInteractionBarrierruns inside the queued continuation (after_awaitPendingAutoContinuationPrerequisite), because that path runs in-queue and so can't reuse the queue-idlewaitUntilStable(it would deadlock). Same_hasIncompleteToolBatchpredicate.AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS(60s): on timeout the continuation proceeds — Think's repair backstop errors the unanswered call, ai-chat surfaces the provider error — so a genuinely orphaned tool (client disconnected mid-batch) never pins the continuation open. Aconsole.warnmakes that fall-through visible.Hardening found during review
50mscoalesce timer; when the barrier fired the continuation, that leftover timer could enqueue a second continuation in the window beforepastCoalescewas set._fireAutoContinuationnow cancels any still-armed timer when it fires, and_fireAutoContinuationWhenStablebails if the continuation is alreadypastCoalesce. (ai-chat is structurally immune — a sibling during a pending continuation hits the merge branch, not a re-queue.)resetTurnStatenow clears the barrier flag, and both wait loops bail when_continuation.pendingis gone, so a chat clear / turn reset can't strand the wait holding the isolate.Why server-side (not the client-side option from the issue)
The issue floated "client only sends
autoContinueon the last result." That's fragile (tools resolve independently inonToolCall; the client isn't the transcript's source of truth) and wouldn't protect custom / non-useAgentChatclients. The server already owns the transcript and the completeness invariant, so the barrier belongs there.Tests
deferred continuationtest and the ai-chat continuation-stream-error test — both previously encoded the old partial-continuation behavior (continue after the first of two parallel tools).Follow-up
#1650 tracks making the Think barrier event-driven (return-and-wait-for-the-next-result-event instead of a bounded timer) to avoid the 60s fire-through for human-in-the-loop tools (e.g.
ask_user/display_uiemitted in parallel) and the orphan keepAlive hold. Kept symmetric (bounded-wait) with ai-chat here on purpose, since #1642 (N3) wants to unify the duplicated think↔ai-chat layer.Test plan
npm run build+npm run check+ affected tests)MissingToolResultsError, no spurious repairMade with Cursor