Context
#1649 fixed auto-continuation firing before all of a step's parallel client-tool results arrive. The fix adds a barrier in both @cloudflare/think and @cloudflare/ai-chat: when the model emits parallel tool calls and the client answers them independently (each addToolOutput sends cf_agent_tool_result with autoContinue: true), the server now holds the continuation until the step's batch is fully answered instead of firing on the first result.
An incomplete batch is detected via _hasIncompleteToolBatch(): the leaf assistant message has both a settled tool result and a sibling still in input-available or approval-requested. The wait is bounded by AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (currently 60s). On timeout the continuation fires through (Think repairs the unanswered call to errored; ai-chat surfaces the provider error) and emits a console.warn.
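To make the detection rule concrete, here is a minimal sketch of the "one settled, one still open" check. The ToolPart shape, the two settled state names, and the free-function form are assumptions for illustration; the real _hasIncompleteToolBatch() in Think and ai-chat may read different fields.

```typescript
// Hypothetical, simplified tool-part shape; the real message types may differ.
type ToolState =
  | "input-available"
  | "approval-requested"
  | "output-available"
  | "output-error";

interface ToolPart {
  toolCallId: string;
  state: ToolState;
}

// A call is still open while it waits for a client result or an approval.
function isPending(state: ToolState): boolean {
  return state === "input-available" || state === "approval-requested";
}

// True only when the leaf assistant message mixes settled and open siblings,
// i.e. the batch has started to be answered but is not fully answered yet.
function hasIncompleteToolBatch(leafParts: readonly ToolPart[]): boolean {
  const anyPending = leafParts.some((p) => isPending(p.state));
  const anySettled = leafParts.some((p) => !isPending(p.state));
  return anyPending && anySettled;
}
```

A batch where nothing has settled yet is not "incomplete" under this rule, which matches the fact that auto-continuation is only considered after at least one result has arrived.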
Key code (post-#1649):
- Think: _fireAutoContinuationWhenStable, _awaitToolBatch, _hasIncompleteToolBatch in packages/think/src/think.ts. The barrier runs before enqueuing the continuation turn (so it does not occupy the turn queue), wrapped in keepAliveWhile.
- ai-chat: _awaitPendingInteractionBarrier, _hasIncompleteToolBatch in packages/ai-chat/src/index.ts. The barrier runs inside the queued continuation turn (after _awaitPendingAutoContinuationPrerequisite).
Problem with the bounded-wait approach
The fixed 60s timeout + fire-through is correct for the common case (parallel data-fetch tools resolving at different sub-second/second latencies) but has two rough edges:
- Human-in-the-loop tools emitted in parallel. A client-resolved tool with no execute (e.g. an ask_user/display_ui-style prompt) parks at input-available until a human answers, potentially for minutes. If the model emits such a tool in parallel with another tool that resolves quickly, the barrier waits 60s and then fires through, repairing the still-open human tool to errored even though the user is legitimately still answering. (ask_user is usually emitted alone, so this is an edge case, but display_ui-heavy apps like the g3 customer make it plausible.)
- Orphan keepAlive (Think). If a client disconnects mid-batch (a sibling result never arrives), Think holds the isolate alive via keepAliveWhile for the full 60s before firing through. Bounded and rare, but wasteful.
Both stem from the same thing: a fixed timer is the wrong primary mechanism for "wait until the client finishes answering the batch," because legitimate client latency is unbounded (human/long RPC) while a true orphan should not pin resources.
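The two rough edges can be illustrated with a small timing model of the bounded wait. This is a sketch under stated assumptions, not the Think or ai-chat code: simulateBoundedWait, PendingCall and BoundedWaitOutcome are hypothetical names, and times are measured from the moment the first result arrives.

```typescript
// Hypothetical timing model of the bounded wait; not the actual implementation.
interface PendingCall {
  toolCallId: string;
  // Milliseconds after the first result at which this call's result arrives,
  // or null if it never arrives (e.g. the client disconnected).
  arrivesAtMs: number | null;
}

interface BoundedWaitOutcome {
  firesAtMs: number; // when the continuation fires
  unansweredCallIds: string[]; // calls still open when it fires
}

function simulateBoundedWait(
  calls: PendingCall[],
  timeoutMs: number
): BoundedWaitOutcome {
  const arrivals = calls.map((c) =>
    c.arrivesAtMs === null ? Infinity : c.arrivesAtMs
  );
  const lastArrival = Math.max(0, ...arrivals);

  // Common case: every sibling lands inside the window, so nothing is lost.
  if (lastArrival <= timeoutMs) {
    return { firesAtMs: lastArrival, unansweredCallIds: [] };
  }

  // Timeout: fire through with whatever is still open.
  const unanswered = calls
    .filter((c) => c.arrivesAtMs === null || c.arrivesAtMs > timeoutMs)
    .map((c) => c.toolCallId);
  return { firesAtMs: timeoutMs, unansweredCallIds: unanswered };
}
```

With a 60000 ms timeout, a human answer at 180000 ms and a sibling that never arrives produce the same outcome: the continuation fires at 60000 ms with one call unanswered. The timer cannot tell a slow human from an orphan.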
Proposed: event-driven barrier for Think
Auto-continuation is only ever triggered by a tool-result/approval event. So instead of a timed wait, Think can be purely event-driven:
- When the coalesce timer fires and the leaf batch is incomplete (_hasIncompleteToolBatch()), do not fire and do not hold: just return, leaving _continuation.pending in place.
- Await only any in-flight apply (_pendingInteractionPromise) plus a short settle grace (~hundreds of ms) to absorb the concurrent-apply race for results that have already arrived; then re-check.
- If still incomplete, return without firing. The next sibling's tool-result event calls _scheduleAutoContinuation (pending exists, not pastCoalesce → re-arms the timer), which re-runs the check. When the last sibling lands, the batch is complete and the single continuation fires.
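The steps above can be sketched as a tiny state machine. This is a hypothetical stand-in rather than Think's code: EventDrivenBarrier, onToolResult and onCoalesceTimer are illustrative names, the coalesce timer is driven by hand instead of a real timer, and the settle grace for in-flight applies is left out.

```typescript
// Hypothetical stand-in for Think's continuation state; names are illustrative.
type CallState = "pending" | "settled";

class EventDrivenBarrier {
  private readonly calls: Record<string, CallState> = {};
  private continuationPending = false;
  public firedCount = 0;

  constructor(toolCallIds: string[]) {
    for (const id of toolCallIds) this.calls[id] = "pending";
  }

  // Called for each tool-result/approval event. In the real flow this is
  // where the coalesce timer would be armed or re-armed.
  onToolResult(toolCallId: string): void {
    this.calls[toolCallId] = "settled";
    this.continuationPending = true;
  }

  // Called when the coalesce timer fires.
  onCoalesceTimer(): void {
    if (!this.continuationPending) return;
    // Incomplete batch: no fire and no hold. The pending marker stays in
    // place and the next sibling's event re-runs this check.
    if (this.hasIncompleteBatch()) return;
    this.continuationPending = false;
    this.firedCount += 1;
  }

  private hasIncompleteBatch(): boolean {
    const states = Object.keys(this.calls).map((id) => this.calls[id]);
    return (
      states.some((s) => s === "pending") && states.some((s) => s === "settled")
    );
  }
}
```

Nothing in this model depends on wall-clock time between events, so a human answer arriving minutes later takes the same path as one arriving milliseconds later.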
Benefits:
- No fire-through that errors a legitimately-pending human tool — the continuation simply waits for the human's answer event, however long it takes.
- No 60s keepAlive hold for orphans: Think isn't holding the isolate between results; it hibernates normally and the next result wakes it. (If the in-memory _continuation.pending is lost to eviction, the final result re-creates it and fires with a complete batch, so the path is self-healing.)
- A true orphan (sibling never arrives) just never auto-continues, which is correct: there is nothing valid to continue, and a later user turn / chat recovery handles the transcript.
This removes AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (and its console.warn) from the Think path entirely.
Why ai-chat is different
ai-chat's barrier runs inside the queued continuation turn (after _awaitPendingAutoContinuationPrerequisite, before pastCoalesce). It can't "return and wait for re-trigger" the way Think can, because:
- A sibling result arriving during a pending (pre-pastCoalesce) continuation hits the merge branch of _enqueueAutoContinuation (updates the prerequisite); it does not re-queue a fresh turn.
- Blocking the turn for the full human-response duration would occupy the exclusive chat-turn queue and stall new user messages.
To make ai-chat event-driven we'd need to move the batch gate before queueing (gate _enqueueAutoContinuation / defer _queueAutoContinuation until the batch is complete), so an incomplete batch doesn't occupy a turn slot and a later sibling re-queues. That's a more invasive change to ai-chat's continuation flow.
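As a rough illustration of gating before queueing, the sketch below models an exclusive turn queue where an incomplete batch never takes a slot. GatedTurnQueue and its methods are hypothetical and do not mirror ai-chat's _enqueueAutoContinuation or _queueAutoContinuation internals; the caller is assumed to pass in the current batch status.

```typescript
// Hypothetical model of an exclusive chat-turn queue; not ai-chat's code.
type TurnKind = "user" | "continuation";

class GatedTurnQueue {
  readonly turns: TurnKind[] = [];
  private continuationQueued = false;

  enqueueUserTurn(): void {
    this.turns.push("user");
  }

  // Called on every tool-result event with the current batch status.
  onToolResult(batchIncomplete: boolean): void {
    // Gate before queueing: an incomplete batch takes no turn slot, and a
    // later sibling result gets another chance to queue the continuation.
    if (batchIncomplete || this.continuationQueued) return;
    this.continuationQueued = true;
    this.turns.push("continuation");
  }

  // Runs the next turn; a finished continuation allows a future one to queue.
  runNext(): TurnKind | undefined {
    const next = this.turns.shift();
    if (next === "continuation") this.continuationQueued = false;
    return next;
  }
}
```

In this model a user message sent while the human tool is still open is queued ahead of the continuation instead of stalling behind it.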
Parity / sequencing note
#1649 intentionally kept Think and ai-chat symmetric (bounded-wait) because #1642 (N3) wants to unify the duplicated think↔ai-chat recovery/continuation layer. Diverging the barrier now (event-driven Think, bounded ai-chat) increases that drift. Options:
Alternative (smaller) mitigations
If the full redesign isn't worth it yet:
- Make AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS configurable (a knob, but this cuts against the defaults-over-APIs principle).
- Raise the default to better cover human responses (trades a longer orphan hold).
Neither is as clean as event-driven; filing this to track the proper fix.
Acceptance criteria
- A client-resolved tool with no execute emitted in parallel with a fast tool: the human takes >60s to answer; the continuation does not fire through or error the human tool. It fires once the human answers, producing exactly one continuation with both results settled.
References
- ask_user/repairInterruptedToolPart work (feat(think): add repairInterruptedToolPart hook for client-resolved tools (#1631) #1635).