Auto-continuation parallel-tool barrier: make Think event-driven (avoid timeout fire-through for human-in-the-loop tools + orphan keepAlive) — follow-up to #1649

## Context

#1649 fixed auto-continuation firing before all of a step's parallel client-tool results arrive. The fix adds a **barrier** in both `@cloudflare/think` and `@cloudflare/ai-chat`: when the model emits parallel tool calls and the client answers them independently (each `addToolOutput` sends `cf_agent_tool_result` with `autoContinue: true`), the server now holds the continuation until the step's batch is fully answered instead of firing on the first result.

The barrier is detected via `_hasIncompleteToolBatch()` — the **leaf** assistant message has both a settled tool result *and* a still-`input-available`/`approval-requested` sibling — and the wait is **bounded** by `AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS` (currently 60s). On timeout it fires through (Think repairs the unanswered call to errored; ai-chat surfaces the provider error), and logs a `console.warn`.
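The detection rule can be sketched roughly as below. This is a simplified illustration, not the code from either package: the message and part shapes are hypothetical, and the settled state names (`output-available`, `output-error`) are assumed; only `input-available` and `approval-requested` come from the description above.

```typescript
// Hypothetical, simplified shapes; the real types in think / ai-chat may differ.
type ToolPartState =
  | "input-available"
  | "approval-requested"
  | "output-available" // assumed name for a settled result
  | "output-error"; // assumed name for a settled (errored) result

interface ToolPart {
  type: "tool";
  toolCallId: string;
  state: ToolPartState;
}

interface TextPart {
  type: "text";
  text: string;
}

interface AssistantMessage {
  role: "assistant";
  parts: Array<ToolPart | TextPart>;
}

// States in which a tool call is still waiting on the client or a human.
const PENDING_STATES = new Set<ToolPartState>([
  "input-available",
  "approval-requested",
]);

// True only when the leaf message has at least one settled tool part AND
// at least one sibling that is still pending, i.e. a partially answered batch.
function hasIncompleteToolBatch(leaf: AssistantMessage | undefined): boolean {
  if (!leaf) return false;
  const tools = leaf.parts.filter((p): p is ToolPart => p.type === "tool");
  const anySettled = tools.some((p) => !PENDING_STATES.has(p.state));
  const anyPending = tools.some((p) => PENDING_STATES.has(p.state));
  return anySettled && anyPending;
}
```

Note that a batch where nothing has settled yet returns `false` under this rule: no result has arrived, so no continuation has been triggered and there is nothing to hold.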

Key code (post-#1649):
- Think: `_fireAutoContinuationWhenStable`, `_awaitToolBatch`, `_hasIncompleteToolBatch` in `packages/think/src/think.ts`. The barrier runs **before** enqueuing the continuation turn (so it does not occupy the turn queue), wrapped in `keepAliveWhile`.
- ai-chat: `_awaitPendingInteractionBarrier`, `_hasIncompleteToolBatch` in `packages/ai-chat/src/index.ts`. The barrier runs **inside** the queued continuation turn (after `_awaitPendingAutoContinuationPrerequisite`).

## Problem with the bounded-wait approach

The fixed 60s timeout + fire-through is correct for the common case (parallel data-fetch tools resolving at different sub-second/second latencies) but has two rough edges:

1. **Human-in-the-loop tools emitted in parallel.** A client-resolved tool with no `execute` (e.g. an `ask_user`/`display_ui`-style prompt) parks at `input-available` until a human answers, potentially for minutes. If the model emits such a tool **in parallel** with another tool that resolves quickly, the barrier waits 60s and then **fires through, repairing the still-open human tool to errored**, even though the user is legitimately still answering. (`ask_user` is usually emitted alone, so this is an edge case, but `display_ui`-heavy apps like the g3 customer make it plausible.)

2. **Orphan keepAlive (Think).** If a client disconnects mid-batch (a sibling result never arrives), Think holds the isolate alive via `keepAliveWhile` for the full 60s before firing through. Bounded and rare, but wasteful.

Both stem from the same thing: a *fixed timer* is the wrong primary mechanism for "wait until the client finishes answering the batch," because legitimate client latency is unbounded (human/long RPC) while a true orphan should not pin resources.
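To make the trade-off concrete, the bounded-wait decision can be reduced to a small pure function. This is only an illustrative model: `boundedWaitDecision` and the `Decision` type are hypothetical names, not code from either package; only the constant name and its 60s value come from the description above.

```typescript
// Value taken from the issue text (currently 60s).
const AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS = 60 * 1000;

// Hypothetical outcome labels for this sketch.
type Decision = "fire" | "wait" | "fire-through";

// Models the bounded wait: a complete batch fires immediately, an incomplete
// batch waits, and once the timeout elapses the continuation fires anyway.
function boundedWaitDecision(batchIncomplete: boolean, waitedMs: number): Decision {
  if (!batchIncomplete) return "fire";
  return waitedMs >= AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS ? "fire-through" : "wait";
}
```

Under this model a human who answers after 90s has already lost: at the 60s mark the decision is `"fire-through"`, so the open tool call is treated as unanswered. An orphaned batch takes the same path, which is why it holds resources for the full 60s first.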

## Proposed: event-driven barrier for Think

Auto-continuation is only ever triggered **by** a tool-result/approval event. So instead of a timed wait, Think can be purely event-driven:

- When the coalesce timer fires and the leaf batch is incomplete (`_hasIncompleteToolBatch()`), **do not fire and do not hold** — just return, leaving `_continuation.pending` in place.
- Await only any **in-flight apply** (`_pendingInteractionPromise`) plus a short settle grace (~hundreds of ms) to absorb the concurrent-apply race for results that have *already arrived*; then re-check.
- If still incomplete, return without firing. The **next** sibling's `tool-result` event calls `_scheduleAutoContinuation` (pending exists, not `pastCoalesce` → re-arms the timer), which re-runs the check. When the last sibling lands, the batch is complete and the single continuation fires.
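The flow in the three steps above can be modelled with a small state machine. This is a sketch under stated assumptions, not Think's implementation: `EventDrivenBarrier`, `onToolResult`, and `onCoalesceTimer` are hypothetical names, the coalesce timer is represented by an explicit method call so the example stays deterministic, and the in-flight apply await and settle grace are omitted.

```typescript
type SiblingState = "pending" | "settled";

class EventDrivenBarrier {
  // Stand-in for `_continuation.pending`.
  private continuationPending = false;
  // Stand-in for an armed coalesce timer.
  private timerArmed = false;
  // Number of continuations fired; should end at exactly 1 per batch.
  firedCount = 0;
  private readonly batch = new Map<string, SiblingState>();

  constructor(toolCallIds: string[]) {
    for (const id of toolCallIds) this.batch.set(id, "pending");
  }

  // Every tool-result event settles one sibling and (re-)arms the timer.
  onToolResult(toolCallId: string): void {
    this.batch.set(toolCallId, "settled");
    this.continuationPending = true;
    this.timerArmed = true;
  }

  // Stand-in for the coalesce timer callback (a real one would use setTimeout).
  onCoalesceTimer(): void {
    if (!this.timerArmed) return;
    this.timerArmed = false;
    if (!this.continuationPending) return;
    const incomplete = Array.from(this.batch.values()).some((s) => s === "pending");
    // Incomplete batch: do not fire and do not hold. The pending marker stays,
    // and the next sibling's result re-arms the timer and re-runs this check.
    if (incomplete) return;
    this.continuationPending = false;
    this.firedCount += 1;
  }
}
```

A usage walk-through: with siblings `a` and `b`, a result for `a` followed by the timer leaves `firedCount` at 0 with nothing held. The result for `b`, however late, re-arms the timer, and the next timer tick fires the single continuation. A stray extra tick after that is a no-op.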

Benefits:
- **No fire-through that errors a legitimately-pending human tool** — the continuation simply waits for the human's answer event, however long it takes.
- **No 60s keepAlive hold** for orphans — Think isn't holding the isolate between results; it hibernates normally and the next result wakes it. (If the in-memory `_continuation.pending` is lost to eviction, the final result re-creates it and fires with a complete batch — self-healing.)
- A true orphan (sibling never arrives) just never auto-continues, which is correct: there is nothing valid to continue, and a later user turn / chat recovery handles the transcript.

This removes `AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS` (and its `console.warn`) from the Think path entirely.

### Why ai-chat is different

ai-chat's barrier runs **inside** the queued continuation turn (after `_awaitPendingAutoContinuationPrerequisite`, before `pastCoalesce`). It can't "return and wait for re-trigger" the way Think can, because:
- A sibling result arriving during a pending (pre-`pastCoalesce`) continuation hits the **merge** branch of `_enqueueAutoContinuation` (updates the prerequisite) — it does **not** re-queue a fresh turn.
- Blocking the turn for the full human-response duration would occupy the exclusive chat-turn queue and stall new user messages.

To make ai-chat event-driven we'd need to move the batch gate **before** queueing (gate `_enqueueAutoContinuation` / defer `_queueAutoContinuation` until the batch is complete), so an incomplete batch doesn't occupy a turn slot and a later sibling re-queues. That's a more invasive change to ai-chat's continuation flow.
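One possible shape for such a pre-queue gate is sketched below. Everything here is hypothetical (`PreQueueGate`, `onToolResult`, the `batchId` key, and the string turn queue); it does not reflect ai-chat's actual `_enqueueAutoContinuation` internals and only illustrates the idea of checking batch completeness before taking a turn slot.

```typescript
class PreQueueGate {
  // Stand-in for the exclusive chat-turn queue.
  readonly turnQueue: string[] = [];
  // Batches that already have a continuation queued (guards against double fire).
  private readonly queuedBatches = new Set<string>();

  // Called for each tool-result event. Returns true if a continuation turn
  // was queued as a result of this event.
  onToolResult(batchId: string, batchComplete: boolean): boolean {
    // Incomplete batch: queue nothing, so no turn slot is occupied and
    // new user messages are not stalled behind a waiting continuation.
    if (!batchComplete) return false;
    // Complete batch that already queued its continuation: do nothing.
    if (this.queuedBatches.has(batchId)) return false;
    this.queuedBatches.add(batchId);
    this.turnQueue.push("continue:" + batchId);
    return true;
  }
}
```

The design choice this illustrates is that the gate decision happens on the event path, so a later sibling result is what queues the turn rather than a wait inside an already-queued turn.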

### Parity / sequencing note

#1649 intentionally kept Think and ai-chat **symmetric** (bounded-wait) because #1642 (N3) wants to unify the duplicated think↔ai-chat recovery/continuation layer. Diverging the barrier now (event-driven Think, bounded ai-chat) increases that drift. Options:
- (a) Do the event-driven redesign for **both** as part of (or just before) #1642's unification, so the unified layer ships the better mechanism once.
- (b) Make Think event-driven now (clear win, lower risk since its barrier is already pre-enqueue) and follow with ai-chat's pre-queue gate separately.

## Alternative (smaller) mitigations

If the full redesign isn't worth it yet:
- Make `AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS` configurable (a knob — but this cuts against the defaults-over-APIs principle).
- Raise the default to better cover human responses (at the cost of a longer orphan hold).

Neither is as clean as event-driven; filing this to track the proper fix.

## Acceptance criteria

- A client-resolved tool with no `execute` emitted **in parallel** with a fast tool: the human takes >60s to answer; the continuation does **not** fire through / error the human tool — it fires once the human answers, producing exactly one continuation with both results settled.
- A client disconnect mid-batch does **not** hold the Think isolate alive on a fixed timer.
- The #1649 common case is unchanged: staggered parallel results coalesce into exactly one continuation, no duplicate fire (regression-guard the double-fire race #1649 hardened: a sibling re-arming the coalesce timer must not enqueue a second continuation).
- Decide and document the Think/ai-chat parity path (with #1642).

## References

- Originating fix: #1649 (parallel client-tool continuation barrier).
- Related: #1642 (unify the duplicated think↔ai-chat layer), #1627 (server-approval continuations), the `ask_user`/`repairInterruptedToolPart` work (#1635).
