@cloudflare/think@0.9.0

github-actions released this 12 Jun 16:46


ef85f0a

Minor Changes

#1656 4c2d1a7 Thanks @cjol! - Rebuild the Think execute tool on the codemode connector runtime, with built-in human-in-the-loop approvals.

- Unified execute tool. createExecuteTool now builds on createCodemodeRuntime with connectors instead of a bare executor: state.* (the agent's workspace filesystem via @cloudflare/shell's StateConnector), cdp.* (browser automation via agents/browser's BrowserConnector, included automatically when env.BROWSER is bound), and tools.* (any AI SDK ToolSet adapted via @cloudflare/codemode's ToolSetConnector). Executions are durable: they are recorded on a CodemodeRuntime facet with abort-and-replay, and completed results are truncated for the model while the full value stays on the execution record.
- Agent one-liner — createExecuteTool(this) infers ctx, env.LOADER, env.BROWSER, and the workspace-backed state backend from the Think agent, and accepts an overrides object for custom tools and options. createExecuteRuntime(this) returns the underlying { runtime, connectors, tool } for host-side wiring. The runtime handle is exposed on the agent as this.codemode.
- Human-in-the-loop. Tools with needsApproval: true pause the execution durably. The paused tool output (with bounded pending-call args) flows to the model, which reports and waits. Think gains built-in callables — pendingExecutions(), approveExecution(executionId), rejectExecution(executionId, reason?) — that resolve the pause on the codemode runtime, replace the paused output in the transcript via pausedExecutionUpdate, and auto-continue the conversation so the model sees the outcome. Approval UIs must render args from pendingExecutions() (authoritative, full) rather than the transcript's pending (a truncated preview bounded for model context). Approvals survive Durable Object restarts and are safe against double-approval, expiry (expirePaused), and stale UIs. If the paused tool part is no longer in the transcript when the approval lands (e.g. compacted away), the outcome is appended as a system note instead of being dropped.
- The Think framework's generated worker entry exports the CodemodeRuntime facet class automatically (also re-exported from @cloudflare/think/server-entry).
- Think's createBrowserTools follows the rebuilt agents/browser connector model (single durable browser_execute tool, session modes, stable attach handles) — see the agents changeset.
- Model-facing guidance. createExecuteTool now renders per-namespace usage hints in the execute tool description (state.* object-argument filesystem calls, the actual tools.* method names, cdp.*), so models stop inventing a host.*/fs.* API. The load_extension description clarifies that its host bridge exists only inside extension source. The workspace bash tool description now states the workspace is mounted at / (no /workspace), and the bash sandbox no longer persists its synthetic /bin, /usr, /dev, /proc paths into the workspace (previously the first bash call wrote ~160 shell-builtin stubs into the user's workspace and flooded changedFiles).
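The approval guarantees above (double-approval, expiry, stale UIs) can be pictured as a small state machine. The sketch below is a hypothetical in-memory model, not Think's implementation: `PendingStore`, `pause`, `approve`, `reject`, and TTL-based expiry are illustrative assumptions. It shows why only a still-paused, unexpired execution can transition, so repeated or late approvals resolve to a no-op.

```typescript
// Hypothetical model of durable human-in-the-loop approvals; not the Think API.
type PauseState =
  | { status: "paused"; args: unknown; expiresAt: number }
  | { status: "approved" }
  | { status: "rejected"; reason?: string }
  | { status: "expired" };

type Resolution = "resolved" | "already-resolved" | "expired" | "unknown";

class PendingStore {
  private executions = new Map<string, PauseState>();

  // Record a paused tool call with its full (untruncated) args.
  pause(executionId: string, args: unknown, ttlMs: number, now: number): void {
    this.executions.set(executionId, { status: "paused", args, expiresAt: now + ttlMs });
  }

  // Authoritative list for an approval UI: full args, still-paused entries only.
  pending(now: number): Array<{ executionId: string; args: unknown }> {
    this.expirePaused(now);
    const out: Array<{ executionId: string; args: unknown }> = [];
    this.executions.forEach((s, executionId) => {
      if (s.status === "paused") out.push({ executionId, args: s.args });
    });
    return out;
  }

  // Move overdue pauses to "expired" so a late approval cannot resurrect them.
  expirePaused(now: number): void {
    this.executions.forEach((s, id) => {
      if (s.status === "paused" && now >= s.expiresAt) {
        this.executions.set(id, { status: "expired" });
      }
    });
  }

  approve(executionId: string, now: number): Resolution {
    return this.resolve(executionId, now, { status: "approved" });
  }

  reject(executionId: string, now: number, reason?: string): Resolution {
    return this.resolve(executionId, now, { status: "rejected", reason });
  }

  // Only a paused, unexpired execution transitions; everything else is a no-op.
  private resolve(executionId: string, now: number, next: PauseState): Resolution {
    this.expirePaused(now);
    const s = this.executions.get(executionId);
    if (!s) return "unknown"; // stale UI: id was never paused here
    if (s.status === "expired") return "expired";
    if (s.status !== "paused") return "already-resolved"; // double approval
    this.executions.set(executionId, next);
    return "resolved";
  }
}
```

Approving the same id twice returns `"resolved"` and then `"already-resolved"`, which is the property that lets two open approval UIs race safely in this model.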

Patch Changes

#1740 6c9de59 Thanks @threepointone! - Defer one-shot scheduled callbacks (and chat-recovery give-ups) on platform transients instead of consuming them mid-deploy (#1730).

A mid-execution Durable Object code-update reset surfaces storage failures in two shapes: the verbatim reset/supersede messages (already deferred), and "SqlError: SQL query failed: Network connection lost.", a wrapper that drops the CF retryable flag and slips past the reset matcher. The second shape burned the in-process retry budget inside the same few-second reset window (which outlasts the retry schedule by design) and then consumed the one-shot row on exhaustion, freezing the turn for minutes until incident re-detection. In the reported production capture, storage was healthy again 15 ms after the final attempt.
- agents — new cause-aware isPlatformTransientError classifier (exported, alongside isDurableObjectCodeUpdateReset): reset/supersede messages, retryable-flagged platform errors (excluding overloaded), and "Network connection lost.", looked up through wrapper cause chains. _executeScheduleCallback keeps in-process retries for connection-lost transients (a genuine blip heals fast) but on exhaustion of a one-shot row it now re-throws instead of swallowing, so the row survives and the alarm re-runs it in the healthy window that follows. Genuine application errors are still abandoned after maxAttempts exactly as before.
- @cloudflare/think — _handleRecoveryCallbackError now defers (re-throws) on any platform transient instead of terminalizing through a give-up whose own seal needs the storage that is down; the bookkeeping write on the defer path is best-effort. The defer path no longer marks the recovered submission error (which made the deferred re-run skip with submission_not_running — a self-defeating defer); it stays running for the re-run to pick up. The give-up now seals the incident exhausted only after the terminal writes succeed, so a transient mid-seal defers the whole give-up for an idempotent re-run instead of half-sealing.
- @cloudflare/ai-chat — same give-up seal ordering: the incident is sealed only after _exhaustChatRecovery (incl. the durable terminal record) succeeds, so a transient mid-seal preserves the one-shot row and the give-up re-runs in full on a healthy isolate.
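A cause-aware classifier in the spirit of isPlatformTransientError walks the error's cause chain, so a wrapper such as SqlError cannot hide the underlying transient. This is an illustrative sketch under assumptions, not the agents source: the reset regex is a placeholder, and the `retryable`/`overloaded` property names and the `runOneShot` helper are hypothetical. It also shows the exhaustion rule: a transient re-throws so the one-shot row survives, while an application error is abandoned. (The real scheduler treats reset messages differently from connection-lost; this sketch does not.)

```typescript
// Hypothetical cause-aware transient classifier; property names are assumptions.
type ErrorLike = { message?: unknown; cause?: unknown; retryable?: unknown; overloaded?: unknown };

// Placeholder standing in for the platform's reset/supersede messages.
const RESET_PATTERN = /code update|superseded/i;
const CONNECTION_LOST = "Network connection lost.";

function isTransientSketch(err: unknown, maxDepth = 8): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (typeof current !== "object") return false;
    const e = current as ErrorLike;
    const message = typeof e.message === "string" ? e.message : "";
    if (message.includes(CONNECTION_LOST)) return true;
    if (RESET_PATTERN.test(message)) return true;
    // Retryable-flagged errors count, except overload signals.
    if (e.retryable === true && e.overloaded !== true) return true;
    current = e.cause; // follow the wrapper chain
  }
  return false;
}

type Outcome = "done" | "abandoned";

// Hypothetical one-shot runner: retry in-process, then decide the row's fate.
async function runOneShot(callback: () => Promise<void>, maxAttempts: number): Promise<Outcome> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await callback();
      return "done"; // caller deletes the one-shot row
    } catch (err) {
      lastError = err;
    }
  }
  // Transient: re-throw so the row is kept and the alarm re-runs it later.
  if (isTransientSketch(lastError)) throw lastError;
  // Genuine application error: abandon (row consumed), as before.
  return "abandoned";
}
```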
#1737 bc43133 Thanks @cjol! - Fix the two remaining #1575 gaps in how in-band stream errors ({type: "error", errorText} chunks inside an otherwise-healthy provider stream) are observed after the fact.

Errored-stream replay (partial content was lost on reconnect). A client reconnecting after an in-band error received the terminal error frame (#1645) but not the content the model streamed before the error — the replay path only served status = 'completed' streams, so an errored stream's buffered chunks were unreachable, and the server pushes no messages on connect. ResumableStream gains replayErroredChunksByRequestId, and the resume-ACK terminal replay (_replayTerminalOnAck in both AIChatAgent and Think) now replays the errored stream's stored chunks before the done: true, error: true frame, so a reconnecting client observes the same sequence a live client did. No wire-format or schema changes: replayed chunks reuse the existing replay: true frame shape and the error text still comes from the durable terminal record.
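The replay ordering can be sketched as a pure function over a stored stream. Everything here is hypothetical (the `StoredStream` shape and the frame fields other than `replay`, `done`, and `error`); it only illustrates the rule that an errored stream's buffered chunks are sent before the terminal error frame, so the reconnecting client sees the same sequence a live client did.

```typescript
// Hypothetical shapes; only the replay/done/error flags mirror the changelog.
type StoredStream = {
  requestId: string;
  status: "streaming" | "completed" | "error";
  chunks: string[];
};

type Frame =
  | { requestId: string; body: string; replay: true }
  | { requestId: string; body: string; done: true; error: true };

// Frames a reconnecting client receives after its resume ACK (error path only).
function replayOnAck(stream: StoredStream, terminalErrorText: string | undefined): Frame[] {
  const frames: Frame[] = [];
  if (stream.status !== "error") return frames; // other paths omitted in this sketch
  // 1. Content the model streamed before the in-band error, in stored order.
  for (const chunk of stream.chunks) {
    frames.push({ requestId: stream.requestId, body: chunk, replay: true });
  }
  // 2. Then the terminal frame, its text taken from the durable terminal record.
  frames.push({ requestId: stream.requestId, body: terminalErrorText ?? "", done: true, error: true });
  return frames;
}
```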

Agent-tool error attribution (cross-run contamination). When an in-band error frame was broadcast on a child agent and the active run was unknown, the error was stamped onto every tailed run — so an unrelated turn's failure (or one of several overlapping runs) could mark healthy runs as error, and capture depended on a tailer being attached at the right moment. Frames are now attributed by the request id they carry: each agent-tool run is bound to its turn's request id when the turn starts (persisted on the run row at start rather than at terminal, so attribution survives a DO restart mid-run), and only the owning run's error/progress state is updated. Frame inspection also no longer requires an attached tailer, so error capture is independent of tailer timing.
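Attribution by request id replaces the old stamp-every-tailed-run behavior. In this hypothetical sketch (`RunRegistry` and the frame shape are illustrative, not the agents API), each run is bound to its turn's request id at start, an error frame updates only the run whose id it carries, and a frame with an unknown id updates nothing.

```typescript
// Hypothetical sketch of request-id attribution for agent-tool runs.
type RunState = {
  runId: string;
  requestId: string;
  status: "running" | "error" | "done";
  errorText?: string;
};

type ErrorFrame = { type: "error"; requestId: string; errorText: string };

class RunRegistry {
  private byRequestId = new Map<string, RunState>();

  // Bind at turn start (the changelog persists this on the run row at start).
  start(runId: string, requestId: string): void {
    this.byRequestId.set(requestId, { runId, requestId, status: "running" });
  }

  // Update only the owning run; unrelated or overlapping runs are untouched.
  onErrorFrame(frame: ErrorFrame): RunState | undefined {
    const run = this.byRequestId.get(frame.requestId);
    if (!run || run.status !== "running") return undefined;
    run.status = "error";
    run.errorText = frame.errorText;
    return run;
  }

  get(requestId: string): RunState | undefined {
    return this.byRequestId.get(requestId);
  }
}
```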
#1712 835e7b0 Thanks @threepointone! - Reclaim resumable-stream buffers from an alarm so idle chats don't leak storage (#1706)

Resumable-stream chunk buffers (cf_ai_chat_stream_*) were only swept lazily when a subsequent stream completed. A chat that received a single turn and then went idle never triggered that sweep, so its buffers lingered in the Durable Object's SQLite for the lifetime of the DO.

AIChatAgent and Think now arm a scheduled cleanup alarm whenever a stream starts and whenever it finishes (completes or errors). Arming on start guarantees that a stream whose DO is evicted mid-flight and never reaches a finish still gets a future sweep instead of leaking. This is the safety net for the non-durable path (e.g. chatRecovery: false, the AIChatAgent default): those turns don't run inside runFiber, so there's no leftover keepAlive alarm and no fiber-recovery scan, and if the client never reconnects nothing else wakes the DO. (Durable runFiber turns already self-heal — the keepAlive alarm survives eviction, wakes the DO, and recovery finalizes the stream, which arms cleanup — so arming on start is belt-and-suspenders there.) The alarm sweeps aged buffers via the retention windows below and re-arms only while reclaimable rows remain, so a fully-swept DO stops waking itself. Arming is idempotent so high-turn-count chats never accumulate cleanup schedules; the in-callback re-arm uses a fresh (non-idempotent) row so it survives the one-shot deletion of the firing schedule. No per-turn Durable Object and no change to the session DO lifecycle are required.

Retention is now split into two short, purpose-specific windows instead of a single 24h threshold: completed/errored buffers are kept for a brief 10-minute reconnect-and-replay grace (the assistant message is persisted separately, so the buffer is only needed to replay a just-finished stream or deliver a terminal error frame to a reconnecting client), while abandoned in-flight (streaming) rows are kept for 1 hour so an interrupted turn has ample time to be resumed or recovered before its buffer is presumed dead. The abandoned-row sweep keys off last chunk activity rather than stream start time, so a long-running stream that is still emitting chunks is never reclaimed mid-flight.
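The two retention windows reduce to a small predicate. This is a sketch under assumptions: the row shape and field names (`status`, `finishedAt`, `lastChunkAt`) are hypothetical, while the 10-minute and 1-hour values come from the paragraph above. Measuring abandonment from the last chunk is what keeps a long, still-emitting stream from being reclaimed.

```typescript
// Hypothetical buffer-row shape; timestamps are epoch milliseconds.
type BufferRow = {
  status: "streaming" | "completed" | "error";
  finishedAt?: number; // set when the stream completes or errors
  lastChunkAt: number; // updated on every stored chunk
};

const FINISHED_GRACE_MS = 10 * 60 * 1000; // reconnect-and-replay grace
const ABANDONED_AFTER_MS = 60 * 60 * 1000; // in-flight rows presumed dead after 1 h idle

function isReclaimable(row: BufferRow, now: number): boolean {
  if (row.status === "streaming") {
    // Measured from the last chunk, not from the stream's start.
    return now - row.lastChunkAt >= ABANDONED_AFTER_MS;
  }
  const finished = row.finishedAt ?? row.lastChunkAt;
  return now - finished >= FINISHED_GRACE_MS;
}
```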

ResumableStream gains cleanup(now?) (force a sweep, bypassing the lazy interval gate) and hasReclaimableStreams() to support alarm-driven cleanup.
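The alarm flow (idempotent arming on start and finish, a forced sweep, and re-arming only while reclaimable rows remain) can be modeled with an in-memory scheduler. All names here (`FakeScheduler`, `armCleanup`, `onCleanupAlarm`, the idempotency key, the sweep delay) are hypothetical; the `Sweepable` interface assumes a shape for the cleanup(now?) and hasReclaimableStreams() methods named above. Note why the re-arm must be keyless: a keyed re-arm would match the firing row, which is deleted right after, leaving no future sweep.

```typescript
// Assumed shape of the sweep surface; not the actual ResumableStream typings.
interface Sweepable {
  cleanup(now?: number): void;
  hasReclaimableStreams(): boolean;
}

type Schedule = { id: number; at: number; key?: string };

class FakeScheduler {
  private nextId = 1;
  schedules: Schedule[] = [];

  // Idempotent when a key is given: an existing schedule with that key is reused.
  schedule(at: number, key?: string): Schedule {
    if (key !== undefined) {
      const existing = this.schedules.find((s) => s.key === key);
      if (existing) return existing;
    }
    const s: Schedule = { id: this.nextId++, at, key };
    this.schedules.push(s);
    return s;
  }

  remove(id: number): void {
    this.schedules = this.schedules.filter((s) => s.id !== id);
  }
}

const CLEANUP_KEY = "stream-cleanup"; // hypothetical idempotency key
const SWEEP_DELAY_MS = 10 * 60 * 1000; // assumed delay

// Called on stream start and on stream finish; repeated calls add no rows.
function armCleanup(scheduler: FakeScheduler, now: number): void {
  scheduler.schedule(now + SWEEP_DELAY_MS, CLEANUP_KEY);
}

// Alarm callback: sweep, re-arm with a fresh keyless row only if work remains,
// then the firing one-shot row is deleted.
function onCleanupAlarm(scheduler: FakeScheduler, stream: Sweepable, firing: Schedule, now: number): void {
  stream.cleanup(now);
  if (stream.hasReclaimableStreams()) {
    scheduler.schedule(now + SWEEP_DELAY_MS);
  }
  scheduler.remove(firing.id);
}
```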
#1741 1d8641d Thanks @threepointone! - Prevent cancelled durable submissions from appending their messages when they were already claimed but still waiting behind an active turn.
#1713 18c438b Thanks @threepointone! - Support client tools on the Think sub-agent chat() RPC path (#1709)

ChatOptions now accepts clientTools (the same ClientToolSchema[] carried over the WebSocket chat protocol) and an onClientToolCall executor. This lets a parent agent that drives a Think sub-agent over chat() expose client-defined tools to the sub-agent and complete the tool round trip within the same turn:
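```typescript
await child.chat(message, callback, {
  signal,
  clientTools: [
    { name: "get_user_timezone", parameters: { type: "object" } }
  ],
  // Resolves the call inline so the turn can continue.
  // runClientTool is application-defined (illustrative), not part of Think.
  onClientToolCall: async ({ toolName, input }) =>
    runClientTool(toolName, input)
});
```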
Without onClientToolCall, the schemas are still registered and the model's call is surfaced through the stream callback (execute-less), matching the WebSocket behavior. With it, the call is resolved inline so the turn can continue to completion — the RPC stream callback has no inbound result channel of its own.
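The chat() example leaves runClientTool to the caller. One possible shape, purely hypothetical (the handler map, the handler body, and returning an error object for unknown tools are assumptions, not Think behavior), is a name-keyed dispatcher owned by the parent agent. A `Map` is used rather than a plain object so names like `toString` cannot hit inherited properties.

```typescript
// Hypothetical dispatcher for client-defined tools; not a Think or agents API.
type ClientToolHandler = (input: unknown) => Promise<unknown>;

const clientToolHandlers = new Map<string, ClientToolHandler>([
  // Illustrative handler matching the get_user_timezone schema.
  ["get_user_timezone", async () => ({ timezone: Intl.DateTimeFormat().resolvedOptions().timeZone })],
]);

async function runClientTool(toolName: string, input: unknown): Promise<unknown> {
  const handler = clientToolHandlers.get(toolName);
  if (!handler) {
    // Assumed convention: return a readable error result instead of throwing.
    return { error: `Unknown client tool: ${toolName}` };
  }
  return handler(input);
}
```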

Unlike the WebSocket path, the schemas and executor are kept per-turn and are NOT persisted: the executor is a live RPC reference that cannot survive an eviction, and there is no SPA to replay a tool-result. This keeps chat recovery correct — an eviction-interrupted client-tool call is repaired like a server tool (the model proceeds) rather than being mistaken for a pending human interaction and parking forever.

agents/chat's createToolsFromClientSchemas gains an optional { execute } delegate (and exports a new ClientToolExecutor type) to build the executable variant. Both additions are backward-compatible.
#1724 c18a446 Thanks @whoiskatrin! - Stop oversized sessions from permanently bricking the Durable Object with SQLITE_NOMEM on wake (#1710).

A throw out of onStart is terminal: partyserver resets its init state and rethrows, so every wake — including platform alarm retries — re-runs the failing onStart forever, and the failure survives redeploys because it is driven by stored data. Long-lived media-heavy sessions hit exactly this once eager full-transcript hydration approached the isolate's memory budget. Four changes:
- onStart degrades instead of throwing. Transcript hydration, declared scheduled-task reconciliation, and durable submission/workflow recovery are now best-effort: failures are recorded (readable via the new public getOnStartDegradations()), logged with remediation hints, and emitted as chat:onstart:degraded observability events, and the agent comes up reachable. The user-defined onStart() is intentionally NOT guarded.
- hydrationByteBudget (default 24MB). Cache refreshes hydrate at most this many stored bytes; an oversized transcript boots as a bounded window of the most recent messages — never fewer than the read-time truncation span the model sees at full fidelity (4 messages), so windowing cannot starve the model's context — and emits chat:hydration:windowed (on change, not on every sync). Durable storage is never truncated by this; session.getHistory() still reads the full path. Set to Infinity to restore unbounded hydration.
- mediaEviction (default on). Background passes rewrite oversized inline media (large data: URL file parts and large strings nested in tool outputs) in messages that have aged out of the recent window, replacing them with size/path markers. By default the original bytes are preserved as workspace files under /attachments/evicted/ (written BEFORE the row is rewritten, so no pass can lose data); set externalizeToWorkspace: false to drop them instead, or set mediaEviction to false to disable eviction entirely. Passes are memory-bounded: row sizes come from getHistoryRowStats(), only rows large enough to contain an evictable value are parsed, one at a time, and rewrites use the session's silent maintenance path so no per-row full-history token estimate runs. When a pass stops at maxRowsPerPass with a backlog, the next pass is scheduled automatically. Providers without row-stats support log a one-time warning instead of silently no-opping.
- Plain text parts are never evicted, and keepRecentMessages is clamped to at least the read-time truncation window (4) so eviction can never rewrite content the model still sees at full fidelity.
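The hydration windowing rule (keep the most recent messages that fit the byte budget, but never fewer than the 4-message full-fidelity span) can be written as a backwards scan. The message shape and function name are hypothetical, and per-row stored sizes are assumed to be available. The function only selects a window; stored history is untouched.

```typescript
// Hypothetical row shape: `bytes` is the stored size of each message row.
type StoredMessage = { id: string; bytes: number };

const MIN_WINDOW = 4; // read-time truncation span seen at full fidelity

// Returns the suffix of `history` (oldest first) to hydrate into the cache.
function hydrationWindow(history: StoredMessage[], byteBudget: number): StoredMessage[] {
  let total = 0;
  let start = history.length;
  // Walk backwards from the newest message while the budget allows.
  while (start > 0) {
    const next = history[start - 1];
    const withinFloor = history.length - start < MIN_WINDOW;
    if (!withinFloor && total + next.bytes > byteBudget) break;
    total += next.bytes;
    start--;
  }
  return history.slice(start);
}
```

With `byteBudget` set to `Infinity`, the scan never breaks and the whole history is returned, matching the unbounded setting described above.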
#1715 5f6003f Thanks @threepointone! - Support experimental_transform on TurnConfig. The transform(s) returned from beforeTurn are now forwarded to streamText in the inference loop, so callers can inspect or rewrite the stream — for example, detecting tool results that carry { content, sources } and enqueuing additional source parts via the transform's controller. Accepts a single transform or an array applied in order. Closes #1714.
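A transform like the one described, spotting tool results that carry { content, sources } and enqueuing extra source parts, follows the standard TransformStream pattern. The part shapes below are simplified hypothetical stand-ins, not the AI SDK's actual stream-part types, and the beforeTurn wiring is omitted. The per-part logic lives in a pure function so it can be checked without a stream.

```typescript
// Simplified, hypothetical stream-part shapes (not the AI SDK's exact types).
type Source = { url: string; title?: string };
type Part =
  | { type: "text"; text: string }
  | { type: "tool-result"; toolName: string; output: unknown }
  | { type: "source"; url: string; title?: string };

function hasSources(output: unknown): output is { content: unknown; sources: Source[] } {
  return (
    typeof output === "object" &&
    output !== null &&
    "content" in output &&
    Array.isArray((output as { sources?: unknown }).sources)
  );
}

// Pass every part through; after a tool result with sources, append one source part per source.
function expandPart(part: Part): Part[] {
  if (part.type === "tool-result") {
    const output = part.output;
    if (hasSources(output)) {
      const extra = output.sources.map((s): Part => ({ type: "source", url: s.url, title: s.title }));
      return [part, ...extra];
    }
  }
  return [part];
}

// A fresh TransformStream per turn that enqueues the expanded parts via its controller.
function sourcesTransform(): TransformStream<Part, Part> {
  return new TransformStream<Part, Part>({
    transform(part, controller) {
      for (const p of expandPart(part)) controller.enqueue(p);
    },
  });
}
```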
Updated dependencies [b2b6762, 4c2d1a7, 4c2d1a7, 7bcd1b1, 4c2d1a7]:
- @cloudflare/codemode@0.4.0
- create-think@0.0.4
- @cloudflare/shell@0.4.0
