Version Packages #1660
Merged
This PR was opened by the Changesets release GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.
Releases
agents@0.14.1
Patch Changes
#1659
f99f890 Thanks @threepointone! - Recover one-shot scheduled work (alarms) killed by a `"This script has been upgraded…"` deploy/code-update, not just `"Durable Object reset because its code was updated."`.
`_executeScheduleCallback` previously re-ran a one-shot schedule row after a superseded-isolate error only if the error matched `/reset because its code was updated/i`. The platform also surfaces the same failure class as `"This script has been upgraded. Please send a new request to connect to the new version."` (a stub/connection to a superseded script), which fell through to the swallow-and-delete branch: the one-shot row was deleted and the work abandoned. For a queued submission this orphaned the pending row with no driver (no alarm, no retry) until something unrelated woke the Durable Object, leaving the user on an indefinite spinner.
The superseded-isolate matcher now recognizes both messages, so either causes the row to be preserved and re-run on the fresh isolate under the at-least-once alarm guarantee.
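A minimal sketch of such a matcher, assuming a hypothetical helper name `isSupersededIsolateError` and regular expressions derived only from the two messages quoted above; the actual pattern inside `_executeScheduleCallback` may differ:

```typescript
// Hypothetical helper, not the library's implementation. The patterns are
// assumptions derived from the two platform messages quoted in this entry.
const SUPERSEDED_ISOLATE_PATTERNS: RegExp[] = [
  /reset because its code was updated/i,
  /this script has been upgraded/i
];

// Returns true when the error message looks like an isolate replacement,
// i.e. the one-shot schedule row should be kept and re-run.
function isSupersededIsolateError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return SUPERSEDED_ISOLATE_PATTERNS.some((pattern) => pattern.test(message));
}
```

In this sketch, any message that matches neither pattern is treated as an ordinary failure rather than an isolate replacement.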
`"Network connection lost."` is intentionally not included (it is a connection error that may succeed on in-process retry, not an isolate replacement).
#1661
41315b6 Thanks @threepointone! - Enforce the `tool_use.input` invariant at the chat write boundary.
A streamed tool call that finishes with no `input_json_delta` events (the model called the tool with no args), or whose input surfaces as a stringified JSON blob, could persist a non-object `input`: `null`, `undefined`, `""`, an array, or a raw string. The Anthropic Messages API requires `tool_use.input` to be a JSON object and rejects every subsequent turn with `tool_use.input: Input should be an object` (verified against the live API: `{}` → 200, but `""`, `[]`, and `[{...}]` all → 400). Because the bad shape lives in durable storage, the session is wedged across reconnects, redeploys, and DO evictions.
`applyChunkToParts` (the shared accumulator used by `@cloudflare/ai-chat` and `@cloudflare/think`) now normalizes the finalized tool `input` on `tool-input-available`/`tool-input-error`: a plain object passes through untouched, a stringified-JSON object is parsed, and everything else (`null`/`undefined`/`""`/arrays/primitives/unparseable strings) collapses to `{}`. A new `normalizeToolInput` helper is exported from `agents/chat` so read-side transcript repair can enforce the same invariant.
#1665
13d6db0 Thanks @threepointone! - Await Chat SDK state-agent cleanup scheduling during startup so tests and short-lived worker isolates do not leave dangling cleanup work.
#1666
01a0b35 Thanks @dcartertwo! - Fix MCP OAuth PKCE verifier lookup for overlapping authorization attempts.
`DurableObjectOAuthClientProvider` now binds pending PKCE verifiers to the OAuth callback state instead of storing a single verifier per client/server. Callback handling runs token exchange and verifier cleanup in the returned state's context, so older auth windows and retry churn no longer exchange an authorization code with another attempt's verifier.
@cloudflare/ai-chat@0.8.1
Patch Changes
#1661
41315b6 Thanks @threepointone! - Heal a malformed `tool_use.input` when loading persisted messages.
`AIChatAgent` delegates `convertToModelMessages` to your `onChatMessage`, so it has no framework-side pre-send pass to repair a transcript. A session that persisted a non-object tool `input` (`null`, `undefined`, `""`, an array, or a raw string) before the write-side guard shipped would therefore keep 400ing with `tool_use.input: Input should be an object` on every turn, wedged across reconnects/redeploys/evictions.
`autoTransformMessage` (run on every load) now normalizes malformed tool inputs to `{}` (parsing stringified-JSON objects, and leaving healthy object inputs untouched), so existing wedged sessions self-heal on their next load without per-DO storage surgery. Healthy messages are returned by reference, so the persistence cache stays a no-op for them.
#1654
f34cd30 Thanks @cjol! - Fix `isStreaming` staying true after aborting during server-side tool calls.
#1657
7bff8d7 Thanks @threepointone! - fix(think): serialize parallel client-tool result/approval applies so siblings aren't clobbered (#1649 follow-up)
The auto-continuation barrier added in #1651 stopped premature continuation, but a deeper race remained in Think. Each `tool-result`/`tool-approval` WebSocket message fired an independent read-modify-write of the whole assistant message, and `_applyToolUpdateToMessages` awaits a storage read before its write. When the model fanned out parallel tool calls, the concurrent applies all read the same `input-available` snapshot, each patched only its own part, and the last write clobbered its siblings back to `input-available`. The continuation barrier then timed out and the transcript-repair backstop errored the lost calls with "The tool call was interrupted before a result was recorded."
Applies are now chained off a serialization tail so each read-modify-write commits atomically in arrival order.
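The race and the tail-chaining fix can be illustrated with a small self-contained sketch. Everything here (`FakeStorage`, `applyToolResult`, `SerializedQueue`, `demo`) is hypothetical and stands in for the real storage and apply path; it shows the general promise-chaining technique, not Think's actual code:

```typescript
type ToolPartState = "input-available" | "output-available";
type AssistantMessage = Record<string, ToolPartState>; // toolCallId -> state

// Hypothetical in-memory stand-in for durable storage. The snapshot is taken
// when read() is called and delivered one microtask later, which is enough
// to reproduce the stale-snapshot race.
class FakeStorage {
  private message: AssistantMessage;

  constructor(initial: AssistantMessage) {
    this.message = { ...initial };
  }

  async read(): Promise<AssistantMessage> {
    const snapshot = { ...this.message };
    await Promise.resolve(); // simulated read latency
    return snapshot;
  }

  write(next: AssistantMessage): void {
    this.message = { ...next };
  }

  current(): AssistantMessage {
    return { ...this.message };
  }
}

// Read-modify-write of the whole message, patching only one tool part.
async function applyToolResult(storage: FakeStorage, toolCallId: string): Promise<void> {
  const message = await storage.read();
  message[toolCallId] = "output-available";
  storage.write(message);
}

// Each enqueued task starts only after the previous one has settled.
class SerializedQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: () => Promise<void>): Promise<void> {
    const run = this.tail.then(task);
    this.tail = run.catch(() => undefined); // a failed task must not block later ones
    return run;
  }
}

async function demo(): Promise<{ racy: AssistantMessage; serialized: AssistantMessage }> {
  const initial: AssistantMessage = { a: "input-available", b: "input-available" };

  // Unserialized: both applies read the same snapshot, so one result is lost.
  const racyStore = new FakeStorage(initial);
  await Promise.all([applyToolResult(racyStore, "a"), applyToolResult(racyStore, "b")]);

  // Serialized: the second apply reads only after the first has written.
  const safeStore = new FakeStorage(initial);
  const queue = new SerializedQueue();
  await Promise.all([
    queue.enqueue(() => applyToolResult(safeStore, "a")),
    queue.enqueue(() => applyToolResult(safeStore, "b"))
  ]);

  return { racy: racyStore.current(), serialized: safeStore.current() };
}
```

Awaiting `demo()` leaves one part of `racy` reverted to `input-available`, while both parts of `serialized` end up `output-available`.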
`_pendingInteractionPromise` still tracks the newest link, so the barrier's single-slot wake-up transitively waits for every predecessor.
The same serialization is applied to `@cloudflare/ai-chat` defensively: its apply is currently synchronous (no await between the message read and the SQLite write), so it does not exhibit this clobber today, but the queue keeps the invariant safe if that ever changes.
@cloudflare/think@0.8.1
Patch Changes
#1657
7bff8d7 Thanks @threepointone! - fix(think): apply client-tool results that arrive mid-stream so they aren't dropped (#1649 follow-up)
The serialization fix in #1657 stopped parallel results from clobbering each other, but a deeper window remained: during a streaming turn the assistant message lives only in the in-flight `StreamAccumulator` until `_persistAssistantMessage` writes it at the turn boundary. The `tool-input-available` chunk is broadcast to the client mid-stream, so a fast client can resolve the tool and send `cf_agent_tool_result` before the message is ever persisted. `_applyToolUpdateToMessages` only scanned durable storage, so the apply silently no-op'd, the end-of-stream persist then wrote `input-available`, and the auto-continuation's transcript repair errored the call with "The tool call was interrupted before a result was recorded."
`_applyToolUpdateToMessages` now applies the update to the in-flight accumulator (in place, so it rides into the eventual persist) in addition to durable storage, mirroring `@cloudflare/ai-chat`'s `_streamingMessage` handling. The accumulator is exposed via `_streamingAssistant` for the duration of each streaming turn and cleared on every exit path and on `resetTurnState`. Applying to both locations is monotonic, so a stall-recovery partial persist can't downgrade an already-applied result back to `input-available`.
#1665
13d6db0 Thanks @threepointone! - Avoid starting empty submission and workflow notification drains during agent startup, preventing short-lived facet initializations from leaving background keep-alive work behind.
#1661
41315b6 Thanks @threepointone! - Unwedge sessions corrupted by a malformed `tool_use.input`, and make the failure observable.
Read-side repair gap. Transcript repair already normalized a `null`/`undefined`/stringified-JSON tool input, but left an empty string `""`, an array, and other non-object primitives untouched, so a session that persisted one of those shapes before the write-side guard shipped kept 400ing forever with `tool_use.input: Input should be an object` (Anthropic rejects array inputs the same way it rejects `""`/`null`). `_normalizeToolInput` now delegates to the shared `normalizeToolInput`, collapsing any non-object input to `{}` so the pre-send repair pass rescues the session on its next turn.
Observability. An AI-SDK provider error surfaces as a stream error part, not a thrown exception, so it took the in-band `error` branch that emitted `message:error` but never `chat:request:failed`. That branch now also emits `chat:request:failed` (`stage: "stream"`), so observers and turn-count telemetry see the post-`beforeTurn`, in-stream failure class without needing to know whether the error threw or arrived as a chunk.
#1657
7bff8d7 Thanks @threepointone! - fix(think): serialize parallel client-tool result/approval applies so siblings aren't clobbered (#1649 follow-up)
The auto-continuation barrier added in #1651 stopped premature continuation, but a deeper race remained in Think. Each `tool-result`/`tool-approval` WebSocket message fired an independent read-modify-write of the whole assistant message, and `_applyToolUpdateToMessages` awaits a storage read before its write. When the model fanned out parallel tool calls, the concurrent applies all read the same `input-available` snapshot, each patched only its own part, and the last write clobbered its siblings back to `input-available`. The continuation barrier then timed out and the transcript-repair backstop errored the lost calls with "The tool call was interrupted before a result was recorded."
Applies are now chained off a serialization tail so each read-modify-write commits atomically in arrival order.
`_pendingInteractionPromise` still tracks the newest link, so the barrier's single-slot wake-up transitively waits for every predecessor.
The same serialization is applied to `@cloudflare/ai-chat` defensively: its apply is currently synchronous (no await between the message read and the SQLite write), so it does not exhibit this clobber today, but the queue keeps the invariant safe if that ever changes.
#1659
f99f890 Thanks @threepointone! - Fix two chat-recovery failures that could leave a turn wedged at a half-finished assistant message after a deploy/eviction, with no terminal banner.
Server-tool recovery deadlock. When a server-side tool's `execute()` was interrupted by an eviction, the recovered turn's orphaned tool part was left at `input-available`, but no client `tool-result` will ever arrive for a server tool, so `waitUntilStable` could never converge. The recovery continuation burned its whole attempt budget on a wait that could not succeed. `waitUntilStable` now treats an `input-available` part as pending only when it is genuinely client-resolvable (a registered client tool whose result the SPA can replay, or an `approval-requested` part). A dead server-tool orphan no longer blocks stability, so recovery converges and the existing transcript-repair pass flips the orphan to an errored result and the model continues the turn.
Silent seal on a thrown recovery callback. A non-reset error thrown by `_chatRecoveryContinue`/`_chatRecoveryRetry` was re-thrown and then swallowed by the scheduler, which deleted the one-shot recovery alarm row, terminating the turn with no `onExhausted` event and no terminal banner. The recovery callbacks now terminalize a non-reset throw through the same exhaustion path (firing `onExhausted` with reason `recovery_error` and delivering the `terminalMessage`), while still re-throwing a genuine Durable Object code-update reset so the platform re-runs recovery on the fresh isolate. The terminal banner is also now broadcast before the bookkeeping storage writes in the exhaustion path, and those writes are best-effort, so a storage failure during give-up can no longer suppress the user-visible terminalization.
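The recovery-callback behaviour described above can be sketched as a wrapper. The names `RecoveryHooks`, `isCodeUpdateReset`, `runRecoveryCallback`, and `demo` are hypothetical, the reset pattern is an assumption, and the relative order of the banner and `onExhausted` is a guess; only "banner before best-effort bookkeeping" and "re-throw a code-update reset" come from the entry itself:

```typescript
type ExhaustedReason = "recovery_error";

// Hypothetical hooks standing in for the agent's real exhaustion path.
interface RecoveryHooks {
  broadcastTerminal(message: string): void;
  onExhausted(reason: ExhaustedReason): void;
  writeBookkeeping(): Promise<void>;
}

// Assumed matcher for a genuine Durable Object code-update reset.
function isCodeUpdateReset(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /reset because its code was updated/i.test(message);
}

async function runRecoveryCallback(
  callback: () => Promise<void>,
  hooks: RecoveryHooks,
  terminalMessage: string
): Promise<void> {
  try {
    await callback();
  } catch (error) {
    // A reset is re-thrown so recovery can be re-run on the fresh isolate.
    if (isCodeUpdateReset(error)) throw error;

    // User-visible terminalization happens before any storage write.
    hooks.broadcastTerminal(terminalMessage);
    hooks.onExhausted("recovery_error");

    try {
      await hooks.writeBookkeeping();
    } catch {
      // Best-effort: a storage failure must not undo the terminalization.
    }
  }
}

async function demo(): Promise<string[]> {
  const events: string[] = [];
  const hooks: RecoveryHooks = {
    broadcastTerminal: (message) => {
      events.push(`banner:${message}`);
    },
    onExhausted: (reason) => {
      events.push(`exhausted:${reason}`);
    },
    writeBookkeeping: async () => {
      events.push("bookkeeping");
      throw new Error("storage unavailable");
    }
  };

  // Non-reset throw: terminalized, and the failing storage write is tolerated.
  await runRecoveryCallback(async () => {
    throw new Error("boom");
  }, hooks, "Recovery failed");

  // Reset throw: propagated to the caller untouched.
  try {
    await runRecoveryCallback(async () => {
      throw new Error("Durable Object reset because its code was updated.");
    }, hooks, "Recovery failed");
  } catch {
    events.push("rethrown-reset");
  }

  return events;
}
```

In this sketch the banner is recorded even though the bookkeeping write fails, and the reset error reaches the caller without triggering the exhaustion hooks.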