fix(chat): orphaned-stream recovery no longer merges a new turn into the previous message (#1691) by threepointone · Pull Request #1693 · cloudflare/agents

threepointone · 2026-06-06T16:42:09Z

Summary

Fixes #1691. When an AIChatAgent stream is interrupted before its assistant message is persisted (Durable Object hibernation, deploy churn, isolate restart, reconnect), orphan recovery reconstructs the message from stored chunks. If those chunks carry no provider start.messageId — the common case with streamText(...).toUIMessageStreamResponse(), where the id is assigned client-side — recovery used to fall back to the last assistant message in history.

That is correct for a continuation, but wrong for a normal new turn after a later user message: the recovered chunks were appended onto the previous assistant message, corrupting both the persisted transcript and future model context.

The fix

Core

- ResumableStream now persists the allocated assistant message id in stream metadata (`message_id` column, added via a one-time, schema-checked migration) and exposes `getStreamMessageId()`.
- `_persistOrphanedStream` keys recovery on that stored id when the chunks carry no provider `start.messageId`, so a new turn becomes its own message and a continuation still merges into the message it was extending (it stored the cloned last-assistant id). A provider `start.messageId` still wins when present. Pre-migration rows keep the legacy last-assistant fallback.
- Dropped the now-unused `is_continuation` metadata column.
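
The precedence described above (provider id, then stored id, then legacy fallback) can be sketched as follows. This is a minimal illustration, not the code in `_persistOrphanedStream`: the `Chunk`, `Message`, and `RecoveryInput` shapes and the `resolveRecoveryTargetId` name are hypothetical.

```typescript
// Hypothetical shapes for illustration only; not the actual @cloudflare/ai-chat types.
type Chunk = { type: string; messageId?: string };
type Message = { id: string; role: "user" | "assistant" };

interface RecoveryInput {
  chunks: Chunk[]; // stored stream chunks for the orphaned stream
  storedMessageId: string | null; // assumed: null models a pre-migration row
  history: Message[]; // persisted conversation, oldest first
}

function resolveRecoveryTargetId(input: RecoveryInput): string | undefined {
  // 1. A provider-supplied start.messageId wins when present.
  const start = input.chunks.find((c) => c.type === "start" && c.messageId);
  if (start?.messageId) return start.messageId;

  // 2. Otherwise use the id that was allocated and stored when the stream began.
  if (input.storedMessageId) return input.storedMessageId;

  // 3. Legacy fallback for pre-migration rows: the last assistant message.
  return [...input.history].reverse().find((m) => m.role === "assistant")?.id;
}
```

Under these assumptions, a new turn whose chunks carry no provider id resolves to its own stored id rather than to the previous assistant message, which is the behavior the fix is after.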

Two related variants of the same corruption (found during review, fixed here)

- Duplicate tool parts: early-persist + recovery (e.g. a tool-approval pause) re-appended chunks it had already stored, duplicating a tool call's parts. Recovery now skips reconstructed parts whose `toolCallId` already exists on the message.
- Lost-partial new turns: a new turn interrupted before any assistant part was persisted (cut off before the first chunk materialized, or discarded via `onChatRecovery` returning `{ persist: false }`) was "continued" by cloning the previous assistant message and merging into it. `_handleInternalFiberRecovery` now detects that the conversation leaf is still the unanswered user message (no partial to continue) and re-runs the turn fresh, so it becomes its own message.
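
The duplicate-tool-part guard amounts to filtering reconstructed parts against the `toolCallId`s already on the message. A minimal sketch, assuming a simplified `Part` union and a hypothetical `mergeRecoveredParts` helper (the real message part types are richer):

```typescript
// Simplified, hypothetical part shape; only toolCallId matters for the guard.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; toolCallId: string; state: string };

function mergeRecoveredParts(existing: Part[], recovered: Part[]): Part[] {
  // Collect the tool call ids that early-persist already wrote to the message.
  const seen = new Set<string>();
  for (const part of existing) {
    if (part.type === "tool") seen.add(part.toolCallId);
  }

  const merged = [...existing];
  for (const part of recovered) {
    // Skip reconstructed tool parts that the message already holds.
    if (part.type === "tool" && seen.has(part.toolCallId)) continue;
    merged.push(part);
  }
  return merged;
}
```

Keying on `toolCallId` rather than on part position means the guard does not depend on how many chunks were stored before the pause.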

@cloudflare/think is unaffected — its session-tree recovery already allocates a distinct message id per orphan and never falls back to the last assistant message.

Tests

- New regression + wiring tests in durable-chat-recovery, resumable-streaming, and the test worker, including the fiber-continuation happy path and the two edge cases (empty partial, `persist: false`) that previously merged.
- Full recovery suites + `pnpm run check` (93 projects) green.

Verification (real LLMs)

A SIGKILL-mid-stream / restart harness (wip/issue-1691-live/, included and documented) drives the exact #1691 sequence against real models:

- Isolation: across Workers AI, OpenAI, and Anthropic (and Think), the recovered turn always lands as its own message and the previous turn is untouched.
- Continuation quality (large partials): all three providers continue cleanly (0 resets, 0 duplicates); OpenAI and Anthropic resume a truncated partial to completion. Methodology note: the continuation runs in a scheduled alarm ~10–13s after recovery, so the harness waits for the message to stabilize before measuring.

Changeset

@cloudflare/ai-chat + agents — patch.

…the previous message (#1691)

When an AIChatAgent stream is interrupted before its assistant message is persisted (Durable Object hibernation, deploy churn, isolate restart, reconnect), orphan recovery reconstructs the message from stored chunks. If the chunks carry no provider `start.messageId` (the common case with `streamText(...).toUIMessageStreamResponse()`, where the id is assigned client-side), recovery used to fall back to the LAST assistant message in history.

That is correct for a continuation, but wrong for a normal new turn after a later user message: the recovered chunks were appended onto the PREVIOUS assistant message, corrupting both the persisted transcript and future model context.

Core fix

- ResumableStream now persists the allocated assistant message id in stream metadata (`message_id` column, added via a one-time, schema-checked migration) and exposes `getStreamMessageId()`.
- `_persistOrphanedStream` keys recovery on that stored id when the chunks carry no provider `start.messageId`, so a new turn becomes its own message and a continuation still merges into the message it was extending (it stored the cloned last-assistant id). A provider `start.messageId` still wins when present. Pre-migration rows keep the legacy last-assistant fallback.
- Dropped the now-unused `is_continuation` metadata column.

Two related variants of the same corruption on the durable (chatRecovery) continuation path, found during review and fixed here:

- Early-persist + recovery (e.g. a tool-approval pause) re-appended chunks it had already stored, duplicating a tool call's parts. Recovery now skips reconstructed parts whose `toolCallId` already exists on the message.
- A new turn interrupted before any assistant part was persisted (cut off before the first chunk materialized, or discarded via `onChatRecovery` returning `{ persist: false }`) was "continued" by cloning the previous assistant message and merging into it. `_handleInternalFiberRecovery` now detects that the conversation leaf is still the unanswered user message (no partial to continue) and re-runs the turn fresh, so it becomes its own message.

@cloudflare/think is unaffected: its session-tree recovery already allocates a distinct message id per orphan and never falls back to the last assistant message.

Tests

- New regression + wiring tests in durable-chat-recovery, resumable-streaming, and the test worker, including the fiber-continuation happy path and the two edge cases (empty partial, `persist: false`) that previously merged.

Verification

- Verified live against real LLMs (Workers AI, OpenAI, Anthropic) and Think via a SIGKILL-mid-stream / restart harness (wip/issue-1691-live): the recovered turn always lands as its own message and the previous turn is untouched.
- Cross-model continuation with large partials is clean (no duplication, no restarts); OpenAI and Anthropic resume a truncated partial to completion. The harness and its methodology notes are documented in its README.

changeset-bot · 2026-06-06T16:42:13Z

🦋 Changeset detected

Latest commit: 63f8da4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

| Name | Type |
| --- | --- |
| @cloudflare/ai-chat | Patch |
| agents | Patch |


devin-ai-integration

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

pkg-pr-new · 2026-06-06T16:49:25Z


agents

npm i https://pkg.pr.new/agents@1693

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1693

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1693

hono-agents

npm i https://pkg.pr.new/hono-agents@1693

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1693

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1693

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1693

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1693

commit: 63f8da4

- Report `recoveryKind: "retry"` to `onChatRecovery` and the incident record for an empty-partial new turn (interrupted before any chunk), since that case is deterministically a retry: it is knowable before the hook runs. The `persist: false` sibling case still reports "continue" (it only becomes a retry based on the hook's own return value) and the comment documents why.
- Await `_persistOrphanedStream` in the `triggerInterruptedStreamCheck` test helper so it matches the production fiber-recovery path (latent test-only race, harmless in practice but now correct).
- Rename the two `wip/` package.json names to the `@cloudflare/agents-*` prefix so changesets' ignore glob excludes them from versioning/release.
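
The distinction between the two lost-partial cases can be sketched as a pair of decisions: what kind is reported to the hook before it runs, and whether the turn is re-run fresh after the hook returns. This is an illustrative sketch only; `RecoveryState`, `reportedRecoveryKind`, and `shouldRerunFresh` are hypothetical names, and the real logic in `_handleInternalFiberRecovery` may weigh more signals.

```typescript
// Hypothetical types and names; a sketch of the reasoning, not the library's code.
type RecoveryKind = "retry" | "continue";

interface RecoveryState {
  leafRole: "user" | "assistant"; // role of the conversation leaf at recovery time
  storedChunkCount: number; // chunks stored before the interruption
}

// Decided BEFORE the hook runs: an empty partial on an unanswered user
// message is knowably a retry; everything else is reported as "continue".
function reportedRecoveryKind(state: RecoveryState): RecoveryKind {
  return state.leafRole === "user" && state.storedChunkCount === 0
    ? "retry"
    : "continue";
}

// Decided AFTER the hook returns: if nothing was persisted (empty partial, or
// the hook returned { persist: false }) and the leaf is still the unanswered
// user message, there is no partial to continue, so the turn is re-run fresh.
function shouldRerunFresh(
  state: RecoveryState,
  hookResult: { persist?: boolean }
): boolean {
  const nothingPersisted =
    state.storedChunkCount === 0 || hookResult.persist === false;
  return state.leafRole === "user" && nothingPersisted;
}
```

In this sketch the `persist: false` case reports "continue" up front yet still ends in a fresh re-run, mirroring the asymmetry the commit message describes.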

devin-ai-integration Bot reviewed Jun 6, 2026


Comment thread packages/ai-chat/src/tests/worker.ts

Comment thread packages/ai-chat/src/index.ts

threepointone mentioned this pull request Jun 6, 2026

AIChatAgent orphaned stream recovery can merge a new assistant response into the previous assistant message #1691

Closed

threepointone merged commit 6496c80 into main Jun 6, 2026
4 checks passed

threepointone deleted the fix-orphan-stream-message-merge-1691 branch June 6, 2026 23:59

github-actions Bot mentioned this pull request Jun 7, 2026

Version Packages #1689

Merged
