Fix chat continuation failing on models that reject assistant-prefill #1618
…prefill)
Continuing a partial assistant turn (e.g. after a deploy interrupts a stream)
replays a transcript whose final message is that partial assistant message.
Modern chat models reject a request ending in an assistant message — Anthropic
Claude 4.6+ returns a 400 ("This model does not support assistant message
prefill. The conversation must end with a user message."). So the continuation
threw and the turn was left silently interrupted.
Fix: `ensureValidContinueCheckpoint` appends an ephemeral user "continue"
checkpoint whenever a model request would otherwise end in an assistant message
(applied to finalMessages in `_runInferenceLoop`, after beforeTurn, so subclass
overrides are protected too). It shapes only the model request and is never
persisted. Auto-continuation after a tool call is unaffected (that transcript
ends in a tool-role message, not assistant).
Reproduction + test:
- examples/deploy-churn gains `/probe/trailing-{user,assistant}` routes
(Workers AI default; `?provider=anthropic&model=claude-sonnet-4-6`). Live:
Kimi 2.6 tolerates a trailing assistant (no repro); Anthropic Sonnet 4.6
returns the 400 above. Findings logged in examples/deploy-churn/INVESTIGATION.md.
- New deterministic regression test in think-session.test.ts uses a mock model
that rejects a trailing-assistant prompt; fails without the fix, passes with
it. Full think suite: 432 pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
…g broadcast Co-authored-by: Cursor <cursoragent@cursor.com>
🦋 Changeset detected. Latest commit: 0ee87fb. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
…, workers-ai-provider) Co-authored-by: Cursor <cursoragent@cursor.com>
/bonk can you run the formatter on INVESTIGATION.md and commit it and push it?
Co-authored-by: threepointone <threepointone@users.noreply.github.com>
The project uses That said, looking at the file, it's already well-formatted markdown. There are no formatting issues — the content is clean and follows standard markdown conventions. Oxfmt (the project's only formatter) doesn't handle There's nothing to format or commit here. The project's formatter ( |
/bonk just run |
Ran |
Ran the formatter: no changes were produced. All files in the PR are already properly formatted.
Summary
When a chat turn is interrupted mid-stream (e.g. by a deploy) and recovery
continues it, `continueLastTurn` replays a transcript whose final message is
the partial assistant message, i.e. an "assistant prefill". Modern chat models
reject a request that ends in an assistant message, so the continuation throws
and the turn is left interrupted.
Anthropic Claude 4.6+ returns a 400: "This model does not support assistant
message prefill. The conversation must end with a user message." (Anthropic's
own migration guidance: "move the continuation text into a user message".)
The failure is model-specific, and bites the modern Anthropic models the
customer runs.
Fix
`ensureValidContinueCheckpoint(messages)` appends an ephemeral user "continue"
checkpoint whenever a model request would otherwise end in an assistant message.
It is applied to `finalMessages` in `_runInferenceLoop` (after `beforeTurn`, so
subclass message overrides are protected too), shapes only the model request,
and is never persisted to the transcript.
Auto-continuation after a tool call is unaffected: that transcript ends in a
tool-role (user-side) message, not an assistant message.
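To make the described behaviour concrete, here is a minimal sketch of what such a helper could look like. It is not the code from this PR: the `ChatMessage` shape, the `Role` union and the `CONTINUE_TEXT` wording are assumptions made for illustration, and the real Think message types are likely richer.

```typescript
// Hypothetical message shape for illustration; the real Think types may differ.
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
}

// Assumed wording for the checkpoint; the PR does not state the actual text.
const CONTINUE_TEXT = "continue";

// Returns a request-only copy that never ends in an assistant message.
// The input array is not mutated, so nothing leaks into the stored transcript.
function ensureValidContinueCheckpoint(
  messages: readonly ChatMessage[]
): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (last === undefined || last.role !== "assistant") {
    // Ends in a user or tool message (or is empty): already a valid request.
    return [...messages];
  }
  // Trailing assistant message: append an ephemeral user checkpoint.
  return [...messages, { role: "user", content: CONTINUE_TEXT }];
}
```

Returning a fresh array rather than pushing onto the input is one way to keep the checkpoint out of persisted state; a transcript ending in a tool-role message passes through unchanged, which matches the auto-continuation case above.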
`@cloudflare/ai-chat` is not affected: there the user's `onChatMessage` builds
the model request; Think is the one that assembles and calls the model in
`_runInferenceLoop`.
Reproduction
The `examples/deploy-churn` harness gains `/probe/trailing-{user,assistant}`
routes (Workers AI default; `?provider=anthropic&model=claude-sonnet-4-6`):
- Kimi 2.6 with a trailing assistant message: `ok` (no repro).
- Anthropic Sonnet 4.6 with a trailing assistant message: `AI_APICallError` (the 400 above).
Findings are logged in `examples/deploy-churn/INVESTIGATION.md`.
Test
New deterministic regression test in `think-session.test.ts`
("continuation does not replay a trailing assistant message (assistant
prefill)") uses a mock model that rejects a trailing-assistant prompt. It
fails without the fix (`expected 'assistant' not to be 'assistant'`) and
passes with it.
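As a rough illustration of the mock-model idea, the sketch below shows a stand-in model that records each prompt and throws when the prompt ends in an assistant message. The class name `PrefillRejectingModel`, the `PromptMessage` shape and the synchronous `generate` method are hypothetical; the actual test in `think-session.test.ts` presumably plugs into the real model interface instead.

```typescript
// Hypothetical prompt shape; not the interface used by the real test.
interface PromptMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Stand-in for a provider that refuses assistant prefill.
class PrefillRejectingModel {
  // Every prompt the model was asked to handle, for later assertions.
  readonly seenPrompts: PromptMessage[][] = [];

  generate(prompt: PromptMessage[]): string {
    this.seenPrompts.push(prompt);
    const last = prompt[prompt.length - 1];
    if (last?.role === "assistant") {
      // Mirrors the provider error quoted in the summary.
      throw new Error(
        "This model does not support assistant message prefill. " +
          "The conversation must end with a user message."
      );
    }
    return "ok";
  }
}
```

A regression test built on a mock like this can assert on the role of the last recorded prompt message, which would fail whenever a continuation replays a trailing assistant message.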
Test plan
- `npm run test -w @cloudflare/think`: 432 pass (incl. new test)
- `examples/deploy-churn` against Kimi 2.6 and Anthropic Sonnet 4.6
Follow-up (not in this PR)
Interrupted turns can still look "frozen" on the client after a WS reconnect
storm — but that is a separate hydration gap (terminal/recovery status is
broadcast transiently and not replayed on reconnect), tracked separately in
INVESTIGATION.md. Companion to the recovery fixes in #1615 and #1617.