🤖 fix: surface max_tokens truncation; bump @ai-sdk/anthropic to 3.0.74 (#3223)
Conversation
The Anthropic API requires a `max_tokens` parameter on every request. Our `@ai-sdk/anthropic` 3.0.37 had a stale capability table that mapped unknown opus-4-x model IDs to a 32k cap, so requests for opus-4-7 were silently clamped from 128k to 32k on the wire. Combined with `thinkingLevel: max`, the model would chew through 32k of (mostly hidden) reasoning and finish with `finishReason: "length"` before emitting any text or tool call — and the UI auto-collapsed the reasoning, making the chat appear to silently end. Two fixes:

1. Bump `@ai-sdk/anthropic` to `^3.0.74` (3.0.70 introduced the opus-4-7 capability mapping with the correct 128k cap).
2. Surface `finishReason: "length"` as a stream-error row in the UI (new `max_output_tokens` `StreamErrorType`) and skip `ReasoningMessage`'s auto-collapse when the parent message contains only reasoning, so the user can still see what the model was working on.
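The failure mode can be sketched in miniature. Everything below (the `Capabilities` type, the table, `effectiveMaxTokens`) is a hypothetical stand-in for the SDK's internals, not its actual code:

```typescript
// Hypothetical reconstruction of the stale capability fallthrough.
// getModelCapabilities is a stand-in; the real table lives inside @ai-sdk/anthropic.
type Capabilities = { maxOutputTokens: number };

function getModelCapabilities(modelId: string): Capabilities {
  if (modelId.startsWith("claude-opus-4-7")) return { maxOutputTokens: 128_000 }; // mapping added in 3.0.70
  if (modelId.startsWith("claude-opus-4")) return { maxOutputTokens: 32_000 };    // stale catch-all in 3.0.37
  return { maxOutputTokens: 8_192 };
}

// The silent clamp: the caller asks for 128k, the wire gets min(requested, cap).
function effectiveMaxTokens(modelId: string, requested: number): number {
  return Math.min(requested, getModelCapabilities(modelId).maxOutputTokens);
}
```

With the 3.0.37-era table (no `claude-opus-4-7` branch), a 128k request for opus-4-7 would hit the catch-all and go out as 32k, matching the `outputTokens: 32000` seen in the persisted message below.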
@codex review Two-part fix for Opus 4.7 chats that silently end mid-thinking:
Particular things to look at:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d315f0df14
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex flagged that the prior `mergedParts.every(p => p.type === "reasoning")` check would falsely flag a turn as non-reasoning-only when it contained non-renderable parts (e.g. an empty text part) alongside the reasoning, re-introducing the silent-end UX we're trying to fix. Factor out an `isRenderablePart` predicate matching the renderer's filter, fold the `lastPartIndex` walk and the renderable tally into one pass, and derive the flag from the renderable counts. Add a regression test for [reasoning + empty-text] turns.
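The suggested shape, as a rough sketch (the `Part` union and `analyzeParts` are simplified stand-ins for the repo's real types):

```typescript
// Sketch of the single-pass derivation Codex suggested; the Part type here is
// a simplified stand-in for the real message-part union.
type Part =
  | { type: "reasoning"; text: string }
  | { type: "text"; text?: string }
  | { type: "tool"; name: string };

// Must match the renderer's own filter so the flag can't drift out of sync with it.
function isRenderablePart(p: Part): boolean {
  if (p.type === "text") return Boolean(p.text); // empty/malformed text parts don't render
  return true;
}

function analyzeParts(parts: Part[]): { lastPartIndex: number; reasoningOnly: boolean } {
  let lastPartIndex = -1;
  let renderable = 0;
  let renderableReasoning = 0;
  for (let i = 0; i < parts.length; i++) {
    if (!isRenderablePart(parts[i])) continue;
    lastPartIndex = i;
    renderable++;
    if (parts[i].type === "reasoning") renderableReasoning++;
  }
  // A turn is "reasoning-only" when every *renderable* part is reasoning.
  return { lastPartIndex, reasoningOnly: renderable > 0 && renderable === renderableReasoning };
}
```

The regression case from the review: for `[reasoning, empty text]`, the empty text part is skipped by the shared predicate, so the turn still counts as reasoning-only, where the old `every` check returned false.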
@codex review Addressed the P2 feedback on the reasoning-only flag. Now derived from a shared
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b714efdbaa
P2 follow-up: switching the renderable predicate to `part.text.length > 0` would throw on malformed history entries (`type: "text"` without a `text` field), regressing the AGENTS.md self-healing contract for chat.jsonl. Restore the truthy check (`Boolean(part.text)`) which gracefully skips malformed parts. Add a regression test that constructs an asst message with a malformed text part and verifies getDisplayedMessages doesn't throw.
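The difference is easy to demonstrate in isolation (the part shape below is illustrative, not the repo's actual type):

```typescript
// A malformed history entry: type "text" but no text field, as can appear in
// a hand-edited or partially written chat.jsonl.
type TextPart = { type: "text"; text?: string };
const malformed: TextPart = { type: "text" };

// Length check: throws a TypeError when `text` is missing.
const lengthCheck = (p: TextPart) => p.text!.length > 0;

// Truthy check: skips malformed parts gracefully.
const truthyCheck = (p: TextPart) => Boolean(p.text);

console.log(truthyCheck(malformed)); // false: malformed part is skipped, no throw
// lengthCheck(malformed) would throw: cannot read .length of undefined
```

This is why the truthy form preserves the self-healing contract: a corrupt entry degrades to "not renderable" instead of crashing `getDisplayedMessages`.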
@codex review Restored the truthy text check (
Codex Review: Didn't find any major issues. 👍
…ator Both stream-error rows in `buildDisplayedMessagesForMessage` (the existing `message.metadata?.error` branch and the new `finishReason === "length"` branch added in #3223) push structurally identical objects, differing only in `id` suffix, `error` string, and `errorType`. The shared parent-message-derived fields (`historyId`, `historySequence`, `model`, `routedThroughGateway`, `timestamp`) were duplicated across both pushes. Extract a local `pushStreamErrorRow` closure that captures the shared fields once. Each branch now reduces to a single call passing the three differing values. Pure refactor — emitted DisplayedMessage objects are identical.
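A sketch of the shape of that refactor (type and field names are abbreviated from the description above, not the exact repo code; `routedThroughGateway` is omitted for brevity):

```typescript
// Simplified stand-ins for the real DisplayedMessage/aggregator types.
type StreamErrorRow = {
  type: "stream-error";
  id: string;
  error: string;
  errorType: string;
  historyId: string;
  historySequence: number;
  model: string;
  timestamp: number;
};

type Msg = { id: string; historySequence: number; model: string; timestamp: number };

function buildRows(msg: Msg, metadataError: string | undefined, finishReason: string | undefined): StreamErrorRow[] {
  const rows: StreamErrorRow[] = [];

  // Shared parent-derived fields captured once; each branch passes only what differs.
  const pushStreamErrorRow = (idSuffix: string, error: string, errorType: string) =>
    rows.push({
      type: "stream-error",
      id: `${msg.id}-${idSuffix}`,
      error,
      errorType,
      historyId: msg.id,
      historySequence: msg.historySequence,
      model: msg.model,
      timestamp: msg.timestamp,
    });

  if (metadataError) {
    pushStreamErrorRow("error", metadataError, "stream_error");
  } else if (finishReason === "length") {
    // Mutually exclusive with the real-error branch, so banners don't stack.
    pushStreamErrorRow("truncated", "Response truncated: max output tokens reached", "max_output_tokens");
  }
  return rows;
}
```

Each branch reduces to one call with the three differing values, so the emitted rows stay structurally identical to the pre-refactor pushes.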
Summary
Fixes Opus 4.7 chats that silently end mid-thinking. Combines a vendored-SDK upgrade (fixing the proximate cause) with frontend guards that surface this class of failure for any future provider/model gap.
Background
Reproduction: select Anthropic Opus 4.7 with thinking level `max` and ask the model to do meaningful reasoning. The chat appears to "end" with no text, no tool call, no error — the only visible artifact is a collapsed Thinking header. Inspecting the persisted message shows:

```jsonc
{
  "role": "assistant",
  "metadata": {
    "model": "anthropic:claude-opus-4-7",
    "thinkingLevel": "max",
    "finishReason": "length",
    "usage": { "outputTokens": 32000, "reasoningTokens": 3720 }
  },
  "parts": [ /* 283 reasoning parts, no text/tool */ ]
}
```

`outputTokens: 32000` is suspicious because Opus 4.7's documented per-response cap is `128_000`, and `models-extra.ts` already declares it correctly.

Root cause:
`@ai-sdk/anthropic@3.0.37`'s `getModelCapabilities` table predates Opus 4.7. `claude-opus-4-7` doesn't match the 4.6 / 4.5 / 4.1 branches and falls through to a `claude-opus-4` catch-all that returns `maxOutputTokens: 32_000`. The SDK then clamps the requested `max_tokens` to that cap before the request goes out.

Anthropic's docs are explicit that `max_tokens` is required and that for Opus 4.7 at xhigh/max effort you should send a large value (Anthropic recommends starting at 64k). So the parameter is correct to send — the value was being silently clipped on the wire. `@ai-sdk/anthropic@3.0.70` added the missing Opus 4.7 capability mapping (`maxOutputTokens: 128e3`); 3.0.74 is the current latest.

Independent of the SDK fix, the UI gave the user no signal that anything went wrong: the backend treats a
finishReason: "length"stream as a successful completion (nometadata.error), andReasoningMessageauto-collapses on stream-end when the reasoning is the last part — which hides the only output the model produced. So even after the SDK bump, any future provider/model gap (or genuinelengthtruncation on a hard turn) would still look like "the chat just ended".Implementation
- Bump `@ai-sdk/anthropic` to `^3.0.74`. Restores Opus 4.7's correct 128k `max_tokens` ceiling.
- Add `max_output_tokens` to `StreamErrorTypeSchema` (`src/common/orpc/schemas/errors.ts`).
- Synthesize a `stream-error` row in `StreamingMessageAggregator.getDisplayedMessages()` when an assistant message ends with `finishReason: "length"` and has no other error. The branch sits next to the existing `metadata.error` synthesis and is mutually exclusive with it, so we don't stack banners.
- Pass `isOnlyMessageContent: true` when the parent assistant message contains only reasoning parts. `ReasoningMessage`'s auto-collapse `useEffect` skips collapse in that case so the user can still read the work the model did before truncation.
- `StreamErrorMessage.tsx` renders the new error type with a "Response truncated" title; the body string is set in SMA so the existing `message.error` rendering path picks it up.

Validation
- New tests (`StreamingMessageAggregator.test.ts > max output tokens`) cover four branches: reasoning-only truncation, mixed-content truncation, normal `stop` finish (no banner), and the precedence rule (real error wins over `length` banner).
- `make typecheck` ✅, `make static-check` ✅, focused SMA test file ✅. The two `Inline skill Markdown rendering` failures observed in batch mode reproduce on stock `main` without this PR (pre-existing test-isolation flake, unrelated).
- Confirmed the bumped SDK caps `claude-opus-4-7` at 128k by inspecting `node_modules/@ai-sdk/anthropic/dist/index.mjs` after `bun update`.
Low-risk, additive changes:
- Existing `finishReason: "length"` history (e.g. the repro session) self-heals on next render.
- `max_output_tokens` is a new `StreamErrorType` enum value. The schema is shared via `z.enum`; the only producer is the new SMA branch (frontend), so older backends never emit it. Frontend renderers fall through to the generic stream-error layout if they hit an unknown type, which is graceful.

Generated with
mux • Model: anthropic:claude-opus-4-7 • Thinking: max • Cost: $7.01