
🤖 fix: surface max_tokens truncation; bump @ai-sdk/anthropic to 3.0.74 #3223

Merged
ammario merged 4 commits into main from chat-stream-mn3d on May 2, 2026
Merged

🤖 fix: surface max_tokens truncation; bump @ai-sdk/anthropic to 3.0.74#3223
ammario merged 4 commits intomainfrom
chat-stream-mn3d

Conversation

@ammar-agent
Collaborator

Summary

Fixes Opus 4.7 chats that silently end mid-thinking. Combines a vendored-SDK upgrade (the proximate cause) with frontend guards that surface this class of failure for any future provider/model gap.

Background

Reproduction: select Anthropic Opus 4.7 with thinking level max, ask the model to do meaningful reasoning. The chat appears to "end" with no text, no tool call, no error — the only visible artifact is a collapsed Thinking header. Inspecting the persisted message shows:

{
  "role": "assistant",
  "metadata": {
    "model": "anthropic:claude-opus-4-7",
    "thinkingLevel": "max",
    "finishReason": "length",
    "usage": { "outputTokens": 32000, "reasoningTokens": 3720 }
  },
  "parts": [ /* 283 reasoning parts, no text/tool */ ]
}

outputTokens: 32000 is suspicious because Opus 4.7's documented per-response cap is 128_000, and models-extra.ts already declares it correctly.

Root cause: @ai-sdk/anthropic@3.0.37's getModelCapabilities table predates Opus 4.7. claude-opus-4-7 doesn't match the 4.6 / 4.5 / 4.1 branches and falls through to a claude-opus-4- catch-all that returns maxOutputTokens: 32_000. The SDK then runs:

if (isKnownModel && baseArgs.max_tokens > maxOutputTokensForModel) {
  warnings.push({ feature: "maxOutputTokens", details: "..." });
  baseArgs.max_tokens = maxOutputTokensForModel; // 128k → 32k, silently
}
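To make the fallthrough concrete, here is a hypothetical sketch (not the SDK's actual source; table shape and function name are illustrative) of how a prefix-table lookup resolves `claude-opus-4-7` before and after the fix:

```typescript
// Illustrative capability table: with no exact entry for claude-opus-4-7,
// the generic "claude-opus-4-" catch-all wins.
type CapabilityTable = Array<[prefix: string, maxOutputTokens: number]>;

const staleTable: CapabilityTable = [
  ["claude-opus-4-", 32_000], // catch-all predating Opus 4.7
];

const fixedTable: CapabilityTable = [
  ["claude-opus-4-7", 128_000], // mapping added in @ai-sdk/anthropic@3.0.70
  ["claude-opus-4-", 32_000],
];

function lookupCap(table: CapabilityTable, modelId: string): number | undefined {
  return table.find(([prefix]) => modelId.startsWith(prefix))?.[1];
}

console.log(lookupCap(staleTable, "claude-opus-4-7")); // 32000: silently clips a 128k request
console.log(lookupCap(fixedTable, "claude-opus-4-7")); // 128000
```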

Anthropic's docs are explicit that max_tokens is required and that for Opus 4.7 at xhigh/max effort you should send a large value (Anthropic recommends starting at 64k). So the parameter is correct to send — the value was being silently clipped on the wire.

@ai-sdk/anthropic@3.0.70 added the missing Opus 4.7 capability mapping (maxOutputTokens: 128e3); 3.0.74 is the current latest.

Independent of the SDK fix, the UI gave the user no signal that anything had gone wrong: the backend treats a finishReason: "length" stream as a successful completion (no metadata.error), and ReasoningMessage auto-collapses on stream end when the reasoning is the last part — which hides the only output the model produced. So even after the SDK bump, any future provider/model gap (or a genuine length truncation on a hard turn) would still look like "the chat just ended".

Implementation

  1. Bump @ai-sdk/anthropic to ^3.0.74. Restores Opus 4.7's correct 128k max_tokens ceiling.
  2. Add max_output_tokens to StreamErrorTypeSchema (src/common/orpc/schemas/errors.ts).
  3. Synthesize a stream-error row in StreamingMessageAggregator.getDisplayedMessages() when an assistant message ends with finishReason: "length" and has no other error. The branch sits next to the existing metadata.error synthesis and is mutually exclusive with it, so we don't stack banners.
  4. Mark reasoning rows with isOnlyMessageContent: true when the parent assistant message contains only reasoning parts. ReasoningMessage's auto-collapse useEffect skips collapse in that case so the user can still read the work the model did before truncation.
  5. StreamErrorMessage.tsx renders the new error type with a "Response truncated" title; the body string is set in SMA so the existing message.error rendering path picks it up.
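A minimal sketch of the precedence rule from step 3, with simplified stand-in types (the real branch lives in StreamingMessageAggregator.getDisplayedMessages(); the "stream_error" type name and the banner string here are assumptions for illustration):

```typescript
// Simplified stand-ins for the real metadata and synthesized-error shapes.
interface AssistantMetadata {
  error?: string;
  finishReason?: "stop" | "length" | string;
}

interface SynthesizedError {
  errorType: string;
  error: string;
}

function synthesizeStreamError(meta: AssistantMetadata): SynthesizedError | undefined {
  // Existing branch: a real stream error always wins, so banners never stack.
  if (meta.error) {
    return { errorType: "stream_error", error: meta.error };
  }
  // New branch: surface a silent length truncation as a displayed error row.
  if (meta.finishReason === "length") {
    return {
      errorType: "max_output_tokens",
      error: "Response truncated: the model hit its maximum output token limit.",
    };
  }
  return undefined; // normal stop: no banner
}
```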

Validation

  • New SMA tests (StreamingMessageAggregator.test.ts > max output tokens) cover four branches: reasoning-only truncation, mixed-content truncation, normal stop finish (no banner), and the precedence rule (real error wins over length banner).
  • Verified locally: make typecheck ✅, make static-check ✅, focused SMA test file ✅. The two Inline skill Markdown rendering failures observed in batch mode reproduce on stock main without this PR (pre-existing test-isolation flake, unrelated).
  • Verified the bumped SDK now caps claude-opus-4-7 at 128k by inspecting node_modules/@ai-sdk/anthropic/dist/index.mjs after bun update.

Risks

Low-risk, additive changes:

  • The SDK bump is patch-level and semver-compatible. The behavior change is strictly "Opus 4.7 stops being clipped to 32k" — no other models change.
  • Frontend changes only add a new displayed-message row in a previously silent state and gate one auto-collapse path behind a new flag that defaults to undefined. No persistence-format changes; chats with existing finishReason: "length" history (e.g. the repro session) self-heal on next render.
  • max_output_tokens is a new StreamErrorType enum value. The schema is shared via z.enum; the only producer is the new SMA branch (frontend), so older backends never emit it. Frontend renderers fall through to the generic stream-error layout if they hit an unknown type, which is graceful.
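The graceful-fallback behavior in the last bullet can be sketched as follows (titles are illustrative, not the component's actual strings):

```typescript
// Illustrative only: unknown errorType values fall through to a generic
// layout, so a frontend that has never seen "max_output_tokens" still
// renders a sensible stream-error row.
function streamErrorTitle(errorType: string): string {
  switch (errorType) {
    case "max_output_tokens":
      return "Response truncated";
    default:
      return "Stream error"; // generic layout for unknown/older enum values
  }
}
```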

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max • Cost: $7.01

Anthropic API requires a max_tokens parameter on every request. Our
@ai-sdk/anthropic 3.0.37 had a stale capability table that mapped
unknown opus-4-x model IDs to a 32k cap, so requests for opus-4-7
were silently clamped from 128k → 32k on the wire. Combined with
`thinkingLevel: max`, the model would chew through 32k of (mostly
hidden) reasoning and finish with finishReason: "length" before
emitting any text or tool — and the UI auto-collapsed the reasoning,
making the chat appear to silently end.

Two fixes:

1. Bump @ai-sdk/anthropic to ^3.0.74 (3.0.70 introduced the opus-4-7
   capability mapping with the correct 128k cap).
2. Surface finishReason: "length" as a stream-error row in the UI
   (new max_output_tokens StreamErrorType) and skip ReasoningMessage's
   auto-collapse when the parent message contains only reasoning, so
   the user can still see what the model was working on.
@ammar-agent
Collaborator Author

@codex review

Two-part fix for Opus 4.7 chats that silently end mid-thinking:

  1. Bump @ai-sdk/anthropic from 3.0.37 to 3.0.74 — the older SDK's stale capability table silently clipped Opus 4.7's max_tokens from 128k → 32k on the wire. 3.0.70+ adds the missing 4.7 mapping.
  2. Surface finishReason: "length" as a stream-error row (new max_output_tokens errorType) and skip ReasoningMessage auto-collapse when reasoning is the message's only content, so future provider/model gaps (or genuine length truncations) don't look like "the chat just ended".

Particular things to look at:

  • The new SMA branch in getDisplayedMessages() is intentionally mutually exclusive with the existing metadata.error synthesis — please confirm those semantics are what you'd expect.
  • The isOnlyMessageContent flag is computed pre-collapse using the same mergedParts array we iterate to build display rows; semantically it should match what the user actually sees, but please flag if you spot a case where merged != original.


chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d315f0df14

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/utils/messages/StreamingMessageAggregator.ts Outdated
Codex flagged that the prior `mergedParts.every(p => p.type === "reasoning")`
check would falsely flag a turn as non-reasoning-only when it contained
non-renderable parts (e.g. an empty text part) alongside the reasoning,
re-introducing the silent-end UX we're trying to fix.

Factor out an `isRenderablePart` predicate matching the renderer's filter,
fold the lastPartIndex walk and the renderable tally into one pass, and
derive the flag from the renderable counts. Adds a regression test for
[reasoning + empty-text] turns.
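The reviewed shape might look roughly like this (simplified to the two part types discussed in this thread; the real predicate mirrors the renderer's full filter in StreamingMessageAggregator.ts):

```typescript
// Simplified part shape; real parts carry more fields and more types.
interface MessagePart {
  type: string;
  text?: string;
}

// Mirrors the renderer's filter: reasoning parts and non-empty text parts
// render. Boolean(text) rather than text.length > 0 means malformed parts
// ({ type: "text" } with no text field) are skipped instead of throwing.
const isRenderablePart = (part: MessagePart): boolean =>
  part.type === "reasoning" || (part.type === "text" && Boolean(part.text));

// Single pass: tally renderable parts and derive the reasoning-only flag
// from the renderable counts, so empty text parts cannot flip it.
function isReasoningOnly(parts: MessagePart[]): boolean {
  let renderable = 0;
  let reasoning = 0;
  for (const part of parts) {
    if (!isRenderablePart(part)) continue; // e.g. an empty text part
    renderable += 1;
    if (part.type === "reasoning") reasoning += 1;
  }
  return renderable > 0 && renderable === reasoning;
}
```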
@ammar-agent
Collaborator Author

@codex review

Addressed the P2 feedback on the reasoning-only flag. Now derived from a shared isRenderablePart predicate matching the renderer's filter (also folded the lastPartIndex walk and the renderable tally into a single pass). Added a regression test for [reasoning, empty-text] turns to lock in the behavior.


chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b714efdbaa


Comment thread src/browser/utils/messages/StreamingMessageAggregator.ts Outdated
P2 follow-up: switching the renderable predicate to `part.text.length > 0`
would throw on malformed history entries (`type: "text"` without a `text`
field), regressing the AGENTS.md self-healing contract for chat.jsonl.

Restore the truthy check (`Boolean(part.text)`) which gracefully skips
malformed parts. Add a regression test that constructs an asst message
with a malformed text part and verifies getDisplayedMessages doesn't
throw.
@ammar-agent
Collaborator Author

@codex review

Restored the truthy text check (Boolean(part.text)) so malformed history entries are gracefully skipped instead of throwing — preserves the AGENTS.md self-healing contract for chat.jsonl. Added a regression test that constructs an asst message with a malformed text part ({ type: "text" } with no text field) and verifies getDisplayedMessages() doesn't throw.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍


ammario merged commit e5d4fee into main on May 2, 2026
24 checks passed
ammario deleted the chat-stream-mn3d branch on May 2, 2026 at 18:55
mux-bot Bot pushed a commit that referenced this pull request May 2, 2026
…ator

Both stream-error rows in `buildDisplayedMessagesForMessage` (the existing
`message.metadata?.error` branch and the new `finishReason === "length"`
branch added in #3223) push structurally identical objects, differing only in
`id` suffix, `error` string, and `errorType`. The shared parent-message-derived
fields (`historyId`, `historySequence`, `model`, `routedThroughGateway`,
`timestamp`) were duplicated across both pushes.

Extract a local `pushStreamErrorRow` closure that captures the shared fields
once. Each branch now reduces to a single call passing the three differing
values. Pure refactor — emitted DisplayedMessage objects are identical.
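That refactor might look roughly like this (the DisplayedMessage shape is heavily simplified here, and the id suffixes and banner strings are illustrative):

```typescript
// Heavily simplified stand-in for the real DisplayedMessage error row.
interface StreamErrorRow {
  id: string;
  historyId: string;
  timestamp: number;
  error: string;
  errorType: string;
}

function buildStreamErrorRows(
  historyId: string,
  timestamp: number,
  meta: { error?: string; finishReason?: string },
): StreamErrorRow[] {
  const rows: StreamErrorRow[] = [];

  // Shared parent-message-derived fields are captured once by the closure;
  // each branch passes only the three values that differ between the rows.
  const pushStreamErrorRow = (idSuffix: string, error: string, errorType: string): void => {
    rows.push({ id: `${historyId}-${idSuffix}`, historyId, timestamp, error, errorType });
  };

  if (meta.error) {
    pushStreamErrorRow("error", meta.error, "stream_error");
  } else if (meta.finishReason === "length") {
    pushStreamErrorRow("truncated", "Response truncated", "max_output_tokens");
  }
  return rows;
}
```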
@mux-bot mux-bot Bot mentioned this pull request May 2, 2026
mux-bot Bot pushed a commit that referenced this pull request May 3, 2026