
🤖 fix: surface max_tokens truncation; bump @ai-sdk/anthropic to 3.0.74 #3223

Merged
ammario merged 4 commits into main from chat-stream-mn3d on May 2, 2026
Merged

🤖 fix: surface max_tokens truncation; bump @ai-sdk/anthropic to 3.0.74#3223
ammario merged 4 commits intomainfrom
chat-stream-mn3d

Conversation

@ammar-agent
Collaborator

Summary

Fixes Opus 4.7 chats that silently end mid-thinking. Combines a vendored-SDK upgrade (the proximate cause) with frontend guards that surface this class of failure for any future provider/model gap.

Background

Reproduction: select Anthropic Opus 4.7 with thinking level max, ask the model to do meaningful reasoning. The chat appears to "end" with no text, no tool call, no error — the only visible artifact is a collapsed Thinking header. Inspecting the persisted message shows:

{
  "role": "assistant",
  "metadata": {
    "model": "anthropic:claude-opus-4-7",
    "thinkingLevel": "max",
    "finishReason": "length",
    "usage": { "outputTokens": 32000, "reasoningTokens": 3720 }
  },
  "parts": [ /* 283 reasoning parts, no text/tool */ ]
}

outputTokens: 32000 is suspicious because Opus 4.7's documented per-response cap is 128_000, and models-extra.ts already declares it correctly.

Root cause: @ai-sdk/anthropic@3.0.37's getModelCapabilities table predates Opus 4.7. claude-opus-4-7 doesn't match the 4.6 / 4.5 / 4.1 branches and falls through to a claude-opus-4- catch-all that returns maxOutputTokens: 32_000. The SDK then runs:

if (isKnownModel && baseArgs.max_tokens > maxOutputTokensForModel) {
  warnings.push({ feature: "maxOutputTokens", details: "..." });
  baseArgs.max_tokens = maxOutputTokensForModel; // 128k → 32k, silently
}
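To make the fallthrough concrete, here is a hypothetical sketch (not the SDK's actual source; table shape and function name are illustrative) of how a prefix-table lookup resolves `claude-opus-4-7` before and after the fix:

```typescript
// Illustrative capability table: with no exact entry for claude-opus-4-7,
// the generic "claude-opus-4-" catch-all wins.
type CapabilityTable = Array<[prefix: string, maxOutputTokens: number]>;

const staleTable: CapabilityTable = [
  ["claude-opus-4-", 32_000], // catch-all predating Opus 4.7
];

const fixedTable: CapabilityTable = [
  ["claude-opus-4-7", 128_000], // mapping added in @ai-sdk/anthropic@3.0.70
  ["claude-opus-4-", 32_000],
];

function lookupCap(table: CapabilityTable, modelId: string): number | undefined {
  return table.find(([prefix]) => modelId.startsWith(prefix))?.[1];
}

console.log(lookupCap(staleTable, "claude-opus-4-7")); // 32000: silently clips a 128k request
console.log(lookupCap(fixedTable, "claude-opus-4-7")); // 128000
```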

Anthropic's docs are explicit that max_tokens is required and that for Opus 4.7 at xhigh/max effort you should send a large value (Anthropic recommends starting at 64k). So the parameter is correct to send — the value was being silently clipped on the wire.

@ai-sdk/anthropic@3.0.70 added the missing Opus 4.7 capability mapping (maxOutputTokens: 128e3); 3.0.74 is the current latest.

Independent of the SDK fix, the UI gave the user no signal that anything had gone wrong: the backend treats a finishReason: "length" stream as a successful completion (no metadata.error), and ReasoningMessage auto-collapses on stream end when the reasoning is the last part — which hides the only output the model produced. So even after the SDK bump, any future provider/model gap (or a genuine length truncation on a hard turn) would still look like "the chat just ended".

Implementation

  1. Bump @ai-sdk/anthropic to ^3.0.74. Restores Opus 4.7's correct 128k max_tokens ceiling.
  2. Add max_output_tokens to StreamErrorTypeSchema (src/common/orpc/schemas/errors.ts).
  3. Synthesize a stream-error row in StreamingMessageAggregator.getDisplayedMessages() when an assistant message ends with finishReason: "length" and has no other error. The branch sits next to the existing metadata.error synthesis and is mutually exclusive with it, so we don't stack banners.
  4. Mark reasoning rows with isOnlyMessageContent: true when the parent assistant message contains only reasoning parts. ReasoningMessage's auto-collapse useEffect skips collapse in that case so the user can still read the work the model did before truncation.
  5. StreamErrorMessage.tsx renders the new error type with a "Response truncated" title; the body string is set in SMA so the existing message.error rendering path picks it up.
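A minimal sketch of the precedence rule from step 3, with simplified stand-in types (the real branch lives in StreamingMessageAggregator.getDisplayedMessages(); the "stream_error" type name and the banner string here are assumptions for illustration):

```typescript
// Simplified stand-ins for the real metadata and synthesized-error shapes.
interface AssistantMetadata {
  error?: string;
  finishReason?: "stop" | "length" | string;
}

interface SynthesizedError {
  errorType: string;
  error: string;
}

function synthesizeStreamError(meta: AssistantMetadata): SynthesizedError | undefined {
  // Existing branch: a real stream error always wins, so banners never stack.
  if (meta.error) {
    return { errorType: "stream_error", error: meta.error };
  }
  // New branch: surface a silent length truncation as a displayed error row.
  if (meta.finishReason === "length") {
    return {
      errorType: "max_output_tokens",
      error: "Response truncated: the model hit its maximum output token limit.",
    };
  }
  return undefined; // normal stop: no banner
}
```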

Validation

  • New SMA tests (StreamingMessageAggregator.test.ts > max output tokens) cover four branches: reasoning-only truncation, mixed-content truncation, normal stop finish (no banner), and the precedence rule (real error wins over length banner).
  • Verified locally: make typecheck ✅, make static-check ✅, focused SMA test file ✅. The two Inline skill Markdown rendering failures observed in batch mode reproduce on stock main without this PR (pre-existing test-isolation flake, unrelated).
  • Verified the bumped SDK now caps claude-opus-4-7 at 128k by inspecting node_modules/@ai-sdk/anthropic/dist/index.mjs after bun update.

Risks

Low-risk, additive changes:

  • The SDK bump is patch-level and semver-compatible. The behavior change is strictly "Opus 4.7 stops being clipped to 32k" — no other models change.
  • Frontend changes only add a new displayed-message row in a previously silent state and gate one auto-collapse path behind a new flag that defaults to undefined. No persistence-format changes; chats with existing finishReason: "length" history (e.g. the repro session) self-heal on next render.
  • max_output_tokens is a new StreamErrorType enum value. The schema is shared via z.enum; the only producer is the new SMA branch (frontend), so older backends never emit it. Frontend renderers fall through to the generic stream-error layout if they hit an unknown type, which is graceful.
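The graceful-fallback behavior in the last bullet can be sketched as follows (titles are illustrative, not the component's actual strings):

```typescript
// Illustrative only: unknown errorType values fall through to a generic
// layout, so a frontend that has never seen "max_output_tokens" still
// renders a sensible stream-error row.
function streamErrorTitle(errorType: string): string {
  switch (errorType) {
    case "max_output_tokens":
      return "Response truncated";
    default:
      return "Stream error"; // generic layout for unknown/older enum values
  }
}
```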

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max • Cost: $7.01

Anthropic API requires a max_tokens parameter on every request. Our
@ai-sdk/anthropic 3.0.37 had a stale capability table that mapped
unknown opus-4-x model IDs to a 32k cap, so requests for opus-4-7
were silently clamped from 128k → 32k on the wire. Combined with
`thinkingLevel: max`, the model would chew through 32k of (mostly
hidden) reasoning and finish with finishReason: "length" before
emitting any text or tool — and the UI auto-collapsed the reasoning,
making the chat appear to silently end.

Two fixes:

1. Bump @ai-sdk/anthropic to ^3.0.74 (3.0.70 introduced the opus-4-7
   capability mapping with the correct 128k cap).
2. Surface finishReason: "length" as a stream-error row in the UI
   (new max_output_tokens StreamErrorType) and skip ReasoningMessage's
   auto-collapse when the parent message contains only reasoning, so
   the user can still see what the model was working on.
@ammar-agent
Collaborator Author

@codex review

Two-part fix for Opus 4.7 chats that silently end mid-thinking:

  1. Bump @ai-sdk/anthropic from 3.0.37 to 3.0.74 — the older SDK's stale capability table silently clipped Opus 4.7's max_tokens from 128k → 32k on the wire. 3.0.70+ adds the missing 4.7 mapping.
  2. Surface finishReason: "length" as a stream-error row (new max_output_tokens errorType) and skip ReasoningMessage auto-collapse when reasoning is the message's only content, so future provider/model gaps (or genuine length truncations) don't look like "the chat just ended".

Particular things to look at:

  • The new SMA branch in getDisplayedMessages() is intentionally mutually exclusive with the existing metadata.error synthesis — please confirm those semantics are what you'd expect.
  • The isOnlyMessageContent flag is computed pre-collapse using the same mergedParts array we iterate to build display rows; semantically it should match what the user actually sees, but please flag if you spot a case where merged != original.


chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d315f0df14

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/utils/messages/StreamingMessageAggregator.ts Outdated
Codex flagged that the prior `mergedParts.every(p => p.type === "reasoning")`
check would falsely flag a turn as non-reasoning-only when it contained
non-renderable parts (e.g. an empty text part) alongside the reasoning,
re-introducing the silent-end UX we're trying to fix.

Factor out an `isRenderablePart` predicate matching the renderer's filter,
fold the lastPartIndex walk and the renderable tally into one pass, and
derive the flag from the renderable counts. Adds a regression test for
[reasoning + empty-text] turns.
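The reviewed shape might look roughly like this (simplified to the two part types discussed in this thread; the real predicate mirrors the renderer's full filter in StreamingMessageAggregator.ts):

```typescript
// Simplified part shape; real parts carry more fields and more types.
interface MessagePart {
  type: string;
  text?: string;
}

// Mirrors the renderer's filter: reasoning parts and non-empty text parts
// render. Boolean(text) rather than text.length > 0 means malformed parts
// ({ type: "text" } with no text field) are skipped instead of throwing.
const isRenderablePart = (part: MessagePart): boolean =>
  part.type === "reasoning" || (part.type === "text" && Boolean(part.text));

// Single pass: tally renderable parts and derive the reasoning-only flag
// from the renderable counts, so empty text parts cannot flip it.
function isReasoningOnly(parts: MessagePart[]): boolean {
  let renderable = 0;
  let reasoning = 0;
  for (const part of parts) {
    if (!isRenderablePart(part)) continue; // e.g. an empty text part
    renderable += 1;
    if (part.type === "reasoning") reasoning += 1;
  }
  return renderable > 0 && renderable === reasoning;
}
```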
@ammar-agent
Collaborator Author

@codex review

Addressed the P2 feedback on the reasoning-only flag. Now derived from a shared isRenderablePart predicate matching the renderer's filter (also folded the lastPartIndex walk and the renderable tally into a single pass). Added a regression test for [reasoning, empty-text] turns to lock in the behavior.


chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b714efdbaa


Comment thread src/browser/utils/messages/StreamingMessageAggregator.ts Outdated
P2 follow-up: switching the renderable predicate to `part.text.length > 0`
would throw on malformed history entries (`type: "text"` without a `text`
field), regressing the AGENTS.md self-healing contract for chat.jsonl.

Restore the truthy check (`Boolean(part.text)`) which gracefully skips
malformed parts. Add a regression test that constructs an asst message
with a malformed text part and verifies getDisplayedMessages doesn't
throw.
@ammar-agent
Collaborator Author

@codex review

Restored the truthy text check (Boolean(part.text)) so malformed history entries are gracefully skipped instead of throwing — preserves the AGENTS.md self-healing contract for chat.jsonl. Added a regression test that constructs an asst message with a malformed text part ({ type: "text" } with no text field) and verifies getDisplayedMessages() doesn't throw.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍


ammario merged commit e5d4fee into main on May 2, 2026
24 checks passed
ammario deleted the chat-stream-mn3d branch on May 2, 2026 at 18:55
mux-bot Bot pushed a commit that referenced this pull request May 2, 2026
…ator

Both stream-error rows in `buildDisplayedMessagesForMessage` (the existing
`message.metadata?.error` branch and the new `finishReason === "length"`
branch added in #3223) push structurally identical objects, differing only in
`id` suffix, `error` string, and `errorType`. The shared parent-message-derived
fields (`historyId`, `historySequence`, `model`, `routedThroughGateway`,
`timestamp`) were duplicated across both pushes.

Extract a local `pushStreamErrorRow` closure that captures the shared fields
once. Each branch now reduces to a single call passing the three differing
values. Pure refactor — emitted DisplayedMessage objects are identical.
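That refactor might look roughly like this (the DisplayedMessage shape is heavily simplified here, and the id suffixes and banner strings are illustrative):

```typescript
// Heavily simplified stand-in for the real DisplayedMessage error row.
interface StreamErrorRow {
  id: string;
  historyId: string;
  timestamp: number;
  error: string;
  errorType: string;
}

function buildStreamErrorRows(
  historyId: string,
  timestamp: number,
  meta: { error?: string; finishReason?: string },
): StreamErrorRow[] {
  const rows: StreamErrorRow[] = [];

  // Shared parent-message-derived fields are captured once by the closure;
  // each branch passes only the three values that differ between the rows.
  const pushStreamErrorRow = (idSuffix: string, error: string, errorType: string): void => {
    rows.push({ id: `${historyId}-${idSuffix}`, historyId, timestamp, error, errorType });
  };

  if (meta.error) {
    pushStreamErrorRow("error", meta.error, "stream_error");
  } else if (meta.finishReason === "length") {
    pushStreamErrorRow("truncated", "Response truncated", "max_output_tokens");
  }
  return rows;
}
```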
@mux-bot mux-bot Bot mentioned this pull request May 2, 2026
mux-bot Bot pushed a commit that referenced this pull request May 3, 2026