Summary
OpenCode silently caps per-step `maxOutputTokens` at 32,000 even when `opencode.json` sets a much larger `limit.output` (e.g. 384000 for DeepSeek, 128000 for GPT/Claude). The only documented escape hatch is an experimental environment variable, `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX`, which users must discover via scattered GitHub threads.
This is surprising, breaks agent runs on reasoning-heavy models, and makes user configuration misleading.
Expected behavior
When a model defines explicit limits in config:
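As an illustration, such an entry might look like the sketch below, written as a TypeScript object that mirrors the JSON shape of `opencode.json`. The provider id, model id, and context size are hypothetical; only the `limit.output` value of 384000 and the `@ai-sdk/openai-compatible` package name come from this report.

```typescript
// Assumed shape of a model entry; field names follow the `limit.context`,
// `limit.input`, and `limit.output` keys discussed in this issue.
interface ModelLimit {
  context: number;
  input?: number;
  output: number;
}

interface ProviderEntry {
  npm: string;
  models: Record<string, { limit: ModelLimit }>;
}

// Hypothetical provider and model ids, for illustration only.
const exampleConfig: { provider: Record<string, ProviderEntry> } = {
  provider: {
    "my-gateway": {
      npm: "@ai-sdk/openai-compatible",
      models: {
        "my-reasoning-model": {
          // The user expects `output` to drive the per-step request limit.
          limit: { context: 1_000_000, output: 384_000 },
        },
      },
    },
  },
};

const configuredOutput =
  exampleConfig.provider["my-gateway"].models["my-reasoning-model"].limit.output;
console.log(configuredOutput);
```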
OpenCode should send `max_output_tokens` / `max_tokens` based on `limit.output` (possibly clamped to what the provider actually supports), not an undocumented global ceiling of 32k.
Actual behavior
`ProviderTransform.maxOutputTokens()` always applies a default cap, effectively `Math.min(limit.output, OUTPUT_TOKEN_MAX)` with `OUTPUT_TOKEN_MAX` at 32k.
Unless `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX` is set in the environment, `min(384000, 32000) === 32000` is what gets sent to the API.
Real-world impact (agent benchmark)
On a coding task with `reasoningEffort: "max"` via `@ai-sdk/openai-compatible`:
- A step finishes with `reason: "length"`
- Token usage: `reasoning: 32000`, `output: 0`
- The agent stops mid-task with no visible completion
The same model with `reasoningEffort: "high"` completes normally in the same step budget because reasoning does not consume the entire cap.
This is not a provider failure; logs show OpenCode requesting ~32k max output while config says 384k.
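The cap arithmetic and the resulting truncated step can be reproduced in a few lines. This is a simplified sketch rather than OpenCode's actual source: `effectiveMaxOutput`, `StepResult`, and `reasoningExhaustedCap` are hypothetical names, and treating the environment variable as a replacement ceiling is an assumption based on the behavior reported here.

```typescript
// Default ceiling described in this report (32k).
const OUTPUT_TOKEN_MAX = 32_000;

// Assumed model of the cap: the configured limit is clamped to a global
// ceiling, which the env override (if any) replaces.
function effectiveMaxOutput(limitOutput: number, envOverride?: number): number {
  const ceiling = envOverride ?? OUTPUT_TOKEN_MAX;
  return Math.min(limitOutput, ceiling);
}

// Hypothetical summary of one agent step, using the fields quoted above.
interface StepResult {
  finishReason: string;
  reasoningTokens: number;
  outputTokens: number;
}

// True when reasoning alone used up the whole cap and no visible output was produced.
function reasoningExhaustedCap(step: StepResult, cap: number): boolean {
  return (
    step.finishReason === "length" &&
    step.reasoningTokens >= cap &&
    step.outputTokens === 0
  );
}

const defaultCap = effectiveMaxOutput(384_000);
const raisedCap = effectiveMaxOutput(384_000, 1_048_576);
const truncated = reasoningExhaustedCap(
  { finishReason: "length", reasoningTokens: 32_000, outputTokens: 0 },
  defaultCap,
);

console.log(defaultCap); // 32000
console.log(raisedCap); // 384000
console.log(truncated); // true
```

A check like `reasoningExhaustedCap` could help distinguish this failure mode from a genuine provider-side truncation when reading step logs.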
Why `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX` feels like the wrong fix
- Misleading config: Users set `limit.output` in `opencode.json` believing it controls API behavior. It does not, unless they also export an env var whose name does not reference `limit.output` at all.
- Experimental / undiscoverable: The variable is buried under “experimental” flags in CLI docs. Issue #1735 ("max_tokens defaults to 32000 when using a custom provider") was auto-closed; #20078 ("Custom provider (LM Studio) ignores limit.output config, hardcodes max_tokens to 32000") remains open with the same root cause.
- Global knob for per-model limits: Setting `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX=1048576` is a shotgun fix across all models instead of respecting per-model `limit.output`.
- Still does not fix overflow semantics: `overflow.ts` uses `context - maxOutputTokens()` when `limit.input` is absent. Models with `output` close to `context` (e.g. Kimi K2.6 at 262144/262144) get `usable === 0` once the env var is raised, causing immediate compaction/overflow unless users also add `limit.input`, which is another undocumented workaround.
Related issues
- `limit.output` ignored, hardcodes 32000 (open)
Suggested direction
- Respect configured `limit.output` by default: Use `OUTPUT_TOKEN_MAX` (32k) only as a fallback when `limit.output` is missing or zero, not as `Math.min(limit.output, 32k)` for every model. (PR #24384, "fix(provider): respect configured output limit", attempted this but was closed as “not the right fix”; the problem remains.)
- Optional global ceiling: If a safety cap is still desired, expose it in `opencode.json` (e.g. `limits.maxOutputTokens`) rather than a cryptic env var.
- Decouple overflow math from per-step output cap: `usable` should not use `context - limit.output` when `output` can equal the full window; consider `limit.input` defaults or `context - reserved` for shared-window models.
- Document the contract: Clarify in config schema/docs what `limit.context`, `limit.input`, and `limit.output` each control (API request vs compaction vs fallback).
Environment
- OpenCode: v1.14.x (local install from `anomalyco/opencode`)
- `@ai-sdk/openai-compatible` → OpenAI-compatible gateway
- `limit` blocks in `opencode.json`
Workaround today (ugly)
`export OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX=1048576`
Plus per-model `limit.input` for shared-window models, none of which should be required when `limit.output` is already configured.
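For comparison, the suggested direction (fallback-only default, optional ceiling, and overflow math decoupled from the output cap) might be sketched as follows. All function names, the `RESERVED` value of 20000, and the choice to use `limit.input` as-is when present are assumptions made for illustration, not OpenCode's implementation.

```typescript
interface Limit {
  context: number;
  input?: number;
  output?: number;
}

const FALLBACK_OUTPUT = 32_000; // used only when limit.output is missing or zero
const RESERVED = 20_000; // hypothetical headroom reserved for shared-window models

// Respect limit.output; fall back to 32k only when it is absent or zero.
// An optional ceiling (e.g. a config-level safety cap) is applied last.
function resolveMaxOutput(limit: Limit, ceiling?: number): number {
  const configured =
    limit.output && limit.output > 0 ? limit.output : FALLBACK_OUTPUT;
  return ceiling !== undefined ? Math.min(configured, ceiling) : configured;
}

// Overflow math as described in this report: context minus the output cap
// when limit.input is absent.
function usableCurrent(limit: Limit, maxOutput: number): number {
  return limit.input ?? Math.max(0, limit.context - maxOutput);
}

// Decoupled variant: subtract a fixed reserve instead of the output cap.
function usableDecoupled(limit: Limit, reserved: number = RESERVED): number {
  return limit.input ?? Math.max(0, limit.context - reserved);
}

// Shared-window model from the report: output equals the full context.
const sharedWindow: Limit = { context: 262_144, output: 262_144 };
const sharedMax = resolveMaxOutput(sharedWindow);

console.log(sharedMax); // 262144
console.log(usableCurrent(sharedWindow, sharedMax)); // 0
console.log(usableDecoupled(sharedWindow)); // 242144
console.log(resolveMaxOutput({ context: 128_000 })); // 32000
console.log(resolveMaxOutput({ context: 1_000_000, output: 384_000 }, 128_000)); // 128000
```

With this split, raising the per-step output limit no longer shrinks the usable input window to zero, so the `limit.input` workaround would be needed only when a model truly has a separate input limit.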