Bug: limit.output in config is silently capped at 32k; OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX is a poor workaround #29363

@g199209

Summary

OpenCode silently caps per-step maxOutputTokens at 32,000 even when opencode.json sets a much larger limit.output (e.g. 384000 for DeepSeek, 128000 for GPT/Claude). The only documented escape hatch is an experimental environment variable, OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX, which users must discover via scattered GitHub threads.

This is surprising, breaks agent runs on reasoning-heavy models, and makes user configuration misleading.

Expected behavior

When a model defines explicit limits in config:

"deepseek-v4-flash": {
  "limit": {
    "context": 1000000,
    "output": 384000
  }
}

OpenCode should send max_output_tokens / max_tokens based on limit.output (possibly clamped to what the provider actually supports), not an undocumented global ceiling of 32k.

Actual behavior

ProviderTransform.maxOutputTokens() always applies a default cap:

export const OUTPUT_TOKEN_MAX = 32_000

export function maxOutputTokens(model: Provider.Model, outputTokenMax = OUTPUT_TOKEN_MAX): number {
  return Math.min(model.limit.output, outputTokenMax) || outputTokenMax
}

Unless OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX is set in the environment, min(384000, 32000) === 32000 is what gets sent to the API.
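The capping can be reproduced in isolation. The sketch below reuses the expression quoted above but replaces Provider.Model with a minimal, hypothetical ModelLike type, so it illustrates the arithmetic rather than OpenCode's actual module.

```typescript
// Hypothetical stand-in for Provider.Model; only the fields used here.
interface ModelLike {
  limit: { context: number; output: number }
}

const OUTPUT_TOKEN_MAX = 32_000

// Same expression as the function quoted in this issue.
function maxOutputTokens(model: ModelLike, outputTokenMax = OUTPUT_TOKEN_MAX): number {
  return Math.min(model.limit.output, outputTokenMax) || outputTokenMax
}

const deepseek: ModelLike = { limit: { context: 1_000_000, output: 384_000 } }
const unset: ModelLike = { limit: { context: 128_000, output: 0 } }

console.log(maxOutputTokens(deepseek)) // 32000: the configured 384000 is ignored
console.log(maxOutputTokens(unset)) // 32000: zero falls through to the default
console.log(maxOutputTokens(deepseek, 1_048_576)) // 384000: only once the ceiling is raised
```

The third call corresponds to what the env var achieves: limit.output only takes effect after the global ceiling has been lifted above it.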

Real-world impact (agent benchmark)

On a coding task with reasoningEffort: "max" via @ai-sdk/openai-compatible:

  • A step finishes with reason: "length"
  • Token usage: reasoning: 32000, output: 0
  • The agent stops mid-task with no visible completion

The same model with reasoningEffort: "high" completes normally in the same step budget because reasoning does not consume the entire cap.

This is not a provider failure; logs show OpenCode requesting ~32k max output while config says 384k.

Why OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX feels like the wrong fix

  1. Misleading config: Users set limit.output in opencode.json believing it controls API behavior. It does not, unless they also export an env var whose name does not reference limit.output at all.

  2. Experimental / undiscoverable: The variable is buried under “experimental” flags in the CLI docs. Issue #1735 (“max_tokens defaults to 32000 when using a custom provider”) was auto-closed; issue #20078 (“Custom provider (LM Studio) ignores limit.output config, hardcodes max_tokens to 32000”) remains open with the same root cause.

  3. Global knob for per-model limits: Setting OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX=1048576 is a shotgun fix across all models instead of respecting per-model limit.output.

  4. Still does not fix overflow semantics: overflow.ts uses context - maxOutputTokens() when limit.input is absent. Models whose output limit is close to the context window (e.g. Kimi K2.6 at 262144/262144) get usable === 0 once the env var is raised, causing immediate compaction/overflow unless users also add limit.input, which is another undocumented workaround.
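The overflow arithmetic in point 4 can be worked through with a small sketch. It assumes the check behaves as described in this issue (limit.input if present, otherwise context minus the per-step output cap); the Limits type and usableTokens function are hypothetical names, not the contents of overflow.ts.

```typescript
// Hypothetical shape of a model's limit block.
interface Limits {
  context: number
  input?: number
  output: number
}

// Assumed behavior: prefer limit.input, else context minus the capped output.
function usableTokens(limit: Limits, outputTokenMax: number): number {
  const maxOut = Math.min(limit.output, outputTokenMax) || outputTokenMax
  return limit.input ?? (limit.context - maxOut)
}

const kimi: Limits = { context: 262_144, output: 262_144 }

console.log(usableTokens(kimi, 32_000)) // 230144 with the default cap
console.log(usableTokens(kimi, 1_048_576)) // 0 once the env var is raised
console.log(usableTokens({ ...kimi, input: 200_000 }, 1_048_576)) // 200000 with limit.input set
```

Raising the ceiling fixes the request size but leaves no usable input budget for a shared-window model, which is why the second workaround becomes necessary.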

Suggested direction

  1. Respect configured limit.output by default: Use OUTPUT_TOKEN_MAX (32k) only as a fallback when limit.output is missing or zero, not as Math.min(limit.output, 32k) for every model. (PR #24384, “fix(provider): respect configured output limit”, attempted this but was closed as “not the right fix”; the problem remains.)

  2. Optional global ceiling: If a safety cap is still desired, expose it in opencode.json (e.g. limits.maxOutputTokens) rather than a cryptic env var.

  3. Decouple overflow math from the per-step output cap: usable should not be computed as context - limit.output when output can equal the full window; consider limit.input defaults or context - reserved for shared-window models.

  4. Document the contract: Clarify in config schema/docs what limit.context, limit.input, and limit.output each control (API request vs compaction vs fallback).
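Directions 1 to 3 could look roughly like the sketch below. All names here (Limits, FALLBACK_OUTPUT_TOKENS, resolveMaxOutputTokens, usableInputTokens, the ceiling and reserved parameters) are hypothetical and chosen for illustration; this is one possible shape, not a proposed patch against OpenCode's code.

```typescript
// Hypothetical shape of a model's limit block.
interface Limits {
  context: number
  input?: number
  output?: number
}

const FALLBACK_OUTPUT_TOKENS = 32_000

// Directions 1 and 2: the configured limit.output wins; the 32k value is only
// a fallback, and any global ceiling is explicit and optional.
function resolveMaxOutputTokens(limit: Limits, ceiling?: number): number {
  const out = limit.output ?? 0
  const configured = out > 0 ? out : FALLBACK_OUTPUT_TOKENS
  return ceiling !== undefined ? Math.min(configured, ceiling) : configured
}

// Direction 3: the input budget reserves a fixed amount instead of
// subtracting the full per-step output cap.
function usableInputTokens(limit: Limits, reserved = FALLBACK_OUTPUT_TOKENS): number {
  return limit.input ?? Math.max(0, limit.context - reserved)
}

console.log(resolveMaxOutputTokens({ context: 1_000_000, output: 384_000 })) // 384000
console.log(resolveMaxOutputTokens({ context: 128_000 })) // 32000 (fallback)
console.log(resolveMaxOutputTokens({ context: 1_000_000, output: 384_000 }, 128_000)) // 128000
console.log(usableInputTokens({ context: 262_144, output: 262_144 })) // 230144
```

With this split, a shared-window model such as the 262144/262144 case keeps a non-zero input budget even though its full output limit is sent to the API.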

Environment

  • OpenCode: v1.14.x (local install from anomalyco/opencode)
  • Custom provider: @ai-sdk/openai-compatible → OpenAI-compatible gateway
  • Models: DeepSeek V4 Flash/Pro, Kimi K2.6, GPT 5.5, etc. with explicit limit blocks in opencode.json

Workaround today (ugly)

export OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX=1048576

Plus per-model limit.input for shared-window models — none of which should be required when limit.output is already configured.
