Bug: `limit.output` in config is silently capped at 32k; `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX` is a poor workaround

## Summary

OpenCode silently caps per-step `maxOutputTokens` at **32,000** even when `opencode.json` sets a much larger `limit.output` (e.g. 384000 for DeepSeek, 128000 for GPT/Claude). The only escape hatch is an **experimental** environment variable, `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX`, which is listed only among the experimental CLI flags and which users in practice discover through scattered GitHub threads.

This is surprising, breaks agent runs on reasoning-heavy models, and makes user configuration misleading.

## Expected behavior

When a model defines explicit limits in config:

```json
"deepseek-v4-flash": {
  "limit": {
    "context": 1000000,
    "output": 384000
  }
}
```

OpenCode should send `max_output_tokens` / `max_tokens` based on **`limit.output`** (possibly clamped to what the provider actually supports), not an undocumented global ceiling of 32k.

## Actual behavior

`ProviderTransform.maxOutputTokens()` always applies a default cap:

```ts
export const OUTPUT_TOKEN_MAX = 32_000

export function maxOutputTokens(model: Provider.Model, outputTokenMax = OUTPUT_TOKEN_MAX): number {
  return Math.min(model.limit.output, outputTokenMax) || outputTokenMax
}
```

Unless `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX` is set in the environment, the value sent to the API is **`min(384000, 32000) === 32000`**.
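A minimal, self-contained reproduction of that arithmetic. `ModelLimits` is a hypothetical stand-in for `Provider.Model` that carries only the `limit` field, and the second argument simulates what raising the ceiling through the env var would presumably do. It is a sketch for illustration, not the actual OpenCode module:

```typescript
// Hypothetical minimal stand-in for Provider.Model; only the field used here.
type ModelLimits = { limit: { context: number; output: number } }

const OUTPUT_TOKEN_MAX = 32_000

// Same expression as the snippet quoted above.
function maxOutputTokens(model: ModelLimits, outputTokenMax = OUTPUT_TOKEN_MAX): number {
  return Math.min(model.limit.output, outputTokenMax) || outputTokenMax
}

const deepseek: ModelLimits = { limit: { context: 1_000_000, output: 384_000 } }
const unset: ModelLimits = { limit: { context: 128_000, output: 0 } }

const capped = maxOutputTokens(deepseek)            // default ceiling wins over the config
const raised = maxOutputTokens(deepseek, 1_048_576) // raised ceiling: configured value wins
const fallback = maxOutputTokens(unset)             // output of 0 falls through `||`

console.log(capped, raised, fallback) // 32000 384000 32000
```

The configured 384000 only takes effect in the second call, where the ceiling is passed explicitly.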

### Real-world impact (agent benchmark)

On a coding task with `reasoningEffort: "max"` via `@ai-sdk/openai-compatible`:

- A step finishes with `reason: "length"`
- Token usage: `reasoning: 32000`, `output: 0`
- The agent stops mid-task with no visible completion

The same model with `reasoningEffort: "high"` completes normally under the same 32k cap because reasoning does not consume all of it.

This is not a provider failure; logs show OpenCode requesting ~32k max output while config says 384k.

## Why `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX` feels like the wrong fix

1. **Misleading config** — Users set `limit.output` in `opencode.json` believing it controls API behavior. It does not, unless they also export an env var whose name does not reference `limit.output` at all.

2. **Experimental / undiscoverable** — The variable is buried under “experimental” flags in CLI docs. Issue #1735 was auto-closed; #20078 remains open with the same root cause.

3. **Global knob for per-model limits** — Setting `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX=1048576` is a shotgun fix across all models instead of respecting per-model `limit.output`.

4. **Still does not fix overflow semantics** — `overflow.ts` uses `context - maxOutputTokens()` when `limit.input` is absent. Models with `output` close to `context` (e.g. Kimi K2.6 at 262144/262144) get `usable === 0` once the env var is raised, causing immediate compaction/overflow unless users also add `limit.input` — another undocumented workaround.
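The overflow arithmetic in point 4 can be sketched as follows. `usableInput` is a hypothetical helper that assumes the budget is `limit.input` when present and `context - maxOutputTokens()` otherwise, as described above; the real `overflow.ts` may differ in detail, and the `input: 200_000` value is purely illustrative:

```typescript
// Hypothetical stand-in for a model's `limit` block in opencode.json.
type Limit = { context: number; output: number; input?: number }

// Same min-with-ceiling expression as the snippet quoted under "Actual behavior".
function maxOutputTokens(limit: Limit, ceiling: number): number {
  return Math.min(limit.output, ceiling) || ceiling
}

// Assumed shape of the overflow budget: prefer limit.input,
// otherwise subtract the per-step output cap from the context window.
function usableInput(limit: Limit, ceiling: number): number {
  return limit.input ?? limit.context - maxOutputTokens(limit, ceiling)
}

const kimi: Limit = { context: 262_144, output: 262_144 }

const withDefaultCap = usableInput(kimi, 32_000)    // 262144 - 32000
const withRaisedCap = usableInput(kimi, 1_048_576)  // 262144 - 262144
const withInputLimit = usableInput({ ...kimi, input: 200_000 }, 1_048_576)

console.log(withDefaultCap, withRaisedCap, withInputLimit) // 230144 0 200000
```

Under these assumptions, raising the global ceiling is exactly what drives the usable budget to zero for a shared-window model, unless `limit.input` is also set.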

## Related issues

- #1735 — max_tokens defaults to 32000 for custom providers (closed by stale bot, problem persists)
- #20078 — `limit.output` ignored, hardcodes 32000 (open)
- #2949 — 32k cap vs extended thinking / zero output tokens (closed with partial fix)

## Suggested direction

1. **Respect configured `limit.output` by default** — Use `OUTPUT_TOKEN_MAX` (32k) only as a fallback when `limit.output` is missing or zero, not as `Math.min(limit.output, 32k)` for every model. (PR #24384 attempted this but was closed as “not the right fix”; the problem remains.)

2. **Optional global ceiling** — If a safety cap is still desired, expose it in `opencode.json` (e.g. `limits.maxOutputTokens`) rather than a cryptic env var.

3. **Decouple overflow math from per-step output cap** — `usable` should not use `context - limit.output` when `output` can equal the full window; consider `limit.input` defaults or `context - reserved` for shared-window models.

4. **Document the contract** — Clarify in config schema/docs what `limit.context`, `limit.input`, and `limit.output` each control (API request vs compaction vs fallback).
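One possible shape for points 1 to 3, as a rough sketch rather than a proposed patch. All names here (`proposedMaxOutputTokens`, `proposedUsableInput`, `globalCeiling`, `reserved`) are hypothetical, and the 32k reserve is only a placeholder value:

```typescript
// Hypothetical stand-in for a model's `limit` block; output may be missing.
type Limit = { context: number; output?: number; input?: number }

const FALLBACK_OUTPUT_TOKENS = 32_000

// Points 1 and 2: honor limit.output, use 32k only as a fallback,
// and apply a global ceiling only if one is explicitly configured.
function proposedMaxOutputTokens(limit: Limit, globalCeiling?: number): number {
  const configured =
    limit.output !== undefined && limit.output > 0 ? limit.output : FALLBACK_OUTPUT_TOKENS
  return globalCeiling !== undefined ? Math.min(configured, globalCeiling) : configured
}

// Point 3: keep overflow math independent of the per-step cap by
// subtracting a bounded reserve instead of the full limit.output.
function proposedUsableInput(limit: Limit, reserved = FALLBACK_OUTPUT_TOKENS): number {
  if (limit.input !== undefined) return limit.input
  return limit.context - Math.min(limit.output ?? reserved, reserved)
}

const deepseek: Limit = { context: 1_000_000, output: 384_000 }
const kimi: Limit = { context: 262_144, output: 262_144 }

console.log(proposedMaxOutputTokens(deepseek))            // 384000
console.log(proposedMaxOutputTokens({ context: 128_000 })) // 32000
console.log(proposedMaxOutputTokens(deepseek, 128_000))   // 128000
console.log(proposedUsableInput(kimi))                    // 230144
```

With this shape, a model whose `output` equals its `context` still gets a non-zero usable budget without requiring `limit.input`.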

## Environment

- OpenCode: v1.14.x (local install from `anomalyco/opencode`)
- Custom provider: `@ai-sdk/openai-compatible` → OpenAI-compatible gateway
- Models: DeepSeek V4 Flash/Pro, Kimi K2.6, GPT 5.5, etc. with explicit `limit` blocks in `opencode.json`

## Workaround today (ugly)

```bash
export OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX=1048576
```

Plus per-model `limit.input` for shared-window models — none of which should be required when `limit.output` is already configured.
