cloudflare-ai-gateway: Azure GPT-5.x models fail (max_tokens vs max_completion_tokens); per-model compat not honored #32776

@ArchiTecCTT


Summary

Azure GPT-5.x deployments routed through the built-in cloudflare-ai-gateway provider fail in OpenCode because requests use max_tokens instead of max_completion_tokens.

This is distinct from #25096 (custom @ai-sdk/openai-compatible providers) and #22623 (native Azure provider). Here the failure is in the built-in cloudflare-ai-gateway path backed by @earendil-works/pi-ai.

Environment

  • OpenCode: 1.17.8
  • Provider: cloudflare-ai-gateway (connected via /connect, Azure keys in gateway BYOK)
  • Gateway compat endpoint: https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/compat
  • Model IDs: azure-openai/gpt-5.4-nano, azure-openai/gpt-5.4, azure-openai/gpt-5.4-mini, azure-openai/gpt-5.5
  • Other Azure deployments on the same provider (e.g. grok-4-20-non-reasoning, kimi-k2.6) work fine

Reproduce

~/.config/opencode/opencode.jsonc:

{
  "provider": {
    "cloudflare-ai-gateway": {
      "models": {
        "azure-openai/gpt-5.4-nano": { "name": "GPT 5.4 Nano (Azure)" }
      }
    }
  }
}

Then run:

opencode run -m cloudflare-ai-gateway/azure-openai/gpt-5.4-nano "Reply with exactly: OK"

Error:

Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.

Root cause (pi-ai)

In @earendil-works/pi-ai openai-completions.js, detectCompat() treats all cloudflare-ai-gateway URLs as non-standard and sets:

maxTokensField: useMaxTokens ? "max_tokens" : "max_completion_tokens"
// where useMaxTokens includes isCloudflareAiGateway

So every gateway request sends max_tokens, which newer Azure GPT-5 deployments reject.

getCompat() can override via model.compat.maxTokensField, but that override is not reachable from normal OpenCode config for custom models added in opencode.jsonc.
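The interaction between the two functions can be sketched as follows. This is a hypothetical reconstruction from the description above, not the actual pi-ai source: the function bodies, the hostname check, and the `baseUrl` field are assumptions, and only the gateway condition of `useMaxTokens` is modeled.

```javascript
// Hypothetical reconstruction of the described behavior; not the real pi-ai code.

// Assumption: the gateway is recognized by its hostname.
function isCloudflareAiGateway(baseUrl) {
  return new URL(baseUrl).hostname === "gateway.ai.cloudflare.com";
}

// Stand-in for detectCompat(): every gateway URL is forced to "max_tokens".
function detectCompat(baseUrl) {
  const useMaxTokens = isCloudflareAiGateway(baseUrl);
  return { maxTokensField: useMaxTokens ? "max_tokens" : "max_completion_tokens" };
}

// Stand-in for getCompat(): an explicit model.compat wins over detection.
function getCompat(model) {
  return { ...detectCompat(model.baseUrl), ...(model.compat ?? {}) };
}

// ACCOUNT and GATEWAY are placeholders for the real path segments.
const gatewayUrl = "https://gateway.ai.cloudflare.com/v1/ACCOUNT/GATEWAY/compat";

// A model registered from opencode.jsonc arrives without compat, so detection wins.
const fromConfig = { id: "azure-openai/gpt-5.4-nano", baseUrl: gatewayUrl };

// The same model if the override could be supplied.
const withCompat = { ...fromConfig, compat: { maxTokensField: "max_completion_tokens" } };

console.log(getCompat(fromConfig).maxTokensField); // max_tokens
console.log(getCompat(withCompat).maxTokensField); // max_completion_tokens
```

Under these assumptions the override mechanism itself is sufficient; the gap is that nothing in the OpenCode config path populates `model.compat`.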

Workarounds attempted (none worked)

  1. Per-model options.compat.maxTokensField in opencode.jsonc
  2. modelOverrides / full model entries in ~/.config/opencode/models.json
  3. Full model entries in ~/.pi/agent/models.json (pi docs path)
  4. Separate custom provider using @ai-sdk/openai-compatible pointed at the same compat baseURL (still sends max_tokens; same class of bug as #25096, where the openai-compatible adapter sends max_tokens to GPT-5/o-series reasoning models that require max_completion_tokens)

Expected behavior

At least one of these should work:

  • options.compat.maxTokensField: "max_completion_tokens" per model in opencode.jsonc
  • modelOverrides / models.json compat for config-defined cloudflare-ai-gateway models
  • Smarter defaults: azure-openai/gpt-5* routes use max_completion_tokens while other gateway models keep max_tokens

Suggested fix

  1. OpenCode: Map provider.models.*.options.compat (or top-level model compat) into pi model.compat when registering config models.
  2. pi-ai: Do not blanket-force max_tokens for all cloudflare-ai-gateway traffic; either honor a per-model override or use prefix-based detection for azure-openai/gpt-5*.
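A minimal sketch of how the two fixes could fit together. All function names here (`toPiModel`, `defaultMaxTokensField`, `resolveMaxTokensField`) are illustrative and do not exist in OpenCode or pi-ai; the shape of the config entry follows the opencode.jsonc example above with the `options.compat` key from the first workaround.

```javascript
// Sketch only: these helpers are hypothetical, not OpenCode or pi-ai APIs.

// Fix 1 (OpenCode side): carry compat from a config model entry into model.compat.
// Accepts either top-level compat or options.compat.
function toPiModel(id, entry) {
  const compat = entry.compat ?? entry.options?.compat;
  return { id, name: entry.name ?? id, ...(compat ? { compat } : {}) };
}

// Fix 2 (pi-ai side): choose the default per model ID instead of per gateway.
function defaultMaxTokensField(modelId) {
  return modelId.startsWith("azure-openai/gpt-5")
    ? "max_completion_tokens"
    : "max_tokens";
}

// An explicit per-model override wins; otherwise use the prefix-based default.
function resolveMaxTokensField(model) {
  return model.compat?.maxTokensField ?? defaultMaxTokensField(model.id);
}

// Config entry as it might appear under provider.cloudflare-ai-gateway.models.
const entry = {
  name: "GPT 5.4 Nano (Azure)",
  options: { compat: { maxTokensField: "max_completion_tokens" } },
};

const explicit = resolveMaxTokensField(toPiModel("azure-openai/gpt-5.4-nano", entry));
const byPrefix = resolveMaxTokensField(toPiModel("azure-openai/gpt-5.5", {}));
const other = resolveMaxTokensField(toPiModel("kimi-k2.6", {}));

console.log(explicit); // max_completion_tokens (from options.compat)
console.log(byPrefix); // max_completion_tokens (from the gpt-5 prefix)
console.log(other);    // max_tokens (unchanged gateway default)
```

Either fix alone would unblock the GPT-5.x deployments in this sketch; doing both keeps an escape hatch for models the prefix check does not anticipate.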

Notes

  • Gateway + Azure deployment itself is healthy; this is a client parameter naming issue.
  • Switching to the native Azure provider without pointing baseURL at the gateway would bypass Cloudflare AI Gateway observability/BYOK routing.

Related: #22623, #25096
