docs: modalities field for custom provider models is undocumented #25229

@f-trycua

Summary

The modalities config field for custom provider model entries is not documented anywhere: not on opencode.ai/docs, not in the README, and not in the public config.json schema reference. Users hit confusing errors and silent failures that they cannot fix without reading the source.

The problem in practice

When using a vision-capable model via a custom provider (e.g. gemma4:26b via Ollama / @ai-sdk/openai-compatible), images are silently stripped and the model receives:

ERROR: Cannot read image (this model does not support image input). Inform the user.

The fix is to declare modalities in the model config, but this is impossible to discover without reading provider.ts.

The fix (what should be documented)

Add modalities to a custom model entry to declare its input/output capabilities:

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "gemma4:26b": {
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          }
        }
      }
    }
  }
}

Allowed modality values (from provider.ts): "text", "audio", "image", "video", "pdf".
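
Until the field is documented, a config can be checked against that list before use. The following is a minimal sketch, not the code in provider.ts: the names MODALITIES, ModalitiesConfig, and invalidModalities are hypothetical and exist only for illustration.

```typescript
// Hypothetical sketch; not the actual validation in provider.ts.
// The five values are the ones listed in this issue.
const MODALITIES: readonly string[] = ["text", "audio", "image", "video", "pdf"];

// Assumed shape of the "modalities" object in a custom model entry.
interface ModalitiesConfig {
  input?: string[];
  output?: string[];
}

// Returns every declared value that is not an allowed modality name.
function invalidModalities(config: ModalitiesConfig): string[] {
  const declared = [...(config.input ?? []), ...(config.output ?? [])];
  return declared.filter((value) => MODALITIES.indexOf(value) === -1);
}

console.log(invalidModalities({ input: ["text", "image"], output: ["text"] })); // []
console.log(invalidModalities({ input: ["text", "images"] })); // ["images"]
```

A check like this would catch a typo such as "images", which would otherwise presumably leave image input disabled without any hint as to why.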

Without this, capabilities.input.image defaults to false for all custom provider models, so vision never works regardless of what the underlying model supports.
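
The default described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the implementation in provider.ts: the ModelConfig and Flags types, the deriveCapabilities function, and the text-only fallback are all hypothetical. The only behavior taken from this issue is that image input is false unless it is declared.

```typescript
// Hypothetical shapes for illustration; the real types in provider.ts may differ.
type Modality = "text" | "audio" | "image" | "video" | "pdf";
type Flags = Record<Modality, boolean>;

interface ModelConfig {
  modalities?: { input?: Modality[]; output?: Modality[] };
}

// Turns a declared modality list into boolean flags.
// Assumption: an undeclared list falls back to text only.
function toFlags(declared: Modality[] | undefined, fallback: Modality[]): Flags {
  const list = declared ?? fallback;
  const has = (m: Modality): boolean => list.indexOf(m) !== -1;
  return {
    text: has("text"),
    audio: has("audio"),
    image: has("image"),
    video: has("video"),
    pdf: has("pdf"),
  };
}

function deriveCapabilities(model: ModelConfig): { input: Flags; output: Flags } {
  return {
    input: toFlags(model.modalities?.input, ["text"]),
    output: toFlags(model.modalities?.output, ["text"]),
  };
}

// No modalities declared: image input stays off.
console.log(deriveCapabilities({}).input.image); // false

// Declared as in the config above: image input is enabled.
console.log(
  deriveCapabilities({ modalities: { input: ["text", "image"], output: ["text"] } }).input.image,
); // true
```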

Also worth noting

supportsMediaInToolResults in message-v2.ts is a hardcoded allowlist (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/amazon-bedrock, @ai-sdk/google). Custom providers using @ai-sdk/openai-compatible (the standard path for Ollama) are excluded, so MCP tool results containing images are never passed natively in the tool result. Instead, they are re-injected as synthetic user messages. This is a separate issue, but it compounds the confusion.
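
To make the allowlist behavior concrete, here is a rough sketch of the logic as described in this issue. It is not the code in message-v2.ts: the function signature, the Delivery type, and the imageDelivery helper are assumptions. Only the four package names and the fallback to synthetic user messages come from the description above.

```typescript
// Hypothetical sketch of the described behavior; not the code in message-v2.ts.
// Package names are the four listed in this issue.
const MEDIA_IN_TOOL_RESULTS: readonly string[] = [
  "@ai-sdk/anthropic",
  "@ai-sdk/openai",
  "@ai-sdk/amazon-bedrock",
  "@ai-sdk/google",
];

// Assumed signature: takes the provider's npm package name.
function supportsMediaInToolResults(npm: string): boolean {
  return MEDIA_IN_TOOL_RESULTS.indexOf(npm) !== -1;
}

type Delivery = "native-tool-result" | "synthetic-user-message";

// How an image returned by an MCP tool would reach the model.
function imageDelivery(npm: string): Delivery {
  return supportsMediaInToolResults(npm)
    ? "native-tool-result"
    : "synthetic-user-message";
}

console.log(imageDelivery("@ai-sdk/anthropic")); // "native-tool-result"
console.log(imageDelivery("@ai-sdk/openai-compatible")); // "synthetic-user-message"
```

Because the match is on the exact package name, @ai-sdk/openai-compatible does not qualify even though it shares a prefix with @ai-sdk/openai.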

Suggested fix

Document modalities in the configuration reference, ideally with a "custom Ollama vision model" example.

Labels

docs, needs:compliance (this means the issue will auto-close after 2 hours)
