Summary
The modalities config field for custom provider model entries is not documented anywhere — not on opencode.ai/docs, not in the README, not in the public config.json schema reference. Users hit confusing errors and silent failures with no way to fix them without reading source.
The problem in practice
When using a vision-capable model via a custom provider (e.g. gemma4:26b via Ollama / @ai-sdk/openai-compatible), images are stripped with no warning shown to the user, and the model instead receives:
ERROR: Cannot read image (this model does not support image input). Inform the user.
The fix is to declare modalities in the model config — but this is impossible to discover without reading provider.ts.
The fix (what should be documented)
Add modalities to a custom model entry to declare its input/output capabilities:
{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "gemma4:26b": {
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          }
        }
      }
    }
  }
}
Allowed modality values (from provider.ts): "text", "audio", "image", "video", "pdf".
Without this, capabilities.input.image defaults to false for all custom provider models, so vision never works regardless of what the underlying model supports.
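To illustrate the behaviour described above, here is a minimal sketch of how a declared modalities list could be turned into per-modality capability flags. This is not the actual provider.ts code: the names ModalitiesConfig, toFlags and resolveCapabilities are hypothetical, and the text-only fallback is an assumption made for illustration.

```typescript
// Hypothetical sketch; names and the text-only fallback are assumptions,
// not the real provider.ts implementation.
type Modality = "text" | "audio" | "image" | "video" | "pdf";

interface ModalitiesConfig {
  input?: Modality[];
  output?: Modality[];
}

type CapabilityFlags = Record<Modality, boolean>;

// Allowed values as listed in the issue.
const ALL_MODALITIES: Modality[] = ["text", "audio", "image", "video", "pdf"];

// Turn a declared list (or a fallback list) into one boolean per modality.
function toFlags(declared: Modality[] | undefined, fallback: Modality[]): CapabilityFlags {
  const active = new Set<Modality>(declared ?? fallback);
  const flags = {} as CapabilityFlags;
  for (const m of ALL_MODALITIES) {
    flags[m] = active.has(m);
  }
  return flags;
}

// With no modalities declared, only text is enabled (assumed fallback),
// so image input stays false.
function resolveCapabilities(modalities?: ModalitiesConfig) {
  return {
    input: toFlags(modalities?.input, ["text"]),
    output: toFlags(modalities?.output, ["text"]),
  };
}

const undeclared = resolveCapabilities();
const declared = resolveCapabilities({ input: ["text", "image"], output: ["text"] });
console.log(undeclared.input.image, declared.input.image);
```

Under these assumptions, a model entry without modalities ends up with image input disabled, while the Ollama example above would enable it.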
Also worth noting
supportsMediaInToolResults in message-v2.ts is a hardcoded allowlist (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/amazon-bedrock, @ai-sdk/google). Custom providers using @ai-sdk/openai-compatible (the standard path for Ollama) are excluded, so MCP tool results containing images are never passed natively in the tool result — they get re-injected as synthetic user messages instead. This is a separate issue but compounds the confusion.
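The allowlist behaviour can be pictured roughly as follows. This is a sketch under assumptions: only the function name supportsMediaInToolResults and the four package names come from this issue; the set-based check, the ImageRoute type and routeToolResultImage are hypothetical and may not match message-v2.ts.

```typescript
// Hypothetical sketch; the body is an assumption, not the real message-v2.ts code.
// Package names are the ones listed in the issue.
const NATIVE_MEDIA_PROVIDERS: ReadonlySet<string> = new Set([
  "@ai-sdk/anthropic",
  "@ai-sdk/openai",
  "@ai-sdk/amazon-bedrock",
  "@ai-sdk/google",
]);

// True only for provider packages on the hardcoded allowlist.
function supportsMediaInToolResults(npmPackage: string): boolean {
  return NATIVE_MEDIA_PROVIDERS.has(npmPackage);
}

// Illustrative labels for the two paths described in the issue.
type ImageRoute = "native-tool-result" | "synthetic-user-message";

function routeToolResultImage(npmPackage: string): ImageRoute {
  return supportsMediaInToolResults(npmPackage)
    ? "native-tool-result"
    : "synthetic-user-message";
}

console.log(routeToolResultImage("@ai-sdk/openai-compatible"));
```

Because the check in this sketch is keyed on the package name alone, declaring modalities in the config would not move an @ai-sdk/openai-compatible provider onto the native path.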
Suggested fix
Document modalities in the configuration reference, ideally with a "custom Ollama vision model" example.