Description
When using OpenCode with a local Ollama instance serving vision models (e.g. llama3.2-vision, llava), attaching images via drag-drop or paste silently fails — the model never sees the image. Additionally, the TUI has no drag-and-drop support for images at all.
After tracing the full pipeline through both the Ollama server source and OpenCode's provider/integration layers, two root causes were identified.
Root Cause #1 (Critical): Image capabilities default to false for user-configured models
The Chain of Failure
User configures local Ollama model without `modalities` field
↓
provider.ts:1224 → capabilities.input.image = model.modalities?.... ?? false
↓
transform.ts:393 → unsupportedParts() sees capabilities.input.image === false
↓
Image part is SILENTLY replaced with error text
↓
LLM receives error text instead of the image — vision is broken
Code Evidence
Step 1 — User config parsing defaults image to false:
packages/opencode/src/provider/provider.ts:1221-1226
input: {
text: model.modalities?.input?.includes("text") ?? existingModel?.capabilities.input.text ?? true,
audio: model.modalities?.input?.includes("audio") ?? existingModel?.capabilities.input.audio ?? false,
image: model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false,
// ^^^^^
// For local Ollama models: no modalities in user config → undefined
// no existingModel (models-api.json only has Ollama Cloud) → undefined
// Result: FALSE → images blocked
},
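The fallback order above can be reproduced in isolation. This is a minimal sketch, not the actual `provider.ts` code: the type names `ModelConfig` and `KnownModel` and the helper `resolveImage` are hypothetical stand-ins, and only the `??` chain itself is taken from the quoted line.

```typescript
// Hypothetical, simplified stand-ins for the shapes used in provider.ts.
type Modalities = { input?: string[]; output?: string[] }
type ModelConfig = { name: string; modalities?: Modalities }
type KnownModel = { capabilities: { input: { image: boolean } } }

// Same fallback order as the quoted line: user config, then catalog entry, then false.
function resolveImage(model: ModelConfig, existingModel?: KnownModel): boolean {
  return (
    model.modalities?.input?.includes("image") ??
    existingModel?.capabilities.input.image ??
    false
  )
}

// No modalities and no catalog entry: both lookups yield undefined, so the result is false.
const bare = resolveImage({ name: "llama3.2-vision" })

// Declaring modalities explicitly short-circuits the chain.
const declared = resolveImage({
  name: "llama3.2-vision",
  modalities: { input: ["text", "image"], output: ["text"] },
})

console.log(bare, declared) // false true
```

Note that an explicit `"input": ["text"]` yields `false` from `includes`, not `undefined`, so it overrides any catalog value.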
Step 2 — unsupportedParts() silently discards the image:
packages/opencode/src/provider/transform.ts:393-428
function unsupportedParts(msgs: ModelMessage[], model: Provider.Model): ModelMessage[] {
return msgs.map((msg) => {
if (msg.role !== "user" || !Array.isArray(msg.content)) return msg
const filtered = msg.content.map((part) => {
if (part.type !== "file" && part.type !== "image") return part
// ...
const mime = part.type === "image" ? String(part.image).split(";")[0].replace("data:", "") : part.mediaType
const modality = mimeToModality(mime) // "image/png" → "image"
if (!modality) return part
if (model.capabilities.input[modality]) return part // image === false → this early return is skipped
// Replaces image with error text — user never sees this!
return {
type: "text" as const,
text: `ERROR: Cannot read ${name} (this model does not support ${modality} input). Inform the user.`,
}
})
return { ...msg, content: filtered }
})
}
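To see the substitution in isolation, here is a reduced sketch of the same filtering logic. It is not the real `transform.ts` function: `Part`, `Capabilities`, and `filterParts` are hypothetical simplifications, and this `mimeToModality` is an assumed stand-in because the real mapping is not shown here.

```typescript
type Part =
  | { type: "text"; text: string }
  | { type: "image"; image: string }

type Capabilities = { input: { image: boolean } }

// Assumed stand-in: treats any "image/*" MIME type as the "image" modality.
function mimeToModality(mime: string): "image" | undefined {
  return mime.startsWith("image/") ? "image" : undefined
}

function filterParts(parts: Part[], caps: Capabilities): Part[] {
  return parts.map((part): Part => {
    if (part.type !== "image") return part
    // Same extraction as the quoted code: "data:image/png;base64,..." -> "image/png"
    const mime = (part.image.split(";")[0] ?? "").replace("data:", "")
    const modality = mimeToModality(mime)
    if (!modality) return part
    if (caps.input[modality]) return part
    // Capability is false: the image is swapped for a text part.
    return { type: "text", text: `ERROR: this model does not support ${modality} input` }
  })
}

const parts: Part[] = [
  { type: "text", text: "What is in this picture?" },
  { type: "image", image: "data:image/png;base64,AAAA" },
]

const blocked = filterParts(parts, { input: { image: false } })
const allowed = filterParts(parts, { input: { image: true } })
console.log(blocked.map((p) => p.type), allowed.map((p) => p.type))
```

With the capability set to `false`, the second part comes back as text, which matches the behaviour described in this step.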
Step 3 — The official documentation example omits the modalities field:
packages/web/src/content/docs/providers.mdx (and all 15 translations):
{
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"options": { "baseURL": "http://localhost:11434/v1" },
"models": {
"llama3.2-vision": { "name": "llama3.2-vision" }
// ❌ No modalities field → capabilities.input.image = false
// User MUST add: "modalities": { "input": ["text", "image"], "output": ["text"] }
}
}
}
}
Step 4 — The config schema DOES support modalities, but docs never mention it:
packages/opencode/src/config/provider.ts:46-51
modalities: Schema.optional(
Schema.Struct({
input: Schema.mutable(Schema.Array(Schema.Literals(["text", "audio", "image", "video", "pdf"]))),
output: Schema.mutable(Schema.Array(Schema.Literals(["text", "audio", "image", "video", "pdf"]))),
}),
),
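As a quick way to check a config entry against the literals listed in that schema, the following plain-TypeScript sketch may help. It does not use the schema library from the quoted code; `isModality` and `invalidModalities` are hypothetical helpers, and only the five literal values come from the schema above.

```typescript
// The literal values accepted by the quoted schema.
const MODALITIES = ["text", "audio", "image", "video", "pdf"] as const
type Modality = (typeof MODALITIES)[number]

function isModality(value: string): value is Modality {
  return (MODALITIES as readonly string[]).includes(value)
}

// Returns every value that the schema's literal list would not accept.
function invalidModalities(entry: { input: string[]; output: string[] }): string[] {
  return [...entry.input, ...entry.output].filter((m) => !isModality(m))
}

const good = invalidModalities({ input: ["text", "image"], output: ["text"] })
const typo = invalidModalities({ input: ["text", "images"], output: ["text"] })
console.log(good, typo)
```

A misspelling such as `"images"` is reported, while the value suggested in Step 3 passes.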
Why This Matters
This affects every user who configures a local vision model provider manually — not just Ollama, but also vLLM, LocalAI, LM Studio, etc. The AI SDK's @ai-sdk/openai-compatible provider correctly converts file parts to image_url format, and the downstream server handles them fine — but OpenCode strips the image before the request is even sent because capabilities.input.image is false.
Ollama Confirmation
The Ollama server's OpenAI-compatible layer (openai/openai.go:FromChatRequest) DOES handle image_url content parts and converts them to the native api.Message.Images field:
case "image_url":
// ... parses url ...
img, err := decodeImageURL(url)
if err != nil {
return nil, err
}
messages = append(messages, api.Message{Role: msg.Role, Images: []api.ImageData{img}})
So the Ollama server is ready to receive images — OpenCode just never sends them.
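For reference, the request shape that reaches the `image_url` branch looks roughly like the sketch below. It assumes the standard OpenAI chat message format with a base64 data URL; `imagePart` and `buildBody` are hypothetical helpers for illustration, not functions from OpenCode or the AI SDK.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }

// Wraps base64 image data in the image_url content part of the OpenAI chat format.
function imagePart(mediaType: string, base64: string): ContentPart {
  return { type: "image_url", image_url: { url: `data:${mediaType};base64,${base64}` } }
}

// Builds a single-turn request body with one text part and one image part.
function buildBody(model: string, prompt: string, image: ContentPart) {
  const content: ContentPart[] = [{ type: "text", text: prompt }, image]
  return { model, messages: [{ role: "user" as const, content }] }
}

const body = buildBody("llama3.2-vision", "Describe this image.", imagePart("image/png", "AAAA"))
console.log(JSON.stringify(body, null, 2))
```

A body of this shape would be sent to the `baseURL` from the config (`http://localhost:11434/v1`) once the capability check lets the image through.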
Root Cause #2: TUI has zero drag-and-drop event handlers
Code Evidence
The TUI prompt <textarea> in packages/opencode/src/cli/cmd/tui/component/prompt/index.tsx has only these event handlers:
// Lines 1484-1552 — the <textarea> element
<textarea
onContentChange={...}
onCursorChange={...}
onKeyDown={...}
onSubmit={...}
onPaste={...} // ← paste only, NO drag events
onMouseDown={...}
/>
A search for drag, Drop, dragOver, onDrop, and onDrag returns zero matches anywhere in the TUI prompt directory.
Compare with the App UI (packages/app/src/components/prompt-input/attachments.ts:143-188) which has full global drag/drop listeners:
makeEventListener(document, "dragover", handleGlobalDragOver)
makeEventListener(document, "dragleave", handleGlobalDragLeave)
makeEventListener(document, "drop", handleGlobalDrop)
The TUI tips file (home/tips-view.tsx:61) even suggests:
"Drag and drop images or PDFs into the terminal to add them as context"
This tip depends entirely on the terminal emulator's own file-drag support (e.g., kitty, iTerm2), which many terminals lack (Windows Terminal <1.25, most Linux terminals, etc.). OpenCode's TUI code does not implement drag-and-drop itself.
What Works (Already Implemented)
| Feature | TUI | App UI |
| --- | --- | --- |
| Ctrl+V image paste | Yes (macOS / Windows / Linux) | Yes |
| Drag-drop images | No | Yes |
| AI SDK file→image_url conversion | Yes (convertToOpenAICompatibleChatMessages) | Yes |
| Image normalization (resize/compress) | Yes (image.ts) | Yes |
| Ollama server image reception | Yes (FromChatRequest) | Yes |
Suggested Fix
Fix 1: Default image to true for OpenAI-compatible providers
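In provider.ts:1224, when the user has not set modalities and models-api provides no existingModel, default image to true if the npm package is @ai-sdk/openai-compatible. The OpenAI chat format defines image_url content parts, so the request shape is valid for any compatible server, although a text-only model behind that server may still reject or ignore the image: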
image: model.modalities?.input?.includes("image")
?? existingModel?.capabilities.input.image
- ?? false,
+ ?? (apiNpm === "@ai-sdk/openai-compatible"),
The user can still set "modalities": { "input": ["text"] } to explicitly disable image support.
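The proposed fallback can be sketched as a standalone function to check its edge cases. The function name `resolveImageWithFix` and its flattened parameters are hypothetical; `apiNpm` follows the name used in the diff above, and the real change would live inline in `provider.ts`.

```typescript
const OPENAI_COMPATIBLE = "@ai-sdk/openai-compatible"

// Fallback order: explicit user modalities, then catalog value, then a provider-based default.
function resolveImageWithFix(
  userInput: string[] | undefined,
  catalogImage: boolean | undefined,
  apiNpm: string,
): boolean {
  return userInput?.includes("image") ?? catalogImage ?? (apiNpm === OPENAI_COMPATIBLE)
}

// Local model with no modalities and no catalog entry: images now pass through.
const localDefault = resolveImageWithFix(undefined, undefined, OPENAI_COMPATIBLE)

// Explicit opt-out still wins over the new default.
const optedOut = resolveImageWithFix(["text"], undefined, OPENAI_COMPATIBLE)

// Other providers keep the old default of false.
const otherProvider = resolveImageWithFix(undefined, undefined, "some-other-provider")

console.log(localDefault, optedOut, otherProvider) // true false false
```

One trade-off of this default is that a text-only local model with no `modalities` entry would no longer get the replacement error text; the image would be sent and the server would decide how to respond.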
Fix 2: Add drag-drop to TUI prompt
Add onDrop / onDragOver event handlers to the TUI <textarea> (or a surrounding wrapper), reading event.dataTransfer.files and invoking the existing pasteAttachment() pipeline.
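One possible shape for the handler's input side is sketched below. It rests on an assumption that is not verified here: that the terminal delivers a dropped file as pasted text containing its path, possibly quoted or with backslash-escaped spaces, which varies by emulator. `droppedImagePaths` is a hypothetical helper; the resulting paths would then be handed to the existing `pasteAttachment()` pipeline.

```typescript
// Image extensions to accept; the list is illustrative, not taken from OpenCode.
const IMAGE_EXT = /\.(png|jpe?g|gif|webp)$/i

// Splits pasted text into shell-style tokens and keeps those that look like image paths.
function droppedImagePaths(pasted: string): string[] {
  // A token is a single-quoted run, a double-quoted run, or a run of
  // non-space characters in which a backslash escapes the next character.
  const tokens = pasted.match(/'[^']*'|"[^"]*"|(?:\\.|[^\s\\])+/g) ?? []
  return tokens
    .map((t) => (/^(['"]).*\1$/.test(t) ? t.slice(1, -1) : t.replace(/\\(.)/g, "$1")))
    .filter((p) => IMAGE_EXT.test(p))
}

const plain = droppedImagePaths("/tmp/a.png")
const quoted = droppedImagePaths("'/tmp/my shot.png' /tmp/notes.txt")
const escaped = droppedImagePaths("/tmp/my\\ shot.jpg")
console.log(plain, quoted, escaped)
```

Filtering by extension keeps ordinary pasted text from being treated as an attachment; a real implementation would also need to confirm that each path exists before reading it.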
Fix 3: Document modalities field
Add the modalities field to all provider config documentation examples, especially the Ollama section in providers.mdx.
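A documented example might look like the following. It is written as a TypeScript object so it can be checked and serialized; every value is taken from the Ollama example and the `modalities` suggestion in Step 3, and the variable name `config` is only for illustration.

```typescript
// The Ollama docs example with the modalities field added.
const config = {
  provider: {
    ollama: {
      npm: "@ai-sdk/openai-compatible",
      options: { baseURL: "http://localhost:11434/v1" },
      models: {
        "llama3.2-vision": {
          name: "llama3.2-vision",
          // Declares image input so capabilities.input.image resolves to true.
          modalities: { input: ["text", "image"], output: ["text"] },
        },
      },
    },
  },
}

// Emits the JSON form that would go into the OpenCode config file.
console.log(JSON.stringify(config, null, 2))
```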
CC @thdxr