[Bug] Images silently discarded with local Ollama vision models + TUI lacks drag-and-drop #26780

@LifetimeVip

Description

When using OpenCode with a local Ollama instance serving vision models (e.g. llama3.2-vision, llava), attaching images via drag-drop or paste silently fails — the model never sees the image. Additionally, the TUI has no drag-and-drop support for images at all.

After tracing the full pipeline through both the Ollama server source and OpenCode's provider/integration layers, two root causes were identified.


Root Cause #1 (Critical): Image capabilities default to false for user-configured models

The Chain of Failure

User configures local Ollama model without `modalities` field
    ↓
provider.ts:1224  →  capabilities.input.image = model.modalities?.... ?? false
    ↓
transform.ts:393  →  unsupportedParts() sees capabilities.input.image === false
    ↓
Image part is SILENTLY replaced with error text
    ↓
LLM receives error text instead of the image — vision is broken

Code Evidence

Step 1 — User config parsing defaults image to false:

packages/opencode/src/provider/provider.ts:1221-1226

input: {
  text: model.modalities?.input?.includes("text") ?? existingModel?.capabilities.input.text ?? true,
  audio: model.modalities?.input?.includes("audio") ?? existingModel?.capabilities.input.audio ?? false,
  image: model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false,
  //                                                                                               ^^^^^
  // For local Ollama models: no modalities in user config → undefined
  //                           no existingModel (models-api.json only has Ollama Cloud) → undefined
  //                           Result: FALSE → images blocked
},
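
To make the fallback concrete, here is a minimal standalone sketch of the same `??` chain. The types `UserModelConfig` and `KnownModel` and the function name `resolveImageInput` are illustrative stand-ins, not OpenCode's actual definitions.

```typescript
// Illustrative stand-ins for the shapes used in provider.ts (assumed, simplified).
type Modality = "text" | "audio" | "image" | "video" | "pdf"

interface UserModelConfig {
  modalities?: { input?: Modality[]; output?: Modality[] }
}

interface KnownModel {
  capabilities: { input: { image: boolean } }
}

// Same order as the quoted code: user config, then known model, then false.
function resolveImageInput(model: UserModelConfig, existingModel?: KnownModel): boolean {
  return model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false
}

// Local model, no modalities in config, no entry in the known-model list.
const localNoModalities = resolveImageInput({})

// Same model with modalities declared by the user.
const localWithModalities = resolveImageInput({ modalities: { input: ["text", "image"] } })

console.log({ localNoModalities, localWithModalities })
```

Only a missing `modalities` field falls through to the default. An explicit `input` list without `"image"` makes `includes` return `false`, which is not nullish and therefore ends the chain.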

Step 2 — unsupportedParts() silently discards the image:

packages/opencode/src/provider/transform.ts:393-428

function unsupportedParts(msgs: ModelMessage[], model: Provider.Model): ModelMessage[] {
  return msgs.map((msg) => {
    if (msg.role !== "user" || !Array.isArray(msg.content)) return msg
    const filtered = msg.content.map((part) => {
      if (part.type !== "file" && part.type !== "image") return part
      // ...
      const mime = part.type === "image" ? String(part.image).split(";")[0].replace("data:", "") : part.mediaType
      const modality = mimeToModality(mime)        // "image/png" → "image"
      if (!modality) return part
      if (model.capabilities.input[modality]) return part  // false for an unconfigured local model, so execution falls through
      // Replaces image with error text — user never sees this!
      return {
        type: "text" as const,
        text: `ERROR: Cannot read ${name} (this model does not support ${modality} input). Inform the user.`,
      }
    })
    return { ...msg, content: filtered }
  })
}
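
The effect of this filter can be reproduced in isolation. The following is a reduced sketch, not the OpenCode function: the `Part` type, `filterParts`, and the body of `mimeToModality` are assumptions made so the example runs on its own.

```typescript
type Modality = "image" | "audio" | "video" | "pdf"

type Part =
  | { type: "text"; text: string }
  | { type: "file"; mediaType: string; filename?: string }

// Hypothetical stand-in; the real mimeToModality may map more types.
function mimeToModality(mime: string): Modality | undefined {
  if (mime.startsWith("image/")) return "image"
  if (mime.startsWith("audio/")) return "audio"
  if (mime.startsWith("video/")) return "video"
  if (mime === "application/pdf") return "pdf"
  return undefined
}

// Replaces any file part whose modality the model does not accept with a text part.
function filterParts(parts: Part[], input: Record<Modality, boolean>): Part[] {
  return parts.map((part): Part => {
    if (part.type !== "file") return part
    const modality = mimeToModality(part.mediaType)
    if (!modality || input[modality]) return part
    const name = part.filename ?? modality
    return {
      type: "text",
      text: `ERROR: Cannot read ${name} (this model does not support ${modality} input). Inform the user.`,
    }
  })
}

const message: Part[] = [
  { type: "text", text: "What is in this screenshot?" },
  { type: "file", mediaType: "image/png", filename: "shot.png" },
]

// Capabilities as resolved for a local model without modalities in its config.
const blocked = filterParts(message, { image: false, audio: false, video: false, pdf: false })
// Capabilities once the user declares image input.
const allowed = filterParts(message, { image: true, audio: false, video: false, pdf: false })

console.log(blocked[1], allowed[1])
```

With `image: false` the second part comes back as a text part carrying the error string, which is all the model receives; with `image: true` the file part passes through unchanged.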

Step 3 — The official documentation example omits the modalities field:

packages/web/src/content/docs/providers.mdx (and all 15 translations):

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "llama3.2-vision": { "name": "llama3.2-vision" }
        // ❌ No modalities field → capabilities.input.image = false
        // User MUST add: "modalities": { "input": ["text", "image"], "output": ["text"] }
      }
    }
  }
}

Step 4 — The config schema DOES support modalities, but docs never mention it:

packages/opencode/src/config/provider.ts:46-51

modalities: Schema.optional(
  Schema.Struct({
    input: Schema.mutable(Schema.Array(Schema.Literals(["text", "audio", "image", "video", "pdf"]))),
    output: Schema.mutable(Schema.Array(Schema.Literals(["text", "audio", "image", "video", "pdf"]))),
  }),
),
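
Until the default changes, a user-side workaround appears to be declaring the modalities explicitly. The sketch below builds the same Ollama provider config as the docs example with the field added, typed against the literals in the schema above. The `ModelEntry` and `ProviderEntry` interfaces are illustrative and are not OpenCode types.

```typescript
// The literal set accepted by the config schema quoted above.
const MODALITIES = ["text", "audio", "image", "video", "pdf"] as const
type Modality = (typeof MODALITIES)[number]

// Illustrative shapes (assumed), covering only the fields used in the docs example.
interface ModelEntry {
  name: string
  modalities?: { input: Modality[]; output: Modality[] }
}

interface ProviderEntry {
  npm: string
  options: { baseURL: string }
  models: Record<string, ModelEntry>
}

const ollama: ProviderEntry = {
  npm: "@ai-sdk/openai-compatible",
  options: { baseURL: "http://localhost:11434/v1" },
  models: {
    "llama3.2-vision": {
      name: "llama3.2-vision",
      // Declares image input so capabilities.input.image resolves to true.
      modalities: { input: ["text", "image"], output: ["text"] },
    },
  },
}

// Serialized form, as it would appear in the JSON config file.
const configJson = JSON.stringify({ provider: { ollama } }, null, 2)
console.log(configJson)
```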

Why This Matters

This affects every user who manually configures a provider for a local vision model: not just Ollama, but also vLLM, LocalAI, LM Studio, etc. The AI SDK's @ai-sdk/openai-compatible provider correctly converts file parts to the image_url format, and Ollama's server accepts them (see the confirmation below), but OpenCode replaces the image with error text before the request is even sent because capabilities.input.image is false.

Ollama Confirmation

The Ollama server's OpenAI-compatible layer (openai/openai.go:FromChatRequest) DOES handle image_url content parts and converts them to the native api.Message.Images field:

case "image_url":
    // ... parses url ...
    img, err := decodeImageURL(url)
    if err != nil {
        return nil, err
    }
    messages = append(messages, api.Message{Role: msg.Role, Images: []api.ImageData{img}})

So the Ollama server is ready to receive images — OpenCode just never sends them.
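
For reference, this is roughly the request body that should reach Ollama once the image is no longer filtered out. It is a sketch of the standard OpenAI-style chat payload with an image_url content part carrying a data URL; the interface names and `buildVisionRequest` are hypothetical, and the base64 string is a placeholder rather than a real image.

```typescript
interface TextPart {
  type: "text"
  text: string
}

interface ImageUrlPart {
  type: "image_url"
  image_url: { url: string }
}

interface ChatMessage {
  role: "system" | "user" | "assistant"
  content: string | Array<TextPart | ImageUrlPart>
}

interface ChatRequest {
  model: string
  messages: ChatMessage[]
}

// Builds a single-turn request with one text part and one image part.
function buildVisionRequest(model: string, prompt: string, mime: string, base64: string): ChatRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mime};base64,${base64}` } },
        ],
      },
    ],
  }
}

// "AAAA" is a placeholder payload, not a decodable image.
const request = buildVisionRequest("llama3.2-vision", "What is in this image?", "image/png", "AAAA")

// With the baseURL from the config above, this body would be POSTed to
// http://localhost:11434/v1/chat/completions.
console.log(JSON.stringify(request, null, 2))
```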


Root Cause #2: TUI has zero drag-and-drop event handlers

Code Evidence

The TUI prompt <textarea> in packages/opencode/src/cli/cmd/tui/component/prompt/index.tsx has only these event handlers:

// Lines 1484-1552 — the <textarea> element
<textarea
  onContentChange={...}
  onCursorChange={...}
  onKeyDown={...}
  onSubmit={...}
  onPaste={...}      // ← paste only, NO drag events
  onMouseDown={...}
/>

A search for drag, Drop, dragOver, onDrop, and onDrag returns zero matches anywhere in the TUI prompt directory.

Compare with the App UI (packages/app/src/components/prompt-input/attachments.ts:143-188) which has full global drag/drop listeners:

makeEventListener(document, "dragover", handleGlobalDragOver)
makeEventListener(document, "dragleave", handleGlobalDragLeave)
makeEventListener(document, "drop", handleGlobalDrop)

The TUI tips file (home/tips-view.tsx:61) even suggests:

"Drag and drop images or PDFs into the terminal to add them as context"

This depends entirely on terminal-emulator-level file drag support (e.g., kitty, iTerm2), which is not available in many terminals (Windows Terminal <1.25, most Linux terminals, etc.), and is NOT implemented in OpenCode's TUI code.


What Works (Already Implemented)

| Feature | TUI | App UI |
| --- | --- | --- |
| Ctrl+V image paste | Yes (macOS / Windows / Linux) | Yes |
| Drag-drop images | No | Yes |
| AI SDK file→image_url conversion | Yes (convertToOpenAICompatibleChatMessages) | Yes |
| Image normalization (resize/compress) | Yes (image.ts) | Yes |
| Ollama server image reception | Yes (FromChatRequest) | Yes |

Suggested Fix

Fix 1: Default image to true for OpenAI-compatible providers

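In provider.ts:1224, when the user hasn't set modalities AND no existingModel exists from models-api, default image to true if the provider's npm package is @ai-sdk/openai-compatible. The OpenAI-compatible chat protocol accepts image_url parts, so a request carrying one is well-formed; whether the model can actually use the image still depends on the model, and a text-only model may reject or ignore it: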

  image: model.modalities?.input?.includes("image") 
    ?? existingModel?.capabilities.input.image 
-   ?? false,
+   ?? (apiNpm === "@ai-sdk/openai-compatible"),

The user can still set "modalities": { "input": ["text"], "output": ["text"] } to explicitly disable image support (the config schema shown earlier declares both input and output as required).
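
A minimal sketch of how the proposed default would behave, under the assumption that the provider's npm package name is available at that point as a string (called `apiNpm` here, following the diff). The types and the function name `resolveImageInputProposed` are illustrative only.

```typescript
type Modality = "text" | "audio" | "image" | "video" | "pdf"

interface UserModelConfig {
  modalities?: { input?: Modality[]; output?: Modality[] }
}

interface KnownModel {
  capabilities: { input: { image: boolean } }
}

// Proposed chain: user config, then known model, then a provider-dependent default.
function resolveImageInputProposed(
  model: UserModelConfig,
  existingModel: KnownModel | undefined,
  apiNpm: string,
): boolean {
  return (
    model.modalities?.input?.includes("image") ??
    existingModel?.capabilities.input.image ??
    (apiNpm === "@ai-sdk/openai-compatible")
  )
}

const cases = {
  // Unconfigured local model behind an OpenAI-compatible endpoint: images allowed.
  openaiCompatibleDefault: resolveImageInputProposed({}, undefined, "@ai-sdk/openai-compatible"),
  // Explicit opt-out still wins over the new default.
  explicitTextOnly: resolveImageInputProposed(
    { modalities: { input: ["text"], output: ["text"] } },
    undefined,
    "@ai-sdk/openai-compatible",
  ),
  // Other provider packages keep the current default of false.
  otherProvider: resolveImageInputProposed({}, undefined, "some-other-provider"),
}

console.log(cases)
```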

Fix 2: Add drag-drop to TUI prompt

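Add onDrop / onDragOver event handlers to the TUI <textarea> (or a surrounding wrapper), assuming the TUI renderer can surface such events, reading event.dataTransfer.files and invoking the existing pasteAttachment() pipeline. In terminals that instead deliver a dropped file as pasted path text, the same pipeline could be reached from the existing onPaste handler.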
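
Terminal emulators that support dropping files typically insert the file's path into the input as text, sometimes quoted or with escaped spaces. Under that assumption, one possible approach is to recognize such a path in pasted text before handing it to the attachment pipeline. The helper below is a hypothetical sketch: `droppedImagePath` and the extension list are not part of OpenCode, and a real implementation would also check that the file exists.

```typescript
// Extensions treated as images in this sketch (assumed list).
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".gif", ".webp"]

// Returns the cleaned path if the pasted text looks like a single image path,
// otherwise undefined.
function droppedImagePath(pasted: string): string | undefined {
  let text = pasted.trim()
  if (text.length === 0 || text.includes("\n")) return undefined

  const first = text[0]
  const last = text[text.length - 1]
  if (text.length >= 2 && (first === "'" || first === '"') && last === first) {
    // Some terminals wrap paths containing spaces in quotes.
    text = text.slice(1, -1)
  } else {
    // Others backslash-escape spaces and shell characters. Only those escapes
    // are removed, so Windows-style backslash separators are left alone.
    text = text.replace(/\\([ '"()&])/g, "$1")
  }

  const lower = text.toLowerCase()
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext)) ? text : undefined
}

console.log(droppedImagePath("/tmp/my\\ shot.png"))
console.log(droppedImagePath("'/tmp/a b.PNG'"))
console.log(droppedImagePath("hello world"))
```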

Fix 3: Document modalities field

Add the modalities field to all provider config documentation examples, especially the Ollama section in providers.mdx.


CC @thdxr
