[Bug] Images silently discarded with local Ollama vision models + TUI lacks drag-and-drop

## Description

When using OpenCode with a **local Ollama** instance serving vision models (e.g. `llama3.2-vision`, `llava`), attaching images via drag-drop or paste **silently fails** — the model never sees the image. Additionally, the **TUI has no drag-and-drop support** for images at all.

After tracing the full pipeline through both the Ollama server source and OpenCode's provider/integration layers, two root causes were identified.

---

## Root Cause #1 (Critical): Image capabilities default to `false` for user-configured models

### The Chain of Failure

```
User configures local Ollama model without `modalities` field
    ↓
provider.ts:1224  →  capabilities.input.image = model.modalities?.... ?? false
    ↓
transform.ts:393  →  unsupportedParts() sees capabilities.input.image === false
    ↓
Image part is SILENTLY replaced with error text
    ↓
LLM receives error text instead of the image — vision is broken
```

### Code Evidence

**Step 1 — User config parsing defaults image to `false`:**

`packages/opencode/src/provider/provider.ts:1221-1226`
```typescript
input: {
  text: model.modalities?.input?.includes("text") ?? existingModel?.capabilities.input.text ?? true,
  audio: model.modalities?.input?.includes("audio") ?? existingModel?.capabilities.input.audio ?? false,
  image: model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false,
  //                                                                                               ^^^^^
  // For local Ollama models: no modalities in user config → undefined
  //                           no existingModel (models-api.json only has Ollama Cloud) → undefined
  //                           Result: FALSE → images blocked
},
```
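To make the fallback behavior concrete, here is a minimal, self-contained sketch of the same chain. The `UserModelConfig` and `KnownModel` shapes and the `resolveImageInput` name are simplified stand-ins invented for illustration, not the actual types in `provider.ts`.

```typescript
// Hypothetical, simplified shapes; the real types live in provider.ts.
interface UserModelConfig {
  modalities?: { input?: string[]; output?: string[] }
}
interface KnownModel {
  capabilities: { input: { image: boolean } }
}

// Mirrors the fallback chain quoted above for the `image` capability.
function resolveImageInput(model: UserModelConfig, existingModel?: KnownModel): boolean {
  return model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false
}

// No `modalities` and no catalog entry: both lookups are undefined, so the default wins.
const bare = resolveImageInput({})
// An explicit `modalities` entry short-circuits the chain.
const declared = resolveImageInput({ modalities: { input: ["text", "image"] } })
// A catalog entry is consulted only when the user config says nothing.
const fromCatalog = resolveImageInput({}, { capabilities: { input: { image: true } } })

console.log({ bare, declared, fromCatalog })
```

The `bare` case corresponds to a local Ollama model configured as in the docs example.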

**Step 2 — `unsupportedParts()` silently discards the image:**

`packages/opencode/src/provider/transform.ts:393-428`
```typescript
function unsupportedParts(msgs: ModelMessage[], model: Provider.Model): ModelMessage[] {
  return msgs.map((msg) => {
    if (msg.role !== "user" || !Array.isArray(msg.content)) return msg
    const filtered = msg.content.map((part) => {
      if (part.type !== "file" && part.type !== "image") return part
      // ...
      const mime = part.type === "image" ? String(part.image).split(";")[0].replace("data:", "") : part.mediaType
      const modality = mimeToModality(mime)        // "image/png" → "image"
      if (!modality) return part
      if (model.capabilities.input[modality]) return part  // false for unconfigured local models, so this never returns
      // The image is replaced with error text that only the model sees; the user is not notified.
      return {
        type: "text" as const,
        text: `ERROR: Cannot read ${name} (this model does not support ${modality} input). Inform the user.`,
      }
    })
    return { ...msg, content: filtered }
  })
}
```
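The MIME extraction step can be checked in isolation. This sketch reuses the string handling from the snippet above. `mimeToModalitySketch` is a hypothetical stand-in, since the real `mimeToModality()` implementation is not shown here and may map types differently.

```typescript
type Modality = "image" | "audio" | "video" | "pdf"

// Hypothetical stand-in for mimeToModality(); the real mapping may differ.
function mimeToModalitySketch(mime: string): Modality | undefined {
  if (mime.startsWith("image/")) return "image"
  if (mime.startsWith("audio/")) return "audio"
  if (mime.startsWith("video/")) return "video"
  if (mime === "application/pdf") return "pdf"
  return undefined
}

// Same string handling as the quoted snippet: keep the text before the first ";" and drop "data:".
function mimeFromDataUrl(dataUrl: string): string {
  return (dataUrl.split(";")[0] ?? "").replace("data:", "")
}

const mime = mimeFromDataUrl("data:image/png;base64,iVBORw0KGgo=")
const modality = mimeToModalitySketch(mime)
console.log(mime, modality)
```

With `capabilities.input.image` set to `false`, a part resolved to the `image` modality this way is the one that gets swapped for error text.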

**Step 3 — The official documentation example omits the `modalities` field:**

`packages/web/src/content/docs/providers.mdx` (and all 15 translations):
```jsonc
{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "llama3.2-vision": { "name": "llama3.2-vision" }
        // ❌ No modalities field → capabilities.input.image = false
        // User MUST add: "modalities": { "input": ["text", "image"], "output": ["text"] }
      }
    }
  }
}
```

**Step 4 — The config schema DOES support `modalities`, but docs never mention it:**

`packages/opencode/src/config/provider.ts:46-51`
```typescript
modalities: Schema.optional(
  Schema.Struct({
    input: Schema.mutable(Schema.Array(Schema.Literals(["text", "audio", "image", "video", "pdf"]))),
    output: Schema.mutable(Schema.Array(Schema.Literals(["text", "audio", "image", "video", "pdf"]))),
  }),
),
```
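As a workaround until the default changes, the model entry can declare `modalities` explicitly. The sketch below expresses the docs example as a TypeScript object so it can be type-checked against a simplified mirror of the schema. The `ModelEntry` interface is an assumption for illustration, not OpenCode's exported type.

```typescript
type Modality = "text" | "audio" | "image" | "video" | "pdf"

// Simplified, hypothetical mirror of the schema quoted above.
interface ModelEntry {
  name: string
  modalities?: { input: Modality[]; output: Modality[] }
}

// Same model as the docs example, with `modalities` declared explicitly.
const visionModel: ModelEntry = {
  name: "llama3.2-vision",
  modalities: { input: ["text", "image"], output: ["text"] },
}

const config = {
  provider: {
    ollama: {
      npm: "@ai-sdk/openai-compatible",
      options: { baseURL: "http://localhost:11434/v1" },
      models: { "llama3.2-vision": visionModel },
    },
  },
}

// Serialize to the JSON that would go into the config file.
const json = JSON.stringify(config, null, 2)
console.log(json)
```

Because `"image"` is now listed under `input`, the first branch of the fallback chain returns `true` and the image part is no longer filtered out.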

### Why This Matters

This affects **every** user who configures a local vision model provider manually — not just Ollama, but also vLLM, LocalAI, LM Studio, etc. The AI SDK's `@ai-sdk/openai-compatible` provider correctly converts file parts to `image_url` format, and the downstream server handles them fine — but OpenCode strips the image **before the request is even sent** because `capabilities.input.image` is `false`.
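For reference, this is roughly the request body a vision request needs in the OpenAI-compatible chat format. It is a hand-written sketch of the wire shape, not the output of `@ai-sdk/openai-compatible`; `buildVisionRequest` and the sample base64 string are made up for illustration.

```typescript
// Content-part shapes from the OpenAI-compatible chat message format.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }

interface ChatRequest {
  model: string
  messages: { role: "user"; content: ContentPart[] }[]
}

// Build a single-turn request carrying a prompt and one base64-encoded image.
function buildVisionRequest(model: string, prompt: string, base64: string, mime = "image/png"): ChatRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mime};base64,${base64}` } },
        ],
      },
    ],
  }
}

const request = buildVisionRequest("llama3.2-vision", "Describe this image.", "iVBORw0KGgo=")
console.log(JSON.stringify(request, null, 2))
```

Posting a body like this to the `/chat/completions` route under the configured `baseURL` is one way to confirm, independently of OpenCode, that the local server accepts images.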

### Ollama Confirmation

The Ollama server's OpenAI-compatible layer (`openai/openai.go:FromChatRequest`) DOES handle `image_url` content parts and converts them to the native `api.Message.Images` field:

```go
case "image_url":
    // ... parses url ...
    img, err := decodeImageURL(url)
    if err != nil {
        return nil, err
    }
    messages = append(messages, api.Message{Role: msg.Role, Images: []api.ImageData{img}})
```

So the Ollama server is **ready to receive images** — OpenCode just never sends them.

---

## Root Cause #2: TUI has zero drag-and-drop event handlers

### Code Evidence

The TUI prompt `<textarea>` in `packages/opencode/src/cli/cmd/tui/component/prompt/index.tsx` has only these event handlers:

```tsx
// Lines 1484-1552 — the <textarea> element
<textarea
  onContentChange={...}
  onCursorChange={...}
  onKeyDown={...}
  onSubmit={...}
  onPaste={...}      // ← paste only, NO drag events
  onMouseDown={...}
/>
```

**Zero matches** for `drag`, `Drop`, `dragOver`, `onDrop`, `onDrag` anywhere in the TUI prompt directory.

Compare with the App UI (`packages/app/src/components/prompt-input/attachments.ts:143-188`) which has full global drag/drop listeners:

```typescript
makeEventListener(document, "dragover", handleGlobalDragOver)
makeEventListener(document, "dragleave", handleGlobalDragLeave)
makeEventListener(document, "drop", handleGlobalDrop)
```

The TUI tips file (`home/tips-view.tsx:61`) even suggests:
> "Drag and drop images or PDFs into the terminal to add them as context"

This depends entirely on terminal-emulator-level file drag support (e.g., kitty, iTerm2), which is not available in many terminals (Windows Terminal <1.25, most Linux terminals, etc.), and is NOT implemented in OpenCode's TUI code.

---

## What Works (Already Implemented)

| Feature | TUI | App UI |
|---------|-----|--------|
| Ctrl+V image paste | Yes (macOS / Windows / Linux) | Yes |
| Drag-drop images | **No** | Yes |
| AI SDK file→image_url conversion | Yes (`convertToOpenAICompatibleChatMessages`) | Yes |
| Image normalization (resize/compress) | Yes (`image.ts`) | Yes |
| Ollama server image reception | Yes (`FromChatRequest`) | Yes |

---

## Suggested Fix

### Fix 1: Default image to `true` for OpenAI-compatible providers

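In `provider.ts:1224`, when the user hasn't set `modalities` and no `existingModel` exists from models-api, default `image` to `true` if the npm package is `@ai-sdk/openai-compatible`. The OpenAI-compatible chat format defines `image_url` content parts, so the request stays well-formed; a text-only model may still reject or ignore the image, but in that case the server, not OpenCode, decides how to respond: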

```diff
  image: model.modalities?.input?.includes("image") 
    ?? existingModel?.capabilities.input.image 
-   ?? false,
+   ?? (apiNpm === "@ai-sdk/openai-compatible"),
```

The user can still set `"modalities": { "input": ["text"], "output": ["text"] }` to explicitly disable image support (the schema requires both `input` and `output` when `modalities` is present).
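A small sketch of how the proposed chain would behave, including the explicit opt-out. The function name, the flattened `existingImage` parameter, and the simplified config shape are assumptions for illustration; `apiNpm` follows the diff above.

```typescript
interface UserModelConfig {
  modalities?: { input?: string[]; output?: string[] }
}

// Sketch of the proposed chain; `apiNpm` is the provider's npm package name.
function resolveImageInputProposed(model: UserModelConfig, apiNpm: string, existingImage?: boolean): boolean {
  return (
    model.modalities?.input?.includes("image") ??
    existingImage ??
    (apiNpm === "@ai-sdk/openai-compatible")
  )
}

// Unconfigured model on an OpenAI-compatible provider: images are now allowed.
const localDefault = resolveImageInputProposed({}, "@ai-sdk/openai-compatible")
// Explicit text-only modalities still win over the new default.
const optedOut = resolveImageInputProposed(
  { modalities: { input: ["text"], output: ["text"] } },
  "@ai-sdk/openai-compatible",
)
// Other provider packages keep the current default.
const otherProvider = resolveImageInputProposed({}, "@ai-sdk/anthropic")

console.log({ localDefault, optedOut, otherProvider })
```

The opt-out works because `includes("image")` returns `false` rather than `undefined` once `modalities.input` is present, which stops the `??` chain.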

### Fix 2: Add drag-drop to TUI prompt

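Add `onDrop` / `onDragOver` event handlers to the TUI `<textarea>` (or a surrounding wrapper), reading `event.dataTransfer.files` and invoking the existing `pasteAttachment()` pipeline, if the TUI renderer can expose such events. In terminals, a dropped file usually arrives as pasted path text rather than a `dataTransfer` object, so the handler may instead need to detect file paths in `onPaste` input before handing them to `pasteAttachment()`.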
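One possible shape for the path-detection side, assuming the terminal delivers a dropped file as pasted text. `droppedFilePath` and the extension list are hypothetical, and quoting behavior varies by terminal, so treat this as a starting point rather than a complete parser.

```typescript
// Extensions treated as attachable; an illustrative list, not OpenCode's.
const ATTACHABLE = [".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf"]

// Return the path if `pasted` looks like a single dropped image/PDF path, else undefined.
function droppedFilePath(pasted: string): string | undefined {
  let text = pasted.trim()
  const quoted = text.match(/^(['"])(.*)\1$/)
  if (quoted) {
    // Some terminals may wrap paths containing spaces in quotes.
    text = quoted[2] ?? ""
  } else {
    // Others may backslash-escape the spaces instead.
    text = text.replace(/\\ /g, " ")
  }
  // Multi-line input is ordinary pasted text, not a single dropped file.
  if (text.includes("\n")) return undefined
  const lower = text.toLowerCase()
  return ATTACHABLE.some((ext) => lower.endsWith(ext)) ? text : undefined
}

console.log(droppedFilePath("/tmp/shot.png\n"))
console.log(droppedFilePath("'/tmp/my shot.PNG'"))
console.log(droppedFilePath("/tmp/my\\ shot.jpg"))
console.log(droppedFilePath("hello world"))
```

A real handler should also confirm that the file exists before attaching it, since ordinary pasted text can end in `.pdf` or `.png` too.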

### Fix 3: Document `modalities` field

Add the `modalities` field to all provider config documentation examples, especially the Ollama section in `providers.mdx`.

---

CC @thdxr


[Bug] Images silently discarded with local Ollama vision models + TUI lacks drag-and-drop #26780
