Feature hasn't been suggested before.
Describe the enhancement you want to request
When using a coding-focused LLM that doesn't support image input (e.g. DeepSeek, GLM, Haiku), pasting a screenshot into the chat fails silently: the image is dropped, the model receives an error string in its place, and the user gets no indication that images aren't supported.
Proposed solution: Before calling `streamText`, if the active model can't read images but the message contains image parts, automatically find a vision-capable model from any configured provider, call it with a description prompt, and replace the image parts with the returned text description. The main model then receives the description as plain text: entirely transparent, with no model switching required.
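A minimal sketch of the substitution step, assuming the AI SDK's `generateText` and message-part types; the helper name, prompt wording, and orchestration are illustrative, not the PR's actual implementation:

```ts
import { generateText, type CoreMessage, type LanguageModel } from "ai";

// Hypothetical helper: swap every image part for a text description produced
// by a vision-capable fallback model, so a text-only main model still "sees"
// the screenshot. `visionModel` is whichever model the resolver picked.
async function replaceImagesWithDescriptions(
  messages: CoreMessage[],
  visionModel: LanguageModel,
): Promise<CoreMessage[]> {
  return Promise.all(
    messages.map(async (message) => {
      // Only user messages with multi-part content can carry images.
      if (message.role !== "user" || !Array.isArray(message.content)) {
        return message;
      }
      const content = await Promise.all(
        message.content.map(async (part) => {
          if (part.type !== "image") return part;
          // Ask the vision model to describe this single image.
          const { text } = await generateText({
            model: visionModel,
            messages: [
              {
                role: "user",
                content: [
                  {
                    type: "text",
                    text: "Describe this image in detail for a model that cannot see it.",
                  },
                  part,
                ],
              },
            ],
          });
          return { type: "text" as const, text: `[Image description] ${text}` };
        }),
      );
      return { ...message, content };
    }),
  );
}
```

At call time, the pipeline would run this substitution only when the active model lacks vision support and the messages actually contain image parts, then pass the rewritten messages to `streamText` as usual.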
A new optional `vision_model` config field lets users pin a specific model (e.g. `openai/gpt-4o`). If not set, the first image-capable model found across all configured providers is used, skipping the current model to handle cases where the active provider has billing or rate-limit issues.
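A hedged sketch of that fallback selection, with a hypothetical `ModelInfo` shape and `resolveVisionModel` helper (the real PR's config and provider types will differ):

```ts
// Hypothetical shapes; the actual provider/model representation will differ.
interface ModelInfo {
  id: string; // e.g. "openai/gpt-4o"
  supportsImages: boolean;
}

// Prefer the pinned `vision_model` from config; otherwise scan all configured
// providers and take the first image-capable model that isn't the active one,
// since the active provider may itself be failing (billing, rate limits).
function resolveVisionModel(
  configuredModels: ModelInfo[],
  activeModelId: string,
  pinnedVisionModel?: string,
): ModelInfo | undefined {
  if (pinnedVisionModel) {
    return configuredModels.find((m) => m.id === pinnedVisionModel);
  }
  return configuredModels.find(
    (m) => m.supportsImages && m.id !== activeModelId,
  );
}
```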
Implementation: PR #24382
Note on #22828: That issue proposes a similar concept but with a different scope. It focuses on transcription for non-multimodal providers, while this proposal handles the case where the specific active model lacks vision support, regardless of provider, and includes fallback to any available provider. The implementation here also adds a dedicated `vision_model` config field and integrates into the streaming pipeline rather than as a pre-processing step.