fix: add fallback vision detection for Ollama models with incomplete capabilities #12214

Draft
roomote-v0[bot] wants to merge 1 commit into main from fix/ollama-vision-fallback-detection

roomote-v0 (bot) commented on Apr 28, 2026

Related GitHub Issue

Closes: #12211

Description

This PR attempts to address issue #12211, in which Ollama models (such as the gemma4 unsloth variants) are reported as "Does not support images" even though they have vision capabilities. The root cause is that some third-party quantized builds strip the "vision" entry from the Ollama capabilities array.

How it works:

The parseOllamaModel function previously relied solely on capabilities.includes("vision"). This PR adds a detectVisionSupport() helper that checks three sources in order:

  1. capabilities array (authoritative, preferred) -- unchanged behavior
  2. details.families -- checks for known vision encoder family names (clip, siglip, mmproj, mllama)
  3. model_info keys -- regex match for keys containing vision, clip, siglip, mmproj, or image_encoder

This makes Roo Code more resilient when the capabilities array is incomplete while still preferring the authoritative field when available.
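
The three-step check described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the OllamaModelInfoLike interface, the constant names, and the exact placement of case-insensitive matching are assumptions. Only the function name detectVisionSupport, the three sources, and the listed family names and key patterns come from the description.

```typescript
// Hypothetical, simplified shape of the fields the helper reads.
// The real response type used by parseOllamaModel may differ.
interface OllamaModelInfoLike {
  capabilities?: string[];
  details?: { families?: string[] | null };
  model_info?: Record<string, unknown>;
}

// Family names and key patterns as listed in the PR description.
const VISION_FAMILIES = new Set(["clip", "siglip", "mmproj", "mllama"]);
const VISION_KEY_PATTERN = /vision|clip|siglip|mmproj|image_encoder/i;

function detectVisionSupport(model: OllamaModelInfoLike): boolean {
  // 1. Authoritative source: the capabilities array.
  if (model.capabilities?.includes("vision")) {
    return true;
  }

  // 2. Fallback: known vision encoder families.
  //    Lowercasing here is an assumption about how case insensitivity is handled.
  const families = model.details?.families ?? [];
  if (families.some((family) => VISION_FAMILIES.has(family.toLowerCase()))) {
    return true;
  }

  // 3. Fallback: any model_info key that looks vision-related.
  return Object.keys(model.model_info ?? {}).some((key) => VISION_KEY_PATTERN.test(key));
}

// Usage with made-up inputs (family and key names are illustrative only):
const viaFamilies = detectVisionSupport({
  capabilities: ["completion"],
  details: { families: ["gemma3", "CLIP"] },
}); // true, matched by step 2

const viaModelInfo = detectVisionSupport({
  capabilities: ["completion"],
  model_info: { "example.vision.block_count": 27 },
}); // true, matched by step 3

const textOnly = detectVisionSupport({
  capabilities: ["completion"],
  details: { families: ["llama"] },
  model_info: { "general.architecture": "llama" },
}); // false, no source indicates vision
```

Checking the capabilities array first keeps existing behavior for models that report it correctly. The fallbacks only run when "vision" is absent, so they can produce a positive result but never override an authoritative one.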

Test Procedure

  • Added 8 new unit tests covering all fallback paths (families-based detection, model_info-based detection, case insensitivity, and negative cases)
  • All 20 tests in ollama.test.ts pass
  • Lint and type checks pass across the monorepo

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes.
  • Documentation Impact: No documentation updates required.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Documentation Updates

  • No documentation updates are required.

Additional Notes

Feedback and guidance are welcome. The set of vision family names and model_info key patterns may need to be expanded as new multimodal architectures appear in Ollama.

Development

Successfully merging this pull request may close these issues.

[BUG] GEMMA 4 has Vision but ROO CODE says Does not support images
