fix: add fallback vision detection for Ollama models with incomplete capabilities #12214
Draft
roomote-v0[bot] wants to merge 1 commit into main
Related GitHub Issue
Closes: #12211
Description
This PR attempts to address Issue #12211, where Ollama models (such as gemma4 unsloth variants) report "Does not support images" even though they have vision capabilities. The root cause is that some third-party model quants strip the `"vision"` entry from the Ollama `capabilities` array.

How it works:

The `parseOllamaModel` function previously relied solely on `capabilities.includes("vision")`. This PR adds a `detectVisionSupport()` helper that checks three sources in order:

1. `capabilities` array (authoritative, preferred): unchanged behavior
2. `details.families`: checks for known vision encoder family names (`clip`, `siglip`, `mmproj`, `mllama`)
3. `model_info` keys: regex match for keys containing `vision`, `clip`, `siglip`, `mmproj`, or `image_encoder`

This makes Roo Code more resilient when the capabilities array is incomplete while still preferring the authoritative field when available.
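The three-step fallback order described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the `OllamaModelShape` interface, the constant names, the exact regex, and the case-insensitive matching are all assumptions made for the example.

```typescript
// Hypothetical, simplified shape of the model metadata returned by Ollama.
// Only the three fields discussed in this PR are modeled.
interface OllamaModelShape {
	capabilities?: string[]
	details?: { families?: string[] | null }
	model_info?: Record<string, unknown>
}

// Family names listed in the PR description (assumed to be compared case-insensitively).
const VISION_FAMILIES = new Set(["clip", "siglip", "mmproj", "mllama"])

// Key fragments listed in the PR description; the exact pattern is an assumption.
// No "g" flag, so RegExp.test() stays stateless across calls.
const VISION_KEY_PATTERN = /vision|clip|siglip|mmproj|image_encoder/i

function detectVisionSupport(model: OllamaModelShape): boolean {
	// 1. Authoritative source: the capabilities array.
	if (model.capabilities?.includes("vision")) {
		return true
	}

	// 2. Fallback: known vision encoder families.
	const families = model.details?.families ?? []
	if (families.some((family) => VISION_FAMILIES.has(family.toLowerCase()))) {
		return true
	}

	// 3. Fallback: model_info keys that hint at a vision component.
	return Object.keys(model.model_info ?? {}).some((key) => VISION_KEY_PATTERN.test(key))
}
```

Because the checks short-circuit in order, a model that correctly lists `"vision"` in `capabilities` never reaches the heuristic fallbacks, which keeps the existing behavior unchanged for well-formed metadata.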
Test Procedure
Tests in `ollama.test.ts` pass.
Pre-Submission Checklist
Documentation Updates
Additional Notes
Feedback and guidance are welcome. The set of vision family names and model_info key patterns may need to be expanded as new multimodal architectures appear in Ollama.