fix(cli): expand model capability detection to include Llama, Nemotron, and Mistral #8845
…n, and Mistral models

The isModelCapable function was showing false warnings for Llama, Nemotron, and Mistral models, claiming they had "limited reasoning and tool calling capabilities" when they actually have excellent capabilities.

**Changes:**
- Added /llama/, /nemotron/, /mistral/ patterns to capability detection regex
- Updated tests to reflect that these model families ARE capable
- All tests passing (26/26)

**Research validation:**
- Llama 3.3/Nemotron: #1 on alignment benchmarks, Arena Hard 85.0
- Mistral: 81.2% MMLU, supports function calling and JSON mode
- Both families widely used for agent workflows with proven tool calling

**Impact:**
- Removes false warnings for users of these popular model families
- Enables proper multiEdit tool usage for capable models
- Aligns detection with real-world model capabilities

Tested with nvidia/Llama-3_3-Nemotron-Super-49B-v1 on MITRE AIP endpoints.

Authored by: Aaron Lippold <lippold@gmail.com>
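For readers who have not seen the function, the pattern-based check described in this commit message could look roughly like the sketch below. This is not the actual Continue CLI source: the names `CAPABLE_MODEL_PATTERNS` and `isModelCapableSketch` are hypothetical, and only the three patterns themselves come from the PR. Any patterns that existed before this change are omitted.

```typescript
// Hypothetical list: only these three patterns are taken from the PR text.
// The real isModelCapable in the Continue CLI may be structured differently.
const CAPABLE_MODEL_PATTERNS: RegExp[] = [
  /llama/,
  /nemotron/,
  /mistral/,
];

// Returns true when the lowercased model name matches any listed pattern.
// The patterns carry no "g" flag, so RegExp.test keeps no state between calls.
function isModelCapableSketch(modelName: string): boolean {
  const normalized = modelName.toLowerCase();
  return CAPABLE_MODEL_PATTERNS.some((pattern) => pattern.test(normalized));
}
```

Under these assumptions, a name such as `nvidia/Llama-3_3-Nemotron-Super-49B-v1` matches because it is lowercased before testing. Note that a substring match like this also accepts small variants within a family, which is the concern raised in the review.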
No issues found across 2 files
@aaronlippold we've not included 7B models because they often have poor outputs and are (debatably, of course) on the less capable end. In our experience the CLI performs poorly with most Llama models, and we'd want users to be warned.
Thoughts on an environment variable that hides this warning instead? Or perhaps, making it show for only a couple consecutive sessions?
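Both options raised here (an opt-out environment variable, or showing the warning for only a few sessions) can be sketched together as follows. Everything named in this block is an assumption for illustration: the variable `CONTINUE_HIDE_CAPABILITY_WARNING`, the limit of two sessions, and the helper itself do not appear anywhere in the PR.

```typescript
// Hypothetical names: neither this env var nor this limit exists in the PR.
const HIDE_WARNING_ENV_VAR = "CONTINUE_HIDE_CAPABILITY_WARNING";
const MAX_WARNING_SESSIONS = 2;

// Environment passed in explicitly so the helper stays easy to test.
type Env = Record<string, string | undefined>;

// Decides whether the "limited capabilities" warning should be printed.
function shouldShowCapabilityWarning(
  modelIsCapable: boolean,
  env: Env,
  sessionsAlreadyWarned: number,
): boolean {
  // Capable models never trigger the warning.
  if (modelIsCapable) return false;

  // Option 1: the user opted out through the environment variable.
  const flag = (env[HIDE_WARNING_ENV_VAR] ?? "").toLowerCase();
  if (flag === "1" || flag === "true") return false;

  // Option 2: stop warning after a few sessions have already shown it.
  return sessionsAlreadyWarned < MAX_WARNING_SESSIONS;
}
```

In a Node-based CLI the caller would pass `process.env` as the second argument and would need to persist the session counter somewhere; where that counter lives is left open here.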
Sure, I’d be happy to do it that way. I will take a look at it tomorrow.
I am also a contributor that doesn’t mind if you just cherry-pick what you like and tell me why you left the other stuff up. It’s your code base after all :-)
Aaron Lippold
On Tue, Nov 25, 2025 at 23:15, Dallin Romney requested changes on this pull request:
@aaronlippold we've not included 7B models because they often have poor outputs and are (debatably of course) relatively on the less capable end
Thoughts on an environment variable that hides this warning instead?
Summary
Fixes false capability warnings for Llama, Nemotron, and Mistral model families in Continue CLI.
Problem
The isModelCapable function was showing warnings claiming that these models have "limited reasoning and tool calling capabilities". This warning was incorrect: these models have excellent capabilities.
Solution
Added detection patterns for three major model families:
- /llama/ - Meta Llama models (3.1, 3.3, Code Llama, etc.)
- /nemotron/ - NVIDIA Nemotron models
- /mistral/ - Mistral AI models (Small, Large, Codestral, etc.)
Research Validation
Llama 3.3 / Nemotron:
Mistral:
Testing
Impact
Checklist
Authored by: Aaron Lippold lippold@gmail.com
Summary by cubic
Fixes false capability warnings in the Continue CLI by recognizing Llama, Nemotron, and Mistral models as capable. This makes capability checks accurate and enables tools like multiEdit for these models.
Written for commit d494bc1. Summary will update automatically on new commits.