
Conversation


@aaronlippold aaronlippold commented Nov 22, 2025

Summary

Fixes false capability warnings for Llama, Nemotron, and Mistral model families in Continue CLI.

Problem

The isModelCapable function was showing warnings like:

"The model 'Llama 3.3 Nemotron 49B' is not recommended for use with cn due to limited reasoning and tool calling capabilities"

This warning was incorrect: these models have strong reasoning and tool-calling capabilities.

Solution

Added detection patterns for three major model families:

  • /llama/ - Meta Llama models (3.1, 3.3, Code Llama, etc.)
  • /nemotron/ - NVIDIA Nemotron models
  • /mistral/ - Mistral AI models (Small, Large, Codestral, etc.)
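The detection change can be sketched roughly as follows. This is a minimal illustration based on the PR description, not the actual Continue source; the function shape and pattern list are assumptions:

```typescript
// Hypothetical sketch of the capability check described in this PR.
// The patterns mirror the three families added here; the real
// isModelCapable in Continue CLI may be structured differently.
const CAPABLE_MODEL_PATTERNS: RegExp[] = [
  /llama/i,    // Meta Llama models (3.1, 3.3, Code Llama, etc.)
  /nemotron/i, // NVIDIA Nemotron models
  /mistral/i,  // Mistral AI models (Small, Large, Codestral, etc.)
];

// Returns true when either the display name or the model identifier
// matches one of the capable-model patterns.
function isModelCapable(name: string, model: string): boolean {
  return CAPABLE_MODEL_PATTERNS.some(
    (pattern) => pattern.test(name) || pattern.test(model),
  );
}
```

Matching on both the display name and the model id (as the cubic summary notes) covers providers that expose ids like `nvidia/Llama-3_3-Nemotron-Super-49B-v1` without a friendly name.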

Research Validation

Llama 3.3 / Nemotron:

  • #1 on alignment benchmarks; Arena Hard score of 85.0
  • Widely used for agent workflows with proven tool calling

Mistral:

  • MMLU: 81.2% (near GPT-4)
  • Native function calling and JSON mode
  • Strong code generation and reasoning

Testing

  • ✅ All 26 tests passing
  • ✅ Updated tests to reflect intentional behavior change
  • ✅ Lint and format checks passing
  • ✅ Verified with nvidia/Llama-3_3-Nemotron-Super-49B-v1

Impact

  • Removes false warnings for widely-used model families
  • Enables multiEdit tool for capable models
  • Improves user experience with open-weight models
  • Aligns capability detection with real-world performance

Checklist

  • Tests added/updated and passing
  • Code follows Continue coding standards
  • Changes are backward compatible
  • Research-validated change

Authored by: Aaron Lippold <lippold@gmail.com>


Summary by cubic

Fixes false capability warnings in the Continue CLI by recognizing Llama, Nemotron, and Mistral models as capable. This makes capability checks accurate and enables tools like multiEdit for these models.

  • Bug Fixes
    • Added /llama/, /nemotron/, /mistral/ patterns to isModelCapable (matches name or model).
    • Updated tests to mark these families capable across providers; 26/26 passing.

Written for commit d494bc1.

…n, and Mistral models

The isModelCapable function was showing false warnings for Llama, Nemotron,
and Mistral models, claiming they had "limited reasoning and tool calling
capabilities" when they actually have excellent capabilities.

**Changes:**
- Added /llama/, /nemotron/, /mistral/ patterns to capability detection regex
- Updated tests to reflect that these model families ARE capable
- All tests passing (26/26)

**Research validation:**
- Llama 3.3/Nemotron: continuedev#1 on alignment benchmarks, Arena Hard 85.0
- Mistral: 81.2% MMLU, supports function calling and JSON mode
- Both families widely used for agent workflows with proven tool calling

**Impact:**
- Removes false warnings for users of these popular model families
- Enables proper multiEdit tool usage for capable models
- Aligns detection with real-world model capabilities

Tested with nvidia/Llama-3_3-Nemotron-Super-49B-v1 on MITRE AIP endpoints.

Authored by: Aaron Lippold <lippold@gmail.com>
@aaronlippold aaronlippold requested a review from a team as a code owner November 22, 2025 15:37
@aaronlippold aaronlippold requested review from Patrick-Erichsen and removed request for a team November 22, 2025 15:37
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Nov 22, 2025
@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

@RomneyDa RomneyDa left a comment


@aaronlippold we've not included 7B models because they often have poor outputs and are (debatably of course) relatively on the less capable end. In our experience the CLI performs poorly with most llama models and we'd want users to be warned

Thoughts on an environment variable that hides this warning instead? Or perhaps making it show for only a couple of consecutive sessions?
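The suggested escape hatch could look something like the sketch below. The variable name `CONTINUE_HIDE_CAPABILITY_WARNING` is purely illustrative; no such variable exists in Continue today:

```typescript
// Hypothetical opt-out: suppress the capability warning when the user
// sets an environment variable. The variable name is illustrative only
// and not part of the actual Continue CLI.
function shouldShowCapabilityWarning(
  env: Record<string, string | undefined>,
): boolean {
  // Warn by default; "1" opts the user out of the warning.
  return env["CONTINUE_HIDE_CAPABILITY_WARNING"] !== "1";
}
```

This keeps the warning on by default for the less capable models the reviewer mentions, while letting users who know their endpoint performs well silence it.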

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Issues and PRs Nov 26, 2025
@aaronlippold
Copy link
Contributor Author

aaronlippold commented Nov 26, 2025 via email

