
Conversation


@aaronlippold aaronlippold commented Nov 22, 2025

Summary

Fixes false capability warnings for Llama, Nemotron, and Mistral model families in Continue CLI.

Problem

The isModelCapable function was showing warnings like:

"The model 'Llama 3.3 Nemotron 49B' is not recommended for use with cn due to limited reasoning and tool calling capabilities"

This warning was incorrect: these models have strong reasoning and tool-calling capabilities.

Solution

Added detection patterns for three major model families:

  • /llama/ - Meta Llama models (3.1, 3.3, Code Llama, etc.)
  • /nemotron/ - NVIDIA Nemotron models
  • /mistral/ - Mistral AI models (Small, Large, Codestral, etc.)
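The detection change can be sketched roughly as follows. This is a minimal illustration based on the PR description, not the actual Continue source; the function shape and pattern list are assumptions:

```typescript
// Hypothetical sketch of the capability check described in this PR.
// The patterns mirror the three families added here; the real
// isModelCapable in Continue CLI may be structured differently.
const CAPABLE_MODEL_PATTERNS: RegExp[] = [
  /llama/i,    // Meta Llama models (3.1, 3.3, Code Llama, etc.)
  /nemotron/i, // NVIDIA Nemotron models
  /mistral/i,  // Mistral AI models (Small, Large, Codestral, etc.)
];

// Returns true when either the display name or the model identifier
// matches one of the capable-model patterns.
function isModelCapable(name: string, model: string): boolean {
  return CAPABLE_MODEL_PATTERNS.some(
    (pattern) => pattern.test(name) || pattern.test(model),
  );
}
```

Matching on both the display name and the model id (as the cubic summary notes) covers providers that expose ids like `nvidia/Llama-3_3-Nemotron-Super-49B-v1` without a friendly name.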

Research Validation

Llama 3.3 / Nemotron:

  • #1 on alignment benchmarks; Arena Hard score of 85.0
  • Widely used for agent workflows with proven tool calling

Mistral:

  • MMLU: 81.2% (near GPT-4)
  • Native function calling and JSON mode
  • Strong code generation and reasoning

Testing

  • ✅ All 26 tests passing
  • ✅ Updated tests to reflect intentional behavior change
  • ✅ Lint and format checks passing
  • ✅ Verified with nvidia/Llama-3_3-Nemotron-Super-49B-v1

Impact

  • Removes false warnings for widely-used model families
  • Enables multiEdit tool for capable models
  • Improves user experience with open-weight models
  • Aligns capability detection with real-world performance

Checklist

  • Tests added/updated and passing
  • Code follows Continue coding standards
  • Changes are backward compatible
  • Research-validated change

Authored by: Aaron Lippold <lippold@gmail.com>


Summary by cubic

Fixes false capability warnings in the Continue CLI by recognizing Llama, Nemotron, and Mistral models as capable. This makes capability checks accurate and enables tools like multiEdit for these models.

  • Bug Fixes
    • Added /llama/, /nemotron/, /mistral/ patterns to isModelCapable (matches name or model).
    • Updated tests to mark these families capable across providers; 26/26 passing.

Written for commit d494bc1.

…n, and Mistral models

The isModelCapable function was showing false warnings for Llama, Nemotron,
and Mistral models, claiming they had "limited reasoning and tool calling
capabilities" when they actually have excellent capabilities.

**Changes:**
- Added /llama/, /nemotron/, /mistral/ patterns to capability detection regex
- Updated tests to reflect that these model families ARE capable
- All tests passing (26/26)

**Research validation:**
- Llama 3.3/Nemotron: continuedev#1 on alignment benchmarks, Arena Hard 85.0
- Mistral: 81.2% MMLU, supports function calling and JSON mode
- Both families widely used for agent workflows with proven tool calling

**Impact:**
- Removes false warnings for users of these popular model families
- Enables proper multiEdit tool usage for capable models
- Aligns detection with real-world model capabilities

Tested with nvidia/Llama-3_3-Nemotron-Super-49B-v1 on MITRE AIP endpoints.

Authored by: Aaron Lippold <lippold@gmail.com>
@aaronlippold aaronlippold requested a review from a team as a code owner November 22, 2025 15:37
@aaronlippold aaronlippold requested review from Patrick-Erichsen and removed request for a team November 22, 2025 15:37
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Nov 22, 2025
@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

@RomneyDa RomneyDa left a comment


@aaronlippold we've not included 7B models because they often have poor outputs and are (debatably of course) relatively on the less capable end. In our experience the CLI performs poorly with most llama models and we'd want users to be warned

Thoughts on an environment variable that hides this warning instead? Or perhaps making it show for only a couple of consecutive sessions?
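The suggested escape hatch could look something like the sketch below. The variable name `CONTINUE_HIDE_CAPABILITY_WARNING` is purely illustrative; no such variable exists in Continue today:

```typescript
// Hypothetical opt-out: suppress the capability warning when the user
// sets an environment variable. The variable name is illustrative only
// and not part of the actual Continue CLI.
function shouldShowCapabilityWarning(
  env: Record<string, string | undefined>,
): boolean {
  // Warn by default; "1" opts the user out of the warning.
  return env["CONTINUE_HIDE_CAPABILITY_WARNING"] !== "1";
}
```

This keeps the warning on by default for the less capable models the reviewer mentions, while letting users who know their endpoint performs well silence it.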

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Issues and PRs Nov 26, 2025
@aaronlippold
Copy link
Contributor Author

aaronlippold commented Nov 26, 2025 via email

