Add supports_tools capability gate for LLM model selection#596
Conversation
Implement a capability gate in model selection to prevent runtime crashes caused by models that do not support OpenAI-compatible tool calling. - Extend `ModelEntry` and `ProviderDescriptor` contracts with capability metadata. - Implement tool-capability heuristics in `LlmProviderRegistry`. - Enforce validation in `PUT /api/llm/active-config` with `?force=true` bypass. - Export context window heuristics from `LLMClient` to `ModelCatalogService`. - Regenerate JSON schemas to stay in sync with frontend. Closes #557, follows #571. Co-authored-by: arn0ld87 <212052432+arn0ld87@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request implements a capability gate to ensure that selected LLM models support tool calling, preventing potential simulation crashes. Key changes include the introduction of a ModelEntry contract, metadata enrichment in the ModelCatalogService (including tool support, JSON mode, and context window), and a heuristic-based tool support check in the LlmProviderRegistry. Review feedback suggests addressing potential latency from synchronous catalog fetching in the API handler and refining the tool-capability heuristics to exclude legacy models that do not support the required features.
| if not force: | ||
| resolver = SecretResolver() | ||
| api_key = resolver.get_api_key(provider_id, provider.type) | ||
| models = _model_catalog.get_models(provider_id, provider.type, provider.base_url, api_key) |
There was a problem hiding this comment.
Calling _model_catalog.get_models synchronously within a request handler can be problematic. If the provider's /models endpoint is slow or unresponsive, this PUT request will block a Flask worker for up to the 5-second timeout defined in ModelCatalogService. While there is a cache, the first request or an expired cache will introduce significant latency. Consider if this validation can be performed against a background-refreshed catalog or if the latency is acceptable for this administrative endpoint.
| if provider_type in ("openai", "google", "github_copilot"): | ||
| return True |
There was a problem hiding this comment.
The heuristic for the openai provider type is overly optimistic. Legacy models like gpt-3.5-turbo-instruct or very old versions (e.g., gpt-3.5-turbo-0301) do not support OpenAI-compatible tool calling. Returning True for all models from these providers might still lead to the simulation hangs this PR aims to prevent. Additionally, ensure that Ollama-specific parameters (like 'think' or 'num_ctx') are not included in the extra_body for non-Ollama platforms, as this often results in 400 'Unknown parameter' errors. Consider applying the family whitelist or an explicit blacklist for instruct models even for natively capable providers.
References
- Avoid including Ollama-specific parameters (such as 'think' or 'num_ctx') in the 'extra_body' for non-Ollama platforms like OpenAI or Anthropic gateways to prevent 400 errors.
- Avoid relying on legacy heuristic functions for platform-specific capabilities; prefer explicit checks or inline construction to ensure correct parameter handling.
This PR implements a capability-gate for LLM model selection, specifically targeting tool-calling support. This addresses issue #557 where choosing a model like
ministral-3:14bwould cause simulation hangs due to incompatible function calling.Key changes:
supports_tools,supports_json_mode, andcontext_windowto theModelEntryPydantic model.LlmProviderRegistry.is_model_tool_capable, whitelisting modern model families and explicitly gating known problematic models.PUT /api/llm/active-configto verify model capabilities before saving. It returns422 Unprocessable Entitywith codeunsupported_capabilityif a model lacks tool support, unless overridden with?force=true.ModelCatalogServicenow populates these capability fields during discovery, utilizing the newly exported context window heuristics fromLLMClient.backend/tests/api/test_llm_active.pyto cover the new validation logic and bypass mechanism.Schemas have been regenerated and verified via
dump_schemas --check.Fixes #576
PR created automatically by Jules for task 12753529996915194420 started by @arn0ld87