Add capability-aware routing & failover (tools/vision/reasoning/json)#33

Merged
BillJr99 merged 1 commit into main from claude/hermes-free-models-tool-calls-24M2k
May 29, 2026

@BillJr99
Owner

Summary

Adds capability-aware routing and failover to llmproxy. Requests are routed to models that support the capabilities they need (tools, vision, reasoning, json), and the proxy fails over automatically when a model claims to support a capability but does not deliver it in the response.

Key Changes

Core Capability Detection & Routing

  • Capability detectors: Implemented pure detector functions for each capability:
    • _request_has_tools(), _tool_use_forced(), _response_has_tool_call() for tools
    • _request_has_image() for vision
    • _request_wants_reasoning() for reasoning
    • _request_wants_json(), _response_is_json() for json
  • Capability metadata: Added _CAPABILITIES dict mapping each capability to its request detector, strict detector (for forced cases), and response validator
  • Model tagging: New model_capabilities config field allows tagging models with supported capabilities (case-insensitive, auto-populated from scrapers)
  • Proactive ordering: _order_by_capability() reorders candidates so models supporting needed capabilities are tried first, with fallback to unknown-capability models
  • Reactive failover: Modified _proxy_cycling_non_streaming() to detect when a 200 response failed to deliver a forced capability (e.g., tool_choice: "required" but no tool calls) and automatically try the next candidate
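
As a rough illustration of the detector and ordering ideas above, here is a minimal sketch. It is not the code from this PR: the function names mirror the ones listed but drop the leading underscore, the request shape is assumed to follow the OpenAI-style chat format (`tools`, `tool_choice`), and the two-tier ranking (supporting models first, everything else after) is one plausible reading of "proactive ordering".

```python
from typing import Any


def request_has_tools(body: dict[str, Any]) -> bool:
    """True when the request declares at least one tool."""
    tools = body.get("tools")
    return isinstance(tools, list) and len(tools) > 0


def tool_use_forced(body: dict[str, Any]) -> bool:
    """True when tool_choice demands a call: "required" or a named function."""
    if not request_has_tools(body):
        return False
    choice = body.get("tool_choice")
    if choice == "required":
        return True
    return isinstance(choice, dict) and choice.get("type") == "function"


def order_by_capability(
    candidates: list[str],
    needed: set[str],
    capabilities: dict[str, set[str]],
) -> list[str]:
    """Move models tagged with every needed capability to the front.

    sorted() is stable, so relative order inside each tier is preserved
    and no candidate is ever dropped.
    """
    if not needed:
        return list(candidates)

    def rank(model: str) -> int:
        # Assumption: untagged models simply fall into the second tier.
        return 0 if needed <= capabilities.get(model, set()) else 1

    return sorted(candidates, key=rank)
```

With `capabilities = {"a": {"tools"}, "c": {"tools", "vision"}}`, calling `order_by_capability(["b", "a", "c"], {"tools"}, capabilities)` yields `["a", "c", "b"]`: the untagged model `b` is still tried, just last.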

Virtual Endpoints

  • Added _CAPABILITY_VIRTUALS constant defining capability-based virtual models
  • New virtual endpoints: llmproxy__tools, llmproxy__vision, llmproxy__tools/free, llmproxy__vision/free (and legacy llmproxy/ forms)
  • Implemented _get_capability_model_candidates() and _get_capability_free_candidates() selectors
  • Updated _get_virtual_candidates() to dispatch capability-based virtual models
  • Virtual models appear in /v1/models list when at least one model is tagged with that capability
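
The gating rule in the last bullet (a capability virtual model is listed only when something backs it) might look roughly like the sketch below. The mapping and helper names are hypothetical stand-ins, the `/free` variants and legacy `llmproxy/` forms are omitted for brevity, and the real `_CAPABILITY_VIRTUALS` structure may differ.

```python
# Hypothetical mapping from virtual model name to the capability it requires.
CAPABILITY_VIRTUALS = {
    "llmproxy__tools": "tools",
    "llmproxy__vision": "vision",
}


def capability_candidates(
    capability: str, model_capabilities: dict[str, list[str]]
) -> list[str]:
    """Models tagged with the capability, compared case-insensitively."""
    wanted = capability.lower()
    return [
        model
        for model, caps in model_capabilities.items()
        if wanted in {c.lower() for c in caps}
    ]


def advertised_virtuals(model_capabilities: dict[str, list[str]]) -> list[str]:
    """Virtual models to list: only those with at least one backing model."""
    return [
        name
        for name, cap in CAPABILITY_VIRTUALS.items()
        if capability_candidates(cap, model_capabilities)
    ]
```

Under these assumptions, a config that tags a single model with `"Tools"` would advertise `llmproxy__tools` but not `llmproxy__vision`.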

Configuration & Setup

  • Added model_capabilities field to config schema (optional, top-level object)
  • Setup wizard now includes "Tag model with capabilities" and "Remove capability tag" menu options
  • Defensive parsing: _model_capabilities() handles missing/malformed config gracefully
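
To make the "defensive parsing" point concrete, the following sketch shows one way a `model_capabilities` map could be read without ever raising on bad input. The function name and the exact tolerance rules (skip non-list values, skip non-string tags, lowercase the tags) are assumptions for illustration, not the behavior of the actual `_model_capabilities()` helper.

```python
def parse_model_capabilities(config: object) -> dict[str, set[str]]:
    """Return {model: {capability, ...}}, ignoring anything malformed."""
    if not isinstance(config, dict):
        return {}
    raw = config.get("model_capabilities")
    if not isinstance(raw, dict):
        return {}  # missing or wrong type: behave as if nothing is tagged

    parsed: dict[str, set[str]] = {}
    for model, caps in raw.items():
        if not isinstance(model, str) or not isinstance(caps, list):
            continue  # skip entries that are not model -> [caps]
        tags = {c.strip().lower() for c in caps if isinstance(c, str) and c.strip()}
        if tags:
            parsed[model] = tags
    return parsed
```

An empty result means no model is tagged, which matches the backward-compatible case where the field is absent.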

Scraper Integration

  • OpenRouter source now extracts capabilities from supported_parameters (tools/reasoning/structured_outputs) and architecture.input_modalities (image → vision)
  • Scraper aggregation merges capabilities from high-confidence sources
  • apply_updates() stores capabilities only for free-tier models (parallel to free_limits)
  • Capabilities are dropped when models are removed from the free set
  • Config sync is add-only: user-set capabilities are never overwritten by scraper updates
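
The OpenRouter mapping and the add-only sync can be sketched as follows. The field names `supported_parameters` and `architecture.input_modalities` come from the description above; mapping `structured_outputs` to the `json` capability is an inference, and both function names are hypothetical. "Add-only" is interpreted here as "append missing tags, never remove or replace existing ones".

```python
# Assumed mapping from OpenRouter parameter names to capability tags.
PARAM_TO_CAPABILITY = {
    "tools": "tools",
    "reasoning": "reasoning",
    "structured_outputs": "json",
}


def capabilities_from_openrouter(entry: dict) -> set[str]:
    """Derive capability tags from one OpenRouter model entry."""
    caps: set[str] = set()
    params = entry.get("supported_parameters")
    if isinstance(params, list):
        for p in params:
            if isinstance(p, str) and p in PARAM_TO_CAPABILITY:
                caps.add(PARAM_TO_CAPABILITY[p])
    arch = entry.get("architecture")
    modalities = arch.get("input_modalities") if isinstance(arch, dict) else None
    if isinstance(modalities, list) and "image" in modalities:
        caps.add("vision")
    return caps


def merge_add_only(
    existing: dict[str, list[str]], scraped: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Add scraped tags to the config without touching tags already present."""
    merged = {model: list(caps) for model, caps in existing.items()}
    for model, caps in scraped.items():
        current = merged.setdefault(model, [])
        current.extend(sorted(c for c in caps if c not in current))
    return merged
```

Because the merge only appends, a tag set by hand survives every scraper run, even when the scraper no longer reports it.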

Testing

  • Comprehensive test suite in tests/test_capabilities.py covering:
    • All detector functions (tools, vision, reasoning, json)
    • Capability ordering and failover logic
    • Virtual endpoint dispatch
    • Defensive config parsing
  • Scraper tests verify capabilities are only stored for free models and dropped on removal

Notable Implementation Details

  • Safe defaults: Response validators return True on malformed JSON or unexpected shapes to avoid spurious failover on unparseable responses
  • Streaming limitation: Reactive 200-body capability checks only apply to non-streaming requests; streaming requests still benefit from proactive ordering, but the proxy cannot inspect delta chunks without buffering them
  • Tool choice semantics: tool_choice: "auto" and "none" never trigger failover, even when the response contains no tool calls (the model may legitimately answer without tools)
  • Stable reordering: Capability ordering never drops candidates, so incomplete metadata never causes hard failures
  • Provider/model lookup: Capability lookups support both bare model IDs and full provider/model forms, matching existing model_reasoning behavior
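
The "safe defaults" rule is easiest to see in code. The sketch below assumes an OpenAI-style response envelope (`choices[0].message`) and uses illustrative function names; it is a guess at the shape of the validators, not a copy of them. The key property is that only a well-formed response that clearly lacks the capability returns `False`.

```python
import json


def response_has_tool_call(raw_body: str) -> bool:
    """False only when the body parses cleanly and no choice has tool_calls."""
    try:
        choices = json.loads(raw_body)["choices"]
        for choice in choices:
            if choice["message"].get("tool_calls"):
                return True
    except (ValueError, KeyError, TypeError, AttributeError):
        return True  # unparseable or unexpected shape: do not fail over
    return False


def response_is_json(raw_body: str) -> bool:
    """False only when the message content is a string that is not valid JSON."""
    try:
        content = json.loads(raw_body)["choices"][0]["message"]["content"]
    except (ValueError, KeyError, IndexError, TypeError):
        return True  # cannot read the envelope: treat as satisfied
    if not isinstance(content, str):
        return True
    try:
        json.loads(content)
    except ValueError:
        return False
    return True
```

Returning `True` on anything unreadable keeps a malformed upstream body from triggering a chain of failovers that would not help the client anyway.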

https://claude.ai/code/session_019YMQmPWsAUtALVqqY9FHPo

Virtual models previously only failed over on HTTP status >= 400, so a free
model that returns 200 while silently ignoring tools/function calls looked like
a success and broke tool-using clients. This adds a general model-capability
framework (tools / vision / reasoning / json):

- Runtime (server.py): a capability registry with request detectors, strict
  detectors, and response validators. Virtual-model requests now (1) proactively
  prefer candidates that support the needed capabilities via a stable reorder
  that never drops candidates, and (2) reactively fail over when a *forced*
  capability isn't delivered by a 200 (tools forced but no tool_calls; JSON mode
  but non-JSON body). Reactive 200-body detection is non-streaming only;
  vision/reasoning rely on existing HTTP-error failover.
- New capability virtual endpoints: llmproxy__tools, llmproxy__tools/free,
  llmproxy__vision, llmproxy__vision/free (plus legacy llmproxy/ forms),
  advertised in /v1/models when backing candidates exist, with config hints.
- New optional model_capabilities config map (model -> [caps]), threaded through
  the scraper pipeline (Evidence, OpenRouter supported_parameters/input_modalities,
  aggregate/apply_updates/regenerate/reconcile), providers.py, config.example.json,
  and the setup wizard (tag/remove/view + auto-populate). Empty = full backward
  compat; reconcile is add-only so hand-set tags are never pruned.
- Docs: README capability section + endpoints + model_capabilities field.
- Tests: detectors, ordering, reactive failover, virtual dispatch/hints, and
  pipeline (OpenRouter mapping, apply_updates, add-only reconcile, wizard shape).
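
A simplified picture of the reactive failover loop described under "Runtime" above: try each candidate in order and accept the first 200 whose body actually delivers the forced capability. Everything here is hypothetical scaffolding (`send` and `delivered` are injected callables, and returning the last attempt when every candidate fails is an assumption); the real logic lives in `_proxy_cycling_non_streaming()`.

```python
from typing import Callable, Optional

Attempt = tuple[str, int, dict]  # (model, HTTP status, parsed body)


def first_satisfying(
    candidates: list[str],
    send: Callable[[str], tuple[int, dict]],
    delivered: Callable[[dict], bool],
) -> Optional[Attempt]:
    """Return the first attempt that succeeds and delivers the capability."""
    last: Optional[Attempt] = None
    for model in candidates:
        status, body = send(model)
        last = (model, status, body)
        if status >= 400:
            continue  # existing behavior: fail over on HTTP errors
        if not delivered(body):
            continue  # new behavior: a 200 that ignored the forced capability
        return last
    # Assumption: when nothing satisfies the request, surface the last attempt.
    return last
```

In a test, `send` can be a lambda over a dictionary of canned responses, which makes the 200-but-no-tool-call case easy to reproduce.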

https://claude.ai/code/session_019YMQmPWsAUtALVqqY9FHPo
@BillJr99 BillJr99 merged commit e97e21e into main May 29, 2026
4 checks passed
