-
Notifications
You must be signed in to change notification settings - Fork 1k
Auto Combo
Self-managing model chains with adaptive scoring + zero-config auto-routing
NEW: No combo creation required. Use
auto/prefix directly in any client.
| Model ID | Variant | Behavior |
|---|---|---|
auto |
default | All connected providers, LKGP strategy, balanced weights |
auto/coding |
coding | Quality-first weights, suitable for code generation |
auto/fast |
fast | Low-latency weighted selection |
auto/cheap |
cheap | Cost-optimized routing (lowest cost first) |
auto/offline |
offline | Favors providers with highest quota availability |
auto/smart |
smart | Quality-first + higher exploration rate (10%) for better model discovery |
auto/lkgp |
lkgp | Explicit LKGP (same as default auto) |
How to use:
# Any IDE or CLI tool that supports OpenAI format
Base URL: http://localhost:20128/v1
API Key: <your-endpoint-key>
# In your code/config, set model to:
model: "auto" # balanced default
model: "auto/coding" # best for coding tasks
model: "auto/fast" # fastest available
model: "auto/cheap" # cheapest per tokenWhat happens:
- OmniRoute detects
auto/prefix insrc/sse/handlers/chat.ts - Queries all active provider connections from the database
- Filters to those with valid credentials (API key or OAuth token)
- Determines the model per connection (
connection.defaultModelor provider's first model) - Builds a virtual combo in-memory (not stored in DB)
- Routes using the selected variant's weight profile + LKGP strategy
Key properties:
- ✅ Always-on: No toggle, no combo creation, no configuration needed
- ✅ Dynamic: Reflects current connected providers automatically
- ✅ Session stickiness: LKGP ensures last successful provider is prioritized
- ✅ Multi-account aware: Each provider connection becomes a separate candidate
- ✅ No DB writes: Virtual combo exists only for the request, zero persistence overhead
Behind the scenes:
Request: { model: "auto/coding" }
↓
src/sse/handlers/chat.ts detects prefix
↓
createVirtualAutoCombo('coding') → candidatePool from active connections
↓
handleComboChat (same engine as persisted combos)
↓
Auto-scoring selects best provider/model per requestImplementation files:
| File | Purpose |
|---|---|
open-sse/services/autoCombo/autoPrefix.ts |
Prefix parser (parseAutoPrefix) |
open-sse/services/autoCombo/virtualFactory.ts |
Creates virtual AutoComboConfig objects |
open-sse/services/autoCombo/providerRegistryAccessor.ts |
Test hook for mocking provider registry |
src/sse/handlers/chat.ts |
Integration: auto prefix short-circuit |
src/shared/constants/providers.ts |
SYSTEM_PROVIDERS.auto system entry |
The Auto-Combo Engine dynamically selects the best provider/model for each request using a 9-factor scoring function (defined in open-sse/services/autoCombo/scoring.ts → DEFAULT_WEIGHTS). All weights sum to 1.0.
Source: diagrams/auto-combo-9factor.mmd
| Factor | Default Weight | Description |
|---|---|---|
health |
0.22 | Health score from circuit breaker (CLOSED=1.0, HALF_OPEN=0.5, OPEN=0.0) |
quota |
0.17 | Remaining quota / rate-limit headroom [0..1] |
costInv |
0.17 | Inverse blended cost (60% input + 40% output token price, normalized) — cheaper = higher score |
latencyInv |
0.13 | Inverse p95 latency normalized to pool — faster = higher score |
taskFit |
0.08 | Task-type fitness (coding, review, planning, analysis, debugging, docs) |
specificityMatch |
0.08 | Match between request specificity (manifest hint) and model tier |
stability |
0.05 | Variance-based stability (low latency stdDev / error rate) |
tierPriority |
0.05 | Account-tier priority — Ultra=1.0, Pro=0.67, Standard=0.33, Free=0.0 |
tierAffinity |
0.05 | Affinity between the candidate's tier and the manifest-recommended tier |
Sum: 0.22 + 0.17 + 0.17 + 0.13 + 0.08 + 0.08 + 0.05 + 0.05 + 0.05 = 1.0 (validated by validateWeights()).
Four pre-defined weight profiles in open-sse/services/autoCombo/modePacks.ts. Each pack overrides the default weights to bias selection toward a specific goal. Below are the full weight tables per pack (each row sums to 1.0).
| Factor | ship-fast | cost-saver | quality-first | offline-friendly |
|---|---|---|---|---|
| quota | 0.15 | 0.15 | 0.10 | 0.40 |
| health | 0.30 | 0.20 | 0.20 | 0.30 |
| costInv | 0.05 | 0.40 | 0.05 | 0.10 |
| latencyInv | 0.35 | 0.05 | 0.05 | 0.05 |
| taskFit | 0.10 | 0.10 | 0.40 | 0.00 |
| stability | 0.00 | 0.05 | 0.15 | 0.10 |
| tierPriority | 0.05 | 0.05 | 0.05 | 0.05 |
Notes:
-
tierAffinityandspecificityMatchare not set in mode packs —calculateScore()treats them as?? 0when absent. - Each pack's emphasis at a glance:
- ship-fast → latencyInv 0.35 + health 0.30 (low-latency, healthy connections)
- cost-saver → costInv 0.40 (cheapest tokens win)
- quality-first → taskFit 0.40 + stability 0.15 (best model for the task, consistent)
- offline-friendly → quota 0.40 + health 0.30 (max headroom regardless of speed/cost)
OmniRoute's combo engine supports 14 routing strategies (declared in src/shared/constants/routingStrategies.ts → ROUTING_STRATEGY_VALUES). The Auto Combo engine itself is exposed under the auto strategy; the others are available for persisted combos.
| Strategy | Description |
|---|---|
priority |
First-target ordered list with explicit priority |
weighted |
Weighted random by per-target weight |
round-robin |
Cycle through targets in order |
context-relay |
Hand off context across targets (long conversations) |
fill-first |
Fill each target's quota before moving to next |
p2c |
Power-of-2-choices random load balancing |
random |
Uniform random selection |
least-used |
Pick target with lowest current load |
cost-optimized |
Minimize $ per request given catalog pricing |
reset-aware ⭐ |
Prioritize by quota reset time — short reset windows ranked higher |
strict-random |
Random without deduplication of repeats |
auto |
Use Auto Combo scoring (9-factor) — recommended |
lkgp |
Last-Known-Good Path (sticky route to last successful target) |
context-optimized |
Pick target with best fit for current context size |
⭐ = New in v3.8.0
The Auto Combo engine doesn't require pre-defined combos. Instead, open-sse/services/autoCombo/virtualFactory.ts builds candidates on-the-fly:
- Pulls
getProviderConnections({ isActive: true })(all enabled connections) - Filters to those with valid credentials (API key or non-expired OAuth token via
hasUsableOAuthToken()) - Cross-references with
getProviderRegistry()for model availability + pricing - For each tuple
(provider, model, connection), builds aVirtualAutoComboCandidate - Picks
connection.defaultModel(or the registry's first model) as the dispatch target - Scores each candidate using the 9-factor
scorePool()and the variant's weight pack - Returns the resulting in-memory
AutoComboConfigforhandleComboChat()— never persisted to DB
This means adding a new provider with auto/* enabled automatically expands the candidate pool — no manual combo editing needed. The virtual combo is rebuilt per request, so newly-added or newly-healthy connections are picked up immediately.
- Temporary exclusion: Score < 0.2 → excluded for 5 min (progressive backoff, max 30 min)
- Circuit breaker awareness: OPEN → auto-excluded; HALF_OPEN → probe requests
- Incident mode: >50% OPEN → disable exploration, maximize stability
- Cooldown recovery: After exclusion, first request is a "probe" with reduced timeout
5% of requests (configurable) are routed to random providers for exploration. Disabled in incident mode.
There is no dedicated POST /api/combos/auto endpoint — Auto-Combo is consumed in two ways:
-
Zero-config (recommended): Send any chat completion request with
model: "auto"ormodel: "auto/<variant>". The virtual factory builds the combo per request — no persistence, no API calls needed. -
Persisted combo with
strategy: "auto": Create a regular combo viaPOST /api/combosand setstrategy: "auto"plusconfig.auto.weights/config.auto.candidatePool. The same scoring engine is used; the combo is stored incombosand reusable by ID.
# Zero-config usage (no combo creation)
curl -X POST http://localhost:20128/v1/chat/completions \
-H "Authorization: Bearer <key>" \
-H "Content-Type: application/json" \
-d '{"model":"auto/coding","messages":[{"role":"user","content":"Hello"}]}'
# Persisted auto combo via the regular combos endpoint
curl -X POST http://localhost:20128/api/combos \
-H "Content-Type: application/json" \
-d '{"id":"my-auto","name":"Auto Coder","strategy":"auto","config":{"auto":{"candidatePool":["anthropic","google","openai"],"weights":{"quota":0.15,"health":0.3,"costInv":0.05,"latencyInv":0.35,"taskFit":0.1,"stability":0,"tierPriority":0.05}}}}'30+ models scored across 6 task types (coding, review, planning, analysis, debugging, documentation). Supports wildcard patterns (e.g., *-coder → high coding score).
Including the bare auto (default) plus the 6 AutoVariant values declared in autoPrefix.ts, there are 7 invokable model IDs:
auto, auto/coding, auto/fast, auto/cheap, auto/offline, auto/smart, auto/lkgp
(AutoVariant itself enumerates 6 values; the 7th option is "no variant" — bare auto — handled by parseAutoPrefix() as variant: undefined.)
The 9-factor scoring function (open-sse/services/autoCombo/scoring.ts) treats tier
membership as one signal via the tierPriority weight. Default weights (from DEFAULT_WEIGHTS):
| Factor | Default weight | Notes |
|---|---|---|
| Tier priority | 0.05 | Tier 1 premium → higher score |
| Latency (p50 inverse) | 0.35 | Fastest wins |
| Cost ($/1M inverse) | 0.20 | Cheapest blended price wins (60% input + 40% output ratio) |
| Recent health/error rate | 0.15 | Unhealthy deprioritized |
| Quota remaining | 0.10 | Near-exhausted deprioritized |
| Context window match | 0.08 | Penalizes short windows |
| Task fitness | 0.10 | Coding → coding-specialist models |
| Stability | 0.00 | Disabled by default |
Tier alone does not force Tier 1 first — if Tier 1 latency is bad or
cost-vs-quality is suboptimal, Tier 2 wins. To force tier ordering, use combo
strategy priority and arrange providers by tier.
To strongly favor Tier 1 (subscription), increase tierPriority weight:
{
"strategy": "auto",
"config": { "auto": { "weights": { "tierPriority": 0.3, "costInv": 0.05 } } }
}See docs/marketing/TIERS.md for tier definitions and provider classification.
| File | Purpose |
|---|---|
open-sse/services/autoCombo/scoring.ts |
9-factor scoring function, DEFAULT_WEIGHTS, pool norm |
open-sse/services/autoCombo/taskFitness.ts |
Model × task fitness lookup |
open-sse/services/autoCombo/engine.ts |
Selection logic, bandit, budget cap |
open-sse/services/autoCombo/selfHealing.ts |
Exclusion, probes, incident mode |
open-sse/services/autoCombo/modePacks.ts |
4 weight profiles (ship-fast, cost-saver, quality-first, offline-friendly) |
open-sse/services/autoCombo/autoPrefix.ts |
auto/ prefix parser + 6 variants |
open-sse/services/autoCombo/virtualFactory.ts |
Builds in-memory AutoComboConfig from live connections |
open-sse/services/autoCombo/providerRegistryAccessor.ts |
Test hook for mocking provider registry |
src/shared/constants/routingStrategies.ts |
ROUTING_STRATEGY_VALUES (14 strategies) |
src/sse/handlers/chat.ts |
Integration: auto-prefix short-circuit |
OmniRoute · Website · npm · Docker Hub
- Setup Guide
- User Guide
- Features
- Quick Start (Docker)
- Electron Desktop App
- Termux (Android)
- PWA Guide
- MCP Server
- A2A Server
- Agent Protocols
- OpenCode Plugin
- Webhooks
- Cloud Agents
- Skills
- Memory
- Evals
- Gamification
- Guardrails
- Compliance
- Error Sanitization
- Public Credentials
- Route Guard Tiers
- Stealth Guide
- CLI Token Auth