Model Routing

CortexPrism edited this page Jun 17, 2026 · 1 revision

CortexPrism supports two model routing strategies that wrap LLM providers. Both implement LLMProvider, making them drop-in replacements for any provider.

Cascade Router

Tries the cheapest provider first and escalates to the next provider in the list whenever the estimated confidence of a response falls below confidenceThreshold.

Provider 1 (cheapest)
  → estimateConfidence(text) — multi-signal heuristic
    (hedging, vagueness, repetition, specificity, length)
  → confidence < threshold?
    → Provider 2 (next cheapest)
      → ...
        → Return last result if all exhausted
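The escalation loop above can be sketched roughly as follows. This is a minimal illustration, not CortexPrism's implementation: the `LLMProvider` shape, the `cascade` function, and the toy heuristic inside `estimateConfidence` (which covers only hedging phrases and answer length) are all assumptions made for the example.

```typescript
// Hypothetical minimal provider interface; the real LLMProvider may differ.
interface LLMProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Toy confidence heuristic (assumption): start at 1.0, subtract for
// hedging phrases and for very short answers, then clamp to [0, 1].
function estimateConfidence(text: string): number {
  const hedges = ["i'm not sure", "i think", "maybe", "possibly", "it depends"];
  const lower = text.toLowerCase();
  let score = 1.0;
  for (const h of hedges) {
    if (lower.includes(h)) score -= 0.2;
  }
  if (text.trim().length < 20) score -= 0.3;
  return Math.max(0, Math.min(1, score));
}

// Try providers in order (cheapest first). Stop at the first answer whose
// confidence meets the threshold; if none does, return the last answer.
async function cascade(
  providers: LLMProvider[],
  prompt: string,
  threshold: number
): Promise<{ provider: string; text: string; confidence: number }> {
  if (providers.length === 0) {
    throw new Error("cascade needs at least one provider");
  }
  let last = { provider: "", text: "", confidence: 0 };
  for (const p of providers) {
    const text = await p.complete(prompt);
    const confidence = estimateConfidence(text);
    last = { provider: p.name, text, confidence };
    if (confidence >= threshold) break;
  }
  return last;
}
```

With a threshold of 0.7, a short, hedged answer from the first provider would trigger escalation, while a specific answer would be returned immediately without calling the more expensive providers.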

Configuration

{
  "router": {
    "enabled": true,
    "strategy": "cascade",
    "confidenceThreshold": 0.7,
    "cascade": [
      { "provider": "ollama", "model": "llama3.2:3b" },
      { "provider": "anthropic", "model": "claude-haiku-4-5" },
      { "provider": "anthropic", "model": "claude-sonnet-4-5" }
    ]
  }
}

Threshold Router (RouteLLM-style)

Scores the user's prompt before any generation happens, then routes the request to either the strong or the weak model based on complexity signals.

Complexity Signals

  • Code block detection
  • Question length
  • Reasoning keywords
  • Task-specific vocabulary
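A heuristic scorer built on the four signals above might look something like the sketch below. The function names, keyword lists, and weights are hypothetical, and the rule "score at or above the threshold goes to the strong model" is an assumption about how confidenceThreshold is applied in this strategy.

```typescript
// Illustrative complexity scorer; weights and keyword lists are assumptions.
function scoreComplexity(prompt: string): number {
  const lower = prompt.toLowerCase();
  let score = 0;

  // Code block detection: three consecutive backticks mark a fenced block.
  if (/`{3}/.test(prompt)) score += 0.3;

  // Question length: contributes up to 0.3, saturating at 1000 characters.
  score += Math.min(prompt.length / 1000, 1) * 0.3;

  // Reasoning keywords (simple substring match).
  const reasoning = ["why", "prove", "explain", "step by step", "compare", "derive"];
  if (reasoning.some((k) => lower.includes(k))) score += 0.2;

  // Task-specific vocabulary (simple substring match).
  const vocabulary = ["refactor", "algorithm", "sql", "regex", "stack trace"];
  if (vocabulary.some((k) => lower.includes(k))) score += 0.2;

  return Math.min(1, score);
}

// Route to the strong model when the score reaches the threshold.
function route(prompt: string, threshold: number): "strong" | "weak" {
  return scoreComplexity(prompt) >= threshold ? "strong" : "weak";
}
```

Because the decision is made from the prompt alone, only one model is ever called per request, unlike the cascade strategy, which may call several.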

Configuration

{
  "router": {
    "enabled": true,
    "strategy": "threshold",
    "confidenceThreshold": 0.5,
    "threshold": {
      "strongProvider": "anthropic",
      "strongModel": "claude-sonnet-4-5",
      "weakProvider": "ollama",
      "weakModel": "llama3.2:3b",
      "scorer": "heuristic"
    }
  }
}

Router Metrics

Available via router.getMetrics():

  • Decisions per model
  • Total cost and savings
  • Per-model token counts
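The page does not document the return shape of router.getMetrics(), so the following is only a sketch of how such metrics could be accumulated. `RouterMetrics`, `MetricsTracker`, the flat per-token price table, and the definition of savings (cost of sending every request to a baseline model, minus actual cost) are all hypothetical.

```typescript
// Hypothetical metrics shape; the real getMetrics() result may differ.
interface RouterMetrics {
  decisions: Record<string, number>; // routing decisions per model
  tokens: Record<string, number>;    // token counts per model
  totalCost: number;
  savings: number;                   // baseline cost minus actual cost
}

class MetricsTracker {
  private decisions: Record<string, number> = {};
  private tokens: Record<string, number> = {};
  private totalCost = 0;
  private baselineCost = 0;

  // prices: assumed flat price per token for each model.
  // baselineModel: the model every request would use without routing.
  constructor(
    private prices: Record<string, number>,
    private baselineModel: string
  ) {}

  // Record one routed request and the tokens it consumed.
  record(model: string, tokenCount: number): void {
    this.decisions[model] = (this.decisions[model] ?? 0) + 1;
    this.tokens[model] = (this.tokens[model] ?? 0) + tokenCount;
    this.totalCost += tokenCount * (this.prices[model] ?? 0);
    this.baselineCost += tokenCount * (this.prices[this.baselineModel] ?? 0);
  }

  // Return a snapshot so callers cannot mutate internal counters.
  getMetrics(): RouterMetrics {
    return {
      decisions: { ...this.decisions },
      tokens: { ...this.tokens },
      totalCost: this.totalCost,
      savings: this.baselineCost - this.totalCost,
    };
  }
}
```

Under this definition, savings grow as more traffic is answered by cheaper models and stay at zero if every request is routed to the baseline model.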
