research(routing): unified routing+cascading framework (ICLR 2025) #2165

@bug-ops

Description

Source

"A Unified Approach to Routing and Cascading for LLMs" (arXiv:2410.10347, ICLR 2025)
https://arxiv.org/abs/2410.10347

Summary

The paper proves that optimal model selection combines routing (commit to one model upfront) and cascading (try the cheap model first, escalate if quality is low) into a single framework. A quality estimator scores the cheap model's output: if the score is above a threshold, stop; otherwise escalate. The combined approach is reported to outperform pure routing or pure cascading by ~4% on RouterBench, with an 80% relative improvement over naive baselines.
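To make the stop-or-escalate rule concrete, here is a minimal sketch of a threshold cascade in Rust. It only illustrates the cascading half described above, not the paper's full method; `Model`, `CascadeOutcome`, `cascade`, and the function-pointer stand-in for an LLM call are hypothetical names invented for this example.

```rust
/// Hypothetical model handle: `generate` stands in for a real LLM call.
pub struct Model {
    pub name: &'static str,
    pub cost: f32,
    pub generate: fn(&str) -> String,
}

/// What the cascade settled on, plus the cost of every call made on the way.
#[derive(Debug, PartialEq)]
pub struct CascadeOutcome {
    pub model: String,
    pub answer: String,
    pub total_cost: f32,
}

/// Walk `chain` (assumed ordered cheapest first) and stop at the first answer
/// whose estimated quality reaches `threshold`. The last model's answer is
/// accepted unconditionally. Returns `None` only for an empty chain.
pub fn cascade(
    chain: &[Model],
    prompt: &str,
    estimate_quality: impl Fn(&str, &str) -> f32,
    threshold: f32,
) -> Option<CascadeOutcome> {
    let mut total_cost = 0.0;
    for (i, model) in chain.iter().enumerate() {
        let answer = (model.generate)(prompt);
        total_cost += model.cost;
        let is_last = i + 1 == chain.len();
        if is_last || estimate_quality(prompt, &answer) >= threshold {
            return Some(CascadeOutcome {
                model: model.name.to_string(),
                answer,
                total_cost,
            });
        }
    }
    None
}
```

Note that an escalated request pays for every model it tried, so the threshold trades quality against accumulated cost.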

Applicability to Zeph

Relevance: HIGH. Zeph uses Thompson sampling + RAPS for routing, but the cascade strategy is a separate code path. Unifying them, so that Thompson sampling can decide to cascade rather than commit, could reduce cost without degrading quality on hard tasks.

Currently:

  • routing = "cascade" is static (chain order = cost order)
  • routing = "thompson" commits upfront with no fallback on quality

A unified approach: let Thompson pick the primary provider, but attach a lightweight confidence check that can trigger a cascade escalation.

Implementation sketch

  • Add quality_gate: Option<f32> to RouterConfig (confidence threshold)
  • After a Thompson/EMA selection, run a fast self-assessment probe
  • If probe score < quality_gate, escalate to next provider in pool
  • Log cascade events with reason (thompson_quality_fallback)
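The steps above could look roughly like the following sketch. Only `quality_gate: Option<f32>`, `RouterConfig`, and the `thompson_quality_fallback` reason come from this issue; everything else (`CascadeEvent`, `Routed`, `route_with_gate`, the `call` and `probe` closures, and addressing providers by pool index) is a hypothetical stand-in, not Zeph's actual API. It assumes the pool is ordered so that the next index is the escalation target, that the primary index was already chosen by Thompson/EMA, and that escalation repeats until the probe passes or the pool is exhausted.

```rust
/// Hypothetical subset of the router config; only `quality_gate` is from the issue.
#[derive(Debug, Clone, Default)]
pub struct RouterConfig {
    /// Confidence threshold; `None` disables the gate.
    pub quality_gate: Option<f32>,
}

pub const FALLBACK_REASON: &str = "thompson_quality_fallback";

/// One escalation, recorded so it can be logged with its reason.
#[derive(Debug, Clone, PartialEq)]
pub struct CascadeEvent {
    pub from: usize,
    pub to: usize,
    pub probe_score: f32,
    pub reason: &'static str,
}

#[derive(Debug)]
pub struct Routed {
    pub provider: usize,
    pub response: String,
    pub events: Vec<CascadeEvent>,
}

/// Call the provider at index `primary`, probe its response, and escalate
/// while the probe score stays below the gate. Returns `None` if `primary`
/// is out of range.
pub fn route_with_gate(
    cfg: &RouterConfig,
    pool_len: usize,
    primary: usize,
    call: impl Fn(usize) -> String,
    probe: impl Fn(&str) -> f32,
) -> Option<Routed> {
    if primary >= pool_len {
        return None;
    }
    let mut events = Vec::new();
    let mut current = primary;
    loop {
        let response = call(current);
        let next = current + 1;
        // Probe only when the gate is enabled and there is somewhere to escalate to.
        let rejected_score = match cfg.quality_gate {
            Some(gate) if next < pool_len => {
                let score = probe(&response);
                if score < gate { Some(score) } else { None }
            }
            _ => None,
        };
        match rejected_score {
            None => return Some(Routed { provider: current, response, events }),
            Some(score) => {
                events.push(CascadeEvent {
                    from: current,
                    to: next,
                    probe_score: score,
                    reason: FALLBACK_REASON,
                });
                current = next;
            }
        }
    }
}
```

With `quality_gate: None` this degenerates to today's commit-upfront behaviour, which would keep the change opt-in. The last provider in the pool is never probed, since there is nothing left to escalate to.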

Complexity: MEDIUM

Metadata

Labels

P4 (Long-term / exploratory), research (Research-driven improvement)
