Description
Source
"A Unified Approach to Routing and Cascading for LLMs" (arXiv:2410.10347, ICLR 2025)
https://arxiv.org/abs/2410.10347
Summary
Proves that the optimal model-selection strategy combines routing (commit to one model up front) and cascading (try a cheap model first, then escalate if quality is low) in a single framework. A quality estimator scores the cheap model's output: if the score is above a threshold, stop; otherwise escalate. The unified approach outperforms pure routing or pure cascading by ~4% on RouterBench, with an 80% relative improvement over naive baselines.
Applicability to Zeph
Relevance: HIGH. Zeph uses Thompson sampling + RAPS for routing, but the cascade strategy is a separate code path. Unifying them, so that Thompson sampling can decide to cascade rather than commit, would reduce cost without degrading quality on hard tasks.
Currently:
- `routing = "cascade"` is static (chain order = cost order)
- `routing = "thompson"` commits upfront with no fallback on quality
A unified approach: let Thompson pick the primary provider, but attach a lightweight confidence check that can trigger a cascade escalation.
Implementation sketch
- Add `quality_gate: Option<f32>` to `RouterConfig` (confidence threshold)
- After a Thompson/EMA selection, run a fast self-assessment probe
- If probe score < `quality_gate`, escalate to next provider in pool
- Log cascade events with reason (`thompson_quality_fallback`)
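
The steps above could look roughly like the following. This is a minimal sketch, not Zeph's actual code: only `quality_gate`, `RouterConfig`, and the `thompson_quality_fallback` reason come from this issue. The `Selection` struct, the `select_with_gate` function, the closure-based probe, and the assumption that the pool is an ordered list where "next provider" means the next index are all hypothetical.

```rust
/// Router configuration. Only `quality_gate` comes from the proposal;
/// the real `RouterConfig` presumably has more fields.
pub struct RouterConfig {
    /// Confidence threshold; `None` disables the gate entirely.
    pub quality_gate: Option<f32>,
}

/// Hypothetical result type: the provider finally used plus one log
/// entry per escalation.
#[derive(Debug, PartialEq)]
pub struct Selection {
    pub provider: usize,
    pub escalations: Vec<String>,
}

/// Starts at `primary` (e.g. the Thompson/EMA pick, assumed to be a valid
/// index below `pool_len`) and moves to the next provider in the pool
/// while `probe` scores the current one below the gate.
/// `probe` stands in for the fast self-assessment check.
pub fn select_with_gate<F>(
    cfg: &RouterConfig,
    pool_len: usize,
    primary: usize,
    mut probe: F,
) -> Selection
where
    F: FnMut(usize) -> f32,
{
    let mut sel = Selection { provider: primary, escalations: Vec::new() };

    // No gate configured: behave like plain Thompson routing.
    let gate = match cfg.quality_gate {
        Some(g) => g,
        None => return sel,
    };

    while probe(sel.provider) < gate {
        let next = sel.provider + 1;
        if next >= pool_len {
            break; // pool exhausted: keep the last provider tried
        }
        // Record the cascade event with its reason.
        sel.escalations.push(format!(
            "thompson_quality_fallback: provider {} -> {}",
            sel.provider, next
        ));
        sel.provider = next;
    }
    sel
}
```

Passing the probe as a closure keeps the gate independent of how confidence is actually estimated, so the same path could serve both a cheap heuristic and a model-based self-assessment. With `quality_gate = None` the function returns the primary pick without calling the probe, which leaves existing `routing = "thompson"` behavior unchanged.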