Task-class routing (routing module): classify is an LLM-free, deterministic
policy that sorts a prompt into one of three classes (rule, slm, or heavy)
by its length and a small keyword set. TieredChatClient is a chat backend
that dispatches each request to the backend for its class, falling back to
a default. Trivial requests are answered cheaply and only hard ones reach a heavy model.
The model worker gains a tiered provider (a rule path plus SLM and heavy HTTP
models) and a --heavy-model option. classify, TaskClass, and TieredChatClient are exported.
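
To make the routing idea concrete, here is a minimal sketch of a length-and-keyword policy with a class-based dispatcher. It is not the module's actual code. The thresholds, the keyword set, the `chat` method name, and the constructor signature are all assumptions chosen for illustration; only the names classify, TaskClass, and TieredChatClient and the three class labels come from the description above.

```python
import re
from enum import Enum
from typing import Callable, Dict

class TaskClass(str, Enum):
    RULE = "rule"
    SLM = "slm"
    HEAVY = "heavy"

# Hypothetical thresholds and keywords; the real policy's values are not stated.
RULE_MAX_CHARS = 20
HEAVY_MIN_CHARS = 400
HEAVY_KEYWORDS = frozenset({"prove", "refactor", "analyze", "design"})

def classify(prompt: str) -> TaskClass:
    """Deterministic, LLM-free routing based on length and whole-word keywords."""
    text = prompt.strip().lower()
    words = set(re.findall(r"[a-z]+", text))
    if len(text) >= HEAVY_MIN_CHARS or words & HEAVY_KEYWORDS:
        return TaskClass.HEAVY
    if len(text) <= RULE_MAX_CHARS:
        return TaskClass.RULE
    return TaskClass.SLM

# A backend here is just a callable from prompt to reply (an assumption).
Backend = Callable[[str], str]

class TieredChatClient:
    """Dispatches each prompt to the backend registered for its class."""

    def __init__(self, backends: Dict[TaskClass, Backend], default: Backend):
        self._backends = backends
        self._default = default

    def chat(self, prompt: str) -> str:
        # Fall back to the default backend when no backend covers the class.
        backend = self._backends.get(classify(prompt), self._default)
        return backend(prompt)

client = TieredChatClient(
    {TaskClass.RULE: lambda p: "rule reply", TaskClass.HEAVY: lambda p: "heavy reply"},
    default=lambda p: "default reply",
)
```

In this sketch no backend is registered for the slm class, so a mid-length prompt without keywords goes to the default backend. Matching whole words rather than substrings keeps a word such as "improve" from triggering the "prove" keyword.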
A committed routing benchmark (benchmarks/routing_benchmark.py): a fixed
prompt set with checked-in results reporting the class distribution, the
per-prompt decision, and a verification that a tiered client dispatches each
prompt to its class. Decisions are exact and reproducible; backend latency is
outside the benchmark's offline scope, since the slm and heavy tiers need a live model server.
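
The offline check described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of benchmarks/routing_benchmark.py: the prompt set, the length-only stand-in policy, and the `dispatch` and `run_benchmark` helpers are hypothetical. Recording stubs replace the real backends, so no model server is needed.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

def classify(prompt: str) -> str:
    # Stand-in policy (assumption): length only, with made-up thresholds.
    n = len(prompt)
    if n <= 20:
        return "rule"
    if n < 200:
        return "slm"
    return "heavy"

# Hypothetical fixed prompt set; the committed benchmark has its own.
PROMPTS: List[str] = [
    "ping",
    "Summarize the release notes in two sentences.",
    "x" * 250,
]

def dispatch(prompt: str, backends: Dict[str, Callable[[str], None]]) -> None:
    # Minimal stand-in for a tiered client: route by class, no fallback.
    backends[classify(prompt)](prompt)

def run_benchmark(prompts: List[str]) -> dict:
    decisions: List[Tuple[str, str]] = [(p, classify(p)) for p in prompts]
    distribution = Counter(cls for _, cls in decisions)

    # Recording stubs note which tier received each prompt.
    calls: List[Tuple[str, str]] = []
    backends = {
        cls: (lambda p, cls=cls: calls.append((p, cls)))
        for cls in ("rule", "slm", "heavy")
    }
    for p in prompts:
        dispatch(p, backends)

    return {
        "distribution": dict(distribution),
        "decisions": decisions,
        "dispatch_ok": calls == decisions,
    }

result = run_benchmark(PROMPTS)
```

Because the policy is a pure function of the prompt, running the benchmark twice gives identical results, which is what allows the results to be checked in and compared.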