Smart LLM request router — automatically route prompts to the right model based on cost, speed, quality or complexity.
```bash
pip install routerllm
```

You have access to multiple LLMs: cheap ones, fast ones, powerful ones. But you use the same model for everything. If that model is GPT-4o, simple questions are overkill and expensive. If it's GPT-4o-mini, complex analysis gets an underpowered model and weak output.

routerllm fixes this by choosing a model for each prompt automatically.
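To put rough numbers on "overkill, expensive", the minimal sketch below computes per-request cost from token counts. It assumes OpenAI's published list prices at the time of writing (USD per 1M tokens: gpt-4o-mini $0.15 input / $0.60 output, gpt-4o $2.50 input / $10.00 output). Prices change, so treat the figures as illustrative. `PRICES_PER_1M`, `request_cost`, and `cost_per_1k` are hypothetical helpers, not part of routerllm.

```python
# Assumed list prices in USD per 1M tokens (OpenAI pricing at the time of
# writing; check the current pricing page before relying on these numbers).
PRICES_PER_1M = {
    "gpt-4o-mini": {"in": 0.15, "out": 0.60},
    "gpt-4o": {"in": 2.50, "out": 10.00},
}


def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD of one request, given its input and output token counts."""
    p = PRICES_PER_1M[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000


def cost_per_1k(model: str) -> float:
    """Input rate plus output rate, expressed per 1K tokens (one possible 'per 1k' metric)."""
    p = PRICES_PER_1M[model]
    return (p["in"] + p["out"]) / 1000


if __name__ == "__main__":
    # A mid-sized request: 1,000 prompt tokens and 500 completion tokens.
    for name in PRICES_PER_1M:
        print(f"{name}: ${request_cost(name, 1000, 500):.6f} per request")
```

For that request this prints $0.000450 for gpt-4o-mini and $0.007500 for gpt-4o, roughly a 17x difference. Summing gpt-4o-mini's input and output rates per 1K tokens gives 0.00075, which matches the `estimated_cost_per_1k` value in the dry-run example; whether routerllm computes the figure this way is an assumption.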
```python
from routerllm import Router

router = Router(strategy="complexity")
router.add("openai", "gpt-4o-mini", api_key="sk-...")  # cheap, fast
router.add("openai", "gpt-4o", api_key="sk-...")       # powerful, expensive

# Simple → gpt-4o-mini
result = router.complete("What is 2+2?")
print(result.model)     # "gpt-4o-mini"
print(result.cost_usd)  # ~$0.000002

# Complex → gpt-4o
result = router.complete(
    "Analyze the long-term macroeconomic impact of AI on developing nations, "
    "comparing Keynesian and neoclassical perspectives."
)
print(result.model)           # "gpt-4o"
print(result.routing.reason)  # "Complex prompt (score: 0.78) → most powerful model"
```

| Strategy | Description |
|---|---|
| `complexity` | Analyze prompt complexity → route accordingly (default) |
| `cost` | Always use the cheapest model |
| `quality` | Always use the most powerful model |
| `speed` | Always use the fastest model |
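The table above summarizes the four strategies. To make the `complexity` strategy less of a black box, here is a minimal sketch of how a router like this could score a prompt and pick a model. It is not routerllm's implementation: the length-plus-keywords `complexity_score`, the 0.5 threshold, `ModelInfo`, `route`, and the latency figures are all hypothetical, and the prices are assumed OpenAI list prices (USD per 1M tokens) at the time of writing. The returned dict borrows its field names from the `dry_run` output shown later, for readability only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical catalog entry; every field value below is an assumption."""
    name: str
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens
    quality: int      # assumed capability rank, higher is stronger
    latency_ms: int   # placeholder typical latency

    @property
    def cost_per_1k(self) -> float:
        return (self.price_in + self.price_out) / 1000


# Word stems that hint at analytical work (illustrative, not exhaustive).
COMPLEX_HINTS = ("analy", "compar", "evaluat", "derive", "prove", "trade-off", "perspective")


def complexity_score(prompt: str) -> float:
    """Crude heuristic in [0, 1]: half from prompt length, half from keyword hits."""
    text = prompt.lower()
    length_part = min(len(text.split()) / 50, 1.0)  # saturates at 50 words
    hint_part = min(sum(stem in text for stem in COMPLEX_HINTS) / 2, 1.0)  # saturates at 2 hits
    return round(0.5 * length_part + 0.5 * hint_part, 2)


def route(models: list[ModelInfo], prompt: str, strategy: str = "complexity",
          threshold: float = 0.5) -> dict:
    """Pick a model without calling it and explain why (a dry-run style decision)."""
    cheapest = min(models, key=lambda m: m.cost_per_1k)
    strongest = max(models, key=lambda m: m.quality)
    score = complexity_score(prompt)
    if strategy == "cost":
        chosen, reason = cheapest, "cheapest model"
    elif strategy == "quality":
        chosen, reason = strongest, "most powerful model"
    elif strategy == "speed":
        chosen, reason = min(models, key=lambda m: m.latency_ms), "fastest model"
    elif strategy == "complexity":
        if score >= threshold:
            chosen, reason = strongest, f"Complex prompt (score: {score}) → most powerful model"
        else:
            chosen, reason = cheapest, f"Simple prompt (score: {score}) → cheapest model"
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {
        "would_use": chosen.name,
        "strategy": strategy,
        "reason": reason,
        "complexity_score": score,
        "estimated_cost_per_1k": chosen.cost_per_1k,
    }


MODELS = [
    ModelInfo("gpt-4o-mini", 0.15, 0.60, quality=1, latency_ms=500),
    ModelInfo("gpt-4o", 2.50, 10.00, quality=2, latency_ms=900),
]

if __name__ == "__main__":
    print(route(MODELS, "What is 2+2?"))
    print(route(MODELS, "Explain quantum entanglement", strategy="quality"))
```

A keyword heuristic is cheap but brittle (the stem "prove" also matches "improve", for example), and its scores will not reproduce the 0.28 / 0.78 values in routerllm's examples. A production router might grade prompts with a trained classifier or a small LLM instead.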
```python
# Set a default strategy
router = Router(strategy="cost")

# Override it per call
result = router.complete("Explain quantum entanglement", strategy="quality")
```

Preview a routing decision with `dry_run`:

```python
# Using a router created with strategy="complexity"
decision = router.dry_run("What is machine learning?")
print(decision)
# {
#   "would_use": "openai/gpt-4o-mini",
#   "strategy": "complexity",
#   "reason": "Simple prompt (score: 0.28) → cheapest model",
#   "complexity_score": 0.28,
#   "estimated_cost_per_1k": 0.00075
# }
```

Register models from several providers:

```python
router = Router(strategy="complexity")
router.add("openai", "gpt-4o-mini", api_key="sk-...")
router.add("openai", "gpt-4o", api_key="sk-...")
router.add("anthropic", "claude-haiku-4", api_key="sk-ant-...")
router.add("anthropic", "claude-opus-4", api_key="sk-ant-...")
router.add("gemini", "gemini-2.5-flash", api_key="AIza...")
```

Every result exposes:

```python
result.output                    # LLM response text
result.model                     # Model that was used
result.provider                  # Provider that was used
result.cost_usd                  # Cost in USD
result.tokens_in                 # Input tokens
result.tokens_out                # Output tokens
result.latency_ms                # Response time in ms
result.success                   # True if no error
result.routing.reason            # Why this model was chosen
result.routing.complexity_score  # 0.0 to 1.0
```

License: MIT