Intelligent LLM routing for Python. Stop hardcoding a single model. routeforge automatically sends each prompt to the right model based on task type, complexity, and cost — with full observability logging.
```bash
pip install routeforge
```

```python
from routeforge import LLMRouter

router = LLMRouter.from_yaml("config.yaml")
response = router.route("Write a FastAPI endpoint that streams SSE events")

print(response.content)
print(response.meta.model)               # gpt-4o
print(response.meta.routing_layer)       # task_classifier
print(response.meta.estimated_cost_usd)  # 0.000312
```

Different prompts need different models. A simple translation doesn't need GPT-4o, and a multi-step reasoning problem shouldn't go to a 7B model. routeforge makes that decision automatically, saving cost without sacrificing quality.
| Without routeforge | With routeforge |
|---|---|
| Every prompt hits your most expensive model | Simple prompts go to cheap models automatically |
| No visibility into cost or latency | Every run logged with tokens, cost, latency |
| Locked to one provider | OpenAI, Anthropic, OpenRouter, HuggingFace, Ollama |
| Manual load balancing across API keys | Round-robin built in |
Every prompt passes through four layers, in order:

```
Prompt
  │
  ▼
Layer 1 — Task classifier
  Detects task type via keyword/regex: code, reasoning, creative,
  translation, summarisation, factual. Routes to models tagged
  with that task. Picks cheap vs strong based on complexity.
  │
  ▼  (no task match)
Layer 2 — Complexity gate
  Scores the prompt 0.0–1.0 using length, sentence depth, and
  pattern signals. Unambiguous scores route directly to models
  tagged cheap (< 0.35) or strong (> 0.65).
  │
  ▼  (ambiguous score 0.35–0.65)
Layer 3 — Meta-router
  Sends the prompt to a cheap LLM (e.g. gpt-4o-mini) with a
  structured prompt asking it to pick the best model from your
  config. Returns JSON: model, task_type, reason.
  │
  ▼  (fallback)
Layer 4 — Default model
  Uses default_model from the config. Always succeeds.
```
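The fallthrough between layers can be sketched in plain Python. Everything below is illustrative: the keyword table and scorer are toy stand-ins, not routeforge internals.

```python
import re
from typing import Optional

# Illustrative keyword table; routeforge's real trigger lists are broader.
TASK_PATTERNS = {
    "code": re.compile(r"\b(function|endpoint|debug|refactor|sql)\b", re.I),
    "reasoning": re.compile(r"\b(prove|derive|calculate|logic)\b", re.I),
}

def classify_task(prompt: str) -> Optional[str]:
    """Layer 1: keyword/regex task detection."""
    for task, pattern in TASK_PATTERNS.items():
        if pattern.search(prompt):
            return task
    return None

def complexity(prompt: str) -> float:
    """Layer 2: toy stand-in for the length/sentence-depth score."""
    return min(len(prompt.split()) / 100, 1.0)

def route(prompt: str, default_model: str = "gpt-4o-mini") -> tuple:
    """Fall through the four layers in order, returning (layer, decision)."""
    task = classify_task(prompt)
    if task is not None:
        return ("task_classifier", task)
    score = complexity(prompt)
    if score < 0.35:
        return ("complexity", "cheap")
    if score > 0.65:
        return ("complexity", "strong")
    # Ambiguous band (0.35-0.65): routeforge would ask the meta-router LLM
    # here; this sketch skips straight to Layer 4.
    return ("default", default_model)

print(route("Prove that sqrt(2) is irrational"))  # ('task_classifier', 'reasoning')
print(route("hello"))                             # ('complexity', 'cheap')
```

Each layer only runs if the one above it declined to decide, which is why Layer 4 can never fail.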
```bash
pip install routeforge
```

Requires Python 3.10+.
```yaml
# config.yaml
default_model: gpt-4o-mini
log_path: runs.json
complexity_threshold: 0.45

models:
  - name: gpt-4o-mini
    model_id: gpt-4o-mini
    provider: openai
    api_keys:
      - sk-your-key
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006
    tags: [cheap, general, summarisation, translation]

  - name: gpt-4o
    model_id: gpt-4o
    provider: openai
    api_keys:
      - sk-your-key
    cost_per_1k_input: 0.0025
    cost_per_1k_output: 0.01
    tags: [strong, reasoning, code, creative]
```

```python
from routeforge import LLMRouter

router = LLMRouter.from_yaml("config.yaml")

response = router.route("Summarise this paragraph in one sentence: ...")
print(response.meta.model)          # gpt-4o-mini (cheap, summarisation tag)
print(response.meta.routing_layer)  # task_classifier

response = router.route("Prove that sqrt(2) is irrational")
print(response.meta.model)          # gpt-4o (strong, reasoning tag)
print(response.meta.routing_layer)  # task_classifier
```

The same configuration can also be built inline:

```python
from routeforge import LLMRouter

router = LLMRouter.from_dict({
    "default_model": "mini",
    "log_path": "runs.json",
    "models": [
        {
            "name": "mini",
            "model_id": "gpt-4o-mini",
            "provider": "openai",
            "api_keys": ["sk-your-key"],
            "cost_per_1k_input": 0.00015,
            "cost_per_1k_output": 0.0006,
            "tags": ["cheap", "general"],
        }
    ],
})
```

| Provider | Value | Notes |
|---|---|---|
| OpenAI | openai | GPT-4o, GPT-4o-mini, o1, etc. |
| Anthropic | anthropic | Claude Sonnet, Haiku, Opus |
| OpenRouter | openrouter | 100+ models via one API key |
| HuggingFace | huggingface | Inference API, /v1/chat/completions |
| Ollama | ollama | Local models, no API key needed |
An OpenRouter model entry:

```yaml
- name: deepseek-r1
  model_id: deepseek/deepseek-r1
  provider: openrouter
  api_keys:
    - sk-or-your-openrouter-key
  cost_per_1k_input: 0.0008
  cost_per_1k_output: 0.0032
  tags: [strong, reasoning]
```

A local Ollama model entry (no API key, zero cost):

```yaml
- name: llama3
  model_id: llama3.2
  provider: ollama
  base_url: http://localhost:11434
  api_keys: []
  cost_per_1k_input: 0.0
  cost_per_1k_output: 0.0
  tags: [cheap, general]
```

Add multiple keys to any model; routeforge round-robins across them automatically:

```yaml
- name: gpt-4o-mini
  model_id: gpt-4o-mini
  provider: openai
  api_keys:
    - sk-key-one
    - sk-key-two
    - sk-key-three
  tags: [cheap]
```

Every router.route() call is logged to a JSON file with full metadata:
```python
for run in router.logs(last_n=5):
    print(run)
```

```json
{
  "timestamp": "2026-03-28T10:42:01.123Z",
  "prompt_preview": "Write a FastAPI endpoint that streams SSE...",
  "model": "gpt-4o",
  "provider": "openai",
  "routing_layer": "task_classifier",
  "routing_reason": "Task detected as 'code'; selected by complexity (0.61)",
  "task_type": "code",
  "complexity_score": 0.61,
  "input_tokens": 48,
  "output_tokens": 312,
  "latency_ms": 1842.5,
  "estimated_cost_usd": 0.003240
}
```

```python
response = router.route("your prompt")

response.content                  # str — model's reply
response.meta.model               # str — model alias used
response.meta.provider            # str — provider name
response.meta.routing_layer       # "task_classifier" | "complexity" | "meta_router" | "default"
response.meta.routing_reason      # str — human-readable explanation
response.meta.task_type           # "code" | "reasoning" | "creative" | "translation" | "summarisation" | "factual" | "general"
response.meta.complexity_score    # float 0.0–1.0
response.meta.input_tokens        # int
response.meta.output_tokens       # int
response.meta.latency_ms          # float
response.meta.estimated_cost_usd  # float
```

Top-level config fields:

| Field | Type | Default | Description |
|---|---|---|---|
| default_model | str | — | Alias of the fallback model |
| log_path | str | runs.json | Path to the JSON log file |
| complexity_threshold | float | 0.5 | Below = cheap model, above = strong |
| meta_router_model | str | cheapest tagged model | Model used for meta-routing |
| models | list | — | List of model entries |
Model entry fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Alias used in routing and logs |
| model_id | str | Yes | Provider's model string |
| provider | str | Yes | openai, anthropic, openrouter, huggingface, ollama |
| api_keys | list[str] | Yes | One or more API keys |
| base_url | str | No | Override endpoint (OpenRouter, Ollama, custom) |
| cost_per_1k_input | float | No | USD per 1,000 input tokens |
| cost_per_1k_output | float | No | USD per 1,000 output tokens |
| context_window | int | No | Model context window size |
| tags | list[str] | No | Used for routing: cheap, strong, code, reasoning, etc. |
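The two cost fields drive the estimated_cost_usd value seen in logs and response metadata. The arithmetic is plain per-1,000-token pricing; a sketch (the helper name is hypothetical), checked against the log example: 48 input and 312 output tokens on gpt-4o.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cost_per_1k_input: float, cost_per_1k_output: float) -> float:
    """Per-1k-token pricing: tokens/1000 times the per-1k USD rate."""
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

# gpt-4o rates from the example config: 0.0025 per 1k in, 0.01 per 1k out.
print(round(estimate_cost_usd(48, 312, 0.0025, 0.01), 6))  # 0.00324
```

That matches the 0.003240 shown in the log entry above. Models without cost fields (e.g. local Ollama) simply log a cost of 0.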
Use these tags on your models to enable task-aware routing:
| Tag | Triggers on |
|---|---|
| code | Python, functions, scripts, debug, refactor, FastAPI, SQL |
| reasoning | Prove, derive, calculate, logic, math, equations |
| creative | Stories, poems, blog posts, marketing copy |
| translation | Translate, French, Spanish, German, Hindi, etc. |
| summarisation | Summarise, TL;DR, shorten, key points |
| factual | What is, who is, define, explain, how does |
| cheap | Fallback for low-complexity prompts |
| strong | Fallback for high-complexity prompts |
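A trigger lookup along these lines can be sketched with substring matching. The keyword lists here are illustrative excerpts, not routeforge's actual tables:

```python
# Illustrative tag -> trigger-keyword map, in priority order.
TRIGGERS = {
    "code": ["python", "function", "script", "debug", "refactor", "fastapi", "sql"],
    "reasoning": ["prove", "derive", "calculate", "logic", "equation"],
    "translation": ["translate", "french", "spanish", "german", "hindi"],
    "summarisation": ["summarise", "tl;dr", "shorten", "key points"],
    "factual": ["what is", "who is", "define", "explain", "how does"],
}

def match_tag(prompt: str) -> str:
    lowered = prompt.lower()
    for tag, keywords in TRIGGERS.items():
        if any(keyword in lowered for keyword in keywords):
            return tag
    return "general"  # no trigger hit; the complexity gate decides cheap vs strong

print(match_tag("Translate this sentence into German"))    # translation
print(match_tag("Summarise the key points of this memo"))  # summarisation
```

A model tagged with the matched tag becomes a routing candidate; cheap and strong never match a prompt directly, they only break ties by complexity.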
MIT — built by NorthCommits