Intelligent LLM routing for Python. Stop hardcoding a single model. routeforge automatically sends each prompt to the right model based on task type, complexity, and cost — with full observability logging.
```bash
pip install routeforge
```

```python
from routeforge import LLMRouter

router = LLMRouter.from_yaml("config.yaml")
response = router.route("Write a FastAPI endpoint that streams SSE events")

print(response.content)
print(response.meta.model)               # gpt-4o
print(response.meta.routing_layer)       # task_classifier
print(response.meta.estimated_cost_usd)  # 0.000312
```

Different prompts need different models. A simple translation doesn't need GPT-4o, and a multi-step reasoning problem shouldn't go to a 7B model. routeforge makes that decision automatically, saving cost without sacrificing quality.
| Without routeforge | With routeforge |
|---|---|
| Every prompt hits your most expensive model | Simple prompts go to cheap models automatically |
| No visibility into cost or latency | Every run logged with tokens, cost, latency |
| Locked to one provider | OpenAI, Anthropic, OpenRouter, HuggingFace, Ollama |
| Manual load balancing across API keys | Round-robin built in |
Every prompt passes through four layers, in order:

```
Prompt
  │
  ▼
Layer 1 — Task classifier
  Detects task type via keyword/regex: code, reasoning, creative,
  translation, summarisation, factual. Routes to models tagged
  with that task. Picks cheap vs strong based on complexity.
  │
  ▼  (no task match)
Layer 2 — Complexity gate
  Scores the prompt 0.0–1.0 using length, sentence depth, and
  pattern signals. Unambiguous scores route directly to models
  tagged cheap (< 0.35) or strong (> 0.65).
  │
  ▼  (ambiguous score 0.35–0.65)
Layer 3 — Meta-router
  Sends the prompt to a cheap LLM (e.g. gpt-4o-mini) with a
  structured prompt asking it to pick the best model from your
  config. Returns JSON: model, task_type, reason.
  │
  ▼  (fallback)
Layer 4 — Default model
  Uses default_model from the config. Always succeeds.
```
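The fallthrough between layers can be sketched in plain Python. Everything below is illustrative: the keyword table and scorer are toy stand-ins, not routeforge internals.

```python
import re
from typing import Optional

# Illustrative keyword table; routeforge's real trigger lists are broader.
TASK_PATTERNS = {
    "code": re.compile(r"\b(function|endpoint|debug|refactor|sql)\b", re.I),
    "reasoning": re.compile(r"\b(prove|derive|calculate|logic)\b", re.I),
}

def classify_task(prompt: str) -> Optional[str]:
    """Layer 1: keyword/regex task detection."""
    for task, pattern in TASK_PATTERNS.items():
        if pattern.search(prompt):
            return task
    return None

def complexity(prompt: str) -> float:
    """Layer 2: toy stand-in for the length/sentence-depth score."""
    return min(len(prompt.split()) / 100, 1.0)

def route(prompt: str, default_model: str = "gpt-4o-mini") -> tuple:
    """Fall through the four layers in order, returning (layer, decision)."""
    task = classify_task(prompt)
    if task is not None:
        return ("task_classifier", task)
    score = complexity(prompt)
    if score < 0.35:
        return ("complexity", "cheap")
    if score > 0.65:
        return ("complexity", "strong")
    # Ambiguous band (0.35-0.65): routeforge would ask the meta-router LLM
    # here; this sketch skips straight to Layer 4.
    return ("default", default_model)

print(route("Prove that sqrt(2) is irrational"))  # ('task_classifier', 'reasoning')
print(route("hello"))                             # ('complexity', 'cheap')
```

Each layer only runs if the one above it declined to decide, which is why Layer 4 can never fail.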
```bash
pip install routeforge
```

Requires Python 3.10+.
```yaml
# config.yaml
default_model: gpt-4o-mini
log_path: runs.json
complexity_threshold: 0.45

models:
  - name: gpt-4o-mini
    model_id: gpt-4o-mini
    provider: openai
    api_keys:
      - sk-your-key
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006
    tags: [cheap, general, summarisation, translation]

  - name: gpt-4o
    model_id: gpt-4o
    provider: openai
    api_keys:
      - sk-your-key
    cost_per_1k_input: 0.0025
    cost_per_1k_output: 0.01
    tags: [strong, reasoning, code, creative]
```

```python
from routeforge import LLMRouter

router = LLMRouter.from_yaml("config.yaml")

response = router.route("Summarise this paragraph in one sentence: ...")
print(response.meta.model)          # gpt-4o-mini (cheap, summarisation tag)
print(response.meta.routing_layer)  # task_classifier

response = router.route("Prove that sqrt(2) is irrational")
print(response.meta.model)          # gpt-4o (strong, reasoning tag)
print(response.meta.routing_layer)  # task_classifier
```

The same configuration can also be built inline:

```python
from routeforge import LLMRouter

router = LLMRouter.from_dict({
    "default_model": "mini",
    "log_path": "runs.json",
    "models": [
        {
            "name": "mini",
            "model_id": "gpt-4o-mini",
            "provider": "openai",
            "api_keys": ["sk-your-key"],
            "cost_per_1k_input": 0.00015,
            "cost_per_1k_output": 0.0006,
            "tags": ["cheap", "general"],
        }
    ],
})
```

| Provider | Value | Notes |
|---|---|---|
| OpenAI | openai | GPT-4o, GPT-4o-mini, o1, etc. |
| Anthropic | anthropic | Claude Sonnet, Haiku, Opus |
| OpenRouter | openrouter | 100+ models via one API key |
| HuggingFace | huggingface | Inference API, /v1/chat/completions |
| Ollama | ollama | Local models, no API key needed |
An OpenRouter model entry:

```yaml
- name: deepseek-r1
  model_id: deepseek/deepseek-r1
  provider: openrouter
  api_keys:
    - sk-or-your-openrouter-key
  cost_per_1k_input: 0.0008
  cost_per_1k_output: 0.0032
  tags: [strong, reasoning]
```

A local Ollama model entry (no API key, zero cost):

```yaml
- name: llama3
  model_id: llama3.2
  provider: ollama
  base_url: http://localhost:11434
  api_keys: []
  cost_per_1k_input: 0.0
  cost_per_1k_output: 0.0
  tags: [cheap, general]
```

Add multiple keys to any model; routeforge round-robins across them automatically:

```yaml
- name: gpt-4o-mini
  model_id: gpt-4o-mini
  provider: openai
  api_keys:
    - sk-key-one
    - sk-key-two
    - sk-key-three
  tags: [cheap]
```

Every router.route() call is logged to a JSON file with full metadata:
```python
for run in router.logs(last_n=5):
    print(run)
```

```json
{
  "timestamp": "2026-03-28T10:42:01.123Z",
  "prompt_preview": "Write a FastAPI endpoint that streams SSE...",
  "model": "gpt-4o",
  "provider": "openai",
  "routing_layer": "task_classifier",
  "routing_reason": "Task detected as 'code'; selected by complexity (0.61)",
  "task_type": "code",
  "complexity_score": 0.61,
  "input_tokens": 48,
  "output_tokens": 312,
  "latency_ms": 1842.5,
  "estimated_cost_usd": 0.003240
}
```

```python
response = router.route("your prompt")

response.content                  # str — model's reply
response.meta.model               # str — model alias used
response.meta.provider            # str — provider name
response.meta.routing_layer       # "task_classifier" | "complexity" | "meta_router" | "default"
response.meta.routing_reason      # str — human-readable explanation
response.meta.task_type           # "code" | "reasoning" | "creative" | "translation" | "summarisation" | "factual" | "general"
response.meta.complexity_score    # float 0.0–1.0
response.meta.input_tokens        # int
response.meta.output_tokens       # int
response.meta.latency_ms          # float
response.meta.estimated_cost_usd  # float
```

Top-level config fields:

| Field | Type | Default | Description |
|---|---|---|---|
| default_model | str | — | Alias of the fallback model |
| log_path | str | runs.json | Path to the JSON log file |
| complexity_threshold | float | 0.5 | Below = cheap model, above = strong |
| meta_router_model | str | cheapest tagged model | Model used for meta-routing |
| models | list | — | List of model entries |
Model entry fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Alias used in routing and logs |
| model_id | str | Yes | Provider's model string |
| provider | str | Yes | openai, anthropic, openrouter, huggingface, ollama |
| api_keys | list[str] | Yes | One or more API keys |
| base_url | str | No | Override endpoint (OpenRouter, Ollama, custom) |
| cost_per_1k_input | float | No | USD per 1,000 input tokens |
| cost_per_1k_output | float | No | USD per 1,000 output tokens |
| context_window | int | No | Model context window size |
| tags | list[str] | No | Used for routing: cheap, strong, code, reasoning, etc. |
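The two cost fields drive the estimated_cost_usd value seen in logs and response metadata. The arithmetic is plain per-1,000-token pricing; a sketch (the helper name is hypothetical), checked against the log example: 48 input and 312 output tokens on gpt-4o.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cost_per_1k_input: float, cost_per_1k_output: float) -> float:
    """Per-1k-token pricing: tokens/1000 times the per-1k USD rate."""
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

# gpt-4o rates from the example config: 0.0025 per 1k in, 0.01 per 1k out.
print(round(estimate_cost_usd(48, 312, 0.0025, 0.01), 6))  # 0.00324
```

That matches the 0.003240 shown in the log entry above. Models without cost fields (e.g. local Ollama) simply log a cost of 0.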
Use these tags on your models to enable task-aware routing:
| Tag | Triggers on |
|---|---|
| code | Python, functions, scripts, debug, refactor, FastAPI, SQL |
| reasoning | Prove, derive, calculate, logic, math, equations |
| creative | Stories, poems, blog posts, marketing copy |
| translation | Translate, French, Spanish, German, Hindi, etc. |
| summarisation | Summarise, TL;DR, shorten, key points |
| factual | What is, who is, define, explain, how does |
| cheap | Fallback for low-complexity prompts |
| strong | Fallback for high-complexity prompts |
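A trigger lookup along these lines can be sketched with substring matching. The keyword lists here are illustrative excerpts, not routeforge's actual tables:

```python
# Illustrative tag -> trigger-keyword map, in priority order.
TRIGGERS = {
    "code": ["python", "function", "script", "debug", "refactor", "fastapi", "sql"],
    "reasoning": ["prove", "derive", "calculate", "logic", "equation"],
    "translation": ["translate", "french", "spanish", "german", "hindi"],
    "summarisation": ["summarise", "tl;dr", "shorten", "key points"],
    "factual": ["what is", "who is", "define", "explain", "how does"],
}

def match_tag(prompt: str) -> str:
    lowered = prompt.lower()
    for tag, keywords in TRIGGERS.items():
        if any(keyword in lowered for keyword in keywords):
            return tag
    return "general"  # no trigger hit; the complexity gate decides cheap vs strong

print(match_tag("Translate this sentence into German"))    # translation
print(match_tag("Summarise the key points of this memo"))  # summarisation
```

A model tagged with the matched tag becomes a routing candidate; cheap and strong never match a prompt directly, they only break ties by complexity.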
MIT — built by NorthCommits