NorthCommits/llmrouter
routeforge

Intelligent LLM routing for Python. Stop hardcoding a single model. routeforge automatically sends each prompt to the right model based on task type, complexity, and cost — with full observability logging.

```shell
pip install routeforge
```

```python
from routeforge import LLMRouter

router = LLMRouter.from_yaml("config.yaml")
response = router.route("Write a FastAPI endpoint that streams SSE events")

print(response.content)
print(response.meta.model)               # gpt-4o
print(response.meta.routing_layer)       # task_classifier
print(response.meta.estimated_cost_usd)  # 0.000312
```

Why routeforge?

Different prompts need different models. A simple translation doesn't need GPT-4o. A multi-step reasoning problem shouldn't go to a 7B model. routeforge makes that decision automatically — saving cost without sacrificing quality.

| Without routeforge | With routeforge |
| --- | --- |
| Every prompt hits your most expensive model | Simple prompts go to cheap models automatically |
| No visibility into cost or latency | Every run logged with tokens, cost, latency |
| Locked to one provider | OpenAI, Anthropic, OpenRouter, HuggingFace, Ollama |
| Manual load balancing across API keys | Round-robin built in |

How routing works

Every prompt passes through four layers in order:

```
Prompt
  │
  ▼
Layer 1 — Task classifier
  Detects task type via keyword/regex: code, reasoning, creative,
  translation, summarisation, factual. Routes to models tagged
  with that task. Picks cheap vs strong based on complexity.
  │
  ▼ (no task match)
Layer 2 — Complexity gate
  Scores prompt 0.0–1.0 using length, sentence depth, and
  pattern signals. Unambiguous scores route directly to
  cheap (<0.35) or strong (>0.65) tagged models.
  │
  ▼ (ambiguous score 0.35–0.65)
Layer 3 — Meta-router
  Sends the prompt to a cheap LLM (e.g. gpt-4o-mini) with a
  structured prompt asking it to pick the best model from your
  config. Returns JSON: model, task_type, reason.
  │
  ▼ (fallback)
Layer 4 — Default model
  Uses default_model from config. Always succeeds.
```
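The cascade above can be sketched in a few lines. This is an illustrative stand-in, not routeforge's actual internals: `classify_task` and `score_complexity` are toy versions of layers 1 and 2, and only the 0.35/0.65 cut-offs come from the diagram.

```python
from typing import Optional

def classify_task(prompt: str) -> Optional[str]:
    """Layer 1 stand-in: crude keyword lookup instead of the real classifier."""
    keywords = {"code": ["def ", "endpoint", "function"],
                "reasoning": ["prove", "derive"]}
    lowered = prompt.lower()
    for task, words in keywords.items():
        if any(w in lowered for w in words):
            return task
    return None

def score_complexity(prompt: str) -> float:
    """Layer 2 stand-in: toy proxy using length only (the real gate uses more signals)."""
    return min(len(prompt) / 400, 1.0)

def route(prompt: str) -> str:
    task = classify_task(prompt)
    if task is not None:
        return f"layer1:{task}"      # task classifier matched
    score = score_complexity(prompt)
    if score < 0.35:
        return "layer2:cheap"        # unambiguously simple
    if score > 0.65:
        return "layer2:strong"       # unambiguously hard
    return "layer3:meta_router"      # ambiguous, so ask a cheap LLM to decide
    # layer 4 (default_model) would catch any failure in the layers above

print(route("Prove that sqrt(2) is irrational"))  # layer1:reasoning
```

The point of the cascade is that each layer only fires when the one above it could not make a confident call, so cheap deterministic checks handle most traffic before any extra LLM call is made.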

Installation

```shell
pip install routeforge
```

Requires Python 3.10+.


Quick start

1. Create a config file

```yaml
# config.yaml
default_model: gpt-4o-mini
log_path: runs.json
complexity_threshold: 0.45

models:
  - name: gpt-4o-mini
    model_id: gpt-4o-mini
    provider: openai
    api_keys:
      - sk-your-key
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006
    tags: [cheap, general, summarisation, translation]

  - name: gpt-4o
    model_id: gpt-4o
    provider: openai
    api_keys:
      - sk-your-key
    cost_per_1k_input: 0.0025
    cost_per_1k_output: 0.01
    tags: [strong, reasoning, code, creative]
```

2. Route prompts

```python
from routeforge import LLMRouter

router = LLMRouter.from_yaml("config.yaml")

response = router.route("Summarise this paragraph in one sentence: ...")
print(response.meta.model)           # gpt-4o-mini  (cheap, summarisation tag)
print(response.meta.routing_layer)   # task_classifier

response = router.route("Prove that sqrt(2) is irrational")
print(response.meta.model)           # gpt-4o  (strong, reasoning tag)
print(response.meta.routing_layer)   # task_classifier
```

3. Or configure inline

```python
from routeforge import LLMRouter

router = LLMRouter.from_dict({
    "default_model": "mini",
    "log_path": "runs.json",
    "models": [
        {
            "name": "mini",
            "model_id": "gpt-4o-mini",
            "provider": "openai",
            "api_keys": ["sk-your-key"],
            "cost_per_1k_input": 0.00015,
            "cost_per_1k_output": 0.0006,
            "tags": ["cheap", "general"],
        }
    ],
})
```

Supported providers

| Provider | Value | Notes |
| --- | --- | --- |
| OpenAI | `openai` | GPT-4o, GPT-4o-mini, o1, etc. |
| Anthropic | `anthropic` | Claude Sonnet, Haiku, Opus |
| OpenRouter | `openrouter` | 100+ models via one API key |
| HuggingFace | `huggingface` | Inference API, `/v1/chat/completions` |
| Ollama | `ollama` | Local models, no API key needed |

OpenRouter example

```yaml
- name: deepseek-r1
  model_id: deepseek/deepseek-r1
  provider: openrouter
  api_keys:
    - sk-or-your-openrouter-key
  cost_per_1k_input: 0.0008
  cost_per_1k_output: 0.0032
  tags: [strong, reasoning]
```

Ollama (local) example

```yaml
- name: llama3
  model_id: llama3.2
  provider: ollama
  base_url: http://localhost:11434
  api_keys: []
  cost_per_1k_input: 0.0
  cost_per_1k_output: 0.0
  tags: [cheap, general]
```

Load balancing across API keys

Add multiple keys to any model — routeforge round-robins across them automatically:

```yaml
- name: gpt-4o-mini
  model_id: gpt-4o-mini
  provider: openai
  api_keys:
    - sk-key-one
    - sk-key-two
    - sk-key-three
  tags: [cheap]
```
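Round-robin simply cycles through the key list, one key per request, wrapping back to the first after the last. A minimal sketch of that behaviour, assuming routeforge advances on each call (the `KeyPool` name here is illustrative, not part of the library's API):

```python
from itertools import cycle

class KeyPool:
    """Illustrative round-robin selector over a model's api_keys list."""

    def __init__(self, api_keys: list[str]):
        self._keys = cycle(api_keys)  # endless iterator over the keys

    def next_key(self) -> str:
        return next(self._keys)

pool = KeyPool(["sk-key-one", "sk-key-two", "sk-key-three"])
print([pool.next_key() for _ in range(4)])
# ['sk-key-one', 'sk-key-two', 'sk-key-three', 'sk-key-one']
```

Spreading requests evenly like this keeps any single key under its per-key rate limit.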

Observability

Every router.route() call is logged to a JSON file with full metadata:

```python
for run in router.logs(last_n=5):
    print(run)
```

Each entry looks like:

```json
{
  "timestamp": "2026-03-28T10:42:01.123Z",
  "prompt_preview": "Write a FastAPI endpoint that streams SSE...",
  "model": "gpt-4o",
  "provider": "openai",
  "routing_layer": "task_classifier",
  "routing_reason": "Task detected as 'code'; selected by complexity (0.61)",
  "task_type": "code",
  "complexity_score": 0.61,
  "input_tokens": 48,
  "output_tokens": 312,
  "latency_ms": 1842.5,
  "estimated_cost_usd": 0.003240
}
```
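The `estimated_cost_usd` field follows from the per-1k prices in your config. A sketch of the arithmetic (the formula is assumed, but it reproduces the logged value above from the quick-start gpt-4o prices):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  per_1k_in: float, per_1k_out: float) -> float:
    """Cost in USD: tokens scaled to thousands, times the per-1k price."""
    return (input_tokens / 1000) * per_1k_in + (output_tokens / 1000) * per_1k_out

# gpt-4o in the quick-start config: 0.0025 in / 0.01 out per 1k tokens
cost = estimate_cost(48, 312, 0.0025, 0.01)
print(round(cost, 6))  # 0.00324, matching the log entry above
```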

RouteResponse reference

```python
response = router.route("your prompt")

response.content                      # str — model's reply
response.meta.model                   # str — model alias used
response.meta.provider                # str — provider name
response.meta.routing_layer           # "task_classifier" | "complexity" | "meta_router" | "default"
response.meta.routing_reason          # str — human-readable explanation
response.meta.task_type               # "code" | "reasoning" | "creative" | "translation" | "summarisation" | "factual" | "general"
response.meta.complexity_score        # float 0.0–1.0
response.meta.input_tokens            # int
response.meta.output_tokens           # int
response.meta.latency_ms              # float
response.meta.estimated_cost_usd      # float
```

Config reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `default_model` | str | (none) | Alias of fallback model |
| `log_path` | str | `runs.json` | Path to JSON log file |
| `complexity_threshold` | float | 0.5 | Below = cheap model, above = strong |
| `meta_router_model` | str | cheapest tagged model | Model used for meta-routing |
| `models` | list | (none) | List of model entries |

Model entry fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | str | Yes | Alias used in routing and logs |
| `model_id` | str | Yes | Provider's model string |
| `provider` | str | Yes | One of `openai`, `anthropic`, `openrouter`, `huggingface`, `ollama` |
| `api_keys` | list[str] | Yes | One or more API keys |
| `base_url` | str | No | Override endpoint (OpenRouter, Ollama, custom) |
| `cost_per_1k_input` | float | No | USD per 1000 input tokens |
| `cost_per_1k_output` | float | No | USD per 1000 output tokens |
| `context_window` | int | No | Model context window size |
| `tags` | list[str] | No | Used for routing: `cheap`, `strong`, `code`, `reasoning`, etc. |

Task tags

Use these tags on your models to enable task-aware routing:

| Tag | Triggers on |
| --- | --- |
| `code` | Python, functions, scripts, debug, refactor, FastAPI, SQL |
| `reasoning` | Prove, derive, calculate, logic, math, equations |
| `creative` | Stories, poems, blog posts, marketing copy |
| `translation` | Translate, French, Spanish, German, Hindi, etc. |
| `summarisation` | Summarise, TL;DR, shorten, key points |
| `factual` | What is, who is, define, explain, how does |
| `cheap` | Fallback for low-complexity prompts |
| `strong` | Fallback for high-complexity prompts |
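Keyword/regex matching of this kind can be sketched as a lookup table of patterns tried in order. The patterns below only approximate the trigger lists above; the actual tables routeforge ships are not shown in this README.

```python
import re

# Illustrative patterns approximating the trigger lists in the table above.
TASK_PATTERNS = {
    "code": r"\b(python|function|script|debug|refactor|fastapi|sql)\b",
    "reasoning": r"\b(prove|derive|calculate|logic|math|equations?)\b",
    "translation": r"\b(translate|french|spanish|german|hindi)\b",
    "summarisation": r"\b(summari[sz]e|tl;dr|shorten|key points)\b",
}

def detect_task(prompt: str) -> str:
    """Return the first task whose pattern matches, else 'general'."""
    lowered = prompt.lower()
    for task, pattern in TASK_PATTERNS.items():
        if re.search(pattern, lowered):
            return task
    return "general"  # no match: later routing layers decide

print(detect_task("Translate this sentence into German"))  # translation
```

A prompt tagged `general` this way falls through to the complexity gate, which is why `cheap` and `strong` exist as fallback tags rather than task tags.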

License

MIT — built by NorthCommits
