LLM Gateway for cost-optimized routing across local (Ollama), OpenAI, and Anthropic models.
BSGateway sits in front of LiteLLM Proxy and intercepts every chat completion request via async_pre_call_hook. Based on request complexity, it routes to the cheapest capable model:
    Client Request
           |
           v
    [LiteLLM Proxy] --> [BSGateway Hook]
                               |
                     +---------+---------+
                     |         |         |
                  simple    medium    complex
                     |         |         |
    local/glm-4.7-flash   gpt-5-mini   claude-opus
Four routing methods:
- Passthrough - known model names go directly (auto-derived from `model_list`)
- Alias - shorthand names resolve to specific models (`auto` -> complexity routing)
- Pattern match - glob patterns auto-route matching models (`claude-*` catches any Claude Code model)
- Auto-route - classifier scores complexity 0-100, maps to tier
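
The order in which the four methods apply can be sketched in a few lines of Python. This is a minimal illustration, not the code in `hook.py`: the `resolve` function, the `fast` alias, and the handling of unmatched names are assumptions made for the example, and the three constants stand in for values that would normally come from `gateway.yaml`.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Hypothetical stand-ins for values loaded from gateway.yaml.
MODEL_LIST = {"local/glm-4.7-flash", "gpt-5-mini", "claude-opus"}
ALIASES = {"auto": None, "fast": "local/glm-4.7-flash"}  # None -> complexity routing
AUTO_ROUTE_PATTERNS = ["claude-*"]


def resolve(model: str) -> Tuple[str, Optional[str]]:
    """Return (method, target). target is None when the classifier must pick a tier."""
    # 1. Passthrough: names defined in model_list go straight through.
    if model in MODEL_LIST:
        return ("passthrough", model)
    # 2. Alias: either a fixed target or a hand-off to complexity routing.
    if model in ALIASES:
        target = ALIASES[model]
        return ("alias", target) if target else ("auto-route", None)
    # 3. Pattern match: fnmatch-style globs send the request to auto-routing.
    if any(fnmatchcase(model, pattern) for pattern in AUTO_ROUTE_PATTERNS):
        return ("auto-route", None)
    # Assumption: the source does not say what happens to other names,
    # so this sketch just reports them as unmatched.
    return ("unmatched", model)
```

Checking passthrough before patterns is what lets `claude-opus` go directly to its model even though it also matches `claude-*`.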
| Strategy | How | When |
|---|---|---|
| `static` | Weighted keyword/token/structure heuristics | Fast, no external dependency |
| `llm` | Local Ollama classifies in ~1 word, falls back to static | Default, best accuracy |
| `ml` | sklearn model (stub, trained from collected data) | Future |
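
To make the `static` strategy and the `llm` fallback concrete, here is a rough sketch. The keywords, weights, and tier thresholds are invented for illustration, and the `llm` argument is a hypothetical callable standing in for the Ollama call; the real classifiers in `classifiers/` may differ in every detail.

```python
from typing import Callable, Optional

# Hypothetical keywords, weights, and thresholds (not taken from the project).
COMPLEX_KEYWORDS = ("architecture", "refactor", "prove", "optimize")
TIERS = [(30, "simple"), (70, "medium"), (100, "complex")]  # (upper bound, tier)
FENCE = "`" * 3  # marker for a fenced code block


def static_score(text: str) -> int:
    """Combine keyword, token, and structure signals into a 0-100 score."""
    lowered = text.lower()
    keyword_hits = sum(1 for kw in COMPLEX_KEYWORDS if kw in lowered)
    approx_tokens = len(text.split())      # crude whitespace token count
    code_blocks = text.count(FENCE) // 2   # pairs of fence markers
    score = 25 * keyword_hits + min(approx_tokens // 10, 40) + 15 * code_blocks
    return max(0, min(100, score))


def tier_for(score: int) -> str:
    """Map a 0-100 score to the first tier whose upper bound covers it."""
    for upper, tier in TIERS:
        if score <= upper:
            return tier
    return TIERS[-1][1]


def classify(text: str, llm: Optional[Callable[[str], str]] = None) -> str:
    """Use the LLM's one-word answer if it is valid, otherwise fall back to static."""
    if llm is not None:
        try:
            answer = llm(text).strip().lower()
            if answer in {"simple", "medium", "complex"}:
                return answer
        except Exception:
            pass  # e.g. Ollama unreachable: fall through to the heuristic
    return tier_for(static_score(text))
```

Passing the LLM call in as a plain function keeps the fallback path testable without a running Ollama instance.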
cp .env.example .env
# Fill in API keys
docker compose up

The gateway starts on http://localhost:4000.
Single file: `gateway.yaml`
- Add a model: add to `model_list` - routing auto-recognizes it as passthrough
- Add an alias: add to `routing.aliases`
- Change classifier: set `routing.classifier.strategy` to `static`, `llm`, or `ml`
# Use via OpenAI-compatible API
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "hello"}]}'
# "auto" -> complexity-based routing
# "claude-sonnet-4-6" -> matched by "claude-*" pattern, auto-routed
# "claude-opus" -> passthrough (defined in model_list)

- Add a pattern: add to `routing.auto_route_patterns` (fnmatch glob syntax)
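
The same request can be built from Python with only the standard library. This is a sketch under the assumption that the gateway runs at the default address and that `LITELLM_MASTER_KEY` is set in the environment; `build_chat_request` and `send` are hypothetical helper names, not part of BSGateway.

```python
import json
import os
import urllib.request


def build_chat_request(model: str, content: str,
                       base_url: str = "http://localhost:4000") -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the gateway."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('LITELLM_MASTER_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply (gateway must be running)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("auto", "hello"))` lets the gateway choose the tier; passing a name from `model_list` instead would skip classification.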
Every auto-routed request is logged to PostgreSQL (routing_logs table) for ML training:
- Original text + system prompt (for classification validation)
- Numeric features (token count, code blocks, conversation turns, etc.)
- Classification labels (tier, strategy, score)
- Optional embedding vector (via Ollama `qwen3-embedding`)
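
As a rough picture of what one logged record might carry, the sketch below derives the numeric features from a message list. The field names are hypothetical (the actual columns are defined in `schema.sql`), the token count is a whitespace approximation, and message contents are assumed to be plain strings.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List

FENCE = "`" * 3  # marker for a fenced code block


@dataclass
class RoutingLogRow:
    # Hypothetical field names; not the real routing_logs columns.
    user_text: str
    system_prompt: str
    token_count: int
    code_block_count: int
    conversation_turns: int
    tier: str
    strategy: str
    score: int


def build_row(messages: List[Dict[str, str]], tier: str,
              strategy: str, score: int) -> RoutingLogRow:
    """Collect text, numeric features, and classification labels for one request."""
    system = " ".join(m["content"] for m in messages if m["role"] == "system")
    user_msgs = [m["content"] for m in messages if m["role"] == "user"]
    text = "\n".join(user_msgs)
    return RoutingLogRow(
        user_text=text,
        system_prompt=system,
        token_count=len(text.split()),           # whitespace approximation
        code_block_count=text.count(FENCE) // 2,  # pairs of fence markers
        conversation_turns=len(user_msgs),
        tier=tier,
        strategy=strategy,
        score=score,
    )
```

`asdict(row)` turns the record into a plain dictionary, which is a convenient shape for passing values to a parameterized insert.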
    bsgateway/
      core/
        config.py         # pydantic-settings (env vars)
        logging.py        # structlog JSON config
      routing/
        hook.py           # LiteLLM callback + config loader + BSGatewayRouter
        models.py         # Dataclasses (TierConfig, RoutingDecision, etc.)
        collector.py      # PostgreSQL logger (asyncpg)
        classifiers/
          base.py         # Protocol + text extraction utils
          static.py       # Weighted heuristic classifier
          llm.py          # Ollama-based classifier
          ml.py           # sklearn stub
        sql/
          schema.sql      # PostgreSQL DDL
          queries.sql     # Named queries (-- name: pattern)
      tests/              # pytest-asyncio, 64 tests
    gateway.yaml          # Unified config (LiteLLM + routing)
# Install
pip install -e ".[dev]"
# Test
pytest bsgateway/tests/ -v
# Coverage
pytest bsgateway/tests/ --cov=bsgateway --cov-fail-under=80
# Lint
ruff check bsgateway/