An OpenAI-compatible HTTP proxy that aggregates multiple LLM providers behind a single endpoint. Clients that speak the OpenAI API (LangChain, LiteLLM, Open WebUI, Cursor, etc.) connect to llmproxy without modification; llmproxy routes each request to the correct upstream based on a provider prefix embedded in the model name.
```
llmproxy/
├── run.py                    ← start the server (no install needed)
├── llmproxy_test_client.py   ← test client (standalone, no install needed)
├── llmproxy/                 ← the package
│   ├── __main__.py
│   ├── config.py
│   ├── server.py
│   └── setup_wizard.py
├── requirements.txt
├── setup.py                  ← only needed for pip install
├── Dockerfile
├── docker-compose.yml
└── config.example.json
```
All models exposed by llmproxy follow this pattern:
```
<provider_name>/<upstream_model_id>
```
The `upstream_model_id` may itself contain slashes. Examples:
| Proxy model string | Provider | Upstream model |
|---|---|---|
| openrouter/openrouter/free | openrouter | openrouter/free |
| openrouter/anthropic/claude-3.5-sonnet | openrouter | anthropic/claude-3.5-sonnet |
| openai/gpt-4o | openai | gpt-4o |
| ollama/llama3 | ollama | llama3 |
The proxy strips the leading <provider_name>/ before forwarding the request to
the upstream provider's base URL.
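The routing rule boils down to splitting the model string on its first slash only. A minimal sketch of that parsing step (the helper name is illustrative, not llmproxy's internal API):

```python
def split_model(proxy_model: str) -> tuple[str, str]:
    """Split 'provider/upstream_model_id' on the first slash only,
    so upstream IDs that themselves contain slashes stay intact."""
    provider, _, upstream_model = proxy_model.partition("/")
    if not upstream_model:
        raise ValueError(f"model '{proxy_model}' has no provider prefix")
    return provider, upstream_model

# "openrouter/anthropic/claude-3.5-sonnet" -> ("openrouter", "anthropic/claude-3.5-sonnet")
# "ollama/llama3"                          -> ("ollama", "llama3")
```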
Config is stored at ~/.config/llmproxy/config.json (or the path in
$LLMPROXY_CONFIG, or the --config flag).
```json
{
  "providers": {
    "<name>": {
      "base_url": "https://...",
      "api_key": "sk-...",
      "model_filter": ["model-a", "model-b"]
    }
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "log_level": "INFO",
    "request_timeout": 120,
    "stream_timeout": 300
  }
}
```

`model_filter` is a list of upstream model IDs to allow (without the provider prefix). Set it to `null` or omit it to permit all models from that provider.
See config.example.json for a complete annotated example.
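As a concrete illustration, a config with the two providers used in the table above might look like this (the base URLs, placeholder keys, and filter values here are assumptions for the example, not values shipped with llmproxy):

```json
{
  "providers": {
    "openrouter": {
      "base_url": "https://openrouter.ai/api/v1",
      "api_key": "sk-or-...",
      "model_filter": null
    },
    "ollama": {
      "base_url": "http://localhost:11434/v1",
      "api_key": "unused",
      "model_filter": ["llama3"]
    }
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "log_level": "INFO",
    "request_timeout": 120,
    "stream_timeout": 300
  }
}
```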
This is the recommended path for local use. You only need flask and
requests; no pip install . or pip install -e . is required.
```
pip install flask requests
```

gunicorn is optional. If installed, the server uses it automatically for better concurrency; otherwise it falls back to the Flask development server, which is fine for local use.

```
pip install gunicorn   # optional
```

Run the interactive setup wizard. It creates ~/.config/llmproxy/config.json and prompts you for each provider's name, base URL, API key, and optional model filter.
```
python run.py --setup
```

You can re-run --setup at any time to add, edit, or remove providers.
```
python run.py
```

The server binds to 0.0.0.0:8080 by default. Override host or port without editing the config:

```
python run.py --port 9000 --log-level DEBUG
```

run.py resolves its own location via os.path.abspath(__file__), so it works correctly regardless of which directory you invoke it from:
```
python /path/to/llmproxy/run.py --setup
python /path/to/llmproxy/run.py
```

The server hot-reloads config on each request (modification-time cache), so provider changes take effect immediately without a restart. Only host or port changes require a restart.
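A minimal sketch of how an mtime-based reload like this can work (illustrative only; llmproxy's actual config.py may differ):

```python
import json
import os

_cache = {"mtime": None, "config": None}

def load_config(path: str) -> dict:
    """Re-read the config file only when its modification time changes."""
    mtime = os.path.getmtime(path)
    if _cache["mtime"] != mtime:
        with open(path) as f:
            _cache["config"] = json.load(f)
        _cache["mtime"] = mtime
    return _cache["config"]
```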
llmproxy_test_client.py is a standalone script with no dependencies beyond
requests. It connects to a running llmproxy instance and exercises all
endpoints, printing a pass/fail/skip report.
```
# Run all test suites against the default localhost:8080
python llmproxy_test_client.py

# Target a different host or port
python llmproxy_test_client.py --base-url http://localhost:9000/v1

# Force a specific model for chat/embedding/streaming tests
python llmproxy_test_client.py --model openrouter/openrouter/free

# Run only the structural tests (no live LLM calls required)
python llmproxy_test_client.py --suite health --suite errors

# Skip streaming (useful in environments that buffer SSE)
python llmproxy_test_client.py --no-stream

# Include OpenAI SDK compatibility test (requires: pip install openai)
python llmproxy_test_client.py --use-sdk
```

| Suite | What it checks | Needs provider? |
|---|---|---|
| health | GET /health returns 200 and lists active providers | No |
| errors | Missing model field, bad prefix, unknown provider, non-JSON body | No |
| models | GET /v1/models aggregates all providers; naming convention | Yes |
| chat | Non-streaming chat completion; checks response content | Yes |
| streaming | Streaming SSE chat; prints tokens live as they arrive | Yes |
| embeddings | Embedding request; accepts graceful 400/404 if unsupported | Yes |
| sdk | Same chat + stream tests via the openai Python package | Yes |
When no --model flag is given, the client auto-selects a model from the
proxy's /v1/models list, preferring names that suggest a free or small model
(free, mini, flash, haiku, small, 8b, etc.).
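A rough sketch of that preference heuristic (the keyword list and helper name are illustrative, not the test client's exact code):

```python
PREFERRED_HINTS = ("free", "mini", "flash", "haiku", "small", "8b")

def pick_model(model_ids: list[str]) -> str | None:
    """Prefer a model whose ID hints at a free or small variant; otherwise take the first."""
    for model_id in model_ids:
        if any(hint in model_id.lower() for hint in PREFERRED_HINTS):
            return model_id
    return model_ids[0] if model_ids else None
```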
```
llmproxy test client
Target: http://localhost:8080/v1
───────────────────────────────────────────────────────
══ Health Check ══
✓ GET /health returns 200 providers=[]
  No providers configured yet. Run: python run.py --setup
══ Error Handling ══
✓ Missing 'model' field → 400
✓ Non-prefixed model string → 400
✓ Unknown provider → 404
✓ Non-JSON body → 400
✓ GET /health JSON schema contains 'status'
───────────────────────────────────────────────────────
Results: 6 passed  0 failed  1 skipped / 7 total
```
If you prefer a system-wide llmproxy command, install the package:
```
pip install -e .   # editable install (recommended for development)
# or
pip install .
```

After installation, run.py is no longer needed; use the llmproxy command directly:
```
llmproxy --setup
llmproxy
llmproxy --port 9000 --log-level DEBUG
llmproxy --list-providers
llmproxy --version
```

Build the Docker image:

```
docker build -t llmproxy .
```

The configuration lives in a named Docker volume (llmproxy_config) mounted at /root/.config/llmproxy inside the container. You never need to map host filesystem paths into the container.
```
# Interactive setup wizard (creates/updates the config volume)
docker run -it --rm \
  -v llmproxy_config:/root/.config/llmproxy \
  llmproxy --setup
```

```
docker run -d \
  -p 8080:8080 \
  -v llmproxy_config:/root/.config/llmproxy \
  --name llmproxy \
  llmproxy
```

```
# Run setup in a temporary container sharing the same volume
docker run -it --rm \
  -v llmproxy_config:/root/.config/llmproxy \
  llmproxy --setup

# Restart only if host or port changed; otherwise hot-reload handles it
docker restart llmproxy
```

```
# Build and start the server (detached)
docker-compose up -d

# First-time setup or reconfigure (interactive)
docker-compose run --rm setup

# Restart to apply host/port changes
docker-compose restart llmproxy

# View logs
docker-compose logs -f llmproxy

# Tear down containers (config volume is preserved)
docker-compose down

# Tear down everything including the config volume
docker-compose down -v
```

The setup service shares the llmproxy_config named volume with the server service. It is declared with profiles: [setup] so it is never started by a plain docker-compose up.
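For orientation, a docker-compose.yml matching that description might look roughly like this (a hedged sketch under the assumptions above, not the exact file shipped in the repo):

```yaml
services:
  llmproxy:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - llmproxy_config:/root/.config/llmproxy

  setup:
    build: .
    profiles: [setup]        # never started by a plain `docker-compose up`
    command: ["--setup"]     # runs the interactive wizard
    stdin_open: true
    tty: true
    volumes:
      - llmproxy_config:/root/.config/llmproxy

volumes:
  llmproxy_config:
```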
All endpoints mirror the OpenAI API.
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check; returns provider list |
| GET | `/v1/models` | Aggregate model list from all providers |
| GET | `/v1/models/<model_id>` | Single model lookup |
| POST | `/v1/chat/completions` | Chat completions (streaming supported) |
| POST | `/v1/completions` | Legacy text completions |
| POST | `/v1/embeddings` | Embeddings |
| * | `/v1/<anything>` | Pass-through to upstream (see note below) |
For pass-through endpoints not listed above (e.g., /v1/audio/transcriptions),
the proxy routes based on the model field in the request body. For
GET/DELETE requests without a model field, append ?provider=<name> to the URL.
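For instance, using the requests library (the provider name "openai" and the upstream endpoints here are assumptions for the example; they must exist in your config and be supported by that upstream):

```python
import requests

BASE = "http://localhost:8080"

# POST pass-through: routed by the "model" field in the JSON body
resp = requests.post(
    f"{BASE}/v1/audio/speech",
    json={"model": "openai/tts-1", "input": "Hello!", "voice": "alloy"},
)

# GET pass-through with no request body: route explicitly with ?provider=<name>
files = requests.get(f"{BASE}/v1/files", params={"provider": "openai"})
```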
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-used",  # llmproxy uses the upstream key from config
)

response = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
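Streaming works through the same client by passing stream=True; a brief sketch (reuses the client defined above and assumes the chosen provider supports streaming):

```python
stream = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in stream:
    # Each SSE chunk carries an incremental delta; content may be None on role/stop chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```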
Add the following to ~/.config/opencode/opencode.json:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "plugin": [
    "opencode-lmstudio"
  ],
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llmproxy",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "apiKey": "sk-local"
      }
    }
  }
}
```

The opencode-lmstudio plugin provides the @ai-sdk/openai-compatible adapter.
The apiKey value is not used by llmproxy but is required by the adapter; any
non-empty string works.
Install the pi-openai-compat
plugin and point it at http://localhost:8080. No API key is required.
```
# List all available models
curl http://localhost:8080/v1/models | jq '.data[].id'

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/openrouter/free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

All flags apply equally to python run.py and the installed llmproxy command.
```
usage: run.py [--setup] [--config PATH] [--host HOST] [--port PORT]
              [--log-level LEVEL] [--list-providers] [--version]

(no flags)           Start the proxy server.
--setup              Interactive configuration wizard.
--config PATH        Override config file location.
--host HOST          Bind host (overrides config).
--port PORT          Bind port (overrides config).
--log-level LEVEL    DEBUG | INFO | WARNING | ERROR.
--list-providers     Print configured providers and exit.
--version            Print version and exit.
```
| Variable | Purpose |
|---|---|
| LLMPROXY_CONFIG | Override the default config file path. |
- The server is a thin Flask application backed by gunicorn (gthread workers) when gunicorn is installed, falling back to the Flask development server.
- `/v1/models` queries all providers concurrently via `ThreadPoolExecutor`. A single unreachable provider is logged as a warning and omitted from the aggregate response rather than causing an overall failure (see the sketch below).
- Config is hot-reloaded on each request via an mtime cache; provider changes take effect without a server restart. Only `host` and `port` changes require one.
- Streaming responses are relayed as raw SSE byte streams via `stream_with_context`, preserving upstream chunk boundaries.
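A rough sketch of that concurrent aggregation (illustrative only; the helper names and error handling are assumptions, not llmproxy's internals):

```python
import logging
from concurrent.futures import ThreadPoolExecutor

import requests

log = logging.getLogger("llmproxy")

def fetch_provider_models(name: str, base_url: str, api_key: str) -> list[dict]:
    """Fetch one provider's /models list and re-prefix each ID with the provider name."""
    resp = requests.get(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return [{**m, "id": f"{name}/{m['id']}"} for m in resp.json().get("data", [])]

def aggregate_models(providers: dict) -> list[dict]:
    """Query all providers in parallel; log and skip any that fail."""
    models: list[dict] = []
    with ThreadPoolExecutor() as pool:
        futures = {
            pool.submit(fetch_provider_models, name, p["base_url"], p.get("api_key", "")): name
            for name, p in providers.items()
        }
        for future, name in futures.items():
            try:
                models.extend(future.result())
            except Exception as exc:
                log.warning("provider %s unreachable: %s", name, exc)
    return models
```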