llmproxy

An OpenAI-compatible HTTP proxy that aggregates multiple LLM providers behind a single endpoint. Clients that speak the OpenAI API (LangChain, LiteLLM, Open WebUI, Cursor, etc.) connect to llmproxy without modification; llmproxy routes each request to the correct upstream based on a provider prefix embedded in the model name.


File overview

llmproxy/
├── run.py                   ← start the server (no install needed)
├── llmproxy_test_client.py  ← test client (standalone, no install needed)
├── llmproxy/                ← the package
│   ├── __main__.py
│   ├── config.py
│   ├── server.py
│   └── setup_wizard.py
├── requirements.txt
├── setup.py                 ← only needed for pip install
├── Dockerfile
├── docker-compose.yml
└── config.example.json

Model naming convention

All models exposed by llmproxy follow this pattern:

<provider_name>/<upstream_model_id>

The upstream_model_id may itself contain slashes. Examples:

| Proxy model string                       | Provider   | Upstream model               |
|------------------------------------------|------------|------------------------------|
| openrouter/openrouter/free               | openrouter | openrouter/free              |
| openrouter/anthropic/claude-3.5-sonnet   | openrouter | anthropic/claude-3.5-sonnet  |
| openai/gpt-4o                            | openai     | gpt-4o                       |
| ollama/llama3                            | ollama     | llama3                       |

The proxy strips the leading <provider_name>/ before forwarding the request to the upstream provider's base URL.
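
For illustration, the convention amounts to a single split on the first slash. A minimal sketch (not the proxy's actual code):

# Illustrative sketch of the naming split; not the actual server code.
def split_model(model: str) -> tuple[str, str]:
    # Partition on the FIRST slash only, so upstream model IDs that
    # themselves contain slashes survive intact.
    provider, sep, upstream = model.partition("/")
    if not sep or not upstream:
        raise ValueError(f"model '{model}' has no provider prefix")
    return provider, upstream

assert split_model("openrouter/anthropic/claude-3.5-sonnet") == \
    ("openrouter", "anthropic/claude-3.5-sonnet")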


Configuration

Config is stored at ~/.config/llmproxy/config.json (or the path in $LLMPROXY_CONFIG, or the --config flag).

Schema

{
  "providers": {
    "<name>": {
      "base_url": "https://...",
      "api_key": "sk-...",
      "model_filter": ["model-a", "model-b"]
    }
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "log_level": "INFO",
    "request_timeout": 120,
    "stream_timeout": 300
  }
}

model_filter is a list of upstream model IDs to allow (without the provider prefix). Set it to null or omit it to permit all models from that provider.
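
A hedged sketch of the filter semantics (illustrative, not the server's code): a null or missing model_filter admits every model, otherwise only the listed IDs pass.

# Illustrative sketch of model_filter semantics.
def model_allowed(upstream_id, model_filter):
    if model_filter is None:        # null or omitted: allow everything
        return True
    return upstream_id in model_filter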

See config.example.json for a complete annotated example.


Quick start — local, no install

This is the recommended path for local use. You only need flask and requests; no pip install . or pip install -e . is required.

1. Install dependencies

pip install flask requests

gunicorn is optional. If installed, the server uses it automatically for better concurrency; otherwise it falls back to the Flask development server, which is fine for local use.

pip install gunicorn   # optional
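
The fallback presumably boils down to an import check. A sketch, assuming a Flask app object named app (illustrative; the real entry point may differ):

# Illustrative sketch of the gunicorn-or-dev-server fallback.
try:
    from gunicorn.app.base import BaseApplication

    class GunicornServer(BaseApplication):
        def __init__(self, app, options):
            self.application, self.options = app, options
            super().__init__()

        def load_config(self):
            for key, value in self.options.items():
                self.cfg.set(key, value)

        def load(self):
            return self.application

    GunicornServer(app, {"bind": "0.0.0.0:8080",
                         "worker_class": "gthread"}).run()
except ImportError:
    app.run(host="0.0.0.0", port=8080)  # Flask development server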

2. Configure providers

Run the interactive setup wizard. It creates ~/.config/llmproxy/config.json and prompts you for each provider's name, base URL, API key, and optional model filter.

python run.py --setup

You can re-run --setup at any time to add, edit, or remove providers.

3. Start the server

python run.py

The server binds to 0.0.0.0:8080 by default. Override host or port without editing the config:

python run.py --port 9000 --log-level DEBUG

run.py resolves its own location via os.path.abspath(__file__), so it works correctly regardless of which directory you invoke it from:

python /path/to/llmproxy/run.py --setup
python /path/to/llmproxy/run.py
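
A plausible shape for run.py under that description (a sketch; the real file may differ):

# Illustrative sketch of run.py's path handling.
import os
import sys

# Make the sibling 'llmproxy' package importable no matter which
# directory the script is invoked from.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from llmproxy.__main__ import main  # hypothetical entry point

if __name__ == "__main__":
    main()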

4. Reconfigure at any time

python run.py --setup

The server hot-reloads config on each request (modification-time cache), so provider changes take effect immediately without a restart. Only host or port changes require a restart.
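
A minimal sketch of such an mtime cache (illustrative, not the actual config.py):

import json
import os

_cache = {"mtime": None, "config": None}

def load_config(path):
    # Reload only when the file's modification time has changed.
    mtime = os.path.getmtime(path)
    if mtime != _cache["mtime"]:
        with open(path) as f:
            _cache["config"] = json.load(f)
        _cache["mtime"] = mtime
    return _cache["config"]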


Test client

llmproxy_test_client.py is a standalone script with no dependencies beyond requests. It connects to a running llmproxy instance and exercises all endpoints, printing a pass/fail/skip report.

Basic usage

# Run all test suites against the default localhost:8080
python llmproxy_test_client.py

# Target a different host or port
python llmproxy_test_client.py --base-url http://localhost:9000/v1

# Force a specific model for chat/embedding/streaming tests
python llmproxy_test_client.py --model openrouter/openrouter/free

# Run only the structural tests (no live LLM calls required)
python llmproxy_test_client.py --suite health --suite errors

# Skip streaming (useful in environments that buffer SSE)
python llmproxy_test_client.py --no-stream

# Include OpenAI SDK compatibility test (requires: pip install openai)
python llmproxy_test_client.py --use-sdk

Test suites

| Suite      | What it checks                                                   | Needs provider? |
|------------|------------------------------------------------------------------|-----------------|
| health     | GET /health returns 200 and lists active providers               | No              |
| errors     | Missing model field, bad prefix, unknown provider, non-JSON body | No              |
| models     | GET /v1/models aggregates all providers; naming convention       | Yes             |
| chat       | Non-streaming chat completion; checks response content           | Yes             |
| streaming  | Streaming SSE chat; prints tokens live as they arrive            | Yes             |
| embeddings | Embedding request; accepts graceful 400/404 if unsupported       | Yes             |
| sdk        | Same chat + stream tests via the openai Python package           | Yes             |

When no --model flag is given, the client auto-selects a model from the proxy's /v1/models list, preferring names that suggest a free or small model (free, mini, flash, haiku, small, 8b, etc.).
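
A sketch of what that heuristic might look like (illustrative; the client's actual logic may differ):

CHEAP_HINTS = ("free", "mini", "flash", "haiku", "small", "8b")

def pick_model(model_ids):
    # Prefer the first model whose name hints at a free/small tier;
    # otherwise fall back to the first model listed.
    for model_id in model_ids:
        if any(hint in model_id.lower() for hint in CHEAP_HINTS):
            return model_id
    return model_ids[0] if model_ids else None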

Example output (no providers configured)

llmproxy test client
Target: http://localhost:8080/v1
───────────────────────────────────────────────────────

══ Health Check ══
  ✓ GET /health returns 200  providers=[]
  No providers configured yet. Run: python run.py --setup

══ Error Handling ══
  ✓ Missing 'model' field → 400
  ✓ Non-prefixed model string → 400
  ✓ Unknown provider → 404
  ✓ Non-JSON body → 400
  ✓ GET /health JSON schema contains 'status'

───────────────────────────────────────────────────────
Results:  6 passed  0 failed  1 skipped  / 7 total

Installation via pip (optional)

If you prefer a system-wide llmproxy command, install the package:

pip install -e .        # editable install (recommended for development)
# or
pip install .

After installation, run.py is no longer needed; use the llmproxy command directly:

llmproxy --setup
llmproxy
llmproxy --port 9000 --log-level DEBUG
llmproxy --list-providers
llmproxy --version

Docker

Build the image

docker build -t llmproxy .

First-time setup

The configuration lives in a named Docker volume (llmproxy_config) mounted at /root/.config/llmproxy inside the container. You never need to map host filesystem paths into the container.

# Interactive setup wizard (creates/updates the config volume)
docker run -it --rm \
  -v llmproxy_config:/root/.config/llmproxy \
  llmproxy --setup

Start the server

docker run -d \
  -p 8080:8080 \
  -v llmproxy_config:/root/.config/llmproxy \
  --name llmproxy \
  llmproxy

Reconfigure without stopping the server

# Run setup in a temporary container sharing the same volume
docker run -it --rm \
  -v llmproxy_config:/root/.config/llmproxy \
  llmproxy --setup

# Restart only if host or port changed; otherwise hot-reload handles it
docker restart llmproxy

docker-compose

# Build and start the server (detached)
docker-compose up -d

# First-time setup or reconfigure (interactive)
docker-compose run --rm setup

# Restart to apply host/port changes
docker-compose restart llmproxy

# View logs
docker-compose logs -f llmproxy

# Tear down containers (config volume is preserved)
docker-compose down

# Tear down everything including the config volume
docker-compose down -v

The setup service shares the llmproxy_config named volume with the server service. It is declared with profiles: [setup] so it is never started by a plain docker-compose up.


API endpoints

All endpoints mirror the OpenAI API.

| Method | Path                  | Description                                     |
|--------|-----------------------|-------------------------------------------------|
| GET    | /health               | Health check; returns provider list             |
| GET    | /v1/models            | Aggregate model list from all providers         |
| GET    | /v1/models/<model_id> | Single model lookup                             |
| POST   | /v1/chat/completions  | Chat completions (streaming supported)          |
| POST   | /v1/completions       | Legacy text completions                         |
| POST   | /v1/embeddings        | Embeddings                                      |
| *      | /v1/<anything>        | Pass-through to upstream (see note below)       |

For pass-through endpoints not listed above (e.g., /v1/audio/transcriptions), the proxy routes based on the model field in the request body. For GET/DELETE requests without a model field, append ?provider=<name> to the URL.
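
For example, a GET pass-through addressed to a provider named openai might look like this (hypothetical endpoint, for illustration):

curl "http://localhost:8080/v1/files?provider=openai"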


Client configuration examples

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-used",           # llmproxy uses the upstream key from config
)

response = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
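
Streaming works through the same client: pass stream=True and iterate the chunks (standard OpenAI SDK usage):

stream = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)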

opencode

Add the following to ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",

  "plugin": [
    "opencode-lmstudio"
  ],

  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llmproxy",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "apiKey": "sk-local"
      }
    }
  }
}

The opencode-lmstudio plugin provides the @ai-sdk/openai-compatible adapter. The apiKey value is not used by llmproxy but is required by the adapter; any non-empty string works.

pi.dev

Install the pi-openai-compat plugin and point it at http://localhost:8080. No API key is required.

curl

# List all available models
curl http://localhost:8080/v1/models | jq '.data[].id'

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/openrouter/free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

CLI reference

All flags apply equally to python run.py and the installed llmproxy command.

usage: run.py [--setup] [--config PATH] [--host HOST] [--port PORT]
              [--log-level LEVEL] [--list-providers] [--version]

  (no flags)           Start the proxy server.
  --setup              Interactive configuration wizard.
  --config PATH        Override config file location.
  --host HOST          Bind host (overrides config).
  --port PORT          Bind port (overrides config).
  --log-level LEVEL    DEBUG | INFO | WARNING | ERROR.
  --list-providers     Print configured providers and exit.
  --version            Print version and exit.

Environment variables

| Variable        | Purpose                                |
|-----------------|----------------------------------------|
| LLMPROXY_CONFIG | Override the default config file path. |
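
For example (hypothetical path):

LLMPROXY_CONFIG=/etc/llmproxy/config.json python run.py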

Architecture notes

  • The server is a thin Flask application backed by gunicorn (gthread workers) when gunicorn is installed, falling back to the Flask development server.
  • /v1/models queries all providers concurrently via ThreadPoolExecutor. A single unreachable provider is logged as a warning and omitted from the aggregate response rather than causing an overall failure (see the sketch after this list).
  • Config is hot-reloaded on each request via an mtime cache; provider changes take effect without a server restart. Only host and port changes require one.
  • Streaming responses are relayed as raw SSE byte streams via stream_with_context, preserving upstream chunk boundaries.
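
A minimal sketch of the /v1/models fan-out pattern referenced above, assuming a hypothetical fetch_models(name, cfg) helper that queries one provider (illustrative, not the actual server code):

from concurrent.futures import ThreadPoolExecutor, as_completed
import logging

def aggregate_models(providers, fetch_models):
    # Query every provider concurrently; a failing provider is logged
    # and omitted rather than failing the whole response.
    data = []
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(fetch_models, name, cfg): name
                   for name, cfg in providers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                for model_id in future.result():
                    data.append({"id": f"{name}/{model_id}",
                                 "object": "model"})
            except Exception as exc:
                logging.warning("provider %s unreachable: %s", name, exc)
    return {"object": "list", "data": data}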
