llmproxy

An OpenAI-compatible HTTP proxy that aggregates multiple LLM providers behind a single endpoint. Clients that speak the OpenAI API (LangChain, LiteLLM, Open WebUI, Cursor, etc.) connect to llmproxy without modification; llmproxy routes each request to the correct upstream based on a provider prefix embedded in the model name.


File overview

llmproxy/
├── run.py                   ← start the server (no install needed)
├── llmproxy_test_client.py  ← test client (standalone, no install needed)
├── llmproxy/                ← the package
│   ├── __main__.py
│   ├── config.py
│   ├── server.py
│   └── setup_wizard.py
├── requirements.txt
├── setup.py                 ← only needed for pip install
├── Dockerfile
├── docker-compose.yml
└── config.example.json

Model naming convention

All models exposed by llmproxy follow this pattern:

<provider_name>/<upstream_model_id>

The upstream_model_id may itself contain slashes. Examples:

| Proxy model string                       | Provider   | Upstream model               |
|------------------------------------------|------------|------------------------------|
| openrouter/openrouter/free               | openrouter | openrouter/free              |
| openrouter/anthropic/claude-3.5-sonnet   | openrouter | anthropic/claude-3.5-sonnet  |
| openai/gpt-4o                            | openai     | gpt-4o                       |
| ollama/llama3                            | ollama     | llama3                       |

The proxy strips the leading <provider_name>/ before forwarding the request to the upstream provider's base URL.
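
For illustration, the convention amounts to a single split on the first slash. A minimal sketch (not the proxy's actual code):

# Illustrative sketch of the naming split; not the actual server code.
def split_model(model: str) -> tuple[str, str]:
    # Partition on the FIRST slash only, so upstream model IDs that
    # themselves contain slashes survive intact.
    provider, sep, upstream = model.partition("/")
    if not sep or not upstream:
        raise ValueError(f"model '{model}' has no provider prefix")
    return provider, upstream

assert split_model("openrouter/anthropic/claude-3.5-sonnet") == \
    ("openrouter", "anthropic/claude-3.5-sonnet")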


Configuration

Config is stored at ~/.config/llmproxy/config.json (or the path in $LLMPROXY_CONFIG, or the --config flag).

Schema

{
  "providers": {
    "<name>": {
      "base_url": "https://...",
      "api_key": "sk-...",
      "model_filter": ["model-a", "model-b"]
    }
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "log_level": "INFO",
    "request_timeout": 120,
    "stream_timeout": 300
  }
}

model_filter is a list of upstream model IDs to allow (without the provider prefix). Set it to null or omit it to permit all models from that provider.
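
A hedged sketch of the filter semantics (illustrative, not the server's code): a null or missing model_filter admits every model, otherwise only the listed IDs pass.

# Illustrative sketch of model_filter semantics.
def model_allowed(upstream_id, model_filter):
    if model_filter is None:        # null or omitted: allow everything
        return True
    return upstream_id in model_filter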

See config.example.json for a complete annotated example.


Quick start — local, no install

This is the recommended path for local use. You only need flask and requests; no pip install . or pip install -e . is required.

1. Install dependencies

pip install flask requests

gunicorn is optional. If installed, the server uses it automatically for better concurrency; otherwise it falls back to the Flask development server, which is fine for local use.

pip install gunicorn   # optional
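
The fallback presumably boils down to an import check. A sketch, assuming a Flask app object named app (illustrative; the real entry point may differ):

# Illustrative sketch of the gunicorn-or-dev-server fallback.
try:
    from gunicorn.app.base import BaseApplication

    class GunicornServer(BaseApplication):
        def __init__(self, app, options):
            self.application, self.options = app, options
            super().__init__()

        def load_config(self):
            for key, value in self.options.items():
                self.cfg.set(key, value)

        def load(self):
            return self.application

    GunicornServer(app, {"bind": "0.0.0.0:8080",
                         "worker_class": "gthread"}).run()
except ImportError:
    app.run(host="0.0.0.0", port=8080)  # Flask development server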

2. Configure providers

Run the interactive setup wizard. It creates ~/.config/llmproxy/config.json and prompts you for each provider's name, base URL, API key, and optional model filter.

python run.py --setup

You can re-run --setup at any time to add, edit, or remove providers.

3. Start the server

python run.py

The server binds to 0.0.0.0:8080 by default. Override host or port without editing the config:

python run.py --port 9000 --log-level DEBUG

run.py resolves its own location via os.path.abspath(__file__), so it works correctly regardless of which directory you invoke it from:

python /path/to/llmproxy/run.py --setup
python /path/to/llmproxy/run.py
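
A plausible shape for run.py under that description (a sketch; the real file may differ):

# Illustrative sketch of run.py's path handling.
import os
import sys

# Make the sibling 'llmproxy' package importable no matter which
# directory the script is invoked from.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from llmproxy.__main__ import main  # hypothetical entry point

if __name__ == "__main__":
    main()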

4. Reconfigure at any time

python run.py --setup

The server hot-reloads config on each request (modification-time cache), so provider changes take effect immediately without a restart. Only host or port changes require a restart.
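
A minimal sketch of such an mtime cache (illustrative, not the actual config.py):

import json
import os

_cache = {"mtime": None, "config": None}

def load_config(path):
    # Reload only when the file's modification time has changed.
    mtime = os.path.getmtime(path)
    if mtime != _cache["mtime"]:
        with open(path) as f:
            _cache["config"] = json.load(f)
        _cache["mtime"] = mtime
    return _cache["config"]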


Test client

llmproxy_test_client.py is a standalone script with no dependencies beyond requests. It connects to a running llmproxy instance and exercises all endpoints, printing a pass/fail/skip report.

Basic usage

# Run all test suites against the default localhost:8080
python llmproxy_test_client.py

# Target a different host or port
python llmproxy_test_client.py --base-url http://localhost:9000/v1

# Force a specific model for chat/embedding/streaming tests
python llmproxy_test_client.py --model openrouter/openrouter/free

# Run only the structural tests (no live LLM calls required)
python llmproxy_test_client.py --suite health --suite errors

# Skip streaming (useful in environments that buffer SSE)
python llmproxy_test_client.py --no-stream

# Include OpenAI SDK compatibility test (requires: pip install openai)
python llmproxy_test_client.py --use-sdk

Test suites

| Suite      | What it checks                                                   | Needs provider? |
|------------|------------------------------------------------------------------|-----------------|
| health     | GET /health returns 200 and lists active providers               | No              |
| errors     | Missing model field, bad prefix, unknown provider, non-JSON body | No              |
| models     | GET /v1/models aggregates all providers; naming convention       | Yes             |
| chat       | Non-streaming chat completion; checks response content           | Yes             |
| streaming  | Streaming SSE chat; prints tokens live as they arrive            | Yes             |
| embeddings | Embedding request; accepts graceful 400/404 if unsupported       | Yes             |
| sdk        | Same chat + stream tests via the openai Python package           | Yes             |

When no --model flag is given, the client auto-selects a model from the proxy's /v1/models list, preferring names that suggest a free or small model (free, mini, flash, haiku, small, 8b, etc.).
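
A sketch of what that heuristic might look like (illustrative; the client's actual logic may differ):

CHEAP_HINTS = ("free", "mini", "flash", "haiku", "small", "8b")

def pick_model(model_ids):
    # Prefer the first model whose name hints at a free/small tier;
    # otherwise fall back to the first model listed.
    for model_id in model_ids:
        if any(hint in model_id.lower() for hint in CHEAP_HINTS):
            return model_id
    return model_ids[0] if model_ids else None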

Example output (no providers configured)

llmproxy test client
Target: http://localhost:8080/v1
───────────────────────────────────────────────────────

══ Health Check ══
  ✓ GET /health returns 200  providers=[]
  No providers configured yet. Run: python run.py --setup

══ Error Handling ══
  ✓ Missing 'model' field → 400
  ✓ Non-prefixed model string → 400
  ✓ Unknown provider → 404
  ✓ Non-JSON body → 400
  ✓ GET /health JSON schema contains 'status'

───────────────────────────────────────────────────────
Results:  6 passed  0 failed  1 skipped  / 7 total

Installation via pip (optional)

If you prefer a system-wide llmproxy command, install the package:

pip install -e .        # editable install (recommended for development)
# or
pip install .

After installation, run.py is no longer needed; use the llmproxy command directly:

llmproxy --setup
llmproxy
llmproxy --port 9000 --log-level DEBUG
llmproxy --list-providers
llmproxy --version

Docker

Build the image

docker build -t llmproxy .

First-time setup

The configuration lives in a named Docker volume (llmproxy_config) mounted at /root/.config/llmproxy inside the container. You never need to map host filesystem paths into the container.

# Interactive setup wizard (creates/updates the config volume)
docker run -it --rm \
  -v llmproxy_config:/root/.config/llmproxy \
  llmproxy --setup

Start the server

docker run -d \
  -p 8080:8080 \
  -v llmproxy_config:/root/.config/llmproxy \
  --name llmproxy \
  llmproxy

Reconfigure without stopping the server

# Run setup in a temporary container sharing the same volume
docker run -it --rm \
  -v llmproxy_config:/root/.config/llmproxy \
  llmproxy --setup

# Restart only if host or port changed; otherwise hot-reload handles it
docker restart llmproxy

docker-compose

# Build and start the server (detached)
docker-compose up -d

# First-time setup or reconfigure (interactive)
docker-compose run --rm setup

# Restart to apply host/port changes
docker-compose restart llmproxy

# View logs
docker-compose logs -f llmproxy

# Tear down containers (config volume is preserved)
docker-compose down

# Tear down everything including the config volume
docker-compose down -v

The setup service shares the llmproxy_config named volume with the server service. It is declared with profiles: [setup] so it is never started by a plain docker-compose up.


API endpoints

All endpoints mirror the OpenAI API.

| Method | Path                  | Description                                     |
|--------|-----------------------|-------------------------------------------------|
| GET    | /health               | Health check; returns provider list             |
| GET    | /v1/models            | Aggregate model list from all providers         |
| GET    | /v1/models/<model_id> | Single model lookup                             |
| POST   | /v1/chat/completions  | Chat completions (streaming supported)          |
| POST   | /v1/completions       | Legacy text completions                         |
| POST   | /v1/embeddings        | Embeddings                                      |
| *      | /v1/<anything>        | Pass-through to upstream (see note below)       |

For pass-through endpoints not listed above (e.g., /v1/audio/transcriptions), the proxy routes based on the model field in the request body. For GET/DELETE requests without a model field, append ?provider=<name> to the URL.
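
For example, a GET pass-through addressed to a provider named openai might look like this (hypothetical endpoint, for illustration):

curl "http://localhost:8080/v1/files?provider=openai"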


Client configuration examples

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-used",           # llmproxy uses the upstream key from config
)

response = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
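
Streaming works through the same client: pass stream=True and iterate the chunks (standard OpenAI SDK usage):

stream = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)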

opencode

Add the following to ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",

  "plugin": [
    "opencode-lmstudio"
  ],

  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llmproxy",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "apiKey": "sk-local"
      }
    }
  }
}

The opencode-lmstudio plugin provides the @ai-sdk/openai-compatible adapter. The apiKey value is not used by llmproxy but is required by the adapter; any non-empty string works.

pi.dev

Install the pi-openai-compat plugin and point it at http://localhost:8080. No API key is required.

curl

# List all available models
curl http://localhost:8080/v1/models | jq '.data[].id'

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/openrouter/free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

CLI reference

All flags apply equally to python run.py and the installed llmproxy command.

usage: run.py [--setup] [--config PATH] [--host HOST] [--port PORT]
              [--log-level LEVEL] [--list-providers] [--version]

  (no flags)           Start the proxy server.
  --setup              Interactive configuration wizard.
  --config PATH        Override config file location.
  --host HOST          Bind host (overrides config).
  --port PORT          Bind port (overrides config).
  --log-level LEVEL    DEBUG | INFO | WARNING | ERROR.
  --list-providers     Print configured providers and exit.
  --version            Print version and exit.

Environment variables

| Variable        | Purpose                                |
|-----------------|----------------------------------------|
| LLMPROXY_CONFIG | Override the default config file path. |
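
For example (hypothetical path):

LLMPROXY_CONFIG=/etc/llmproxy/config.json python run.py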

Architecture notes

  • The server is a thin Flask application backed by gunicorn (gthread workers) when gunicorn is installed, falling back to the Flask development server.
  • /v1/models queries all providers concurrently via ThreadPoolExecutor. A single unreachable provider is logged as a warning and omitted from the aggregate response rather than causing an overall failure (see the sketch after this list).
  • Config is hot-reloaded on each request via an mtime cache; provider changes take effect without a server restart. Only host and port changes require one.
  • Streaming responses are relayed as raw SSE byte streams via stream_with_context, preserving upstream chunk boundaries.
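
A minimal sketch of the /v1/models fan-out pattern referenced above, assuming a hypothetical fetch_models(name, cfg) helper that queries one provider (illustrative, not the actual server code):

from concurrent.futures import ThreadPoolExecutor, as_completed
import logging

def aggregate_models(providers, fetch_models):
    # Query every provider concurrently; a failing provider is logged
    # and omitted rather than failing the whole response.
    data = []
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(fetch_models, name, cfg): name
                   for name, cfg in providers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                for model_id in future.result():
                    data.append({"id": f"{name}/{model_id}",
                                 "object": "model"})
            except Exception as exc:
                logging.warning("provider %s unreachable: %s", name, exc)
    return {"object": "list", "data": data}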
