A lightweight reverse proxy for LLM APIs. It routes requests to different upstream providers, rewrites model names and auth tokens, and translates between API formats — all driven by a single YAML config file.
- Proxy mode: forward requests to any LLM API, rewriting the model name and auth token transparently.
- Translation mode: convert between API formats on the fly (e.g., use Claude from an OpenAI-compatible client, or use GPT-4o from Claude Code).
- Streaming support: SSE responses are streamed chunk-by-chunk with zero buffering.
- Config-driven routing: add a new provider or model alias by editing the config file — no code changes needed.
- Kubernetes-ready: includes a Dockerfile for container deployment.
| Dialect ID | API | Used by |
|---|---|---|
| `openai-chat` | Chat Completions (`/v1/chat/completions`) | OpenAI SDK, most OpenAI-compatible clients |
| `openai-responses` | Responses API (`/v1/responses`) | Codex CLI (2025) |
| `claude` | Messages API (`/v1/messages`) | Claude Code, Anthropic SDK |
| `bedrock` | AWS Bedrock `InvokeModel` | AWS SDK, Bedrock clients |
1. Install

   ```shell
   git clone <repo>
   cd python-llm-proxy
   pip install .
   ```

2. Configure

   Copy the example config and fill in your credentials:

   ```shell
   cp config/config.example.yaml config/config.yaml
   ```

3. Run

   ```shell
   llm-proxy --config config/config.yaml
   ```

The server starts on port 8080 by default. A `/healthz` endpoint returns `200 OK` for health checks.
```yaml
server:
  port: 8080

models:
  - frontend_model_name: <client-facing model name>
    frontend_base_url: <route prefix>    # must start with "/" (e.g., "/openai")
    frontend_dialect: ""                 # set for translation (see below)
    backend_model_name: <real upstream model name>
    backend_base_url: <upstream base URL>
    backend_api_key: ""                  # leave empty to pass the client token through
    backend_auth_header: Authorization   # header used to send the token upstream
    backend_auth_schema: Bearer          # token prefix; empty = raw token
    backend_dialect: ""                  # set for translation (see below)
    system_replacements: {}              # old→new text substitutions on the system prompt
    system_append: ""                    # text appended to the system prompt
    inject_progress: false               # inject session progress into the system prompt
```

Forward requests as-is, only rewriting the model name and auth token:
```yaml
models:
  - frontend_model_name: gpt-4-proxy
    frontend_base_url: /openai
    backend_model_name: gpt-4
    backend_base_url: https://api.openai.com
    backend_api_key: "sk-..."
    backend_auth_header: Authorization
    backend_auth_schema: Bearer
```

The client points at `http://localhost:8080/openai/v1/chat/completions` with model `gpt-4-proxy`. The proxy forwards to `https://api.openai.com/v1/chat/completions` with model `gpt-4`.
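In proxy mode the request body passes through with only the model name swapped. A minimal sketch of that rewrite (illustrative only; the function name and error handling are assumptions, not the proxy's actual internals):

```python
def rewrite_model(body: dict, frontend_name: str, backend_name: str) -> dict:
    """Swap the client-facing model name for the real upstream name.

    The rest of the request body is forwarded unchanged.
    """
    if body.get("model") != frontend_name:
        raise ValueError(f"unknown model: {body.get('model')!r}")
    return {**body, "model": backend_name}

# A gpt-4-proxy request becomes a gpt-4 request; messages are untouched.
out = rewrite_model(
    {"model": "gpt-4-proxy", "messages": [{"role": "user", "content": "Hi"}]},
    "gpt-4-proxy",
    "gpt-4",
)
```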
Set both `frontend_dialect` and `backend_dialect` to enable API format translation:
```yaml
models:
  # Use Claude from any OpenAI Chat client:
  - frontend_model_name: claude-via-openai
    frontend_base_url: /openai-compat
    frontend_dialect: openai-chat
    backend_model_name: claude-3-5-sonnet-20241022
    backend_base_url: https://api.anthropic.com
    backend_api_key: "sk-ant-..."
    backend_auth_header: x-api-key
    backend_auth_schema: ""
    backend_dialect: claude

  # Use GPT-4o from Claude Code:
  - frontend_model_name: gpt4-via-claude
    frontend_base_url: /claude-compat
    frontend_dialect: claude
    backend_model_name: gpt-4o
    backend_base_url: https://api.openai.com
    backend_api_key: "sk-..."
    backend_auth_header: Authorization
    backend_auth_schema: Bearer
    backend_dialect: openai-responses
```

| Field | Required | Description |
|---|---|---|
| `frontend_model_name` | Yes | Model name the client sends |
| `frontend_base_url` | Yes | URL path prefix for this group (e.g., `/openai` routes `/openai/*`) |
| `frontend_dialect` | No | API format spoken by the client; required for translation |
| `backend_model_name` | Yes | Actual model name sent to the upstream |
| `backend_base_url` | Yes | Real upstream base URL |
| `backend_api_key` | No | If set, replaces the client's token; if empty, the client token is forwarded |
| `backend_auth_header` | No | Header name for the auth token |
| `backend_auth_schema` | No | Token prefix (e.g., `Bearer`); empty = raw token |
| `backend_dialect` | No | API format of the upstream; required for translation |
| `system_replacements` | No | Map of old→new text substitutions applied to the system prompt |
| `system_append` | No | Text appended to the system prompt after replacements |
| `inject_progress` | No | If true, injects a session progress summary into the system prompt |
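The system prompt fields compose in a fixed order: replacements run first, then the append. A hypothetical fragment (the values here are purely illustrative):

```yaml
# Illustrative values only: rewrite one phrase, then append an instruction.
system_replacements:
  "You are a helpful assistant": "You are a concise coding assistant"
system_append: "Always answer in English."
```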
When both dialects are configured, the proxy:

1. Decodes the incoming request into a dialect-neutral canonical format.
2. Applies system prompt modifications (replacements, append, progress injection).
3. Re-encodes it into the upstream dialect's format.
4. Sends it to the upstream and decodes the response back to canonical form.
5. Re-encodes the response into the incoming dialect's format for the client.

For streaming, this happens event-by-event with no full-response buffering.
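The decode → modify → encode steps can be sketched as pure functions. This is a simplified illustration of one non-streaming request (`openai-chat` frontend, `claude` backend); the function names, canonical fields, and defaults are assumptions for the sketch, not the proxy's actual internals:

```python
def decode_openai_chat(req: dict) -> dict:
    """OpenAI Chat Completions request -> dialect-neutral canonical form."""
    system = "".join(m["content"] for m in req["messages"] if m["role"] == "system")
    messages = [m for m in req["messages"] if m["role"] != "system"]
    return {"system": system, "messages": messages,
            "max_tokens": req.get("max_tokens", 1024)}

def apply_system_mods(canon: dict, replacements: dict, append: str) -> dict:
    """Apply system_replacements first, then system_append."""
    system = canon["system"]
    for old, new in replacements.items():
        system = system.replace(old, new)
    if append:
        system = system + "\n" + append
    return {**canon, "system": system}

def encode_claude(canon: dict, model: str) -> dict:
    """Canonical form -> Anthropic Messages API request (system is top-level)."""
    return {"model": model, "system": canon["system"],
            "messages": canon["messages"], "max_tokens": canon["max_tokens"]}

req = {"model": "claude-via-openai",
       "messages": [{"role": "system", "content": "You are helpful."},
                    {"role": "user", "content": "Hi"}]}
canon = decode_openai_chat(req)
canon = apply_system_mods(canon, {"helpful": "concise"}, "Answer briefly.")
upstream = encode_claude(canon, "claude-3-5-sonnet-20241022")
```

Note the structural difference the translation has to bridge: OpenAI carries the system prompt as a message, while the Claude Messages API takes it as a top-level `system` field.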
Routes are registered dynamically at startup based on the unique `frontend_base_url` values in the config. The URL structure is:

```
http://localhost:8080/<frontend_base_url>/<api-path>
```

For example, with `frontend_base_url: /openai`, configure your client's base URL as `http://localhost:8080/openai`.
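Prefix-based route resolution can be sketched as follows (a hypothetical illustration; the data shapes and function name are assumptions, not the proxy's actual code):

```python
# Each entry pairs a route prefix with its upstream base URL.
MODELS = [
    {"frontend_base_url": "/openai", "backend_base_url": "https://api.openai.com"},
    {"frontend_base_url": "/claude-compat", "backend_base_url": "https://api.openai.com"},
]

def resolve(path: str):
    """Map an incoming request path to an upstream URL, or None if no
    configured prefix matches."""
    for m in MODELS:
        prefix = m["frontend_base_url"]
        if path.startswith(prefix + "/"):
            # Strip the route prefix; forward the remaining API path upstream.
            return m["backend_base_url"] + path[len(prefix):]
    return None
```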
| Scenario | Behavior |
|---|---|
| `backend_api_key` is set | All client auth headers are stripped; the configured token is sent upstream |
| `backend_api_key` is empty | All client headers (including auth) are forwarded unchanged |
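The two rows above can be sketched as a single header-rewriting function (illustrative only; the function name and the set of auth headers stripped are assumptions for the sketch):

```python
def build_upstream_headers(client_headers: dict, api_key: str,
                           auth_header: str = "Authorization",
                           auth_schema: str = "Bearer") -> dict:
    """Apply the auth rules: pass-through when no key is configured,
    otherwise strip client auth and inject the configured token."""
    if not api_key:
        # backend_api_key is empty: forward everything unchanged.
        return dict(client_headers)
    # backend_api_key is set: drop the client's auth headers.
    headers = {k: v for k, v in client_headers.items()
               if k.lower() not in ("authorization", "x-api-key")}
    # backend_auth_schema prefixes the token; empty schema sends it raw.
    headers[auth_header] = f"{auth_schema} {api_key}" if auth_schema else api_key
    return headers
```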
```shell
pip install -e ".[dev]"                 # install with dev dependencies
pytest                                  # run all tests
llm-proxy --config config/config.yaml
```

See docs/ for architecture, development, and deployment documentation.
A Dockerfile is provided. See docs/deploy/deployment.md for details.