A lightweight reverse proxy for LLM APIs. It routes requests to different upstream providers, rewrites model names and auth tokens, and translates between API formats — all driven by a single YAML config file.
- Proxy mode: forward requests to any LLM API, rewriting the model name and auth token transparently.
- Translation mode: convert between API formats on the fly (e.g., use Claude from an OpenAI-compatible client, or use GPT-4o from Claude Code).
- Streaming support: SSE responses are streamed chunk-by-chunk with zero buffering.
- Config-driven routing: add a new provider or model alias by editing the config file — no code changes needed.
- Kubernetes-ready: includes a Dockerfile for container deployment.
| Dialect ID | API | Used by |
|---|---|---|
| `openai-chat` | Chat Completions (`/v1/chat/completions`) | OpenAI SDK, most OpenAI-compatible clients |
| `openai-responses` | Responses API (`/v1/responses`) | Codex CLI (2025) |
| `claude` | Messages API (`/v1/messages`) | Claude Code, Anthropic SDK |
| `bedrock` | AWS Bedrock `InvokeModel` | AWS SDK, Bedrock clients |
1. Install

   ```shell
   git clone <repo>
   cd python-llm-proxy
   pip install .
   ```

2. Configure

   Copy the example config and fill in your credentials:

   ```shell
   cp config/config.example.yaml config/config.yaml
   ```

3. Run

   ```shell
   llm-proxy --config config/config.yaml
   ```

The server starts on port 8080 by default. A `/healthz` endpoint returns `200 OK` for health checks.
```yaml
server:
  port: 8080

models:
  - frontend_model_name: <client-facing model name>
    frontend_base_url: <route prefix>    # must start with "/" (e.g., "/openai")
    frontend_dialect: ""                 # set for translation (see below)
    backend_model_name: <real upstream model name>
    backend_base_url: <upstream base URL>
    backend_api_key: ""                  # leave empty to pass the client token through
    backend_auth_header: Authorization   # header used to send the token upstream
    backend_auth_schema: Bearer          # token prefix; empty = raw token
    backend_dialect: ""                  # set for translation (see below)
    system_replacements: {}              # old→new text substitutions on the system prompt
    system_append: ""                    # text appended to the system prompt
    inject_progress: false               # inject session progress into the system prompt
```

Forward requests as-is, only rewriting the model name and auth token:
```yaml
models:
  - frontend_model_name: gpt-4-proxy
    frontend_base_url: /openai
    backend_model_name: gpt-4
    backend_base_url: https://api.openai.com
    backend_api_key: "sk-..."
    backend_auth_header: Authorization
    backend_auth_schema: Bearer
```

The client points at `http://localhost:8080/openai/v1/chat/completions` with model `gpt-4-proxy`. The proxy forwards to `https://api.openai.com/v1/chat/completions` with model `gpt-4`.
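In proxy mode the request body passes through with only the model name swapped. A minimal sketch of that rewrite (illustrative only; the function name and error handling are assumptions, not the proxy's actual internals):

```python
def rewrite_model(body: dict, frontend_name: str, backend_name: str) -> dict:
    """Swap the client-facing model name for the real upstream name.

    The rest of the request body is forwarded unchanged.
    """
    if body.get("model") != frontend_name:
        raise ValueError(f"unknown model: {body.get('model')!r}")
    return {**body, "model": backend_name}

# A gpt-4-proxy request becomes a gpt-4 request; messages are untouched.
out = rewrite_model(
    {"model": "gpt-4-proxy", "messages": [{"role": "user", "content": "Hi"}]},
    "gpt-4-proxy",
    "gpt-4",
)
```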
Set both `frontend_dialect` and `backend_dialect` to enable API format translation:
```yaml
models:
  # Use Claude from any OpenAI Chat client:
  - frontend_model_name: claude-via-openai
    frontend_base_url: /openai-compat
    frontend_dialect: openai-chat
    backend_model_name: claude-3-5-sonnet-20241022
    backend_base_url: https://api.anthropic.com
    backend_api_key: "sk-ant-..."
    backend_auth_header: x-api-key
    backend_auth_schema: ""
    backend_dialect: claude

  # Use GPT-4o from Claude Code:
  - frontend_model_name: gpt4-via-claude
    frontend_base_url: /claude-compat
    frontend_dialect: claude
    backend_model_name: gpt-4o
    backend_base_url: https://api.openai.com
    backend_api_key: "sk-..."
    backend_auth_header: Authorization
    backend_auth_schema: Bearer
    backend_dialect: openai-responses
```

| Field | Required | Description |
|---|---|---|
| `frontend_model_name` | Yes | Model name the client sends |
| `frontend_base_url` | Yes | URL path prefix for this group (e.g., `/openai` routes `/openai/*`) |
| `frontend_dialect` | No | API format spoken by the client; required for translation |
| `backend_model_name` | Yes | Actual model name sent to the upstream |
| `backend_base_url` | Yes | Real upstream base URL |
| `backend_api_key` | No | If set, replaces the client's token; if empty, the client token is forwarded |
| `backend_auth_header` | No | Header name for the auth token |
| `backend_auth_schema` | No | Token prefix (e.g., `Bearer`); empty = raw token |
| `backend_dialect` | No | API format of the upstream; required for translation |
| `system_replacements` | No | Map of old→new text substitutions applied to the system prompt |
| `system_append` | No | Text appended to the system prompt after replacements |
| `inject_progress` | No | If true, injects a session progress summary into the system prompt |
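The system prompt fields compose in a fixed order: replacements run first, then the append. A hypothetical fragment (the values here are purely illustrative):

```yaml
# Illustrative values only: rewrite one phrase, then append an instruction.
system_replacements:
  "You are a helpful assistant": "You are a concise coding assistant"
system_append: "Always answer in English."
```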
When both dialects are configured, the proxy:

1. Decodes the incoming request into a dialect-neutral canonical format.
2. Applies system prompt modifications (replacements, append, progress injection).
3. Re-encodes it into the upstream dialect's format.
4. Sends it to the upstream and decodes the response back to canonical form.
5. Re-encodes the response into the incoming dialect's format for the client.

For streaming, this happens event-by-event with no full-response buffering.
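The decode → modify → encode steps can be sketched as pure functions. This is a simplified illustration of one non-streaming request (`openai-chat` frontend, `claude` backend); the function names, canonical fields, and defaults are assumptions for the sketch, not the proxy's actual internals:

```python
def decode_openai_chat(req: dict) -> dict:
    """OpenAI Chat Completions request -> dialect-neutral canonical form."""
    system = "".join(m["content"] for m in req["messages"] if m["role"] == "system")
    messages = [m for m in req["messages"] if m["role"] != "system"]
    return {"system": system, "messages": messages,
            "max_tokens": req.get("max_tokens", 1024)}

def apply_system_mods(canon: dict, replacements: dict, append: str) -> dict:
    """Apply system_replacements first, then system_append."""
    system = canon["system"]
    for old, new in replacements.items():
        system = system.replace(old, new)
    if append:
        system = system + "\n" + append
    return {**canon, "system": system}

def encode_claude(canon: dict, model: str) -> dict:
    """Canonical form -> Anthropic Messages API request (system is top-level)."""
    return {"model": model, "system": canon["system"],
            "messages": canon["messages"], "max_tokens": canon["max_tokens"]}

req = {"model": "claude-via-openai",
       "messages": [{"role": "system", "content": "You are helpful."},
                    {"role": "user", "content": "Hi"}]}
canon = decode_openai_chat(req)
canon = apply_system_mods(canon, {"helpful": "concise"}, "Answer briefly.")
upstream = encode_claude(canon, "claude-3-5-sonnet-20241022")
```

Note the structural difference the translation has to bridge: OpenAI carries the system prompt as a message, while the Claude Messages API takes it as a top-level `system` field.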
Routes are registered dynamically at startup based on the unique `frontend_base_url` values in the config. The URL structure is:

```
http://localhost:8080/<frontend_base_url>/<api-path>
```

For example, with `frontend_base_url: /openai`, configure your client's base URL as `http://localhost:8080/openai`.
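Prefix-based route resolution can be sketched as follows (a hypothetical illustration; the data shapes and function name are assumptions, not the proxy's actual code):

```python
# Each entry pairs a route prefix with its upstream base URL.
MODELS = [
    {"frontend_base_url": "/openai", "backend_base_url": "https://api.openai.com"},
    {"frontend_base_url": "/claude-compat", "backend_base_url": "https://api.openai.com"},
]

def resolve(path: str):
    """Map an incoming request path to an upstream URL, or None if no
    configured prefix matches."""
    for m in MODELS:
        prefix = m["frontend_base_url"]
        if path.startswith(prefix + "/"):
            # Strip the route prefix; forward the remaining API path upstream.
            return m["backend_base_url"] + path[len(prefix):]
    return None
```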
| Scenario | Behavior |
|---|---|
| `backend_api_key` is set | All client auth headers are stripped; the configured token is sent upstream |
| `backend_api_key` is empty | All client headers (including auth) are forwarded unchanged |
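The two rows above can be sketched as a single header-rewriting function (illustrative only; the function name and the set of auth headers stripped are assumptions for the sketch):

```python
def build_upstream_headers(client_headers: dict, api_key: str,
                           auth_header: str = "Authorization",
                           auth_schema: str = "Bearer") -> dict:
    """Apply the auth rules: pass-through when no key is configured,
    otherwise strip client auth and inject the configured token."""
    if not api_key:
        # backend_api_key is empty: forward everything unchanged.
        return dict(client_headers)
    # backend_api_key is set: drop the client's auth headers.
    headers = {k: v for k, v in client_headers.items()
               if k.lower() not in ("authorization", "x-api-key")}
    # backend_auth_schema prefixes the token; empty schema sends it raw.
    headers[auth_header] = f"{auth_schema} {api_key}" if auth_schema else api_key
    return headers
```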
```shell
pip install -e ".[dev]"                 # install with dev dependencies
pytest                                  # run all tests
llm-proxy --config config/config.yaml
```

See docs/ for architecture, development, and deployment documentation.
A Dockerfile is provided. See docs/deploy/deployment.md for details.