LLM Proxy

A lightweight reverse proxy for LLM APIs. It routes requests to different upstream providers, rewrites model names and auth tokens, and translates between API formats — all driven by a single YAML config file.

Features

  • Proxy mode: forward requests to any LLM API, rewriting the model name and auth token transparently.
  • Translation mode: convert between API formats on the fly (e.g., use Claude from an OpenAI-compatible client, or use GPT-4o from Claude Code).
  • Streaming support: SSE responses are streamed chunk-by-chunk, without buffering the full response.
  • Config-driven routing: add a new provider or model alias by editing the config file — no code changes needed.
  • Kubernetes-ready: includes a Dockerfile for container deployment.

Supported API Dialects

| Dialect ID | API | Used by |
|---|---|---|
| openai-chat | Chat Completions (/v1/chat/completions) | OpenAI SDK, most OpenAI-compatible clients |
| openai-responses | Responses API (/v1/responses) | Codex CLI (2025) |
| claude | Messages API (/v1/messages) | Claude Code, Anthropic SDK |
| bedrock | AWS Bedrock InvokeModel | AWS SDK, Bedrock clients |

Quick Start

1. Install

git clone <repo>
cd python-llm-proxy
pip install .

2. Configure

Copy the example config and fill in your credentials:

cp config/config.example.yaml config/config.yaml

3. Run

llm-proxy --config config/config.yaml

The server starts on port 8080 by default. A /healthz endpoint returns 200 OK for health checks.

Configuration

server:
  port: 8080

models:
  - frontend_model_name: <client-facing model name>
    frontend_base_url: <route prefix>           # must start with "/" (e.g., "/openai")
    frontend_dialect: ""                        # set for translation (see below)
    backend_model_name: <real upstream model name>
    backend_base_url: <upstream base URL>
    backend_api_key: ""                         # leave empty to pass client token through
    backend_auth_header: Authorization          # header used to send the token upstream
    backend_auth_schema: Bearer                 # token prefix; empty = raw token
    backend_dialect: ""                         # set for translation (see below)
    system_replacements: {}                     # old→new text substitutions on system prompt
    system_append: ""                           # text appended to system prompt
    inject_progress: false                      # inject session progress into system prompt

Proxy Mode (no translation)

Forward requests as-is, only rewriting the model name and auth token:

models:
  - frontend_model_name: gpt-4-proxy
    frontend_base_url: /openai
    backend_model_name: gpt-4
    backend_base_url: https://api.openai.com
    backend_api_key: "sk-..."
    backend_auth_header: Authorization
    backend_auth_schema: Bearer

A client points at http://localhost:8080/openai/v1/chat/completions and sends model gpt-4-proxy; the proxy forwards the request to https://api.openai.com/v1/chat/completions with model gpt-4.
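The rewrite described above amounts to stripping the route prefix and swapping the model name. A minimal sketch, with names chosen for illustration:

```python
def rewrite(path: str, body: dict, model_cfg: dict) -> tuple[str, dict]:
    """Map an incoming request to its upstream form (illustrative sketch)."""
    prefix = model_cfg["frontend_base_url"]      # e.g. "/openai"
    upstream_path = path[len(prefix):]           # "/v1/chat/completions"
    upstream_url = model_cfg["backend_base_url"] + upstream_path
    # Swap the client-facing model name for the real upstream one.
    upstream_body = {**body, "model": model_cfg["backend_model_name"]}
    return upstream_url, upstream_body

cfg = {
    "frontend_base_url": "/openai",
    "backend_base_url": "https://api.openai.com",
    "backend_model_name": "gpt-4",
}
url, body = rewrite("/openai/v1/chat/completions",
                    {"model": "gpt-4-proxy", "messages": []}, cfg)
# url  -> "https://api.openai.com/v1/chat/completions"
# body -> {"model": "gpt-4", "messages": []}
```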

Translation Mode

Set both frontend_dialect and backend_dialect to enable API format translation:

models:
  # Use Claude from any OpenAI Chat client:
  - frontend_model_name: claude-via-openai
    frontend_base_url: /openai-compat
    frontend_dialect: openai-chat
    backend_model_name: claude-3-5-sonnet-20241022
    backend_base_url: https://api.anthropic.com
    backend_api_key: "sk-ant-..."
    backend_auth_header: x-api-key
    backend_auth_schema: ""
    backend_dialect: claude

  # Use GPT-4o from Claude Code:
  - frontend_model_name: gpt4-via-claude
    frontend_base_url: /claude-compat
    frontend_dialect: claude
    backend_model_name: gpt-4o
    backend_base_url: https://api.openai.com
    backend_api_key: "sk-..."
    backend_auth_header: Authorization
    backend_auth_schema: Bearer
    backend_dialect: openai-responses

Config Fields

| Field | Required | Description |
|---|---|---|
| frontend_model_name | Yes | Model name the client sends |
| frontend_base_url | Yes | URL path prefix for this group (e.g., /openai → routes /openai/*) |
| frontend_dialect | No | API format from the client. Required for translation. |
| backend_model_name | Yes | Actual model name sent to upstream |
| backend_base_url | Yes | Real upstream base URL |
| backend_api_key | No | If set, replaces the client's token; if empty, client token is forwarded |
| backend_auth_header | No | Header name for the auth token |
| backend_auth_schema | No | Token prefix (e.g., Bearer). Empty = raw token |
| backend_dialect | No | API format for the upstream. Required for translation. |
| system_replacements | No | Map of old→new text substitutions applied to the system prompt |
| system_append | No | Text appended to the system prompt after replacements |
| inject_progress | No | If true, injects a session progress summary into the system prompt |

How Translation Works

When both dialects are configured, the proxy:

  1. Decodes the incoming request into a dialect-neutral canonical format.
  2. Applies system prompt modifications (replacements, append, progress injection).
  3. Re-encodes it into the upstream dialect's format.
  4. Sends it to the upstream and decodes the response back to canonical.
  5. Re-encodes the response into the incoming dialect's format for the client.

For streaming, this happens event-by-event with no full-response buffering.
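The steps above can be sketched as a decode → modify → encode pipeline. The canonical shape and all function names below are assumptions made for illustration, not the proxy's actual internals:

```python
def decode_openai_chat(req: dict) -> dict:
    """Step 1: openai-chat request -> dialect-neutral canonical form."""
    system = "".join(m["content"] for m in req["messages"] if m["role"] == "system")
    turns = [m for m in req["messages"] if m["role"] != "system"]
    return {"model": req["model"], "system": system, "turns": turns}

def apply_system_mods(canon: dict, cfg: dict) -> dict:
    """Step 2: replacements, then append (progress injection omitted)."""
    system = canon["system"]
    for old, new in cfg.get("system_replacements", {}).items():
        system = system.replace(old, new)
    return {**canon, "system": system + cfg.get("system_append", "")}

def encode_claude(canon: dict, cfg: dict) -> dict:
    """Step 3: canonical form -> claude (Messages API) request."""
    return {
        "model": cfg["backend_model_name"],
        "system": canon["system"],
        "messages": canon["turns"],
        "max_tokens": 1024,  # the Messages API requires this field
    }

cfg = {"backend_model_name": "claude-3-5-sonnet-20241022",
       "system_replacements": {"terse": "brief"},
       "system_append": " Reply in English."}
canon = decode_openai_chat({
    "model": "claude-via-openai",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Hello"},
    ],
})
upstream = encode_claude(apply_system_mods(canon, cfg), cfg)
```

Steps 4 and 5 run the same pipeline in reverse on the upstream response.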

Routing

Routes are registered dynamically at startup based on the unique frontend_base_url values in the config. The URL structure is:

http://localhost:8080/<frontend_base_url>/<api-path>

For example, with frontend_base_url: /openai, configure your client's base URL as http://localhost:8080/openai.

Auth

| Scenario | Behavior |
|---|---|
| backend_api_key is set | All client auth headers are stripped; the configured token is sent upstream |
| backend_api_key is empty | All client headers (including auth) are forwarded unchanged |
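The two rows above can be sketched as a single header-rewriting function. This is an illustrative approximation: here only the common auth headers are stripped, and the real proxy may handle more.

```python
def upstream_headers(client_headers: dict, cfg: dict) -> dict:
    """Apply the auth rules in the table above (illustrative sketch)."""
    if not cfg.get("backend_api_key"):
        # Empty key: forward all client headers, auth included, unchanged.
        return dict(client_headers)
    # Key configured: drop the client's auth headers...
    headers = {k: v for k, v in client_headers.items()
               if k.lower() not in ("authorization", "x-api-key")}
    # ...and attach the configured token, honoring header name and schema.
    schema = cfg.get("backend_auth_schema", "Bearer")
    token = cfg["backend_api_key"]
    headers[cfg.get("backend_auth_header", "Authorization")] = (
        f"{schema} {token}" if schema else token)
    return headers
```

For example, with backend_auth_header: x-api-key and an empty backend_auth_schema (the Anthropic case above), the client's Authorization header is dropped and the raw key is sent as x-api-key.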

Development

pip install -e ".[dev]"    # install with dev dependencies
pytest                     # run all tests
llm-proxy --config config/config.yaml

See docs/ for architecture, development, and deployment documentation.

Deployment

A Dockerfile is provided. See docs/deploy/deployment.md for details.
