A Go library for proxying requests to upstream LLM providers, with a pluggable, composable architecture.
```bash
go get github.com/agentuity/llmproxy
```

```go
package main

import (
    "io"
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/interceptors"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    provider, err := openai.New("sk-your-key")
    if err != nil {
        log.Fatal(err)
    }
    proxy := llmproxy.NewProxy(provider,
        llmproxy.WithInterceptor(interceptors.NewLogging(nil)),
    )
    http.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
        // Forward using the request's own context so client cancellation propagates.
        resp, meta, err := proxy.Forward(r.Context(), r)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer resp.Body.Close()
        // Response metadata includes extracted token usage.
        _ = meta.Usage.PromptTokens
        _ = meta.Usage.CompletionTokens
        io.Copy(w, resp.Body)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Single endpoint that auto-detects provider and API type:
```go
package main

import (
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/anthropic"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    openaiProvider, _ := openai.New("sk-openai-key")
    anthropicProvider, _ := anthropic.New("sk-ant-key")

    router := llmproxy.NewAutoRouter(
        llmproxy.WithAutoRouterFallbackProvider(openaiProvider),
    )
    router.RegisterProvider(openaiProvider)
    router.RegisterProvider(anthropicProvider)

    // Single endpoint handles all providers and APIs.
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

POST to `/` with any model; the provider and API are auto-detected:
```bash
# Auto-detect OpenAI from the gpt-4 model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect Anthropic from the claude model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect the Responses API from the input field
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4o","input":"Hello"}'
```

Features:

- 9 Provider Implementations: OpenAI, Anthropic, Groq, Fireworks, x.AI, Google AI, AWS Bedrock, Azure OpenAI, and an OpenAI-compatible base
- AutoRouter: Single endpoint with automatic provider/API detection
- Responses API: Full support for OpenAI's Responses API (HTTP streaming and WebSocket mode)
- WebSocket Mode: Persistent connections for multi-turn Responses API workflows with per-turn billing
- SSE Streaming: Full streaming support with efficient token usage extraction
- 8 Built-in Interceptors: Logging, Metrics, Retry, Billing, Tracing (OTel), HeaderBan, AddHeader, PromptCaching
- Pricing Integration: models.dev adapter with markup support
- Prompt Caching: Prompt caching support for Anthropic, OpenAI, xAI, Fireworks, and Bedrock
- Raw Body Preservation: Custom JSON fields pass through unchanged (see the example below)
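To illustrate raw body preservation: fields the proxy does not recognize are forwarded to the upstream provider verbatim. The `x_experimental_tag` field here is invented for the example and is not part of any provider's API:

```bash
# x_experimental_tag is not a known field; the proxy forwards it unchanged
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","x_experimental_tag":"demo","messages":[{"role":"user","content":"Hello"}]}'
```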
The AutoRouter provides automatic routing from a single endpoint:
- Path-based - `/v1/messages` → Messages API, `/v1/responses` → Responses API
- Body + Provider - When the path is `/` or unknown: `input` field → Responses API, `prompt` field → Completions API, `contents` field → GenerateContent API, `messages` + Anthropic → Messages API, `messages` + any other provider → Chat Completions
- X-Provider header - Explicit override
- Model prefix - `openai/gpt-4` → OpenAI (strips the prefix before forwarding)
- Model pattern - `gpt-*` → OpenAI, `claude-*` → Anthropic, etc.
```bash
# Explicit provider via header
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -H 'X-Provider: anthropic' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Provider prefix in the model name (stripped before forwarding)
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"anthropic/claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Traditional paths still work
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'
```

SSE streaming is fully supported, with automatic token usage extraction for billing:
```bash
# Streaming with automatic usage extraction
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","stream":true,"messages":[{"role":"user","content":"Hello"}]}'
```

Key Features:

- Efficient flushing: Uses `http.ResponseController` for immediate SSE delivery
- Token extraction: Extracts usage from streaming responses for billing
- Auto stream_options: Automatically injects `stream_options.include_usage` when billing is configured (see the sketch below)
- Works with billing: Billing is calculated after the stream completes
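For illustration, when billing is configured a streaming request like the one above is forwarded with usage reporting switched on. `stream_options.include_usage` is OpenAI's standard field for this; the exact rewritten body shown here is a sketch:

```json
{
  "model": "gpt-4",
  "stream": true,
  "stream_options": { "include_usage": true },
  "messages": [{ "role": "user", "content": "Hello" }]
}
```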
Example with billing:
```go
adapter, _ := modelsdev.LoadFromURL()
billingCallback := func(r llmproxy.BillingResult) {
    log.Printf("Cost: $%.6f (tokens: %d/%d)", r.TotalCost, r.PromptTokens, r.CompletionTokens)
}
router := llmproxy.NewAutoRouter(
    llmproxy.WithAutoRouterBillingCalculator(llmproxy.NewBillingCalculator(adapter.GetCostLookup(), billingCallback)),
)
```

The Responses API supports persistent WebSocket connections for multi-turn, tool-call-heavy workflows. WebSocket support is opt-in with a zero-dependency adapter pattern — bring your own WebSocket library.
```go
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/openai"
    "github.com/gorilla/websocket"
)

// Origins allowed to open WebSocket upgrades.
var trustedOrigins = []string{"https://myapp.example.com"}

// Thin adapters — gorilla's *Conn satisfies llmproxy.WSConn, but the concrete
// return types need wrapping. Return nil explicitly on error so callers never
// receive a non-nil interface wrapping a nil *websocket.Conn.
type gorillaUpgrader struct{ websocket.Upgrader }

func (u *gorillaUpgrader) Upgrade(w http.ResponseWriter, r *http.Request, h http.Header) (llmproxy.WSConn, error) {
    conn, err := u.Upgrader.Upgrade(w, r, h)
    if err != nil {
        return nil, err
    }
    return conn, nil
}

type gorillaDialer struct{ websocket.Dialer }

func (d *gorillaDialer) DialContext(ctx context.Context, urlStr string, h http.Header) (llmproxy.WSConn, *http.Response, error) {
    conn, resp, err := d.Dialer.DialContext(ctx, urlStr, h)
    if err != nil {
        return nil, resp, err
    }
    return conn, resp, nil
}

func main() {
    // Only upgrade connections whose Origin header matches a trusted origin.
    upgrader := websocket.Upgrader{
        CheckOrigin: func(r *http.Request) bool {
            origin := r.Header.Get("Origin")
            for _, allowed := range trustedOrigins {
                if origin == allowed {
                    return true
                }
            }
            return false
        },
    }
    provider, _ := openai.New("sk-your-key")
    router := llmproxy.NewAutoRouter(
        llmproxy.WithAutoRouterFallbackProvider(provider),
        llmproxy.WithAutoRouterWebSocket(
            &gorillaUpgrader{upgrader},
            &gorillaDialer{websocket.Dialer{}},
        ),
        llmproxy.WithAutoRouterWSBillingCallback(func(turn int, meta llmproxy.ResponseMetadata, billing *llmproxy.BillingResult) {
            log.Printf("Turn %d: %d prompt + %d completion tokens",
                turn, meta.Usage.PromptTokens, meta.Usage.CompletionTokens)
        }),
    )
    router.RegisterProvider(provider)
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Clients connect with any WebSocket library:
```python
import json

from websocket import create_connection  # pip install websocket-client

ws = create_connection("ws://localhost:8080/v1/responses",
                       header=["Authorization: Bearer sk-your-key"])
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-4o",
    "input": [{"type": "message", "role": "user",
               "content": [{"type": "input_text", "text": "Hello!"}]}],
}))
while True:
    event = json.loads(ws.recv())
    print(event["type"], event.get("delta", ""))
    if event["type"] == "response.completed":
        break
```

The proxy handles model prefix stripping, auth header forwarding, usage extraction, and per-turn billing automatically. See DESIGN.md for full protocol details.
Supported providers:

| Provider | Auth | API Format | Notes |
|---|---|---|---|
| OpenAI | Bearer token | Chat Completions, Responses, WebSocket | HTTP + WebSocket for `/v1/responses` |
| Anthropic | `x-api-key` | Messages API | |
| Groq | Bearer token | OpenAI-compatible | |
| Fireworks | Bearer token | OpenAI-compatible | |
| x.AI | Bearer token | OpenAI-compatible | |
| Google AI | API key query param | Gemini generateContent | |
| AWS Bedrock | AWS Signature V4 | Converse API | |
| Azure OpenAI | `api-key` or Azure AD | Chat Completions (deployments) | |
Built-in interceptors:

```go
// Logging
llmproxy.WithInterceptor(interceptors.NewLogging(logger))

// Metrics (thread-safe)
metrics := &interceptors.Metrics{}
llmproxy.WithInterceptor(interceptors.NewMetrics(metrics))

// Retry on 429/5xx
llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second))

// Billing with models.dev pricing
adapter, _ := modelsdev.LoadFromURL()
llmproxy.WithInterceptor(interceptors.NewBilling(adapter.GetCostLookup(), func(r llmproxy.BillingResult) {
    log.Printf("Cost: $%.6f", r.TotalCost)
}))

// OTel tracing
llmproxy.WithInterceptor(interceptors.NewTracing(otelExtractor))

// Strip sensitive response headers
llmproxy.WithInterceptor(interceptors.NewResponseHeaderBan("Openai-Organization"))

// Add custom response headers
llmproxy.WithInterceptor(interceptors.NewAddResponseHeader(
    interceptors.NewHeader("X-Gateway", "llmproxy"),
))

// Anthropic prompt caching (default 5-minute retention, free)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetentionDefault))

// Anthropic prompt caching with 1h retention (costs more)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetention1h))

// OpenAI prompt caching with an explicit cache key
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCaching(interceptors.CacheRetention24h, "my-cache-key"))

// OpenAI prompt caching with an auto-derived key and tenant namespace
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCachingAuto("tenant-123", interceptors.CacheRetentionDefault))

// xAI/Grok prompt caching (uses the x-grok-conv-id header)
llmproxy.WithInterceptor(interceptors.NewXAIPromptCaching("conv-abc123"))

// Fireworks prompt caching (uses the x-session-affinity and
// x-prompt-cache-isolation-key headers)
llmproxy.WithInterceptor(interceptors.NewFireworksPromptCaching("session-123"))
```

The library uses small, focused interfaces that compose into providers:
```
Parse → Enrich → Resolve → Forward → Extract
```

- `BodyParser` — Extract metadata from the request body
- `RequestEnricher` — Add auth headers
- `URLResolver` — Determine the upstream URL
- `ResponseExtractor` — Parse response metadata
- `Provider` — Composes the above
- `Interceptor` — Wraps request/response handling for cross-cutting concerns
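Because cross-cutting behavior lives in interceptors, a proxy is assembled by stacking options. A minimal sketch reusing the constructors shown earlier in this README (that `NewProxy` accepts multiple `WithInterceptor` options, and the wrapping order that results, are assumptions here):

```go
// Stack logging, retry, and a custom response header on one proxy.
// provider is any value built from a providers/* package; ordering of
// the stacked interceptors is illustrative only.
proxy := llmproxy.NewProxy(provider,
    llmproxy.WithInterceptor(interceptors.NewLogging(nil)),
    llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second)),
    llmproxy.WithInterceptor(interceptors.NewAddResponseHeader(
        interceptors.NewHeader("X-Gateway", "llmproxy"),
    )),
)
```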
See DESIGN.md for full architecture details.
A complete multi-provider proxy server:

```bash
cd examples/basic
go run main.go
```

Environment variables:
| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GROQ_API_KEY` | Groq |
| `FIREWORKS_API_KEY` | Fireworks |
| `XAI_API_KEY` | x.AI |
| `GOOGLE_AI_API_KEY` | Google AI |
| `AZURE_OPENAI_RESOURCE` | Azure OpenAI |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI |
| `AWS_REGION` + `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` | AWS Bedrock |
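A condensed sketch of how such a server can register providers from the environment, using only constructors shown earlier (the actual examples/basic code may differ, and calling `NewAutoRouter` with no options is an assumption):

```go
package main

import (
    "log"
    "net/http"
    "os"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/anthropic"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    // Register a provider only when its API key is present in the environment.
    router := llmproxy.NewAutoRouter()
    if key := os.Getenv("OPENAI_API_KEY"); key != "" {
        if p, err := openai.New(key); err == nil {
            router.RegisterProvider(p)
        }
    }
    if key := os.Getenv("ANTHROPIC_API_KEY"); key != "" {
        if p, err := anthropic.New(key); err == nil {
            router.RegisterProvider(p)
        }
    }
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```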