A Go library for proxying requests to upstream LLM providers, with a pluggable, composable architecture.
```bash
go get github.com/agentuity/llmproxy
```

```go
package main

import (
    "io"
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/interceptors"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    provider, err := openai.New("sk-your-key")
    if err != nil {
        log.Fatal(err)
    }
    proxy := llmproxy.NewProxy(provider,
        llmproxy.WithInterceptor(interceptors.NewLogging(nil)),
    )
    http.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
        // Forward using the request's own context so client cancellation propagates.
        resp, meta, err := proxy.Forward(r.Context(), r)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer resp.Body.Close()
        // Response metadata includes extracted token usage.
        _ = meta.Usage.PromptTokens
        _ = meta.Usage.CompletionTokens
        io.Copy(w, resp.Body)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Single endpoint that auto-detects provider and API type:
```go
package main

import (
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/anthropic"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    openaiProvider, _ := openai.New("sk-openai-key")
    anthropicProvider, _ := anthropic.New("sk-ant-key")

    router := llmproxy.NewAutoRouter(
        llmproxy.WithAutoRouterFallbackProvider(openaiProvider),
    )
    router.RegisterProvider(openaiProvider)
    router.RegisterProvider(anthropicProvider)

    // Single endpoint handles all providers and APIs.
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

POST to `/` with any model; the provider and API are auto-detected:
```bash
# Auto-detect OpenAI from the gpt-4 model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect Anthropic from the claude model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect the Responses API from the input field
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4o","input":"Hello"}'
```

Features:

- 9 Provider Implementations: OpenAI, Anthropic, Groq, Fireworks, x.AI, Google AI, AWS Bedrock, Azure OpenAI, and an OpenAI-compatible base
- AutoRouter: Single endpoint with automatic provider/API detection
- Responses API: Full support for OpenAI's Responses API (HTTP streaming and WebSocket mode)
- WebSocket Mode: Persistent connections for multi-turn Responses API workflows with per-turn billing
- SSE Streaming: Full streaming support with efficient token usage extraction
- 8 Built-in Interceptors: Logging, Metrics, Retry, Billing, Tracing (OTel), HeaderBan, AddHeader, PromptCaching
- Pricing Integration: models.dev adapter with markup support
- Prompt Caching: Prompt caching support for Anthropic, OpenAI, xAI, Fireworks, and Bedrock
- Raw Body Preservation: Custom JSON fields pass through unchanged (see the example below)
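To illustrate raw body preservation: fields the proxy does not recognize are forwarded to the upstream provider verbatim. The `x_experimental_tag` field here is invented for the example and is not part of any provider's API:

```bash
# x_experimental_tag is not a known field; the proxy forwards it unchanged
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","x_experimental_tag":"demo","messages":[{"role":"user","content":"Hello"}]}'
```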
The AutoRouter provides automatic routing from a single endpoint:
- Path-based - `/v1/messages` → Messages API, `/v1/responses` → Responses API
- Body + Provider - When the path is `/` or unknown: `input` field → Responses API, `prompt` field → Completions API, `contents` field → GenerateContent API, `messages` + Anthropic → Messages API, `messages` + any other provider → Chat Completions
- X-Provider header - Explicit override
- Model prefix - `openai/gpt-4` → OpenAI (strips the prefix before forwarding)
- Model pattern - `gpt-*` → OpenAI, `claude-*` → Anthropic, etc.
```bash
# Explicit provider via header
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -H 'X-Provider: anthropic' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Provider prefix in the model name (stripped before forwarding)
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"anthropic/claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Traditional paths still work
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'
```

SSE streaming is fully supported, with automatic token usage extraction for billing:
```bash
# Streaming with automatic usage extraction
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","stream":true,"messages":[{"role":"user","content":"Hello"}]}'
```

Key Features:

- Efficient flushing: Uses `http.ResponseController` for immediate SSE delivery
- Token extraction: Extracts usage from streaming responses for billing
- Auto stream_options: Automatically injects `stream_options.include_usage` when billing is configured (see the sketch below)
- Works with billing: Billing is calculated after the stream completes
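For illustration, when billing is configured a streaming request like the one above is forwarded with usage reporting switched on. `stream_options.include_usage` is OpenAI's standard field for this; the exact rewritten body shown here is a sketch:

```json
{
  "model": "gpt-4",
  "stream": true,
  "stream_options": { "include_usage": true },
  "messages": [{ "role": "user", "content": "Hello" }]
}
```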
Example with billing:
```go
adapter, _ := modelsdev.LoadFromURL()
billingCallback := func(r llmproxy.BillingResult) {
    log.Printf("Cost: $%.6f (tokens: %d/%d)", r.TotalCost, r.PromptTokens, r.CompletionTokens)
}
router := llmproxy.NewAutoRouter(
    llmproxy.WithAutoRouterBillingCalculator(llmproxy.NewBillingCalculator(adapter.GetCostLookup(), billingCallback)),
)
```

The Responses API supports persistent WebSocket connections for multi-turn, tool-call-heavy workflows. WebSocket support is opt-in with a zero-dependency adapter pattern — bring your own WebSocket library.
```go
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/openai"
    "github.com/gorilla/websocket"
)

// Origins allowed to open WebSocket upgrades.
var trustedOrigins = []string{"https://myapp.example.com"}

// Thin adapters — gorilla's *Conn satisfies llmproxy.WSConn, but the concrete
// return types need wrapping. Return nil explicitly on error so callers never
// receive a non-nil interface wrapping a nil *websocket.Conn.
type gorillaUpgrader struct{ websocket.Upgrader }

func (u *gorillaUpgrader) Upgrade(w http.ResponseWriter, r *http.Request, h http.Header) (llmproxy.WSConn, error) {
    conn, err := u.Upgrader.Upgrade(w, r, h)
    if err != nil {
        return nil, err
    }
    return conn, nil
}

type gorillaDialer struct{ websocket.Dialer }

func (d *gorillaDialer) DialContext(ctx context.Context, urlStr string, h http.Header) (llmproxy.WSConn, *http.Response, error) {
    conn, resp, err := d.Dialer.DialContext(ctx, urlStr, h)
    if err != nil {
        return nil, resp, err
    }
    return conn, resp, nil
}

func main() {
    // Only upgrade connections whose Origin header matches a trusted origin.
    upgrader := websocket.Upgrader{
        CheckOrigin: func(r *http.Request) bool {
            origin := r.Header.Get("Origin")
            for _, allowed := range trustedOrigins {
                if origin == allowed {
                    return true
                }
            }
            return false
        },
    }
    provider, _ := openai.New("sk-your-key")
    router := llmproxy.NewAutoRouter(
        llmproxy.WithAutoRouterFallbackProvider(provider),
        llmproxy.WithAutoRouterWebSocket(
            &gorillaUpgrader{upgrader},
            &gorillaDialer{websocket.Dialer{}},
        ),
        llmproxy.WithAutoRouterWSBillingCallback(func(turn int, meta llmproxy.ResponseMetadata, billing *llmproxy.BillingResult) {
            log.Printf("Turn %d: %d prompt + %d completion tokens",
                turn, meta.Usage.PromptTokens, meta.Usage.CompletionTokens)
        }),
    )
    router.RegisterProvider(provider)
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Clients connect with any WebSocket library:
```python
import json

from websocket import create_connection  # pip install websocket-client

ws = create_connection("ws://localhost:8080/v1/responses",
                       header=["Authorization: Bearer sk-your-key"])
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-4o",
    "input": [{"type": "message", "role": "user",
               "content": [{"type": "input_text", "text": "Hello!"}]}],
}))
while True:
    event = json.loads(ws.recv())
    print(event["type"], event.get("delta", ""))
    if event["type"] == "response.completed":
        break
```

The proxy handles model prefix stripping, auth header forwarding, usage extraction, and per-turn billing automatically. See DESIGN.md for full protocol details.
Supported providers:

| Provider | Auth | API Format | Notes |
|---|---|---|---|
| OpenAI | Bearer token | Chat Completions, Responses, WebSocket | HTTP + WebSocket for `/v1/responses` |
| Anthropic | `x-api-key` | Messages API | |
| Groq | Bearer token | OpenAI-compatible | |
| Fireworks | Bearer token | OpenAI-compatible | |
| x.AI | Bearer token | OpenAI-compatible | |
| Google AI | API key query param | Gemini generateContent | |
| AWS Bedrock | AWS Signature V4 | Converse API | |
| Azure OpenAI | `api-key` or Azure AD | Chat Completions (deployments) | |
Built-in interceptors:

```go
// Logging
llmproxy.WithInterceptor(interceptors.NewLogging(logger))

// Metrics (thread-safe)
metrics := &interceptors.Metrics{}
llmproxy.WithInterceptor(interceptors.NewMetrics(metrics))

// Retry on 429/5xx
llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second))

// Billing with models.dev pricing
adapter, _ := modelsdev.LoadFromURL()
llmproxy.WithInterceptor(interceptors.NewBilling(adapter.GetCostLookup(), func(r llmproxy.BillingResult) {
    log.Printf("Cost: $%.6f", r.TotalCost)
}))

// OTel tracing
llmproxy.WithInterceptor(interceptors.NewTracing(otelExtractor))

// Strip sensitive response headers
llmproxy.WithInterceptor(interceptors.NewResponseHeaderBan("Openai-Organization"))

// Add custom response headers
llmproxy.WithInterceptor(interceptors.NewAddResponseHeader(
    interceptors.NewHeader("X-Gateway", "llmproxy"),
))

// Anthropic prompt caching (default 5-minute retention, free)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetentionDefault))

// Anthropic prompt caching with 1h retention (costs more)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetention1h))

// OpenAI prompt caching with an explicit cache key
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCaching(interceptors.CacheRetention24h, "my-cache-key"))

// OpenAI prompt caching with an auto-derived key and tenant namespace
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCachingAuto("tenant-123", interceptors.CacheRetentionDefault))

// xAI/Grok prompt caching (uses the x-grok-conv-id header)
llmproxy.WithInterceptor(interceptors.NewXAIPromptCaching("conv-abc123"))

// Fireworks prompt caching (uses the x-session-affinity and
// x-prompt-cache-isolation-key headers)
llmproxy.WithInterceptor(interceptors.NewFireworksPromptCaching("session-123"))
```

The library uses small, focused interfaces that compose into providers:
```
Parse → Enrich → Resolve → Forward → Extract
```

- `BodyParser` — Extract metadata from the request body
- `RequestEnricher` — Add auth headers
- `URLResolver` — Determine the upstream URL
- `ResponseExtractor` — Parse response metadata
- `Provider` — Composes the above
- `Interceptor` — Wraps request/response handling for cross-cutting concerns
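Because cross-cutting behavior lives in interceptors, a proxy is assembled by stacking options. A minimal sketch reusing the constructors shown earlier in this README (that `NewProxy` accepts multiple `WithInterceptor` options, and the wrapping order that results, are assumptions here):

```go
// Stack logging, retry, and a custom response header on one proxy.
// provider is any value built from a providers/* package; ordering of
// the stacked interceptors is illustrative only.
proxy := llmproxy.NewProxy(provider,
    llmproxy.WithInterceptor(interceptors.NewLogging(nil)),
    llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second)),
    llmproxy.WithInterceptor(interceptors.NewAddResponseHeader(
        interceptors.NewHeader("X-Gateway", "llmproxy"),
    )),
)
```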
See DESIGN.md for full architecture details.
A complete multi-provider proxy server:

```bash
cd examples/basic
go run main.go
```

Environment variables:
| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GROQ_API_KEY` | Groq |
| `FIREWORKS_API_KEY` | Fireworks |
| `XAI_API_KEY` | x.AI |
| `GOOGLE_AI_API_KEY` | Google AI |
| `AZURE_OPENAI_RESOURCE` | Azure OpenAI |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI |
| `AWS_REGION` + `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` | AWS Bedrock |
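A condensed sketch of how such a server can register providers from the environment, using only constructors shown earlier (the actual examples/basic code may differ, and calling `NewAutoRouter` with no options is an assumption):

```go
package main

import (
    "log"
    "net/http"
    "os"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/anthropic"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    // Register a provider only when its API key is present in the environment.
    router := llmproxy.NewAutoRouter()
    if key := os.Getenv("OPENAI_API_KEY"); key != "" {
        if p, err := openai.New(key); err == nil {
            router.RegisterProvider(p)
        }
    }
    if key := os.Getenv("ANTHROPIC_API_KEY"); key != "" {
        if p, err := anthropic.New(key); err == nil {
            router.RegisterProvider(p)
        }
    }
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```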