SmartProxy is a local LLM proxy for people who use OpenAI-compatible clients but do not want provider choice, failover, cost tracking, and caching scattered across shell scripts.
It runs as one Go binary, listens on 127.0.0.1:4100, and keeps its ledger in
SQLite. Point Cursor, Continue, the OpenAI SDK, or any compatible client at
http://127.0.0.1:4100/v1; SmartProxy decides where each chat request should go.
The checked-in image above is a dashboard preview with sample numbers. Replace it with a real terminal capture when you publish release screenshots.
Local LLM workflows tend to grow small pieces of infrastructure: one script for DeepSeek, another for OpenAI, a hand-written retry loop, a spreadsheet for token costs, and a half-remembered budget check. SmartProxy puts those boring parts in one place.
The useful bit is not that it hides providers. It is that every request gets a recorded decision: which route matched, which target was tried, whether failover happened, how many tokens came back, what it probably cost, and whether a cache hit saved the call.
- Accepts OpenAI Chat Completions requests at POST /v1/chat/completions.
- Routes by exact model, model aliases, ordered rules, request shape, tools, vision input, estimated token size, and target capabilities.
- Talks to OpenAI-compatible providers directly and translates Anthropic Messages responses back into OpenAI-shaped responses.
- Retries transient upstream failures, respects Retry-After, and opens circuit breakers around bad targets.
- Records local SQLite telemetry for usage, cost, routing, cache, latency, status, request fingerprints, and anomalies.
- Keeps prompts, responses, tool arguments, request bodies, API keys, and full client IP addresses out of telemetry tables.
- Supports an optional exact-match response cache. When enabled, cache entries can be gzipped and encrypted at rest.
- Enforces daily, monthly, and session budgets before sending work upstream.
- Ships CLI tools for setup, config validation, provider checks, route explanation, stats, export, pruning, cache maintenance, and a Bubble Tea terminal dashboard.
SmartProxy is currently 0.1.0. The v1 scope is intentionally narrow:
OpenAI-compatible Chat Completions in, provider-specific HTTP out, local
observability around the trip. The Responses API, embeddings, image generation,
audio generation, hosted dashboards, and multi-tenant billing are outside this
version.
Build the binary from a checkout:
go build -trimpath -o dist/smartproxy ./cmd/smartproxy
Run it with zero config by setting at least one supported provider key:
export OPENAI_API_KEY=sk-...
./dist/smartproxy
SmartProxy will listen at:
http://127.0.0.1:4100/v1
On loopback, client authentication is optional by default. If your SDK requires an API key field, use any non-empty placeholder value there; provider keys still come from the environment variables configured for SmartProxy.
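The same call can be made from Go with only the standard library. This is a minimal sketch, not part of SmartProxy: the helper name newChatRequest and the placeholder key are illustrative, and the request body carries only the two fields used in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// chatMessage and chatRequest cover only the minimal Chat Completions fields.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest (hypothetical helper) builds a POST to <baseURL>/chat/completions.
// apiKey may be any non-empty placeholder when the proxy runs on loopback.
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	req, err := newChatRequest("http://127.0.0.1:4100/v1", "placeholder", "gpt-4o-mini", "Reply with one short sentence.")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (is SmartProxy running?):", err)
		return
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read body:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```

The provider key never appears in client code; it stays in the environment of the SmartProxy process.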
Try a request:
curl -s http://127.0.0.1:4100/v1/chat/completions \
-H 'content-type: application/json' \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Reply with one short sentence."}]
}'
Then inspect what happened:
./dist/smartproxy stats
./dist/smartproxy dash
For anything beyond the simplest local setup, create a config file:
./dist/smartproxy setup
./dist/smartproxy start --watch
Or start from the checked-in example:
cp smartproxy.yaml.example smartproxy.yaml
$EDITOR smartproxy.yaml
./dist/smartproxy config validate --config smartproxy.yaml
./dist/smartproxy --config smartproxy.yaml start --watch
SmartProxy discovers config files in this order:
./smartproxy.yaml
~/.config/smartproxy/smartproxy.yaml
~/.smartproxy.yaml
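The lookup order above amounts to a first-match search. The sketch below illustrates that idea only; candidatePaths, firstExisting, and fileExists are hypothetical names, and SmartProxy's actual loader may differ in details such as how it treats directories or unreadable files.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths returns the documented search order with ~ expanded to home.
func candidatePaths(home string) []string {
	return []string{
		"smartproxy.yaml", // relative to the working directory
		filepath.Join(home, ".config", "smartproxy", "smartproxy.yaml"),
		filepath.Join(home, ".smartproxy.yaml"),
	}
}

// firstExisting returns the first path for which exists reports true.
func firstExisting(paths []string, exists func(string) bool) (string, bool) {
	for _, p := range paths {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether path names a regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot resolve home directory:", err)
		return
	}
	if p, ok := firstExisting(candidatePaths(home), fileExists); ok {
		fmt.Println("first config file found:", p)
	} else {
		fmt.Println("no config file found")
	}
}
```

Passing the existence check as a function keeps the search order testable without touching the filesystem.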
The full example lives in smartproxy.yaml.example.
It shows three common provider entries, ordered route rules, failover policy,
cache settings, budget limits, pricing overrides, and local auth defaults.
Useful environment overrides:
SMARTPROXY_LISTEN=127.0.0.1:4100
SMARTPROXY_ADMIN_LISTEN=127.0.0.1:4101
SMARTPROXY_DB=~/.smartproxy/smartproxy.db
SMARTPROXY_LOG_LEVEL=info
SMARTPROXY_CACHE_ENABLED=false
SMARTPROXY_BUDGET_DAILY_LIMIT_USD=10
SMARTPROXY_BUDGET_MONTHLY_LIMIT_USD=200
SMARTPROXY_BUDGET_SESSION_LIMIT_USD=0
Provider keys are read from the environment variables named in the config, for example OPENAI_API_KEY, DEEPSEEK_API_KEY, and ANTHROPIC_API_KEY.
Routing is deterministic. For each chat request SmartProxy:
- Resolves model aliases.
- Sends an exact configured model match directly to the first enabled target.
- Otherwise evaluates routes from top to bottom.
- Skips targets that are disabled, circuit-open, missing credentials, too small for the request, or missing required capabilities.
- Retries transient failures on the same target.
- Fails over to the next capable target when retries are exhausted.
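The skip rules in the list above can be pictured as a filter over an ordered target list. The following is a simplified sketch under stated assumptions: the Target and Request types are invented for illustration, "too small" is read as a token limit below the request's estimated size, and alias resolution, rule matching, and retries are left out.

```go
package main

import "fmt"

// Target is a hypothetical view of one configured upstream target.
type Target struct {
	Name          string
	Enabled       bool
	CircuitOpen   bool
	HasCredential bool
	MaxTokens     int             // assumed meaning of "too small for the request"
	Capabilities  map[string]bool // e.g. "tools", "vision"
}

// Request is a hypothetical summary of what a chat request needs.
type Request struct {
	EstimatedTokens int
	Needs           []string
}

// skipReason returns an empty string when the target can serve the request.
func skipReason(t Target, r Request) string {
	switch {
	case !t.Enabled:
		return "disabled"
	case t.CircuitOpen:
		return "circuit-open"
	case !t.HasCredential:
		return "missing credentials"
	case r.EstimatedTokens > t.MaxTokens:
		return "too small for the request"
	}
	for _, c := range r.Needs {
		if !t.Capabilities[c] {
			return "missing capability: " + c
		}
	}
	return ""
}

// capableTargets keeps configured order, so failover can walk the result.
func capableTargets(targets []Target, r Request) []Target {
	var out []Target
	for _, t := range targets {
		if skipReason(t, r) == "" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	targets := []Target{
		{Name: "primary", Enabled: true, CircuitOpen: true, HasCredential: true, MaxTokens: 128000},
		{Name: "fallback", Enabled: true, HasCredential: true, MaxTokens: 64000,
			Capabilities: map[string]bool{"tools": true}},
	}
	req := Request{EstimatedTokens: 2000, Needs: []string{"tools"}}
	for _, t := range targets {
		fmt.Printf("%s: skip=%q\n", t.Name, skipReason(t, req))
	}
	fmt.Println("usable targets:", len(capableTargets(targets, req)))
}
```

Because the filter is a pure function of configuration and request shape, the same input always yields the same ordered candidates, which is what makes the routing deterministic.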
You can ask SmartProxy to explain a request without sending it upstream:
./dist/smartproxy route explain \
--config smartproxy.yaml.example \
--file testdata/openai_chat.json
Responses include routing headers when observability.response_headers is true:
x-smartproxy-request-id
x-smartproxy-route
x-smartproxy-target
x-smartproxy-provider
x-smartproxy-model
x-smartproxy-attempts
x-smartproxy-failover
x-smartproxy-cache
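A client can collect these headers from any response for logging or debugging. The sketch below is illustrative: routingInfo and headerFrom are hypothetical helpers, and the sample values in main are invented, since this README does not specify value formats.

```go
package main

import (
	"fmt"
	"net/http"
)

// routingHeaders lists the header names documented above.
var routingHeaders = []string{
	"x-smartproxy-request-id",
	"x-smartproxy-route",
	"x-smartproxy-target",
	"x-smartproxy-provider",
	"x-smartproxy-model",
	"x-smartproxy-attempts",
	"x-smartproxy-failover",
	"x-smartproxy-cache",
}

// routingInfo copies the routing headers that are present into a map.
// http.Header.Get matches names case-insensitively.
func routingInfo(h http.Header) map[string]string {
	info := make(map[string]string)
	for _, name := range routingHeaders {
		if v := h.Get(name); v != "" {
			info[name] = v
		}
	}
	return info
}

// headerFrom builds an http.Header from plain pairs; used for the demo.
func headerFrom(pairs map[string]string) http.Header {
	h := http.Header{}
	for k, v := range pairs {
		h.Set(k, v)
	}
	return h
}

func main() {
	// Invented sample values; in real use, pass resp.Header from an *http.Response.
	h := headerFrom(map[string]string{
		"x-smartproxy-target":   "example-target",
		"x-smartproxy-attempts": "1",
		"content-type":          "application/json",
	})
	info := routingInfo(h)
	for _, name := range routingHeaders {
		if v, ok := info[name]; ok {
			fmt.Printf("%s: %s\n", name, v)
		}
	}
}
```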
The day-to-day commands are small on purpose:
| Need | Command |
|---|---|
| Create a config interactively | smartproxy setup |
| Start the proxy | smartproxy start --watch |
| Start with the dashboard | smartproxy start --dash |
| Validate config | smartproxy config validate --config smartproxy.yaml |
| Check local setup | smartproxy doctor --config smartproxy.yaml |
| Check providers | smartproxy providers check --config smartproxy.yaml |
| Explain routing | smartproxy route explain --file request.json |
| Send one request from a file | smartproxy request --file request.json |
| Show local usage | smartproxy stats --today |
| Export telemetry | smartproxy export --format csv |
| Show cache stats | smartproxy cache stats |
| Clear cache entries | smartproxy cache clear |
| Prune old telemetry | smartproxy prune --execute |
Public listener:
- POST /v1/chat/completions
- GET /v1/models
- GET /healthz
- GET /readyz
- GET /stats
Admin listener:
- POST /admin/reload
- GET /admin/stats
- GET /admin/budget
- GET /metrics
- GET /debug/routes
- GET /debug/status
POST /v1/responses is intentionally rejected in v1 with an
unsupported_endpoint error.
Telemetry is designed for debugging spend and routing, not for storing user content. Request logs include metadata such as route, target, provider, model, status, token counts, latency, cache status, cost estimate, retry attempts, user agent, salted client address hash, trace ID, and a salted request fingerprint.
Telemetry does not store:
- prompt text;
- response text;
- request bodies;
- tool arguments;
- API keys;
- full client IP addresses.
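One common way to keep a correlatable identifier without storing the address itself is a keyed hash. The sketch below uses HMAC-SHA256 as an assumption; this README does not state which hash or salt handling SmartProxy uses, and hashClientAddr is a hypothetical name.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashClientAddr returns a hex digest of addr keyed by salt.
// The same salt and address always give the same digest, so requests
// from one client can be grouped without recording the address.
func hashClientAddr(salt []byte, addr string) string {
	mac := hmac.New(sha256.New, salt)
	mac.Write([]byte(addr))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	salt := []byte("demo-salt-not-for-production") // illustrative value only
	fmt.Println(hashClientAddr(salt, "203.0.113.7"))
}
```

A different salt produces unrelated digests, so hashes from one installation cannot be matched against another.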
The response cache is different: if you enable it, SmartProxy stores response
bodies locally so it can replay exact matches. Use cache.encrypt: true and set
SMARTPROXY_CACHE_KEY when cached bodies need encryption at rest.
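To make "gzipped and encrypted at rest" concrete, here is a sketch of one possible scheme: compress with gzip, then seal with AES-256-GCM. The cipher choice, the SHA-256 key derivation from a secret string, and the names sealEntry and openEntry are all assumptions for illustration; SmartProxy's on-disk format and its handling of SMARTPROXY_CACHE_KEY may differ.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
)

// newAEAD derives a 32-byte key from secret (assumed scheme) and returns AES-GCM.
func newAEAD(secret string) (cipher.AEAD, error) {
	key := sha256.Sum256([]byte(secret))
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealEntry gzips body, then encrypts it. Output layout: nonce || ciphertext.
func sealEntry(secret string, body []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(body); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	aead, err := newAEAD(secret)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, buf.Bytes(), nil), nil
}

// openEntry reverses sealEntry: decrypt and authenticate, then gunzip.
func openEntry(secret string, sealed []byte) ([]byte, error) {
	aead, err := newAEAD(secret)
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed entry too short")
	}
	compressed, err := aead.Open(nil, sealed[:n], sealed[n:], nil)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	sealed, err := sealEntry("demo-secret", []byte(`{"choices":[]}`))
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plain, err := openEntry("demo-secret", sealed)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	fmt.Println(string(plain))
}
```

Compressing before encrypting matters: encrypted bytes look random and no longer compress.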
The default listener binds to loopback. In Docker, bind the public listener to
0.0.0.0 and set a SmartProxy client key:
docker build -t smartproxy .
docker run --rm \
-p 127.0.0.1:4100:4100 \
-e OPENAI_API_KEY \
-e SMARTPROXY_LISTEN=0.0.0.0:4100 \
-e SMARTPROXY_API_KEY=local-dev-key \
smartproxy
Clients should then send Authorization: Bearer local-dev-key to the local proxy. The upstream provider key remains inside the container environment.
Requirements:
- Go 1.23 or newer.
- No Redis, Postgres, CGO SQLite driver, or external worker process.
Common checks:
make test
make race
make coverage
make release-check
The release check runs the regular tests, race tests, coverage script, and a trimmed binary build.
cmd/smartproxy/ CLI entrypoint
internal/api/ OpenAI request parsing and request analysis
internal/provider/ OpenAI-compatible and Anthropic upstream adapters
internal/translate/ OpenAI <-> Anthropic conversion
internal/routing/ Rules, retries, failover, circuit state
internal/runtime/ HTTP listeners, auth, headers, reload
internal/store/ SQLite schema, WAL store, async logging
internal/report/ Stats, export, assertions
internal/dashboard/ Terminal dashboard
internal/cache/ Exact-match response cache
internal/budget/ Daily, monthly, and session limits
MIT. See LICENSE.