InkByteStudio/llm-budget-proxy


llm-budget-proxy


Lightweight reverse proxy for OpenAI API rate limiting, per-key token budgets, and cost dashboards. Deploy in 5 minutes with Docker.

Companion resources: Blog post · Tutorial

Architecture

┌──────────┐     ┌──────────────────────────────────────────────┐     ┌──────────┐
│  Client  │────▶│  llm-budget-proxy                            │────▶│  OpenAI  │
│          │◀────│                                              │◀────│  API     │
└──────────┘     │  ┌──────┐ ┌───────────┐ ┌────────┐ ┌───────┐ │     └──────────┘
                 │  │ Auth │▶│Rate Limit │▶│Budget  │▶│Cache  │ │
                 │  └──────┘ └───────────┘ └────────┘ └───────┘ │
                 │                                              │
                 │  ┌───────────┐  ┌───────────┐  ┌──────────┐  │
                 │  │ SQLite DB │  │ Dashboard │  │ Webhooks │  │
                 │  └───────────┘  └───────────┘  └──────────┘  │
                 └──────────────────────────────────────────────┘

Quick start

git clone https://github.com/InkByteStudio/llm-budget-proxy.git
cd llm-budget-proxy
cp .env.example .env
# Edit .env — add your OPENAI_API_KEY

# Option A: Docker (recommended)
docker compose up --build

# Option B: Local development
npm install
npm run seed -- alice team-a 10.00   # Create an API key
npm run dev

API key management

API keys use the lbp_ prefix and are stored as SHA-256 hashes. The plaintext key is shown once at creation time.
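
The hash-at-rest scheme can be sketched as follows. This is illustrative code, not the project's actual module; the function names are hypothetical:

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hash a plaintext key; only this digest is persisted.
function hashKey(plaintext: string): string {
  return createHash("sha256").update(plaintext).digest("hex");
}

// Generate a new key: random bytes, base64url-encoded, with the lbp_ prefix.
// The plaintext is returned exactly once, at creation time.
function generateKey(): { plaintext: string; hash: string } {
  const plaintext = "lbp_" + randomBytes(24).toString("base64url");
  return { plaintext, hash: hashKey(plaintext) };
}

// At request time: hash the presented key and compare against the stored digest.
function verifyKey(presented: string, storedHash: string): boolean {
  return hashKey(presented) === storedHash;
}
```

Because only the digest is stored, a leaked database does not reveal usable keys, but a lost plaintext key cannot be recovered and must be reissued.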

Create a key via seed script:

npm run seed -- <name> [team] [daily-budget]
npm run seed -- alice team-a 10.00

Create a key via API:

curl -X POST http://localhost:3000/api/keys \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "alice", "team": "team-a", "budgetPeriod": "daily", "budgetLimit": 10.00}'

Use a key:

curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer lbp_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Configuration

All configuration is in config/config.yml. Environment variables are substituted using ${VAR_NAME} syntax.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| server.port | Port | 3000 | Server port |
| server.adminKey | — | required | Admin API key for dashboard and management |
| provider.apiKey | — | required | OpenAI API key |
| rateLimits.default.rpm | RPM | 60 | Requests per minute per key |
| rateLimits.default.tpm | TPM | 100000 | Tokens per minute per key |
| budgets.defaultDaily | Budget | 10.00 | Default daily budget in USD |
| budgets.defaultMonthly | Budget | 100.00 | Default monthly budget in USD |
| modelDowngrade.enabled | Flag | false | Enable model downgrade on budget pressure |
| cache.enabled | Flag | true | Enable exact-match response caching |
| cache.defaultTtlSeconds | TTL | 3600 | Cache entry lifetime |
| alerts.webhookUrl | URL | (none) | Webhook URL for budget alerts |
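
A minimal config.yml using these keys might look like the following (values illustrative; each ${...} reference is substituted from the environment at load time):

```yaml
server:
  port: 3000
  adminKey: ${ADMIN_API_KEY}     # substituted from the environment

provider:
  apiKey: ${OPENAI_API_KEY}

rateLimits:
  default:
    rpm: 60
    tpm: 100000

budgets:
  defaultDaily: 10.00
  defaultMonthly: 100.00

cache:
  enabled: true
  defaultTtlSeconds: 3600
```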

Rate limit overrides

Match API keys by name pattern:

rateLimits:
  overrides:
    - keyPattern: "premium-*"
      rpm: 120
      tpm: 500000
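
The glob-style keyPattern matching can be sketched like this (hypothetical helper names, not the project's actual code):

```typescript
interface RateLimit { rpm: number; tpm: number; }

// Convert a glob like "premium-*" into an anchored RegExp:
// "*" matches any run of characters; everything else is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Pick the first matching override, falling back to the defaults.
function resolveLimits(
  keyName: string,
  defaults: RateLimit,
  overrides: Array<{ keyPattern: string } & RateLimit>,
): RateLimit {
  const hit = overrides.find((o) => globToRegExp(o.keyPattern).test(keyName));
  return hit ? { rpm: hit.rpm, tpm: hit.tpm } : defaults;
}
```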

Budget alert thresholds

budgets:
  alertThresholds:
    - percent: 80
      action: warn        # X-Budget-Warning header
    - percent: 95
      action: downgrade   # Switch to cheaper model (if enabled)
    - percent: 100
      action: block       # Reject request (402)
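
Threshold evaluation amounts to picking the action for the highest threshold the current spend has crossed. A sketch under that assumption (names hypothetical):

```typescript
type Action = "warn" | "downgrade" | "block";
interface Threshold { percent: number; action: Action; }

// Return the action for the highest crossed threshold, or null if spend
// is below every threshold.
function budgetAction(
  spentUsd: number,
  limitUsd: number,
  thresholds: Threshold[],
): Action | null {
  const pct = (spentUsd / limitUsd) * 100;
  const crossed = thresholds
    .filter((t) => pct >= t.percent)
    .sort((a, b) => b.percent - a.percent);
  return crossed.length > 0 ? crossed[0].action : null;
}
```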

Model downgrade rules

Disabled by default; opt in via config:

modelDowngrade:
  enabled: true
  rules:
    - from: "gpt-4o"
      to: "gpt-4o-mini"

When triggered, the response includes X-Model-Downgraded: true and X-Original-Model headers.
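
Applying a downgrade rule and reporting it via the documented headers could look like this sketch (function name and shapes are hypothetical):

```typescript
interface DowngradeRule { from: string; to: string; }

// If a rule matches the requested model, rewrite it to the cheaper model and
// surface the original via X-Model-Downgraded / X-Original-Model headers.
function applyDowngrade(
  model: string,
  rules: DowngradeRule[],
): { model: string; headers: Record<string, string> } {
  const rule = rules.find((r) => r.from === model);
  if (!rule) return { model, headers: {} };
  return {
    model: rule.to,
    headers: { "X-Model-Downgraded": "true", "X-Original-Model": model },
  };
}
```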

Pricing manifest

Model pricing is in config/pricing.yml. Update this file when OpenAI changes pricing.

version: "2026-03-14"
provider: openai
models:
  gpt-4o:
    inputPer1k: 0.0025
    outputPer1k: 0.01
    cachedInputPer1k: 0.00125
    maxOutputTokens: 16384
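
Cost for a request is tokens divided by 1000, multiplied by the per-1k rate, summed across input and output; cached input tokens (when the provider reports them) bill at the cheaper rate. A sketch of that arithmetic (function name hypothetical):

```typescript
interface ModelPricing {
  inputPer1k: number;
  outputPer1k: number;
  cachedInputPer1k?: number;
}

// Cost = (tokens / 1000) × per-1k rate, summed across fresh input,
// cached input, and output.
function requestCost(
  p: ModelPricing,
  inputTokens: number,
  outputTokens: number,
  cachedInputTokens = 0,
): number {
  const freshInput = inputTokens - cachedInputTokens;
  return (
    (freshInput / 1000) * p.inputPer1k +
    (cachedInputTokens / 1000) * (p.cachedInputPer1k ?? p.inputPer1k) +
    (outputTokens / 1000) * p.outputPer1k
  );
}
```

With the gpt-4o rates above, 1000 input tokens plus 500 output tokens comes to 0.0025 + 0.005 = 0.0075 USD.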

Response headers

Every proxied response includes:

| Header | Description |
|--------|-------------|
| X-Request-Cost | Actual cost of this request in USD |
| X-Estimated-Cost | Pre-flight estimated worst-case cost |
| X-Input-Tokens | Input token count |
| X-Tokens-Used | Total tokens (input + output) |
| X-Budget-Remaining | Remaining budget in USD |
| X-Budget-Period | Budget period (daily/monthly) |
| X-Budget-Warning | Set when approaching budget limit |
| X-Cache | HIT or MISS |
| X-Model-Downgraded | true if model was downgraded |
| X-RateLimit-Limit-RPM | RPM limit for this key |
| X-RateLimit-Remaining-RPM | Remaining RPM |
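
Server-side, assembling these headers from a request's accounting record might look like this sketch (the record shape and function name are hypothetical):

```typescript
interface Usage {
  cost: number;
  inputTokens: number;
  totalTokens: number;
  remaining: number;
  period: "daily" | "monthly";
  cacheHit: boolean;
}

// Build the per-request header map from the accounting record.
function budgetHeaders(u: Usage): Record<string, string> {
  return {
    "X-Request-Cost": u.cost.toFixed(6),
    "X-Input-Tokens": String(u.inputTokens),
    "X-Tokens-Used": String(u.totalTokens),
    "X-Budget-Remaining": u.remaining.toFixed(2),
    "X-Budget-Period": u.period,
    "X-Cache": u.cacheHit ? "HIT" : "MISS",
  };
}
```

On the client side, `curl -i` (or `-D -`) will print these headers alongside the response body.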

Dashboard

Open http://localhost:3000/dashboard and enter your admin API key. The dashboard shows:

  • Cost by API key (bar chart)
  • Cost over time (line chart)
  • Budget status (doughnut chart)
  • Recent requests table

Alerting

Configure a webhook URL to receive budget notifications:

alerts:
  webhookUrl: "https://hooks.slack.com/services/xxx/yyy/zzz"
  events:
    - budgetWarning
    - budgetExceeded

Alerts are debounced (same event + key fires at most once per hour).
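
The debounce rule can be sketched as follows (hypothetical helper, not the project's code): track the last fire time per (event, key) pair and suppress repeats within the window.

```typescript
const DEBOUNCE_MS = 60 * 60 * 1000; // one hour
const lastFired = new Map<string, number>();

// Fire at most once per window for each (event, key) pair.
function shouldFire(event: string, keyName: string, now = Date.now()): boolean {
  const id = `${event}:${keyName}`;
  const prev = lastFired.get(id);
  if (prev !== undefined && now - prev < DEBOUNCE_MS) return false;
  lastFired.set(id, now);
  return true;
}
```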

How it compares to LiteLLM

LiteLLM (~39k stars) is a mature, full-featured LLMOps platform with 100+ provider integrations, virtual keys, per-key budgets, load balancing, guardrails, and a Postgres-backed dashboard.

llm-budget-proxy is deliberately simpler:

|  | LiteLLM | llm-budget-proxy |
|--|---------|------------------|
| Providers | 100+ | OpenAI only (MVP) |
| Database | Postgres/Redis | SQLite |
| Deployment | Multi-service | Single container |
| Setup time | ~30 min | ~5 min |
| Dashboard | Full admin UI | Single-page Chart.js |
| Use case | Enterprise, multi-provider | Dev/staging, single-provider, learning |

Use LiteLLM when you need enterprise scale, multi-provider support, or a full observability platform.

Use llm-budget-proxy when you want a lightweight, self-contained proxy you can understand, modify, and deploy in minutes.

Limitations

  • Single-instance only — SQLite does not support multi-node deployment. For horizontal scaling, migrate to Postgres or Redis.
  • OpenAI only — This MVP proxies OpenAI's /v1/chat/completions endpoint. Anthropic support is a documented future extension.
  • Estimated cost — Pre-flight cost checks use estimated input tokens + worst-case output ceiling. Actual cost is recorded after the response completes.
  • No semantic caching — Cache uses exact request-body matching only. Semantic similarity caching requires embeddings and vector search, which is out of scope.
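
The pre-flight estimate described above can be sketched like this (chars/4 is a common rough heuristic for English-text token counts; function name hypothetical):

```typescript
// Worst case = estimated input tokens plus the model's full output ceiling,
// both priced at the manifest's per-1k rates.
function estimateWorstCaseCost(
  promptChars: number,
  maxOutputTokens: number,
  inputPer1k: number,
  outputPer1k: number,
): number {
  const estInputTokens = Math.ceil(promptChars / 4); // crude heuristic
  return (
    (estInputTokens / 1000) * inputPer1k +
    (maxOutputTokens / 1000) * outputPer1k
  );
}
```

Because the output ceiling dominates, the pre-flight figure is deliberately pessimistic; the actual X-Request-Cost recorded afterwards is usually much lower.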

Development

npm install
npm run dev         # Start with hot reload
npm test            # Run tests
npm run test:watch  # Watch mode
npm run build       # Compile TypeScript

License

MIT — AGR Group