# llm-tap

A local proxy that intercepts LLM API traffic, tracks costs in real time, and provides a terminal dashboard. Think of it as tcpdump for your OpenAI and Anthropic API calls.
## Features

- Cost Visibility: See exactly how much each request costs, broken down by model
- Token Tracking: Monitor input/output tokens across all your LLM calls
- Latency Monitoring: Track response times to identify slow requests
- Debugging: Inspect request/response payloads without modifying your code
- Zero Code Changes: Just set environment variables to route traffic through the proxy
## Installation

```bash
npm install -g llm-tap
```

Or run locally:

```bash
git clone https://github.com/dabit3/llm-tap
cd llm-tap
npm install
npm link
```

## Quick Start

```bash
llm-tap start
```

This starts the proxy on port 8787 with a real-time TUI dashboard showing:
- Total cost, requests, and tokens
- Cost breakdown by model
- Recent request log with timing
- Token and latency sparklines
- Provider breakdown
## Usage

Set these environment variables to route traffic through llm-tap:

```bash
# For OpenAI SDK
export OPENAI_BASE_URL=http://localhost:8787/v1

# For Anthropic SDK
export ANTHROPIC_BASE_URL=http://localhost:8787/anthropic
```

Or print the exports:

```bash
llm-tap env
```
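To see what the base-URL override actually does, here is a minimal sketch (not llm-tap code) of how an SDK rebases its endpoint paths onto whatever base URL it is given:

```javascript
// Sketch (not llm-tap code): how a client that honors a base-URL
// override resolves an endpoint path against it. Any SDK that reads
// OPENAI_BASE_URL / ANTHROPIC_BASE_URL does equivalent rebasing.
function resolveEndpoint(path, base) {
  // Ensure a trailing slash so the last path segment (e.g. /v1) is
  // preserved when the relative endpoint path is resolved against it.
  return new URL(path, base.replace(/\/?$/, "/")).href;
}

console.log(resolveEndpoint("chat/completions", "http://localhost:8787/v1"));
// -> http://localhost:8787/v1/chat/completions
```

Because the rebasing happens inside the SDK, setting the environment variable is the only change your application needs.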
## CLI Options

```bash
# Custom port
llm-tap start --port 9000

# Verbose mode (logs to stdout instead of dashboard)
llm-tap start --verbose --no-dashboard

# Show pricing table
llm-tap pricing
```

## HTTP API

The proxy exposes these endpoints for programmatic access:
- `GET /stats` - Aggregated statistics (costs, tokens, latency)
- `GET /requests?limit=50` - Recent requests with details
- `GET /export` - Export all data as JSON
- `GET /health` - Health check
Example:

```bash
curl http://localhost:8787/stats | jq
```

## Keyboard Shortcuts

- `q` or `Esc` - Quit
- `r` - Force refresh
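For quick scripting against the stats endpoint, something like the following works. The field names used here (`requests`, `totalTokens`, `totalCost`) are assumptions for illustration — inspect the actual JSON from `/stats` for the real shape:

```javascript
// Sketch: reduce GET /stats output to a one-line summary.
// The field names below are assumed, not confirmed by llm-tap's docs.
function summarizeStats(stats) {
  return `${stats.requests} requests, ` +
    `${stats.totalTokens} tokens, ` +
    `$${stats.totalCost.toFixed(4)}`;
}

// In a real script:
// fetch("http://localhost:8787/stats").then(r => r.json()).then(summarizeStats)
console.log(summarizeStats({ requests: 12, totalTokens: 48210, totalCost: 0.0731 }));
// -> 12 requests, 48210 tokens, $0.0731
```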
## Supported Models

- OpenAI: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5, o1, o1-mini, o3-mini
- Anthropic: Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus, Claude Sonnet 4, Claude Opus 4
Pricing is updated as of February 2026. Unknown models use a conservative default estimate.
## How It Works

- llm-tap runs a local HTTP proxy server
- You point your LLM SDK at the proxy via `*_BASE_URL` environment variables
- The proxy forwards requests to the real API, capturing request/response data
- Token usage and costs are calculated from the response's `usage` field
- Stats are aggregated and displayed in the TUI or exposed via the HTTP API

```
Your App --> llm-tap (localhost:8787) --> OpenAI/Anthropic API
                    [logs, calculates cost]
```
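The cost step above can be sketched as follows. The prices and the shape of the pricing table here are illustrative placeholders, not llm-tap's actual data:

```javascript
// Sketch of the per-request cost math, assuming USD-per-million-token
// prices. The price values and table shape below are illustrative only.
const PRICES = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 }, // USD per 1M tokens (example)
};

function requestCost(model, usage) {
  const p = PRICES[model];
  // Unknown model: llm-tap falls back to a conservative default estimate;
  // this sketch just signals the miss.
  if (!p) return null;
  return (usage.prompt_tokens * p.input + usage.completion_tokens * p.output) / 1e6;
}

console.log(requestCost("gpt-4o-mini", { prompt_tokens: 1000, completion_tokens: 500 }));
// -> 0.00045
```

Because the math uses the `usage` field the provider returns, no client-side tokenization is needed.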
## Use Cases

- Cost Optimization: Identify which parts of your app consume the most tokens
- Debugging Agent Loops: See every API call your agent makes
- Comparing Models: A/B test different models and compare costs
- Budget Monitoring: Track spend during development
- Latency Analysis: Find slow requests that impact user experience
## License

MIT