A lightweight Python library for LLM cost attribution — track, tag, and report LLM API costs by feature, user, and model.
llmwatch is not an observability platform or proxy server. It's a lightweight Python library that integrates directly into your existing codebase.
Unlike Langfuse, LangSmith, or LiteLLM, llmwatch requires no external infrastructure, no API gateway, and no proxy setup. Just `pip install llmwatch` and add 3 lines of code to start tracking LLM costs.
Key differentiators:
- No proxy or gateway needed — Unlike LiteLLM and Helicone, which sit between your code and LLM APIs
- No external platform — Unlike Langfuse and LangSmith, which require cloud infrastructure
- Works with your existing SDK — Patch your OpenAI, Anthropic, or Google clients with `instrument(client)`
- Feature-level cost attribution — Tag LLM calls by feature, user, environment, and any custom dimension
- Minimal setup — 3 lines of code to get started
- 1000+ models — Bundled pricing data covering OpenAI, Anthropic, Google, and more
Async:

```python
from openai import AsyncOpenAI
from llmwatch.tracker import LLMWatch

client = AsyncOpenAI()
watcher = LLMWatch(client=client)

@watcher.tracked(feature="summarize", user_id="alice")
async def summarize(text: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

result = await summarize("Long document text...")
```

Sync:

```python
from openai import OpenAI
from llmwatch.tracker import LLMWatch

client = OpenAI()
watcher = LLMWatch(client=client)

@watcher.tracked(feature="summarize", user_id="alice")
def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

result = summarize("Long document text...")
```

## Features

- Automatic cost tracking — Instrument SDK clients to capture token usage and calculate costs without modifying your LLM calls
- Flexible tagging — Attach metadata to tracked calls with `@watcher.tracked(feature=..., user_id=..., environment=...)`
- Multi-provider support — OpenAI, Anthropic, Google Generative AI (sync, async, and streaming)
- Bundled pricing — 1000+ models with up-to-date pricing data synced from pydantic/genai-prices
- Multiple database backends — SQLite (default), PostgreSQL, MySQL, MongoDB (Beanie ODM), Oracle, MSSQL
- Budget alerts — Set thresholds and trigger callbacks when spending exceeds limits
- Reporting and export — Generate cost summaries by feature, user, model, or provider (CSV/JSON)
- CLI tools — View reports, manage data, sync pricing
- Web dashboard — Optional interactive dashboard for cost visualization (`llmwatch dashboard`)
- Streaming support — Track costs for streaming responses (SSE, async streams)
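Streaming cost tracking works because token usage can be accumulated across chunks and priced once the stream ends (many providers attach usage only to the final chunk). A minimal sketch of that idea — the `Chunk` shape and prices below are hypothetical stand-ins, not llmwatch's actual API:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    # Usage often arrives only on the final chunk of a stream.
    prompt_tokens: int = 0
    completion_tokens: int = 0

PRICE_PER_1M = {"input": 2.50, "output": 10.00}  # hypothetical USD rates

def track_stream(chunks):
    """Accumulate text and token usage over a stream, then price the total."""
    text, prompt_toks, completion_toks = [], 0, 0
    for chunk in chunks:
        text.append(chunk.text)
        prompt_toks += chunk.prompt_tokens
        completion_toks += chunk.completion_tokens
    cost_usd = (prompt_toks * PRICE_PER_1M["input"]
                + completion_toks * PRICE_PER_1M["output"]) / 1_000_000
    return "".join(text), cost_usd

stream = [Chunk("Hel"), Chunk("lo!"), Chunk("", prompt_tokens=12, completion_tokens=3)]
full_text, cost_usd = track_stream(stream)
print(full_text, cost_usd)
```

In the real library, provider-specific extractors handle the per-provider stream formats, so user code never does this accumulation by hand.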
| Provider | Sync | Async | Streaming | Models |
|---|---|---|---|---|
| OpenAI | O | O | O | GPT-5.4, o4-mini, o3, o1, GPT-4o, etc. |
| Anthropic | O | O | O | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5, etc. |
| Google | O | O | O | Gemini 3.1, Gemini 2.5, Gemini 2.0, etc. |
## Installation

```bash
pip install llmwatch
# or
uv add llmwatch
```

Optional extras:

```bash
pip install llmwatch[pg]         # PostgreSQL
pip install llmwatch[mysql]      # MySQL
pip install llmwatch[mongo]      # MongoDB (Beanie ODM)
pip install llmwatch[dashboard]  # Web dashboard (Starlette + Uvicorn)
```

## Quick Start

```python
from openai import AsyncOpenAI
from llmwatch.tracker import LLMWatch

client = AsyncOpenAI()
watcher = LLMWatch(client=client)

@watcher.tracked(feature="chat", user_id="user123", environment="production")
async def chat_response(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Costs are tracked automatically
result = await chat_response("Hello, how are you?")
```

### Budget alerts

```python
watcher = LLMWatch(client=client)

async def on_budget_exceeded(record):
    print(f"Budget exceeded: ${record.cost_usd:.4f} on feature={record.tags.feature}")

watcher.budget.add_rule(
    max_cost_usd=0.50,
    callback=on_budget_exceeded,
    feature="summarize",
)
```

### Reporting and export

```python
summary = await watcher.report.by_feature(period="7d")
print(f"Total cost: ${summary.total_cost_usd:.4f}")
for b in summary.breakdowns:
    print(f"  {b.group_value}: ${b.total_cost_usd:.4f} ({b.total_requests} calls)")

# Also available: by_user_id(), by_model(), by_provider()
await watcher.report.export_csv("costs.csv", group_by="feature", period="30d")
await watcher.report.export_json("costs.json", group_by="model", period="7d")
```

### CLI usage

```bash
llmwatch report --group-by feature --period 7d
llmwatch export costs.csv --format csv
llmwatch pricing list --provider openai
llmwatch pricing sync
```

### Dashboard

```bash
pip install llmwatch[dashboard]
llmwatch dashboard
# Opens at http://localhost:8000
```

## Database Backends

By default, llmwatch uses SQLite (`~/.llmwatch/usage.db`). Switch to other backends by passing a storage instance:

```python
from llmwatch.tracker import LLMWatch
from llmwatch.databases.sqlalchemy import Storage

# PostgreSQL
watcher = LLMWatch(
    client=client,
    storage=Storage("postgresql+asyncpg://user:password@localhost/llmwatch"),
)

# MySQL
watcher = LLMWatch(
    client=client,
    storage=Storage("mysql+aiomysql://user:password@localhost/llmwatch"),
)
```

MongoDB:

```python
from llmwatch.tracker import LLMWatch
from llmwatch.databases.mongo import MongoStorage

watcher = LLMWatch(
    client=client,
    storage=MongoStorage("mongodb://localhost:27017", database="llmwatch"),
)
```

## CLI Commands

| Command | Description |
|---|---|
| `llmwatch report` | Generate cost report (`--group-by`, `--period`) |
| `llmwatch export` | Export usage records to CSV or JSON |
| `llmwatch prune` | Delete old records by date |
| `llmwatch stats` | Show database statistics |
| `llmwatch pricing list` | List pricing data by provider |
| `llmwatch pricing sync` | Sync pricing data from upstream |
| `llmwatch dashboard` | Start interactive web dashboard |
## How It Works

- Instrument — `LLMWatch(client=client)` patches the SDK client's methods
- Extract — On each LLM call, extractors normalize the response (handles OpenAI, Anthropic, Google, streaming)
- Calculate — `calculate_cost()` computes USD cost using bundled pricing data
- Store — `Storage.save()` persists the `UsageRecord` to your database
- Tag — `@watcher.tracked()` provides tag context (feature, user_id, environment)
- Alert — Optional `BudgetAlert` callbacks trigger when thresholds are exceeded
- Report — `Reporter` generates cost summaries grouped by feature, user, model, or provider
```
LLM Call
   |  (via instrumented SDK client)
Extract Response -> Calculate Cost -> Save Record + Tags
   |
Database
   |  (queried by Reporter)
Reports, Dashboards, Exports
```
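The instrument-and-calculate steps above can be sketched in plain Python. Everything in this sketch — the fake client, pricing table, and record dict — is a simplified stand-in for illustration, not llmwatch's internals:

```python
import functools

# Hypothetical pricing table: USD per 1M input/output tokens.
PRICING = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def calculate_cost(model, prompt_tokens, completion_tokens):
    p = PRICING[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

records = []  # stand-in for Storage.save()

class FakeClient:
    def create(self, model, messages):
        # Pretend the provider returned a response with usage info attached.
        return {"model": model, "usage": {"prompt_tokens": 10, "completion_tokens": 5}}

def instrument(client):
    """Wrap client.create so every call records token usage and cost."""
    original = client.create

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        response = original(*args, **kwargs)
        usage = response["usage"]
        records.append({
            "model": response["model"],
            "cost_usd": calculate_cost(response["model"],
                                       usage["prompt_tokens"],
                                       usage["completion_tokens"]),
        })
        return response

    client.create = wrapper
    return client

client = instrument(FakeClient())
client.create(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
print(records[0])
```

Patching at the client-method level is what lets the caller's LLM code stay unchanged: the wrapper sees every response, so usage extraction and cost calculation happen as a side effect of the call itself.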
## Development

```bash
uv sync --group dev
uv run pytest tests/ -v
uv run ruff check src/ tests/
uv run mypy src/llmwatch/
```

## License

MIT

Pricing data sourced from pydantic/genai-prices.