WyrdWerk/tokenwatch

💰 TokenWatch

Compare pay-as-you-go LLM inference pricing across inference providers. Enter your token volumes and find the cheapest option.

Live site: https://payg-inference-calculator.pages.dev

How it works

  1. scripts/fetch-pricing.mjs fetches pricing from 3 tiers: direct providers (DeepInfra, Crof, EmberCloud, Wafer, Synthetic, Lilac), OpenRouter's de-aggregated /endpoints API (per-backend pricing for Fireworks, Together, Novita, SiliconFlow, etc.), and static providers (Hyper, Makora, and Xiaomimimo from data/manual-pricing.csv; OpenCode Go hardcoded). It normalizes all pricing to $/M tokens and writes public/pricing.json.
  2. public/ is a zero-dependency static site (HTML/CSS/JS) that loads pricing.json client-side and computes costs in-browser.
  3. GitHub Actions runs the fetch script daily (0 0 * * * UTC), commits updated pricing, and deploys to Cloudflare Pages.
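
The normalization step in item 1 can be illustrated with a minimal sketch. It assumes that upstream sources quote prices in different units (per token, per 1K tokens) and sometimes as strings; the helper name `toPerMillion` and the sample records are hypothetical and are not taken from fetch-pricing.mjs.

```javascript
// Hypothetical helper: convert a price quoted per `unitTokens` tokens into $/M tokens.
// Returns null when the raw value is missing or not a usable number.
function toPerMillion(price, unitTokens) {
  if (price === null || price === undefined || price === "") return null;
  const value = Number(price);
  if (!Number.isFinite(value) || value < 0) return null;
  return (value / unitTokens) * 1e6;
}

// Assumed raw shapes: one source quoting per token (as strings), one per 1K tokens.
const perToken = { input: "0.0000005", output: "0.0000015" };
const perThousand = { input: 0.0005, output: 0.0015 };

const normalized = [
  {
    input: toPerMillion(perToken.input, 1),
    output: toPerMillion(perToken.output, 1),
  },
  {
    input: toPerMillion(perThousand.input, 1000),
    output: toPerMillion(perThousand.output, 1000),
  },
];

console.log(normalized);
```

Both sample records describe the same offering, so after normalization they should agree at roughly $0.50/M input and $1.50/M output (up to floating-point rounding).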

Usage

  • Search by provider: Type a provider name (e.g. "deepinfra", "fireworks", "wafer") to filter results to that inference provider across all models.
  • Search by model: Type a model name (e.g. "glm", "kimi", "gpt-4o") to filter results to matching models across all providers.
  • Both together: Use both search fields simultaneously (AND filter).
  • Token input: Enter total tokens (in millions) and set the percentage breakdown across input, cached input, and output. The calculator computes costs per offering and sorts cheapest-first.
  • Promo badges: Discounted offerings show a "promo" badge with the discount percentage. Promo prices are temporary; regular (non-promotional) prices have no badge.
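
The dual-search behavior described above can be sketched as a single AND filter. This is an illustration only: it assumes case-insensitive substring matching and offering records with `provider` and `model` fields, neither of which is confirmed by the source; `filterOfferings` is a hypothetical name.

```javascript
// Hypothetical filter: an offering must match BOTH queries (AND filter).
// An empty query matches everything, so either field can be left blank.
function filterOfferings(offerings, providerQuery, modelQuery) {
  const p = providerQuery.trim().toLowerCase();
  const m = modelQuery.trim().toLowerCase();
  return offerings.filter(
    (o) =>
      o.provider.toLowerCase().includes(p) &&
      o.model.toLowerCase().includes(m)
  );
}

// Assumed sample data, for illustration only.
const sampleOfferings = [
  { provider: "DeepInfra", model: "glm-4.5" },
  { provider: "Fireworks", model: "glm-4.5" },
  { provider: "DeepInfra", model: "kimi-k2" },
];

console.log(filterOfferings(sampleOfferings, "deepinfra", "")); // provider only
console.log(filterOfferings(sampleOfferings, "", "glm")); // model only
console.log(filterOfferings(sampleOfferings, "deepinfra", "glm")); // both (AND)
```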

Token calculation

Costs are computed from a total token volume + percentage breakdown:

| Field | Default | Description |
| --- | --- | --- |
| Total tokens | 1000 (M) | Total tokens in millions (1000 = 1B tokens) |
| Input % | 2.5% | Tokens sent to the model |
| Cached input % | 97% | Cached prompt tokens (discounted input) |
| Output % | 0.5% | Tokens generated by the model |

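Example: 1000M tokens × 2.5% = 25M input tokens. With prices quoted in $/M tokens, input cost = 25 × (input price in $/M); in raw tokens, that is (25,000,000 × $/M) / 1e6. Cached-input and output costs are computed the same way and summed.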

Presets: Agentic (2.5/97/0.5), Balanced (30/50/20), Heavy output (10/0/90), No cache (70/0/30).
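
As a rough sketch of the calculation above, the function below turns a total volume (in millions) and a percentage split into a dollar cost, then sorts offerings cheapest-first. The names `computeCost`, `PRESETS`, and the price field names are assumptions for illustration. Falling back to the regular input price when an offering has no cached price is also an assumption, not confirmed behavior of app.js.

```javascript
// Preset splits from the README, as [input %, cached input %, output %].
const PRESETS = {
  agentic: { inputPct: 2.5, cachedPct: 97, outputPct: 0.5 },
  balanced: { inputPct: 30, cachedPct: 50, outputPct: 20 },
  heavyOutput: { inputPct: 10, cachedPct: 0, outputPct: 90 },
  noCache: { inputPct: 70, cachedPct: 0, outputPct: 30 },
};

// Hypothetical cost function. `prices` holds $/M-token rates.
function computeCost(totalMillions, split, prices) {
  const sum = split.inputPct + split.cachedPct + split.outputPct;
  if (Math.abs(sum - 100) > 1e-9) {
    throw new Error(`Percentages must sum to 100, got ${sum}`);
  }
  // Token volumes in millions, so volume × $/M gives dollars directly.
  const inputM = (totalMillions * split.inputPct) / 100;
  const cachedM = (totalMillions * split.cachedPct) / 100;
  const outputM = (totalMillions * split.outputPct) / 100;
  // Assumption: offerings without a cached rate bill cached tokens as input.
  const cachedPrice = prices.cachedInput ?? prices.input;
  return inputM * prices.input + cachedM * cachedPrice + outputM * prices.output;
}

// Assumed sample offerings with made-up prices.
const offers = [
  { name: "A", prices: { input: 0.5, cachedInput: 0.1, output: 1.5 } },
  { name: "B", prices: { input: 0.3, output: 1.2 } }, // no cached rate
];

const ranked = offers
  .map((o) => ({ name: o.name, cost: computeCost(1000, PRESETS.agentic, o.prices) }))
  .sort((a, b) => a.cost - b.cost);

console.log(ranked);
```

With the Agentic preset, offering A works out by hand to 25 × 0.5 + 970 × 0.1 + 5 × 1.5 = $117, while offering B costs more despite lower list prices because its 970M cached tokens are billed at the full input rate under the fallback assumption.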

Data sources

| Source | Tier | Description |
| --- | --- | --- |
| Direct providers | Tier 1 | DeepInfra, Crof, EmberCloud, Wafer, Synthetic, Lilac (fetched via their own /v1/models endpoints) |
| OpenRouter /endpoints | Tier 2 | De-aggregated per-backend pricing; each backend (Fireworks, Together, Novita, SiliconFlow, etc.) becomes its own row |
| CSV-sourced | Tier 3 | Hyper, Makora, Xiaomimimo (from data/manual-pricing.csv) |
| Hardcoded | Tier 3 | OpenCode Go (16 models with user-provided pricing) |

3-tier precedence: when the same (model, provider) pair appears in multiple tiers, the higher-authority tier wins: direct > OpenRouter > CSV/hardcoded. Quantization is not part of the dedup key, so the same model and provider at different quantizations collapse to one row.
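
The precedence rule can be sketched as a keyed merge. This is a minimal illustration, assuming each row carries `model`, `provider`, and a `tier` label; the tier labels, the `dedupe` name, and the lowercased key are hypothetical. When two rows share a tier (for example, two quantizations), this sketch keeps the first one seen, which is an arbitrary choice the source does not specify.

```javascript
// Lower rank = higher authority: direct > OpenRouter > CSV/hardcoded.
const TIER_RANK = { direct: 0, openrouter: 1, static: 2 };

// Hypothetical dedup: one row per (model, provider); quantization is ignored.
function dedupe(rows) {
  const best = new Map();
  for (const row of rows) {
    const key = `${row.model.toLowerCase()}::${row.provider.toLowerCase()}`;
    const current = best.get(key);
    if (!current || TIER_RANK[row.tier] < TIER_RANK[current.tier]) {
      best.set(key, row);
    }
  }
  return [...best.values()];
}

// Assumed sample rows with made-up prices.
const rows = [
  { model: "glm-4.5", provider: "DeepInfra", tier: "openrouter", quant: "fp8", input: 0.55 },
  { model: "glm-4.5", provider: "DeepInfra", tier: "direct", quant: "fp8", input: 0.5 },
  { model: "glm-4.5", provider: "Novita", tier: "openrouter", quant: "fp8", input: 0.6 },
  { model: "glm-4.5", provider: "Novita", tier: "openrouter", quant: "bf16", input: 0.7 },
];

console.log(dedupe(rows));
```

Here the DeepInfra pair collapses to the direct row, and the two Novita quantizations collapse to a single row because quantization is not part of the key.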

Total: ~892 text-generation models across ~75 inference providers and 60+ underlying orgs (Anthropic, OpenAI, Google, DeepSeek, Z.ai, Qwen, Meta, Mistral, etc.)

Only text-generation models are included. TTS, image generation, video generation, and embeddings are filtered out. Multimodal input (text+image→text) is allowed.
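
One way to express this filter is as a predicate over input and output modalities. This is a sketch only: the `inputModalities` and `outputModalities` field names are assumptions, as is treating a record with no modality data as text-only. The actual script may rely on different signals.

```javascript
// Hypothetical predicate: keep models whose only output is text and that accept text input.
// Image input alongside text (text+image→text) is allowed.
function isTextGeneration(model) {
  // Assumption: missing modality info means a plain text model.
  const inputs = model.inputModalities ?? ["text"];
  const outputs = model.outputModalities ?? ["text"];
  const textOnlyOutput = outputs.length === 1 && outputs[0] === "text";
  return textOnlyOutput && inputs.includes("text");
}

// Assumed sample records, for illustration only.
const candidates = [
  { id: "chat-model", inputModalities: ["text"], outputModalities: ["text"] },
  { id: "vision-chat", inputModalities: ["text", "image"], outputModalities: ["text"] },
  { id: "tts-model", inputModalities: ["text"], outputModalities: ["audio"] },
  { id: "image-gen", inputModalities: ["text"], outputModalities: ["image"] },
  { id: "embedder", inputModalities: ["text"], outputModalities: ["embeddings"] },
];

const kept = candidates.filter(isTextGeneration).map((m) => m.id);
console.log(kept);
```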

Development

```bash
# Fetch pricing data (~317 API calls, ~15-20s)
npm run fetch

# Serve locally
npm run serve
```

Requires Node ≥18 (uses native fetch). No dependencies.

Project structure

```
scripts/
  fetch-pricing.mjs          # 3-tier fetch + OpenRouter de-aggregation + org extraction + dedup
data/
  manual-pricing.csv         # Static pricing for CSV-sourced providers
public/
  index.html                 # UI: dual search, usage inputs, results table (8 columns)
  app.js                     # State, search, cost computation, rendering (promo badges)
  styles.css                 # Dark/light theme, promo-badge, header-row, responsive
  pricing.json               # Generated data (refreshed daily by CI)
.github/workflows/
  refresh-pricing.yml        # Daily cron: fetch → commit → deploy to Cloudflare
```

CI/CD

The refresh-pricing.yml workflow runs daily at 00:00 UTC:

  1. Fetches pricing from all sources (~317 API calls)
  2. Filters to text-generation models only
  3. Applies 3-tier dedup precedence
  4. Aborts if >20% of API calls fail or model count drops >15% vs previous run
  5. Commits pricing.json if changed
  6. Deploys public/ to Cloudflare Pages
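
The abort condition in step 4 might look roughly like the guard below. The thresholds come from the list above; the function name `abortReasons`, its argument shape, and treating a run with zero calls as a failure are assumptions made for this sketch.

```javascript
// Hypothetical sanity check: returns a list of reasons to abort (empty = safe to commit).
function abortReasons({ totalCalls, failedCalls, previousCount, currentCount }) {
  const reasons = [];
  // Assumption: a run that made no calls at all counts as fully failed.
  const failureRate = totalCalls > 0 ? failedCalls / totalCalls : 1;
  if (failureRate > 0.2) {
    reasons.push(`API failure rate ${(failureRate * 100).toFixed(1)}% exceeds 20%`);
  }
  // Only compare against a previous run when one exists.
  if (previousCount > 0) {
    const drop = (previousCount - currentCount) / previousCount;
    if (drop > 0.15) {
      reasons.push(`Model count dropped ${(drop * 100).toFixed(1)}% (limit 15%)`);
    }
  }
  return reasons;
}

const check = abortReasons({
  totalCalls: 317,
  failedCalls: 10,
  previousCount: 892,
  currentCount: 880,
});

if (check.length > 0) {
  console.error(check.join("\n"));
  process.exitCode = 1; // a non-zero exit fails the workflow step
} else {
  console.log("Sanity checks passed");
}
```

Exiting non-zero before the commit step would keep the previous pricing.json in place, so a bad upstream day does not overwrite known-good data.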

GitHub secrets required: CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID.

License

MIT
