Compare pay-as-you-go LLM inference pricing across inference providers. Enter your token volumes and find the cheapest option.
Live site: https://payg-inference-calculator.pages.dev
- `scripts/fetch-pricing.mjs` fetches pricing from 3 tiers: direct providers (DeepInfra, Crof, EmberCloud, Wafer, Synthetic, Lilac), OpenRouter's de-aggregated `/endpoints` API (per-backend pricing for Fireworks, Together, Novita, SiliconFlow, etc.), and CSV-sourced static providers (Hyper, Makora, Xiaomimimo, OpenCode Go). It normalizes all pricing to $/M tokens and writes `public/pricing.json`.
- `public/` is a zero-dependency static site (HTML/CSS/JS) that loads `pricing.json` client-side and computes costs in-browser.
- GitHub Actions runs the fetch script daily (`0 0 * * *` UTC), commits updated pricing, and deploys to Cloudflare Pages.
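
The normalization to $/M tokens can be pictured with a small helper. This is only a sketch, assuming an upstream API reports prices per token as decimal strings; the function name `toDollarsPerMillion` is hypothetical and not taken from `fetch-pricing.mjs`.

```javascript
// Hypothetical helper (not from the repo): convert a per-token USD price,
// given as a number or decimal string, into dollars per million tokens.
function toDollarsPerMillion(perTokenPrice) {
  // Missing values are "unknown", not "free".
  if (perTokenPrice === null || perTokenPrice === undefined || perTokenPrice === "") {
    return null;
  }
  const value = Number(perTokenPrice);
  if (!Number.isFinite(value) || value < 0) return null;
  // Round to 6 decimal places to hide binary floating-point noise.
  return Math.round(value * 1e6 * 1e6) / 1e6;
}

console.log(toDollarsPerMillion("0.0000003")); // 0.3
```

Returning `null` for missing or malformed input keeps an unknown price from being mistaken for a free tier.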
- Search by provider: Type a provider name (e.g. "deepinfra", "fireworks", "wafer") to filter results to that inference provider across all models.
- Search by model: Type a model name (e.g. "glm", "kimi", "gpt-4o") to filter results to matching models across all providers.
- Both together: Use both search fields simultaneously (AND filter).
- Token input: Enter total tokens (in millions) and set the percentage breakdown across input, cached input, and output. The calculator computes costs per offering and sorts cheapest-first.
- Promo badges: Discounted offerings show a "promo" badge with the discount percentage. These are temporary prices — structural prices have no badge.
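
The combined AND filter described above could look roughly like the following. It is a sketch only: the `provider` and `model` field names and the function `filterOfferings` are assumptions for illustration, not the actual shape of `pricing.json` or `app.js`.

```javascript
// Hypothetical sketch: case-insensitive substring match on both fields.
// An empty query matches everything, so either search box can be left blank.
function filterOfferings(offerings, providerQuery, modelQuery) {
  const p = providerQuery.trim().toLowerCase();
  const m = modelQuery.trim().toLowerCase();
  return offerings.filter(
    (o) => o.provider.toLowerCase().includes(p) && o.model.toLowerCase().includes(m)
  );
}

// Illustrative rows; names and values are made up for the example.
const sampleOfferings = [
  { provider: "DeepInfra", model: "glm-4.6" },
  { provider: "Fireworks", model: "glm-4.6" },
  { provider: "DeepInfra", model: "kimi-k2" },
];

console.log(filterOfferings(sampleOfferings, "deepinfra", "glm").length); // 1
```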
Costs are computed from a total token volume + percentage breakdown:
| Field | Default | Description |
|---|---|---|
| Total tokens | 1000 (M) | Total tokens in millions (1000 = 1B tokens) |
| Input % | 2.5% | Tokens sent to the model |
| Cached input % | 97% | Cached prompt tokens (discounted input) |
| Output % | 0.5% | Tokens generated by the model |
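Example: 1000M tokens × 2.5% = 25M input tokens. At an input price of p dollars per million tokens, those tokens cost 25 × p dollars (equivalently, 25,000,000 tokens × p / 1e6). The cost of an offering is the sum of its input, cached-input, and output costs.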
Presets: Agentic (2.5/97/0.5), Balanced (30/50/20), Heavy output (10/0/90), No cache (70/0/30).
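
The arithmetic above can be sketched as a single function. The names `computeCost` and `PRESETS` and the price field names (`input`, `cachedInput`, `output`, all in $/M tokens) are assumptions for illustration. Falling back to the regular input price when no cached price is listed is also an assumption, not confirmed behavior of `app.js`.

```javascript
// Preset splits from the list above: input % / cached input % / output %.
const PRESETS = {
  agentic: { inputPct: 2.5, cachedPct: 97, outputPct: 0.5 },
  balanced: { inputPct: 30, cachedPct: 50, outputPct: 20 },
  heavyOutput: { inputPct: 10, cachedPct: 0, outputPct: 90 },
  noCache: { inputPct: 70, cachedPct: 0, outputPct: 30 },
};

// Hypothetical sketch: total cost in dollars for one offering.
// totalMillions is the total token volume in millions (1000 = 1B tokens).
function computeCost(totalMillions, split, price) {
  const sum = split.inputPct + split.cachedPct + split.outputPct;
  if (Math.abs(sum - 100) > 1e-9) {
    throw new RangeError(`percentages must sum to 100, got ${sum}`);
  }
  const inputM = (totalMillions * split.inputPct) / 100;
  const cachedM = (totalMillions * split.cachedPct) / 100;
  const outputM = (totalMillions * split.outputPct) / 100;
  // Assumption: with no cached price, cached tokens bill at the input price.
  const cachedPrice = price.cachedInput ?? price.input;
  return inputM * price.input + cachedM * cachedPrice + outputM * price.output;
}

// Illustrative prices: 25 x 1 + 970 x 0.1 + 5 x 4, roughly 142 dollars.
const cost = computeCost(1000, PRESETS.agentic, { input: 1, cachedInput: 0.1, output: 4 });
console.log(cost.toFixed(2));
```

Sorting cheapest-first then amounts to computing this value for each offering and sorting ascending on it.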
| Source | Tier | Description |
|---|---|---|
| Direct providers | Tier 1 | DeepInfra, Crof, EmberCloud, Wafer, Synthetic, Lilac — fetched via their own /v1/models endpoints |
| OpenRouter `/endpoints` | Tier 2 | De-aggregated per-backend pricing: each backend (Fireworks, Together, Novita, SiliconFlow, etc.) becomes its own row |
| CSV-sourced | Tier 3 | Hyper, Makora, Xiaomimimo (from data/manual-pricing.csv) |
| Hardcoded | Tier 3 | OpenCode Go (16 models with user-provided pricing) |
3-tier precedence: when the same (model, provider) appears in multiple tiers, the higher-authority tier wins — direct > OpenRouter > CSV/hardcoded. Quantization is not part of the dedup key — same model+provider at different quants collapses to one row.
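
One way to picture the precedence rule is a keyed map in which a row only replaces an existing entry when it comes from a higher-authority tier. This is a sketch under assumptions: the tier labels (`direct`, `openrouter`, `static`), the row fields, and `dedupeByPrecedence` are hypothetical names, and case-insensitive key matching is a guess rather than documented behavior.

```javascript
// Lower rank = higher authority: direct > OpenRouter > CSV/hardcoded.
const TIER_RANK = { direct: 0, openrouter: 1, static: 2 };

// Hypothetical sketch: keep one row per (model, provider) pair.
// Quantization is deliberately left out of the key, as described above.
function dedupeByPrecedence(rows) {
  const best = new Map();
  for (const row of rows) {
    const key = `${row.model.toLowerCase()}::${row.provider.toLowerCase()}`;
    const current = best.get(key);
    if (!current || TIER_RANK[row.tier] < TIER_RANK[current.tier]) {
      best.set(key, row);
    }
  }
  return [...best.values()];
}

// Illustrative rows; prices and quants are made up for the example.
const deduped = dedupeByPrecedence([
  { model: "glm-4.6", provider: "DeepInfra", tier: "openrouter", quant: "fp8", input: 0.5 },
  { model: "glm-4.6", provider: "DeepInfra", tier: "direct", quant: "fp4", input: 0.4 },
  { model: "glm-4.6", provider: "Fireworks", tier: "openrouter", quant: "fp8", input: 0.55 },
]);
console.log(deduped.map((r) => `${r.provider}:${r.tier}`)); // [ 'DeepInfra:direct', 'Fireworks:openrouter' ]
```

Within a single tier the first row seen wins in this sketch, which is how two quants of the same model and provider end up as one row.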
Total: ~892 text-generation models across ~75 inference providers and 60+ underlying orgs (Anthropic, OpenAI, Google, DeepSeek, Z.ai, Qwen, Meta, Mistral, etc.)
Only text-generation models are included. TTS, image generation, video generation, and embeddings are filtered out. Multimodal input (text+image→text) is allowed.
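
A filter along these lines could implement that rule. It is a sketch that assumes each model record carries hypothetical `inputModalities` and `outputModalities` arrays. Real provider metadata varies, and the actual script may classify models differently.

```javascript
// Hypothetical sketch: keep a model only if it emits text and nothing else,
// and accepts text as one of its inputs (image input alongside text is fine).
function isTextGeneration(model) {
  const outputs = model.outputModalities ?? [];
  const inputs = model.inputModalities ?? ["text"]; // assumption: default to text-only input
  return outputs.length === 1 && outputs[0] === "text" && inputs.includes("text");
}

// Illustrative records; ids and modality labels are made up for the example.
const candidates = [
  { id: "chat-model", inputModalities: ["text"], outputModalities: ["text"] },
  { id: "vision-chat", inputModalities: ["text", "image"], outputModalities: ["text"] },
  { id: "image-gen", inputModalities: ["text"], outputModalities: ["image"] },
  { id: "tts", inputModalities: ["text"], outputModalities: ["audio"] },
  { id: "embedder", inputModalities: ["text"], outputModalities: ["embedding"] },
];
console.log(candidates.filter(isTextGeneration).map((m) => m.id)); // [ 'chat-model', 'vision-chat' ]
```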

    # Fetch pricing data (~317 API calls, ~15-20s)
    npm run fetch

    # Serve locally
    npm run serve

Requires Node ≥18 (uses native `fetch`). No dependencies.

    scripts/
      fetch-pricing.mjs       # 3-tier fetch + OpenRouter de-aggregation + org extraction + dedup
    data/
      manual-pricing.csv      # Static pricing for CSV-sourced providers
    public/
      index.html              # UI: dual search, usage inputs, results table (8 columns)
      app.js                  # State, search, cost computation, rendering (promo badges)
      styles.css              # Dark/light theme, promo-badge, header-row, responsive
      pricing.json            # Generated data (refreshed daily by CI)
    .github/workflows/
      refresh-pricing.yml     # Daily cron: fetch → commit → deploy to Cloudflare

The refresh-pricing.yml workflow runs daily at 00:00 UTC:
- Fetches pricing from all sources (~317 API calls)
- Filters to text-generation models only
- Applies 3-tier dedup precedence
- Aborts if >20% of API calls fail or model count drops >15% vs previous run
- Commits `pricing.json` if changed
- Deploys `public/` to Cloudflare Pages
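
The abort conditions in the steps above can be sketched as a small health check. The function name `checkRefreshHealth` and its parameter names are hypothetical; only the 20% and 15% thresholds come from the workflow description.

```javascript
// Hypothetical sketch of the safety gate: refuse to publish when too many
// API calls failed or the model count shrank sharply versus the previous run.
function checkRefreshHealth({ totalCalls, failedCalls, previousModelCount, currentModelCount }) {
  // With no calls recorded, treat the run as fully failed.
  const failureRate = totalCalls > 0 ? failedCalls / totalCalls : 1;
  // With no previous run, there is nothing to compare against.
  const drop =
    previousModelCount > 0 ? (previousModelCount - currentModelCount) / previousModelCount : 0;

  const reasons = [];
  if (failureRate > 0.2) {
    reasons.push(`API failure rate ${(failureRate * 100).toFixed(1)}% exceeds 20%`);
  }
  if (drop > 0.15) {
    reasons.push(`model count dropped ${(drop * 100).toFixed(1)}% (limit 15%)`);
  }
  return { ok: reasons.length === 0, reasons };
}

// Illustrative run: 70 of 317 calls failed, so the refresh should abort.
const health = checkRefreshHealth({
  totalCalls: 317,
  failedCalls: 70,
  previousModelCount: 892,
  currentModelCount: 880,
});
console.log(health.ok, health.reasons);
```

In a CI script, a failed check would typically end with a non-zero exit code so the commit and deploy steps never run.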
GitHub secrets required: CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID.
MIT