A Claude Code skill that resolves "which model should I use?" to a real, current answer. Joins the Artificial Analysis leaderboard (520+ models, intelligence/cost/benchmarks) with the OpenRouter catalog (slug availability, :free tier reality) into a single queryable dataset your agent can reason over. Refreshed daily.
/plugin marketplace add ariobarin/which-llm
/plugin install which-llm@which-llm
Auto-updates when this repo ships a new version. Requires Python 3.10+ and uv.
Direct install without the plugin system
git clone https://github.com/ariobarin/which-llm /tmp/which-llm
cp -r /tmp/which-llm/plugins/which-llm/skills/which-llm ~/.claude/skills/which-llm

$ uv run python query.py models --intel-min 50 --max-cost 500 --modality text,image --top 5

| slug | name | creator | intel | idx-run$ | ctx | free | openrouter |
|---|---|---|---|---|---|---|---|
| deepseek-v4-pro | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 51.5 | $267.82 | 1000000 | | deepseek/deepseek-v4-pro |
| grok-4-3 | Grok 4.3 (high) | xAI | 53.2 | $395.17 | 1000000 | | x-ai/grok-4.3 |
| mimo-v2-5-pro | MiMo-V2.5-Pro | Xiaomi | 53.8 | $461.59 | 1000000 | | xiaomi/mimo-v2.5-pro |
idx-run$ = USD to run AA's full benchmark suite once on the model — a relative inference-cost proxy, not a per-call price. For actual API pricing, use price_1m_input_tokens / price_1m_output_tokens.
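To make the distinction concrete, here is a minimal sketch of how a per-call price could be estimated from the two per-million-token columns. The function name and the example prices are hypothetical and are not taken from the dataset; only the column names `price_1m_input_tokens` and `price_1m_output_tokens` come from the text above.

```python
def call_cost_usd(input_tokens, output_tokens,
                  price_1m_input_tokens, price_1m_output_tokens):
    """Estimate the USD cost of one API call from per-million-token prices."""
    total = (input_tokens * price_1m_input_tokens
             + output_tokens * price_1m_output_tokens)
    return total / 1_000_000


# Hypothetical prices, for illustration only (not real dataset values).
cost = call_cost_usd(12_000, 800,
                     price_1m_input_tokens=2.0,
                     price_1m_output_tokens=8.0)
print(f"${cost:.4f}")
```

Unlike `idx-run$`, this figure scales with the size of your own prompts and responses, so it is the one to use when budgeting a workload.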
⚠ About `:free` OpenRouter slugs: These aren't "the free version of the model". They're community / promotional endpoints (often via Chutes or similar) with aggressive rate limits, daily caps, and sometimes different quantization than the paid listing. Great for prototyping; don't wire them into production without testing throughput against your real load.
Trigger phrases that activate the skill:
- "I need a vision model under $500 with reasoning. What are my options?"
- "Is there a free version of DeepSeek V4 Flash on OpenRouter?"
- "Cheapest model with intelligence > 50?"
- "Compare GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro."
Under the hood the agent runs short query.py commands and reasons over the output.
Three verbs, one consistent table schema.
| Command | Use |
|---|---|
| `query.py models [<pattern>] [filters]` | Filter / rank / list models. Default: top 20 by intel. |
| `query.py show <slug>` | Full per-model profile (benchmarks, pricing, OR slugs, modalities). Accepts fuzzy slug if unambiguous. |
| `query.py data status` | Data freshness, model count, OpenRouter enrichment status |
| `query.py data refresh` | Re-scrape AA + cross-reference OR (~10s) |
models flags: --top N, --sort intel|cost|ctx, --pareto, --free, --intel-min N, --max-cost N, --min-cost N, --context-min N, --modality text,image,audio,video, --reasoning/--no-reasoning, --open-weights/--no-open-weights, --json.
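As an illustration of how these flags compose, the sketch below assembles a `query.py models` invocation from keyword arguments. The helper `build_models_cmd` is hypothetical and not part of this repo; it only uses flags listed above and covers a subset of them.

```python
import shlex


def build_models_cmd(pattern=None, *, intel_min=None, max_cost=None,
                     modality=None, top=None, sort=None,
                     free=False, pareto=False, as_json=False):
    """Build an argv list for `query.py models` (hypothetical helper)."""
    cmd = ["uv", "run", "python", "query.py", "models"]
    if pattern:
        cmd.append(pattern)
    if intel_min is not None:
        cmd += ["--intel-min", str(intel_min)]
    if max_cost is not None:
        cmd += ["--max-cost", str(max_cost)]
    if modality:
        # Modalities are passed as one comma-separated value.
        cmd += ["--modality", ",".join(modality)]
    if top is not None:
        cmd += ["--top", str(top)]
    if sort:
        cmd += ["--sort", sort]
    # Boolean switches take no value.
    for enabled, flag in ((free, "--free"), (pareto, "--pareto"),
                          (as_json, "--json")):
        if enabled:
            cmd.append(flag)
    return cmd


cmd = build_models_cmd(intel_min=50, max_cost=500,
                       modality=["text", "image"], top=5)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` from the skill directory. The structure of the `--json` output is not documented here, so inspect it before parsing.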
plot_pareto.py renders the Intelligence-vs-Cost Pareto chart as a PNG for visual exploration.
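For readers unfamiliar with the term, a model sits on the Pareto frontier when no other model is both cheaper and smarter. The sketch below shows one common way to compute such a frontier; it is not necessarily how `plot_pareto.py` or `--pareto` is implemented, and the dictionary keys `intel` and `cost` are assumed names. The first three rows reuse the example output above; the fourth is a made-up entry added to show a dominated model being dropped.

```python
def pareto_frontier(models):
    """Return the models that no other model beats on both cost and intel."""
    frontier = []
    best_intel = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, smartest first.
    for m in sorted(models, key=lambda m: (m["cost"], -m["intel"])):
        # Keep a model only if it is smarter than everything cheaper.
        if m["intel"] > best_intel:
            frontier.append(m)
            best_intel = m["intel"]
    return frontier


models = [
    {"slug": "deepseek-v4-pro", "intel": 51.5, "cost": 267.82},
    {"slug": "grok-4-3", "intel": 53.2, "cost": 395.17},
    {"slug": "mimo-v2-5-pro", "intel": 53.8, "cost": 461.59},
    # Hypothetical entry: costs more than deepseek-v4-pro but scores lower.
    {"slug": "hypothetical-model", "intel": 50.0, "cost": 400.00},
]
frontier_slugs = [m["slug"] for m in pareto_frontier(models)]
print(frontier_slugs)
```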
1. `scrape.py` fetches artificialanalysis.ai/models (an 8 MB HTML page) and parses the Next.js RSC payload, extracting every model object with its full schema: 60+ fields including individual benchmarks, pricing tiers, modality flags, context window, and reasoning capability.
2. `enrich.py` fetches the OpenRouter catalog and matches each AA model against it by name, with a token-multiset fallback for word-order differences. The current match rate is ~51%; the rest are mostly models OpenRouter doesn't carry.
3. `query.py` reads the merged CSV and answers structured questions.
4. A daily GitHub Action re-runs steps 1-2 and commits any changes, so the shipped snapshot is rarely more than 24h stale.
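The name-matching step in `enrich.py` can be pictured roughly as follows. This is a simplified sketch of the general idea (normalized exact match first, then a token-multiset comparison that ignores word order), not the repo's actual code. The function names and the catalog entries are hypothetical.

```python
import re
from collections import Counter


def _tokens(name):
    """Lowercase a name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def match_name(aa_name, or_names):
    """Return the first catalog name matching aa_name, or None."""
    target = _tokens(aa_name)
    # Pass 1: same tokens in the same order.
    for candidate in or_names:
        if _tokens(candidate) == target:
            return candidate
    # Pass 2: same tokens with the same counts, in any order.
    target_bag = Counter(target)
    for candidate in or_names:
        if Counter(_tokens(candidate)) == target_bag:
            return candidate
    return None


# Hypothetical catalog names, chosen to exercise both passes.
catalog = ["Grok 4.3", "Pro DeepSeek V4"]
print(match_name("grok-4.3", catalog))
print(match_name("DeepSeek V4 Pro", catalog))
print(match_name("MiMo-V2.5-Pro", catalog))
```

Comparing multisets rather than sets keeps repeated tokens significant, so a name containing the same token twice does not match one containing it once.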
No API keys, no auth, no rate-limited services — just public pages.
| File | Contents |
|---|---|
| `artifacts/models_enriched.csv` | The full merged dataset (60+ columns per row) |
| `artifacts/models.json` | Original AA fields, preserved exactly |
| `artifacts/openrouter.json` | Raw OpenRouter catalog |
What this dataset does not cover:
- Benchmarks AA doesn't track (domain-specific evals).
- Models too new for AA to have indexed (sometimes less than a week after release).
- Authoritative per-API-call pricing on non-OpenRouter providers. Verify directly with that provider.
MIT. See LICENSE.
Data from Artificial Analysis and OpenRouter. Scrapes only public pages, no credentials required.