ariobarin/which-llm

which-llm


A Claude Code skill that resolves "which model should I use?" to a real, current answer. Joins the Artificial Analysis leaderboard (520+ models, intelligence/cost/benchmarks) with the OpenRouter catalog (slug availability, :free tier reality) into a single queryable dataset your agent can reason over. Refreshed daily.

Install

/plugin marketplace add ariobarin/which-llm
/plugin install which-llm@which-llm

Auto-updates when this repo ships a new version. Requires Python 3.10+ and uv.

Direct install without the plugin system
git clone https://github.com/ariobarin/which-llm /tmp/which-llm
cp -r /tmp/which-llm/plugins/which-llm/skills/which-llm ~/.claude/skills/which-llm

Example output

$ uv run python query.py models --intel-min 50 --max-cost 500 --modality text,image --top 5

slug                  name                                     creator   intel  idx-run$  ctx      free  openrouter
--------------------  ---------------------------------------  --------  -----  --------  -------  ----  --------------------------
deepseek-v4-pro       DeepSeek V4 Pro (Reasoning, Max Effort)  DeepSeek  51.5   $267.82   1000000        deepseek/deepseek-v4-pro
grok-4-3              Grok 4.3 (high)                          xAI       53.2   $395.17   1000000        x-ai/grok-4.3
mimo-v2-5-pro         MiMo-V2.5-Pro                            Xiaomi    53.8   $461.59   1000000        xiaomi/mimo-v2.5-pro

idx-run$ = USD to run AA's full benchmark suite once on the model — a relative inference-cost proxy, not a per-call price. For actual API pricing, use price_1m_input_tokens / price_1m_output_tokens.
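To make the distinction concrete, here is a minimal sketch of estimating a per-call price from the two pricing fields. It assumes that price_1m_input_tokens and price_1m_output_tokens are USD per one million tokens, as their names suggest; the function name and the example prices are hypothetical and not taken from the dataset.

```python
def call_cost_usd(price_1m_input_tokens: float,
                  price_1m_output_tokens: float,
                  input_tokens: int,
                  output_tokens: int) -> float:
    """Estimate the USD cost of one API call.

    Assumes both prices are quoted in USD per 1,000,000 tokens.
    """
    total = (input_tokens * price_1m_input_tokens
             + output_tokens * price_1m_output_tokens)
    return total / 1_000_000


# Hypothetical prices, for illustration only (not read from the dataset).
cost = call_cost_usd(2.0, 8.0, input_tokens=12_000, output_tokens=1_500)
print(f"${cost:.4f}")  # prints $0.0360
```

The idx-run$ column is still useful for ranking models against each other, but only a calculation like this one, using your own token counts, approximates what a given workload would cost.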

About :free OpenRouter slugs: These aren't "the free version of the model" — they're community / promotional endpoints (often via Chutes or similar) with aggressive rate limits, daily caps, and sometimes different quantization than the paid listing. Great for prototyping; don't wire them into production without testing throughput against your real load.

What your agent will do with it

Trigger phrases that activate the skill:

  • "I need a vision model under $500 with reasoning. What are my options?"
  • "Is there a free version of DeepSeek V4 Flash on OpenRouter?"
  • "Cheapest model with intelligence > 50?"
  • "Compare GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro."

Under the hood the agent runs short query.py commands and reasons over the output.

Commands

Three verbs, one consistent table schema.

Command                                Use
-------------------------------------  ------------------------------------------------------------
query.py models [<pattern>] [filters]  Filter / rank / list models. Default: top 20 by intel.
query.py show <slug>                   Full per-model profile (benchmarks, pricing, OR slugs, modalities). Accepts a fuzzy slug if unambiguous.
query.py data status                   Data freshness, model count, OpenRouter enrichment status.
query.py data refresh                  Re-scrape AA + cross-reference OR (~10s).

models flags: --top N, --sort intel|cost|ctx, --pareto, --free, --intel-min N, --max-cost N, --min-cost N, --context-min N, --modality text,image,audio,video, --reasoning/--no-reasoning, --open-weights/--no-open-weights, --json.
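As an illustration of how a caller might assemble these flags programmatically, the sketch below builds an argument list for the models command. The helper build_models_cmd and its parameter names are hypothetical and not part of the repository; only the flag names and the `uv run python query.py` form come from this README, and only a subset of the flags is covered.

```python
def build_models_cmd(pattern=None, *, intel_min=None, max_cost=None,
                     modality=None, top=None, sort=None,
                     free=False, as_json=False):
    """Build an argv list for `query.py models` from optional filters."""
    cmd = ["uv", "run", "python", "query.py", "models"]
    if pattern:
        cmd.append(pattern)
    if intel_min is not None:
        cmd += ["--intel-min", str(intel_min)]
    if max_cost is not None:
        cmd += ["--max-cost", str(max_cost)]
    if modality:
        # The CLI takes modalities as one comma-separated value.
        cmd += ["--modality", ",".join(modality)]
    if top is not None:
        cmd += ["--top", str(top)]
    if sort is not None:
        cmd += ["--sort", sort]
    if free:
        cmd.append("--free")
    if as_json:
        cmd.append("--json")
    return cmd


print(" ".join(build_models_cmd(intel_min=50, max_cost=500,
                                modality=["text", "image"], top=5)))
```

The printed command matches the one under "Example output". The resulting list can be passed to subprocess.run from the directory that contains query.py.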

plot_pareto.py renders the Intelligence-vs-Cost Pareto chart as a PNG for visual exploration.
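For readers unfamiliar with the term, a model is on the Intelligence-vs-Cost Pareto frontier when no other model is both at least as cheap and at least as intelligent. The sketch below shows one common way to compute such a frontier; it is not necessarily how query.py or plot_pareto.py implement it. The three real rows are copied from the example output above, and the fourth row is a made-up dominated model.

```python
def pareto_frontier(models):
    """Return the models not dominated on (lower cost, higher intel).

    `models` is an iterable of (slug, cost, intel) tuples.
    The result is sorted by ascending cost.
    """
    frontier = []
    best_intel = float("-inf")
    # Visit cheapest first; on equal cost, the smarter model comes first.
    for slug, cost, intel in sorted(models, key=lambda m: (m[1], -m[2])):
        # Keep a model only if it beats every cheaper model on intel.
        if intel > best_intel:
            frontier.append((slug, cost, intel))
            best_intel = intel
    return frontier


rows = [
    ("deepseek-v4-pro", 267.82, 51.5),
    ("grok-4-3", 395.17, 53.2),
    ("mimo-v2-5-pro", 461.59, 53.8),
    ("example-dominated", 420.00, 52.0),  # hypothetical: costs more than grok-4-3, scores lower
]
for slug, cost, intel in pareto_frontier(rows):
    print(f"{slug:20s} ${cost:8.2f}  {intel}")
```

Here the three real models all stay on the frontier, because each step up in cost buys more intelligence, while the hypothetical fourth row is dropped.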

How it works

  1. scrape.py fetches artificialanalysis.ai/models (an 8 MB HTML page) and parses the Next.js RSC payload, extracting every model object with its full schema — 60+ fields including individual benchmarks, pricing tiers, modality flags, context window, reasoning capability.
  2. enrich.py fetches the OpenRouter catalog and matches each AA model against it by name, with token-multiset fallback for word-order differences. Current match rate ~51% — the rest are mostly models OpenRouter doesn't carry.
  3. query.py reads the merged CSV and answers structured questions.
  4. A daily GitHub Action re-runs steps 1-2 and commits any changes, so the shipped snapshot is rarely more than 24h stale.

No API keys, no auth, no rate-limited services — just public pages.

Data files

File                           Contents
-----------------------------  ---------------------------------------------
artifacts/models_enriched.csv  The full merged dataset (60+ columns per row)
artifacts/models.json          Original AA fields, preserved exactly
artifacts/openrouter.json      Raw OpenRouter catalog
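If you want to read the merged CSV without going through query.py, a minimal sketch using only the standard library could look like the following. The helper names are hypothetical, and the exact column names are an assumption: price_1m_input_tokens is mentioned in this README, but inspect the CSV header before relying on any column.

```python
import csv


def load_models(path="artifacts/models_enriched.csv"):
    """Read the merged dataset into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def cheapest_by(rows, column):
    """Return rows with a numeric value in `column`, lowest value first.

    Rows where the column is missing, empty, or non-numeric are skipped.
    """
    priced = []
    for row in rows:
        try:
            priced.append((float(row[column]), row))
        except (KeyError, TypeError, ValueError):
            continue
    # Sort on the number only, so dict rows are never compared to each other.
    return [row for _, row in sorted(priced, key=lambda pair: pair[0])]


# Example usage, run from the repository root:
#   rows = load_models()
#   print(sorted(rows[0].keys()))   # discover the real column names first
#   for row in cheapest_by(rows, "price_1m_input_tokens")[:5]:
#       print(row)
```

csv.DictReader returns every value as a string, which is why the sketch converts to float explicitly and skips rows that do not parse.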

When NOT to use

  • Benchmarks AA doesn't track (domain-specific evals).
  • Models too new for AA to have indexed (sometimes this takes up to a week after release).
  • Authoritative per-call pricing from a provider other than OpenRouter. Verify directly with that provider.

License

MIT. See LICENSE.

Credits

Data from Artificial Analysis and OpenRouter. Scrapes only public pages, no credentials required.
