The open public benchmark for AI agent skills.
Versuz indexes every public Claude skill and CLAUDE.md on GitHub, runs them through 5 held-out tasks, lets three frontier LLMs judge the outputs, and publishes a ranking based on actual performance, not stars.
Free · Open · Updated daily
🌐 versuz.dev · 📦 npx versuz · 🔌 @versuz/mcp · 📚 Methodology
In May 2026, the Claude Code ecosystem already lists 4,200+ skills on claudemarketplaces.com and 1.2M+ on skillsmp.com. Anthropic has shipped its own enterprise marketplace. None of them answer the only question a developer actually asks:
"Of the 47 skills claiming to do task X, which one actually works?"
Existing directories rank by stars and installs — popularity, not quality. Versuz benchmarks the outputs. Three frontier judges grade every skill against the same task suite, and the winners rank #1 in their category.
It's LMArena, but for agent skills.
| | Count (May 2026) | Auto-discovered from |
|---|---|---|
| SKILL.md files | ~2,590 | GitHub Code Search · Sourcegraph · 14 awesome-* lists · 26 GitHub Topics |
| CLAUDE.md files | ~3,474 | Same pipeline |
| Quality-judged | ~714 (and growing) | LLM 5-axis judge (Groq Llama / Gemini / OpenRouter) |
| Bench-judged Elo | Active (cycle #21 live) | Haiku 4.5 + DeepSeek V4 Flash + GPT-5 mini via OpenRouter |
| Categories | 6 skill + 8 CLAUDE.md | document · sql · data · web · shell · code · nextjs · react · python-data · backend-api · mobile · devops · ml-training · generic |
| Official orgs flagged | ~30 | anthropics, openai, google, vercel, stripe, supabase, etc. |
- Free — scraped from public sources, verified progressively (5-level trust ladder: claimed → verified → reviewed → featured)
- Premium — author-listed, fixed price, Versuz takes 30% / author keeps 70% via Stripe Connect Express
- Featured — Versuz first-party curation, 100% Versuz
versuz.dev/marketplace — full registry with filters (kind, category, tier, trust level, quality, tokens, official, topic).
```
npx versuz                        # interactive mode
npx versuz search pdf             # cross-kind full-text search
npx versuz install <slug>         # download to .claude/skills/<slug>/
npx versuz login                  # auth with GitHub PAT
npx versuz submit <github-url>    # share your skill (own repo only)
```

```
claude mcp add versuz npx -y @versuz/mcp
```

Claude Code now has 5 native tools: `versuz_search`, `versuz_list_skills`, `versuz_list_claude_md`, `versuz_get`, `versuz_install`. Ask "find me a PDF skill and install it" — Claude searches, inspects, downloads, all inline.
- `GET /api/v1/skills` · `GET /api/v1/skills/<slug>` · `GET /api/v1/skills/<slug>/content`
- `GET /api/v1/claude-md` · same shape
- `GET /feed/skills` · `GET /feed/claude-md` — RSS 2.0
- `GET /sitemap.xml`
- Per-item embed badges: `/badge/<kind>/<slug>` (SVG)
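Any HTTP client can query the registry endpoints above. A minimal Node sketch — the base path comes from the endpoint list, but the `category` and `kind` query parameter names are assumptions mirroring the marketplace filters, not documented API params:

```javascript
// Build a registry URL with optional filters. The base path is from the
// endpoint list above; the query parameter names are assumptions.
function skillsUrl({ category, kind } = {}) {
  const url = new URL("https://versuz.dev/api/v1/skills");
  if (category) url.searchParams.set("category", category);
  if (kind) url.searchParams.set("kind", kind);
  return url.toString();
}

// Usage (Node 18+ ships a global fetch):
// const res = await fetch(skillsUrl({ category: "sql" }));
// const skills = await res.json();
```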
- Discover — multi-source scrape (GitHub Code Search + Sourcegraph + awesome-* aggregators) with multi-token rotation, content-hash dedup, official-org auto-flag.
- Score — LLM 5-axis quality judge (clarity, specificity, completeness, structure, usefulness) on every item. Mean 67/100, target distribution N(65, 12).
- Benchmark — for each category, every skill runs through 5 held-out tasks (N=5, a statistically validated sweet spot). Output dedup by content hash so identical runs are reused.
- Judge — 3 frontier LLMs grade each output independently. Judge dedup by output hash. Prompt cache hits ~82%, cutting input cost by ~57%.
- Aggregate — Bayesian Elo per category. Refreshed every 15 min via Vercel cron.
Full methodology: versuz.dev/methodology.
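For illustration, here is a plain Elo update; the Aggregate step uses a Bayesian variant whose exact priors and update rule aren't documented in this README:

```javascript
// Standard Elo: expected win probability, then a rating update scaled by K.
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

function updateElo(ratingA, ratingB, scoreA, K = 32) {
  // scoreA: 1 = A wins, 0.5 = draw, 0 = A loses
  const delta = K * (scoreA - expectedScore(ratingA, ratingB));
  return [ratingA + delta, ratingB - delta];
}
```

With equal ratings the expected score is 0.5, so a win moves each side by K/2.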
| Layer | Tech |
|---|---|
| Frontend | Next.js 16 (App Router · Turbopack · React Compiler) · React 19 · JavaScript |
| Styling | Tailwind v4 (@theme inline) · Instrument Serif + Geist + JetBrains Mono |
| Backend | Next.js API routes · Postgres via Supabase · service-role admin client |
| Storage | Supabase Storage (premium-content bucket, private + signed URLs) |
| Hosting | Vercel · 6 daily/weekly crons |
| Auth | Supabase Auth + GitHub OAuth |
| Payments | Stripe Connect Express + destination charges (70/30) |
| Email | Resend SMTP (welcome + branded receipts) |
| Bench | Pure Node in scripts/bench/ · OpenRouter (1 key, 200+ models) · multi-tier modes (dev / v1 / or-v1 / prod / gold) |
| CLI | Pure Node + figlet + chalk + cli-table3 + ora + prompts |
| MCP | @modelcontextprotocol/sdk |
```
git clone https://github.com/TomaTV/versuz
cd versuz
npm install
cp .env.local.example .env.local   # fill Supabase, Stripe, Resend, GitHub PATs
npm run dev                        # http://localhost:3000
```

| Command | What it does |
|---|---|
| `npm run dev` | Dev server (Turbopack) |
| `npm run build` / `start` | Prod build & run |
| `npm run scrape:skills` | Scrape SKILL.md via GitHub Code Search |
| `npm run scrape:claude-md` | Same for CLAUDE.md |
| `npm run scrape:aggregators` | 14 awesome-* + 8 GitHub Topics |
| `npm run scrape:codesearch` | Sourcegraph stream API |
| `npm run scrape:max` | Exhaustive mode (40 sub-queries × 2 kinds) |
| `npm run scrape:all` | All scrapers chained |
| `npm run bench` | Run a queued cycle (agent + judge + refresh) |
| `npm run bench:quality` | LLM quality judge on un-judged items |
| `npm run seed` | Push fixtures to Supabase |
```
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=
GITHUB_TOKENS=ghp_xxx,ghp_yyy,ghp_zzz   # multi-token rotation
OPENROUTER_API_KEY=                     # bench mode or-v1
STRIPE_SECRET_KEY=
STRIPE_WEBHOOK_SECRET=
RESEND_API_KEY=
ADMIN_GITHUB_LOGINS=YourGitHubLogin
```
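The `GITHUB_TOKENS` list feeds the scrapers' multi-token rotation. A minimal round-robin sketch — the real rotation presumably also reacts to rate-limit responses, which this omits:

```javascript
// Parse the comma-separated PAT list and hand tokens out round-robin,
// so no single token exhausts its GitHub rate limit alone.
function makeTokenRotator(envValue) {
  const tokens = envValue.split(",").map((t) => t.trim()).filter(Boolean);
  let i = 0;
  return () => tokens[i++ % tokens.length];
}

// const next = makeTokenRotator(process.env.GITHUB_TOKENS);
// fetch(url, { headers: { Authorization: `Bearer ${next()}` } });
```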
versuz/
├── src/ # Next.js app + lib + components
│ ├── app/ # App Router pages + /api routes
│ ├── components/ # Brand, site, marketplace, motion, …
│ └── lib/ # queries, auth, stripe, supabase, …
├── scripts/ # Node scrapers + bench engine
│ ├── scrape/ # GitHub Code Search (SKILL.md)
│ ├── scrape-claude-md/ # GitHub Code Search (CLAUDE.md)
│ ├── scrape-aggregators/ # 14 awesome-* + 8 topics
│ ├── scrape-codesearch/ # Sourcegraph adapter
│ └── bench/ # Agent + judge + cycle orchestrator
├── supabase/migrations/ # 22 SQL migrations
├── cli/ # `npx versuz` CLI package
├── mcp-server/ # `@versuz/mcp` MCP server package
├── docs/ # Go-live checklists (stripe, domain, cli-mcp)
├── .ui/ # Design reference (Claude Design v1)
├── CONTEXT.md # Long-form project context
├── CLAUDE.md # Agent working agreement
└── TODO.md # Task list
Mode `or-v1` (default): Haiku 4.5 + DeepSeek V4 Flash + GPT-5 mini via OpenRouter. One key, one dashboard.
| Scale | Outputs | Judge scores | Cost (with ~82% cache hit) |
|---|---|---|---|
| 100 skills × 5 tasks | 500 | 1,500 | ~$1.60 |
| 1,000 skills × 5 tasks | 5,000 | 15,000 | ~$16 |
| 5,000 skills × 5 tasks | 25,000 | 75,000 | ~$80 |
| Full catalog (~5,200) | 26,000 | 78,000 | ~$83 |
Why some judges cost more: DeepSeek V4 Flash emits internal reasoning tokens (chain-of-thought) before the JSON score. These invisible tokens are billed but not shown. A cap of 900 output tokens keeps reasoning concise without truncating the JSON. Override via `BENCH_JUDGE_MAX_TOKENS`.
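The volume columns in the table above follow directly from outputs = skills × tasks and judge scores = outputs × 3 judges:

```javascript
// 5 held-out tasks per skill, 3 independent judges per output.
const TASKS = 5;
const JUDGES = 3;

function benchVolume(skillCount) {
  const outputs = skillCount * TASKS;
  return { outputs, judgeScores: outputs * JUDGES };
}
```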
Optimisations available:

- `BENCH_JUDGE_COUNT=2` → cut judge calls by 33%, save ~$27 on 5k skills
- `BENCH_JUDGE_MAX_TOKENS=500` → squeeze DeepSeek reasoning further, save ~$15 on 5k skills
- `BENCH_MODE=or-thrift` → single judge (GPT-5 nano), ~$0.86/day
- `BENCH_MODE=dev` → 3 free Groq Llama models, $0 (3,000 RPD cap)
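These knobs are plain environment variables. A sketch of reading them with the defaults this README states (3 judges, 900-token cap, mode `or-v1`); the shape of the config object is an assumption:

```javascript
// Defaults mirror the README: 3 judges, 900 output tokens, mode or-v1.
function benchConfig(env = process.env) {
  return {
    judgeCount: Number(env.BENCH_JUDGE_COUNT ?? 3),
    judgeMaxTokens: Number(env.BENCH_JUDGE_MAX_TOKENS ?? 900),
    mode: env.BENCH_MODE ?? "or-v1",
  };
}
```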
- ✅ V0 — public marketplace, scraping pipeline, quality judge, CLI v0.1, MCP v0.1
- ✅ V0.5 — bench engine live (or-v1 with prompt cache), first cycles completed
- ⏳ V1 — domain launch, Stripe live mode, full catalog bench
- 🔮 V2 — real-time battles (LMArena-style), dark theme, API plans for companies
→ Full plan in TODO.md.
FlukX Studio — Toma, solo build from France. EdTech automation + design + full-stack.
MIT — see LICENSE.
