SpecBench

Custom eval suite generator for language models. Describe your model, get a targeted benchmark, score your outputs, and compare architectures and training runs.

Built for ML engineers training small models from scratch.

Live: https://specbench.vercel.app

What it does

General-purpose benchmarks such as MMLU or HellaSwag say little about a small custom model built for a narrow task. SpecBench generates an eval suite specific to what your model actually does, scores your outputs against a rubric, and reports where the model is weak and why.

Eval Suite Generation: Describe your model and dataset. SpecBench decomposes the description into capability axes and generates targeted prompts across standard, adversarial, and edge case types. Multilingual datasets are supported. Parallel workers speed up generation for large suites.
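To make the axes-by-type idea concrete, here is a minimal Python sketch of how a generated suite could be represented and checked for coverage. The `EvalPrompt` fields, the type names, and the helper functions are assumptions made for illustration, not SpecBench's actual schema.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed labels for the three prompt types named above.
PROMPT_TYPES = ("standard", "adversarial", "edge_case")


@dataclass(frozen=True)
class EvalPrompt:
    """One prompt in a suite (hypothetical shape)."""
    axis: str          # capability axis, e.g. "arithmetic"
    prompt_type: str   # one of PROMPT_TYPES
    text: str

    def __post_init__(self):
        if self.prompt_type not in PROMPT_TYPES:
            raise ValueError(f"unknown prompt type: {self.prompt_type!r}")


def coverage(suite):
    """Count prompts per (axis, prompt_type) pair."""
    return Counter((p.axis, p.prompt_type) for p in suite)


def missing_cells(suite):
    """List (axis, prompt_type) pairs that have no prompt yet."""
    counts = coverage(suite)
    axes = sorted({p.axis for p in suite})
    return [(a, t) for a in axes for t in PROMPT_TYPES if counts[(a, t)] == 0]
```

A check like `missing_cells` is useful before scoring: an axis with no adversarial or edge case prompts will look stronger than it is.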

Scoring: Paste your model outputs or upload a JSON batch file. SpecBench scores each output from 1 to 5 against a generated rubric and produces a per-axis report with weaknesses called out and concrete recommendations.
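The per-axis report can be pictured as a simple aggregation over scored outputs. The sketch below assumes a batch that is a JSON list of objects with `axis` and `score` keys and flags an axis as weak when its mean falls below a threshold. Both the record layout and the `weak_below` threshold are hypothetical, since the real batch format is not documented here.

```python
import json
from collections import defaultdict
from statistics import mean


def per_axis_report(batch_json, weak_below=3.0):
    """Aggregate 1-5 scores per capability axis (assumed record layout)."""
    records = json.loads(batch_json)
    by_axis = defaultdict(list)
    for rec in records:
        score = rec["score"]
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score!r}")
        by_axis[rec["axis"]].append(score)

    report = {}
    for axis, scores in sorted(by_axis.items()):
        avg = mean(scores)
        report[axis] = {"mean": avg, "n": len(scores), "weak": avg < weak_below}
    return report
```

For example, a batch with scores 5 and 3 on one axis and a single 2 on another yields a mean of 4 for the first axis and marks the second as weak.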

Eval Script: Paste your model architecture and SpecBench generates a runnable Python eval script tailored to your implementation.
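The generated script depends on your architecture, so its exact contents will vary. As a rough idea of the general shape such a script might take, the sketch below runs a model callable over a list of prompts and serializes the outputs as a JSON batch. The `generate` callable, the `id` and `prompt` keys, and the output layout are all assumptions for illustration.

```python
import json


def run_eval(generate, prompts):
    """Call `generate(prompt_text)` for each prompt and collect the outputs.

    `generate` is whatever wraps your model's inference (hypothetical here).
    A failure on one prompt is recorded rather than aborting the whole run.
    """
    records = []
    for item in prompts:
        try:
            output, error = generate(item["prompt"]), None
        except Exception as exc:
            output, error = "", repr(exc)
        records.append(
            {"id": item["id"], "prompt": item["prompt"], "output": output, "error": error}
        )
    return records


def to_batch_json(records):
    """Serialize records; ensure_ascii=False keeps multilingual text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Recording errors per prompt keeps a single crash from discarding an otherwise complete run.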

Diff: Compare two model output sets, architectures, eval suites, or training configs. Get an AI verdict on which is better and why.

Getting started

SpecBench runs entirely on your own API key. No key is stored on any server.

Supported providers:

Google AI Studio (Gemini 3.1 Flash Lite recommended)
Groq (Llama 3.3 70B recommended)

Get your key:

Gemini: https://aistudio.google.com/apikey
Groq: https://console.groq.com/keys

Open https://specbench.vercel.app, paste your key on the setup screen, and start.

Stack

Frontend: React, Vite, deployed on Vercel

Backend: Node.js, Express, deployed on Render

The backend runs on Render's free tier and may take 30 to 50 seconds to respond on the first request after a period of inactivity. Subsequent requests are fast.
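If you call the backend from your own tooling, it can help to wait out the cold start before sending real work. The sketch below polls a URL until it answers, pausing between attempts. The function names, the retry counts, and the choice of URL to poll are assumptions, and no specific health endpoint on the backend is implied.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with 2xx; report its status code.
        return err.code


def wait_until_awake(url, fetch=fetch_status, attempts=6, timeout=15.0,
                     pause=5.0, sleep=time.sleep):
    """Poll `url` until it responds with a non-5xx status.

    Returns the number of attempts used, or raises TimeoutError.
    """
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url, timeout) < 500:
                return attempt
        except OSError:
            # Covers URLError, connection errors, and socket timeouts.
            pass
        if attempt < attempts:
            sleep(pause)
    raise TimeoutError(f"{url} did not respond after {attempts} attempts")
```

With these defaults the worst case is six 15-second requests plus five 5-second pauses, which leaves room for the 30 to 50 second wake-up described above.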

Running locally

Backend

cd backend
npm install
npm run dev

Starts on http://localhost:4000.

Frontend

Create a .env file in the frontend directory before starting the dev server, so the frontend knows where the backend is:

VITE_BACKEND_URL=http://localhost:4000

Then install and run:

cd frontend
npm install
npm run dev

Starts on http://localhost:3000.
