Decouple runner onboarding #45
Merged
✅ AccelMark Validation: All submissions valid. See the workflow run for details.
Adding a runner used to require touching at least three shared files (README.md, meta.schema.json, and collect_env.py) even when the work was confined to a single accelerator family. This PR rewires those touch points so contributors normally only edit files inside their own runner folder.
What changed:
* README platforms matrix is now auto-generated from each runner's meta.json (new optional suite_support / hardware_label fields). README.md carries marker comments and tools/generate_platforms_matrix.py splices the table in; CI can call --check to fail PRs that get out of sync.
* meta.schema.json no longer hard-codes the set of accelerator platforms. The platform field is now validated by a lowercase regex, and the curated catalogue lives in schema/platforms.json, purely for presentation (display label, sort order). validate_runners.py prints a non-fatal warning when it meets an uncatalogued platform.
* collect_env.py is split into a thin orchestrator plus one self-contained plug-in per accelerator family under runners/platforms/ (nvidia, amd, ascend, apple, google, moorethreads). Plug-ins are auto-discovered; adding a new accelerator only requires dropping a single file in that directory. env_info.json now carries an accelerator_platform field identifying the active plug-in.
Side effects worth flagging:
* The regenerated README matrix now includes the apple_mlx_lm and nvidia_sglang_c43a8309 runners that had been missed in the hand-maintained table.
* All 7 existing runners gained explicit suite_support entries; no behaviour change, just self-description used by the generator.
* runners/README.md got a new "Adding a new accelerator family" section that documents the plug-in protocol.
Co-authored-by: Cursor <cursoragent@cursor.com>
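The platform check described above (regex validation plus a non-fatal catalogue warning) could look roughly like the sketch below. It is an illustration only, not the code in validate_runners.py: the exact pattern, the shape of the catalogue mapping, and the name `check_platform` are all assumptions.

```python
import re
import warnings

# Assumed pattern: the PR only says "a lowercase regex"; the real one may differ.
PLATFORM_RE = re.compile(r"[a-z][a-z0-9_]*")


def check_platform(platform: str, catalogue: dict) -> bool:
    """Return True if `platform` is catalogued, False if it is valid but uncatalogued.

    A malformed name is a hard error; a missing catalogue entry is only a warning,
    because the catalogue is presentation data (display label, sort order).
    """
    if not PLATFORM_RE.fullmatch(platform):
        raise ValueError(f"invalid platform name: {platform!r}")
    if platform not in catalogue:
        warnings.warn(f"platform {platform!r} is not in the catalogue", stacklevel=2)
        return False
    return True
```

With this split, a contributor on a brand-new platform gets a passing validation run and a reminder, rather than a failure that forces an edit to a shared file.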
* Remove the older SGLang runner (nvidia_sglang_6da83845, sglang 0.4.0 / torch 2.5.1 / transformers 4.46.3). The newer nvidia_sglang_c43a8309 (sglang 0.5.6 / torch 2.9.1 / EAGLE speculative decoding) supersedes it in practice. No results in this repo reference the old hash and there are no external consumers (pre-open-source), so we delete the folder rather than mark deprecated_by. This is the last opportunity to do so before the immutability rule kicks in.
* Expand .gitignore so the dozens of locally generated samples.jsonl files under results/verified/** stop showing up as untracked, and add the common IDE / lint / test-cache directories (.idea/, .vscode/, .pytest_cache/, .mypy_cache/, .ruff_cache/, .coverage*, .tox/) that contributors typically have.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ed-flow walk-through
* Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1) with a small benchmark-specific addendum covering fabricated results and vendor affiliation disclosure.
* Add SECURITY.md scoping the threat model (code that runs on contributor machines + validator bypasses for fake leaderboard entries) and pointing reporters at GitHub private security advisories instead of public issues.
* Flesh out pyproject.toml with authors, maintainers, keywords, Trove classifiers (license, audience, Python 3.10–3.12, platforms), and the full set of project.urls (Homepage, Leaderboard, Documentation, Repository, Issues, Changelog) so it renders nicely on PyPI once we cut a release.
* Rewrite the 'Adding support for a new platform' section of CONTRIBUTING.md to match the decoupled onboarding flow that landed in the previous commit: a new runner on an existing platform no longer needs to touch any shared file, and a brand-new accelerator family only needs a single self-contained plug-in under runners/platforms/. The section is renamed 'Adding a new runner' to reflect what most contributors actually do, with a clearly marked sub-section for the rarer 'new accelerator family' case.
* Repoint two README.md links that pointed at the old '#adding-support-for-a-new-platform' anchor.
No behavioural changes to the framework or runners.
Co-authored-by: Cursor <cursoragent@cursor.com>
…dation
* Run runners/validate_runners.py over **every** runner folder in the repo (not just the ones touched in the current PR). This catches drift introduced by shared changes, e.g. a meta.schema.json edit that accidentally breaks an unrelated existing runner.
* Run tools/generate_platforms_matrix.py --check on every triggering PR. The README 'Supported platforms' matrix is auto-generated from each runner's meta.json; if a PR changes a runner's suite_support / hardware_label or adds a new runner without regenerating the table, the job now fails with a clear instruction to regenerate locally and commit the result.
* Expand the workflow's paths trigger to cover the README, the platforms catalogue (schema/platforms.json), and the generator itself, so the matrix-sync check actually runs when those files are modified.
Co-authored-by: Cursor <cursoragent@cursor.com>
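The matrix-sync check boils down to "regenerate the table, splice it between the README markers, and compare with what is committed". The sketch below shows that idea under stated assumptions: the marker strings and both function names are hypothetical, and the real generator builds the table from each runner's meta.json, which is omitted here.

```python
# Hypothetical marker text; the actual comments in README.md may differ.
START = "<!-- platforms-matrix:start -->"
END = "<!-- platforms-matrix:end -->"


def splice_table(readme: str, table: str) -> str:
    """Replace whatever sits between the two markers with `table`."""
    start = readme.find(START)
    end = readme.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("README markers missing or out of order")
    before = readme[: start + len(START)]
    after = readme[end:]
    return f"{before}\n{table.strip()}\n{after}"


def is_in_sync(readme: str, table: str) -> bool:
    """True when regenerating would leave the README unchanged (the --check case)."""
    return splice_table(readme, table) == readme
```

A CLI wrapper would write the spliced text back in normal mode and exit non-zero in check mode when `is_in_sync` returns False, which is what lets CI fail a PR that forgot to regenerate.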
…xamples)
Pre-1.0 cleanup before open-sourcing:
* `utils/run_all_{4,8}gpu.sh` — older duplicates of the same-named
scripts under `examples/`. Nothing references them; drop the folder.
* `configs/runner_configs/runner_*_{523da458,605db33a,9f42fabb}.yaml.example`
— stale templates whose runner folders were superseded by the current
hash IDs (`6c18cd8f`, `d4aa9fda`, `c43a8309`). Each surviving runner
has its own up-to-date `*.yaml.example` companion.
No code path or doc references any of these, so this is a pure delete.
Co-authored-by: Cursor <cursoragent@cursor.com>
…uage
Round of pre-1.0 documentation work driven by the question "what would a
first-time contributor see, and does it look maintained or maintainer-run?".
Visual identity
* New SVG mark + wordmark: a lightning bolt crossing a speedometer arc.
Lives under `docs/assets/` and renders via `<picture>` with separate
light/dark variants — GitHub README's `prefers-color-scheme` swap.
* The README header now uses the wordmark instead of an emoji + H1.
README slimming
* Drop the full `Repository structure` tree from the top page. Mature
projects (PyTorch, vLLM, llama.cpp) don't ship the tree on the front
door; the trimmed copy in DEVELOPMENT.md is enough for spelunkers.
* Quick-start step 4 is now "open a pull request" with `gh pr create`.
The issue-bot path is kept verbatim as a one-line escape hatch for
people who don't want to touch git.
* New top-level links to Discussions for Q&A and to `openclaw_skill/`
for the optional voice-driven launcher (clearly marked optional).
* Citation now credits "The AccelMark Contributors" alongside the
original author.
CONTRIBUTING rewrite of the submission flow
* The whole "Submitting your results" section was rewritten under a new
`## Submitting a result` anchor (referenced from README). PR path is
primary, with the bot-drafted-PR path as the no-git fallback.
* New paragraph documents the `configs/runner_configs/runner_<id>.yaml`
gitignore policy explicitly — only the `*.yaml.example` companions
ship; the live override file is strictly local.
* Verified-tier definition rephrased: it is hardware reproducibility,
not a maintainer privilege. Anyone with the same chip + runner can
open a reproduction PR and bump a community result to verified.
Community-facing language cleanup
* `results/README.md`, `suites/README.md`, `DEVELOPMENT.md`, and
`CONTRIBUTING.md` no longer describe verification / flagging /
suite-acceptance as maintainer-gated. They read as community
workflows that anyone can drive.
* Time SLAs ("within a day or two") and "maintainer reviews" copy
removed from the contribution path so the doc doesn't make promises
that depend on a single person.
`CODE_OF_CONDUCT.md` and `SECURITY.md` still mention maintainers
intentionally — those documents need a clear enforcement contact and
that's expected of any open-source repo.
Co-authored-by: Cursor <cursoragent@cursor.com>
Round-trip feedback from rendering the new README header in light mode:
* The icon was visually drifting below the wordmark because the SVG was
packing both "AccelMark" and a tagline into the same image, forcing
the icon to balance two text lines.
* The leaderboard site still used a bare emoji and had no favicon, so
there was no continuity between the README and the public site.
* When two runners share the same `framework` string (e.g. `vLLM` ships
both the stable runner and a future `vllm-0.20` one), result cards
rendered as indistinguishable "Qwen2.5-0.5B-Instruct · vLLM · BF16"
rows even though the `framework_version` field already disambiguates.
Logo + README
* `docs/assets/logo-wordmark{,-dark}.svg`: single-row mark of the form
`[icon] AccelMark`. ViewBox shrunk from 480×96 to 280×72 with the
icon's geometric centre put exactly on the cap-height midline of the
AccelMark glyphs. The "Cross-platform LLM inference benchmark"
tagline previously baked into the SVG is now a separate `<p>` under
the logo in README, so the brand mark stays compact and reusable.
* README rendering knob: `width="360"` (was 420) to fit the new aspect
ratio.
Leaderboard site branding
* New `leaderboard/site/favicon.svg` (copy of the standalone icon).
Registered via `<link rel="icon" type="image/svg+xml" …>` so the tab
picks it up immediately.
* `header h1` swapped the ⚡ emoji for the inline SVG mark, using a
dark-theme palette (#FCD34D bolt + #93C5FD gauge) that pops on the
#0d1117 background. Flex layout for vertical alignment between the
icon and the title.
Runner disambiguation on cards and tables
* Card layout (line 836): the framework field now reads
`${framework}${framework_version}`, e.g. `vLLM 0.5.5`. A `title=` on
the same span exposes `runner: <implementation_id>` on hover when the
user wants the precise hash.
* Table cell formatter (`formatFramework`): same inline version after
the framework name (rendered in a muted colour so the framework name
stays the dominant token), and `implementation_id` is added to the
hover tooltip alongside the existing version / script / notes lines.
Net effect for the open question raised in review: two vLLM runners on
the same hardware are now visually distinct without anyone editing the
runner's `_get_framework_name()` to fake a variant suffix.
Co-authored-by: Cursor <cursoragent@cursor.com>
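The disambiguation rule above is simple enough to state as code. The site's formatter is JavaScript; the Python below is only a sketch of the same logic, assuming a flat result dict that carries the `framework`, `framework_version`, and `implementation_id` fields named in this commit. Both function names are hypothetical.

```python
def framework_label(result: dict) -> str:
    """Visible label: framework name followed by its version when one is present."""
    name = result.get("framework", "")
    version = result.get("framework_version")
    return f"{name} {version}" if version else name


def framework_tooltip(result: dict) -> str:
    """Hover text exposing the precise runner hash, or an empty string if unknown."""
    impl = result.get("implementation_id")
    return f"runner: {impl}" if impl else ""
```

Two runners that share a `framework` string but differ in `framework_version` then produce different labels without either runner changing its reported framework name.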
Previously the leaderboard deploy workflow only fired on `results/**`
changes, so PRs that touched `leaderboard/site/index.html`,
`leaderboard/generate.py`, or platform metadata could land on main and
never reach the public site until somebody happened to merge a new
result.
Widen the `paths:` filter so any of these can trigger a redeploy:
* `leaderboard/**` — the static site and generator script
* `tools/generate_platforms_matrix.py` and `schema/platforms.json`
— the README platforms matrix inputs
(the workflow regenerates that too)
* `runners/*/meta.json` — runner metadata that the leaderboard
surfaces (framework, suite support,
hardware labels)
`workflow_dispatch` stays available as the escape hatch for forcing a
redeploy when nothing in the watched paths changed.
Co-authored-by: Cursor <cursoragent@cursor.com>
All three removals were verified to have zero in-repo dependencies: every
suite.json and the rest of the codebase are already on the new format.
suite_C/suite.py — stale runner-backend gating
Eleven lines of commented-out code that gated each quantized format on
whether the runner declared the backend in SUPPORTED_QUANTIZATION_BACKENDS.
The strategy changed long ago: now we always send the format through and
let the inference engine report its own incompatibility (recorded in the
subprocess summary). The accompanying skip-reason `print` was updated to
match what actually causes the skip today (the *other* full-precision
baseline, e.g. FP16 on Ampere where the baseline is BF16).
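The current selection policy can be summarised in a few lines. This is a sketch under assumptions, not the code in suite_C/suite.py: the set of full-precision formats, the quantized format names used in the usage note, and the function name are illustrative.

```python
# Assumed set of full-precision formats; suite_C may define this differently.
FULL_PRECISION = ("BF16", "FP16")


def plan_precision_levels(hw_baseline: str, quantized_formats: list[str]) -> tuple[list[str], list[str]]:
    """Return (levels_to_run, levels_skipped) for one hardware target.

    The hardware-supported full-precision baseline always runs, every quantized
    format is dispatched (the inference engine reports its own incompatibility),
    and only the *other* full-precision baseline is skipped.
    """
    if hw_baseline not in FULL_PRECISION:
        raise ValueError(f"unknown full-precision baseline: {hw_baseline!r}")
    to_run = [hw_baseline, *quantized_formats]
    skipped = [p for p in FULL_PRECISION if p != hw_baseline]
    return to_run, skipped
```

For example, `plan_precision_levels("BF16", ["INT8", "INT4"])` runs BF16, INT8 and INT4 and skips only FP16, which matches the skip reason the updated `print` now reports.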
benchmark_runner._parse_scenarios_config — flat-list legacy
Five lines that accepted suite.json with `"scenarios": ["accuracy", ...]`
instead of the documented `{"default": [...], "extra": [...]}`. All seven
suite.json files are on the dict form; flat-list was never documented for
external authors. Docstring and the DEVELOPMENT.md line referencing the
legacy form updated.
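After this removal, only the documented dict form is accepted. A minimal sketch of such a parser follows; it is not the body of `benchmark_runner._parse_scenarios_config`, and treating a missing `default` or `extra` key as an empty list is an assumption.

```python
def parse_scenarios_config(suite: dict) -> tuple[list[str], list[str]]:
    """Return (default, extra) scenario lists from a parsed suite.json.

    Only the documented {"default": [...], "extra": [...]} form is accepted;
    the legacy flat list is rejected with an explicit message.
    """
    scenarios = suite.get("scenarios")
    if not isinstance(scenarios, dict):
        raise ValueError(
            'suite.json "scenarios" must be an object of the form '
            '{"default": [...], "extra": [...]}'
        )
    default = list(scenarios.get("default", []))
    extra = list(scenarios.get("extra", []))
    return default, extra
```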
benchmark_runner._resolve_requests_path — per-suite requests.jsonl fallback
Ten lines that fell back to `suites/<id>/requests.jsonl` when a suite had
no `dataset` key. Every suite.json now declares `dataset:` and points at
`datasets/<name>/requests.jsonl`; there is no `suites/*/requests.jsonl`
anywhere in the repo. The function now requires `dataset` and produces a
pointed error message if it's missing.
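The stricter lookup can be sketched as below. It assumes that the `dataset` key holds the dataset's folder name and that paths are resolved against the repository root; the signature and error wording are illustrative, not copied from `benchmark_runner._resolve_requests_path`.

```python
from pathlib import Path


def resolve_requests_path(suite: dict, repo_root: Path, suite_id: str) -> Path:
    """Map a suite's `dataset` key to datasets/<name>/requests.jsonl.

    There is no per-suite fallback: a suite without `dataset` is an error.
    """
    dataset = suite.get("dataset")
    if not dataset:
        raise ValueError(
            f"suite {suite_id!r}: suite.json must declare 'dataset' "
            "(requests are read from datasets/<name>/requests.jsonl)"
        )
    return repo_root / "datasets" / dataset / "requests.jsonl"
```

Failing fast here means a misconfigured suite is reported at load time with the missing key named, rather than silently reading a stale file from the suite folder.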
Kept on purpose
`/v1/completions` in `serve/server.py` and the README — that is OpenAI's
own legacy endpoint (still widely used by older LangChain/llama.cpp/etc.
clients), not an AccelMark-internal compat shim, so removing it would
narrow the audience of the drop-in OpenAI replacement we advertise.
Net: -28 lines, +13 lines of clearer code paths, no functional change.
Co-authored-by: Cursor <cursoragent@cursor.com>
JuhaoLiang1997 added a commit that referenced this pull request on May 15, 2026
…#46)
Follow-up to the cleanup in #45. That PR removed the runner-declared quantization-backend gating logic and renamed the obvious skip-reason in the headline `print` (line 101), but two sibling references to the old strategy were missed:
* The function-level docstring still claimed format selection intersects with `runner.SUPPORTED_QUANTIZATIONS` and warns on any format the runner doesn't declare.
* The per-format final-summary line printed `skipped (backend not in SUPPORTED_QUANTIZATION_BACKENDS)` even though the `skipped` list now only ever holds the *other* full-precision baseline (e.g. FP16 on Ampere where the hw baseline is BF16).
Rewrite both so the docstring describes today's policy (always include the hw-supported full-precision baseline; dispatch every quantized level; let the inference subprocess decide hardware compatibility) and the skip-reason print matches what actually causes the entry.
The result.json field name `precision_levels_skipped` is **kept**: it's a stable schema field already indexed by the leaderboard and used by older results, so the name stays; only the human-readable strings around it are corrected.
No functional change.
Co-authored-by: Cursor <cursoragent@cursor.com>