Decouple runner onboarding by JuhaoLiang1997 · Pull Request #45 · FreedomIntelligence/AccelMark

JuhaoLiang1997 · 2026-05-15T04:23:00Z

Summary

Type of change

Testing

# Commands used to verify

Checklist

I have read CONTRIBUTING.md
My change does not break existing result.json files (or I have explained the migration path)
If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
If changing the schema: validate_submission.py updated and all existing results still validate
If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
I have updated relevant documentation

Related issues

github-actions · 2026-05-15T04:23:13Z

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

Adding a runner used to require touching at least three shared files (README.md, meta.schema.json, and collect_env.py) even when the work was confined to a single accelerator family. This PR rewires those touch points so contributors normally only edit files inside their own runner folder.

What changed:

* README platforms matrix is now auto-generated from each runner's meta.json (new optional suite_support / hardware_label fields). README.md carries marker comments and tools/generate_platforms_matrix.py splices the table in; CI can call --check to fail PRs that get out of sync.
* meta.schema.json no longer hard-codes the set of accelerator platforms. The platform field is now validated by a lowercase regex, and the curated catalogue lives in schema/platforms.json, purely for presentation (display label, sort order). validate_runners.py prints a non-fatal warning when it meets an uncatalogued platform.
* collect_env.py is split into a thin orchestrator plus one self-contained plug-in per accelerator family under runners/platforms/ (nvidia, amd, ascend, apple, google, moorethreads). Plug-ins are auto-discovered; adding a new accelerator only requires dropping a single file in that directory. env_info.json now carries an accelerator_platform field identifying the active plug-in.

Side effects worth flagging:

* The regenerated README matrix now includes the apple_mlx_lm and nvidia_sglang_c43a8309 runners that had been missed in the hand-maintained table.
* All 7 existing runners gained explicit suite_support entries; no behaviour change, just self-description used by the generator.
* runners/README.md got a new "Adding a new accelerator family" section that documents the plug-in protocol.

Co-authored-by: Cursor <cursoragent@cursor.com>
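
The marker-comment splice and the `--check` mode described above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/generate_platforms_matrix.py: the marker strings, the table columns, the `id` key, and the assumption that `suite_support` is a list of strings are all hypothetical.

```python
import re

# Hypothetical marker comments; the real strings in README.md may differ.
BEGIN = "<!-- BEGIN PLATFORMS MATRIX -->"
END = "<!-- END PLATFORMS MATRIX -->"


def render_matrix(runners):
    """Render a Markdown table from runner metadata dicts (illustrative columns)."""
    lines = ["| Runner | Hardware | Suites |", "| --- | --- | --- |"]
    for meta in sorted(runners, key=lambda m: m["id"]):
        # Assumes suite_support is a list of suite names; optional fields fall back to "n/a".
        suites = ", ".join(meta.get("suite_support", [])) or "n/a"
        hardware = meta.get("hardware_label", "n/a")
        lines.append(f"| {meta['id']} | {hardware} | {suites} |")
    return "\n".join(lines)


def splice(readme, table):
    """Replace whatever sits between the two markers with the rendered table."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("marker comments not found in README")
    replacement = f"{BEGIN}\n{table}\n{END}"
    # A function replacement avoids backslash escapes in the table being interpreted.
    return pattern.sub(lambda _match: replacement, readme, count=1)


def is_in_sync(readme, runners):
    """Check mode: True when regenerating would leave the README unchanged."""
    return splice(readme, render_matrix(runners)) == readme
```

A CI job in check mode would exit non-zero when `is_in_sync` returns False and tell the contributor to regenerate the table locally.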

* Remove the older SGLang runner (nvidia_sglang_6da83845, sglang 0.4.0 / torch 2.5.1 / transformers 4.46.3). The newer nvidia_sglang_c43a8309 (sglang 0.5.6 / torch 2.9.1 / EAGLE speculative decoding) supersedes it in practice. No results in this repo reference the old hash and there are no external consumers (pre-open-source), so we delete the folder rather than mark deprecated_by; this is the last opportunity to do so before the immutability rule kicks in.
* Expand .gitignore so the dozens of locally generated samples.jsonl files under results/verified/** stop showing up as untracked, and add the common IDE / lint / test-cache directories (.idea/, .vscode/, .pytest_cache/, .mypy_cache/, .ruff_cache/, .coverage*, .tox/) that contributors typically have.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ed-flow walk-through

* Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1) with a small benchmark-specific addendum covering fabricated results and vendor affiliation disclosure.
* Add SECURITY.md scoping the threat model (code that runs on contributor machines + validator bypasses for fake leaderboard entries) and pointing reporters at GitHub private security advisories instead of public issues.
* Flesh out pyproject.toml with authors, maintainers, keywords, Trove classifiers (license, audience, Python 3.10–3.12, platforms), and the full set of project.urls (Homepage, Leaderboard, Documentation, Repository, Issues, Changelog) so it renders nicely on PyPI once we cut a release.
* Rewrite the 'Adding support for a new platform' section of CONTRIBUTING.md to match the decoupled onboarding flow that landed in the previous commit: a new runner on an existing platform no longer needs to touch any shared file, and a brand-new accelerator family only needs a single self-contained plug-in under runners/platforms/. The section is renamed 'Adding a new runner' to reflect what most contributors actually do, with a clearly marked sub-section for the rarer 'new accelerator family' case.
* Repoint two README.md links that pointed at the old '#adding-support-for-a-new-platform' anchor.

No behavioural changes to the framework or runners.

Co-authored-by: Cursor <cursoragent@cursor.com>

…dation

* Run runners/validate_runners.py over **every** runner folder in the repo (not just the ones touched in the current PR). This catches drift introduced by shared changes, e.g. a meta.schema.json edit that accidentally breaks an unrelated existing runner.
* Run tools/generate_platforms_matrix.py --check on every triggering PR. The README 'Supported platforms' matrix is auto-generated from each runner's meta.json; if a PR changes a runner's suite_support / hardware_label or adds a new runner without regenerating the table, the job now fails with a clear instruction to regenerate locally and commit the result.
* Expand the workflow's paths trigger to cover the README, the platforms catalogue (schema/platforms.json), and the generator itself, so the matrix-sync check actually runs when those files are modified.

Co-authored-by: Cursor <cursoragent@cursor.com>

…xamples)

Pre-1.0 cleanup before open-sourcing:

* `utils/run_all_{4,8}gpu.sh`: older duplicates of the same-named scripts under `examples/`. Nothing references them; drop the folder.
* `configs/runner_configs/runner_*_{523da458,605db33a,9f42fabb}.yaml.example`: stale templates whose runner folders were superseded by the current hash IDs (`6c18cd8f`, `d4aa9fda`, `c43a8309`). Each surviving runner has its own up-to-date `*.yaml.example` companion.

No code path or doc references any of these, so this is a pure delete.

Co-authored-by: Cursor <cursoragent@cursor.com>

…uage

Round of pre-1.0 documentation work driven by the question "what would a first-time contributor see, and does it look maintained or maintainer-run?".

Visual identity

* New SVG mark + wordmark: a lightning bolt crossing a speedometer arc. Lives under `docs/assets/` and renders via `<picture>` with separate light/dark variants (GitHub README's `prefers-color-scheme` swap).
* The README header now uses the wordmark instead of an emoji + H1.

README slimming

* Drop the full `Repository structure` tree from the top page. Mature projects (PyTorch, vLLM, llama.cpp) don't ship the tree on the front door; the trimmed copy in DEVELOPMENT.md is enough for spelunkers.
* Quick-start step 4 is now "open a pull request" with `gh pr create`. The issue-bot path is kept verbatim as a one-line escape hatch for people who don't want to touch git.
* New top-level links to Discussions for Q&A and to `openclaw_skill/` for the optional voice-driven launcher (clearly marked optional).
* Citation now credits "The AccelMark Contributors" alongside the original author.

CONTRIBUTING rewrite of the submission flow

* The whole "Submitting your results" section was rewritten under a new `## Submitting a result` anchor (referenced from README). PR path is primary, with the bot-drafted-PR path as the no-git fallback.
* New paragraph documents the `configs/runner_configs/runner_<id>.yaml` gitignore policy explicitly: only the `*.yaml.example` companions ship; the live override file is strictly local.
* Verified-tier definition rephrased: it is hardware reproducibility, not a maintainer privilege. Anyone with the same chip + runner can open a reproduction PR and bump a community result to verified.

Community-facing language cleanup

* `results/README.md`, `suites/README.md`, `DEVELOPMENT.md`, and `CONTRIBUTING.md` no longer describe verification / flagging / suite-acceptance as maintainer-gated. They read as community workflows that anyone can drive.
* Time SLAs ("within a day or two") and "maintainer reviews" copy removed from the contribution path so the doc doesn't make promises that depend on a single person.

`CODE_OF_CONDUCT.md` and `SECURITY.md` still mention maintainers intentionally: those documents need a clear enforcement contact and that's expected of any open-source repo.

Co-authored-by: Cursor <cursoragent@cursor.com>

Round-trip feedback from rendering the new README header in light mode:

* The icon was visually drifting below the wordmark because the SVG was packing both "AccelMark" and a tagline into the same image, forcing the icon to balance two text lines.
* The leaderboard site still used a bare emoji and had no favicon, so there was no continuity between the README and the public site.
* When two runners share the same `framework` string (e.g. `vLLM` ships both the stable runner and a future `vllm-0.20` one), result cards rendered as indistinguishable "Qwen2.5-0.5B-Instruct · vLLM · BF16" rows even though the `framework_version` field already disambiguates.

Logo + README

* `docs/assets/logo-wordmark{,-dark}.svg`: single-row mark of the form `[icon] AccelMark`. ViewBox shrunk from 480×96 to 280×72 with the icon's geometric centre put exactly on the cap-height midline of the AccelMark glyphs. The "Cross-platform LLM inference benchmark" tagline previously baked into the SVG is now a separate `<p>` under the logo in README, so the brand mark stays compact and reusable.
* README rendering knob: `width="360"` (was 420) to fit the new aspect ratio.

Leaderboard site branding

* New `leaderboard/site/favicon.svg` (copy of the standalone icon). Registered via `<link rel="icon" type="image/svg+xml" …>` so the tab picks it up immediately.
* `header h1` swapped the ⚡ emoji for the inline SVG mark, using a dark-theme palette (#FCD34D bolt + #93C5FD gauge) that pops on the #0d1117 background. Flex layout for vertical alignment between the icon and the title.

Runner disambiguation on cards and tables

* Card layout (line 836): the framework field now reads `${framework}${framework_version}`, e.g. `vLLM 0.5.5`. A `title=` on the same span exposes `runner: <implementation_id>` on hover when the user wants the precise hash.
* Table cell formatter (`formatFramework`): same inline version after the framework name (rendered in a muted colour so the framework name stays the dominant token), and `implementation_id` is added to the hover tooltip alongside the existing version / script / notes lines.

Net effect for the open question raised in review: two vLLM runners on the same hardware are now visually distinct without anyone editing the runner's `_get_framework_name()` to fake a variant suffix.

Co-authored-by: Cursor <cursoragent@cursor.com>

Previously the leaderboard deploy workflow only fired on `results/**` changes, so PRs that touched `leaderboard/site/index.html`, `leaderboard/generate.py`, or platform metadata could land on main and never reach the public site until somebody happened to merge a new result.

Widen the `paths:` filter so any of these can trigger a redeploy:

* `leaderboard/**`: the static site and generator script
* `tools/generate_platforms_matrix.py` and `schema/platforms.json`: the README platforms matrix inputs (the workflow regenerates that too)
* `runners/*/meta.json`: runner metadata that the leaderboard surfaces (framework, suite support, hardware labels)

`workflow_dispatch` stays available as the escape hatch for forcing a redeploy when nothing in the watched paths changed.

Co-authored-by: Cursor <cursoragent@cursor.com>

All three removals were verified to have zero in-repo dependencies: every suite.json and the entire codebase is already on the new format.

suite_C/suite.py: stale runner-backend gating

Eleven lines of commented-out code that gated each quantized format on whether the runner declared the backend in SUPPORTED_QUANTIZATION_BACKENDS. The strategy changed long ago: now we always send the format through and let the inference engine report its own incompatibility (recorded in the subprocess summary). The accompanying skip-reason `print` was updated to match what actually causes the skip today (the *other* full-precision baseline, e.g. FP16 on Ampere where the baseline is BF16).

benchmark_runner._parse_scenarios_config: flat-list legacy

Five lines that accepted suite.json with `"scenarios": ["accuracy", ...]` instead of the documented `{"default": [...], "extra": [...]}`. All seven suite.json files are on the dict form; flat-list was never documented for external authors. Docstring and the DEVELOPMENT.md line referencing the legacy form updated.

benchmark_runner._resolve_requests_path: per-suite requests.jsonl fallback

Ten lines that fell back to `suites/<id>/requests.jsonl` when a suite had no `dataset` key. Every suite.json now declares `dataset:` and points at `datasets/<name>/requests.jsonl`; there is no `suites/*/requests.jsonl` anywhere in the repo. The function now requires `dataset` and produces a pointed error message if it's missing.

Kept on purpose

`/v1/completions` in `serve/server.py` and the README: that is OpenAI's own legacy endpoint (still widely used by older LangChain/llama.cpp/etc. clients), not an AccelMark-internal compat shim, so removing it would narrow the audience of the drop-in OpenAI replacement we advertise.

Net: -28 lines, +13 lines of clearer code paths, no functional change.

Co-authored-by: Cursor <cursoragent@cursor.com>
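
For readers unfamiliar with the two benchmark_runner helpers, the post-cleanup behaviour described above might look roughly like the sketch below. It is written from the commit message alone: the function signatures, the error wording, and the assumption that `extra` is optional are hypothetical and do not reproduce the actual benchmark_runner code.

```python
from pathlib import Path


def parse_scenarios_config(suite):
    """Accept only the documented dict form: {"default": [...], "extra": [...]}."""
    scenarios = suite.get("scenarios")
    if not isinstance(scenarios, dict):
        # The legacy flat list ("scenarios": ["accuracy", ...]) lands here.
        raise ValueError(
            'suite.json "scenarios" must be an object with "default" and '
            f'"extra" lists, got {type(scenarios).__name__}'
        )
    # Assumption: a missing "extra" key is treated as an empty list.
    return list(scenarios.get("default", [])), list(scenarios.get("extra", []))


def resolve_requests_path(suite, repo_root):
    """Require a "dataset" key and map it to datasets/<name>/requests.jsonl."""
    name = suite.get("dataset")
    if not name:
        # No fallback to suites/<id>/requests.jsonl any more.
        raise KeyError('suite.json is missing the required "dataset" key')
    return Path(repo_root) / "datasets" / name / "requests.jsonl"
```

Failing loudly on the legacy shapes keeps a stale suite.json from silently resolving to a requests file that no longer exists.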

…#46)

Follow-up to the cleanup in #45. That PR removed the runner-declared quantization-backend gating logic and renamed the obvious skip-reason in the headline `print` (line 101), but two sibling references to the old strategy were missed:

* The function-level docstring still claimed format selection intersects with `runner.SUPPORTED_QUANTIZATIONS` and warns on any format the runner doesn't declare.
* The per-format final-summary line printed `skipped (backend not in SUPPORTED_QUANTIZATION_BACKENDS)` even though the `skipped` list now only ever holds the *other* full-precision baseline (e.g. FP16 on Ampere where the hw baseline is BF16).

Rewrite both so the docstring describes today's policy (always include the hw-supported full-precision baseline; dispatch every quantized level; let the inference subprocess decide hardware compatibility) and the skip-reason print matches what actually causes the entry.

The result.json field name `precision_levels_skipped` is **kept**: it's a stable schema field already indexed by the leaderboard and used by older results, so the name stays; only the human-readable strings around it are corrected.

No functional change.

Co-authored-by: Cursor <cursoragent@cursor.com>
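
The selection policy this commit documents can be illustrated with a small sketch. It is an interpretation of the commit message, not the code in suite_C/suite.py: `FULL_PRECISION`, `select_precision_levels`, and the idea that the baselines are exactly BF16 and FP16 are assumptions made for illustration.

```python
# Assumption: the two full-precision baselines are BF16 and FP16.
FULL_PRECISION = ("BF16", "FP16")


def select_precision_levels(requested, hw_baseline):
    """Split requested levels into (dispatched, skipped) under the described policy."""
    dispatched, skipped = [], []
    for level in requested:
        if level in FULL_PRECISION and level != hw_baseline:
            # Only the *other* full-precision baseline is ever skipped.
            skipped.append(level)
        else:
            # Quantized levels are always dispatched; the inference
            # subprocess decides whether the hardware can run them.
            dispatched.append(level)
    if hw_baseline not in dispatched:
        # The hardware-supported baseline is always included.
        dispatched.insert(0, hw_baseline)
    return dispatched, skipped
```

Under this reading, the skipped list is what would end up in the `precision_levels_skipped` field of result.json.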

JuhaoLiang1997 and others added 7 commits May 15, 2026 12:46

JuhaoLiang1997 force-pushed the decouple-runner-onboarding branch from 60af4b8 to b1129ff Compare May 15, 2026 04:46

JuhaoLiang1997 and others added 2 commits May 15, 2026 12:53

JuhaoLiang1997 merged commit 3529759 into main May 15, 2026
2 checks passed

JuhaoLiang1997 deleted the decouple-runner-onboarding branch May 15, 2026 05:08

JuhaoLiang1997 mentioned this pull request May 15, 2026

fix(suite_C): describe the actual reason a precision level is skipped #46

Merged

13 tasks

JuhaoLiang1997 merged 9 commits into main from decouple-runner-onboarding
