
Decouple runner onboarding #45

Merged
JuhaoLiang1997 merged 9 commits into main from decouple-runner-onboarding on May 15, 2026


@JuhaoLiang1997
Collaborator

Summary

Type of change

  • New platform support
  • Bug fix (runner, validator, leaderboard, or tooling)
  • Suite definition change
  • Schema change
  • Leaderboard / UI improvement
  • Documentation
  • Other:

Testing

# Commands used to verify

Checklist

  • I have read CONTRIBUTING.md
  • My change does not break existing result.json files (or I have explained the migration path)
  • If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
  • If changing the schema: validate_submission.py updated and all existing results still validate
  • If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
  • I have updated relevant documentation

Related issues

@github-actions

github-actions Bot commented May 15, 2026

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

JuhaoLiang1997 and others added 7 commits May 15, 2026 12:46
Adding a runner used to require touching at least three shared files —
README.md, meta.schema.json, and collect_env.py — even when the work
was confined to a single accelerator family. This PR rewires those
touch points so contributors normally only edit files inside their
own runner folder.

What changed:

* README platforms matrix is now auto-generated from each runner's
  meta.json (new optional suite_support / hardware_label fields).
  README.md carries marker comments and tools/generate_platforms_matrix.py
  splices the table in; CI can call --check to fail PRs that get out of
  sync.

* meta.schema.json no longer hard-codes the set of accelerator
  platforms. The platform field is now validated by a lowercase regex,
  and the curated catalogue lives in schema/platforms.json — purely for
  presentation (display label, sort order). validate_runners.py prints
  a non-fatal warning when it meets an uncatalogued platform.

* collect_env.py is split into a thin orchestrator plus one
  self-contained plug-in per accelerator family under runners/platforms/
  (nvidia, amd, ascend, apple, google, moorethreads). Plug-ins are
  auto-discovered; adding a new accelerator only requires dropping a
  single file in that directory. env_info.json now carries an
  accelerator_platform field identifying the active plug-in.
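The auto-discovery described in the last bullet could look roughly like the sketch below. It is not the code from this PR: the `detect()` / `collect()` function names and the "first plug-in that detects hardware wins" rule are assumptions made for illustration. The actual protocol is the one documented in runners/README.md.

```python
import importlib.util
from pathlib import Path


def discover_plugins(plugin_dir):
    """Load every public *.py file in plugin_dir as a plug-in module."""
    plugins = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(f"_plugin_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Hypothetical protocol: a plug-in must expose detect() and collect().
        has_detect = callable(getattr(module, "detect", None))
        has_collect = callable(getattr(module, "collect", None))
        if has_detect and has_collect:
            plugins[path.stem] = module
    return plugins


def collect_env(plugin_dir):
    """Return env info from the first plug-in whose detect() reports hardware."""
    for name, module in discover_plugins(plugin_dir).items():
        if module.detect():
            info = dict(module.collect())
            # Field name taken from the commit message above.
            info["accelerator_platform"] = name
            return info
    return {"accelerator_platform": None}
```

With a layout like this, the orchestrator never imports a plug-in by name, so adding a file to the directory is enough to register a new accelerator family.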

Side effects worth flagging:

* The regenerated README matrix now includes the apple_mlx_lm and
  nvidia_sglang_c43a8309 runners that had been missed in the
  hand-maintained table.

* All 7 existing runners gained explicit suite_support entries; no
  behaviour change, just self-description used by the generator.

* runners/README.md got a new "Adding a new accelerator family"
  section that documents the plug-in protocol.

Co-authored-by: Cursor <cursoragent@cursor.com>
* Remove the older SGLang runner (nvidia_sglang_6da83845, sglang 0.4.0
  / torch 2.5.1 / transformers 4.46.3). The newer nvidia_sglang_c43a8309
  (sglang 0.5.6 / torch 2.9.1 / EAGLE speculative decoding) supersedes
  it in practice. No results in this repo reference the old hash and
  there are no external consumers (pre-open-source), so we delete the
  folder rather than mark deprecated_by — this is the last opportunity
  to do so before the immutability rule kicks in.

* Expand .gitignore so the dozens of locally generated samples.jsonl
  files under results/verified/** stop showing up as untracked, and
  add the common IDE / lint / test-cache directories
  (.idea/, .vscode/, .pytest_cache/, .mypy_cache/, .ruff_cache/,
  .coverage*, .tox/) that contributors typically have.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ed-flow walk-through

* Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1) with a small
  benchmark-specific addendum covering fabricated results and vendor
  affiliation disclosure.

* Add SECURITY.md scoping the threat model (code that runs on
  contributor machines + validator bypasses for fake leaderboard
  entries) and pointing reporters at GitHub private security
  advisories instead of public issues.

* Flesh out pyproject.toml with authors, maintainers, keywords,
  Trove classifiers (license, audience, Python 3.10–3.12, platforms),
  and the full set of project.urls (Homepage, Leaderboard,
  Documentation, Repository, Issues, Changelog) so it renders nicely
  on PyPI once we cut a release.

* Rewrite the 'Adding support for a new platform' section of
  CONTRIBUTING.md to match the decoupled onboarding flow that landed
  in the previous commit: a new runner on an existing platform no
  longer needs to touch any shared file, and a brand-new accelerator
  family only needs a single self-contained plug-in under
  runners/platforms/. The section is renamed 'Adding a new runner' to
  reflect what most contributors actually do, with a clearly marked
  sub-section for the rarer 'new accelerator family' case.

* Repoint two README.md links that pointed at the old
  '#adding-support-for-a-new-platform' anchor.

No behavioural changes to the framework or runners.

Co-authored-by: Cursor <cursoragent@cursor.com>
…dation

* Run runners/validate_runners.py over **every** runner folder in the
  repo (not just the ones touched in the current PR). This catches
  drift introduced by shared changes — e.g. a meta.schema.json edit
  that accidentally breaks an unrelated existing runner.

* Run tools/generate_platforms_matrix.py --check on every triggering
  PR. The README 'Supported platforms' matrix is auto-generated from
  each runner's meta.json; if a PR changes a runner's suite_support /
  hardware_label or adds a new runner without regenerating the table,
  the job now fails with a clear instruction to regenerate locally
  and commit the result.

* Expand the workflow's paths trigger to cover the README, the
  platforms catalogue (schema/platforms.json), and the generator
  itself, so the matrix-sync check actually runs when those files
  are modified.
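As a rough illustration of the matrix-sync check, the sketch below splices a generated table between two README markers and reports whether regenerating would change anything. The marker strings, the `id` key, the table columns, and the assumption that `suite_support` is a list are all hypothetical. Only the `suite_support` / `hardware_label` field names and the `--check` idea come from the commit messages.

```python
import re

START = "<!-- platforms-matrix:start -->"  # hypothetical marker text
END = "<!-- platforms-matrix:end -->"      # hypothetical marker text


def render_matrix(runners):
    """Render a Markdown table from a list of meta.json-like dicts."""
    lines = ["| Runner | Hardware | Suites |", "| --- | --- | --- |"]
    for meta in sorted(runners, key=lambda m: m["id"]):
        suites = ", ".join(meta.get("suite_support", []))
        lines.append(f"| {meta['id']} | {meta.get('hardware_label', '')} | {suites} |")
    return "\n".join(lines)


def splice(readme_text, table):
    """Replace whatever sits between the markers with the fresh table."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme_text):
        raise ValueError("README is missing the matrix markers")
    block = f"{START}\n{table}\n{END}"
    # A callable replacement avoids backslash-escape handling in the table text.
    return pattern.sub(lambda _: block, readme_text, count=1)


def is_in_sync(readme_text, runners):
    """What a --check mode would test: regenerating changes nothing."""
    return splice(readme_text, render_matrix(runners)) == readme_text
```

A CI job built on this idea would exit non-zero when `is_in_sync` returns `False` and tell the contributor to regenerate the table locally.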

Co-authored-by: Cursor <cursoragent@cursor.com>
…xamples)

Pre-1.0 cleanup before open-sourcing:

* `utils/run_all_{4,8}gpu.sh` — older duplicates of the same-named
  scripts under `examples/`. Nothing references them; drop the folder.

* `configs/runner_configs/runner_*_{523da458,605db33a,9f42fabb}.yaml.example`
  — stale templates whose runner folders were superseded by the current
  hash IDs (`6c18cd8f`, `d4aa9fda`, `c43a8309`). Each surviving runner
  has its own up-to-date `*.yaml.example` companion.

No code path or doc references any of these, so this is a pure delete.

Co-authored-by: Cursor <cursoragent@cursor.com>
…uage

Round of pre-1.0 documentation work driven by the question "what would a
first-time contributor see, and does it look maintained or maintainer-run?".

Visual identity
* New SVG mark + wordmark: a lightning bolt crossing a speedometer arc.
  Lives under `docs/assets/` and renders via `<picture>` with separate
  light/dark variants — GitHub README's `prefers-color-scheme` swap.
* The README header now uses the wordmark instead of an emoji + H1.

README slimming
* Drop the full `Repository structure` tree from the top page. Mature
  projects (PyTorch, vLLM, llama.cpp) don't ship the tree on the front
  door; the trimmed copy in DEVELOPMENT.md is enough for spelunkers.
* Quick-start step 4 is now "open a pull request" with `gh pr create`.
  The issue-bot path is kept verbatim as a one-line escape hatch for
  people who don't want to touch git.
* New top-level links to Discussions for Q&A and to `openclaw_skill/`
  for the optional voice-driven launcher (clearly marked optional).
* Citation now credits "The AccelMark Contributors" alongside the
  original author.

CONTRIBUTING rewrite of the submission flow
* The whole "Submitting your results" section was rewritten under a new
  `## Submitting a result` anchor (referenced from README). PR path is
  primary, with the bot-drafted-PR path as the no-git fallback.
* New paragraph documents the `configs/runner_configs/runner_<id>.yaml`
  gitignore policy explicitly — only the `*.yaml.example` companions
  ship; the live override file is strictly local.
* Verified-tier definition rephrased: it is hardware reproducibility,
  not a maintainer privilege. Anyone with the same chip + runner can
  open a reproduction PR and bump a community result to verified.

Community-facing language cleanup
* `results/README.md`, `suites/README.md`, `DEVELOPMENT.md`, and
  `CONTRIBUTING.md` no longer describe verification / flagging /
  suite-acceptance as maintainer-gated. They read as community
  workflows that anyone can drive.
* Time SLAs ("within a day or two") and "maintainer reviews" copy
  removed from the contribution path so the doc doesn't make promises
  that depend on a single person.

`CODE_OF_CONDUCT.md` and `SECURITY.md` still mention maintainers
intentionally — those documents need a clear enforcement contact and
that's expected of any open-source repo.

Co-authored-by: Cursor <cursoragent@cursor.com>
Round-trip feedback from rendering the new README header in light mode:
* The icon was visually drifting below the wordmark because the SVG was
  packing both "AccelMark" and a tagline into the same image, forcing
  the icon to balance two text lines.
* The leaderboard site still used a bare emoji and had no favicon, so
  there was no continuity between the README and the public site.
* When two runners share the same `framework` string (e.g. `vLLM` ships
  both the stable runner and a future `vllm-0.20` one), result cards
  rendered as indistinguishable "Qwen2.5-0.5B-Instruct · vLLM · BF16"
  rows even though the `framework_version` field already disambiguates.

Logo + README
* `docs/assets/logo-wordmark{,-dark}.svg`: single-row mark of the form
  `[icon] AccelMark`. ViewBox shrunk from 480×96 to 280×72 with the
  icon's geometric centre put exactly on the cap-height midline of the
  AccelMark glyphs. The "Cross-platform LLM inference benchmark"
  tagline previously baked into the SVG is now a separate `<p>` under
  the logo in README, so the brand mark stays compact and reusable.
* README rendering knob: `width="360"` (was 420) to fit the new aspect
  ratio.

Leaderboard site branding
* New `leaderboard/site/favicon.svg` (copy of the standalone icon).
  Registered via `<link rel="icon" type="image/svg+xml" …>` so the tab
  picks it up immediately.
* `header h1` swapped the ⚡ emoji for the inline SVG mark, using a
  dark-theme palette (#FCD34D bolt + #93C5FD gauge) that pops on the
  #0d1117 background. Flex layout for vertical alignment between the
  icon and the title.

Runner disambiguation on cards and tables
* Card layout (line 836): the framework field now reads
  `${framework} ${framework_version}`, e.g. `vLLM 0.5.5`. A `title=` on
  the same span exposes `runner: <implementation_id>` on hover when the
  user wants the precise hash.
* Table cell formatter (`formatFramework`): same inline version after
  the framework name (rendered in a muted colour so the framework name
  stays the dominant token), and `implementation_id` is added to the
  hover tooltip alongside the existing version / script / notes lines.

Net effect for the open question raised in review: two vLLM runners on
the same hardware are now visually distinct without anyone editing the
runner's `_get_framework_name()` to fake a variant suffix.

Co-authored-by: Cursor <cursoragent@cursor.com>
@JuhaoLiang1997 JuhaoLiang1997 force-pushed the decouple-runner-onboarding branch from 60af4b8 to b1129ff May 15, 2026 04:46
JuhaoLiang1997 and others added 2 commits May 15, 2026 12:53
Previously the leaderboard deploy workflow only fired on `results/**`
changes, so PRs that touched `leaderboard/site/index.html`,
`leaderboard/generate.py`, or platform metadata could land on main and
never reach the public site until somebody happened to merge a new
result.

Widen the `paths:` filter so any of these can trigger a redeploy:

* `leaderboard/**`               — the static site and generator script
* `tools/generate_platforms_matrix.py` and `schema/platforms.json`
                                 — the README platforms matrix inputs
                                   (the workflow regenerates that too)
* `runners/*/meta.json`          — runner metadata that the leaderboard
                                   surfaces (framework, suite support,
                                   hardware labels)

`workflow_dispatch` stays available as the escape hatch for forcing a
redeploy when nothing in the watched paths changed.

Co-authored-by: Cursor <cursoragent@cursor.com>
All three removals were verified to have zero in-repo dependencies: every
suite.json and the rest of the codebase are already on the new format.

suite_C/suite.py — stale runner-backend gating
    Eleven lines of commented-out code that gated each quantized format on
    whether the runner declared the backend in SUPPORTED_QUANTIZATION_BACKENDS.
    The strategy changed long ago: now we always send the format through and
    let the inference engine report its own incompatibility (recorded in the
    subprocess summary). The accompanying skip-reason `print` was updated to
    match what actually causes the skip today (the *other* full-precision
    baseline, e.g. FP16 on Ampere where the baseline is BF16).

benchmark_runner._parse_scenarios_config — flat-list legacy
    Five lines that accepted suite.json with `"scenarios": ["accuracy", ...]`
    instead of the documented `{"default": [...], "extra": [...]}`. All seven
    suite.json files are on the dict form; flat-list was never documented for
    external authors. Docstring and the DEVELOPMENT.md line referencing the
    legacy form updated.
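A minimal sketch of dict-only parsing, assuming a standalone function rather than the real `benchmark_runner._parse_scenarios_config` (whose signature and error type are not shown here). Treating a missing `default` or `extra` key as an empty list is also an assumption.

```python
def parse_scenarios_config(suite):
    """Return (default, extra) scenario lists from a suite.json-like dict."""
    scenarios = suite.get("scenarios")
    if not isinstance(scenarios, dict):
        # The flat-list form, e.g. ["accuracy", ...], is rejected outright.
        raise ValueError(
            '"scenarios" must be an object like '
            '{"default": [...], "extra": [...]}; got '
            + type(scenarios).__name__
        )
    default = list(scenarios.get("default", []))
    extra = list(scenarios.get("extra", []))
    return default, extra
```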

benchmark_runner._resolve_requests_path — per-suite requests.jsonl fallback
    Ten lines that fell back to `suites/<id>/requests.jsonl` when a suite had
    no `dataset` key. Every suite.json now declares `dataset:` and points at
    `datasets/<name>/requests.jsonl`; there is no `suites/*/requests.jsonl`
    anywhere in the repo. The function now requires `dataset` and produces a
    pointed error message if it's missing.
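The stricter lookup might be sketched as follows. This is an illustration, not the function from the PR: it assumes `dataset` holds the directory name under `datasets/`, and the `repo_root` and `suite_id` parameters are hypothetical.

```python
from pathlib import Path


def resolve_requests_path(suite, repo_root, suite_id="<unknown>"):
    """Map a suite.json-like dict to its datasets/<name>/requests.jsonl path."""
    dataset = suite.get("dataset")
    if not dataset:
        # No fallback to suites/<id>/requests.jsonl any more: fail loudly.
        raise ValueError(
            f"suite {suite_id!r}: suite.json must declare a 'dataset' key "
            "pointing at datasets/<name>/requests.jsonl"
        )
    return Path(repo_root) / "datasets" / dataset / "requests.jsonl"
```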

Kept on purpose
    `/v1/completions` in `serve/server.py` and the README — that is OpenAI's
    own legacy endpoint (still widely used by older LangChain/llama.cpp/etc.
    clients), not an AccelMark-internal compat shim, so removing it would
    narrow the audience of the drop-in OpenAI replacement we advertise.

Net: -28 lines, +13 lines of clearer code paths, no functional change.

Co-authored-by: Cursor <cursoragent@cursor.com>
@JuhaoLiang1997 JuhaoLiang1997 merged commit 3529759 into main May 15, 2026
2 checks passed
@JuhaoLiang1997 JuhaoLiang1997 deleted the decouple-runner-onboarding branch May 15, 2026 05:08
JuhaoLiang1997 added a commit that referenced this pull request May 15, 2026
…#46)

Follow-up to the cleanup in #45. That PR removed the runner-declared
quantization-backend gating logic and renamed the obvious skip-reason in
the headline `print` (line 101), but two sibling references to the old
strategy were missed:

* The function-level docstring still claimed format selection
  intersects with `runner.SUPPORTED_QUANTIZATIONS` and warns on any
  format the runner doesn't declare.
* The per-format final-summary line printed
  `skipped (backend not in SUPPORTED_QUANTIZATION_BACKENDS)`
  even though the `skipped` list now only ever holds the *other*
  full-precision baseline (e.g. FP16 on Ampere where the hw baseline
  is BF16).

Rewrite both so the docstring describes today's policy (always include
the hw-supported full-precision baseline; dispatch every quantized
level; let the inference subprocess decide hardware compatibility) and
the skip-reason print matches what actually causes the entry.
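The policy described above can be sketched as a small selection function. The function name, the `FULL_PRECISION` tuple, and the example level names are assumptions for illustration. The real logic lives in suite_C/suite.py and may differ.

```python
FULL_PRECISION = ("BF16", "FP16")  # assumed set of full-precision formats


def select_precision_levels(requested, hw_baseline):
    """Split requested levels into (dispatched, skipped) per today's policy."""
    if hw_baseline not in FULL_PRECISION:
        raise ValueError(f"unknown full-precision baseline: {hw_baseline!r}")
    dispatched = [hw_baseline]  # the hw-supported baseline is always included
    skipped = []
    for level in requested:
        if level == hw_baseline:
            continue  # already dispatched as the baseline
        if level in FULL_PRECISION:
            skipped.append(level)  # the *other* full-precision baseline
        else:
            # Quantized levels are always dispatched; the inference
            # subprocess decides whether the hardware can run them.
            dispatched.append(level)
    return dispatched, skipped
```

Under this sketch, `skipped` can only ever contain a full-precision format, which is why a skip message mentioning quantization backends would be misleading.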

The result.json field name `precision_levels_skipped` is **kept** — it's
a stable schema field already indexed by the leaderboard and used by
older results, so the name stays; only the human-readable strings
around it are corrected.

No functional change.

Co-authored-by: Cursor <cursoragent@cursor.com>