1cat vllm#50

Merged
JuhaoLiang1997 merged 5 commits into main from 1cat-vllm on May 18, 2026

@JuhaoLiang1997
Collaborator

Summary

Type of change

  • New platform support
  • Bug fix (runner, validator, leaderboard, or tooling)
  • Suite definition change
  • Schema change
  • Leaderboard / UI improvement
  • Documentation
  • Other:

Testing

# Commands used to verify

Checklist

  • I have read CONTRIBUTING.md
  • My change does not break existing result.json files (or I have explained the migration path)
  • If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
  • If changing the schema: validate_submission.py updated and all existing results still validate
  • If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
  • I have updated relevant documentation

Related issues

@github-actions

github-actions Bot commented May 18, 2026

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

JuhaoLiang1997 and others added 5 commits May 18, 2026 22:10
Adds the AccelMark runner for the 1Cat-vLLM community fork that
re-enables AWQ 4-bit inference on Volta (SM70) Tesla V100 via lmdeploy
TurboMind kernels and the FLASH_ATTN_V100 attention backend.

What is included:

* runners/nvidia_onecat_vllm_a43d1bcf/ — runner.py, meta.json (with
  hardware_label="NVIDIA V100 (SM70)" and suite_support self-declaration),
  requirements.txt, README.md
* configs/runner_configs/runner_nvidia_onecat_vllm_a43d1bcf.yaml.example

The README platforms matrix updates automatically — the hardware label
is taken from meta.hardware_label rather than the catalogue default,
so the V100-specific row is rendered correctly without touching
schema/platforms.json or any shared file.
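
The label lookup described above can be sketched as follows. This is an illustration only, not the README generator's actual code: `resolve_hardware_label` and `CATALOGUE_DEFAULTS` are hypothetical names, and the default label value is invented for the example. Only the `hardware_label` key and its V100 value come from this commit.

```python
# Hypothetical stand-in for the catalogue defaults kept in schema/platforms.json.
# The key and value here are assumptions made for illustration.
CATALOGUE_DEFAULTS = {"nvidia": "NVIDIA GPU"}


def resolve_hardware_label(meta: dict, platform: str) -> str:
    """Prefer the runner's own meta.hardware_label; otherwise use the catalogue default."""
    label = meta.get("hardware_label")
    if label:
        return label
    # Fall back to the shared default, or to the raw platform key if none exists.
    return CATALOGUE_DEFAULTS.get(platform, platform)


runner_meta = {"hardware_label": "NVIDIA V100 (SM70)"}
row_label = resolve_hardware_label(runner_meta, "nvidia")
```

Because the override lives in the runner's own meta.json, a runner without `hardware_label` would keep the shared default in this sketch.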

Capability flags:

* SUPPORTED_PRECISIONS drops BF16 (V100 has no native BF16 datapath).
* SUPPORTED_QUANTIZATION_BACKENDS lists only AWQ — the fork's headline
  contribution; FP8 KV cache and other formats are intentionally not
  exposed by default.
* Auto-injects attention_backend=FLASH_ATTN_V100 unless the user
  overrides it.
* Suite F (Qwen2.5-0.5B-Instruct on a consumer/edge GPU) is marked
  unsupported — 1Cat-vLLM targets dense + MoE on 4 x V100, not edge
  inference.
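
The capability flags above might be wired roughly as follows. This is a sketch under stated assumptions, not the code in runner.py: `build_engine_kwargs` and `check_request` are hypothetical helper names, and the FP16 entry in `SUPPORTED_PRECISIONS` is assumed (the commit only states that BF16 is dropped).

```python
# Assumed remaining precision; the commit message only says BF16 is removed.
SUPPORTED_PRECISIONS = ("FP16",)
# AWQ is the only quantization backend exposed by default.
SUPPORTED_QUANTIZATION_BACKENDS = ("AWQ",)
DEFAULT_ATTENTION_BACKEND = "FLASH_ATTN_V100"


def build_engine_kwargs(user_kwargs=None):
    """Copy the user's kwargs and inject the V100 attention backend if it is unset."""
    kwargs = dict(user_kwargs or {})
    # setdefault leaves an explicit user override untouched.
    kwargs.setdefault("attention_backend", DEFAULT_ATTENTION_BACKEND)
    return kwargs


def check_request(precision, quantization=None):
    """Raise ValueError for a precision or quantization backend the runner does not declare."""
    if precision.upper() not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    if quantization is not None and quantization.upper() not in SUPPORTED_QUANTIZATION_BACKENDS:
        raise ValueError(f"unsupported quantization backend: {quantization}")
```

Using `setdefault` on a copy keeps the caller's dictionary unchanged and makes the "unless the user overrides it" rule a single line.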

Initial commit, not yet validated end-to-end on hardware; all
applicable suites are marked "pending".

Co-authored-by: Cursor <cursoragent@cursor.com>
JuhaoLiang1997 merged commit 48afa1a into main on May 18, 2026
3 checks passed
JuhaoLiang1997 deleted the 1cat-vllm branch on May 18, 2026 14:13