Revert "[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5"#990
Conversation
…iMax-M2.…" This reverts commit 7d10a7b.
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
| search-space: | ||
| - { tp: 2, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, conc-start: 4, conc-end: 64 } | ||
| - { tp: 8, ep: 8, conc-start: 4, conc-end: 512 } |
There was a problem hiding this comment.
🔴 The minimaxm2.5-fp8-mi325x-vllm 1k1k entry uses conc-end: 512 (line 371), which is double the conc-end: 256 used in the companion 8k1k entry and double the value used by the higher-end MI355X for the same workload. The perf-changelog for PR #953 also documents this config's range as 4-256, creating a direct contradiction with the actual code. Please verify whether 512 was intentional or a copy-paste error, as running TP8/EP8 concurrency up to 512 on MI325X risks OOM or unexpectedly long benchmark runs.
Extended reasoning...
What the bug is and how it manifests
In .github/configs/amd-master.yaml, the minimaxm2.5-fp8-mi325x-vllm config has an asymmetric concurrency ceiling: conc-end: 512 for the 1k1k (ISL=1024, OSL=1024) case but conc-end: 256 for the 8k1k (ISL=8192, OSL=1024) case. When the benchmark harness sweeps this config, it will run TP8/EP8 inference at concurrencies up to 512 on MI325X hardware, potentially causing OOM errors or extreme slowdowns.
The specific code path that triggers it
The relevant section after this PR applies:
minimaxm2.5-fp8-mi325x-vllm:
seq-len-configs:
- isl: 1024
osl: 1024
search-space:
- { tp: 2, conc-start: 4, conc-end: 64 }
- { tp: 8, ep: 8, conc-start: 4, conc-end: 512 } # <-- suspicious
- isl: 8192
osl: 1024
search-space:
- { tp: 2, conc-start: 4, conc-end: 64 }
- { tp: 8, ep: 8, conc-start: 4, conc-end: 256 } # <-- 256 hereThis PR introduces this change as part of what is framed as a revert of #989, but the MI325X vllm config was not modified by #989 (which added ATOM configs only), making this an undocumented non-revert modification.
Why existing code doesn't prevent it
The YAML config is consumed directly by the benchmark harness without any upper-bound validation. The perf-changelog for PR #953 documents the range as 4-256, but the actual code uses 512 — the changelog and code disagree. No schema validation catches this inconsistency.
Addressing the key refutation
One verifier checked git show 04d83df and asserts that PR #953 intentionally set conc-end: 512 for 1k1k, with the rationale that 1k1k requests consume less KV cache memory than 8k1k requests (ISL=1024 vs ISL=8192), allowing higher concurrency. This is technically plausible. However, even if intentional in PR #953, the following issues remain: (1) the perf-changelog for #953 explicitly states "add conc range 4-256", directly contradicting the code — at minimum this is a documentation bug that should be corrected; (2) the higher-end MI355X uses conc-end: 256 for the same 1k1k workload with the same TP8/EP8 parallelism, and a lower-tier GPU exceeding the higher-tier GPU's ceiling by 2x is architecturally counterintuitive; (3) the value 512 = 256 × 2 is the classic signature of a doubling copy-paste error.
What the impact would be
If 512 is a typo for 256, benchmark runs will sweep twice as many unnecessary concurrency points, consuming significant GPU-hours and potentially hitting OOM at high concurrencies on MI325X. If 512 is intentional, the perf-changelog documentation must be corrected to avoid confusion for future reviewers.
Step-by-step proof of the inconsistency
- MI355X (higher-end GPU) uses
tp: 8, ep: 8, conc-start: 32, conc-end: 256for 1k1k. - MI325X (lower-end GPU) now uses
tp: 8, ep: 8, conc-start: 4, conc-end: 512for 1k1k. - Same config on MI325X, 8k1k uses
conc-end: 256. - The perf-changelog for PR [AMD] Update Minimax M2.5 MI325 image and adjust search space #953 says "add conc range 4-256" for
minimaxm2.5-fp8-mi325x-vllm. - Three out of four comparable data points (MI355X 1k1k, MI325X 8k1k, changelog) agree on 256; only the MI325X 1k1k code says 512.
The author should either confirm this was intentional (and fix the perf-changelog documentation) or correct it to conc-end: 256.
| kimik2.5-int4-mi300x-vllm: | ||
| image: vllm/vllm-openai-rocm:v0.18.0 | ||
| model: amd/Kimi-K2.5-MXFP4 | ||
| model: moonshotai/Kimi-K2.5 | ||
| model-prefix: kimik2.5 | ||
| runner: mi355x | ||
| precision: fp4 | ||
| runner: mi300x | ||
| precision: int4 | ||
| framework: vllm | ||
| multinode: false | ||
| seq-len-configs: | ||
| - isl: 1024 | ||
| osl: 1024 | ||
| search-space: | ||
| - { tp: 8, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, conc-start: 4, conc-end: 64 } | ||
| - isl: 1024 | ||
| osl: 8192 | ||
| search-space: | ||
| - { tp: 8, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, ep: 4, conc-start: 4, conc-end: 64 } | ||
| - isl: 8192 | ||
| osl: 1024 | ||
| search-space: | ||
| - { tp: 8, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, conc-start: 4, conc-end: 64 } | ||
|
|
There was a problem hiding this comment.
🟡 This PR is titled as a revert of #989, but it also introduces a brand-new kimik2.5-int4-mi300x-vllm config (MI300X runner, int4 precision, moonshotai/Kimi-K2.5 model) that was never part of PR #989. The YAML slot previously holding kimik2.5-fp4-mi355x-vllm (MI355X, fp4, amd/Kimi-K2.5-MXFP4) was repurposed into this net-new config, making the revert impure and the PR description misleading — reviewers should be aware this PR does more than a simple revert.
Extended reasoning...
What the bug is
PR #990 is described as reverting PR #989, but it introduces a brand-new benchmark configuration kimik2.5-int4-mi300x-vllm that never existed in the codebase before this PR.
The specific code path
In .github/configs/amd-master.yaml around lines 278-295, the diff shows the slot previously occupied by kimik2.5-fp4-mi355x-vllm (model: amd/Kimi-K2.5-MXFP4, runner: mi355x, precision: fp4) being replaced with a completely different config: kimik2.5-int4-mi300x-vllm (model: moonshotai/Kimi-K2.5, runner: mi300x, precision: int4). The kimik2.5-fp4-mi355x-atom slot that #989 added then gets repurposed to become the new kimik2.5-fp4-mi355x-vllm. The net effect is a slot rotation that introduces a config that was never in the pre-#989 state.
Why existing safeguards do not catch this
A git revert should mechanically undo only the changes from the targeted commit. There is no automated check that enforces a PR titled as a revert only removes or restores, never adds net-new configs. The perf-changelog.yaml confirms the gap: the entry for kimik2.5-int4-mi300x-vllm at the top of perf-changelog.yaml has pr-link: XXX (a placeholder), indicating it was authored for a separate, future PR, not for PR #989.
Impact
The functional impact is low: the new config itself appears valid (it follows the same structure as other kimik2.5-int4-*-vllm entries like kimik2.5-int4-mi355x-vllm and kimik2.5-int4-mi325x-vllm). However, the PR description is misleading to reviewers, making it appear this is a pure revert when it silently adds benchmark coverage for a new hardware/model combination. CI history and git blame will also be confusing since kimik2.5-int4-mi300x-vllm will appear to originate from a revert PR.
Step-by-step proof
- Pre-[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 state:
kimik2.5-fp4-mi355x-vllmexists withmodel: amd/Kimi-K2.5-MXFP4,runner: mi355x,precision: fp4(added in PR [AMD] Add Kimi K2.5 MXFP4 vLLM benchmark for MI355X (TP8) #825 per perf-changelog). Nokimik2.5-int4-mi300x-vllmentry exists anywhere. - PR [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 adds:
kimik2.5-fp4-mi355x-atomandminimaxm2.5-fp8-mi355x-atom(confirmed by the perf-changelog entry withpr-link: .../pull/989). - A pure revert of [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 should only remove
kimik2.5-fp4-mi355x-atomandminimaxm2.5-fp8-mi355x-atom, restore any entries [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 modified, and leavekimik2.5-fp4-mi355x-vllmunchanged. - What this PR actually does: It renames
kimik2.5-fp4-mi355x-vllmtokimik2.5-int4-mi300x-vllm(entirely different model/runner/precision), then repurposes thekimik2.5-fp4-mi355x-atomslot from [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 as the newkimik2.5-fp4-mi355x-vllm. - The perf-changelog entry for
kimik2.5-int4-mi300x-vllmat the top usespr-link: XXX, explicitly marking it as belonging to a different, not-yet-merged PR.
How to fix
The cleanest resolution is to separate concerns: land the net-new kimik2.5-int4-mi300x-vllm config via its own dedicated PR (with a proper pr-link in the changelog), and keep this revert PR as a pure revert of #989 only.
Reverts #989