Revert "[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5" by cquil11 · Pull Request #990 · SemiAnalysisAI/InferenceX

cquil11 · 2026-04-01T19:37:16Z

Reverts #989

…iMax-M2.…" This reverts commit 7d10a7b.

github-actions · 2026-04-01T19:37:25Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

github-actions · 2026-04-01T19:37:25Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

claude · 2026-04-01T19:52:00Z

-    search-space:
-    - { tp: 2, conc-start: 4, conc-end: 64 }
-    - { tp: 4, conc-start: 4, conc-end: 64 }
+    - { tp: 8, ep: 8, conc-start: 4, conc-end: 512 }


🔴 The minimaxm2.5-fp8-mi325x-vllm 1k1k entry uses conc-end: 512 (line 371), which is double the conc-end: 256 used in the companion 8k1k entry and double the value used by the higher-end MI355X for the same workload. The perf-changelog for PR #953 also documents this config's range as 4-256, creating a direct contradiction with the actual code. Please verify whether 512 was intentional or a copy-paste error, as running TP8/EP8 concurrency up to 512 on MI325X risks OOM or unexpectedly long benchmark runs.

Extended reasoning...

What the bug is and how it manifests

In .github/configs/amd-master.yaml, the minimaxm2.5-fp8-mi325x-vllm config has an asymmetric concurrency ceiling: conc-end: 512 for the 1k1k (ISL=1024, OSL=1024) case but conc-end: 256 for the 8k1k (ISL=8192, OSL=1024) case. When the benchmark harness sweeps this config, it will run TP8/EP8 inference at concurrencies up to 512 on MI325X hardware, potentially causing OOM errors or extreme slowdowns.

The specific code path that triggers it

The relevant section after this PR applies:

minimaxm2.5-fp8-mi325x-vllm: seq-len-configs: - isl: 1024 osl: 1024 search-space: - { tp: 2, conc-start: 4, conc-end: 64 } - { tp: 8, ep: 8, conc-start: 4, conc-end: 512 } # <-- suspicious - isl: 8192 osl: 1024 search-space: - { tp: 2, conc-start: 4, conc-end: 64 } - { tp: 8, ep: 8, conc-start: 4, conc-end: 256 } # <-- 256 here

This PR introduces this change as part of what is framed as a revert of #989, but the MI325X vllm config was not modified by #989 (which added ATOM configs only), making this an undocumented non-revert modification.

Why existing code doesn't prevent it

The YAML config is consumed directly by the benchmark harness without any upper-bound validation. The perf-changelog for PR #953 documents the range as 4-256, but the actual code uses 512 — the changelog and code disagree. No schema validation catches this inconsistency.

Addressing the key refutation

One verifier checked git show 04d83df and asserts that PR #953 intentionally set conc-end: 512 for 1k1k, with the rationale that 1k1k requests consume less KV cache memory than 8k1k requests (ISL=1024 vs ISL=8192), allowing higher concurrency. This is technically plausible. However, even if intentional in PR #953, the following issues remain: (1) the perf-changelog for #953 explicitly states "add conc range 4-256", directly contradicting the code — at minimum this is a documentation bug that should be corrected; (2) the higher-end MI355X uses conc-end: 256 for the same 1k1k workload with the same TP8/EP8 parallelism, and a lower-tier GPU exceeding the higher-tier GPU's ceiling by 2x is architecturally counterintuitive; (3) the value 512 = 256 × 2 is the classic signature of a doubling copy-paste error.

What the impact would be

If 512 is a typo for 256, benchmark runs will sweep twice as many unnecessary concurrency points, consuming significant GPU-hours and potentially hitting OOM at high concurrencies on MI325X. If 512 is intentional, the perf-changelog documentation must be corrected to avoid confusion for future reviewers.

Step-by-step proof of the inconsistency

MI355X (higher-end GPU) uses tp: 8, ep: 8, conc-start: 32, conc-end: 256 for 1k1k.

MI325X (lower-end GPU) now uses tp: 8, ep: 8, conc-start: 4, conc-end: 512 for 1k1k.

Same config on MI325X, 8k1k uses conc-end: 256.

The perf-changelog for PR [AMD] Update Minimax M2.5 MI325 image and adjust search space #953 says "add conc range 4-256" for minimaxm2.5-fp8-mi325x-vllm.

Three out of four comparable data points (MI355X 1k1k, MI325X 8k1k, changelog) agree on 256; only the MI325X 1k1k code says 512.

The author should either confirm this was intentional (and fix the perf-changelog documentation) or correct it to conc-end: 256.

claude · 2026-04-01T19:52:00Z

+kimik2.5-int4-mi300x-vllm:
  image: vllm/vllm-openai-rocm:v0.18.0
-  model: amd/Kimi-K2.5-MXFP4
+  model: moonshotai/Kimi-K2.5
  model-prefix: kimik2.5
-  runner: mi355x
-  precision: fp4
+  runner: mi300x
+  precision: int4
  framework: vllm
  multinode: false
  seq-len-configs:
  - isl: 1024
    osl: 1024
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }
-    - { tp: 4, conc-start: 4, conc-end: 64 }
-  - isl: 1024
-    osl: 8192
-    search-space:
-    - { tp: 8, conc-start: 4, conc-end: 64 }
-    - { tp: 4, conc-start: 4, conc-end: 64 }
-    - { tp: 4, ep: 4, conc-start: 4, conc-end: 64 }
  - isl: 8192
    osl: 1024
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }
-    - { tp: 4, conc-start: 4, conc-end: 64 }



🟡 This PR is titled as a revert of #989, but it also introduces a brand-new kimik2.5-int4-mi300x-vllm config (MI300X runner, int4 precision, moonshotai/Kimi-K2.5 model) that was never part of PR #989. The YAML slot previously holding kimik2.5-fp4-mi355x-vllm (MI355X, fp4, amd/Kimi-K2.5-MXFP4) was repurposed into this net-new config, making the revert impure and the PR description misleading — reviewers should be aware this PR does more than a simple revert.

Extended reasoning...

What the bug is

PR #990 is described as reverting PR #989, but it introduces a brand-new benchmark configuration kimik2.5-int4-mi300x-vllm that never existed in the codebase before this PR.

The specific code path

In .github/configs/amd-master.yaml around lines 278-295, the diff shows the slot previously occupied by kimik2.5-fp4-mi355x-vllm (model: amd/Kimi-K2.5-MXFP4, runner: mi355x, precision: fp4) being replaced with a completely different config: kimik2.5-int4-mi300x-vllm (model: moonshotai/Kimi-K2.5, runner: mi300x, precision: int4). The kimik2.5-fp4-mi355x-atom slot that #989 added then gets repurposed to become the new kimik2.5-fp4-mi355x-vllm. The net effect is a slot rotation that introduces a config that was never in the pre-#989 state.

Why existing safeguards do not catch this

A git revert should mechanically undo only the changes from the targeted commit. There is no automated check that enforces a PR titled as a revert only removes or restores, never adds net-new configs. The perf-changelog.yaml confirms the gap: the entry for kimik2.5-int4-mi300x-vllm at the top of perf-changelog.yaml has pr-link: XXX (a placeholder), indicating it was authored for a separate, future PR, not for PR #989.

Impact

The functional impact is low: the new config itself appears valid (it follows the same structure as other kimik2.5-int4-*-vllm entries like kimik2.5-int4-mi355x-vllm and kimik2.5-int4-mi325x-vllm). However, the PR description is misleading to reviewers, making it appear this is a pure revert when it silently adds benchmark coverage for a new hardware/model combination. CI history and git blame will also be confusing since kimik2.5-int4-mi300x-vllm will appear to originate from a revert PR.

Step-by-step proof

Pre-[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 state: kimik2.5-fp4-mi355x-vllm exists with model: amd/Kimi-K2.5-MXFP4, runner: mi355x, precision: fp4 (added in PR [AMD] Add Kimi K2.5 MXFP4 vLLM benchmark for MI355X (TP8) #825 per perf-changelog). No kimik2.5-int4-mi300x-vllm entry exists anywhere.

PR [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 adds: kimik2.5-fp4-mi355x-atom and minimaxm2.5-fp8-mi355x-atom (confirmed by the perf-changelog entry with pr-link: .../pull/989).

A pure revert of [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 should only remove kimik2.5-fp4-mi355x-atom and minimaxm2.5-fp8-mi355x-atom, restore any entries [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 modified, and leave kimik2.5-fp4-mi355x-vllm unchanged.

What this PR actually does: It renames kimik2.5-fp4-mi355x-vllm to kimik2.5-int4-mi300x-vllm (entirely different model/runner/precision), then repurposes the kimik2.5-fp4-mi355x-atom slot from [AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5 #989 as the new kimik2.5-fp4-mi355x-vllm.

The perf-changelog entry for kimik2.5-int4-mi300x-vllm at the top uses pr-link: XXX, explicitly marking it as belonging to a different, not-yet-merged PR.

How to fix

The cleanest resolution is to separate concerns: land the net-new kimik2.5-int4-mi300x-vllm config via its own dedicated PR (with a proper pr-link in the changelog), and keep this revert PR as a pure revert of #989 only.

Revert "[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and Min…

6cc5810

…iMax-M2.…" This reverts commit 7d10a7b.

cquil11 requested a review from a team April 1, 2026 19:37

cquil11 requested review from billishyahao and chunfangamd as code owners April 1, 2026 19:37

github-project-automation Bot added this to InferenceMAX Board Apr 1, 2026

cquil11 merged commit 24eea11 into main Apr 1, 2026
7 checks passed

cquil11 deleted the revert-989-srok/atom_kimik2.5_minimaxm2.5_glm5_v2 branch April 1, 2026 19:37

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 1, 2026

claude Bot reviewed Apr 1, 2026

View reviewed changes

cquil11 added the AMD label Apr 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5"#990

Revert "[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4 and MiniMax-M2.5"#990
cquil11 merged 1 commit into
mainfrom
revert-989-srok/atom_kimik2.5_minimaxm2.5_glm5_v2

cquil11 commented Apr 1, 2026

Uh oh!

github-actions Bot commented Apr 1, 2026

Uh oh!

github-actions Bot commented Apr 1, 2026

Uh oh!

Uh oh!

claude Bot Apr 1, 2026

Uh oh!

claude Bot Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cquil11 commented Apr 1, 2026

Uh oh!

github-actions Bot commented Apr 1, 2026

Uh oh!

github-actions Bot commented Apr 1, 2026

Uh oh!

Uh oh!

claude Bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 1, 2026

Choose a reason for hiding this comment

What the bug is

The specific code path

Why existing safeguards do not catch this

Impact

Step-by-step proof

How to fix

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant