[AMD] Fix eval for dsr1 fp4 by billishyahao · Pull Request #1566 · SemiAnalysisAI/InferenceX

billishyahao · 2026-05-26T15:52:00Z

This patch:

- Fixes the eval result of dsr1 fp4 with fp8 blockwise combine
- Bumps the image to May 19
- Adds new sweep points at conc 512

Note

Medium Risk
Changes MoE combine dtypes and DP+EP serving limits that directly affect benchmark throughput and lm-eval scores for a flagship FP4 config, though scope is limited to AMD disagg benchmark scripts and YAML.

Overview
Updates DeepSeek-R1 FP4 disaggregated MI355X benchmark configs (dsr1-fp4-mi355x-sglang-disagg and -mtp) to SGLang v0.5.12 (May 19 image) and expands sweeps—including a new 1×DEP8 + 1×DEP8 non-MTP point at conc 128/256/512 and revised MTP layouts (more DEP8+DEP8, adjusted conc lists and DECODE_MTP_SIZE).

MoRI / server tuning fixes FP4 eval accuracy: replaces the old FP8-combine flag with per-phase combine dtypes (fp8_direct_cast prefill, fp8 decode), passes SGLANG_MORI_COMBINE_DTYPE into launch commands, disables overlap plan stream, and when DP+EP are both on, scales max running requests and dispatch/MoE token limits from the benchmark’s max concurrency. Documents the change in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit 8e13068. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-26T15:52:11Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7b95bd6.

claude

Additional findings (outside current diff — PR may have been updated during review):

🔴 benchmarks/multi_node/amd_utils/server.sh:725: the decode launch command sets SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}. The leading $ is missing before the brace, so bash passes the literal string {MORI_COMBINE_DTYPE_DECODE} as the env var value instead of expanding it to fp8. The two prefill counterparts at lines 425 and 657 correctly use ${MORI_COMBINE_DTYPE_PREFILL}. This defeats the stated purpose of the PR (fix dsr1 fp4 eval with fp8 blockwise combine for decode), since the decode worker never receives the fp8 combine dtype.

What the bug is

On benchmarks/multi_node/amd_utils/server.sh:725, the decode DECODE_CMD string is built with:
```
DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ..."
```
The leading $ is missing before {MORI_COMBINE_DTYPE_DECODE}. In bash, {NAME} without a leading dollar sign is not variable expansion — it is a literal sequence of curly braces and characters (brace expansion does not apply here either, since there is no comma or .. inside). So when this string is later passed to eval, sglang's decode process is exec'd with the env var literally set to the string {MORI_COMBINE_DTYPE_DECODE} instead of fp8 (the value that env.sh exports for MORI_COMBINE_DTYPE_DECODE).

Why the existing pattern doesn't catch it

The same patch correctly uses ${MORI_COMBINE_DTYPE_PREFILL} (with the $) on the two prefill command-build sites at server.sh:425 and server.sh:657. That confirms the author knew the proper expansion syntax — this is a typo on the decode-only line. Because set +x is in effect when DECODE_CMD is assembled, the rendered command is not echoed, so the bad value is not visible at a glance in logs.
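Since the assembled command is not echoed, one way to surface this class of typo is a guard that inspects the command string before it reaches eval. The sketch below is a hypothetical helper, not something that exists in server.sh: the function name `has_unexpanded_placeholder` and the example command are assumptions made for illustration. It only looks for an `=` followed directly by an upper-case `{NAME}` with no leading `$`.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of server.sh): succeeds (returns 0) when the
# command string contains an assignment such as NAME={OTHER_NAME}, i.e. a
# brace placeholder that bash will never expand because the "$" is missing.
has_unexpanded_placeholder() {
    local cmd="$1"
    # "=" immediately followed by {UPPER_SNAKE_CASE}; bracket expressions
    # keep the braces literal in the extended regex.
    local re='=[{][A-Z_][A-Z0-9_]*[}]'
    [[ "$cmd" =~ $re ]]
}

# Illustrative command string; "true" stands in for the real server launch.
EXAMPLE_CMD='SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} true'
if has_unexpanded_placeholder "$EXAMPLE_CMD"; then
    echo "unexpanded placeholder found in launch command" >&2
fi
```

A check like this could run just before the eval of DECODE_CMD and abort the launch instead of printing a warning; whether that fits the script's error-handling conventions is left open here.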

Impact

The whole point of the PR ("Fix the eval result of dsr1 fp4 with fp8 blockwise combine") relies on the decode worker actually receiving SGLANG_MORI_COMBINE_DTYPE=fp8. With this typo, decode receives an invalid literal value and the blockwise combine setting for decode is never applied — either sglang errors out parsing the value, or it silently falls back to a default, producing exactly the broken behavior the PR claims to fix. The two prefill sites are unaffected, so prefill will get fp8_direct_cast correctly, but the eval regression on the decode side will remain.
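Because a silent fallback is the harder failure to notice, a fail-fast check on the value itself may help. This is a minimal sketch under stated assumptions: `validate_combine_dtype` is a hypothetical function, and the accepted list contains only the two values that appear in this PR (fp8 and fp8_direct_cast). The full set of values that SGLang or MoRI accepts is not established in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical pre-launch check (not part of server.sh or SGLang).
# Assumption: only the two dtypes named in this PR are treated as valid;
# the real set of accepted values may be larger.
validate_combine_dtype() {
    local value="$1"
    case "$value" in
        fp8|fp8_direct_cast)
            return 0
            ;;
        *)
            echo "unexpected SGLANG_MORI_COMBINE_DTYPE value: '$value'" >&2
            return 1
            ;;
    esac
}

# The literal produced by the typo is rejected; the intended value passes.
validate_combine_dtype '{MORI_COMBINE_DTYPE_DECODE}' 2>/dev/null || echo "rejected literal placeholder"
validate_combine_dtype 'fp8' && echo "accepted fp8"
```

Calling such a check on the resolved value before building DECODE_CMD would turn the silent-fallback case into an explicit launch failure.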

Step-by-step proof
1. env.sh exports: MORI_COMBINE_DTYPE_DECODE=fp8 (env.sh:44 in the patch).
2. server.sh sources env.sh near the top.
3. At line 725, bash assigns DECODE_CMD from a double-quoted string. Inside double quotes, only $VAR and ${VAR} trigger parameter expansion; {VAR} does not. So DECODE_CMD ends up containing the literal substring SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}.
4. Around line 756, eval "$DECODE_CMD" runs that string. Bash parses SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} as a command-prefix env assignment whose value is {MORI_COMBINE_DTYPE_DECODE} (a literal 27 characters, including the braces).
5. The sglang decode server therefore sees os.environ['SGLANG_MORI_COMBINE_DTYPE'] == '{MORI_COMBINE_DTYPE_DECODE}', not 'fp8'.
You can reproduce this in any shell:
```
$ MORI_COMBINE_DTYPE_DECODE=fp8
$ CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} env | grep SGLANG_MORI_COMBINE_DTYPE"
$ eval "$CMD"
SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}
```
Compare with the prefill form (correct):
```
$ MORI_COMBINE_DTYPE_PREFILL=fp8_direct_cast
$ CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} env | grep SGLANG_MORI_COMBINE_DTYPE"
$ eval "$CMD"
SGLANG_MORI_COMBINE_DTYPE=fp8_direct_cast
```
Fix

Add the missing $ on line 725:
```
-    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
+    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
```
This was also independently flagged by Cursor Bugbot at the same location with High severity.

github-actions · 2026-05-27T00:21:49Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26459653823
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26459653823

functionstackx · 2026-05-27T02:48:09Z

@billishyahao is this PR ready for review?

github-actions · 2026-05-27T02:58:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488008644
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26488008644

billishyahao · 2026-05-27T03:22:43Z

> @billishyahao is this PR ready for review?

yes, please

functionstackx · 2026-05-27T03:23:36Z

> > @billishyahao is this PR ready for review?
>
> yes, please

@Oseltamivir or @cquil11 can u review this PR?

The decode launch command was using the literal string '{MORI_COMBINE_DTYPE_DECODE}' instead of expanding the variable, so the decode worker never received the correct fp8 combine dtype.

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-27T03:27:10Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488795817
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26488795817

github-actions · 2026-05-27T03:51:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488890533
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26488890533

seungrokj

lgtm

billishyahao added 6 commits May 13, 2026 15:14

bump image

cd3a2cb

change env accordingly

ffb9b82

fix

60df23c

only conc 512

f5b7263

fix

5dddb4e

fix

7b95bd6

billishyahao requested a review from a team May 26, 2026 15:52

billishyahao requested review from 1am9trash, chunfangamd, seungrokj and yctseng0211 as code owners May 26, 2026 15:52

github-project-automation Bot added this to InferenceMAX Board May 26, 2026

cursor Bot reviewed May 26, 2026

Comment thread benchmarks/multi_node/amd_utils/server.sh Outdated

fix

4816729

billishyahao added AMD full-sweep-enabled labels May 26, 2026

Merge remote-tracking branch 'inf/main' into amd/mi355x-dsfp4-may12

feffffc

claude Bot reviewed May 26, 2026

billishyahao requested review from cquil11 and functionstackx May 26, 2026 16:00

billishyahao mentioned this pull request May 26, 2026

[AMD] add mori blog lm-sys/lm-sys.github.io#336

Open

fix

130d359

Merge remote-tracking branch 'inf/main' into amd/mi355x-dsfp4-may12

d327d7b

cquil11 approved these changes May 27, 2026

billishyahao added sweep-enabled and removed full-sweep-enabled labels May 27, 2026

seungrokj approved these changes May 27, 2026

billishyahao merged commit 820adf2 into main May 27, 2026
39 of 80 checks passed

github-project-automation Bot moved this to Done in InferenceMAX Board May 27, 2026

billishyahao deleted the amd/mi355x-dsfp4-may12 branch May 27, 2026 04:48
