[AMD] Fix eval for dsr1 fp4 #1566
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
2 similar comments
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7b95bd6. Configure here.
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
benchmarks/multi_node/amd_utils/server.sh:725— On line 725, the decode launch command setsSGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}— the leading$is missing before the brace, so bash passes the literal string{MORI_COMBINE_DTYPE_DECODE}as the env var value instead of expanding tofp8. The two prefill counterparts at lines 425 and 657 correctly use${MORI_COMBINE_DTYPE_PREFILL}. This defeats the stated PR purpose (fix dsr1 fp4 eval with fp8 blockwise combine for decode), since the decode worker never receives the fp8 combine_dtype.Extended reasoning...
What the bug is
On
benchmarks/multi_node/amd_utils/server.sh:725, the decodeDECODE_CMDstring is built with:DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ..."The leading
$is missing before{MORI_COMBINE_DTYPE_DECODE}. In bash,{NAME}without a leading dollar sign is not variable expansion — it is a literal sequence of curly braces and characters (brace expansion does not apply here either, since there is no comma or..inside). So when this string is later passed toeval, sglang's decode process is exec'd with the env var literally set to the string{MORI_COMBINE_DTYPE_DECODE}instead offp8(the value thatenv.shexports forMORI_COMBINE_DTYPE_DECODE).Why the existing pattern doesn't catch it
The same patch correctly uses
${MORI_COMBINE_DTYPE_PREFILL}(with the$) on the two prefill command-build sites atserver.sh:425andserver.sh:657. That confirms the author knew the proper expansion syntax — this is a typo on the decode-only line. Becauseset +xis in effect whenDECODE_CMDis assembled, the rendered command is not echoed, so the bad value is not visible at a glance in logs.Impact
The whole point of the PR ("Fix the eval result of dsr1 fp4 with fp8 blockwise combine") relies on the decode worker actually receiving
SGLANG_MORI_COMBINE_DTYPE=fp8. With this typo, decode receives an invalid literal value and the blockwise combine setting for decode is never applied — either sglang errors out parsing the value, or it silently falls back to a default, producing exactly the broken behavior the PR claims to fix. The two prefill sites are unaffected, so prefill will getfp8_direct_castcorrectly, but the eval regression on the decode side will remain.Step-by-step proof
env.shexports:MORI_COMBINE_DTYPE_DECODE=fp8(env.sh:44 in the patch).server.shsourcesenv.shnear the top.- At line 725, bash assigns
DECODE_CMDfrom a double-quoted string. Inside double quotes, only$VARand${VAR}trigger parameter expansion;{VAR}does not. SoDECODE_CMDends up containing the literal substringSGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}. - Around line 756,
eval "$DECODE_CMD"runs that string. Bash parsesSGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}as a command-prefix env assignment of value{MORI_COMBINE_DTYPE_DECODE}(literal 26 chars, including the braces). - The sglang decode server therefore sees
os.environ['SGLANG_MORI_COMBINE_DTYPE'] == '{MORI_COMBINE_DTYPE_DECODE}', not'fp8'.
You can reproduce this in any shell:
$ MORI_COMBINE_DTYPE_DECODE=fp8 $ CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} env | grep SGLANG_MORI_COMBINE_DTYPE" $ eval "$CMD" SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}
Compare with the prefill form (correct):
$ MORI_COMBINE_DTYPE_PREFILL=fp8_direct_cast $ CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} env | grep SGLANG_MORI_COMBINE_DTYPE" $ eval "$CMD" SGLANG_MORI_COMBINE_DTYPE=fp8_direct_cast
Fix
Add the missing
$on line 725:- DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ... + DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
This was also independently flagged by Cursor Bugbot at the same location with High severity.
🔬 also observed by cursor
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26459653823 |
|
@billishyahao is this PR ready for review? |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488008644 |
yes, please |
@Oseltamivir or @cquil11 can u review this PR? |
The decode launch command was using the literal string
'{MORI_COMBINE_DTYPE_DECODE}' instead of expanding the variable,
so the decode worker never received the correct fp8 combine dtype.
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488795817 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488890533 |

This patch is to
Note
Medium Risk
Changes MoE combine dtypes and DP+EP serving limits that directly affect benchmark throughput and lm-eval scores for a flagship FP4 config, though scope is limited to AMD disagg benchmark scripts and YAML.
Overview
Updates DeepSeek-R1 FP4 disaggregated MI355X benchmark configs (
dsr1-fp4-mi355x-sglang-disaggand-mtp) to SGLang v0.5.12 (May 19 image) and expands sweeps—including a new 1×DEP8 + 1×DEP8 non-MTP point at conc 128/256/512 and revised MTP layouts (more DEP8+DEP8, adjusted conc lists andDECODE_MTP_SIZE).MoRI / server tuning fixes FP4 eval accuracy: replaces the old FP8-combine flag with per-phase combine dtypes (
fp8_direct_castprefill,fp8decode), passesSGLANG_MORI_COMBINE_DTYPEinto launch commands, disables overlap plan stream, and when DP+EP are both on, scales max running requests and dispatch/MoE token limits from the benchmark’s max concurrency. Documents the change inperf-changelog.yaml.Reviewed by Cursor Bugbot for commit 8e13068. Bugbot is set up for automated code reviews on this repo. Configure here.