Add MI355X config: glm5-fp8-sglang-mtp #1086
Merged

New file `benchmarks/single_node/glm5_fp8_mi355x_mtp.sh` (hunk `@@ -0,0 +1,87 @@`):

```bash
#!/usr/bin/env bash

source "$(dirname "$0")/../benchmark_lib.sh"

check_env_vars \
    MODEL \
    TP \
    CONC \
    ISL \
    OSL \
    RANDOM_RANGE_RATIO \
    RESULT_FILENAME

if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

# GLM-5 requires transformers with glm_moe_dsa model type support.
# However, the Image rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219 doesn't provide this support.
python3 -m pip install -U --no-cache-dir \
    "git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2"

hf download "$MODEL"

# ROCm / SGLang performance tuning for MI355X
export SGLANG_ROCM_FUSED_DECODE_MLA=0
export ROCM_QUICK_REDUCE_QUANTIZATION=INT4
export SAFETENSORS_FAST_GPU=1
export SGLANG_ENABLE_SPEC_V2=1

SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

EVAL_CONTEXT_ARGS=""
if [ "${EVAL_ONLY}" = "true" ]; then
    setup_eval_context
    EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN"
fi
# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor

python3 -m sglang.launch_server \
    --model-path "$MODEL" \
    --host=0.0.0.0 \
    --port "$PORT" \
    --tensor-parallel-size "$TP" \
    --trust-remote-code \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --mem-fraction-static 0.85 \
    --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 8}' \
    --nsa-prefill-backend tilelang \
    --nsa-decode-backend tilelang $EVAL_CONTEXT_ARGS \
    --kv-cache-dtype fp8_e4m3 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --disable-radix-cache > "$SERVER_LOG" 2>&1 &

SERVER_PID=$!

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

run_benchmark_serving \
    --model "$MODEL" \
    --port "$PORT" \
    --backend vllm \
    --input-len "$ISL" \
    --output-len "$OSL" \
    --random-range-ratio "$RANDOM_RANGE_RATIO" \
    --num-prompts "$((CONC * 10))" \
    --max-concurrency "$CONC" \
    --result-filename "$RESULT_FILENAME" \
    --result-dir /workspace/ \
    --use-chat-template

# After throughput, run evaluation only if RUN_EVAL is true
if [ "${RUN_EVAL}" = "true" ]; then
    run_eval --framework lm-eval --port "$PORT"
    append_lm_eval_summary
fi

# Stop GPU monitoring
stop_gpu_monitor
set +x
```
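
The launch command passes the optional `--context-length` flag through `$EVAL_CONTEXT_ARGS`, a plain string that is intentionally left unquoted so it splits into separate words, or disappears entirely when empty. A bash array gives the same result without relying on word splitting. The following is a minimal sketch of that alternative; `build_eval_context_args` is a hypothetical helper (not part of `benchmark_lib.sh`), and only `EVAL_ONLY` and `EVAL_MAX_MODEL_LEN` come from the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of benchmark_lib.sh): collect optional
# server flags in an array instead of a space-separated string.
# Reads EVAL_ONLY and EVAL_MAX_MODEL_LEN, as used in the benchmark script.
build_eval_context_args() {
    EVAL_CONTEXT_ARGS=()
    if [[ "${EVAL_ONLY:-}" == "true" ]]; then
        # Flag and value are stored as two separate array elements.
        EVAL_CONTEXT_ARGS+=(--context-length "$EVAL_MAX_MODEL_LEN")
    fi
}

# Usage inside the launch command (other flags elided):
#   python3 -m sglang.launch_server ... "${EVAL_CONTEXT_ARGS[@]}" ...
# An empty array expands to zero words, so no flag is passed when
# EVAL_ONLY is not "true".
```

With `set -u` enabled, expanding an empty array this way requires bash 4.4 or newer; older versions report an unbound variable.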
🟡 The comment on lines 18–19 of `glm5_fp8_mi355x_mtp.sh` references image `rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219`, but the script actually runs against `lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413` (as defined in the YAML config). Update the comment to reference the correct image so maintainers can accurately evaluate whether the pinned `transformers` pip install is still required.

Extended reasoning
What the bug is: Lines 18–19 of the new `benchmarks/single_node/glm5_fp8_mi355x_mtp.sh` contain a comment stating that GLM-5 requires `transformers` with `glm_moe_dsa` model type support and that the image `rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219` doesn't provide it. This comment explains why the script unconditionally runs `pip install -U … transformers@<commit>`. The stated reason is that a specific image lacks GLM-5 `glm_moe_dsa` model type support.

The specific code path that triggers it: A developer reading the script to decide whether the workaround is still necessary will look at lines 18–21 to understand the rationale. The comment identifies `rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219` as the deficient image.

Why existing code doesn't prevent it: The comment was copy-pasted from `benchmarks/single_node/glm5_fp8_mi355x.sh` (the non-MTP script), which itself already had a stale comment after PR #1023 upgraded the non-MTP image from that old `rocm/sgl-dev` image to `lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413` without updating the explanatory comment. The MTP script was then created by copying the non-MTP script, inheriting the already-stale comment.

What the impact would be: The actual image used, as defined by `glm5-fp8-mi355x-sglang-mtp` in `amd-master.yaml`, is `lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413`. A maintainer reading the comment might therefore draw the wrong conclusion about whether the pinned `transformers` install is still needed. The pip install runs regardless, so there is no functional impact; this is purely a documentation accuracy issue.
How to fix it: Update the comment to reference the image actually in use, replacing `rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219` with `lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413`. The same fix should also be applied to the identical stale comment in `glm5_fp8_mi355x.sh` (the non-MTP script), which was introduced in PR #1023.

Step-by-step proof:
1. Open `.github/configs/amd-master.yaml` and look at the `glm5-fp8-mi355x-sglang-mtp` key → `image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413`.
2. Open `benchmarks/single_node/glm5_fp8_mi355x_mtp.sh`, line 19 → `# However, the Image rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219 doesn't provide this support.`
3. The image named in the comment (`rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219`) is entirely different from the image actually pulled (`lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413`): different registry, different tag, different date.
4. Check `benchmarks/single_node/glm5_fp8_mi355x.sh`, line 19 → identical stale comment, confirming the MTP script was copy-pasted from there without updating the comment.
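
The manual comparison in the steps above can be scripted so that a stale image reference is caught automatically. The following is a minimal sketch, not part of the repository: `config_image`, `comment_image`, and `check_image_comment` are hypothetical names, and the sketch assumes that the config key appears on its own line as `<key>:` with an unquoted `image:` field within the next ten lines, and that the script comment keeps the `# However, the Image <image> ...` wording.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare the image named in a benchmark script's
# comment with the image configured for the same benchmark in a YAML file.

# Print the first `image:` value found within 10 lines after `<key>:`.
config_image() {
    local config_file=$1 key=$2
    grep -A10 "^[[:space:]]*${key}:" "$config_file" \
        | sed -n 's/^[[:space:]]*image:[[:space:]]*//p' \
        | head -n1
}

# Print the image named in a "# However, the Image <image> ..." comment.
comment_image() {
    local script_file=$1
    sed -n 's/^# However, the Image \([^ ]*\) .*/\1/p' "$script_file" | head -n1
}

# Return 0 if both references agree, 1 if the comment is stale.
check_image_comment() {
    local config_file=$1 key=$2 script_file=$3
    local configured referenced
    configured=$(config_image "$config_file" "$key")
    referenced=$(comment_image "$script_file")
    if [[ -n "$configured" && "$configured" == "$referenced" ]]; then
        echo "OK: comment matches $configured"
        return 0
    fi
    echo "STALE: comment names '$referenced', config uses '$configured'"
    return 1
}
```

For example, running `check_image_comment .github/configs/amd-master.yaml glm5-fp8-mi355x-sglang-mtp benchmarks/single_node/glm5_fp8_mi355x_mtp.sh` from the repository root should report the mismatch described in this comment, provided the YAML follows the assumed layout. Because the check returns a non-zero status on mismatch, it could be dropped into a CI step as-is.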