Add dsv4-fp8-h200-sglang single-node config #1136
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| source "$(dirname "$0")/../benchmark_lib.sh" | ||
|
|
||
| check_env_vars \ | ||
| MODEL \ | ||
| TP \ | ||
| CONC \ | ||
| ISL \ | ||
| OSL \ | ||
| RANDOM_RANGE_RATIO \ | ||
| RESULT_FILENAME | ||
|
|
||
| if [[ -n "$SLURM_JOB_ID" ]]; then | ||
| echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME" | ||
| fi | ||
|
|
||
| hf download "$MODEL" | ||
|
|
||
| nvidia-smi | ||
|
|
||
| export SGLANG_JIT_DEEPGEMM_PRECOMPILE=0 | ||
| export SGLANG_DSV4_FP4_EXPERTS=0 | ||
|
|
||
| # TODO(Cam): the lmsysorg/sglang:deepseek-v4-hopper image installs sglang editable | ||
| # at /workspace/sglang/python (prior sglang tags used /sgl-workspace/sglang), so | ||
| # the default $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install. The | ||
| # runner mounts at /ix for this image; paths here are $PWD-relative to be agnostic. | ||
| # Drop once lmsys moves sglang back out of /workspace. | ||
|
|
||
| SERVER_LOG="$PWD/server.log" | ||
| PORT=${PORT:-8888} | ||
|
|
||
| echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL" | ||
|
|
||
| EVAL_CONTEXT_ARGS="" | ||
| if [ "${EVAL_ONLY}" = "true" ]; then | ||
| setup_eval_context | ||
| EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN" | ||
| fi | ||
|
|
||
| start_gpu_monitor --output "$PWD/gpu_metrics.csv" | ||
|
|
||
| set -x | ||
| PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path $MODEL --host 0.0.0.0 --port $PORT --trust-remote-code \ | ||
| --tp $TP \ | ||
| --moe-runner-backend flashinfer_mxfp4 \ | ||
| --chunked-prefill-size 4096 \ | ||
| --disable-flashinfer-autotune \ | ||
| --disable-radix-cache $EVAL_CONTEXT_ARGS > $SERVER_LOG 2>&1 & | ||
|
|
||
| SERVER_PID=$! | ||
|
|
||
| wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID" | ||
|
|
||
| pip install -q datasets pandas | ||
|
|
||
| run_benchmark_serving \ | ||
| --model "$MODEL" \ | ||
| --port "$PORT" \ | ||
| --backend vllm \ | ||
| --input-len "$ISL" \ | ||
| --output-len "$OSL" \ | ||
| --random-range-ratio "$RANDOM_RANGE_RATIO" \ | ||
| --num-prompts $((CONC * 10)) \ | ||
| --max-concurrency "$CONC" \ | ||
| --result-filename "$RESULT_FILENAME" \ | ||
| --result-dir "$PWD/" | ||
|
|
||
| if [ "${RUN_EVAL}" = "true" ]; then | ||
| run_eval --framework lm-eval --port "$PORT" | ||
| append_lm_eval_summary | ||
| fi | ||
|
|
||
| stop_gpu_monitor | ||
| set +x |
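
The TODO in the script above describes a bind-mount masking problem: mounting the checkout at `/workspace` hides anything the image installed under `/workspace`, including the editable sglang install at `/workspace/sglang/python`. As a minimal sketch of that reasoning, the hypothetical helper below (`path_is_under` is an illustrative name, not something defined in `benchmark_lib.sh`) tests whether a given mount point would cover a given install path. It compares path strings only and does not inspect the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only; not part of benchmark_lib.sh.
# Succeeds when CHILD is PARENT itself or lies below it, which is the
# situation in which a bind-mount at PARENT would hide CHILD.
path_is_under() {
  local parent="${1%/}"   # drop one trailing slash so /workspace/ == /workspace
  local child="$2"
  [[ "$child" == "$parent" || "$child" == "$parent"/* ]]
}
```

With these definitions, `path_is_under /workspace/ /workspace/sglang/python` succeeds (the install would be masked), while `path_is_under /ix /workspace/sglang/python` fails, which is why mounting at `/ix` leaves the install visible.
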
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,17 @@ | |
| SQUASH_FILE="/mnt/vast/gharunner/squash/$(echo "$IMAGE" | sed 's/[\/:@#]/_/g').sqsh" | ||
| LOCK_FILE="${SQUASH_FILE}.lock" | ||
|
|
||
| # TODO(Cam): lmsysorg/sglang:deepseek-v4-hopper installs sglang editable at | ||
| # /workspace/sglang/python (prior sglang tags used /sgl-workspace/sglang), so | ||
| # the default $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install and | ||
| # breaks `import sglang`. Mount this one image at /ix instead; drop the | ||
| # conditional once the image stops installing editable under /workspace. | ||
| if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then | ||
| CONTAINER_MOUNT_DIR=/ix | ||
| else | ||
| CONTAINER_MOUNT_DIR=/workspace | ||
| fi | ||
|
Check failure on line 23 in runners/launch_h200-cw.sh
|
||
|
Comment on lines +14 to +23
Contributor
🔴 The `/ix` mount workaround is applied to `launch_h200-cw.sh` and `launch_h200-nb.sh` but not to `runners/launch_h200-dgxc-slurm.sh`. Per `.github/configs/runners.yaml`, 14 of the 18 h200 pool runners are `h200-dgxc-slurm_*`, so the new `dsv4-fp8-h200-sglang` config (declared as `runner: h200`) will most often be scheduled onto the unfixed launcher, where /workspace bind-mount will mask `/workspace/sglang/python` and […]

Extended reasoning...

The bug
This PR adds a conditional (see […]

Why it matters: the majority of h200 runners hit the unfixed launcher
From […]
That is 14/18 ≈ 78% of the pool. The new […]

How it triggers

Step-by-step proof

Why existing code doesn't prevent it
The two launchers that were patched added a conditional `CONTAINER_MOUNT_DIR` at their top-level; the dgxc-slurm variant has no such conditional, and still literally writes […]

How to fix
Apply the same two-step fix the PR already made to the other launchers, to the single-node branch of […]
This is a purely mechanical fix that mirrors the existing two-launcher patch and resolves the failure on the majority of the h200 pool. |
||
|
|
||
| set -x | ||
|
|
||
| JOB_ID=$(salloc --partition=$PARTITION --gres=gpu:$TP --exclusive --time=180 --no-shell --job-name="$RUNNER_NAME" 2>&1 | tee /dev/stderr | grep -oP 'Granted job allocation \K[0-9]+') | ||
|
|
@@ -40,9 +51,9 @@ | |
|
|
||
| srun --jobid=$JOB_ID \ | ||
| --container-image=$CONTAINER_IMAGE \ | ||
| --container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \ | ||
| --container-mounts=$GITHUB_WORKSPACE:$CONTAINER_MOUNT_DIR,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \ | ||
| --container-mount-home \ | ||
| --container-workdir=/workspace/ \ | ||
| --container-workdir=$CONTAINER_MOUNT_DIR \ | ||
| --no-container-entrypoint --export=ALL \ | ||
| bash benchmarks/single_node/${MODEL_CODE}_${PRECISION}_h200${FRAMEWORK_SUFFIX}${SPEC_SUFFIX}.sh | ||
|
|
||
|
|
||
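
The first review comment asks for the same mount-directory conditional in `runners/launch_h200-dgxc-slurm.sh`. One way to keep the launchers from drifting apart again is to move the decision into a single function that each launcher calls. The sketch below assumes a shared helper that does not exist in the PR: the name `container_mount_dir_for` and the idea of sourcing it from a common file are both hypothetical. The matching rule itself is copied from the conditional in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical shared helper; the function name and its location are
# assumptions, not part of this PR.
# Prints the in-container mount point to use for a given image reference.
container_mount_dir_for() {
  local image="$1"
  if [[ "$image" == *deepseek-v4-hopper* ]]; then
    # This image installs sglang editable under /workspace, so the checkout
    # must be mounted elsewhere to keep `import sglang` working.
    printf '%s\n' /ix
  else
    printf '%s\n' /workspace
  fi
}
```

A launcher could then set `CONTAINER_MOUNT_DIR=$(container_mount_dir_for "$IMAGE")` and pass `$CONTAINER_MOUNT_DIR` to both `--container-mounts` and `--container-workdir`, as the patched `srun` call above already does.
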
🟡 The new `dsv4-fp8-h200-sglang` entry was prepended at lines 1-9 of `perf-changelog.yaml`, but `AGENTS.md` requires new entries to be appended to the END of the file. Please move this entry to the bottom of the file, alongside the other recent entries (e.g., #1043, #1120).

Extended reasoning...

What the bug is
`AGENTS.md` (line 160) contains an explicit, unambiguous rule for `perf-changelog.yaml`: […]
This PR's diff header `@@ -1,3 +1,13 @@` shows that the new `dsv4-fp8-h200-sglang` entry was inserted at the top of `perf-changelog.yaml` (lines 1-9), immediately before the previous first entry (`dsr1-fp8-h100-dynamo-trt` / `dsr1-fp8-h100-dynamo-sglang`). That directly violates the documented convention.

Why existing code doesn't prevent it
`perf-changelog.yaml` is a plain YAML sequence, so order is stylistic/documentary rather than functional: `process_changelog.py` will still pick up the entry no matter where it sits. There is no lint or CI check that enforces the append-only convention; it relies on the rule in `AGENTS.md`.

Why this is the right interpretation (convention is still active)
Scanning the end of the modified `perf-changelog.yaml`, the most recent entries are all properly appended at the bottom: […]
So the convention is still being actively followed by other contributors. The prepend in this PR is an outlier.

Proof (step-by-step)
1. `AGENTS.md` at line 160: the rule says entries "MUST be appended to the END of the file" and to "never insert in the middle or prepend."
2. […] `perf-changelog.yaml`: the hunk header is `@@ -1,3 +1,13 @@`, meaning the 10 new lines start at line 1 of the new file, i.e. the top.
3. […] `perf-changelog.yaml`: the newest pre-existing entry (PR "[AMD/ROCM] atom glm5.1 fp4 on mi355x" #1043, `glm5.1-fp4-mi355x-atom`) sits there, confirming the append convention is still in force.
4. […] `AGENTS.md`.

How to fix
Move the new entry block (the 10 lines starting with `- config-keys:` / `dsv4-fp8-h200-sglang` / `description:` / `pr-link:`) from lines 1-9 to the end of `perf-changelog.yaml`, after the `glm5.1-fp4-mi355x-atom` entry (PR #1043). Also update the `pr-link:` from `TBD` to the actual PR URL (`.../pull/1136`) while you're in there.

Impact
Functionally harmless: `process_changelog.py` will still process the entry correctly. But it's a documented-convention violation that makes the "newest at the bottom" ordering no longer reliable for readers or tooling that assumes chronological order (e.g. quick `tail` inspections). Hence nit severity.
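
The comment above notes that no lint or CI check enforces the append-only convention. A minimal sketch of such a check follows. It assumes the base-branch copy of the changelog has been written to a temporary file, and the function name `is_append_only` is hypothetical. The check is byte-exact: it passes only when the new file begins with the complete old file, so any prepend, mid-file insert, or edit to an existing entry fails.

```shell
#!/usr/bin/env bash
# Hypothetical append-only check; not part of the repository or its CI.
# Succeeds only if NEW_FILE starts with the exact bytes of OLD_FILE, which
# means every change is an addition at the end of the file.
is_append_only() {
  local old_file="$1" new_file="$2"
  local old_size
  # Arithmetic expansion strips the padding some wc implementations emit.
  old_size=$(( $(wc -c < "$old_file") ))
  # Compare the first old_size bytes of the new file against the old file.
  head -c "$old_size" "$new_file" | cmp -s - "$old_file"
}
```

In CI this could be driven by something like `git show origin/main:perf-changelog.yaml > /tmp/old.yaml` followed by `is_append_only /tmp/old.yaml perf-changelog.yaml`, where `origin/main` as the base branch name is an assumption. Because the comparison is byte-exact, a changelog whose last line lacks a trailing newline would need that normalized before appending.
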