18 changes: 18 additions & 0 deletions .github/configs/nvidia-master.yaml
@@ -2363,6 +2363,24 @@ dsr1-fp8-h200-sglang:
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }

dsv4-fp8-h200-sglang:
  image: lmsysorg/sglang:deepseek-v4-hopper
  model: sgl-project/DeepSeek-V4-Flash-FP8
  model-prefix: dsv4
  runner: h200
  precision: fp8
  framework: sglang
  multinode: false
  seq-len-configs:
  - isl: 1024
    osl: 1024
    search-space:
    - { tp: 4, conc-start: 4, conc-end: 64 }
  - isl: 8192
    osl: 1024
    search-space:
    - { tp: 4, conc-start: 4, conc-end: 32 }

qwen3.5-fp8-h200-sglang:
  image: lmsysorg/sglang:v0.5.9-cu129-amd64
  model: Qwen/Qwen3.5-397B-A17B-FP8
76 changes: 76 additions & 0 deletions benchmarks/single_node/dsv4_fp8_h200.sh
@@ -0,0 +1,76 @@
#!/usr/bin/env bash

source "$(dirname "$0")/../benchmark_lib.sh"

check_env_vars \
    MODEL \
    TP \
    CONC \
    ISL \
    OSL \
    RANDOM_RANGE_RATIO \
    RESULT_FILENAME

if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

hf download "$MODEL"

nvidia-smi

export SGLANG_JIT_DEEPGEMM_PRECOMPILE=0
export SGLANG_DSV4_FP4_EXPERTS=0

# TODO(Cam): the lmsysorg/sglang:deepseek-v4-hopper image installs sglang editable
# at /workspace/sglang/python (prior sglang tags used /sgl-workspace/sglang), so
# the default $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install. The
# runner mounts at /ix for this image; paths here are $PWD-relative to be agnostic.
# Drop once lmsys moves sglang back out of /workspace.

SERVER_LOG="$PWD/server.log"
PORT=${PORT:-8888}

echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL"

EVAL_CONTEXT_ARGS=""
if [ "${EVAL_ONLY}" = "true" ]; then
    setup_eval_context
    EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN"
fi

start_gpu_monitor --output "$PWD/gpu_metrics.csv"

set -x
# EVAL_CONTEXT_ARGS is intentionally left unquoted so that it word-splits into
# separate arguments (or expands to nothing when empty).
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path "$MODEL" --host 0.0.0.0 --port "$PORT" --trust-remote-code \
    --tp "$TP" \
    --moe-runner-backend flashinfer_mxfp4 \
    --chunked-prefill-size 4096 \
    --disable-flashinfer-autotune \
    --disable-radix-cache $EVAL_CONTEXT_ARGS > "$SERVER_LOG" 2>&1 &

SERVER_PID=$!

wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

pip install -q datasets pandas

run_benchmark_serving \
    --model "$MODEL" \
    --port "$PORT" \
    --backend vllm \
    --input-len "$ISL" \
    --output-len "$OSL" \
    --random-range-ratio "$RANDOM_RANGE_RATIO" \
    --num-prompts $((CONC * 10)) \
    --max-concurrency "$CONC" \
    --result-filename "$RESULT_FILENAME" \
    --result-dir "$PWD/"

if [ "${RUN_EVAL}" = "true" ]; then
    run_eval --framework lm-eval --port "$PORT"
    append_lm_eval_summary
fi

stop_gpu_monitor
set +x
10 changes: 10 additions & 0 deletions perf-changelog.yaml
@@ -1,3 +1,13 @@
- config-keys:
    - dsv4-fp8-h200-sglang
  description:
    - "Add DeepSeek-V4-Flash-FP8 single-node H200 SGLang benchmark (TP4)"
    - "Container: lmsysorg/sglang:deepseek-v4-hopper"
    - "Model: sgl-project/DeepSeek-V4-Flash-FP8"
    - "Recipe from https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4"
    - "Prefix caching and speculative decoding disabled for baseline numbers"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD

Check warning on line 9 in perf-changelog.yaml

Claude / Claude Code Review

perf-changelog.yaml entry prepended instead of appended

Comment on lines +1 to +9

🟡 The new dsv4-fp8-h200-sglang entry was prepended at lines 1-9 of perf-changelog.yaml, but AGENTS.md requires new entries to be appended to the END of the file. Please move this entry to the bottom of the file, alongside the other recent entries (e.g., #1043, #1120).

Extended reasoning...

What the bug is

AGENTS.md (line 160) contains an explicit, unambiguous rule for perf-changelog.yaml:

The file is read in chronological order: oldest at the top, newest at the bottom. New entries MUST be appended to the END of the file — never insert in the middle or prepend.

This PR's diff header @@ -1,3 +1,13 @@ shows that the new dsv4-fp8-h200-sglang entry was inserted at the top of perf-changelog.yaml (lines 1-9), immediately before the previous first entry (dsr1-fp8-h100-dynamo-trt / dsr1-fp8-h100-dynamo-sglang). That directly violates the documented convention.

Why existing code doesn't prevent it

perf-changelog.yaml is a plain YAML sequence, so order is stylistic/documentary rather than functional — process_changelog.py will still pick up the entry no matter where it sits. There is no lint or CI check that enforces the append-only convention; it relies on the rule in AGENTS.md.
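
A check along the following lines could enforce the convention in CI. This is a minimal sketch and not something that exists in the repository: the function name `last_entry_has_key` is hypothetical, and it assumes every entry starts with a top-level `- config-keys:` line, as the entries in this diff do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeed only if KEY is listed
# in the LAST entry of a changelog whose entries start with "- config-keys:".
last_entry_has_key() {
    local file="$1" key="$2"
    awk -v key="$key" '
        /^- config-keys:/ { found = 0 }                  # new entry: reset the flag
        {
            line = $0
            sub(/^[[:space:]]*-[[:space:]]*/, "", line)  # strip the list marker
            sub(/[[:space:]]+$/, "", line)               # strip trailing blanks
            if (line == key) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

A CI step could then run something like `last_entry_has_key perf-changelog.yaml dsv4-fp8-h200-sglang` and fail the job on a non-zero exit status.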

Why this is the right interpretation (convention is still active)

Scanning the end of the modified perf-changelog.yaml, the most recent entries (e.g., #1043, #1120) are all appended at the bottom, so the convention is still being actively followed by other contributors. The prepend in this PR is an outlier.

Proof (step-by-step)

  1. Open AGENTS.md at line 160: the rule says entries "MUST be appended to the END of the file — never insert in the middle or prepend."
  2. Open the PR diff for perf-changelog.yaml: the hunk header is @@ -1,3 +1,13 @@, meaning the 10 new lines start at line 1 of the new file — i.e. the top.
  3. Look at the current tail of perf-changelog.yaml: the newest pre-existing entry (glm5.1-fp4-mi355x-atom, from PR #1043 "[AMD/ROCM] atom glm5.1 fp4 on mi355x") sits there, confirming the append convention is still in force.
  4. Therefore this PR prepends rather than appends, in direct contradiction of AGENTS.md.
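
The hunk-header reading in step 2 can be reproduced mechanically. The sketch below uses a hypothetical helper name, `hunk_summary`, and assumes both ranges in the header carry explicit counts. It extracts the new-file start line and the net number of added lines (new count minus old count).

```shell
#!/usr/bin/env bash
# Parse a unified-diff hunk header such as "@@ -1,3 +1,13 @@" and print
# "<new_start> <net_added_lines>". Assumes both ranges have explicit counts.
hunk_summary() {
    local header="$1"
    local re='^@@ -([0-9]+),([0-9]+) \+([0-9]+),([0-9]+) @@'
    if [[ "$header" =~ $re ]]; then
        local old_count="${BASH_REMATCH[2]}"
        local new_start="${BASH_REMATCH[3]}"
        local new_count="${BASH_REMATCH[4]}"
        echo "$new_start $((new_count - old_count))"
    else
        return 1   # not a hunk header in the expected form
    fi
}

hunk_summary "@@ -1,3 +1,13 @@"   # prints: 1 10
```

For this PR the result is a start line of 1 and 13 - 3 = 10 net added lines, which matches the "10 additions & 0 deletions" summary for perf-changelog.yaml.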

How to fix

Move the new entry block (the 10 lines starting with - config-keys: / dsv4-fp8-h200-sglang / description: / pr-link:) from lines 1-9 to the end of perf-changelog.yaml, after the glm5.1-fp4-mi355x-atom entry (PR #1043). Also update the pr-link: from TBD to the actual PR URL (.../pull/1136) while you're in there.

Impact

Functionally harmless — process_changelog.py will still process the entry correctly. But it's a documented-convention violation that makes the "newest at the bottom" ordering no longer reliable for readers or tooling that assumes chronological order (e.g. quick tail inspections). Hence nit severity.


- config-keys:
    - dsr1-fp8-h100-dynamo-trt
    - dsr1-fp8-h100-dynamo-sglang
15 changes: 13 additions & 2 deletions runners/launch_h200-cw.sh
@@ -11,6 +11,17 @@
SQUASH_FILE="/mnt/vast/gharunner/squash/$(echo "$IMAGE" | sed 's/[\/:@#]/_/g').sqsh"
LOCK_FILE="${SQUASH_FILE}.lock"

# TODO(Cam): lmsysorg/sglang:deepseek-v4-hopper installs sglang editable at
# /workspace/sglang/python (prior sglang tags used /sgl-workspace/sglang), so
# the default $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install and
# breaks `import sglang`. Mount this one image at /ix instead; drop the
# conditional once the image stops installing editable under /workspace.
if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then
    CONTAINER_MOUNT_DIR=/ix
else
    CONTAINER_MOUNT_DIR=/workspace
fi

Check failure on line 23 in runners/launch_h200-cw.sh

Claude / Claude Code Review

Missing /ix mount workaround in launch_h200-dgxc-slurm.sh

Comment on lines +14 to +23

🔴 The /ix mount workaround is applied to launch_h200-cw.sh and launch_h200-nb.sh but not to runners/launch_h200-dgxc-slurm.sh. Per .github/configs/runners.yaml, 14 of the 18 h200 pool runners are h200-dgxc-slurm_*, so the new dsv4-fp8-h200-sglang config (declared as runner: h200) will most often be scheduled onto the unfixed launcher, where /workspace bind-mount will mask /workspace/sglang/python and import sglang will fail — the exact failure this PR is trying to prevent. Apply the same conditional CONTAINER_MOUNT_DIR=/ix logic to the single-node else-branch (lines 289-295) of runners/launch_h200-dgxc-slurm.sh.

Extended reasoning...

The bug

This PR adds a conditional /ix mount for the lmsysorg/sglang:deepseek-v4-hopper image to two of the three H200 launchers (launch_h200-cw.sh, launch_h200-nb.sh) but leaves the third — runners/launch_h200-dgxc-slurm.sh — unpatched. Its single-node else-branch still hardcodes:

--container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE
--container-workdir=/workspace/

(see runners/launch_h200-dgxc-slurm.sh lines 291 and 293, inside the else branch that starts at line 262).

Why it matters: the majority of h200 runners hit the unfixed launcher

From .github/configs/runners.yaml the h200 pool (lines 29-47) is:

  • h200-cw_*: patched by this PR
  • h200-nb_*: patched by this PR
  • 14× h200-dgxc-slurm_*: not patched

That is 14/18 ≈ 78% of the pool. The new dsv4-fp8-h200-sglang entry in .github/configs/nvidia-master.yaml declares runner: h200 (not h200-multinode), so it is schedulable onto any of these 18 runners.

How it triggers

.github/workflows/benchmark-tmpl.yml:154 selects the launcher via bash ./runners/launch_${RUNNER_NAME%%_*}.sh. So a runner labeled h200-dgxc-slurm_7 executes runners/launch_h200-dgxc-slurm.sh. The new config is single-node (multinode: false), so the workflow takes the else branch (single-node path, line 262 onward) which hardcodes /workspace.
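
The launcher selection relies on Bash's `${VAR%%pattern}` expansion, which removes the longest suffix matching the pattern. The short sketch below illustrates it; the function name `launcher_for_runner` and the runner label `h200-cw_0` are hypothetical and used only for the example.

```shell
#!/usr/bin/env bash
# ${name%%_*} strips the longest suffix matching "_*", i.e. everything from
# the first underscore onward, leaving the launcher's base name.
launcher_for_runner() {
    local runner_name="$1"
    echo "./runners/launch_${runner_name%%_*}.sh"
}

launcher_for_runner "h200-dgxc-slurm_7"   # prints: ./runners/launch_h200-dgxc-slurm.sh
launcher_for_runner "h200-cw_0"           # prints: ./runners/launch_h200-cw.sh
```

Because the numeric suffix is discarded, every `h200-dgxc-slurm_N` runner resolves to the same unpatched launcher script.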

Step-by-step proof

  1. GitHub Actions dispatches the dsv4-fp8-h200-sglang job with runner: h200.
  2. The scheduler picks one of the 18 pool runners; 14 of 18 are h200-dgxc-slurm_N.
  3. benchmark-tmpl.yml invokes runners/launch_h200-dgxc-slurm.sh.
  4. IS_MULTINODE is not true (config declares multinode: false), so execution enters the else branch at line 262.
  5. srun runs with --container-mounts=$GITHUB_WORKSPACE:/workspace/ and --container-workdir=/workspace/ (lines 291, 293).
  6. The lmsysorg/sglang:deepseek-v4-hopper image installs the editable sglang at /workspace/sglang/python, but the bind-mount has masked that path with $GITHUB_WORKSPACE contents.
  7. benchmarks/single_node/dsv4_fp8_h200.sh runs python3 -m sglang.launch_server ..., which errors with ModuleNotFoundError: No module named 'sglang' (the exact failure the PR's own TODO comment is guarding against).

Why existing code doesn't prevent it

The two launchers that were patched added a conditional CONTAINER_MOUNT_DIR at their top-level; the dgxc-slurm variant has no such conditional, and still literally writes /workspace in both the --container-mounts and --container-workdir flags of the single-node srun. Nothing else in the launcher rewrites these paths based on image name.

How to fix

Apply the same two-step fix the PR already made to the other launchers, to the single-node branch of runners/launch_h200-dgxc-slurm.sh:

  1. Near the top of the file (or inside the else-branch before the srun), add:
    if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then
        CONTAINER_MOUNT_DIR=/ix
    else
        CONTAINER_MOUNT_DIR=/workspace
    fi
  2. Change lines 291 and 293 from /workspace/ to $CONTAINER_MOUNT_DIR (matching the pattern already used in launch_h200-cw.sh:53,55 and launch_h200-nb.sh:26,29).

This is a purely mechanical fix that mirrors the existing two-launcher patch and resolves the failure on the majority of the h200 pool.
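
One way to keep the three launchers from drifting apart again would be to put the image check in a single shared function. The sketch below is only an illustration: `container_mount_dir_for_image` is a hypothetical name, and no such helper exists in the PR, which copies the conditional into each launcher instead.

```shell
#!/usr/bin/env bash
# Hypothetical shared helper: print the in-container mount point for an image.
container_mount_dir_for_image() {
    local image="$1"
    if [[ "$image" == *deepseek-v4-hopper* ]]; then
        echo /ix          # avoid masking the editable install under /workspace
    else
        echo /workspace   # default mount point for all other images
    fi
}

CONTAINER_MOUNT_DIR="$(container_mount_dir_for_image "lmsysorg/sglang:deepseek-v4-hopper")"
echo "$CONTAINER_MOUNT_DIR"   # prints: /ix
```

Each launcher would then set `CONTAINER_MOUNT_DIR` from this function and pass it to both `--container-mounts` and `--container-workdir`.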


set -x

JOB_ID=$(salloc --partition=$PARTITION --gres=gpu:$TP --exclusive --time=180 --no-shell --job-name="$RUNNER_NAME" 2>&1 | tee /dev/stderr | grep -oP 'Granted job allocation \K[0-9]+')
@@ -40,9 +51,9 @@

srun --jobid=$JOB_ID \
--container-image=$CONTAINER_IMAGE \
- --container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
+ --container-mounts=$GITHUB_WORKSPACE:$CONTAINER_MOUNT_DIR,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
--container-mount-home \
- --container-workdir=/workspace/ \
+ --container-workdir=$CONTAINER_MOUNT_DIR \
--no-container-entrypoint --export=ALL \
bash benchmarks/single_node/${MODEL_CODE}_${PRECISION}_h200${FRAMEWORK_SUFFIX}${SPEC_SUFFIX}.sh

15 changes: 13 additions & 2 deletions runners/launch_h200-nb.sh
@@ -9,14 +9,25 @@ SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf '_mtp' || printf '')

PARTITION="main"

# TODO(Cam): lmsysorg/sglang:deepseek-v4-hopper installs sglang editable at
# /workspace/sglang/python (prior sglang tags used /sgl-workspace/sglang), so
# the default $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install and
# breaks `import sglang`. Mount this one image at /ix instead; drop the
# conditional once the image stops installing editable under /workspace.
if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then
    CONTAINER_MOUNT_DIR=/ix
else
    CONTAINER_MOUNT_DIR=/workspace
fi

set -x
srun --partition=$PARTITION --gres=gpu:$TP --exclusive --job-name="$RUNNER_NAME" \
--container-image=$IMAGE \
--container-name=$(echo "$IMAGE" | sed 's/[\/:@#]/_/g')-${USER} \
- --container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
+ --container-mounts=$GITHUB_WORKSPACE:$CONTAINER_MOUNT_DIR,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
--container-remap-root \
--container-writable \
--container-mount-home \
- --container-workdir=/workspace/ \
+ --container-workdir=$CONTAINER_MOUNT_DIR \
--no-container-entrypoint --export=ALL \
bash benchmarks/single_node/${MODEL_CODE}_${PRECISION}_h200${FRAMEWORK_SUFFIX}${SPEC_SUFFIX}.sh