Merged
24 commits
0f1645b  Update vLLM version to v0.12.0 (nvpohanh, Dec 4, 2025)
8796275  Fix H100/H200 perf regression (nvpohanh, Dec 11, 2025)
7a3fdaa  check and install git before use (Ankur-singh, Dec 11, 2025)
be1e695  add container writable to h200 nv runner launch script (cquil11, Dec 11, 2025)
c683d18  add sudo to apt-get (cquil11, Dec 11, 2025)
59dae33  add container-remap-root to h200 nv and nb runner launchers (cquil11, Dec 11, 2025)
f547cf5  Merge branch 'main' into dev-vllm-v0.12.0 (cquil11, Dec 15, 2025)
9cc728c  Merge branch 'main' into dev-vllm-v0.12.0 (cquil11, Dec 16, 2025)
ca8f30f  make changes to perf changelog (cquil11, Dec 16, 2025)
9951db6  fix typo, use correct env var for h100 (Ankur-singh, Dec 17, 2025)
7433c55  Merge branch 'main' into dev-vllm-v0.12.0 (Ankur-singh, Dec 17, 2025)
7b4c76f  Merge branch 'main' into dev-vllm-v0.12.0 (Ankur-singh, Dec 18, 2025)
2f2377a  update to v0.13.0 (Ankur-singh, Dec 30, 2025)
c290779  Merge branch 'main' into dev-vllm-v0.12.0 (Ankur-singh, Dec 30, 2025)
cd5ad1b  make changes to perf changelog (cquil11, Dec 16, 2025)
dac5bfa  fix perf-changelog (Ankur-singh, Dec 30, 2025)
8e1b8a7  Merge branch 'main' into dev-vllm-v0.12.0 (cquil11, Dec 31, 2025)
159cec9  Merge branch 'main' into dev-vllm-v0.12.0 (cquil11, Dec 31, 2025)
06b2938  Merge branch 'main' into dev-vllm-v0.12.0 (cquil11, Dec 31, 2025)
ce9f4d9  fix compilation configs (cquil11, Dec 31, 2025)
a716627  make num prompts conc * 10 (cquil11, Dec 31, 2025)
7be0229  add --container-writable to h200 nb (cquil11, Jan 2, 2026)
e13a2ec  add --container-remap-root to b200 nb (cquil11, Jan 2, 2026)
f268bac  add --container-remap-root to b200 nv (cquil11, Jan 2, 2026)
6 changes: 3 additions & 3 deletions .github/configs/nvidia-master.yaml
@@ -209,7 +209,7 @@ gptoss-fp4-b200-trt:
- { tp: 8, conc-start: 4, conc-end: 8 }

gptoss-fp4-b200-vllm:
- image: vllm/vllm-openai:v0.11.2
+ image: vllm/vllm-openai:v0.13.0
model: openai/gpt-oss-120b
model-prefix: gptoss
runner: b200
@@ -240,7 +240,7 @@ gptoss-fp4-b200-vllm:
- { tp: 8, conc-start: 4, conc-end: 4 }

gptoss-fp4-h100-vllm:
- image: vllm/vllm-openai:v0.11.2
+ image: vllm/vllm-openai:v0.13.0
model: openai/gpt-oss-120b
model-prefix: gptoss
runner: h100
@@ -300,7 +300,7 @@ gptoss-fp4-h200-trt:
- { tp: 8, ep: 8, dp-attn: false, conc-start: 4, conc-end: 8 }

gptoss-fp4-h200-vllm:
- image: vllm/vllm-openai:v0.11.2
+ image: vllm/vllm-openai:v0.13.0
model: openai/gpt-oss-120b
model-prefix: gptoss
runner: h200
11 changes: 11 additions & 0 deletions benchmarks/benchmark_lib.sh
@@ -208,6 +208,17 @@ run_benchmark_serving() {
echo "Error: --result-dir is required"
return 1
fi

+ # Check if git is installed, install if missing
+ if ! command -v git &> /dev/null; then
+ echo "git not found, installing..."
+ if command -v apt-get &> /dev/null; then
+ sudo apt-get update && sudo apt-get install -y git

Review comment: No error handling if git installation fails

The sudo apt-get update && sudo apt-get install -y git command has no error handling after it. If the installation fails (due to permission issues, network problems, or missing sudo privileges), the script silently continues to the git clone command on line 225, which will then fail with a confusing error message. When apt-get is found but the install command fails, the function does not return an error code like it does for the missing package manager case.

+ else
+ echo "Error: Could not install git. Package manager not found."
+ return 1
+ fi

Review comment: H100 slurm runner missing container flags for git install

The PR adds git installation logic using sudo apt-get in benchmark_lib.sh and updates h100 config to use vLLM v0.13.0 (which lacks git). However, unlike the h200 runner scripts which are updated with --container-remap-root and --container-writable flags, the h100 slurm runner (launch_h100-cw.sh) is not modified. This means h100 slurm benchmarks will fail because the container lacks root privileges needed for sudo apt-get install.

+ fi

# Clone benchmark serving repo
local BENCH_SERVING_DIR=$(mktemp -d /tmp/bmk-XXXXXX)
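
A possible way to address the two review comments in this hunk is sketched below. It is not part of the PR: `ensure_git` is a hypothetical helper name, and the choice to skip `sudo` when already running as root (as would presumably be the case under `--container-remap-root`) is an assumption. The point is only that a failed `apt-get` step returns non-zero instead of falling through to the later `git clone`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): make sure git is available and
# fail fast when it cannot be installed.
ensure_git() {
    # Nothing to do when git is already on PATH.
    if command -v git > /dev/null 2>&1; then
        return 0
    fi

    echo "git not found, installing..."
    if ! command -v apt-get > /dev/null 2>&1; then
        echo "Error: Could not install git. Package manager not found." >&2
        return 1
    fi

    # Assumption: use sudo only when not already root.
    local -a sudo_cmd=()
    if [ "$(id -u)" -ne 0 ]; then
        if ! command -v sudo > /dev/null 2>&1; then
            echo "Error: not root and sudo is unavailable; cannot install git." >&2
            return 1
        fi
        sudo_cmd=(sudo)
    fi

    # Propagate a failed update/install instead of continuing silently.
    if ! { "${sudo_cmd[@]}" apt-get update && "${sudo_cmd[@]}" apt-get install -y git; }; then
        echo "Error: git installation failed." >&2
        return 1
    fi

    # Confirm the install actually put git on PATH.
    command -v git > /dev/null 2>&1
}
```

Inside `run_benchmark_serving`, a call such as `ensure_git || return 1` placed before the clone step would give the same early exit as the existing "Package manager not found" branch.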
2 changes: 1 addition & 1 deletion benchmarks/gptoss_fp4_b200_docker.sh
@@ -30,7 +30,7 @@ fi

cat > config.yaml << EOF
kv-cache-dtype: fp8
- compilation-config: '{"pass_config":{"enable_fi_allreduce_fusion":true,"enable_noop":true}}'
+ compilation-config: '{"pass_config":{"fuse_allreduce_rms":true,"eliminate_noops":true}}'
async-scheduling: true
no-enable-prefix-caching: true
max-cudagraph-capture-size: 2048
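
The key rename in this hunk can be summarized with a small sketch. The helper names `compilation_config_json` and `write_config` and the scheme labels `legacy` and `renamed` are hypothetical; the two JSON strings are copied from the removed and added lines above. The diff does not say which vLLM release introduced the rename, so the sketch selects by an explicit label rather than by version number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the compilation-config value for a naming scheme.
#   legacy  = keys on the removed line (used with the v0.11.2 image in this PR)
#   renamed = keys on the added line   (used with the v0.13.0 image in this PR)
compilation_config_json() {
    local scheme="$1"
    case "$scheme" in
        legacy)
            printf '%s\n' '{"pass_config":{"enable_fi_allreduce_fusion":true,"enable_noop":true}}'
            ;;
        renamed)
            printf '%s\n' '{"pass_config":{"fuse_allreduce_rms":true,"eliminate_noops":true}}'
            ;;
        *)
            echo "Error: unknown scheme '$scheme'" >&2
            return 1
            ;;
    esac
}

# Write the first two config.yaml lines the way the benchmark scripts do.
write_config() {
    local scheme="$1" out="$2" cc
    cc=$(compilation_config_json "$scheme") || return 1
    cat > "$out" << EOF
kv-cache-dtype: fp8
compilation-config: '$cc'
EOF
}
```

For example, `write_config renamed config.yaml` would produce the same two leading lines as the updated script.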
4 changes: 2 additions & 2 deletions benchmarks/gptoss_fp4_b200_slurm.sh
@@ -27,7 +27,7 @@ fi

cat > config.yaml << EOF
kv-cache-dtype: fp8
- compilation-config: '{"pass_config":{"enable_fi_allreduce_fusion":true,"enable_noop":true}}'
+ compilation-config: '{"pass_config":{"fuse_allreduce_rms":true,"eliminate_noops":true}}'
async-scheduling: true
no-enable-prefix-caching: true
max-cudagraph-capture-size: 2048
@@ -64,7 +64,7 @@ run_benchmark_serving \
--input-len "$ISL" \
--output-len "$OSL" \
--random-range-ratio "$RANDOM_RANGE_RATIO" \
- --num-prompts "$NUM_PROMPTS" \
+ --num-prompts $(( CONC * 10 )) \

Review comment: Inconsistent num-prompts handling between B200 docker and slurm scripts

The gptoss_fp4_b200_slurm.sh script was updated to compute --num-prompts dynamically as $(( CONC * 10 )), matching the pattern used in other gptoss scripts like H100 and H200. However, gptoss_fp4_b200_docker.sh still uses $NUM_PROMPTS environment variable. Both scripts were modified in this PR for the compilation config update, but only the slurm script got the num-prompts formula change. This creates inconsistent benchmark behavior between docker and slurm runs for B200, where the docker script requires NUM_PROMPTS to be set while the slurm script no longer needs it.

--max-concurrency "$CONC" \
--result-filename "$RESULT_FILENAME" \
--result-dir /workspace/
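
The `CONC * 10` rule from this hunk could be shared between the docker and slurm scripts through one small function, which is one way to avoid the inconsistency the review comment describes. This is a sketch only: `num_prompts_for` is a hypothetical name, and the input validation is an addition that the PR does not contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive --num-prompts from the concurrency level.
num_prompts_for() {
    local conc="$1"
    # Reject empty, zero, or non-numeric input so the arithmetic
    # cannot silently evaluate to 0.
    if ! [[ "$conc" =~ ^[1-9][0-9]*$ ]]; then
        echo "Error: CONC must be a positive integer, got '$conc'" >&2
        return 1
    fi
    echo $(( conc * 10 ))
}
```

A script would then pass `--num-prompts "$(num_prompts_for "$CONC")"`; with `CONC=4` that evaluates to 40 prompts.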
1 change: 1 addition & 0 deletions benchmarks/gptoss_fp4_h100_docker.sh
@@ -20,6 +20,7 @@ max-model-len: 10240
EOF

export PYTHONNOUSERSITE=1
+ export VLLM_MXFP4_USE_MARLIN=1
SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

set -x
1 change: 1 addition & 0 deletions benchmarks/gptoss_fp4_h100_slurm.sh
@@ -22,6 +22,7 @@ EOF

SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
export TORCH_CUDA_ARCH_LIST="9.0"
+ export VLLM_MXFP4_USE_MARLIN=1

set -x
PYTHONNOUSERSITE=1 vllm serve $MODEL --host=0.0.0.0 --port=$PORT \
1 change: 1 addition & 0 deletions benchmarks/gptoss_fp4_h200_slurm.sh
@@ -38,6 +38,7 @@ SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
PORT=$(( 8888 + $PORT_OFFSET ))

export TORCH_CUDA_ARCH_LIST="9.0"
+ export VLLM_MXFP4_USE_MARLIN=1
[Comment thread: ankursingh-nv marked this conversation as resolved.]

PYTHONNOUSERSITE=1 vllm serve $MODEL --host 0.0.0.0 --port $PORT --config config.yaml \
--gpu-memory-utilization 0.9 --tensor-parallel-size $TP --max-num-seqs $CONC \
9 changes: 9 additions & 0 deletions perf-changelog.yaml
@@ -124,3 +124,12 @@
description:
- "Update NVIDIA DeepSeek sglang Docker image from v0.5.5 to v0.5.6"
pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/276

+ - config-keys:
+ - gptoss-fp4-b200-vllm
+ - gptoss-fp4-h100-vllm
+ - gptoss-fp4-h200-vllm
+ description:
+ - "Update vLLM image from v0.11.2 to v0.13.0"
+ - "Add VLLM_MXFP4_USE_MARLIN=1 to H100 and H200 benchmark scripts"
+ pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/327
[Comment thread: cursor[bot] marked this conversation as resolved.]
4 changes: 3 additions & 1 deletion runners/launch_b200-nb.sh
@@ -14,7 +14,9 @@ srun --partition=$PARTITION --gres=gpu:$TP --exclusive \
--container-image=$IMAGE \
--container-name=$(echo "$IMAGE" | sed 's/[\/:@#]/_/g')-${USER: -1} \
--container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
- --no-container-mount-home --container-writable \
+ --no-container-mount-home \
+ --container-remap-root \
+ --container-writable \
--container-workdir=/workspace/ \
--no-container-entrypoint --export=ALL,PORT_OFFSET=${USER: -1},UCX_NET_DEVICES=$UCX_NET_DEVICES \
bash benchmarks/${EXP_NAME%%_*}_${PRECISION}_b200${FRAMEWORK_SUFFIX}_slurm.sh
4 changes: 3 additions & 1 deletion runners/launch_b200-nv.sh
@@ -17,7 +17,9 @@ srun --jobid=$JOB_ID bash -c "enroot import -o $SQUASH_FILE docker://$IMAGE"
srun --jobid=$JOB_ID \
--container-image=$SQUASH_FILE \
--container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
- --no-container-mount-home --container-writable \
+ --no-container-mount-home \
+ --container-remap-root \
+ --container-writable \
--container-workdir=/workspace/ \
--no-container-entrypoint --export=ALL \
bash benchmarks/${MODEL_CODE}_${PRECISION}_b200${FRAMEWORK_SUFFIX}_slurm.sh
2 changes: 2 additions & 0 deletions runners/launch_h200-nb.sh
@@ -24,6 +24,8 @@ fi
srun --jobid=$JOB_ID \
--container-image=$CONTAINER_IMAGE \
--container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
+ --container-remap-root \
[Comment thread: cursor[bot] marked this conversation as resolved.]
+ --container-writable \
--container-mount-home \
--container-workdir=/workspace/ \
--no-container-entrypoint --export=ALL \
2 changes: 2 additions & 0 deletions runners/launch_h200-nv.sh
@@ -17,6 +17,8 @@ srun --jobid=$JOB_ID bash -c "enroot import -o $SQUASH_FILE docker://$IMAGE"
srun --jobid=$JOB_ID \
--container-image=$SQUASH_FILE \
--container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
+ --container-writable \
[Comment thread: cquil11 marked this conversation as resolved.]
+ --container-remap-root \
--container-mount-home \
--container-workdir=/workspace/ \
--no-container-entrypoint --export=ALL \
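
All four launcher hunks above end up passing both `--container-remap-root` and `--container-writable`, and a review comment earlier on this page notes that launch_h100-cw.sh was not updated the same way. One way to keep the launchers in step is to define the pair once, as in the sketch below. The function name `container_install_flags` and the idea of sourcing it from a shared file are hypothetical and not part of this PR; only the two flag names come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical shared helper: the container flags the launchers in this PR
# pass so the benchmark script can install packages inside the container.
container_install_flags() {
    printf '%s\n' --container-remap-root --container-writable
}

# Usage sketch inside a launcher (variables as in the hunks above):
#   mapfile -t install_flags < <(container_install_flags)
#   srun --jobid="$JOB_ID" \
#       --container-image="$SQUASH_FILE" \
#       "${install_flags[@]}" \
#       --container-workdir=/workspace/ \
#       ...
```

Collecting the flags into an array with `mapfile` keeps each flag a separate word, so the expansion in the `srun` call stays safe under quoting.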