Update vLLM version to v0.13.0 for NVIDIA configs #327
|
@cquil11 even after the fix (#308), I'm running into the same error for all the jobs picked up by CoreWeave runners. See workflow https://github.com/InferenceMAX/InferenceMAX/actions/runs/20143278800
I believe on the CW runners, I have to download the image manually smh
|
@ankursingh-nv should be running now
|
Thanks @cquil11 for the help.
|
|
@ankursingh-nv this should be fixed by --container-writable that I just added
|
@ankursingh-nv update: it seems --container-remap-root + --container-writable + adding sudo to apt-get commands worked
cquil11
left a comment
lgtm
wanna get this in before the weekend? perf looks good
|
We are still reviewing all the data points
|
Reminder:
So for this PR, you will add something like the following entry to the bottom of
- config-keys:
    - gptoss-fp4-b200-vllm
    - gptoss-fp4-h100-vllm
    - gptoss-fp4-h200-vllm
  description: |
    - Update vLLM version to v0.12.0 for NVIDIA GPT-OSS configs
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/327
Then add the
cquil11
left a comment
please see comment about perf changelog
|
@ankursingh-nv where are we with this? I added configs to |
if ! command -v git &> /dev/null; then
    echo "git not found, installing..."
    if command -v apt-get &> /dev/null; then
        sudo apt-get update && sudo apt-get install -y git
No error handling if git installation fails
The sudo apt-get update && sudo apt-get install -y git command has no error handling after it. If the installation fails (due to permission issues, network problems, or missing sudo privileges), the script silently continues to the git clone command on line 225, which will then fail with a confusing error message. When apt-get is found but the install command fails, the function does not return an error code like it does for the missing package manager case.
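A minimal sketch of the error handling this comment asks for. The function name `ensure_git` is hypothetical and is not taken from benchmark_lib.sh; the point is only that a failed install returns non-zero instead of falling through to the clone.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the real logic in benchmark_lib.sh may be structured differently.
ensure_git() {
    # Nothing to do if git is already available.
    if command -v git > /dev/null 2>&1; then
        return 0
    fi
    echo "git not found, installing..."
    if ! command -v apt-get > /dev/null 2>&1; then
        echo "Error: Could not install git. Package manager not found." >&2
        return 1
    fi
    # Propagate a failed update/install rather than continuing to git clone.
    if ! { sudo apt-get update && sudo apt-get install -y git; }; then
        echo "Error: apt-get failed to install git." >&2
        return 1
    fi
    # Confirm the install actually made git available.
    command -v git > /dev/null 2>&1
}
```

A caller could then run `ensure_git || return 1` immediately before the clone, so the failure is reported where it happens.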
de787e7 to c751d86
    else
        echo "Error: Could not install git. Package manager not found."
        return 1
    fi
H100 slurm runner missing container flags for git install
The PR adds git installation logic using sudo apt-get in benchmark_lib.sh and updates h100 config to use vLLM v0.13.0 (which lacks git). However, unlike the h200 runner scripts which are updated with --container-remap-root and --container-writable flags, the h100 slurm runner (launch_h100-cw.sh) is not modified. This means h100 slurm benchmarks will fail because the container lacks root privileges needed for sudo apt-get install.
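As a rough illustration of what the missing change could look like, the sketch below assembles the two flags named in this thread alongside an image flag. It assumes the launcher calls `srun` with pyxis-style container options; the function name, the `IMAGE` variable, and `benchmark.sh` are placeholders and are not taken from launch_h100-cw.sh. The command is printed rather than executed so the sketch runs without Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names here are assumptions, not the contents of launch_h100-cw.sh.
container_flags() {
    # $1: container image reference
    printf '%s\n' \
        "--container-image=$1" \
        "--container-remap-root" \
        "--container-writable"
}

IMAGE="vllm/vllm-openai:v0.13.0"   # image tag from this PR; the runner may use a local copy instead
# Print the command instead of running it.
echo srun $(container_flags "$IMAGE") bash benchmark.sh
```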
fix perf-changelog fix perf-changelog fix
c751d86 to dac5bfa
  --output-len "$OSL" \
  --random-range-ratio "$RANDOM_RANGE_RATIO" \
- --num-prompts "$NUM_PROMPTS" \
+ --num-prompts $(( CONC * 10 )) \
Inconsistent num-prompts handling between B200 docker and slurm scripts
The gptoss_fp4_b200_slurm.sh script was updated to compute --num-prompts dynamically as $(( CONC * 10 )), matching the pattern used in other gptoss scripts like H100 and H200. However, gptoss_fp4_b200_docker.sh still uses $NUM_PROMPTS environment variable. Both scripts were modified in this PR for the compilation config update, but only the slurm script got the num-prompts formula change. This creates inconsistent benchmark behavior between docker and slurm runs for B200, where the docker script requires NUM_PROMPTS to be set while the slurm script no longer needs it.
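To make the difference concrete, here is a small sketch of the `CONC * 10` formula as a helper that either script could call. The helper name `num_prompts_for` is hypothetical and does not exist in the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the prompt count from the concurrency level.
num_prompts_for() {
    # $1: concurrency (CONC)
    echo $(( $1 * 10 ))
}

CONC="${CONC:-4}"   # example default for this sketch only
echo "--num-prompts $(num_prompts_for "$CONC")"
```

With the formula in one place, the docker script would not need NUM_PROMPTS to be exported, matching the slurm behavior.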

In this PR,
- Updates vLLM to v0.12.0
- Sets correct env variables for H100/H200
- Adds git check and install logic before use.

The vLLM team has removed a lot of packages (including git) to reduce the size of the Docker image. As a result:
- v0.12.0 doesn't have git
- The benchmark script assumes git is present and doesn't check for it before using it

Refer Workflow
cc @kedarpotdar-nv @cquil11
Note
Updates NVIDIA GPT-OSS single-node vLLM configs and associated scripts to align with vLLM 0.13.0 and improve benchmarking reliability.
- image to vllm/vllm-openai:v0.13.0 for gptoss-fp4-{b200,h100,h200}-vllm in nvidia-master.yaml
- VLLM_MXFP4_USE_MARLIN=1 in H100/H200 benchmark scripts; keep FP8 KV cache on B200 and FlashInfer MOE flags
- compilation-config (pass_config: fuse_allreduce_rms, eliminate_noops)
- git presence check with apt-get install fallback in benchmark_lib.sh before cloning bench repo
- --num-prompts on B200 slurm to CONC * 10
- --container-remap-root and ensure --container-writable across B200/H200 launchers
- perf-changelog.yaml to reflect vLLM 0.13.0 and MARLIN env changes

Written by Cursor Bugbot for commit f268bac. This will update automatically on new commits.