Update vLLM version to v0.13.0 for NVIDIA configs#327

Merged
cquil11 merged 24 commits into main from dev-vllm-v0.12.0 on Jan 2, 2026

Conversation

@ankursingh-nv
Contributor

@ankursingh-nv ankursingh-nv commented Dec 11, 2025

This PR:

  • Updates vLLM to v0.12.0

  • Sets the correct environment variables for h100/h200

  • Adds a check for git, with install logic, before git is used.
    The vLLM team has removed many packages (including git) to reduce the size of the Docker image, so the v0.12.0 image does not include git.

    The benchmark script assumes git is present and does not check for it before using it.

Refer Workflow

cc @kedarpotdar-nv @cquil11


Note

Updates NVIDIA GPT-OSS single-node vLLM configs and associated scripts to align with vLLM 0.13.0 and improve benchmarking reliability.

  • Bump image to vllm/vllm-openai:v0.13.0 for gptoss-fp4-{b200,h100,h200}-vllm in nvidia-master.yaml
  • Add VLLM_MXFP4_USE_MARLIN=1 in H100/H200 benchmark scripts; keep FP8 KV cache on B200 and FlashInfer MOE flags
  • Standardize vLLM compilation flags on B200 (compilation-config pass_config: fuse_allreduce_rms, eliminate_noops)
  • Add git presence check with apt-get install fallback in benchmark_lib.sh before cloning bench repo
  • Adjust --num-prompts on B200 slurm to CONC * 10
  • Runners: add --container-remap-root and ensure --container-writable across B200/H200 launchers
  • Update perf-changelog.yaml to reflect vLLM 0.13.0 and MARLIN env changes
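
As a rough illustration of the H100/H200 environment change listed above, the sketch below sets VLLM_MXFP4_USE_MARLIN according to GPU type. The function name configure_gptoss_env and its GPU-type argument are hypothetical and not taken from the repo. Leaving the variable untouched on B200 is an assumption, since the summary only mentions H100/H200.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): apply the per-GPU environment
# described in the summary before launching the vLLM server.
configure_gptoss_env() {
    case "$1" in
        h100|h200)
            # The summary states that the H100/H200 benchmark scripts set this.
            export VLLM_MXFP4_USE_MARLIN=1
            ;;
        b200)
            # Assumption: nothing to set here; the summary does not list
            # this variable for B200.
            :
            ;;
        *)
            echo "unknown GPU type: $1" >&2
            return 1
            ;;
    esac
}

# Usage: configure_gptoss_env h200 || exit 1
```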

Written by Cursor Bugbot for commit f268bac.

@ankursingh-nv
Contributor Author

@cquil11 even after the fix (#308) running into same error for all the jobs picked up by CoreWeave runners.

Refer workflow https://github.com/InferenceMAX/InferenceMAX/actions/runs/20143278800

@cquil11
Collaborator

cquil11 commented Dec 11, 2025

> @cquil11 even after the fix (#308) running into same error for all the jobs picked up by CoreWeave runners.
>
> Refer workflow https://github.com/InferenceMAX/InferenceMAX/actions/runs/20143278800

I believe on the CW runners, I have to download the image manually smh
doing this now

@cquil11
Collaborator

cquil11 commented Dec 11, 2025

@ankursingh-nv should be running now

@ankursingh-nv
Contributor Author

ankursingh-nv commented Dec 11, 2025

Thanks @cquil11 for the help.

git installation is more nuanced (file system permission) than I initially thought. It fails on h200-nb and h200-nv nodes.

@cquil11
Collaborator

cquil11 commented Dec 11, 2025

@ankursingh-nv this should be fixed by --container-writable that I just added
however, now we're getting the superuser priv required error

@cquil11
Collaborator

cquil11 commented Dec 11, 2025

@ankursingh-nv update: it seems --container-remap-root + --container-writable + adding sudo to apt-get commands worked
apparently apt-get + dpkg requires sudo
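
For reference, a minimal sketch of how the combination above might appear on an srun command line. It only prints the command rather than running it. The helper name build_srun_cmd, the image path, and the trailing command are placeholders, and the repo's launchers may pass additional options.

```shell
#!/usr/bin/env bash
# Sketch: print an srun invocation that carries both container flags.
# The image and the trailing command are placeholders. Arguments are joined
# with spaces for display only, so this is not a quoting-safe command builder.
build_srun_cmd() {
    local image="$1"
    shift
    printf 'srun --container-image=%s --container-writable --container-remap-root' "$image"
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

# Usage: build_srun_cmd /path/to/image.sqsh bash benchmark.sh
```

Inside the container, the install step would then use sudo apt-get, as described in the comment above.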

@cquil11 cquil11 changed the title from "Update vLLM version to v0.12.0" to "Update vLLM version to v0.12.0 for NVIDIA configs" Dec 11, 2025
Collaborator

@cquil11 cquil11 left a comment

lgtm
wanna get this in before the weekend? perf looks good

@ankursingh-nv
Contributor Author

We are still reviewing all the data points

@cquil11
Collaborator

cquil11 commented Dec 15, 2025

@ankursingh-nv

Reminder:

PR 267 has been merged. With this, sweeps will no longer run nightly; instead, they will run only when necessary, as indicated by the perf-changelog.yaml file at the root of the repo. Going forward, when developers make changes to configs that have a performance impact, they must note that change in perf-changelog.yaml and give a brief description of the changes. Once their PR is ready for review, they can add the sweep-enabled label to trigger a test sweep on their local branch. Once everything looks good, they can merge to main and an official sweep will be run for the specified configs.

So for this PR, you will add something like the following entry to the bottom of perf-changelog.yaml:

- config-keys:
    - gptoss-fp4-b200-vllm
    - gptoss-fp4-h100-vllm
    - gptoss-fp4-h200-vllm
  description: |
    - Update vLLM version to v0.12.0 for NVIDIA GPT-OSS configs
    PR: https://github.com/InferenceMAX/InferenceMAX/pull/327

Then add the sweep-enabled tag to the PR after marking it ready for review to run a test sweep. After the test sweep is done, please link the run in your PR description.
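
The changelog step above could be scripted along these lines. This is a sketch that mirrors the entry shape quoted in this comment. The function name append_changelog_entry and its argument order are hypothetical, and the repo's actual schema for perf-changelog.yaml takes precedence over what is shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry, shaped like the example quoted above,
# to the end of a changelog file.
# Arguments: <file> <description> <pr-url> <config-key>...
append_changelog_entry() {
    local file="$1"
    local description="$2"
    local pr_url="$3"
    shift 3
    {
        printf '%s\n' '- config-keys:'
        for key in "$@"; do
            printf '    - %s\n' "$key"
        done
        printf '%s\n' '  description: |'
        printf '    - %s\n' "$description"
        printf '    PR: %s\n' "$pr_url"
    } >> "$file"
}

# Usage:
# append_changelog_entry perf-changelog.yaml \
#     "Update vLLM version to v0.12.0 for NVIDIA GPT-OSS configs" \
#     "https://github.com/InferenceMAX/InferenceMAX/pull/327" \
#     gptoss-fp4-b200-vllm gptoss-fp4-h100-vllm gptoss-fp4-h200-vllm
```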

Collaborator

@cquil11 cquil11 left a comment

please see comment about perf changelog

@functionstackx functionstackx moved this to In Progress in InferenceMAX Board Dec 15, 2025
Comment thread runners/launch_h200-nv.sh
Comment thread benchmarks/gptoss_fp4_h200_slurm.sh
Comment thread benchmarks/gptoss_fp4_h100_slurm.sh Outdated
@cquil11
Collaborator

cquil11 commented Dec 16, 2025

@ankursingh-nv where are we with this? I added configs to perf-changelog.yaml and started the test sweep https://github.com/InferenceMAX/InferenceMAX/actions/runs/20286718407
Please double check me

Comment thread perf-changelog.yaml Outdated
Comment thread perf-changelog.yaml
Comment thread runners/launch_h200-nb.sh
if ! command -v git &> /dev/null; then
    echo "git not found, installing..."
    if command -v apt-get &> /dev/null; then
        sudo apt-get update && sudo apt-get install -y git
No error handling if git installation fails

The sudo apt-get update && sudo apt-get install -y git command has no error handling after it. If the installation fails (due to permission issues, network problems, or missing sudo privileges), the script silently continues to the git clone command on line 225, which will then fail with a confusing error message. When apt-get is found but the install command fails, the function does not return an error code like it does for the missing package manager case.
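
One way to address this is sketched below: return non-zero when the install command fails, and re-check for the tool afterwards. This is not the fix that was merged. The function name ensure_tool and the SUDO and APT_GET override variables are hypothetical, and are included only so the logic can be exercised without touching a real package manager.

```shell
#!/usr/bin/env bash
# Sketch only: SUDO and APT_GET are hypothetical override hooks (not in the PR).
ensure_tool() {
    local tool="$1"
    local pkg="${2:-$1}"
    local apt="${APT_GET:-apt-get}"
    local sudo_prefix="${SUDO-sudo}"   # set SUDO="" when already root

    if command -v "$tool" > /dev/null 2>&1; then
        return 0
    fi

    echo "$tool not found, installing..."
    if ! command -v "$apt" > /dev/null 2>&1; then
        echo "Error: Could not install $tool. Package manager not found." >&2
        return 1
    fi

    # $sudo_prefix is intentionally unquoted so an empty value disappears.
    if ! { $sudo_prefix "$apt" update && $sudo_prefix "$apt" install -y "$pkg"; }; then
        echo "Error: installing $pkg failed." >&2
        return 1
    fi

    # A zero exit status from the installer does not prove the tool is on PATH.
    if ! command -v "$tool" > /dev/null 2>&1; then
        echo "Error: $tool still not found after installation." >&2
        return 1
    fi
}

# Usage: ensure_tool git || exit 1
```

With this shape, a caller can stop before the clone step instead of hitting a less obvious failure later.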

    else
        echo "Error: Could not install git. Package manager not found."
        return 1
    fi
H100 slurm runner missing container flags for git install

The PR adds git installation logic using sudo apt-get in benchmark_lib.sh and updates h100 config to use vLLM v0.13.0 (which lacks git). However, unlike the h200 runner scripts which are updated with --container-remap-root and --container-writable flags, the h100 slurm runner (launch_h100-cw.sh) is not modified. This means h100 slurm benchmarks will fail because the container lacks root privileges needed for sudo apt-get install.
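
A gap like this can be caught mechanically. The sketch below scans a directory for launch_*.sh scripts and reports any that lack either container flag. The function name check_launchers is hypothetical. It also assumes every matched launcher is a Slurm/pyxis launcher, so any launcher that starts containers another way would need to be excluded.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper: list launcher scripts missing a container flag.
# Returns 1 if any script is missing a flag, 0 otherwise.
check_launchers() {
    local dir="$1"
    local status=0
    local script flag
    for script in "$dir"/launch_*.sh; do
        # Skip the unexpanded pattern when no launcher matches.
        [ -e "$script" ] || continue
        for flag in --container-remap-root --container-writable; do
            # -F: fixed string; -e: the pattern itself starts with dashes.
            if ! grep -qF -e "$flag" "$script"; then
                echo "$script: missing $flag"
                status=1
            fi
        done
    done
    return "$status"
}

# Usage: check_launchers runners || exit 1
```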

fix perf-changelog

fix perf-changelog

fix
@functionstackx functionstackx changed the title from "Update vLLM version to v0.12.0 for NVIDIA configs" to "Update vLLM version to v0.13.0 for NVIDIA configs" Dec 30, 2025
  --output-len "$OSL" \
  --random-range-ratio "$RANDOM_RANGE_RATIO" \
- --num-prompts "$NUM_PROMPTS" \
+ --num-prompts $(( CONC * 10 )) \
Inconsistent num-prompts handling between B200 docker and slurm scripts

The gptoss_fp4_b200_slurm.sh script was updated to compute --num-prompts dynamically as $(( CONC * 10 )), matching the pattern used in other gptoss scripts like H100 and H200. However, gptoss_fp4_b200_docker.sh still uses $NUM_PROMPTS environment variable. Both scripts were modified in this PR for the compilation config update, but only the slurm script got the num-prompts formula change. This creates inconsistent benchmark behavior between docker and slurm runs for B200, where the docker script requires NUM_PROMPTS to be set while the slurm script no longer needs it.
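
One possible way to make both scripts behave the same is to honor NUM_PROMPTS when it is set and otherwise fall back to CONC * 10. This is a sketch, not what the PR does, and the helper name resolve_num_prompts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer an explicit NUM_PROMPTS, else derive it from
# the concurrency value passed as the first argument.
resolve_num_prompts() {
    local conc="$1"
    echo "${NUM_PROMPTS:-$(( conc * 10 ))}"
}

# Usage inside a benchmark script:
#   --num-prompts "$(resolve_num_prompts "$CONC")" \
```

Whether an override should be allowed at all is a policy choice. Always using CONC * 10 in both scripts would be the stricter alternative.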

Collaborator

@cquil11 cquil11 left a comment

lgtm

@cquil11 cquil11 merged commit 4e52ed4 into main Jan 2, 2026
158 checks passed
@cquil11 cquil11 deleted the dev-vllm-v0.12.0 branch January 2, 2026 22:24
@github-project-automation github-project-automation Bot moved this from In Progress to Done in InferenceMAX Board Jan 2, 2026
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026