Update vLLM version to v0.13.0 for NVIDIA configs by ankursingh-nv · Pull Request #327 · SemiAnalysisAI/InferenceX

ankursingh-nv · 2025-12-11T18:44:15Z

In this PR,

Updates vLLM to v0.12.0
Sets the correct env variables for H100/H200
Adds git check and install logic before use.
The vLLM team has removed a lot of packages (including git) to reduce the size of the Docker image; as a result, v0.12.0 doesn't have git.

The benchmark script assumes git is present and doesn't check for it before using it.

Note

Updates NVIDIA GPT-OSS single-node vLLM configs and associated scripts to align with vLLM 0.13.0 and improve benchmarking reliability.

Bump image to vllm/vllm-openai:v0.13.0 for gptoss-fp4-{b200,h100,h200}-vllm in nvidia-master.yaml
Add VLLM_MXFP4_USE_MARLIN=1 in H100/H200 benchmark scripts; keep FP8 KV cache on B200 and FlashInfer MOE flags
Standardize vLLM compilation flags on B200 (compilation-config pass_config: fuse_allreduce_rms, eliminate_noops)
Add git presence check with apt-get install fallback in benchmark_lib.sh before cloning bench repo
Adjust --num-prompts on B200 slurm to CONC * 10
Runners: add --container-remap-root and ensure --container-writable across B200/H200 launchers
Update perf-changelog.yaml to reflect vLLM 0.13.0 and MARLIN env changes

Written by Cursor Bugbot for commit f268bac.

chatgpt-codex-connector · 2025-12-11T18:44:20Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

ankursingh-nv · 2025-12-11T18:48:25Z

@cquil11 even after the fix (#308) running into same error for all the jobs picked up by CoreWeave runners.

Refer workflow https://github.com/InferenceMAX/InferenceMAX/actions/runs/20143278800

cquil11 · 2025-12-11T19:47:09Z

> @cquil11 even after the fix (#308) running into same error for all the jobs picked up by CoreWeave runners.
>
> Refer workflow https://github.com/InferenceMAX/InferenceMAX/actions/runs/20143278800

I believe on the CW runners, I have to download the image manually smh
doing this now

cquil11 · 2025-12-11T21:16:18Z

@ankursingh-nv should be running now

ankursingh-nv · 2025-12-11T21:28:05Z

Thanks @cquil11 for the help.

git installation is more nuanced (file system permission) than I initially thought. It fails on h200-nb and h200-nv nodes.

cquil11 · 2025-12-11T21:43:58Z

@ankursingh-nv this should be fixed by --container-writable that I just added
however, now we're getting the superuser priv required error

cquil11 · 2025-12-11T21:58:13Z

@ankursingh-nv update: it seems --container-remap-root + --container-writable + adding sudo to apt-get commands worked
apparently apt-get + dpkg requires sudo

cquil11

lgtm
wanna get this in before the weekend? perf looks good

ankursingh-nv · 2025-12-15T13:28:39Z

We are still reviewing all the data points

cquil11 · 2025-12-15T16:11:55Z

@ankursingh-nv

Reminder:

PR 267 has been merged. With this, sweeps will no longer run nightly; rather, they will run only when necessary, as indicated by the perf-changelog.yaml file at the root of the repo. Going forward, when developers make changes to configs that have performance impact, they must note that change in perf-changelog.yaml and give a brief description of the changes. Once their PR is ready for review, they can add the sweep-enabled label to trigger a test sweep on their local branch. Once everything looks good, they can merge to main and an official sweep will be run for the specified configs.

So for this PR, you will add something like the following entry to the bottom of perf-changelog.yaml:

- config-keys:
    - gptoss-fp4-b200-vllm
    - gptoss-fp4-h100-vllm
    - gptoss-fp4-h200-vllm
  description: |
    - Update vLLM version to v0.12.0 for NVIDIA GPT-OSS configs
    PR: https://github.com/InferenceMAX/InferenceMAX/pull/327

Then add the sweep-enabled tag to the PR after marking it ready for review to run a test sweep. After the test sweep is done, please link the run in your PR description.
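As a rough illustration of the workflow above, the following sketch checks that every config key touched by a PR appears as a list item in perf-changelog.yaml before the sweep-enabled label is added. The function name `changelog_has_keys` is hypothetical and not part of the repo; the check assumes keys are listed one per line as `- <key>`, as in the example entry.

```shell
# Sketch only: changelog_has_keys is a hypothetical helper, not a repo script.
# Usage: changelog_has_keys <changelog-file> <config-key>...
changelog_has_keys() {
    local file="$1"
    shift
    local key status=0

    for key in "$@"; do
        # Match a YAML list item whose value is exactly the key.
        # Assumes keys contain only letters, digits, and hyphens.
        if ! grep -Eq "^[[:space:]]*-[[:space:]]+${key}[[:space:]]*\$" "$file"; then
            echo "missing from ${file}: ${key}" >&2
            status=1
        fi
    done

    return "$status"
}
```

For this PR the call would look like `changelog_has_keys perf-changelog.yaml gptoss-fp4-b200-vllm gptoss-fp4-h100-vllm gptoss-fp4-h200-vllm`, which returns non-zero and names any key that has no entry.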

cquil11

please see comment about perf changelog

cquil11 · 2025-12-16T23:59:25Z

@ankursingh-nv where are we with this? I added configs to perf-changelog.yaml and started the test sweep https://github.com/InferenceMAX/InferenceMAX/actions/runs/20286718407
Please double check me

cursor · 2025-12-30T20:10:38Z

+    if ! command -v git &> /dev/null; then
+        echo "git not found, installing..."
+        if command -v apt-get &> /dev/null; then
+            sudo apt-get update && sudo apt-get install -y git


No error handling if git installation fails

The sudo apt-get update && sudo apt-get install -y git command has no error handling after it. If the installation fails (due to permission issues, network problems, or missing sudo privileges), the script silently continues to the git clone command on line 225, which will then fail with a confusing error message. When apt-get is found but the install command fails, the function does not return an error code like it does for the missing package manager case.
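One way to address this review comment is sketched below: the install step is wrapped so that a failure from either apt-get command, or a command that is still missing afterwards, returns a non-zero status to the caller. The helper name `ensure_command` and its arguments are assumptions made for illustration; the actual function in benchmark_lib.sh may be structured differently.

```shell
# Sketch only: ensure_command is a hypothetical helper, not the function
# that exists in benchmark_lib.sh.
# Usage: ensure_command <command> [package]
ensure_command() {
    local cmd="$1" pkg="${2:-$1}"

    # Nothing to do if the command is already on PATH.
    if command -v "$cmd" > /dev/null 2>&1; then
        return 0
    fi

    echo "$cmd not found, installing..."
    if ! command -v apt-get > /dev/null 2>&1; then
        echo "Error: Could not install $pkg. Package manager not found." >&2
        return 1
    fi

    # Propagate a failure from either apt-get step instead of falling through.
    if ! { sudo apt-get update && sudo apt-get install -y "$pkg"; }; then
        echo "Error: apt-get failed to install $pkg." >&2
        return 1
    fi

    # Confirm the command is usable before the caller relies on it.
    if ! command -v "$cmd" > /dev/null 2>&1; then
        echo "Error: $cmd still not found after installing $pkg." >&2
        return 1
    fi
}
```

A caller could then write `ensure_command git || return 1` ahead of the `git clone`, so a failed install surfaces as a clear error rather than a confusing clone failure.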

cursor · 2025-12-30T20:37:29Z

+        else
+            echo "Error: Could not install git. Package manager not found."
+            return 1
+        fi


H100 slurm runner missing container flags for git install

The PR adds git installation logic using sudo apt-get in benchmark_lib.sh and updates h100 config to use vLLM v0.13.0 (which lacks git). However, unlike the h200 runner scripts which are updated with --container-remap-root and --container-writable flags, the h100 slurm runner (launch_h100-cw.sh) is not modified. This means h100 slurm benchmarks will fail because the container lacks root privileges needed for sudo apt-get install.

Additional Locations (1)

.github/configs/nvidia-master.yaml#L242-L243
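A quick way to spot this kind of gap is to scan each launcher for the two container flags named in the thread. The sketch below is an assumption-laden illustration: `missing_container_flags` is a hypothetical helper, and it only does a plain-text search, so it cannot tell whether a flag is actually passed to the right command.

```shell
# Sketch only: missing_container_flags is a hypothetical helper.
# Prints each required flag that does not appear anywhere in the given file.
# Usage: missing_container_flags <launcher-script>
missing_container_flags() {
    local file="$1" flag

    for flag in --container-remap-root --container-writable; do
        # -F treats the flag as a fixed string; -e keeps grep from
        # parsing the leading dashes as one of its own options.
        if ! grep -qF -e "$flag" -- "$file"; then
            echo "$flag"
        fi
    done
}
```

Running `missing_container_flags runners/launch_h100-cw.sh` would print any flag that launcher lacks, and print nothing when both are present.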

cursor · 2026-01-02T15:24:15Z

    --output-len "$OSL" \
    --random-range-ratio "$RANDOM_RANGE_RATIO" \
-    --num-prompts "$NUM_PROMPTS" \
+    --num-prompts $(( CONC * 10 )) \


Inconsistent num-prompts handling between B200 docker and slurm scripts

The gptoss_fp4_b200_slurm.sh script was updated to compute --num-prompts dynamically as $(( CONC * 10 )), matching the pattern used in other gptoss scripts like H100 and H200. However, gptoss_fp4_b200_docker.sh still uses $NUM_PROMPTS environment variable. Both scripts were modified in this PR for the compilation config update, but only the slurm script got the num-prompts formula change. This creates inconsistent benchmark behavior between docker and slurm runs for B200, where the docker script requires NUM_PROMPTS to be set while the slurm script no longer needs it.

Additional Locations (1)

benchmarks/gptoss_fp4_b200_docker.sh#L32-L33
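To keep the docker and slurm scripts consistent, both could derive the prompt count from CONC through one shared helper. The sketch below assumes a hypothetical function `num_prompts_for`; it applies the same `CONC * 10` formula shown in the diff and rejects values that are not plain non-negative integers.

```shell
# Sketch only: num_prompts_for is a hypothetical helper, not a repo function.
# Prints CONC * 10, the formula used by the slurm script in this PR.
# Usage: num_prompts_for <conc>
num_prompts_for() {
    local conc="$1"

    # Reject empty values and anything containing a non-digit character.
    case "$conc" in
        ''|*[!0-9]*)
            echo "CONC must be a non-negative integer, got '${conc}'" >&2
            return 1
            ;;
    esac

    echo $(( conc * 10 ))
}
```

Either script could then pass `--num-prompts "$(num_prompts_for "$CONC")"`, which removes the docker script's dependency on a separately set NUM_PROMPTS.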

cquil11

lgtm

nvpohanh and others added 3 commits December 10, 2025 18:32

Update vLLM version to v0.12.0

0f1645b

Fix H100/H200 perf regression

8796275

check and install git before use

7a3fdaa

ankursingh-nv requested a review from a team as a code owner December 11, 2025 18:44

github-project-automation Bot added this to InferenceMAX Board Dec 11, 2025

add container writable to h200 nv runner launch script

be1e695

cquil11 added 2 commits December 11, 2025 15:47

add sudo to apt-get

c683d18

add container-remap-root to h200 nv and nb runner launchers

59dae33

cquil11 changed the title ~~Update vLLM version to v0.12.0~~ Update vLLM version to v0.12.0 for NVIDIA configs Dec 11, 2025

cquil11 approved these changes Dec 12, 2025

Merge branch 'main' into dev-vllm-v0.12.0

f547cf5

cquil11 requested changes Dec 15, 2025

functionstackx moved this to In Progress in InferenceMAX Board Dec 15, 2025

functionstackx reviewed Dec 16, 2025

Comment thread runners/launch_h200-nv.sh

functionstackx reviewed Dec 16, 2025

Comment thread benchmarks/gptoss_fp4_h200_slurm.sh

functionstackx reviewed Dec 16, 2025

Comment thread benchmarks/gptoss_fp4_h100_slurm.sh Outdated

cquil11 added 2 commits December 16, 2025 17:55

Merge branch 'main' into dev-vllm-v0.12.0

9cc728c

make changes to perf changelog

ca8f30f

cquil11 added the sweep-enabled label Dec 16, 2025

cquil11 temporarily deployed to fork-pr-validation December 16, 2025 23:58 — with GitHub Actions Inactive

cursor Bot reviewed Dec 30, 2025

Comment thread perf-changelog.yaml Outdated

Ankur-singh and others added 2 commits December 30, 2025 12:05

Merge branch 'main' into dev-vllm-v0.12.0

c290779

make changes to perf changelog

cd5ad1b

cursor Bot reviewed Dec 30, 2025

ankursingh-nv force-pushed the dev-vllm-v0.12.0 branch from de787e7 to c751d86 Compare December 30, 2025 20:33

cursor Bot reviewed Dec 30, 2025

fix perf-changelog

dac5bfa

ankursingh-nv force-pushed the dev-vllm-v0.12.0 branch from c751d86 to dac5bfa Compare December 30, 2025 20:42

functionstackx changed the title ~~Update vLLM version to v0.12.0 for NVIDIA configs~~ Update vLLM version to v0.13.0 for NVIDIA configs Dec 30, 2025

Merge branch 'main' into dev-vllm-v0.12.0

8e1b8a7

cquil11 added sweep-enabled and removed sweep-enabled labels Dec 31, 2025

cquil11 added 6 commits December 31, 2025 00:19

Merge branch 'main' into dev-vllm-v0.12.0

159cec9

Merge branch 'main' into dev-vllm-v0.12.0

06b2938

fix compilation configs

ce9f4d9

make num prompts conc * 10

a716627

add --container-writable to h200 nb

7be0229

add --container-remap-root to b200 nb

e13a2ec

cursor Bot reviewed Jan 2, 2026

add --container-remap-root to b200 nv

f268bac

cquil11 approved these changes Jan 2, 2026

cquil11 merged commit 4e52ed4 into main Jan 2, 2026
158 checks passed

cquil11 deleted the dev-vllm-v0.12.0 branch January 2, 2026 22:24

github-project-automation Bot moved this from In Progress to Done in InferenceMAX Board Jan 2, 2026

functionstackx mentioned this pull request Jan 5, 2026

update vllm from 0.11.2 to 0.12 for NVIDIA configs #303

Closed

Klaud-Cold mentioned this pull request Feb 26, 2026

[NVIDIA] update H100, H200, B200 GPT OSS vLLM image to latest 0.16.0 #798

Closed

claude Bot mentioned this pull request Mar 27, 2026

[NVIDIA] chore: update gptoss H100 & H200 vLLM image to v0.18.0 #960

Merged

cquil11 added the NVIDIA label Apr 8, 2026

5 participants
