[NVIDIA] Update H200/B200 SGLang image to v0.5.5-cu129-amd64 and fix deprecated flags#204
Conversation
Pull Request Overview
Updates SGLang Docker images across B200 and H200 configurations to use a unified v0.5.5-cu129-amd64 image tag, replacing GPU-specific and older release candidate versions.
Key Changes:
- Consolidates all three SGLang configurations to use the same unified image version
- Upgrades H200 from CUDA 12.6 to CUDA 12.9
- Removes GPU-specific image tags in favor of a single amd64 architecture tag
@copilot set --ep-size to 8 instead and fix `--enable-flashinfer-trtllm-moe`
@copilot use the $EP_SIZE var instead of hard-coding it to 8, and add it to nvidia-master.yaml
@copilot shouldn't you enable ep=4 for tp=4 too?
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Can you help review this PR? Here is the validation on B200 FP4, B200 FP8 & H200 FP8: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19215140966?pr=204. I had to change a couple of flags because sglang upstream removed them: https://github.com/InferenceMAX/InferenceMAX/pull/204#issuecomment-3508853560 and https://github.com/InferenceMAX/InferenceMAX/pull/204#issuecomment-3508855924
Force-pushed from 5f0a2d5 to b868866
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
… and --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
…master.yaml for B200 SGLang configs Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
… scripts Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
Force-pushed from b868866 to 28534c7

Fixes https://github.com/InferenceMAX/InferenceMAX/issues/208
Consolidates H200 and B200 SGLang configurations to use unified v0.5.5-cu129-amd64 image tag and updates deprecated SGLang server arguments to their current equivalents.
`--enable-flashinfer-trtllm-moe` & `--enable-ep-moe` are no longer available in sglang, so we needed to change them.

Changes:
- Updated image tags: `v0.5.3rc1-cu129-b200` → `v0.5.5-cu129-amd64` (both B200 SGLang configs) and `v0.5.2rc2-cu126` → `v0.5.5-cu129-amd64` (H200 config)
- Added `ep` configuration to B200 SGLang search-space entries, matching `tp` values (9 occurrences total):
  - `ep: 4` for all `tp: 4` entries (3 occurrences in dsr1-fp4-b200-sglang)
  - `ep: 8` for all `tp: 8` entries (6 occurrences across dsr1-fp4-b200-sglang and dsr1-fp8-b200-sglang)
- Replaced `--enable-ep-moe` with `--ep-size $EP_SIZE` and `--enable-flashinfer-trtllm-moe` with `--moe-runner-backend flashinfer_trtllm`
- Replaced `--enable-flashinfer-trtllm-moe` with `--moe-runner-backend flashinfer_trtllm` and added `--ep-size $EP_SIZE`
- Added `-e EP_SIZE` to both Docker run commands to pass the environment variable into the containers

The previous H200 configuration used CUDA 12.6 while B200 used CUDA 12.9 with GPU-specific tags; now all three configs use the same CUDA 12.9 image. The deprecated flags were causing launch errors with the updated SGLang version and have been replaced per the official documentation. EP_SIZE is now configured through nvidia-master.yaml (with `ep` matching `tp`, as per SGLang documentation) and passed as an environment variable to the benchmark scripts. The runner scripts have been updated to ensure EP_SIZE is properly passed into the Docker containers.
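The search-space additions described above can be sketched as a config fragment. The entry names match the configs mentioned in the PR, but the exact key layout is an illustrative assumption, not the actual nvidia-master.yaml contents:

```yaml
# Hypothetical sketch of the nvidia-master.yaml search-space additions;
# key layout is assumed, not copied from the real file.
dsr1-fp4-b200-sglang:
  search_space:
    - tp: 4
      ep: 4   # ep matches tp, per SGLang documentation
    - tp: 8
      ep: 8
dsr1-fp8-b200-sglang:
  search_space:
    - tp: 8
      ep: 8
```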
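Put together, the flag migration and the environment-variable plumbing look roughly like the sketch below. The image name, server arguments, and overall launch line are placeholders based on the PR description, not the repo's exact runner script:

```shell
# Sketch of the updated runner flow; image and launch details are placeholders.
EP_SIZE=8   # comes from nvidia-master.yaml (ep matching tp)

# Old, now-removed flags:  --enable-ep-moe --enable-flashinfer-trtllm-moe
# New equivalents in SGLang v0.5.5:
LAUNCH_ARGS="--ep-size $EP_SIZE --moe-runner-backend flashinfer_trtllm"

# The runner passes EP_SIZE into the container with -e so the command
# inside the container can expand it:
DOCKER_CMD="docker run --gpus all -e EP_SIZE lmsysorg/sglang:v0.5.5-cu129-amd64 \
  python3 -m sglang.launch_server $LAUNCH_ARGS"

echo "$DOCKER_CMD"
```

Without `-e EP_SIZE` on the `docker run` line, `$EP_SIZE` would expand to an empty string inside the container and the launch would fail, which is why both runner scripts needed the change.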