[NVIDIA] Update H200/B200 SGLang image to v0.5.5-cu129-amd64 and fix deprecated flags#204

Merged
functionstackx merged 5 commits into main from copilot/update-image-tag-v0-5-5
Nov 10, 2025

Conversation

Copilot AI commented Nov 9, 2025

fix https://github.com/InferenceMAX/InferenceMAX/issues/208

Consolidates H200 and B200 SGLang configurations to use unified v0.5.5-cu129-amd64 image tag and updates deprecated SGLang server arguments to their current equivalents.

--enable-flashinfer-trtllm-moe and --enable-ep-moe are no longer available in SGLang, so they had to be replaced with their current equivalents:

| `--enable-ep-moe` | NOTE: --enable-ep-moe is deprecated. Please set `--ep-size` to the same value as `--tp-size` instead. | `None` | N/A |
| `--enable-deepep-moe` | NOTE: --enable-deepep-moe is deprecated. Please set `--moe-a2a-backend` to 'deepep' instead. | `None` | N/A |
| `--enable-flashinfer-cutlass-moe` | NOTE: --enable-flashinfer-cutlass-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_cutlass' instead. | `None` | N/A |
| `--enable-flashinfer-cutedsl-moe` | NOTE: --enable-flashinfer-cutedsl-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_cutedsl' instead. | `None` | N/A |
| `--enable-flashinfer-trtllm-moe` | NOTE: --enable-flashinfer-trtllm-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_trtllm' instead. | `None` | N/A |
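
The mapping in the table above can be sketched as a small shell helper. This is an illustration only: `migrate_flag` is a hypothetical name and is not part of this PR or of SGLang. It only restates the replacements quoted from the SGLang docs.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): prints the replacement for a
# deprecated SGLang MoE flag, following the table quoted above.
# $1: flag to translate, $2: tensor-parallel size (used for --ep-size)
migrate_flag() {
  case "$1" in
    --enable-ep-moe)                 echo "--ep-size $2" ;;
    --enable-deepep-moe)             echo "--moe-a2a-backend deepep" ;;
    --enable-flashinfer-cutlass-moe) echo "--moe-runner-backend flashinfer_cutlass" ;;
    --enable-flashinfer-cutedsl-moe) echo "--moe-runner-backend flashinfer_cutedsl" ;;
    --enable-flashinfer-trtllm-moe)  echo "--moe-runner-backend flashinfer_trtllm" ;;
    *)                               echo "$1" ;;  # anything else passes through unchanged
  esac
}

migrate_flag --enable-ep-moe 8   # prints: --ep-size 8
```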

Changes

  • dsr1-fp4-b200-sglang: v0.5.3rc1-cu129-b200 → v0.5.5-cu129-amd64
  • dsr1-fp8-b200-sglang: v0.5.3rc1-cu129-b200 → v0.5.5-cu129-amd64
  • dsr1-fp8-h200-sglang: v0.5.2rc2-cu126 → v0.5.5-cu129-amd64
  • nvidia-master.yaml: Added ep configuration to B200 SGLang search-space entries matching tp values (9 occurrences total):
    • ep: 4 for all tp: 4 entries (3 occurrences in dsr1-fp4-b200-sglang)
    • ep: 8 for all tp: 8 entries (6 occurrences across dsr1-fp4-b200-sglang and dsr1-fp8-b200-sglang)
  • dsr1_fp4_b200_docker.sh: Replaced --enable-ep-moe with --ep-size $EP_SIZE and --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm
  • dsr1_fp8_b200_docker.sh: Replaced --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm and added --ep-size $EP_SIZE
  • launch_b200-nvd.sh: Added -e EP_SIZE to Docker run command to pass environment variable to container
  • launch_b200-tg.sh: Added -e EP_SIZE to Docker run command to pass environment variable to container
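
A minimal sketch of how the updated server arguments might be assembled, assuming the benchmark scripts launch SGLang through `python3 -m sglang.launch_server`. The function name `build_server_cmd`, the placeholder model path, and the fallback of ep to tp when `EP_SIZE` is empty are assumptions for illustration. Only `--ep-size $EP_SIZE` and `--moe-runner-backend flashinfer_trtllm` come from the change list above. The sketch prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints the launch command instead of starting a server.
# $1: model path (placeholder), $2: tensor-parallel size, $3: expert-parallel size
build_server_cmd() {
  model_path="$1"
  tp="$2"
  ep_size="${3:-$tp}"   # assumption: fall back to tp when EP_SIZE is empty
  echo "python3 -m sglang.launch_server" \
    "--model-path $model_path" \
    "--tp-size $tp" \
    "--ep-size $ep_size" \
    "--moe-runner-backend flashinfer_trtllm"
}

# /models/placeholder is a made-up path; EP_SIZE is read from the environment.
build_server_cmd "/models/placeholder" 8 "${EP_SIZE:-}"
```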

The previous H200 configuration used CUDA 12.6, while B200 used CUDA 12.9 with GPU-specific tags. All three configs now use the same CUDA 12.9 image. The deprecated flags caused launch errors with the updated SGLang version and have been replaced per the official documentation. EP_SIZE is now configured through nvidia-master.yaml (with ep matching tp, as the SGLang documentation recommends) and passed as an environment variable to the benchmark scripts. The runner scripts have been updated so that EP_SIZE is passed into the Docker containers.
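
The environment pass-through described above can be sketched as follows. With `docker run`, `-e EP_SIZE` without a value forwards the host's `EP_SIZE` into the container. The image repository (`lmsysorg/sglang`), the `--rm --gpus all` options, and the way the script is invoked are assumptions, not copied from launch_b200-nvd.sh or launch_b200-tg.sh. Only the image tag and `-e EP_SIZE` come from this PR. The sketch prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints the docker command rather than running it.
# IMAGE repository and SCRIPT invocation are assumptions for illustration.
IMAGE="${IMAGE:-lmsysorg/sglang:v0.5.5-cu129-amd64}"
SCRIPT="${SCRIPT:-dsr1_fp4_b200_docker.sh}"

build_docker_cmd() {
  # "-e EP_SIZE" (no "=value") copies EP_SIZE from the host environment.
  echo "docker run --rm --gpus all -e EP_SIZE $IMAGE bash $SCRIPT"
}

EP_SIZE="${EP_SIZE:-8}"   # would normally come from the ep value in nvidia-master.yaml
export EP_SIZE
build_docker_cmd
```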

Copilot AI changed the title [WIP] Update h200/b200 sgl image tag to v0.5.5-cu129-amd64 Update H200/B200 SGLang image to v0.5.5-cu129-amd64 Nov 9, 2025
Copilot AI requested a review from functionstackx November 9, 2025 21:16
@functionstackx functionstackx marked this pull request as ready for review November 9, 2025 21:17
@functionstackx functionstackx requested a review from a team as a code owner November 9, 2025 21:17
Copilot AI review requested due to automatic review settings November 9, 2025 21:17
Copilot AI left a comment

Pull Request Overview

Updates SGLang Docker images across B200 and H200 configurations to use a unified v0.5.5-cu129-amd64 image tag, replacing GPU-specific and older release candidate versions.

Key Changes:

  • Consolidates all three SGLang configurations to use the same unified image version
  • Upgrades H200 from CUDA 12.6 to CUDA 12.9
  • Removes GPU-specific image tags in favor of a single amd64 architecture tag

@functionstackx
image

@functionstackx
| `--enable-ep-moe` | NOTE: --enable-ep-moe is deprecated. Please set `--ep-size` to the same value as `--tp-size` instead. | `None` | N/A |
| `--enable-deepep-moe` | NOTE: --enable-deepep-moe is deprecated. Please set `--moe-a2a-backend` to 'deepep' instead. | `None` | N/A |
| `--enable-flashinfer-cutlass-moe` | NOTE: --enable-flashinfer-cutlass-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_cutlass' instead. | `None` | N/A |
| `--enable-flashinfer-cutedsl-moe` | NOTE: --enable-flashinfer-cutedsl-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_cutedsl' instead. | `None` | N/A |
| `--enable-flashinfer-trtllm-moe` | NOTE: --enable-flashinfer-trtllm-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_trtllm' instead. | `None` | N/A |

https://github.com/sgl-project/sglang/blob/885cfca2732dd2fac9e4f76c4eacdcfafd4a5eee/docs/advanced_features/server_arguments.md?plain=1#L446

@functionstackx
@copilot set --ep-size to 8 instead and fix `--enable-flashinfer-trtllm-moe`

| `--enable-ep-moe` | NOTE: --enable-ep-moe is deprecated. Please set `--ep-size` to the same value as `--tp-size` instead. | `None` | N/A |
| `--enable-deepep-moe` | NOTE: --enable-deepep-moe is deprecated. Please set `--moe-a2a-backend` to 'deepep' instead. | `None` | N/A |
| `--enable-flashinfer-cutlass-moe` | NOTE: --enable-flashinfer-cutlass-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_cutlass' instead. | `None` | N/A |
| `--enable-flashinfer-cutedsl-moe` | NOTE: --enable-flashinfer-cutedsl-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_cutedsl' instead. | `None` | N/A |
| `--enable-flashinfer-trtllm-moe` | NOTE: --enable-flashinfer-trtllm-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_trtllm' instead. | `None` | N/A |

Copilot AI commented Nov 9, 2025

@copilot set --ep-size to 8 instead and fix `--enable-flashinfer-trtllm-moe`

...

Fixed in a7bbbf5. Replaced --enable-ep-moe with --ep-size 8 and --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm in both dsr1_fp4_b200_docker.sh and dsr1_fp8_b200_docker.sh.

@functionstackx
@copilot use $EP_SIZE var instead of hard setting it to 8 and add it to nvidia-master.yaml

Copilot AI changed the title Update H200/B200 SGLang image to v0.5.5-cu129-amd64 Update H200/B200 SGLang image to v0.5.5-cu129-amd64 and fix deprecated flags Nov 9, 2025
Copilot AI commented Nov 9, 2025

@copilot use $EP_SIZE var instead of hard setting it to 8 and add it to nvidia-master.yaml

Fixed in eb0fedb. Changed --ep-size 8 to --ep-size $EP_SIZE in both benchmark scripts and added ep: 8 configuration to the search-space entries for all B200 SGLang configs with tp=8 in nvidia-master.yaml.
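
Because `--ep-size $EP_SIZE` now depends on the environment, a guard along these lines could catch a missing value before the server starts. This is a sketch under assumptions: `check_ep_size` is a hypothetical helper, not something added in eb0fedb, and the warning reflects the SGLang note that `--ep-size` should equal `--tp-size`.

```shell
#!/bin/sh
# Hypothetical guard (not part of this PR).
# $1: tensor-parallel size, $2: expert-parallel size (EP_SIZE)
check_ep_size() {
  if [ -z "$2" ]; then
    echo "EP_SIZE is not set" >&2
    return 1
  fi
  if [ "$2" != "$1" ]; then
    # Not fatal: the SGLang note only recommends ep == tp.
    echo "warning: ep ($2) differs from tp ($1)" >&2
  fi
  return 0
}

check_ep_size 8 8 && echo "ok"   # prints: ok
```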

@functionstackx
@copilot shouldnt u enable ep 4 for tp=4 too?

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@functionstackx
hi @ankursingh-nv

can u help review this PR?

Here is the validation on b200 fp4 & b200 fp8 & h200 fp8 https://github.com/InferenceMAX/InferenceMAX/actions/runs/19215140966?pr=204

had to change a couple flags due to sglang upstream removing it

https://github.com/InferenceMAX/InferenceMAX/pull/204#issuecomment-3508853560

https://github.com/InferenceMAX/InferenceMAX/pull/204#issuecomment-3508855924

@functionstackx functionstackx requested review from Copilot and removed request for Copilot November 9, 2025 22:32
@functionstackx functionstackx force-pushed the copilot/update-image-tag-v0-5-5 branch from 5f0a2d5 to b868866 Compare November 9, 2025 22:37
Copilot AI and others added 5 commits November 9, 2025 18:53
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
… and --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm

Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
…master.yaml for B200 SGLang configs

Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
… scripts

Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
@functionstackx functionstackx force-pushed the copilot/update-image-tag-v0-5-5 branch from b868866 to 28534c7 Compare November 9, 2025 23:53
@ankursingh-nv ankursingh-nv left a comment

LGTM

@kaixih kaixih left a comment

LGTM

@functionstackx functionstackx merged commit d8fe8f7 into main Nov 10, 2025
7 checks passed
@functionstackx functionstackx deleted the copilot/update-image-tag-v0-5-5 branch November 10, 2025 19:55
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title Update H200/B200 SGLang image to v0.5.5-cu129-amd64 and fix deprecated flags [NVIDIA] Update H200/B200 SGLang image to v0.5.5-cu129-amd64 and fix deprecated flags Apr 8, 2026

6 participants