
[NVIDIA] Set EP_SIZE = 1 for B200 measurements #242

Merged

functionstackx merged 2 commits into SemiAnalysisAI:main from kaixih:disable_b200_ep_size on Nov 18, 2025

Conversation

kaixih (Contributor) commented Nov 17, 2025

This PR disables EP (expert parallelism) for the B200 measurements.

@kaixih kaixih requested a review from a team as a code owner November 17, 2025 18:55
Copilot AI review requested due to automatic review settings November 17, 2025 18:55
Copilot AI left a comment


Pull Request Overview

This PR disables expert parallelism (EP) for B200 measurements by setting EP_SIZE to 1 across all configurations. The change affects both the FP4 and FP8 precision test configurations for the B200 platform running sglang.

  • Changed EP values from matching TP values (4 or 8) to a uniform value of 1
  • Applied consistently across all input/output sequence length combinations
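A minimal sketch of what the change amounts to. The variable names follow the PR title; the surrounding config layout is an assumption, not taken from the repo:

```shell
# Hypothetical sketch of the B200 sglang benchmark config change.
# Before: expert parallelism spanned all tensor-parallel ranks.
TP_SIZE=8
EP_SIZE=8   # EP matched TP (4 or 8, depending on the configuration)

# After: expert parallelism disabled for B200 measurements.
TP_SIZE=8
EP_SIZE=1   # uniform value of 1 across all input/output length combinations
```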


cquil11 (Collaborator) commented Nov 18, 2025

does this lead to perf improvement?

kaixih (Contributor, Author) commented Nov 18, 2025

> does this lead to perf improvement?

I think a previous commit enabled EP accidentally; this PR restores the original setting.

functionstackx (Contributor) commented Nov 18, 2025

@kaixih wasn't EP always enabled?

Before PR https://github.com/InferenceMAX/InferenceMAX/pull/204, both --enable-ep-moe and --enable-flashinfer-trtllm-moe were set.

According to the SGLang docs, since the --enable-ep-moe flag has been removed, the equivalent is to set --ep-size: "Please set --ep-size to the same value as --tp-size instead."

[image: excerpt from the SGLang documentation]
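The flag migration described above can be sketched as a server-launch change. The model path and TP value here are illustrative, not taken from the PR:

```shell
# Old style (pre-#204; --enable-ep-moe has since been removed from SGLang):
#   python -m sglang.launch_server --model-path <model> --tp-size 8 \
#       --enable-ep-moe --enable-flashinfer-trtllm-moe

# Current equivalent per the SGLang docs: set --ep-size to match --tp-size.
python -m sglang.launch_server --model-path <model> --tp-size 8 --ep-size 8
```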

@kaixih kaixih force-pushed the disable_b200_ep_size branch from 1585a4a to 7dfc7e4 on November 18, 2025 19:49
Copilot AI review requested due to automatic review settings November 18, 2025 19:49
Copilot AI left a comment


Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.



kaixih (Contributor, Author) commented Nov 18, 2025

@functionstackx Right, I re-enabled EP for the FP4 workloads. We noticed that EP was off only for FP8 and was accidentally enabled by commit InferenceMAX/InferenceMAX@d8fe8f7.

functionstackx (Contributor) commented:

> @functionstackx Right, I re-enabled the EP for the fp4 workloads. We noticed that the EP was off only for FP8 and was accidentally enabled from this d8fe8f7.

@kaixih ah gotcha, i see. it was only fp4 that previously had --enable-ep-moe and fp8 didn't have --enable-ep-moe

I see now


functionstackx (Contributor) left a comment


lgtm

@functionstackx functionstackx merged commit 189ae51 into SemiAnalysisAI:main Nov 18, 2025
1 of 7 checks passed
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026


4 participants