[NVIDIA] Set EP_SIZE = 1 for B200 measurements #242
functionstackx merged 2 commits into SemiAnalysisAI:main
Pull Request Overview
This PR disables expert parallelism (EP) for B200 measurements by setting EP_SIZE to 1 across all configurations. The change affects both the FP4 and FP8 precision test configurations for the B200 platform using SGLang.
- Changed EP values from matching TP values (4 or 8) to a uniform value of 1
- Applied consistently across all input/output sequence length combinations
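The change summarized above can be sketched roughly as follows. This is only an illustration, not the repository's actual config format: the `BenchConfig` class, its field names, and the sequence lengths are hypothetical. It shows EP being overridden to 1 while the TP value of each configuration is left alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BenchConfig:
    """Hypothetical stand-in for one benchmark configuration entry."""
    precision: str  # "fp4" or "fp8"
    isl: int        # input sequence length (illustrative value)
    osl: int        # output sequence length (illustrative value)
    tp: int         # tensor parallel size
    ep: int         # expert parallel size


def disable_ep(configs):
    """Return copies of the configs with EP set to 1; TP is untouched."""
    return [replace(c, ep=1) for c in configs]


# Before: EP mirrors TP (4 or 8), as described in the overview.
before = [
    BenchConfig("fp4", 1024, 1024, tp=4, ep=4),
    BenchConfig("fp8", 1024, 8192, tp=8, ep=8),
]

# After: EP is uniformly 1 for every sequence-length combination.
after = disable_ep(before)
for cfg in after:
    print(cfg)
```

Because the dataclass is frozen and `replace` returns new objects, the original entries stay unchanged, which makes a before/after comparison straightforward.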
does this lead to perf improvement?
I think a previous commit enabled EP accidentally. This PR tries to restore the earlier setting.
@kaixih wasn't EP always enabled? before PR https://github.com/InferenceMAX/InferenceMAX/pull/204 , according to SGLang docs, since
Force-pushed from 1585a4a to 7dfc7e4
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
@functionstackx Right, I re-enabled EP for the fp4 workloads. We noticed that EP was off only for FP8 and was accidentally enabled by InferenceMAX/InferenceMAX@d8fe8f7.
@kaixih ah gotcha, i see. it was only fp4 that previously had I see now


This PR disables EP for B200 measurements.