[NV] Add deepseek-v4-pro b300 vllm config #1144
Merged
6 commits
- `74e99f1` feat: add deepseek-v4-pro b300 vllm benchmark (Ankur-singh)
- `3eebd56` Update PR link in perf-changelog.yaml (Ankur-singh)
- `cdef7c9` Change precision from fp8 to fp4 in nvidia-master.yaml (Ankur-singh)
- `59eef30` Bump deepseekv4 vLLM image to cu130 (Ankur-singh)
- `6a0fa73` B300 launcher: framework suffix for dsv4 scripts (Ankur-singh)
- `d38290c` Merge remote-tracking branch 'origin/main' into nv/b300-vllm-config (Ankur-singh)
🔴 All 4 search-space entries for `dsv4-fp8-b300-vllm` (`nvidia-master.yaml:2402-2413`) omit the `ep` field, so `generate_sweep_configs.py` defaults each matrix entry to `ep=1`. But `benchmarks/single_node/dsv4_fp8_b300.sh` always passes `--enable-expert-parallel`, meaning the actual EP is 8 (for `tp: 8`), 4 (for `tp: 4`), or 4 (for `tp: 4` with `dp-attn: true`), never 1. Downstream metadata (`RESULT_FILENAME`, `process_result.py`, the `compare_results.py`/`summarize.py` grouping keys) will therefore record `ep=1` for every data point. Fix by adding `ep: 8` to the two `tp: 8` entries and `ep: 4` to the two `tp: 4` entries, mirroring the adjacent `dsv4-fp8-h200-vllm` config and PR #919's metadata cleanup.

**Extended reasoning**
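The core mismatch can be sketched in a few lines. The helpers below are hypothetical (not the repo's actual code), illustrating how an entry without an `ep` key is recorded as `ep=1` while a launch script that always passes `--enable-expert-parallel` runs with an effective EP equal to the world size (TP × DP):

```python
# Hypothetical sketch of the recorded-vs-actual EP mismatch.
# matrix_ep mimics the sweep generator's defaulting behavior;
# runtime_ep mimics vLLM's expert-parallel semantics when
# --enable-expert-parallel is unconditionally passed.

def matrix_ep(entry: dict) -> int:
    """EP written into the generated matrix: defaults to 1 if absent."""
    return entry.get("ep", 1)

def runtime_ep(entry: dict, expert_parallel_enabled: bool = True) -> int:
    """Effective EP at runtime: spans the world size (TP * DP)."""
    tp = entry.get("tp", 1)
    dp = entry.get("dp", 1)
    return tp * dp if expert_parallel_enabled else 1

# The four B300 search-space entries from the PR, none with an `ep` key.
entries = [
    {"tp": 8, "conc-start": 4, "conc-end": 4},
    {"tp": 4, "conc-start": 4, "conc-end": 128},
    {"tp": 8, "conc-start": 128, "conc-end": 128},
    {"tp": 4, "dp-attn": True, "conc-start": 256, "conc-end": 512},
]

for e in entries:
    print(matrix_ep(e), runtime_ep(e))  # recorded 1 vs. actual 8, 4, 8, 4
```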
**What the bug is.** The newly added `dsv4-fp8-b300-vllm` block (`.github/configs/nvidia-master.yaml:2388-2413`) declares four search-space entries across its two seq-len configs, and none of them set the `ep` field: `{tp: 8, ...}`, `{tp: 4, ...}`, `{tp: 8, ...}`, `{tp: 4, dp-attn: true, ...}`. In contrast, the sibling `dsv4-fp8-h200-vllm` at line 2385 correctly specifies `ep: 8`, which is the established convention for MoE configs in this file.

**Why the default is wrong for this recipe.** `utils/matrix_logic/generate_sweep_configs.py:354` initializes `Fields.EP.value` to 1 for single-node entries and only overrides it (lines 362-363) when `ep` is explicitly present in the YAML entry. So every generated matrix row for this config gets `ep=1`. However, `benchmarks/single_node/dsv4_fp8_b300.sh` unconditionally passes `--enable-expert-parallel` on the `vllm serve` command (line ~76 of the new script), independent of TP or DP_ATTENTION. With vLLM's expert-parallel semantics, the effective expert-parallel degree equals the world size (TP × DP), so the runtime EP is 8 or 4, never 1.

**How the metadata mismatch propagates.** The EP value from the matrix becomes `EP_SIZE` via `.github/workflows/benchmark-tmpl.yml:85`, and that value is then (a) embedded in `RESULT_FILENAME` at line 146 as `ep${EP_SIZE}`, (b) written into the aggregated JSON by `utils/process_result.py:100-108` as `data['ep'] = ep_size`, (c) used as a grouping key in `utils/summarize.py:82,104`, and (d) forms the `tp{tp}/ep{ep}` lookup key in `utils/compare_results.py:244`. So every single B300 result file for this PR will be named `...ep1...` and every aggregated data point will claim `ep: 1`, while the actual run executed with EP=4 or EP=8. Any downstream baseline comparison or eval grouping will key on a value that doesn't exist in the launched recipe.

**Step-by-step proof for the second entry (`tp: 4`, conc 4-128 on 1k1k).**
1. The YAML entry is `{ tp: 4, conc-start: 4, conc-end: 128 }`, with no `ep` key.
2. `generate_sweep_configs.py` initializes `ep: 1` (default) and the tp override sets `tp: 4`; lines 362-363 do not run because `'ep'` is not in the dict.
3. The generated matrix row therefore carries `tp=4, ep=1, dp-attn=false`.
4. `benchmark-tmpl.yml` exports `EP_SIZE=1`; line 146 stamps the result file as `..._tp4-ep1-dpaFalse_...`.
5. The launch script runs with `--tensor-parallel-size 4 --data-parallel-size 1`, and `--enable-expert-parallel` is always present, so vLLM runs with TP=4, DP=1, EP enabled over world size 4, i.e. effective EP=4.
6. `process_result.py` reads `EP_SIZE=1` from the environment and writes `{'ep': 1, ...}` to the JSON. The `ep` field recorded is 1; the actual EP used was 4.

**Why this was not caught earlier.** There is no validation that cross-references `--enable-expert-parallel` in a launch script against the `ep` field in matrix entries; the coupling is by convention. This is precisely the class of mismatch that PR #919 ("Fix metadata inconsistencies in nvidia-master.yaml - TP/EP/DP-attn values now match actual recipe files") was created to clean up, and that the `gptoss-fp4-*` and `dsr1-fp4-*` changelogs repeatedly reference ("Explicitly add EP=TP for DP attention configs", "Set ep:4 for all tp:4 entries, ep:8 for all tp:8 entries").

**Fix.** Add an explicit `ep` to each B300 search-space entry to match the launched EP:

- `{ tp: 8, ep: 8, conc-start: 4, conc-end: 4 }`
- `{ tp: 4, ep: 4, conc-start: 4, conc-end: 128 }`
- `{ tp: 8, ep: 8, conc-start: 128, conc-end: 128 }`
- `{ tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 512 }`

This mirrors the adjacent `dsv4-fp8-h200-vllm` convention (`ep: 8` for `tp: 8, dp-attn: true`) and keeps `RESULT_FILENAME`/`process_result.py`/`compare_results.py` in sync with the actual runtime EP. Purely metadata-only; no recipe-file changes required.
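A lightweight guard of the kind this review says is missing could be sketched as follows. Everything here is hypothetical (the function names, the entry shape, and the check itself are not existing repo code); it only illustrates cross-checking `--enable-expert-parallel` in a launch script against explicit `ep` fields in the matrix entries:

```python
# Hypothetical lint sketch: flag search-space entries whose launch
# script unconditionally enables expert parallelism but which omit an
# explicit `ep` field, so the matrix would silently default to ep=1.

from pathlib import Path

def script_forces_expert_parallel(script_path: Path) -> bool:
    """True if the launch script always passes --enable-expert-parallel."""
    return "--enable-expert-parallel" in script_path.read_text()

def lint_entries(entries: list, script_path: Path) -> list:
    """Return one error string per entry missing an explicit ep."""
    errors = []
    if not script_forces_expert_parallel(script_path):
        return errors  # no expert parallelism, ep=1 default is fine
    for i, entry in enumerate(entries):
        if "ep" not in entry:
            # effective EP spans the world size (TP * DP)
            expected = entry.get("tp", 1) * entry.get("dp", 1)
            errors.append(
                f"entry {i}: script enables expert parallel but no `ep` "
                f"field; matrix would record ep=1, expected ep={expected}"
            )
    return errors
```

Run as a CI step over each config block and its recipe script, this would have turned the silent `ep=1` metadata into a hard failure before merge.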