Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Actions jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Additional findings (outside current diff — PR may have been updated during review):
🔴 `perf-changelog.yaml:1705-1711` — This changelog entry is missing `evals-only: true`, which will cause unintended H100 multinode benchmark sweeps to run alongside the intended eval runs for both `dsr1-fp8-h100-dynamo-trt` and `dsr1-fp8-h100-dynamo-sglang`. The PR title/description indicate evals-only intent, and every other analogous eval-trigger entry in this file (PRs #558, #892, #911, #1000, and the directly parallel H200 entry from #1094) sets the flag. Also a minor nit: `pr-link` points to #1119 (the prior timeout-fix PR referenced in the description) instead of this PR — the convention across the file is that `pr-link` identifies the PR that adds the entry.

Extended reasoning
The bug
The new changelog entry at `perf-changelog.yaml:1705-1711` adds an entry for `dsr1-fp8-h100-dynamo-trt` and `dsr1-fp8-h100-dynamo-sglang` without an `evals-only: true` field. The intent (per the PR title "trigger H100 multinode evals" and the description "Trigger H100 multinode evals after dist-timeout and health-check timeout fixes") is clearly to kick off evals only for these two config-keys — nothing about the entry signals a perf/benchmark change, and there are no recipe or configuration changes in this PR.

Code path that triggers unintended benchmark runs
In `utils/process_changelog.py` (around line 107), benchmark-config generation is gated on `if not entry.evals_only:`. The field defaults to `False` (`utils/matrix_logic/validation.py:345`: `evals_only: bool = Field(alias="evals-only", default=False)`), so omitting the field takes the default branch that generates both benchmark configs (lines ~108-132) and eval configs (lines ~134-158).

Concretely, when the changelog-processing CI job runs on the post-merge state, it sees the added entry with `evals_only=False`, so it iterates the `config-keys` list into `benchmark_configs`, invokes `generate_sweep_configs.py` with `--no-evals` (producing a full benchmark sweep), and separately invokes the evals branch. Both `dsr1-fp8-h100-dynamo-trt` and `dsr1-fp8-h100-dynamo-sglang` exist in `.github/configs/nvidia-master.yaml`, so these are fully realized multinode sweeps — not filtered out as unknown keys.
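To make the defaulting behavior concrete, here is a minimal, self-contained sketch. The `ChangelogEntry` model and `process_entry` function are hypothetical stand-ins rather than the actual InferenceX code; only the `evals_only` field definition is taken from `validation.py:345` as quoted above.

```python
# Illustrative sketch only: ChangelogEntry and process_entry are hypothetical
# stand-ins; the evals_only field definition mirrors validation.py:345.
from pydantic import BaseModel, Field


class ChangelogEntry(BaseModel):
    # Omitting "evals-only" in the YAML silently yields evals_only=False.
    evals_only: bool = Field(alias="evals-only", default=False)
    config_keys: list[str] = Field(alias="config-keys")


def process_entry(entry: ChangelogEntry) -> list[str]:
    """Hypothetical stand-in for the gating at process_changelog.py:107."""
    jobs: list[str] = []
    if not entry.evals_only:  # default False -> benchmark branch also runs
        jobs += [f"benchmark-sweep:{key}" for key in entry.config_keys]
    jobs += [f"eval:{key}" for key in entry.config_keys]  # evals generated either way
    return jobs


# The entry as merged in this PR (no "evals-only" key present):
merged = ChangelogEntry.model_validate(
    {"config-keys": ["dsr1-fp8-h100-dynamo-trt", "dsr1-fp8-h100-dynamo-sglang"]}
)
print(process_entry(merged))
# ['benchmark-sweep:dsr1-fp8-h100-dynamo-trt', 'benchmark-sweep:dsr1-fp8-h100-dynamo-sglang',
#  'eval:dsr1-fp8-h100-dynamo-trt', 'eval:dsr1-fp8-h100-dynamo-sglang']
```

With `evals-only: true` on the entry, the two `benchmark-sweep:*` jobs disappear and only the eval jobs remain, which is the stated intent of this PR.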
Why the existing code doesn't prevent it

The `evals_only` default is `False`; there is no heuristic that infers eval-only intent from the description, nor any lint preventing eval-trigger entries from omitting the flag. PR review is effectively the only line of defense. The immediately preceding entry that this PR is patterned after (PR #1094, "Add H200 multinode evals-only runs", at lines 1673-1680) correctly sets `evals-only: true`, and every other multinode eval-trigger entry in this file (PRs #558, #892, #911, #1000, #1094) also sets it. This entry breaks that unbroken convention.
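For context, a guard of that kind could be quite small. The sketch below is purely hypothetical — no such lint exists in the repo, per the above — and uses the entry description as one possible (and admittedly crude) heuristic for "eval trigger"; it assumes only PyYAML and the entry shape shown in the fix block further down.

```python
# Hypothetical changelog lint -- not present in the repository. Flags entries
# whose description reads like a pure eval trigger but which omit evals-only: true.
import sys
import yaml  # PyYAML


def lint_eval_trigger_entries(path: str = "perf-changelog.yaml") -> int:
    with open(path) as fh:
        entries = yaml.safe_load(fh) or []
    problems = 0
    for i, entry in enumerate(entries, start=1):
        desc = entry.get("description", [])
        if isinstance(desc, str):
            desc = [desc]
        text = " ".join(desc).lower()
        looks_eval_only = "eval" in text and "trigger" in text  # crude heuristic
        if looks_eval_only and not entry.get("evals-only", False):
            print(f"entry {i}: looks like an eval trigger but lacks 'evals-only: true'")
            problems += 1
    return problems


if __name__ == "__main__":
    sys.exit(1 if lint_eval_trigger_entries() else 0)
```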
Step-by-step proof

- CI runs `utils/process_changelog.py` on the post-merge tree and reads the newly added block at lines 1705-1711.
- Pydantic validation runs on the entry; since `evals-only` is absent, `entry.evals_only` defaults to `False` (`validation.py:345`).
- `process_changelog.py:107` evaluates `if not entry.evals_only:` → `True` → enters the benchmark branch.
- `config-keys` = [`dsr1-fp8-h100-dynamo-trt`, `dsr1-fp8-h100-dynamo-sglang`] are added to `benchmark_configs` (neither has been seen before in this PR's `changelog_data`, which contains only this one added block).
- `generate_sweep_configs.py --no-evals` is invoked for both config-keys, producing a full multinode benchmark sweep across all sequence lengths and concurrencies defined in `nvidia-master.yaml` (see the illustrative expansion after this list).
- The code then falls through to the evals branch and also generates the intended eval configs.
- Net effect: both a full H100 multinode benchmark sweep and the intended eval run are launched, costing significant multinode GPU time and polluting the perf-history classification for these config-keys (they'll show perf-changelog entries that imply a perf/benchmark change when the PR made no recipe or config change).
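As a rough illustration of the blast radius, a benchmark sweep expands as a cartesian product over the sweep axes. The sequence-length pairs and concurrency values below are made-up placeholders, not the real contents of `nvidia-master.yaml`.

```python
# Placeholder values only -- illustrates why an unintended sweep is expensive,
# not what nvidia-master.yaml actually contains.
from itertools import product

config_keys = ["dsr1-fp8-h100-dynamo-trt", "dsr1-fp8-h100-dynamo-sglang"]
seq_lens = [(1024, 1024), (4096, 1024), (8192, 1024)]  # hypothetical (isl, osl) pairs
concurrencies = [8, 32, 128, 512]                       # hypothetical load points

sweep = [
    {"config": key, "isl": isl, "osl": osl, "concurrency": c}
    for key, (isl, osl), c in product(config_keys, seq_lens, concurrencies)
]
print(len(sweep))  # 2 * 3 * 4 = 24 multinode benchmark runs, none of them intended
```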
How to fix
Add `evals-only: true` to the entry, matching the immediately analogous H200 predecessor at lines 1673-1680:

```yaml
- config-keys:
    - dsr1-fp8-h100-dynamo-trt
    - dsr1-fp8-h100-dynamo-sglang
  description:
    - "Trigger H100 multinode evals after dist-timeout and health-check timeout fixes"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1120
  evals-only: true
```
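If useful, the patched entry can be sanity-checked locally with a few lines of Python; the snippet assumes only PyYAML and a scratch file `entry.yaml` containing just the block above, not any repo tooling.

```python
# Quick local check that the patched entry parses with evals-only set.
import yaml

with open("entry.yaml") as fh:
    (entry,) = yaml.safe_load(fh)  # the block above is a one-element YAML list

assert entry["evals-only"] is True, "entry would still trigger benchmark sweeps"
assert entry["config-keys"] == [
    "dsr1-fp8-h100-dynamo-trt",
    "dsr1-fp8-h100-dynamo-sglang",
]
print("ok: entry will take the evals-only branch")
```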
Also (nit): update `pr-link` from `/pull/1119` to `/pull/1120`. The file-wide convention (documented at `.github/workflows/claude.yml:243` and followed by 160+ entries, including the analogous H200 entry, which links to its own #1094) is that `pr-link` identifies the PR that adds the row, not a motivating reference PR. Precedent: the `dsr1-fp8-h100-dynamo-trt` entry from PR #663 references PR #651 inside its `description` text but still uses #663 as its `pr-link`. The `pr-link` has no runtime semantics (it's only type-checked as a string in `validation.py:344`), so this part is documentation/traceability only — hence noted as a nit, not blocking.