DSV4 B300 DPA rerun#1162
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| - config-keys: | ||
| - dsv4-fp4-b300-sglang | ||
| description: | ||
| - "Retrigger DSV4 B300 DPA runs" |
There was a problem hiding this comment.
🔴 The new changelog entry at lines 1837-1840 is missing the required pr-link field. The ChangelogEntry Pydantic model in utils/matrix_logic/validation.py declares pr_link with no default and extra="forbid", so loading this file will raise a ValidationError. Add pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1162 to match every other entry in the file (including the recent retrigger entries #1148, #1159, and #1160).
Extended reasoning...
What the bug is
The new changelog entry added by this PR (perf-changelog.yaml:1837-1840) is:
- config-keys:
- dsv4-fp4-b300-sglang
description:
- "Retrigger DSV4 B300 DPA runs"It has no pr-link field. Every one of the 207 other entries in the file has a pr-link (this PR's entry is the only one missing it).
Why this breaks validation
utils/matrix_logic/validation.py declares the schema for entries in this file:
class ChangelogEntry(BaseModel):
model_config = ConfigDict(extra="forbid", populate_by_name=True)
...
pr_link: str = Field(alias="pr-link")pr_link has no default value, so it is a required field. With extra="forbid" the model also rejects unknown keys, but the failure here is the missing required key: when the changelog loader (load_config_files / process_changelog) constructs a ChangelogEntry from this dict it will raise a Pydantic ValidationError with "Field required: pr-link".
Why retriggers don't get a pass
The recent retrigger entries demonstrate the convention: PR #1148 (line ~1819), PR #1159 (the previous dsv4-fp4-b300-sglang retrigger), and PR #1160 (line 1835, immediately above the new entry) all include pr-link. So a "retriggers don't need pr-link" exception does not exist — this is just an oversight.
Step-by-step proof
- Load
perf-changelog.yamlwith PyYAML — produces a list of dicts; the last dict has keysconfig-keysanddescriptiononly. - The loader iterates and constructs
ChangelogEntry(**entry)(with alias resolution). - Pydantic validates:
pr_linkis required (no default), but the dict has nopr-linkkey. ValidationErroris raised, aborting changelog processing for this PR.
Fix
Append pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1162 after the description block, matching the format of every adjacent entry.
Impact
Severity is "normal" — anything in CI that calls process_changelog against this file (the changelog-driven tooling that decides which configs ran in which PR) will fail until the field is added. The fix is a single line.
Summary