Skip to content

DSV4 B300 DPA rerun#1162

Closed
Oseltamivir wants to merge 2 commits intomainfrom
dsv4-b300-dpa-rerun
Closed

DSV4 B300 DPA rerun#1162
Oseltamivir wants to merge 2 commits intomainfrom
dsv4-b300-dpa-rerun

Conversation

@Oseltamivir
Copy link
Copy Markdown
Collaborator

Summary

  • Retrigger DSV4 B300 DPA runs

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread perf-changelog.yaml
Comment on lines +1837 to +1840
- config-keys:
- dsv4-fp4-b300-sglang
description:
- "Retrigger DSV4 B300 DPA runs"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new changelog entry at lines 1837-1840 is missing the required pr-link field. The ChangelogEntry Pydantic model in utils/matrix_logic/validation.py declares pr_link with no default and extra="forbid", so loading this file will raise a ValidationError. Add pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1162 to match every other entry in the file (including the recent retrigger entries #1148, #1159, and #1160).

Extended reasoning...

What the bug is

The new changelog entry added by this PR (perf-changelog.yaml:1837-1840) is:

- config-keys:
    - dsv4-fp4-b300-sglang
  description:
    - "Retrigger DSV4 B300 DPA runs"

It has no pr-link field. Every one of the 207 other entries in the file has a pr-link (this PR's entry is the only one missing it).

Why this breaks validation

utils/matrix_logic/validation.py declares the schema for entries in this file:

class ChangelogEntry(BaseModel):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)
    ...
    pr_link: str = Field(alias="pr-link")

pr_link has no default value, so it is a required field. With extra="forbid" the model also rejects unknown keys, but the failure here is the missing required key: when the changelog loader (load_config_files / process_changelog) constructs a ChangelogEntry from this dict it will raise a Pydantic ValidationError with "Field required: pr-link".

Why retriggers don't get a pass

The recent retrigger entries demonstrate the convention: PR #1148 (line ~1819), PR #1159 (the previous dsv4-fp4-b300-sglang retrigger), and PR #1160 (line 1835, immediately above the new entry) all include pr-link. So a "retriggers don't need pr-link" exception does not exist — this is just an oversight.

Step-by-step proof

  1. Load perf-changelog.yaml with PyYAML — produces a list of dicts; the last dict has keys config-keys and description only.
  2. The loader iterates and constructs ChangelogEntry(**entry) (with alias resolution).
  3. Pydantic validates: pr_link is required (no default), but the dict has no pr-link key.
  4. ValidationError is raised, aborting changelog processing for this PR.

Fix

Append pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1162 after the description block, matching the format of every adjacent entry.

Impact

Severity is "normal" — anything in CI that calls process_changelog against this file (the changelog-driven tooling that decides which configs ran in which PR) will fail until the field is added. The fix is a single line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

1 participant