Skip to content

mi355x dsv4 retry#1160

Merged
Oseltamivir merged 1 commit intomainfrom
retrigger-dsv4-mi355x
Apr 25, 2026
Merged

mi355x dsv4 retry#1160
Oseltamivir merged 1 commit intomainfrom
retrigger-dsv4-mi355x

Conversation

@Oseltamivir
Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@Oseltamivir Oseltamivir merged commit 8505ad0 into main Apr 25, 2026
5 checks passed
@Oseltamivir Oseltamivir deleted the retrigger-dsv4-mi355x branch April 25, 2026 23:43
Comment thread perf-changelog.yaml
Comment on lines +1830 to +1836
- config-keys:
- dsv4-fp8-mi355x-sglang
description:
- "Drop --mem-fraction-static 0.88 and --max-total-tokens from dsv4_fp8_mi355x.sh"
- "Bump --chunked-prefill-size from 4096 to 8192"
- "Retrigger dsv4-fp8-mi355x-sglang"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1159
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The newly added perf-changelog entry sets pr-link: .../pull/1159, but this is PR #1160 ("mi355x dsv4 retry"); #1159 is the prior already-merged retrigger PR (commit d578b7e). The entry was copy-pasted from the unlinked block above without bumping the number — following the convention used for the analogous retrigger pattern at lines 1799-1805 (where #1148 self-references), this should point to pull/1160.

Extended reasoning...

What the bug is

The new perf-changelog.yaml entry added at lines 1830-1836 sets:

- config-keys:
    - dsv4-fp8-mi355x-sglang
  description:
    - "Drop --mem-fraction-static 0.88 and --max-total-tokens from dsv4_fp8_mi355x.sh"
    - "Bump --chunked-prefill-size from 4096 to 8192"
    - "Retrigger dsv4-fp8-mi355x-sglang"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1159

But this PR is #1160 ("mi355x dsv4 retry"), not #1159. PR #1159 ("retrigger mi355x dsv4") is the prior, already-merged retrigger PR — see commit d578b7e in git log. The new entry was clearly copy-pasted from the unlinked sibling block at lines 1823-1828 (which belongs to #1159) and only had pr-link appended, without updating the PR number.

Why this is a bug (convention violation)

The established convention in this file is that each entry's pr-link points to its own PR. The directly analogous retrigger pattern is at lines 1799-1805:

- config-keys:
    - dsv4-fp8-mi355x-sglang
  description:
    - "Bump MI355X SLURM allocation from --time=180 to --time=300 ..."
    ...
    - "Retriggering dsv4-fp8-mi355x-sglang"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1148

Per git show --stat 694f206, PR #1148 was a retrigger PR that only modified perf-changelog.yaml (no recipe changes — just adding the duplicate linked entry), and its pr-link self-references #1148 even though the actual recipe changes were made in PR #1147. This PR (#1160) is structurally identical — it's a retrigger of #1159's recipe changes that only adds a duplicate changelog entry — but uses pull/1159 instead of pull/1160, breaking the precedent.

Step-by-step proof

  1. PR metadata says this is mi355x dsv4 retry #1160 ("mi355x dsv4 retry"), opened 2026-04-25.
  2. git log shows commit d578b7e = "retrigger mi355x dsv4 (retrigger mi355x dsv4 #1159)" — already merged.
  3. The diff adds exactly one block at lines 1830-1836 with pr-link: ...pull/1159.
  4. Lines 1823-1828 (already on main, added by retrigger mi355x dsv4 #1159) contain the same config-keys/description as lines 1830-1836 but with no pr-link field — the new block is a copy-paste of that prior unlinked block.
  5. Following the Bump MI355X SLURM time-limit to 300m; retrigger dsv4-fp8-mi355x-sglang #1147Increase timeout #1148 precedent (retrigger PR self-references its own PR number), the correct value here is https://github.com/SemiAnalysisAI/InferenceX/pull/1160.

Impact and fix

Severity is nit: perf-changelog.yaml is documentation/metadata only — no runtime or functional impact. The misleading link still resolves to a real (related) PR, so a reader can still find their way, just not directly. But it does break traceability and the file's self-consistency. Fix: change 11591160 on line 1836.

This was referenced Apr 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

1 participant