Update qwen3.5-bf16-mi355x-sglang and mtp SGLang ROCm image to v0.5.12-rocm720-mi35x-20260517 by Klaud-Cold · Pull Request #1443 · SemiAnalysisAI/InferenceX

Klaud-Cold · 2026-05-17T18:15:55Z

Summary

Update qwen3.5-bf16-mi355x-sglang image from lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415 to lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517
Update qwen3.5-bf16-mi355x-sglang-mtp image from lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260415 to lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517

github-actions · 2026-05-17T18:16:03Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first, before we can merge your single-node PR into the main branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude

LGTM. This is a straightforward SGLang ROCm image bump; please replace the placeholder in the perf-changelog pr-link before merge.

Extended reasoning...

Overview

This PR bumps the SGLang ROCm image for two configs (qwen3.5-bf16-mi355x-sglang and qwen3.5-bf16-mi355x-sglang-mtp) in .github/configs/amd-master.yaml from v0.5.10rc0-rocm720-mi35x-20260415 to v0.5.12-rocm720-mi35x-20260517, and adds a corresponding entry to perf-changelog.yaml. Two files, four lines of actual change plus a 7-line changelog block.

Security risks

None. This is a benchmark/CI config data file — no executable code, no auth, no secrets. The new image tag points to a vendor-published SGLang ROCm image used by the perf sweep runners; the same image registry (lmsysorg/sglang-rocm) is already in use across many configs in this file.

Level of scrutiny

Low. Image version bumps for benchmark configs are mechanical and follow an established pattern visible throughout this PR's surrounding context (qwen3.5-bf16-mi325x-sglang, glm5-fp8-mi355x-sglang, etc. all live in the same file and get bumped similarly via dedicated PRs — #1394, #1416, #1429 are the immediate predecessors in the changelog).

Other factors

The single issue surfaced by the bug hunter is a cosmetic placeholder (pull/XXX instead of pull/1443) in the changelog pr-link — already attached as an inline comment. It doesn't affect runtime behavior but should be fixed before merge so the changelog back-reference resolves. No CODEOWNER-sensitive paths, no logic changes, and the PR is labeled full-sweep-enabled so CI will exercise the new image.

claude · 2026-05-17T18:19:31Z

+    - qwen3.5-bf16-mi355x-sglang-mtp
+  description:
+    - "Update SGLang ROCm image from v0.5.10rc0-rocm720-mi35x-20260415 to v0.5.12-rocm720-mi35x-20260517"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX


🟡 The new perf-changelog entry added by this PR contains a placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX at line 2632 instead of the real PR number (1443). Every other recent entry in the file uses the actual PR number, so this should be replaced with /pull/1443 before merge to avoid leaving a broken changelog link.

Extended reasoning...

What the bug is: The new perf-changelog.yaml entry added by this PR (#1443) at lines 2626-2632 documents the SGLang ROCm image bump for qwen3.5-bf16-mi355x-sglang and qwen3.5-bf16-mi355x-sglang-mtp. The pr-link field on line 2632 is set to a literal placeholder string:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The specific code path that triggers it: This is a static YAML data file (the perf changelog), so the bug manifests not at runtime but as soon as the PR is merged: the URL .../pull/XXX does not resolve to a real PR (PR numbers are numeric), so any tooling or human reader that follows the link from the changelog entry hits a 404.

Why existing code doesn't prevent it: There is no schema validation or CI check that enforces that pr-link references a numeric PR ID; it is a plain string field. The author replaced every other placeholder with concrete values (model names, image tags, dates) but left this one as the boilerplate XXX.
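As a minimal sketch of the kind of check that would have caught this, the hypothetical helper `find_bad_pr_links` below scans the raw changelog text for `pr-link:` lines and flags any value that does not end in a numeric PR ID. It assumes each link sits on its own `pr-link: <url>` line, as in the excerpt above, and uses a regex rather than a YAML parser so it needs only the standard library. It is not an existing InferenceX check.

```python
import re

# Hypothetical CI helper (not part of InferenceX).
# Assumes one "pr-link: <url>" entry per line, as in perf-changelog.yaml.
VALID_PR_LINK = re.compile(
    r"^https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+$"
)
# Match "pr-link: <value>" lines, tolerating YAML indentation.
PR_LINK_LINE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")


def find_bad_pr_links(changelog_text: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for every pr-link that is not a numeric PR URL."""
    bad = []
    for lineno, line in enumerate(changelog_text.splitlines(), start=1):
        match = PR_LINK_LINE.match(line)
        if match and not VALID_PR_LINK.match(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad
```

A CI step could run this over perf-changelog.yaml and fail the job whenever the returned list is non-empty; on this PR it would flag the `/pull/XXX` line.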

Step-by-step proof:

Open perf-changelog.yaml at line 2632 in this PR's diff.

Observe the value: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX.

Compare to the immediately preceding entries in the same file:

line 2606: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1423

line 2612: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1429

line 2619: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1416

line 2625: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1394

The PR metadata (and the description, which references issue [Auto] Docker Image Updates Available - 2026-04-25 #1154) confirms this PR's number is 1443.

After merge, navigating to https://github.com/SemiAnalysisAI/InferenceX/pull/XXX returns 404, so the changelog entry has no working back-reference.

Impact: Cosmetic / documentation only — no runtime behavior is affected, no benchmark is impacted. But the perf-changelog exists specifically so future readers can trace any image/version bump back to the PR that introduced it; a placeholder link defeats that purpose for this entry.

How to fix: Replace XXX with 1443 on line 2632 before merge:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1443

github-actions · 2026-05-17T19:00:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25998833055
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25998833055

Klaud-Cold · 2026-05-17T19:31:54Z

e2e test result: SLURM infrastructure failure

Run 26000387400: all single-node and eval jobs failed because SLURM salloc was cancelled while waiting for GPU allocation (the same mi355x cluster contention). Not related to the image update.

github-actions · 2026-05-17T22:23:59Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25998833055
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25998833055

…2-rocm720-mi35x-20260517 Ref #1154 Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

Drops the -20260517 nightly suffix so the recipe uses the lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x release tag rather than a date-pinned nightly build.

github-actions · 2026-05-17T23:10:13Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26005298161
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26005298161

… nightly suffix)

github-actions · 2026-05-17T23:26:03Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26005438628
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26005438628

Docker Hub does not publish a clean lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x release tag, only the dated nightly variant. The earlier switch to the un-suffixed tag was a mistake: it caused 'manifest not found' on every job. Restoring the dated nightly tag, which does exist.
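The failure mode here (pinning a tag the registry never published) can be guarded against by choosing the tag from what the registry actually lists. The sketch below is hypothetical: `pick_image_tag` is not an InferenceX or Docker API, and it assumes you already have the list of published tag names for lmsysorg/sglang-rocm (fetching that list is out of scope). It prefers the bare release tag when it exists and otherwise falls back to the newest dated `-YYYYMMDD` variant, the convention used by the tags in this PR.

```python
import re

# Hypothetical helper (not an InferenceX or Docker API).
# Dated nightly tags append "-YYYYMMDD" to the release tag, e.g.
# v0.5.12-rocm720-mi35x-20260517 (convention inferred from this PR's tags).
DATE_SUFFIX = re.compile(r"-(\d{8})$")


def pick_image_tag(published_tags: list[str], release_tag: str) -> str:
    """Return a tag that is actually published: the bare release tag if it
    exists, otherwise the newest dated variant of it."""
    if release_tag in published_tags:
        return release_tag
    dated = []
    for tag in published_tags:
        match = DATE_SUFFIX.search(tag)
        # Keep only tags of the exact form "<release_tag>-YYYYMMDD"
        if match and tag[: match.start()] == release_tag:
            dated.append((match.group(1), tag))
    if not dated:
        raise LookupError(f"no published tag for {release_tag}")
    # YYYYMMDD strings sort chronologically, so max() picks the newest build
    return max(dated)[1]
```

With only dated variants of v0.5.12-rocm720-mi35x published, this returns a dated tag instead of the non-existent bare tag that produced 'manifest not found'. Falling back to the newest build is a design choice; a recipe that values reproducibility may prefer to require an exact dated pin, as this PR does.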

# Conflicts: # perf-changelog.yaml

github-actions · 2026-05-18T02:05:54Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26005774349
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26005774349

Klaud-Cold requested a review from a team May 17, 2026 18:15

Klaud-Cold requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 17, 2026 18:15

github-project-automation Bot added this to InferenceMAX Board May 17, 2026

Klaud-Cold added the full-sweep-enabled label May 17, 2026

Klaud-Cold mentioned this pull request May 17, 2026

[Auto] Docker Image Updates Available - 2026-04-25 #1154

Open

claude Bot reviewed May 17, 2026


Update qwen3.5-bf16-mi355x-sglang and mtp SGLang ROCm image to v0.5.12-rocm720-mi35x-20260517

5f42a2e

Ref #1154 Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

functionstackx force-pushed the claude/issue-1154-qwen3.5-bf16-mi355x-sglang-mtp branch from 7695122 to 5f42a2e May 17, 2026 23:03

fix: use sglang v0.5.12 release tag instead of dated nightly

d7bcfbb

Drops the -20260517 nightly suffix so the recipe uses the lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x release tag rather than a date-pinned nightly build.

chore: update perf-changelog description to v0.5.12 release tag (drop nightly suffix)

68e151e

functionstackx changed the title ~~Update qwen3.5-bf16-mi355x-sglang and mtp SGLang ROCm image to v0.5.12-rocm720-mi35x-20260517~~ Update qwen3.5-bf16-mi355x-sglang and mtp SGLang ROCm image to v0.5.12-rocm720-mi35x May 17, 2026

functionstackx changed the title ~~Update qwen3.5-bf16-mi355x-sglang and mtp SGLang ROCm image to v0.5.12-rocm720-mi35x~~ Update qwen3.5-bf16-mi355x-sglang and mtp SGLang ROCm image to v0.5.12-rocm720-mi35x-20260517 May 18, 2026

Merge remote-tracking branch 'origin/main' into HEAD

c1f6999

# Conflicts: # perf-changelog.yaml

Klaud-Cold wants to merge 5 commits into main from claude/issue-1154-qwen3.5-bf16-mi355x-sglang-mtp

Klaud-Cold commented May 17, 2026 • edited by functionstackx