[AMD/ROCm] qwen3.5 fp8 mi355x optimization by seungrokj · Pull Request #910 · SemiAnalysisAI/InferenceX

seungrokj · 2026-03-13T05:04:24Z

Waiting for the optimized upstream Docker image.

Regards,
Seungrok

Signed-off-by: seungrokj <seungrok.jung@amd.com>

github-actions · 2026-03-13T05:04:34Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you


claude · 2026-03-16T09:42:35Z

Claude finished @seungrokj's task in 1m 22s.

PR #910 Review

Gather context and read changed files
Validate master config and perf-changelog
Review benchmark script
Post review summary

1 blocking issue found.

🔴 perf-changelog.yaml not updated: .github/configs/amd-master.yaml was modified (image bump v0.5.8.post1 → v0.5.9) but perf-changelog.yaml was not updated. Please add an entry documenting the config change.
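The rule behind this blocking issue can be sketched as a small check over the list of changed paths. This is a minimal illustration, not the repository's actual CI logic: the function name `changelog_missing` is hypothetical, and the schema and exact location of `perf-changelog.yaml` are not shown in this thread, so the check matches that file name at any depth.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeeds when the
# change set edits .github/configs/amd-master.yaml without also touching
# perf-changelog.yaml. Reads one changed path per line on stdin.
changelog_missing() {
  local touched_config=0 touched_changelog=0 path
  while IFS= read -r path; do
    case "$path" in
      .github/configs/amd-master.yaml) touched_config=1 ;;
      # The changelog's directory is an assumption, so match any location.
      perf-changelog.yaml|*/perf-changelog.yaml) touched_changelog=1 ;;
    esac
  done
  # True (exit status 0) only if the config changed and the changelog did not.
  [ "$touched_config" -eq 1 ] && [ "$touched_changelog" -eq 0 ]
}
```

One possible use, assuming the base branch is available as `origin/main`: `git diff --name-only origin/main...HEAD | changelog_missing && echo "perf-changelog.yaml needs an entry"`.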

The benchmark script changes look good — server launch command is properly formatted with arguments on separate lines, optimization flags (--kv-cache-dtype fp8_e4m3, --chunked-prefill-size, --disable-radix-cache, --num-continuous-decode-steps 2, etc.) are reasonable for throughput tuning, and the image (rocm/sgl-dev:v0.5.9-rocm720-mi35x-20260315) is publicly accessible on Docker Hub.

.github/configs/amd-master.yaml


claude

LGTM — straightforward Docker image bump and benchmark server tuning flags.

Extended reasoning...

Overview

This PR touches two files: a Docker image version bump in .github/configs/amd-master.yaml (from v0.5.8.post1 to v0.5.9) and optimization flags added to the benchmark script benchmarks/single_node/qwen3.5_fp8_mi355x.sh. The script changes include switching from python3 -m sglang.launch_server to the newer sglang serve CLI, making MEM_FRAC_STATIC configurable via environment variable, and adding several performance-tuning flags (fp8 KV cache, chunked prefill, cuda graph bounds, radix cache disable, etc.).
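The environment-variable pattern described above might look roughly like the following. This is a sketch under stated assumptions, not the script from the PR: `build_server_args` is a hypothetical helper, the default of `0.8` is a placeholder (the PR's real default is not shown in this thread), and `--mem-fraction-static` is assumed to be the flag that `MEM_FRAC_STATIC` feeds. `--chunked-prefill-size` is left out because its value is not given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints tuning arguments one per line so a caller
# can collect them into an array and append them to the launch command.
build_server_args() {
  # Use the caller's MEM_FRAC_STATIC if set; 0.8 is a placeholder default,
  # not the value used in the PR.
  local mem_frac="${MEM_FRAC_STATIC:-0.8}"
  printf '%s\n' \
    --mem-fraction-static "$mem_frac" \
    --kv-cache-dtype fp8_e4m3 \
    --disable-radix-cache \
    --num-continuous-decode-steps 2
}

# Show the arguments instead of launching a server.
build_server_args
```

A caller could gather the lines with `mapfile -t args < <(build_server_args)` and pass `"${args[@]}"` after the model arguments of the launch command. Running with `MEM_FRAC_STATIC=0.9` set changes only the value that follows `--mem-fraction-static`.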

Security risks

None. These are benchmark configuration files and a shell script that runs in a controlled CI/benchmarking environment. No auth, crypto, permissions, or user-facing code is affected.

Level of scrutiny

Low scrutiny is appropriate. This is a config and benchmark tuning change following established patterns already present in the repo for other model configurations. The author has triggered a test run via /test-config.

Other factors

No bugs were found by the automated bug hunting system. No outstanding reviewer comments exist. The changes are self-contained and follow the same patterns as other benchmark scripts in the repository.

seungrokj · 2026-03-17T00:25:55Z

@functionstackx could you please run this sweep? (I tried to run it manually, but it does not seem to work.)

seungrokj · 2026-03-18T00:43:31Z

/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-fp8-mi355x-sglang

github-actions · 2026-03-18T00:43:39Z

@seungrokj Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23223448478
Command: test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-fp8-mi355x-sglang
Pinned ref: dcf4337
Approval: not required (trusted collaborator).


functionstackx · 2026-03-18T04:18:02Z

@cquil11 or @Oseltamivir can help with day-to-day tasks; please ping them.

chunfangamd

lgtm

seungrokj added 2 commits March 13, 2026 05:02

qwen3.5 fp8 mi355x optimization

a11cb7a

Signed-off-by: seungrokj <seungrok.jung@amd.com>

qwen3.5 fp8 mi355x optimization

a706a84

Signed-off-by: seungrokj <seungrok.jung@amd.com>

github-project-automation bot added this to InferenceMAX Board Mar 13, 2026

seungrokj and others added 2 commits March 16, 2026 09:38

qwen3.5 fp8 mi355x

d26ce9c

Signed-off-by: seungrokj <seungrok.jung@amd.com>

Merge branch 'main' into srok/qwen3.5fp8_mi355x

1bbaf1f

seungrokj marked this pull request as ready for review March 16, 2026 09:42

seungrokj requested a review from a team March 16, 2026 09:42

seungrokj requested review from billishyahao and chunfangamd as code owners March 16, 2026 09:42

claude bot reviewed Mar 16, 2026

.github/configs/amd-master.yaml (resolved)

seungrokj added 4 commits March 16, 2026 09:48

qwen3.5 fp8 perf changelog

28030df

Signed-off-by: seungrokj <seungrok.jung@amd.com>

qwen3.5 fp8 perf changelog

188efa9

Signed-off-by: seungrokj <seungrok.jung@amd.com>

qwen3.5 fp8 perf changelog

fd5c8d6

Signed-off-by: seungrokj <seungrok.jung@amd.com>

qwen3.5 fp8 perf changelog

dcf4337

Signed-off-by: seungrokj <seungrok.jung@amd.com>

claude bot reviewed Mar 16, 2026


prevent pythonbytecode

3335a18

Signed-off-by: seungrokj <seungrok.jung@amd.com>

chunfangamd approved these changes Mar 18, 2026


chunfangamd enabled auto-merge (squash) March 18, 2026 07:12

seungrokj wants to merge 9 commits into main from srok/qwen3.5fp8_mi355x
