[AMD/ROCm] qwen3.5 fp8 mi355x optimization #910

Open
seungrokj wants to merge 9 commits into main from srok/qwen3.5fp8_mi355x

@seungrokj
Collaborator

Waiting for the optimized upstream Docker image.

Regards,
Seungrok

Signed-off-by: seungrokj <seungrok.jung@amd.com>
Signed-off-by: seungrokj <seungrok.jung@amd.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

seungrokj and others added 2 commits March 16, 2026 09:38
Signed-off-by: seungrokj <seungrok.jung@amd.com>
@seungrokj seungrokj marked this pull request as ready for review March 16, 2026 09:42
@seungrokj seungrokj requested a review from a team March 16, 2026 09:42
@claude
Contributor

claude bot commented Mar 16, 2026

Claude finished @seungrokj's task in 1m 22s.

PR #910 Review

  • Gather context and read changed files
  • Validate master config and perf-changelog
  • Review benchmark script
  • Post review summary

1 blocking issue found.

🔴 perf-changelog.yaml not updated: .github/configs/amd-master.yaml was modified (image bump v0.5.8.post1 → v0.5.9) but perf-changelog.yaml was not updated. Please add an entry documenting the config change.
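
The rule behind this blocking issue can be sketched as a small shell check. This is a hypothetical helper, not the repository's actual CI logic: the function name `check_changelog` is invented for illustration, and it assumes perf-changelog.yaml lives at the repository root. It reads changed file paths on stdin and fails when the master config changed without a changelog update.

```sh
#!/bin/sh
set -eu

# Hypothetical helper (not part of the repository).
# Reads changed file paths on stdin, one per line.
# Returns 1 when the AMD master config changed but perf-changelog.yaml did not.
check_changelog() {
  changed="$(cat)"
  if printf '%s\n' "$changed" | grep -qxF '.github/configs/amd-master.yaml' &&
     ! printf '%s\n' "$changed" | grep -qxF 'perf-changelog.yaml'; then
    echo "perf-changelog.yaml not updated" >&2
    return 1
  fi
  return 0
}

# Demo with the two files this PR touches, passed in literally so the
# sketch runs without a git checkout.
if printf '%s\n' \
    '.github/configs/amd-master.yaml' \
    'benchmarks/single_node/qwen3.5_fp8_mi355x.sh' | check_changelog; then
  echo "ok"
else
  echo "blocked"   # this branch runs: the changelog is missing from the list
fi
```

In a real checkout the input would more likely come from something like `git diff --name-only main...HEAD | check_changelog`.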

The benchmark script changes look good — server launch command is properly formatted with arguments on separate lines, optimization flags (--kv-cache-dtype fp8_e4m3, --chunked-prefill-size, --disable-radix-cache, --num-continuous-decode-steps 2, etc.) are reasonable for throughput tuning, and the image (rocm/sgl-dev:v0.5.9-rocm720-mi35x-20260315) is publicly accessible on Docker Hub.

Signed-off-by: seungrokj <seungrok.jung@amd.com>
Signed-off-by: seungrokj <seungrok.jung@amd.com>
Signed-off-by: seungrokj <seungrok.jung@amd.com>
Signed-off-by: seungrokj <seungrok.jung@amd.com>
Contributor

@claude claude bot left a comment

LGTM — straightforward Docker image bump and benchmark server tuning flags.

Extended reasoning...

Overview

This PR touches two files: a Docker image version bump in .github/configs/amd-master.yaml (from v0.5.8.post1 to v0.5.9) and optimization flags added to the benchmark script benchmarks/single_node/qwen3.5_fp8_mi355x.sh. The script changes include switching from python3 -m sglang.launch_server to the newer sglang serve CLI, making MEM_FRAC_STATIC configurable via environment variable, and adding several performance-tuning flags (fp8 KV cache, chunked prefill, cuda graph bounds, radix cache disable, etc.).
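
As a rough illustration of the script changes described above, the sketch below assembles a launch command with `MEM_FRAC_STATIC` overridable from the environment and one argument per line. It is not the PR's actual script: the default values (`0.8`, `32768`), the `MODEL_PATH` placeholder, and the `build_server_cmd` helper are assumptions made for the example, and it assumes `sglang serve` accepts the same arguments as `python3 -m sglang.launch_server`. The command is printed rather than executed so the sketch runs without SGLang installed.

```sh
#!/bin/sh
set -eu

# Assumed defaults for illustration; the PR's real values are not shown here.
MODEL_PATH="${MODEL_PATH:-/path/to/model}"
MEM_FRAC_STATIC="${MEM_FRAC_STATIC:-0.8}"
CHUNKED_PREFILL_SIZE="${CHUNKED_PREFILL_SIZE:-32768}"

# Hypothetical helper: builds the launch command, one argument per line,
# using only the flags named in the review.
build_server_cmd() {
  set -- sglang serve \
    --model-path "$MODEL_PATH" \
    --mem-fraction-static "$MEM_FRAC_STATIC" \
    --kv-cache-dtype fp8_e4m3 \
    --chunked-prefill-size "$CHUNKED_PREFILL_SIZE" \
    --disable-radix-cache \
    --num-continuous-decode-steps 2
  echo "$*"
}

# Print the command instead of launching the server.
SERVER_CMD="$(build_server_cmd)"
echo "$SERVER_CMD"
```

Running it as `MEM_FRAC_STATIC=0.85 sh sketch.sh` would change only the `--mem-fraction-static` value, which is the point of reading it from the environment.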

Security risks

None. These are benchmark configuration files and a shell script that runs in a controlled CI/benchmarking environment. No auth, crypto, permissions, or user-facing code is affected.

Level of scrutiny

Low scrutiny is appropriate. This is a config and benchmark tuning change following established patterns already present in the repo for other model configurations. The author has triggered a test run via /test-config.

Other factors

No bugs were found by the automated bug hunting system. No outstanding reviewer comments exist. The changes are self-contained and follow the same patterns as other benchmark scripts in the repository.

@seungrokj
Collaborator Author

@functionstackx can you please run this sweep? (I tried to run it manually, but it doesn't seem to be working.)

@seungrokj
Collaborator Author

/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-fp8-mi355x-sglang

@github-actions
Contributor

@seungrokj Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23223448478
Command: test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-fp8-mi355x-sglang
Pinned ref: dcf4337
Approval: not required (trusted collaborator).

Signed-off-by: seungrokj <seungrok.jung@amd.com>
@functionstackx
Contributor

functionstackx commented Mar 18, 2026

@cquil11 or @Oseltamivir can help with day-to-day tasks. Please ping them.

Collaborator

@chunfangamd chunfangamd left a comment

lgtm

@chunfangamd chunfangamd enabled auto-merge (squash) March 18, 2026 07:12
3 participants