[fix][acc][sgl-atom] fix accuracy of fp8 attn weights model using ptpc quant recipe by zhuyuhua-v · Pull Request #747 · ROCm/ATOM

zhuyuhua-v · 2026-05-11T08:10:10Z

Motivation

Align with #670
The Quark model amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 has FP8-weight linear layers in attention and adopts the PTPC (per-token activation, per-channel weight) quant recipe. However, the current ATOM code forces block-scale quantization in _fuse_rmsnorm_quant. This PR fixes that.
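The GEMM epilogue shows why a per-token recipe cannot simply consume block-scaled activations. The numpy sketch below is an illustration, not ATOM or aiter code. It omits rounding to the FP8 grid and assumes the float8_e4m3fn range of 448 (the fnuz variant uses 240). It scales activations per token and weights per output channel, then restores the result with one multiply per output element. That works only because each scale is constant along the reduction dimension K. With 1x128 block scales, the activation scale changes every 128 elements of K, so it cannot be factored out of the sum this way and needs a block-scale GEMM instead. Pairing the two layouts incorrectly is one plausible source of the accuracy drop.

```python
import numpy as np

FP8_MAX = 448.0  # assumption: float8_e4m3fn range; e4m3fnuz would use 240.0


def quantize_rows(a: np.ndarray):
    """Symmetric scaling with one scale per row (per token for x, per output channel for w).

    Values stay in floating point: the cast/rounding to an FP8 dtype is omitted.
    """
    scale = np.maximum(np.abs(a).max(axis=1, keepdims=True), 1e-12) / FP8_MAX
    return np.clip(a / scale, -FP8_MAX, FP8_MAX), scale


def ptpc_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    qx, sx = quantize_rows(x)  # sx: (T, 1) per-token activation scales
    qw, sw = quantize_rows(w)  # sw: (N, 1) per-output-channel weight scales
    acc = qx @ qw.T            # (T, N) reduction over K in the scaled domain
    # sx[i] and sw[j] are constant along K, so they factor out of the dot product
    return acc * sx * sw.T


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))
w = rng.standard_normal((8, 256))
print(np.allclose(ptpc_linear(x, w), x @ w.T))  # True, since FP8 rounding is not modeled
```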

Technical Details

_fuse_rmsnorm_quant should select the correct quant type based on the quant config/recipe. For per-token quant, a new kernel, fused_qk_rmsnorm_per_token_quant, is added in aiter.
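The selection logic can be sketched as a numpy reference that applies RMSNorm and then scales the result according to the requested quant type. The enum and function names are hypothetical stand-ins, not the real ATOM helper or the aiter kernels. The sketch assumes a 448 FP8 range and leaves out the actual cast to an FP8 dtype. It only shows that the two recipes produce differently shaped scales: (T, 1) for per-token and (T, H/128) for 1x128 blocks.

```python
from enum import Enum

import numpy as np

FP8_MAX = 448.0  # assumption: float8_e4m3fn range


class QuantType(Enum):
    """Hypothetical stand-in for the quant types a recipe can request."""
    PER_TOKEN = "per_token"  # one scale per row (PTPC activations)
    PER_1X128 = "per_1x128"  # one scale per row per 128-column block


def rmsnorm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight


def _scale(amax: np.ndarray) -> np.ndarray:
    return np.maximum(amax, 1e-12) / FP8_MAX


def fuse_rmsnorm_quant_sketch(x, weight, quant_type, eps=1e-6, block=128):
    """Hypothetical RMSNorm + FP8-quant reference that honors the requested scale layout.

    Returns (scaled values, scales); the cast to an FP8 dtype is omitted.
    """
    y = rmsnorm(x, weight, eps)
    if quant_type is QuantType.PER_TOKEN:
        scale = _scale(np.abs(y).max(axis=-1, keepdims=True))  # (T, 1)
        return np.clip(y / scale, -FP8_MAX, FP8_MAX), scale
    if quant_type is QuantType.PER_1X128:
        t, h = y.shape
        if h % block:
            raise ValueError("hidden size must be a multiple of the block size")
        yb = y.reshape(t, h // block, block)
        scale = _scale(np.abs(yb).max(axis=-1, keepdims=True))  # (T, H/block, 1)
        q = np.clip(yb / scale, -FP8_MAX, FP8_MAX).reshape(t, h)
        return q, scale[..., 0]  # scales: (T, H/block)
    raise ValueError(f"unsupported quant type: {quant_type}")
```

Dispatching on the recipe, rather than hard-coding the block path, lets the caller hand the downstream GEMM the scale layout it expects.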

Test Plan

The gsm8k accuracy was validated with this PR on amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 using SGLang-ATOM.
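A run with the settings recorded in the lm-evaluation-harness header of the results could be reproduced roughly as follows. This is a sketch that assumes lm-evaluation-harness 0.4+ (which exposes lm_eval.simple_evaluate) and an SGLang-ATOM server already listening at the URL shown. The helper names build_model_args and run_gsm8k are hypothetical. The import is deferred so the argument string can be inspected without the harness installed.

```python
MODEL = "/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4"
BASE_URL = "http://localhost:8013/v1/completions"


def build_model_args(model: str, base_url: str) -> str:
    """Comma-separated model_args matching the settings printed in the lm-eval header."""
    args = {
        "model": model,
        "base_url": base_url,
        "num_concurrent": 65,
        "max_retries": 1,
        "tokenized_requests": False,
    }
    return ",".join(f"{k}={v}" for k, v in args.items())


def run_gsm8k():
    """Hypothetical helper: needs lm-evaluation-harness and a running server at BASE_URL."""
    import lm_eval  # deferred so this module imports without the harness

    results = lm_eval.simple_evaluate(
        model="local-completions",
        model_args=build_model_args(MODEL, BASE_URL),
        tasks=["gsm8k"],
        num_fewshot=3,
        batch_size=1,
    )
    return results["results"]["gsm8k"]
```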

Test Result

Main branch:

amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 TP4

SGLang-ATOM:

local-completions ({'model': '/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4', 'base_url': 'http://localhost:8013/v1/completions', 'num_concurrent': 65, 'max_retries': 1, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.8544|±  |0.0097|
|     |       |strict-match    |     3|exact_match|↑  |0.8400|±  |0.0101|

This PR:

amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 TP4

SGLang-ATOM:

local-completions ({'model': '/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4', 'base_url': 'http://localhost:8013/v1/completions', 'num_concurrent': 65, 'max_retries': 1, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9340|±  |0.0068|
|     |       |strict-match    |     3|exact_match|↑  |0.9303|±  |0.0070|

For amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4, gsm8k accuracy increases from about 0.85 to 0.93 (flexible-extract 0.8544 → 0.9340, strict-match 0.8400 → 0.9303).
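A rough check of whether the gain exceeds run-to-run noise uses only the values and stderrs from the two tables. This sketch treats the two runs as independent samples and ignores that both score the same questions, so it is an approximation rather than a proper paired test.

```python
import math

# gsm8k exact_match (value, stderr) copied from the two result tables
main = {"flexible-extract": (0.8544, 0.0097), "strict-match": (0.8400, 0.0101)}
this_pr = {"flexible-extract": (0.9340, 0.0068), "strict-match": (0.9303, 0.0070)}


def rough_z(before, after):
    """Difference in means divided by the combined stderr (independence assumed)."""
    (m0, s0), (m1, s1) = before, after
    return (m1 - m0) / math.sqrt(s0 ** 2 + s1 ** 2)


for f in main:
    delta = this_pr[f][0] - main[f][0]
    print(f"{f}: +{delta:.4f}, z ~ {rough_z(main[f], this_pr[f]):.1f}")
```

Both differences come out at roughly 7 combined standard errors, far outside what the reported stderrs would explain as noise.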

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


Copilot

Pull request overview

This PR fixes an accuracy regression for SGLang-ATOM when running DeepSeek MLA attention with FP8-weight attention projections that use a PTPC (per-token) quantization recipe, by ensuring the fused RMSNorm+quant path honors the projection layer’s configured quantization type instead of forcing a block-scale scheme.

Changes:

Route SGLang MLA’s FP8 activation quantization through ATOM’s DeepSeek-V2 _fuse_rmsnorm_quant helper instead of fused_rms_fp8_group_quant.
Derive and pass quant_type from the corresponding FP8 projection modules (q_b_proj, kv_b_proj) so per-token quant recipes are respected.
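The second change, taking quant_type from the projection that consumes the normalized activation, can be sketched as below. All names here (QuantType, FP8Linear, activation_quant_type) are hypothetical illustrations rather than ATOM's actual classes. The block-scale fallback when a module carries no quant type is likewise an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class QuantType(Enum):
    """Hypothetical quant-type enum (not the real aiter/ATOM type)."""
    PER_TOKEN = "per_token"
    PER_1X128 = "per_1x128"


@dataclass
class FP8Linear:
    """Hypothetical FP8 projection that records the activation quant type its recipe expects."""
    name: str
    quant_type: QuantType


def activation_quant_type(proj, default: QuantType = QuantType.PER_1X128) -> QuantType:
    """Take the quant type from the consuming projection; fall back only if it is absent."""
    qt = getattr(proj, "quant_type", None)
    return qt if qt is not None else default


# A PTPC checkpoint: both MLA projections expect per-token activation scales.
q_b_proj = FP8Linear("q_b_proj", QuantType.PER_TOKEN)
kv_b_proj = FP8Linear("kv_b_proj", QuantType.PER_TOKEN)

q_quant = activation_quant_type(q_b_proj)
kv_quant = activation_quant_type(kv_b_proj)
```

Reading the type from the consuming layer keeps the fused norm+quant step and the GEMM that follows it in agreement, whichever recipe the checkpoint was quantized with.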


zhuyuhua-v · 2026-05-11T08:53:28Z

Aligned with Hattie: merging this PR now because the SGLang CI case currently always fails; this will be fixed in #614.

zhuyuhua-v added 3 commits May 11, 2026 08:02

[fix][acc] fix accuracy of fp8 attn weights model using ptpc quant recipe (082dcb2)
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

update import (028f295)
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Merge branch 'main' into yuhua/sgl-fp4-mtp-acc-fix (fb0e4f8)

zhuyuhua-v marked this pull request as ready for review May 11, 2026 08:40

Copilot AI review requested due to automatic review settings May 11, 2026 08:40

Copilot started reviewing on behalf of zhuyuhua-v May 11, 2026 08:42

valarLip approved these changes May 11, 2026


Copilot AI reviewed May 11, 2026


zhuyuhua-v mentioned this pull request May 11, 2026

(ci)(recipe): Add DeepSeek-R1 FP4 TP4 validation and DS recipe for SGLang-ATOM #614

Merged

zhuyuhua-v merged commit 7934d5e into main May 11, 2026
38 of 44 checks passed

zhuyuhua-v deleted the yuhua/sgl-fp4-mtp-acc-fix branch May 11, 2026 08:52

3 participants

Conversation

zhuyuhua-v commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Technical Details

Test Plan

Test Result

Main branch:

amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 TP4

This PR:

amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 TP4

Submission Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

zhuyuhua-v commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zhuyuhua-v commented May 11, 2026 •

edited

Loading