[fix][acc][sgl-atom] fix accuracy of fp8 attn weights model using ptpc quant recipe#747
Merged
…cipe Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
valarLip
approved these changes
May 11, 2026
Contributor
Pull request overview
This PR fixes an accuracy regression for SGLang-ATOM when running DeepSeek MLA attention with FP8-weight attention projections that use a PTPC (per-token) quantization recipe, by ensuring the fused RMSNorm+quant path honors the projection layer’s configured quantization type instead of forcing a block-scale scheme.
Changes:
- Route SGLang MLA’s FP8 activation quantization through ATOM’s DeepSeek-V2 `_fuse_rmsnorm_quant` helper instead of `fused_rms_fp8_group_quant`.
- Derive and pass `quant_type` from the corresponding FP8 projection modules (`q_b_proj`, `kv_b_proj`) so per-token quant recipes are respected.
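The second change can be pictured with a minimal sketch. Everything below is hypothetical and only illustrates the idea of reading the quantization type from the projection module rather than hard-coding a block-scale scheme: `QuantType`, `Projection`, and `select_fused_path` are stand-in names, not ATOM or aiter APIs.

```python
from dataclasses import dataclass
from enum import Enum


class QuantType(Enum):
    # Hypothetical stand-in for the real quant-type enum.
    PER_TOKEN = "per_token"    # one activation scale per token row (PTPC)
    PER_BLOCK = "per_block"    # one activation scale per fixed-width block


@dataclass
class Projection:
    # Hypothetical stand-in for an FP8 projection such as q_b_proj or kv_b_proj.
    name: str
    quant_type: QuantType


def select_fused_path(proj: Projection) -> str:
    """Pick the fused RMSNorm+quant path that matches the projection's recipe."""
    if proj.quant_type is QuantType.PER_TOKEN:
        return "per_token"
    if proj.quant_type is QuantType.PER_BLOCK:
        return "block_scale"
    raise ValueError(f"unsupported quant type for {proj.name}: {proj.quant_type}")


# The caller derives the path from the module instead of assuming block scale.
q_b_proj = Projection("q_b_proj", QuantType.PER_TOKEN)
kv_b_proj = Projection("kv_b_proj", QuantType.PER_BLOCK)
paths = {p.name: select_fused_path(p) for p in (q_b_proj, kv_b_proj)}
```

The point of the sketch is the direction of the dependency: the activation quantization follows whatever the consuming projection was configured with.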
Collaborator
Author
Aligned with Hattie: merging this PR because the SGLang CI case currently always fails; that will be fixed in #614.
Motivation
Align with #670
The Quark model amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 has FP8-weight linear layers in attention and adopts the PTPC quant recipe. However, the current ATOM code forces block-scale quantization in `_fuse_rmsnorm_quant`. This PR fixes that.
Technical Details
`_fuse_rmsnorm_quant` should select the correct quant type based on the quant config/recipe. For per-token quant, a new kernel, `fused_qk_rmsnorm_per_token_quant`, is added in aiter.
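To see why the scale granularity matters, here is a small pure-Python sketch of how the two schemes compute activation scales after RMSNorm. It is not the aiter kernel. The FP8 maximum of 448 and the block width of 128 are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math

FP8_MAX = 448.0  # assumption: E4M3 finite max; other FP8 variants use a different value
BLOCK = 128      # assumption: block width used by the block-scale scheme


def rmsnorm(row, weight, eps=1e-6):
    """Plain RMSNorm over one token's hidden vector."""
    mean_sq = sum(v * v for v in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(row, weight)]


def per_token_scale(row):
    """One scale for the whole token row."""
    return max(max(abs(v) for v in row), 1e-12) / FP8_MAX


def per_block_scales(row, block=BLOCK):
    """One scale per contiguous block of the token row."""
    return [
        max(max(abs(v) for v in row[i:i + block]), 1e-12) / FP8_MAX
        for i in range(0, len(row), block)
    ]


hidden = 256
row = [0.01 * (i % 7 - 3) for i in range(hidden)]
row[200] = 5.0  # a single outlier in the second block
normed = rmsnorm(row, [1.0] * hidden)

token_scale = per_token_scale(normed)    # a single float for the row
block_scales = per_block_scales(normed)  # hidden / BLOCK floats for the row
```

The two schemes produce a different number of scales per token, and the block without the outlier gets a much smaller scale than the per-token one. A GEMM that expects per-token scales therefore cannot consume block-scale output without an accuracy cost, which is consistent with the regression described here.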
Test Plan
GSM8K accuracy was validated with this PR on amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 using SGLang-ATOM.
Test Result
Main branch:
amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 TP4
SGLang-ATOM:
local-completions ({'model': '/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4', 'base_url': 'http://localhost:8013/v1/completions', 'num_concurrent': 65, 'max_retries': 1, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.8544|±  |0.0097|
|     |       |strict-match    |     3|exact_match|↑  |0.8400|±  |0.0101|

This PR:
amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 TP4
SGLang-ATOM:
local-completions ({'model': '/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4', 'base_url': 'http://localhost:8013/v1/completions', 'num_concurrent': 65, 'max_retries': 1, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9340|±  |0.0068|
|     |       |strict-match    |     3|exact_match|↑  |0.9303|±  |0.0070|

For amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4, GSM8K accuracy (flexible-extract exact match) increases from 0.85 to 0.93.
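As a rough sanity check on the size of that gap, the snippet below compares the two flexible-extract scores against their reported standard errors. It treats the two runs as independent samples, which is only an approximation since both use the same GSM8K questions, so the result should be read as indicative rather than as a formal test.

```python
import math

# Values copied from the flexible-extract rows of the two result tables.
main_acc, main_se = 0.8544, 0.0097
pr_acc, pr_se = 0.9340, 0.0068

delta = pr_acc - main_acc
# Standard error of the difference under an independence assumption.
combined_se = math.sqrt(main_se ** 2 + pr_se ** 2)
z = delta / combined_se

print(f"delta={delta:.4f} combined_se={combined_se:.4f} z={z:.2f}")
```

Under that assumption the difference is several combined standard errors wide, so it is unlikely to be run-to-run noise.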
Submission Checklist