[Cherry-Pick][RL]Remove extra argument from per_token_quant calls in deepgemm MoE backend, fix #6850 #7258
Thanks for your contribution!
  # down_proj
  if not fastdeploy.envs.FD_USE_PHI_FP8_QUANT:
      ffn_in_x, ffn_in_x_scale_tensor = fastdeploy.model_executor.ops.gpu.per_token_quant(
-         ffn_out, quant_config_weight_block_size_0, not disable_ue8m0_cast
+         ffn_out, quant_config_weight_block_size_0
      )
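As a Cherry-Pick, this PR's title still does not follow the repository convention: the original develop PR number must be appended to the end of the title (for example, ...(#xxxx)). The current title lacks the original PR ID, which may cause the CI Cherry-Pick check to fail.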
  # down_proj
  if not fastdeploy.envs.FD_USE_PHI_FP8_QUANT:
      ffn_in_x, ffn_in_x_scale_tensor = fastdeploy.model_executor.ops.gpu.per_token_quant(
-         ffn_out, self.quant_config.weight_block_size[0], self.quant_config.deepgemm_scale_ue8m0
+         ffn_out, self.quant_config.weight_block_size[0]
      )
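Key fields of the PR description (Motivation, Modifications, Accuracy Tests, and so on) are currently empty, and the Checklist does not explain why unit tests and accuracy results are not provided. This change touches quantization and backend calls, so it may affect the runtime and numerical paths. Please add: 1) the motivation for the change and its scope of impact; 2) at least the relevant unit tests or commands, or an explanation of why none are needed; 3) accuracy comparison results, if applicable.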
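One lightweight way to respond to the unit-test request could be a static check that every `per_token_quant` call site passes exactly two arguments. The sketch below is only an illustration: the helper `count_call_args` and the `SAMPLE` source string are hypothetical and are not part of FastDeploy. It uses only Python's standard `ast` module.

```python
import ast


def count_call_args(source: str, func_name: str) -> list[int]:
    """Return the argument count (positional + keyword) of each call to func_name."""
    counts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            target = node.func
            # Handle both `pkg.mod.func(...)` (Attribute) and `func(...)` (Name).
            if isinstance(target, ast.Attribute):
                name = target.attr
            else:
                name = getattr(target, "id", None)
            if name == func_name:
                counts.append(len(node.args) + len(node.keywords))
    return counts


# Hypothetical sample source: one two-argument call and one three-argument call.
SAMPLE = '''
x, s = ops.gpu.per_token_quant(ffn_out, block_size)
y, t = ops.gpu.per_token_quant(ffn_out, block_size, use_ue8m0)
'''

# Sorted so the result does not depend on AST traversal order.
arg_counts = sorted(count_call_args(SAMPLE, "per_token_quant"))
offending = [n for n in arg_counts if n != 2]
```

In a real test, the source string would be read from the backend file under review, and the test would assert that `offending` is empty.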
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## release/2.5 #7258 +/- ##
==============================================
Coverage ? 68.51%
==============================================
Files ? 390
Lines ? 54405
Branches ? 8577
==============================================
Hits ? 37276
Misses ? 14435
Partials ? 2694
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-09
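📋 Review Summary
PR overview: fixes calls to per_token_quant in the deepgemm MoE backend that passed an extra third argument
Scope of change: fastdeploy/model_executor/layers/moe/fused_moe_deepgemm_backend.py
Impact tag: [RL]
📝 PR Convention Check
The Motivation and Modifications sections of the PR description are empty. Consider filling them in.
Description template (can be copied as-is):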
## Motivation
The `per_token_quant` function signature accepts only two parameters (`input` and `block_size`), but the code passed a third argument, which causes a runtime error.
## Modifications
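Remove the third argument from the `per_token_quant` calls:
- the call in the `m_grouped_fp8_gemm_nt_contiguous_custom_python_op` function
- the call in the `apply_ep_prefill` method
Issues
No blocking issues found.
Overall Assessment
✅ The change is correct. The per_token_quant function signature (from cpp_extensions.cc) has been verified to accept only two parameters, input and block_size, and the other call sites (test files, triton_backend, etc.) also pass two arguments. Consider filling in the PR description to improve traceability.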
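The arity mismatch described above can be illustrated with a pure-Python stand-in. This is a sketch under assumptions: `per_token_quant_stub` is a hypothetical function that only mirrors the two-parameter shape (`input`, `block_size`) reported in the review, the block size of 128 is an arbitrary example value, and the real op is a compiled extension for which `inspect.signature` may not be available.

```python
import inspect


def per_token_quant_stub(input, block_size):
    """Hypothetical stand-in with the two-parameter shape described in the review."""
    return input, block_size


def accepts(func, *args) -> bool:
    """Return True if func's signature can bind the given positional arguments."""
    try:
        inspect.signature(func).bind(*args)
        return True
    except TypeError:
        # bind() raises TypeError for too many or too few arguments.
        return False


# 128 is an assumed example block size, not a value taken from the PR.
two_ok = accepts(per_token_quant_stub, [1.0, 2.0], 128)
three_ok = accepts(per_token_quant_stub, [1.0, 2.0], 128, True)
```

Here the two-argument call binds and the three-argument call does not, which corresponds to the pre-fix call sites failing at call time.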
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
`pre-commit` before commit.
`release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.