[XPU] add support for rope3d#7518
Conversation
|
Thanks for your contribution! |
|
/skip-ci ci_iluvatar |
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review |
2026-04-21 10:39:16
📋 Review 摘要
PR 概述:在 XPU spliced block attention 的 encoder/decoder 路径中添加 rope3d 支持,计算 3D 位置编码偏移量并传递给 rotary embedding 函数。
变更范围:custom_ops/xpu_ops/src/ops/block_attn_spliced.cc
影响面 Tag:XPU OP
📝 PR 规范检查
PR 描述的 Motivation 和 Modifications 过于简略,仅写了 "add support for rope3d",建议补充具体的变更内容和适用场景。
描述模板(可直接复制):
## Motivation
在 spliced block attention 路径中支持 rope3d(3D 旋转位置编码),使多模态模型(如 VL 模型)能够在 XPU encoder/decoder splice 模式下正确运行。此前 spliced 路径会在 rope_3d=true 时跳过,回退到非 splice 路径。
## Modifications
- 移除 `split_kvcache_encoder` 和 `split_kvcache_decoder` 中对 `rope_3d` 的限制条件
- 新增 `pos_emb_offset` 计算逻辑:当 rope_3d 为 true 时,根据 `real_batch * 2 * rope_max_seqlen + start_tokens` 公式计算位置偏移
- 将 rotary embedding 调用中的 `start_tokens` 替换为 `pos_emb_offset`,将硬编码的 `"NORMAL"` 替换为 `pos_emb_type` 变量
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🟡 建议 | block_attn_spliced.cc:277 |
encoder 与 decoder 中 ~40 行 pos_emb_offset 计算代码完全重复,建议抽取辅助函数 |
| ❓ 疑问 | block_attn_spliced.cc:329 |
gptj 路径已使用 pos_emb_type,但 neox 路径仍硬编码 "NORMAL",需确认是否一致 |
总体评价
实现逻辑正确,rope3d 的位置偏移公式 real_batch * 2 * rope_max_seqlen + start_tokens 与注释一致,XPU/CPU 双端计算也保持同步。主要建议是消除 encoder/decoder 之间的重复代码,以及确认 neox 路径的 pos_emb_type 是否需要同步修改。
| paddle::Tensor pos_emb_offset_cpu; | ||
| api::VectorParam<int32_t> pos_emb_offset; | ||
| pos_emb_offset.len = start_tokens.len; | ||
| if (rope_3d) { |
There was a problem hiding this comment.
🟡 建议 encoder 和 decoder 中 pos_emb_offset 的计算逻辑完全一致(约 40 行),建议抽取为独立的辅助函数(如 compute_pos_emb_offset),避免代码重复,后续维护时只需修改一处。
例如:
api::VectorParam<int32_t> compute_pos_emb_offset(
api::Context* xpu_ctx,
bool rope_3d,
int batch_size,
int64_t rope_max_seqlen,
const api::VectorParam<int32_t>& start_tokens,
const api::VectorParam<int32_t>& real_batch,
const paddle::Place& place,
paddle::Tensor& offset_xpu,
paddle::Tensor& offset_cpu);| start_tokens, | ||
| "NORMAL", | ||
| pos_emb_offset, | ||
| pos_emb_type, |
There was a problem hiding this comment.
❓ 疑问 此处 vsl_rotary_embedding_gptj 调用已正确使用 pos_emb_type(可能为 "HALF_HEAD_DIM" 等值),但下方 vsl_rotary_embedding_neox 路径(第 349 行)仍然硬编码为 "NORMAL"。请确认 neox 路径是否也需要使用 pos_emb_type 以保持一致?参考 block_attn.cc 中非 spliced 路径的实现,neox 分支也传递了 pos_emb_type 变量。
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #7518 +/- ##
==========================================
Coverage ? 73.60%
==========================================
Files ? 398
Lines ? 54988
Branches ? 8616
==========================================
Hits ? 40472
Misses ? 11807
Partials ? 2709
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
* [XPU] add support for rope3d * support decoder --------- Co-authored-by: yinwei <yinwei_hust@163.com>
Motivation
add support for rope3d
Modifications
add support for rope3d
Usage or Command
None
Accuracy Tests
None
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.