Skip to content

[XPU] add support for rope3d#7518

Merged
iosmers merged 3 commits into
PaddlePaddle:developfrom
RuohengMa:rope3d
Apr 21, 2026
Merged

[XPU] add support for rope3d#7518
iosmers merged 3 commits into
PaddlePaddle:developfrom
RuohengMa:rope3d

Conversation

@RuohengMa
Copy link
Copy Markdown
Contributor

Motivation

add support for rope3d

Modifications

add support for rope3d

Usage or Command

None

Accuracy Tests

None

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot
Copy link
Copy Markdown

paddle-bot Bot commented Apr 20, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the XPU label Apr 20, 2026
PaddlePaddle-bot

This comment was marked as outdated.

@EmmonsCurse
Copy link
Copy Markdown
Collaborator

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_gpu

Copy link
Copy Markdown

@PaddlePaddle-bot PaddlePaddle-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review | 2026-04-21 10:39:16

📋 Review 摘要

PR 概述:在 XPU spliced block attention 的 encoder/decoder 路径中添加 rope3d 支持,计算 3D 位置编码偏移量并传递给 rotary embedding 函数。
变更范围custom_ops/xpu_ops/src/ops/block_attn_spliced.cc
影响面 TagXPU OP

📝 PR 规范检查

PR 描述的 Motivation 和 Modifications 过于简略,仅写了 "add support for rope3d",建议补充具体的变更内容和适用场景。

描述模板(可直接复制):

## Motivation

在 spliced block attention 路径中支持 rope3d(3D 旋转位置编码),使多模态模型(如 VL 模型)能够在 XPU encoder/decoder splice 模式下正确运行。此前 spliced 路径会在 rope_3d=true 时跳过,回退到非 splice 路径。

## Modifications

- 移除 `split_kvcache_encoder` 和 `split_kvcache_decoder` 中对 `rope_3d` 的限制条件
- 新增 `pos_emb_offset` 计算逻辑:当 rope_3d 为 true 时,根据 `real_batch * 2 * rope_max_seqlen + start_tokens` 公式计算位置偏移
- 将 rotary embedding 调用中的 `start_tokens` 替换为 `pos_emb_offset`,将硬编码的 `"NORMAL"` 替换为 `pos_emb_type` 变量

问题

级别 文件 概述
🟡 建议 block_attn_spliced.cc:277 encoder 与 decoder 中 ~40 行 pos_emb_offset 计算代码完全重复,建议抽取辅助函数
❓ 疑问 block_attn_spliced.cc:329 gptj 路径已使用 pos_emb_type,但 neox 路径仍硬编码 "NORMAL",需确认是否一致

总体评价

实现逻辑正确,rope3d 的位置偏移公式 real_batch * 2 * rope_max_seqlen + start_tokens 与注释一致,XPU/CPU 双端计算也保持同步。主要建议是消除 encoder/decoder 之间的重复代码,以及确认 neox 路径的 pos_emb_type 是否需要同步修改。

paddle::Tensor pos_emb_offset_cpu;
api::VectorParam<int32_t> pos_emb_offset;
pos_emb_offset.len = start_tokens.len;
if (rope_3d) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 建议 encoder 和 decoder 中 pos_emb_offset 的计算逻辑完全一致(约 40 行),建议抽取为独立的辅助函数(如 compute_pos_emb_offset),避免代码重复,后续维护时只需修改一处。

例如:

api::VectorParam<int32_t> compute_pos_emb_offset(
    api::Context* xpu_ctx,
    bool rope_3d,
    int batch_size,
    int64_t rope_max_seqlen,
    const api::VectorParam<int32_t>& start_tokens,
    const api::VectorParam<int32_t>& real_batch,
    const paddle::Place& place,
    paddle::Tensor& offset_xpu,
    paddle::Tensor& offset_cpu);

start_tokens,
"NORMAL",
pos_emb_offset,
pos_emb_type,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ 疑问 此处 vsl_rotary_embedding_gptj 调用已正确使用 pos_emb_type(可能为 "HALF_HEAD_DIM" 等值),但下方 vsl_rotary_embedding_neox 路径(第 349 行)仍然硬编码为 "NORMAL"。请确认 neox 路径是否也需要使用 pos_emb_type 以保持一致?参考 block_attn.cc 中非 spliced 路径的实现,neox 分支也传递了 pos_emb_type 变量。

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@5e866e3). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7518   +/-   ##
==========================================
  Coverage           ?   73.60%           
==========================================
  Files              ?      398           
  Lines              ?    54988           
  Branches           ?     8616           
==========================================
  Hits               ?    40472           
  Misses             ?    11807           
  Partials           ?     2709           
Flag Coverage Δ
GPU 73.60% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@iosmers iosmers merged commit 9d3551c into PaddlePaddle:develop Apr 21, 2026
34 of 38 checks passed
xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026
* [XPU] add support for rope3d

* support decoder

---------

Co-authored-by: yinwei <yinwei_hust@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants