[XPU] Split the block_attn operator into smaller operators #6798

Merged
iosmers merged 7 commits into PaddlePaddle:develop from RuohengMa:new_decouple on Apr 16, 2026

@RuohengMa
Contributor

Motivation

Split the fused block_attn op into smaller ("spliced") operators.

Modifications

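Split block_attn into smaller operators (split_rope and neox_cache_kv_encoder / neox_cache_kv_decoder). The spliced path is opt-in through environment variables.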

Usage or Command

Run export encoder_splice=1 to enable the spliced split_rope / neox_cache_kv_encoder operators.
Run export decoder_splice=1 to enable the spliced split_rope / neox_cache_kv_decoder operators.
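Here is a minimal sketch of setting these flags from a launch script. Based only on the wording above, it assumes that a value of 1 selects the spliced operators and that an unset variable keeps the fused block_attn path. The helper name enable_splice is hypothetical and is not part of FastDeploy.

```shell
# Hypothetical helper (not part of FastDeploy): export the splice flags
# described in this PR before starting the XPU serving process.
# Assumption: "1" selects the spliced split_rope / neox_cache_kv_* operators,
# and an unset variable leaves the fused block_attn path in place.
enable_splice() {
  case "$1" in
    encoder) export encoder_splice=1 ;;
    decoder) export decoder_splice=1 ;;
    both)    export encoder_splice=1 decoder_splice=1 ;;
    none)    unset encoder_splice decoder_splice ;;
    *)
      echo "usage: enable_splice encoder|decoder|both|none" >&2
      return 2
      ;;
  esac
}
```

For example, calling enable_splice both before the usual launch command sets both flags in the current shell. Calling enable_splice none unsets both of them.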

Accuracy Tests

None

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If you add none, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.
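The first checklist item can be checked mechanically. The sketch below is hypothetical and is not a script from the repository. It accepts a title only if the title starts with a bracketed tag from the list above, or with [Cherry-Pick] from the release-branch rule. The checklist also allows new tags whose semantics are clear, so a real check might be looser.

```shell
# Hypothetical pre-submit check (not from the FastDeploy repository):
# succeed only if the PR title starts with a known bracketed tag.
has_known_tag() {
  title=$1
  # The title must begin with "[...]".
  case "$title" in
    "["*"]"*) ;;
    *) return 1 ;;
  esac
  # Extract the first tag: "[XPU] Split ..." -> "XPU".
  first=${title#?}        # drop the leading "["
  first=${first%%]*}      # cut at the first "]"
  case "$first" in
    FDConfig|APIServer|Engine|Scheduler|"PD Disaggregation"|Executor) return 0 ;;
    "Graph Optimization"|"Speculative Decoding"|RL|Models|Quantization) return 0 ;;
    Loader|OP|KVCache|DataProcessor|BugFix|Docs|CI|Optimization) return 0 ;;
    Feature|Benchmark|Others|XPU|HPU|GCU|DCU|Iluvatar|Metax) return 0 ;;
    Cherry-Pick) return 0 ;;  # release-branch tag from the last checklist item
    *) return 1 ;;
  esac
}
```

The function checks only the first tag. A title such as [Optimization][XPU] ... passes because Optimization is on the list.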

@paddle-bot

paddle-bot bot commented Mar 12, 2026

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload a report for BASE (develop@26d6a20). Without it, the develop column below shows "?".

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6798   +/-   ##
==========================================
  Coverage           ?   74.47%           
==========================================
  Files              ?      383           
  Lines              ?    53619           
  Branches           ?     8412           
==========================================
  Hits               ?    39931           
  Misses             ?    10968           
  Partials           ?     2720           
Flag Coverage Δ
GPU 74.46% <100.00%> (?)

Flags with carried forward coverage won't be shown.
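The Coverage figure in the diff table follows from the counts. Codecov computes coverage as hits / (hits + misses + partials). Here Hits + Misses + Partials = 39931 + 10968 + 2720 = 53619, which equals Lines. The awk-based check below recomputes it. The function name coverage_pct is illustrative only.

```shell
# Illustrative helper: recompute Codecov's coverage ratio from the diff table.
# Partially covered lines count against coverage, just like misses.
coverage_pct() {
  hits=$1 misses=$2 partials=$3
  awk -v h="$hits" -v m="$misses" -v p="$partials" \
    'BEGIN { printf "%.2f\n", 100 * h / (h + m + p) }'
}

coverage_pct 39931 10968 2720   # prints 74.47, matching the Coverage row
```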

PaddlePaddle-bot

This comment was marked as outdated.

hong19860320
hong19860320 previously approved these changes Apr 10, 2026
Collaborator

@hong19860320 hong19860320 left a comment

LGTM

@iosmers
Collaborator

iosmers commented Apr 13, 2026

LGTM

@PaddlePaddle-bot PaddlePaddle-bot left a comment

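📋 Review summary

PR overview: introduces a spliced implementation of the block_attn operator for the XPU platform. Splice mode is enabled through the encoder_splice and decoder_splice environment variables.

Scope of changes: custom_ops/xpu_ops/, fastdeploy/model_executor/layers/backends/xpu/, fastdeploy/worker/, fastdeploy/spec_decode/

Affected tags: [XPU] [OP] [Speculative Decoding]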

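📝 PR convention check

The PR description mostly meets the requirements, but the checklist has the following issues:

  • Add at least a tag in PR title: the title already includes the [XPU] tag.
  • Format your code, run pre-commit: not checked. Please confirm that it was run.
  • Add unit tests: not checked, although test files were in fact added. Please add a note on unit-test coverage.
  • Provide accuracy results: no accuracy verification results were provided.
  • If submitting to release branch: the PR currently targets the develop branch.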

Title suggestion: the current title, [XPU] splice block_attn, follows the convention. Consider stating the type of optimization more explicitly:

  • [Optimization][XPU] Add spliced block_attn for better performance

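Issues

  • 🟡 Suggestion (custom_ops/xpu_ops/download_dependencies.sh:18): changing the dependency version to latest may make builds unstable.
  • 🟡 Suggestion (docs/): there is no usage documentation for the new feature (encoder_splice/decoder_splice).
  • 🟡 Suggestion (custom_ops/xpu_ops/test/test_block_attn.py): the tests contain several TODOs and commented-out cases.
  • 🟡 Suggestion (PR description): there is no performance comparison data and no accuracy verification results.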

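Overall assessment

The code is reasonably structured overall, and the approach behind the spliced implementation is clear. The following points could still be improved:

  1. Dependency management: switching from a pinned version to latest means builds made at different times can produce different results, which hurts reproducibility.
  2. Missing documentation: the new splice feature has no usage documentation, so users cannot tell how to use it correctly.
  3. Test coverage: the test code contains several TODOs and commented-out test cases (for example, mixed mode and the mtp branch). This indicates that the feature has not been fully verified.
  4. Missing data: no performance comparison data is provided, so there is no way to confirm that splice mode actually improves performance.

Consider adding documentation and test coverage before merging, or clearly marking this as an experimental feature.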
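To produce the performance comparison the review asks for, one option is to run the same workload twice: once on the default path and once with both splice flags set. The sketch below assumes that leaving the flags unset selects the fused block_attn path. The names run_with_splice and time_variant are hypothetical helpers, and the benchmark command is whatever workload you already use. Note that env -u is supported by GNU and BSD env but is not POSIX.

```shell
# Hypothetical A/B helpers for comparing the fused and spliced attention paths.
# Assumption: with encoder_splice/decoder_splice unset, the fused block_attn runs.
run_with_splice() {
  mode=$1
  shift
  case "$mode" in
    fused)   env -u encoder_splice -u decoder_splice "$@" ;;
    spliced) env encoder_splice=1 decoder_splice=1 "$@" ;;
    *)
      echo "mode must be fused or spliced" >&2
      return 2
      ;;
  esac
}

# Coarse wall-clock timing in whole seconds. For the actual comparison,
# prefer the benchmark's own throughput/latency metrics.
time_variant() {
  mode=$1
  shift
  start=$(date +%s)
  run_with_splice "$mode" "$@" || return
  end=$(date +%s)
  echo "$mode: $((end - start))s"
}
```

For example, run time_variant fused ./bench.sh and then time_variant spliced ./bench.sh, where bench.sh stands in for your own benchmark script. Run both variants on the same machine and input set.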

custom_ops/xpu_ops/download_dependencies.sh:
  version_xtdk="3.4.0.1"
  else
- version_xvllm="20260407"
+ version_xvllm="latest"

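🟡 Suggestion: changing the dependency version from a pinned version number to latest may make builds unstable. Use a specific version number, or manage the version through configuration.

Reason: the latest tag pulls different dependency versions at different times, which hurts build reproducibility and stability.

Suggested fix:

# Use an explicit version number
version_xvllm="20260415"  # or another explicit version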
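One way to follow the "use a specific version number, or manage it through configuration" suggestion is to keep a pinned default, allow an explicit override, and reject latest outright. This is a sketch, not the actual contents of download_dependencies.sh. The XVLLM_VERSION variable and the helper names are illustrative. So is the assumption that xvllm versions are 8-digit dates, as in 20260407 and 20260415.

```shell
# Sketch only: XVLLM_VERSION and these helpers are hypothetical.
DEFAULT_XVLLM_VERSION="20260415"  # pinned default, taken from the review suggestion

# Assumption: xvllm builds are identified by an 8-digit YYYYMMDD date.
is_pinned_version() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve the version to download: an explicit override wins, otherwise the
# pinned default is used. Floating tags such as "latest" are refused.
resolve_xvllm_version() {
  v=${XVLLM_VERSION:-$DEFAULT_XVLLM_VERSION}
  if ! is_pinned_version "$v"; then
    echo "refusing unpinned xvllm version: $v" >&2
    return 1
  fi
  printf '%s\n' "$v"
}
```

The download step would then set version_xvllm=$(resolve_xvllm_version) instead of a hard-coded string. Every build records exactly which version it fetched.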

@iosmers iosmers changed the title from "[XPU] splice block_attn" to "[XPU] Split the block_attn operator into smaller operators" on Apr 16, 2026
Collaborator

@yongqiangma yongqiangma left a comment

LGTM

Collaborator

@gongshaotian gongshaotian left a comment

LGTM for CUDAGraph

@iosmers iosmers merged commit de0c5e6 into PaddlePaddle:develop Apr 16, 2026
53 of 59 checks passed