[XPU] Split the block_attn operator into smaller operators by RuohengMa · Pull Request #6798 · PaddlePaddle/FastDeploy

RuohengMa · 2026-03-12T03:25:39Z

Motivation

Split the fused block_attn op into smaller spliced operators.

Modifications

Add a spliced implementation of block_attn for XPU.

Usage or Command

Run `export encoder_splice=1` to enable the spliced split_rope/neox_cache_kv_encoder path.
Run `export decoder_splice=1` to enable the spliced split_rope/neox_cache_kv_decoder path.
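
As a rough illustration of how such flags are typically consumed, the sketch below reads the two environment variables and picks a path per stage. The helper names `splice_enabled` and `select_attention_path`, the labels "spliced" and "fused", and the rule that only the value "1" enables the flag are assumptions made for this example; the actual lookup in FastDeploy is not shown in this description and may differ.

```python
import os


def splice_enabled(stage, environ=None):
    """Return True when the `<stage>_splice` variable is set to "1".

    Hypothetical helper: the real code may parse the flag differently.
    """
    env = os.environ if environ is None else environ
    return env.get(f"{stage}_splice", "0") == "1"


def select_attention_path(environ=None):
    """Map each stage to "spliced" or "fused" based on the env flags."""
    return {
        stage: "spliced" if splice_enabled(stage, environ) else "fused"
        for stage in ("encoder", "decoder")
    }


if __name__ == "__main__":
    # Reports the path each stage would take under the current environment.
    print(select_attention_path())
```

Passing a plain dict as `environ` makes it possible to check the selection logic without touching the process environment.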

Accuracy Tests

None

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-03-12T03:25:46Z

Thanks for your contribution!

codecov-commenter · 2026-03-12T08:01:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@26d6a20).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6798   +/-   ##
==========================================
  Coverage           ?   74.47%           
==========================================
  Files              ?      383           
  Lines              ?    53619           
  Branches           ?     8412           
==========================================
  Hits               ?    39931           
  Misses             ?    10968           
  Partials           ?     2720

| Flag | Coverage Δ |
| --- | --- |
| GPU | `74.46% <100.00%> (?)` |

hong19860320

LGTM

iosmers · 2026-04-13T08:39:23Z

LGTM

PaddlePaddle-bot

📋 Review Summary

PR overview: Introduces a spliced block_attn operator for the XPU platform. Splice mode can be enabled through the environment variables encoder_splice and decoder_splice.

Scope of changes: custom_ops/xpu_ops/, fastdeploy/model_executor/layers/backends/xpu/, fastdeploy/worker/, fastdeploy/spec_decode/

Impact tags: [XPU] [OP] [Speculative Decoding]

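📝 PR Convention Check

The PR description largely meets the requirements, but the Checklist has the following issues:

| Item | Status | Notes |
| --- | --- | --- |
| Add at least a tag in PR title | ✅ | Already includes the `[XPU]` tag |
| Format your code, run pre-commit | ❓ | Not checked; please confirm whether it has been run |
| Add unit tests | ❌ | Not checked, although a test file was actually added; please add a note on unit test coverage |
| Provide accuracy results | ❌ | No accuracy verification results provided |
| If submitting to release branch | ✅ | Currently submitted to the develop branch |

Title suggestion: the current title [XPU] splice block_attn follows the convention, but consider describing the type of optimization more explicitly: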

[Optimization][XPU] Add spliced block_attn for better performance

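Issues

| Severity | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | `custom_ops/xpu_ops/download_dependencies.sh:18` | Changing the dependency version to `latest` may make builds unstable |
| 🟡 Suggestion | `docs/` | Missing usage documentation for the new feature (encoder_splice/decoder_splice) |
| 🟡 Suggestion | `custom_ops/xpu_ops/test/test_block_attn.py` | The tests contain several TODOs and commented-out cases |
| 🟡 Suggestion | PR description | Missing performance comparison data and accuracy verification results |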

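Overall Assessment

The implementation is reasonably structured overall, and the splice approach is clear. However, the following points need improvement:

- Dependency management: changing the pinned version to latest may cause builds run at different times to produce different results, which hurts reproducibility.
- Missing documentation: the new splice feature has no usage documentation, so users cannot tell how to use it correctly.
- Test coverage: the test code contains several TODOs and commented-out test cases (such as mixed mode and the mtp branch), which indicates the feature has not been fully verified.
- Missing data: no performance comparison data is provided, so it cannot be confirmed that splice mode actually improves performance.

It is recommended to add documentation and test coverage before merging, or to ship this as an experimental feature with a clear label.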

PaddlePaddle-bot · 2026-04-15T04:22:16Z

    version_xtdk="3.4.0.1"
 else
-    version_xvllm="20260407"
+    version_xvllm="latest"


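🟡 Suggestion: Changing the dependency version from a pinned version number to latest may make builds unstable. Use a specific version number or manage the version through configuration.

Reason: using the latest tag means different dependency versions can be pulled at different times, which affects build reproducibility and stability.

Suggested fix:

    # Use an explicit version number
    version_xvllm="20260415"  # or another explicit version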
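
One way to catch a floating version before it reaches a build is a small check over the script text. The sketch below is an illustration only: the function name `find_floating_versions`, the regular expression, and the rule that `latest` or an empty string counts as unpinned are assumptions made here, not part of this PR or of `download_dependencies.sh`.

```python
import re

# Matches simple shell assignments such as: version_xvllm="20260407"
ASSIGNMENT = re.compile(r'^\s*(version_\w+)="([^"]*)"', re.MULTILINE)

# Values treated as "not pinned" in this sketch (an assumption).
FLOATING = {"latest", ""}


def find_floating_versions(script_text):
    """Return the names of version variables whose value is not pinned."""
    return [
        name
        for name, value in ASSIGNMENT.findall(script_text)
        if value.strip().lower() in FLOATING
    ]


if __name__ == "__main__":
    sample = '''
        version_xtdk="3.4.0.1"
     else
        version_xvllm="latest"
    '''
    # Lists every version_* variable in the sample that is left floating.
    print(find_floating_versions(sample))
```

A check like this could run in pre-commit or CI and fail when the returned list is non-empty.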

yongqiangma

LGTM

gongshaotian

LGTM for CUDAGraph

RuohengMa temporarily deployed to Metax_ci March 12, 2026 03:25 — with GitHub Actions Inactive

paddle-bot bot added the XPU label Mar 12, 2026

RuohengMa had a problem deploying to Metax_ci March 12, 2026 05:58 — with GitHub Actions Failure

RuohengMa had a problem deploying to Metax_ci March 17, 2026 08:15 — with GitHub Actions Error

RuohengMa temporarily deployed to Metax_ci March 17, 2026 09:19 — with GitHub Actions Inactive

RuohengMa temporarily deployed to Metax_ci March 18, 2026 06:11 — with GitHub Actions Inactive

RuohengMa temporarily deployed to Metax_ci March 18, 2026 07:42 — with GitHub Actions Inactive

RuohengMa temporarily deployed to Metax_ci March 19, 2026 02:28 — with GitHub Actions Inactive

RuohengMa force-pushed the new_decouple branch from ff46945 to 3395b26 Compare March 30, 2026 11:00

RuohengMa temporarily deployed to Metax_ci March 30, 2026 11:00 — with GitHub Actions Inactive

RuohengMa had a problem deploying to Metax_ci April 3, 2026 08:13 — with GitHub Actions Failure

RuohengMa had a problem deploying to Metax_ci April 3, 2026 09:05 — with GitHub Actions Failure

RuohengMa had a problem deploying to Metax_ci April 3, 2026 09:10 — with GitHub Actions Failure

mayang002 approved these changes Apr 3, 2026

RuohengMa had a problem deploying to Metax_ci April 7, 2026 03:03 — with GitHub Actions Failure

RuohengMa had a problem deploying to Metax_ci April 8, 2026 10:46 — with GitHub Actions Failure

RuohengMa had a problem deploying to Metax_ci April 9, 2026 03:25 — with GitHub Actions Failure

RuohengMa had a problem deploying to Metax_ci April 9, 2026 06:57 — with GitHub Actions Failure

RuohengMa temporarily deployed to Metax_ci April 9, 2026 09:42 — with GitHub Actions Inactive

RuohengMa temporarily deployed to Metax_ci April 10, 2026 09:11 — with GitHub Actions Inactive

hong19860320 previously approved these changes Apr 10, 2026

RuohengMa added 2 commits April 13, 2026 02:43

spliced block_attn

b90018a

adapt to latest vllm

17cd51f

RuohengMa dismissed stale reviews from hong19860320 and cmcamdy via 17cd51f April 13, 2026 03:09

RuohengMa force-pushed the new_decouple branch from c345c1c to 17cd51f Compare April 13, 2026 03:09

RuohengMa had a problem deploying to Metax_ci April 13, 2026 03:09 — with GitHub Actions Failure

fix unit tests

58591b8

RuohengMa temporarily deployed to Metax_ci April 13, 2026 06:13 — with GitHub Actions Inactive

delete mtp+cudagraph 4 cards test

b3977bd

RuohengMa had a problem deploying to Metax_ci April 14, 2026 02:32 — with GitHub Actions Failure

fix vl model

5f25304

RuohengMa had a problem deploying to Metax_ci April 14, 2026 03:31 — with GitHub Actions Failure

fix mtp

f16623d

RuohengMa temporarily deployed to Metax_ci April 14, 2026 11:17 — with GitHub Actions Inactive

fix slot mapping

b734101

RuohengMa temporarily deployed to Metax_ci April 15, 2026 03:42 — with GitHub Actions Inactive

PaddlePaddle-bot reviewed Apr 15, 2026

iosmers changed the title ~~[XPU] splice block_attn~~ [XPU] Split the block_attn operator into smaller operators Apr 16, 2026

yongqiangma approved these changes Apr 16, 2026

qingqing01 approved these changes Apr 16, 2026

gongshaotian approved these changes Apr 16, 2026

freeliuzc approved these changes Apr 16, 2026

iosmers merged commit de0c5e6 into PaddlePaddle:develop Apr 16, 2026
53 of 59 checks passed

12 participants
