[XPU] add verify draft tokens by cmcamdy · Pull Request #6947 · PaddlePaddle/FastDeploy

cmcamdy · 2026-03-20T04:17:19Z

add verify draft tokens
#6685

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-03-20T04:17:32Z

Thanks for your contribution!

codecov-commenter · 2026-03-20T05:58:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@ea998dd). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6947   +/-   ##
==========================================
  Coverage           ?   73.51%           
==========================================
  Files              ?      383           
  Lines              ?    53614           
  Branches           ?     8412           
==========================================
  Hits               ?    39417           
  Misses             ?    11516           
  Partials           ?     2681

| Flag | Coverage Δ |
|------|------------|
| GPU | `73.51% <ø> (?)` |

hong19860320

LGTM

Copilot

Pull request overview

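This PR adds a plugin/operator implementation of verify_draft_tokens for XPU (Kunlun3), exposes it to Python via pybind, and adds matching unit tests. The operator verifies draft tokens in speculative decoding and produces the Phase 2 sampled output.

Changes:

- Adds the XPU3 kernel verify_draft_tokens.xpu, which implements the Phase 1 verify and Phase 2 output logic
- Adds the plugin wrapper, the Paddle extension operator, and the pybind export, which together provide the external verify_draft_tokens entry point
- Adds a fairly thorough Python unit test covering different verify_strategy and verify_window values, EOS/max_dec_len handling, skip conditions, and more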

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Show a summary per file

| File | Description |
|------|-------------|
| custom_ops/xpu_ops/test/test_verify_draft_tokens.py | Adds the Python unit test and reference implementation for `verify_draft_tokens` |
| custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/verify_draft_tokens.cpp | Adds the verify_draft_tokens plugin wrapper (CPU/XPU3 wrapper plus argument validation) |
| custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/verify_draft_tokens.xpu | Adds the Kunlun3 XPU kernel implementation |
| custom_ops/xpu_ops/src/plugin/include/xpu/plugin.h | Adds the `verify_draft_tokens` plugin function declaration |
| custom_ops/xpu_ops/src/ops/pybind/pybind.cc | Adds the pybind export for `verify_draft_tokens` |
| custom_ops/xpu_ops/src/ops/mtp/verify_draft_token.cc | Adds the host-side implementation and argument validation of the Paddle extension operator |
| custom_ops/xpu_ops/src/ops/mtp/speculate_verify.cc | Adjusts the random number generation logic in speculate_verify (persistent seed/offset) |
| custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/speculate_verify.cpp | Cleans up duplicated typedef declarations |

PaddlePaddle-bot

🤖 AI Code Review | 2026-04-14

📋 Review Summary

PR overview: Adds the verify_draft_tokens operator for XPU hardware. It supports three verification strategies (TOPP, GREEDY, TARGET_MATCH) and is used to verify draft tokens in speculative decoding.

Scope of changes: custom_ops/xpu_ops/ (7 files added, 1 file modified)

- Added verify_draft_token.cc (host-side implementation)
- Added verify_draft_tokens.xpu (XPU kernel)
- Added verify_draft_tokens.cpp (wrapper)
- Added test_verify_draft_tokens.py (unit tests)
- Modified speculate_verify.cc (adds persistent seed/offset)

Affected tags: [XPU] [OP]
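The three strategies named in the summary can be illustrated with a small per-token acceptance check. This is a minimal sketch, not the kernel's code: the names `VerifyStrategy` and `accepts` are hypothetical, and it assumes that candidates arrive sorted by descending score, that scores are probabilities, and that the first candidate is the argmax token.

```python
from enum import IntEnum


class VerifyStrategy(IntEnum):
    # Numeric values follow the 0/1/2 mapping quoted in the review.
    TOPP = 0
    GREEDY = 1
    TARGET_MATCH = 2


def accepts(strategy, draft_token, candidates, scores, topp, target_token):
    """Return True if a single draft token passes the chosen strategy.

    Hypothetical helper for illustration only. Assumes `candidates` is
    sorted by descending probability and `scores` holds those probabilities.
    """
    if strategy == VerifyStrategy.TOPP:
        # Walk the smallest prefix whose cumulative probability reaches topp
        # and accept the draft token if it appears inside that prefix.
        cumulative = 0.0
        for cand, score in zip(candidates, scores):
            if cand == draft_token:
                return True
            cumulative += score
            if cumulative >= topp:
                break
        return False
    if strategy == VerifyStrategy.GREEDY:
        # Assumption: the first candidate is the argmax token.
        return draft_token == candidates[0]
    if strategy == VerifyStrategy.TARGET_MATCH:
        # Accept only if the draft equals the token sampled by the target model.
        return draft_token == target_token
    raise ValueError(f"unknown strategy: {strategy}")
```

With candidates `[7, 3, 9]`, probabilities `[0.6, 0.3, 0.1]`, and `topp=0.8`, the top-p prefix in this sketch is `{7, 3}`, so a draft token of 3 is accepted under TOPP but rejected under GREEDY.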

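📝 PR Convention Check

The PR description does not follow the official template and is missing required sections.

Title suggestion (the current title already has a valid tag):

Current title: [XPU] add verify draft tokens ✓

Description suggestion (please add the following):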

## Motivation

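Add a draft token verification operator for XPU hardware to support speculative decoding. The XPU version is implemented with reference to PR #6685.

## Modifications

1. Added `verify_draft_token.cc`: host-side operator implementation supporting three verification strategies
   - TOPP (0): checks whether the draft token is in the top-p candidate set
   - GREEDY (1): checks whether the draft token equals the argmax token
   - TARGET_MATCH (2): checks whether the draft token equals the token sampled by the target model

2. Added `verify_draft_tokens.xpu`: XPU3 kernel implementation, including the VerifyContext struct and the verification logic

3. Added `verify_draft_tokens.cpp`: CPU/XPU3 wrapper implementation

4. Added `test_verify_draft_tokens.py`: unit tests (including a Python reference implementation)

5. Modified `speculate_verify.cc`: uses a persistent seed/offset instead of a fixed random seed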

## Usage or Command

```python
from fastdeploy.model_executor.ops.xpu import verify_draft_tokens

verify_draft_tokens(
    step_output_ids,      # output token ids
    step_output_len,      # output lengths
    step_input_ids,       # input draft tokens
    target_tokens,        # target model output (GREEDY/TARGET_MATCH)
    candidate_ids,        # candidate token ids (TOPP/GREEDY)
    candidate_scores,     # candidate scores (TOPP)
    candidate_lens,       # candidate lengths (TOPP/GREEDY)
    topp,                 # top-p threshold
    stop_flags,           # stop flags
    seq_lens_encoder,     # encoder sequence lengths
    seq_lens_this_time,   # sequence lengths for the current step
    end_tokens,           # EOS tokens
    is_block_step,        # whether this is a block step
    cu_seqlens_q_output,  # cumulative sequence lengths
    reasoning_status,     # reasoning status
    max_dec_len,          # maximum decode length
    step_idx,             # current step index
    max_seq_len=4096,
    verify_window=4,
    verify_strategy=0,    # 0=TOPP, 1=GREEDY, 2=TARGET_MATCH
    reject_all=False,
    accept_all=False
)
```

## Accuracy Tests

The unit test test_verify_draft_tokens.py has been added. It compares the XPU implementation against a Python reference implementation.

## Checklist

- Add at least a tag in the PR title.
- Format your code, run pre-commit before commit. (CI PASSED)
- Add unit tests. Added test_verify_draft_tokens.py with reference implementation.
- Provide accuracy results. (to be added)
- If the current PR is submitting to the release branch... (N/A)


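### Issues

| Level | File | Summary |
|-------|------|---------|
| 🟡 Suggestion | - | The PR description is incomplete: the Motivation/Modifications/Usage/Accuracy Tests sections are missing |

### Overall Assessment

The implementation logic is correct and comes with a complete Python reference implementation and unit tests. The VerifyContext struct is implemented consistently in the XPU kernel and the wrapper, and the EOS/max_dec_len detection in emit_token/emit_final_token is correct. Using a persistent seed/offset keeps random number quality aligned with the GPU version. The main issue is that the PR description does not follow the template; please fill in the Motivation/Modifications/Usage/Accuracy Tests content.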
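To make the Phase 1 verify / Phase 2 output flow and the EOS/max_dec_len checks mentioned in the review more concrete, here is a rough single-sequence sketch. It is not taken from the PR: the function `verify_sequence` and its arguments are hypothetical, it uses a plain equality match instead of the operator's configurable strategies, and it assumes the target model supplies one more token than the number of drafts.

```python
def verify_sequence(draft_tokens, target_tokens, eos_tokens, step_idx, max_dec_len):
    """Accept the matching draft prefix, then emit one token from the target.

    Hypothetical illustration. `target_tokens` must hold
    len(draft_tokens) + 1 entries. Returns (emitted_tokens, stopped).
    """
    emitted = []

    # Phase 1: accept draft tokens until the first mismatch.
    for position, draft in enumerate(draft_tokens):
        if draft != target_tokens[position]:
            break
        emitted.append(draft)
        step_idx += 1
        # Stop early if an accepted token ends the sequence.
        if draft in eos_tokens or step_idx >= max_dec_len:
            return emitted, True

    # Phase 2: emit the target token at the first rejected position,
    # or the extra target token when every draft was accepted.
    final_token = target_tokens[len(emitted)]
    emitted.append(final_token)
    step_idx += 1
    stopped = final_token in eos_tokens or step_idx >= max_dec_len
    return emitted, stopped
```

For example, drafts `[1, 2, 3]` checked against target tokens `[1, 2, 9, 4]` yield `[1, 2, 9]` in this sketch: two accepted drafts followed by the target's replacement for the rejected third draft.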

* [XPU] cherry-pick PR-6947
* [XPU] use unified_update_model_status.
* refactor xpu_model_runner.
* refactor sampler.
* fix codestyle.
* Fix XPU speculative decoding: rename output tensors to cu_seqlens_q_output/batch_id_per_token_output, correct WRAPPER_CHECK_PTR types, and fix dynamic gather shape in verify_draft_tokens path.
* fix codestyle.
* replace output_padding_offset with is_speculative flag in gather_next_token.
* rename hiddden_states.
* unify cu_seqlens_q_output and batch_id_per_token_output init.

Co-authored-by: cmcamdy <1027740945@qq.com>

[XPU] add verify draft tokens

ea5ae27

cmcamdy temporarily deployed to Metax_ci March 20, 2026 04:17 — with GitHub Actions Inactive

fix test

ffcec19

cmcamdy had a problem deploying to Metax_ci March 20, 2026 06:10 — with GitHub Actions Error

fix code style

e94441a

cmcamdy temporarily deployed to Metax_ci March 20, 2026 06:12 — with GitHub Actions Inactive

mayang002 suggested changes Apr 8, 2026

Comment thread custom_ops/xpu_ops/src/ops/mtp/verify_draft_token.cc Outdated

use sync cpy

025fa3f

cmcamdy had a problem deploying to Metax_ci April 8, 2026 11:56 — with GitHub Actions Error

fix code style

e929f07

cmcamdy had a problem deploying to Metax_ci April 8, 2026 12:39 — with GitHub Actions Failure

fix kernel check

f6d5b52

cmcamdy temporarily deployed to Metax_ci April 13, 2026 03:17 — with GitHub Actions Inactive

fix ramdom seed

c595cf0

cmcamdy had a problem deploying to Metax_ci April 13, 2026 05:10 — with GitHub Actions Error

fix test

f2b1192

cmcamdy had a problem deploying to Metax_ci April 13, 2026 05:11 — with GitHub Actions Error

fix check

97e960b

cmcamdy temporarily deployed to Metax_ci April 13, 2026 05:17 — with GitHub Actions Inactive

hong19860320 previously approved these changes Apr 13, 2026

fix eos set

662101d

cmcamdy dismissed hong19860320’s stale review via 662101d April 13, 2026 07:55

cmcamdy had a problem deploying to Metax_ci April 13, 2026 07:55 — with GitHub Actions Failure

Jiang-Jia-Jun requested a review from Copilot April 13, 2026 11:13

Copilot started reviewing on behalf of Jiang-Jia-Jun April 13, 2026 11:14 View session

Copilot AI reviewed Apr 13, 2026

fix verify

59d7059

cmcamdy had a problem deploying to Metax_ci April 14, 2026 03:26 — with GitHub Actions Failure

fix verify

d7fccfe

cmcamdy had a problem deploying to Metax_ci April 14, 2026 03:29 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Apr 14, 2026

Jiang-Jia-Jun merged commit 13b9fe7 into PaddlePaddle:develop Apr 15, 2026
34 of 38 checks passed

7 participants
