
[Speculate Decoding] Fix step_idx semantics in reasoning_phase_token_constraint and speculate set_value kernels #7349

Merged
freeliuzc merged 1 commit into PaddlePaddle:develop from lonelygsh:fix-reasoning-phase-step_idx
Apr 14, 2026
Conversation

Contributor

@lonelygsh lonelygsh commented Apr 12, 2026

Motivation

Adapt to the new step_idx semantics.

  • Old semantics: step_idx counts the accumulated tokens after processing, and pre_ids_now[0] is a reserved slot (the last prompt token)
  • New semantics: step_idx counts the tokens already generated so far, and pre_ids_now[0] is the first output token

This PR updates the affected kernels and their unit tests accordingly.

Modifications

custom_ops/gpu_ops/reasoning_phase_token_constraint.cu

  • Fix the index logic in update_reasoning_status_kernel for reading the most recent 4 history tokens
  • Under the new semantics, step_idx equals the history token count, so pre_ids_now[cur_step - 1] correctly points to the most recently written token, and the cur_step >= 4 boundary check matches accordingly

tests/operators/test_reasoning_phase_token_constraint.py

  • Adjust the step_idx values used in the state-transition tests

Usage or Command

Run the unit test to verify:

python -m pytest tests/operators/test_reasoning_phase_token_constraint.py -v
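As a rough illustration of the semantics change described above, here is a minimal Python sketch of where the newest token lands in pre_ids under each convention (function and variable names are illustrative, not the real kernel API):

```python
def write_latest_token_old(pre_ids, step_idx, token):
    # Old semantics: step_idx already counts the token being written, and
    # pre_ids[0] is reserved for the last prompt token, so the newest
    # output token lands at index step_idx.
    pre_ids[step_idx] = token

def write_latest_token_new(pre_ids, step_idx, token):
    # New semantics: step_idx counts tokens generated so far and
    # pre_ids[0] is the first output token, so after incrementing
    # step_idx the newest token lands at step_idx - 1.
    pre_ids[step_idx - 1] = token

pre_ids = [-1] * 8
step_idx = 0
for token in [11, 22, 33]:
    step_idx += 1  # one token generated this round
    write_latest_token_new(pre_ids, step_idx, token)
print(pre_ids[:3])  # [11, 22, 33]: pre_ids[0] is the first output token
```

Under the old convention the same loop would have left pre_ids[0] untouched and written the tokens at indices 1..3 instead.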


paddle-bot bot commented Apr 12, 2026

Thanks for your contribution!


CLAassistant commented Apr 12, 2026

CLA assistant check
All committers have signed the CLA.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 12, 2026
@lonelygsh lonelygsh force-pushed the fix-reasoning-phase-step_idx branch from 612db12 to d2b540d Compare April 12, 2026 17:59
@lonelygsh lonelygsh changed the title [Reasoning Phase] Fix step_idx semantics in reasoning_phase_token_con… [Speculate Decoding] Fix step_idx semantics across multiple kernels Apr 12, 2026

@lonelygsh lonelygsh force-pushed the fix-reasoning-phase-step_idx branch from d2b540d to 9088ea1 Compare April 12, 2026 18:38


codecov-commenter commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@e83d458). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7349   +/-   ##
==========================================
  Coverage           ?   74.38%           
==========================================
  Files              ?      383           
  Lines              ?    53614           
  Branches           ?     8412           
==========================================
  Hits               ?    39882           
  Misses             ?    11015           
  Partials           ?     2717           
Flag Coverage Δ
GPU 74.38% <ø> (?)


@lonelygsh lonelygsh force-pushed the fix-reasoning-phase-step_idx branch from 9088ea1 to 8747dae Compare April 13, 2026 04:49

// Shadow state: initialize from target model (prefill just finished)
step_idx[tid] = target_step - 1;
// target_step counts history tokens; it is 0 right after prefill
step_idx[tid] = target_step;
Collaborator


An earlier conclusion was that MTP's step_idx aligns with the target model's after MTP's first inference step. Why does the preprocessing here assign the target's step_idx directly?

}
} else if (step_idx[tid] == 1 /*Encoder Step*/) {
pre_ids_all_now[1] = draft_token_now[0];
} else if (step_idx[tid] == 0 /*Encoder Step*/) {
Collaborator


Is judging the prefill phase with the step_idx == 0 check still correct in all scenarios? Have you considered rescheduling: when an evicted query is pulled back in, its step_idx should not start from 0.

base_model_draft_tokens_now[substep + 1] = token_this_time;
step_idx[tid] += seq_len_this_time;
pre_ids_now[step_idx[tid]] = token_this_time;
pre_ids_now[step_idx[tid] - 1] = token_this_time;
Collaborator


Are this change and all those below caused by MTP decoding from the target model's step? If so, compared with before, what is the reason decoding must start from the main model's step_idx?

@lonelygsh lonelygsh force-pushed the fix-reasoning-phase-step_idx branch from 8747dae to 64b2013 Compare April 13, 2026 09:24

@lonelygsh lonelygsh force-pushed the fix-reasoning-phase-step_idx branch from 64b2013 to 0987851 Compare April 13, 2026 14:49
@lonelygsh lonelygsh changed the title [Speculate Decoding] Fix step_idx semantics across multiple kernels [Speculate Decoding] Fix step_idx semantics in reasoning_phase_token_constraint and speculate set_value kernels Apr 13, 2026

@lonelygsh lonelygsh force-pushed the fix-reasoning-phase-step_idx branch from 0987851 to 18bd0a5 Compare April 14, 2026 03:37

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-14 11:58 CST

📋 Review Summary

PR overview: fixes the step_idx semantics adaptation in the reasoning_phase_token_constraint kernel

Changed files: custom_ops/gpu_ops/reasoning_phase_token_constraint.cu, tests/operators/test_reasoning_phase_token_constraint.py

Impact tags: [Speculative Decoding] [OP]

📝 PR Convention Check

The PR title carries a valid [Speculative Decoding] tag, and the description fills in Motivation and Modifications in detail, so it meets the convention.

Issues

Level File Summary
🔴 Bug PR overall The PR description does not match the actual changes: it mentions modifying 5 files, but only 2 were changed
🟡 Suggestion custom_ops/gpu_ops/speculate_decoding/draft_model/draft_model_set_value_by_flags.cu This file still uses the old step_idx semantics and should be updated in sync
🟡 Suggestion custom_ops/gpu_ops/speculate_decoding/speculate_set_value_by_flags_and_idx.cu This file still uses the old step_idx semantics and should be updated in sync
🟡 Suggestion tests/operators/test_draft_model_set_value_by_flags.py This test still uses the old step_idx semantics and should be updated in sync

Overall assessment

The changes to reasoning_phase_token_constraint.cu and test_reasoning_phase_token_constraint.py are correct and successfully adapt to the new step_idx semantics. However, the PR description is seriously out of sync with the actual changes: it says 5 files would be modified for the new semantics, but only 2 were. The remaining 3 files (draft_model_set_value_by_flags.cu, speculate_set_value_by_flags_and_idx.cu, test_draft_model_set_value_by_flags.py) still use the old step_idx semantics, which may make the speculate decoding kernels behave inconsistently with one another and affect system correctness.

Suggestions:

  1. Also update the remaining 3 files for the new step_idx semantics, or state that they will be handled in other PRs
  2. If the work is split across multiple PRs, describe the plan in the description to avoid problems from an incomplete semantics change

int64_t t1 = (cur_step >= 1) ? pre_ids_now[cur_step - 1] : -1;
int64_t t2 = (cur_step >= 2) ? pre_ids_now[cur_step - 2] : -1;
int64_t t3 = (cur_step >= 3) ? pre_ids_now[cur_step - 3] : -1;
int64_t t0 = (cur_step >= 1) ? pre_ids_now[cur_step - 1] : -1;


🔴 Bug fix verification

Under the new semantics pre_ids_now[0] is the first output token, and pre_ids_now[cur_step - 1] correctly points to the most recently written token (when cur_step >= 1). The change is correct.

Note: the PR description also mentions modifying draft_model_set_value_by_flags.cu and speculate_set_value_by_flags_and_idx.cu, but neither file is included in this PR; please confirm whether this is an omission.
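The guarded reads in the diff above can be sketched in Python as follows (a simplified illustration; `last_four` and the list layout are assumptions, not the real kernel code):

```python
def last_four(pre_ids, cur_step):
    # New semantics: pre_ids[cur_step - 1] is the most recent token, so the
    # k-th most recent token is pre_ids[cur_step - 1 - k]; each read is
    # guarded by cur_step >= k + 1 and falls back to -1 when the history
    # is too short, mirroring the t0..t3 reads in the kernel.
    return [pre_ids[cur_step - 1 - k] if cur_step >= k + 1 else -1
            for k in range(4)]

pre_ids = [10, 20, 30, 40, -1, -1]
print(last_four(pre_ids, 4))  # [40, 30, 20, 10]
print(last_four(pre_ids, 2))  # [20, 10, -1, -1]
```

With the old `pre_ids_now[cur_step - k]` form, every read would be shifted one slot too high, reading the reserved slot instead of the newest token.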

token_ids_all[0, 1] = self.line_break_id
token_ids_all[0, 2] = self.think_end_id
# batch 0: pattern \n <think_end> \n \n at pre_ids_now[0..3]
token_ids_all[0, 0] = self.line_break_id


🟡 Suggestion

The test-case change is consistent with the new semantics: with step_idx=4, pre_ids_now[0..3] holds the 4 history tokens, verifying that the kernel reads the most recent 4 tokens correctly.

Suggestion: the test_draft_model_set_value_by_flags.py test mentioned in the PR description should also be updated in sync; please confirm whether it needs changes.

Contributor

Copilot AI left a comment


Pull request overview

This PR adapts to the new step_idx semantics in speculate decoding (step_idx now counts the tokens already generated, and pre_ids_now[0] is the first output token), fixing the index computation the reasoning-phase state machine uses when reading recent tokens, and updating the corresponding unit tests.

Changes:

  • In update_reasoning_status_kernel, change the reads of the most recent 4 tokens from pre_ids_now[cur_step - k] to pre_ids_now[cur_step - 1 - k], and raise the state-transition threshold from cur_step >= 3 to cur_step >= 4
  • Update the token patterns and comments constructed in tests/operators/test_reasoning_phase_token_constraint.py so they match the new step_idx/pre_ids_now semantics.
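The x = 1 -> x = 2 transition summarized above can be sketched as a small predicate (a sketch with assumed token IDs; `LINE_BREAK`, `THINK_END`, and `should_transition` are illustrative names, not part of the real kernel):

```python
LINE_BREAK = 1  # assumed ID for "\n"
THINK_END = 2   # assumed ID for "</think>"
BOUNDARY = [LINE_BREAK, THINK_END, LINE_BREAK, LINE_BREAK]  # "\n</think>\n\n"

def should_transition(pre_ids, step_idx):
    # New semantics: at least 4 history tokens are required (step_idx >= 4,
    # not >= 3), and the last four tokens, ending at pre_ids[step_idx - 1],
    # must match the boundary pattern exactly.
    if step_idx < 4:
        return False
    return pre_ids[step_idx - 4:step_idx] == BOUNDARY

print(should_transition([1, 2, 1, 1], 4))     # True: pattern at pre_ids[0..3]
print(should_transition([9, 1, 2, 1, 1], 5))  # True: pattern ends the history
print(should_transition([1, 2, 1], 3))        # False: too few history tokens
```

This matches the test setup in the diff, where the pattern is written at pre_ids_now[0..3] and checked with step_idx=4.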

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
custom_ops/gpu_ops/reasoning_phase_token_constraint.cu Fix the indices and boundary condition the reasoning state machine uses to read recent tokens, matching the new step_idx semantics
tests/operators/test_reasoning_phase_token_constraint.py Adjust test-data construction and comments to keep the state-transition logic covered under the new semantics

Comment on lines 34 to 44
// x = 1 : Generating "\n</think>\n\n"
// - Model is emitting the explicit boundary pattern
// - In non-MTP mode, accept_num is implicitly 1
// and does not need to be manually set
// - In MTP mode, accept_num must be 1 in verify kernel
//
// Transition condition (x = 1 -> x = 2):
// - step_idx >= 3
// - step_idx >= 4
// - pre_ids[-4:] exactly match:
// "\n</think>\n\n"
//
token_ids_all[1, 2] = self.think_end_id

self.token_ids_all = paddle.to_tensor(token_ids_all, dtype="int64")
self.prompt_lens = paddle.zeros([self.bs, 1], dtype="int64")
@freeliuzc freeliuzc merged commit e0a1653 into PaddlePaddle:develop Apr 14, 2026
40 of 42 checks passed
lonelygsh added a commit to lonelygsh/FastDeploy that referenced this pull request Apr 16, 2026
…el (PaddlePaddle#7349)

Co-authored-by: guanshihui] <guanshihui@baidu.com>
freeliuzc pushed a commit that referenced this pull request Apr 16, 2026
…7402, #7445 to release/online/20260415 (#7447)

* [Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7166)

- speculate_limit_thinking_content_length: update current_base_step to
  step_idx+1 (step_idx now records history count before current round);
  remove incorrect step_idx decrement on accept_num truncation; mark
  step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
  step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
  formula (remove stale -accept_num offset); use <= condition so accept_idx
  maps directly to the accepted token that ends the stop sequence; fix
  accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.

* [Speculate Decoding] Fix bug of reasoning_phase_token_constraint kernel (#7349)

Co-authored-by: guanshihui] <guanshihui@baidu.com>

* [Speculate Decoding] Fix reasoning_phase_token_constraint call args in SpeculativeSampler (#7402)

* [Interrupt reasoning] Add interrupt_requests control command support

---------

Co-authored-by: guanshihui] <guanshihui@baidu.com>

Labels

contributor External developers


7 participants