@freeliuzc
Collaborator

@freeliuzc freeliuzc commented Aug 28, 2025

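  • Support normal MTP inference under scheduler V1
  • Details:
    - Prefetch 2*MaxDraftToken + 2 positions
    - Move kvcacheSchedule to after the proposer run
    - Adapt several kernels, and add speculate_schedule_cache to manage speculative-decoding blocks
    - Modify draft_model_preprocess to support both V0 and V1, manage all MTP block state, and optimize the kernel logic
    - Modify recover_decode_task to support both speculative and non-speculative decoding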
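
As a rough illustration of the prefetch item above, the arithmetic for how many KV-cache blocks a request would need can be sketched as follows. This is not taken from the PR's code: the function names, the `block_size` parameter, and the assumption that the prefetch window is simply added to the current sequence length are all hypothetical. Only the window size `2*MaxDraftToken + 2` comes from the description.

```python
def blocks_needed(seq_len: int, max_draft_tokens: int, block_size: int) -> int:
    """Blocks required to cover the sequence plus the prefetch window (hypothetical helper)."""
    # Window size quoted in the PR description: 2*MaxDraftToken + 2 positions.
    prefetch = 2 * max_draft_tokens + 2
    total_positions = seq_len + prefetch
    # Ceiling division without floating point.
    return (total_positions + block_size - 1) // block_size


def extra_blocks(seq_len: int, allocated_blocks: int,
                 max_draft_tokens: int, block_size: int) -> int:
    """Additional blocks to allocate, assuming `allocated_blocks` are already held."""
    needed = blocks_needed(seq_len, max_draft_tokens, block_size)
    return max(0, needed - allocated_blocks)
```

For example, with an assumed block size of 64 and one draft token, a request at position 62 that holds one block would need one more block, because 62 + 4 positions no longer fit in 64.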

@paddle-bot

paddle-bot bot commented Aug 28, 2025

Thanks for your contribution!

self.model_inputs["block_tables"][idx : idx + 1, :encoder_block_num] = np.array(
    request.block_tables, dtype="int32"
)
# if self.model_inputs["is_block_step"][idx]: # has tasks to continue to decode
Collaborator

Do commented-out lines like this need to be kept?
The same question applies to the ones below.

Collaborator Author

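The place where I kept the commented-out code differs slightly from Chen Jian's (陈坚) insert in the Target Model, but I'm not sure whether that is related to PD disaggregation (not yet verified) or to EP, so I'd like to keep it for now and delete it once the other features have been verified.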

Collaborator

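This has nothing to do with PD disaggregation; it is just the ordinary check for whether the worker needs to run the next step. (It guards against empty input, and also against having input but not stepping normally.)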

@codecov-commenter

codecov-commenter commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@ccd52b5).

Additional details and impacted files
@@            Coverage Diff            @@
##             develop   #3695   +/-   ##
=========================================
  Coverage           ?   8.42%           
=========================================
  Files              ?       6           
  Lines              ?      95           
  Branches           ?       8           
=========================================
  Hits               ?       8           
  Misses             ?      87           
  Partials           ?       0           
Flag Coverage Δ
diff 8.42% <ø> (?)

rainyfly previously approved these changes Sep 1, 2025
DDDivano previously approved these changes Sep 2, 2025
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 88d44a2 into PaddlePaddle:develop Sep 4, 2025
22 of 28 checks passed