fix MTP bugs in TP and overlap #7172
Thanks for your contribution!
    model_output.mp_rank,
    save_each_rank,
)
share_inputs["last_preempted_idx"][:] = 0
We can't return early here; otherwise this line is never reached, and rescheduling may break.
Then the only option is to return just before save_output?
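One way to read this exchange, as a minimal sketch rather than the PR's actual code: guard only the send, so the reset at the end of the function is reached on every rank. `ModelOutput`, `save_output_sketch`, the `sent` list, and the plain-list `share_inputs` are hypothetical stand-ins for illustration, not FastDeploy APIs.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    # Hypothetical stand-in for the real model output; only mp_rank matters here.
    mp_rank: int


def save_output_sketch(model_output, share_inputs, sent):
    """Guard only the send so the cleanup below runs on every TP rank."""
    if model_output.mp_rank == 0:
        # Stand-in for pushing sampling results to the message queue.
        sent.append("sampling-output")
    # Reached on all ranks. A plain list stands in for the real tensor,
    # so the slice assignment needs an iterable of matching length.
    idx = share_inputs["last_preempted_idx"]
    idx[:] = [0] * len(idx)
```

With this shape, rank 0 is the only sender, and the preemption markers are cleared regardless of rank.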
fastdeploy-bot
left a comment
🤖 AI Code Review |
2026-04-03 11:07 CST
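📋 Review Summary
PR overview: fixes bugs in MTP (Multi-Token Prediction) under TP (Tensor Parallelism) and overlap scheduling
Scope of changes: model_executor/pre_and_post_process.py, worker/gpu_model_runner.py
Impact tags: [Speculative Decoding] [Engine]
📝 PR Convention Check
The PR title is missing an official tag, and the Motivation/Modifications sections of the description are not filled in.
Suggested title (can be copied as-is):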
[BugFix][Speculative Decoding] fix MTP bugs in TP and overlap
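Suggested description (can be copied as-is):
## Motivation
Fix two MTP issues that occur with TP > 1 and overlap scheduling:
1. With TP, processes other than rank 0 send duplicate sampling outputs to the message queue
2. Under overlap scheduling, the logic that predicts the next batch's token count is wrong
## Modifications
1. `save_output_specualate`: add an mp_rank check that skips processes other than rank 0
2. `_predict_next_launch_token_num`: correct the token-count prediction formula to use batch size multiplied by tokens per step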
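The second modification can be sketched as follows. This is an illustration of the "batch size multiplied by tokens per step" formula only, not the body of `_predict_next_launch_token_num`. The function name, its parameters, and the assumption that each request launches its draft tokens plus one target token per step are hypothetical.

```python
def predict_next_launch_token_num(batch_size: int, num_speculative_tokens: int) -> int:
    """Hypothetical sketch: tokens launched next step = batch size * tokens per step."""
    if batch_size < 0 or num_speculative_tokens < 0:
        raise ValueError("batch_size and num_speculative_tokens must be non-negative")
    # Assumption: each request contributes its draft tokens plus one target token.
    tokens_per_step = num_speculative_tokens + 1
    return batch_size * tokens_per_step
```

Under these assumptions, a batch of 4 requests with 1 speculative token each would be predicted to launch 8 tokens in the next step.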
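Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | pre_and_post_process.py:534 | Early return leaves last_preempted_idx uncleared |
Overall assessment
The fix goes in the right direction, but the early return in save_output_specualate skips the cleanup of last_preempted_idx, which can leave stale state when TP > 1. This needs to be fixed.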
# NOTE(yaohuicong): Skip non-zero TP ranks — they share identical sampling
# outputs, so only rank 0 needs to send results to the message queue.
if model_output.mp_rank > 0:
    return
🔴 Bug: the early return skips the cleanup share_inputs["last_preempted_idx"][:] = 0 on line 592.
With TP > 1, processes other than rank 0 never clear last_preempted_idx, so later batches may read stale data.
Suggested fix: move the cleanup to the top of the function, or run it before returning:
if model_output.mp_rank > 0:
    share_inputs["last_preempted_idx"][:] = 0
    return
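To make the difference concrete, here is a minimal sketch that contrasts the reviewed early return with the suggested fix. All names (`clear_preempted`, `save_output_early_return`, `save_output_clear_then_return`, the `queue` list) are hypothetical, and a plain list stands in for the real `last_preempted_idx` tensor.

```python
def clear_preempted(share_inputs):
    """Zero the preemption markers in place (list stand-in for the real tensor)."""
    idx = share_inputs["last_preempted_idx"]
    idx[:] = [0] * len(idx)


def save_output_early_return(mp_rank, share_inputs, queue):
    """Shape of the reviewed change: non-zero ranks return before the cleanup."""
    if mp_rank > 0:
        return
    queue.append("sampling-output")  # stand-in for the message-queue send
    clear_preempted(share_inputs)


def save_output_clear_then_return(mp_rank, share_inputs, queue):
    """Shape of the suggested fix: clean up first, then return on non-zero ranks."""
    if mp_rank > 0:
        clear_preempted(share_inputs)
        return
    queue.append("sampling-output")  # stand-in for the message-queue send
    clear_preempted(share_inputs)
```

Calling the first variant with `mp_rank=1` leaves the markers untouched, while the second clears them and still sends nothing from the non-zero rank.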
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7172 +/- ##
==========================================
Coverage ? 74.19%
==========================================
Files ? 376
Lines ? 53297
Branches ? 8525
==========================================
Hits ? 39545
Misses ? 11052
Partials ? 2700
/re-run ci_xpu
/re-run approval
Motivation
Fix two MTP issues that occur with TP > 1 and overlap scheduling:
Modifications
save_output_specualate: add an mp_rank check that skips processes other than rank 0
_predict_next_launch_token_num: correct the token-count prediction formula to use batch size multiplied by tokens per step
Usage or Command
Accuracy Tests
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
pre-commit before commit.
release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.