[Feature][MTP] Support MTP in v1_scheduler mode #3695
Thanks for your contribution!
    self.model_inputs["block_tables"][idx : idx + 1, :encoder_block_num] = np.array(
        request.block_tables, dtype="int32"
    )
    # if self.model_inputs["is_block_step"][idx]:  # has tasks to continue to decode
Do commented-out lines like this need to be kept?
Same question for the ones below.
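The commented-out code is kept at this spot because it differs slightly from 陈坚's insert in the Target Model. I'm not sure whether the difference is related to PD disaggregation (not yet verified) or to EP, so I'd like to keep it for now and delete it once the other features have been verified.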
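This has nothing to do with PD disaggregation. It is just the normal check for whether the worker needs to run the next step. (It guards against empty input, and also against having input but not running a proper step.)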
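For readers without the surrounding file, the assignment under review can be reproduced in isolation. This is a minimal sketch, not the PR's code: the table shape, the `-1` padding, and the helper name `insert_request` are hypothetical, and the returned flag only stands in for the commented-out `is_block_step` check.

```python
import numpy as np


def insert_request(block_tables, is_block_step, idx, request_blocks):
    """Write one request's block ids into row `idx` (hypothetical helper).

    Returns the `is_block_step` flag for that slot, i.e. the value the
    commented-out check in the reviewed diff would branch on.
    """
    encoder_block_num = len(request_blocks)
    # Same slice assignment as in the reviewed diff: row idx, first N columns.
    block_tables[idx : idx + 1, :encoder_block_num] = np.array(
        request_blocks, dtype="int32"
    )
    return bool(is_block_step[idx])


# Assumed layout: 4 slots, up to 8 blocks each, -1 marks an unused entry.
block_tables = np.full((4, 8), -1, dtype="int32")
is_block_step = np.zeros(4, dtype=bool)
resumed = insert_request(block_tables, is_block_step, 1, [10, 11, 12])
```

The one-dimensional array on the right-hand side broadcasts into the `(1, N)` slice, so only the first `N` entries of the chosen row change.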
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #3695 +/- ##
=========================================
Coverage ? 8.42%
=========================================
Files ? 6
Lines ? 95
Branches ? 8
=========================================
Hits ? 8
Misses ? 87
Partials ? 0
2c415cb to ce99dbe
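- Prefetch 2*MaxDraftToken + 2 positions
- Move kvcacheSchedule to after the proposer run
- Adapt several kernels accordingly, and add speculate_schedule_cache to manage speculative-decoding blocks
- Modify draft_model_preprocess to support both V0 and V1, manage all MTP block state, and optimize the kernel logic
- Modify recover_decode_task to support both speculative and non-speculative decoding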
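
The first item gives only the size of the reserve, 2*MaxDraftToken + 2 positions. As a rough illustration of what that means for block allocation, the sketch below converts the reserve into a block count with a ceiling division. The function names, the `block_size` value, and the idea that the reserve is simply added to the current sequence length are assumptions for illustration, not something the PR states.

```python
def prefetch_positions(max_draft_tokens: int) -> int:
    """Positions reserved ahead of the current length: 2*MaxDraftToken + 2."""
    return 2 * max_draft_tokens + 2


def blocks_needed(seq_len: int, max_draft_tokens: int, block_size: int) -> int:
    """Blocks required to hold seq_len plus the reserve (assumed policy)."""
    total = seq_len + prefetch_positions(max_draft_tokens)
    # Ceiling division without floating point.
    return -(-total // block_size)


# Hypothetical numbers: one draft token per step, 64-token blocks.
reserve = prefetch_positions(1)
near_boundary = blocks_needed(62, 1, 64)
within_block = blocks_needed(60, 1, 64)
```

With these numbers the reserve is 4 positions, so a sequence at length 62 already needs a second block while one at length 60 still fits in the first.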