[Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) #7181
PaddlePaddle-bot
left a comment
🤖 AI Code Review |
2026-04-03 15:33 CST
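📋 Review Summary
PR overview: Cherry-pick the scheduler slot-accounting fix into the release/2.6 branch
Scope of changes: resource_manager_v1.py (scheduler), common_engine.py (engine), mooncake_store.py (cache)
Impact tags: Scheduler Engine
📝 PR Convention Check
The PR description does not match the actual changes:
- The description says: only resource_manager_v1.py is modified
- Actual changes: also include modifications to common_engine.py and mooncake_store.py
Please confirm whether these additional changes belong in this cherry-pick PR. If not, consider splitting them into a separate PR.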
Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | common_engine.py:1155 | The exception-handling change alters shutdown behavior and will cause an existing test to fail |
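Overall Assessment
The core fix in resource_manager_v1.py is correct: it counts requests in the to_be_rescheduled, to_be_aborted, and PREEMPTED states toward the slot-occupancy check. The change to common_engine.py is problematic, however. It replaces the previous graceful shutdown handling with directly raising the exception, which conflicts with the expected behavior of the existing test test_schedule_request_to_worker_v1_threadpool_shutdown_breaks. This needs to be fixed or confirmed as an intentional change.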
    except RuntimeError as e:
        if "cannot schedule new futures after shutdown" in str(e):
            break
        raise e
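🔴 Bug: The exception-handling change introduces incompatible behavior
Problem analysis:
- Original code: catches RuntimeError and, if the message is "cannot schedule new futures after shutdown", exits the loop gracefully (break)
- New code: executes raise e directly, so the exception propagates upward
Impact:
- The existing test test_schedule_request_to_worker_v1_threadpool_shutdown_breaks expects this exception to be handled gracefully (not raised), so this change will make the test fail
- In production, the behavior on ThreadPool shutdown changes from "graceful exit" to "exception raised"
Recommendation:
This change is unrelated to the PR's stated goal, the "scheduler slot-accounting fix", and appears to have been introduced while resolving conflicts during the cherry-pick. Please confirm:
- If the change is intentional, the test case must be updated accordingly
- If it was introduced by mistake, consider restoring the original logic: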
    except RuntimeError as e:
        if "cannot schedule new futures after shutdown" in str(e):
            break
        raise e
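For illustration only, the graceful-exit pattern discussed in this comment can be reproduced outside the engine with the standard library. The sketch below is not the code in common_engine.py; `schedule_loop` and `SHUTDOWN_MSG` are hypothetical names. It relies on the fact that `ThreadPoolExecutor.submit` raises a `RuntimeError` with this message once the pool has been shut down.

```python
from concurrent.futures import ThreadPoolExecutor

# Message CPython uses when submit() is called on a shut-down executor.
SHUTDOWN_MSG = "cannot schedule new futures after shutdown"


def schedule_loop(pool, tasks):
    """Hypothetical stand-in for a scheduling loop: submit tasks until the
    list is exhausted or the pool reports that it has been shut down."""
    submitted = 0
    for task in tasks:
        try:
            pool.submit(task)
        except RuntimeError as e:
            if SHUTDOWN_MSG in str(e):
                break  # pool is gone: leave the loop quietly
            raise  # any other RuntimeError is still a real error
        submitted += 1
    return submitted


pool = ThreadPoolExecutor(max_workers=1)
pool.shutdown(wait=True)
# The loop exits on the first submit instead of propagating the exception.
print(schedule_loop(pool, [lambda: None] * 3))
```

Without the `break` branch, the same call would raise `RuntimeError` to the caller, which is the behavior change the review flags.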
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7181 +/- ##
==============================================
Coverage ? 73.75%
==============================================
Files ? 376
Lines ? 52886
Branches ? 8249
==============================================
Hits ? 39007
Misses ? 11152
Partials ? 2727
55dbc83 into PaddlePaddle:release/2.6
…thout a slot(PaddlePaddle#7141) (PaddlePaddle#7181)
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (PaddlePaddle#7163)
* Set MC_MAX_MR_SIZE to avoid register hang
* up
* [fix] prevent requests from entering running state without a slot
* [fix] count abort set
* [fix] count preempted task in waiting list
---------
Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>
…state without a slot(PaddlePaddle#7141) (PaddlePaddle#7181)" This reverts commit 80f4a72.
modify test modify test support empty tensor and modify test fix test_linear config issues modify test name add edge test case modify format fix conflict modify default max token num in trtllm_allreduce_fusion add max token num branch for trtllm_allreduce_fusion fix format fix rmsnorm config issue modify 2025 to 2026 enable trtllm_allreduce fusion Revert "[Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (PaddlePaddle#7186) (PaddlePaddle#7195)" This reverts commit ca2f38b. Revert "[Cherry-Pick][BugFix] prevent requests from entering running state without a slot(PaddlePaddle#7141) (PaddlePaddle#7181)" This reverts commit 80f4a72. clean flashinfer cache and modify test fix dumpy patch issue fix some issues
… glm model (#6660) (#7228) * enable trtllm_all_reduce fusion kernel in glm model * update flashinfer paddle version * format update modify test modify test support empty tensor and modify test fix test_linear config issues modify test name add edge test case modify format fix conflict modify default max token num in trtllm_allreduce_fusion add max token num branch for trtllm_allreduce_fusion fix format fix rmsnorm config issue modify 2025 to 2026 enable trtllm_allreduce fusion Revert "[Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (#7186) (#7195)" This reverts commit ca2f38b. Revert "[Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) (#7181)" This reverts commit 80f4a72. clean flashinfer cache and modify test fix dumpy patch issue fix some issues * remove redundent * enable moe reduce fusion * fix test * fix cuda context issue * update flashinfer version
Motivation
Cherry-pick the waiting-list preempted-task counting fix from develop PR #7141 into release/2.6.
This cherry-pick keeps the release branch aligned with the scheduler slot-accounting fix that prevents requests from being admitted when effective occupied slots have already reached max_num_seqs.
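As a rough illustration of the admission rule described above, the sketch below counts every request that still holds a slot before admitting a new one. It is not the code in resource_manager_v1.py: `SchedulerState`, `occupied_slots`, and `can_admit` are hypothetical names, and the sketch assumes the running list, the to_be_rescheduled set, the to_be_aborted set, and the preempted entries in the waiting list do not overlap.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    WAITING = auto()
    PREEMPTED = auto()


@dataclass
class SchedulerState:
    """Hypothetical, simplified view of the scheduler's bookkeeping."""
    max_num_seqs: int
    running: list = field(default_factory=list)
    waiting: list = field(default_factory=list)  # (request_id, Status) pairs
    to_be_rescheduled: set = field(default_factory=set)
    to_be_aborted: set = field(default_factory=set)

    def occupied_slots(self) -> int:
        # Preempted requests sit in the waiting list but still hold a slot.
        preempted = sum(1 for _, s in self.waiting if s is Status.PREEMPTED)
        # Assumption: the four groups are disjoint, so a plain sum is valid.
        return (len(self.running) + len(self.to_be_rescheduled)
                + len(self.to_be_aborted) + preempted)

    def can_admit(self) -> bool:
        # Admit only while effective occupancy is below max_num_seqs.
        return self.occupied_slots() < self.max_num_seqs


state = SchedulerState(
    max_num_seqs=4,
    running=["a", "b"],
    waiting=[("c", Status.PREEMPTED), ("d", Status.WAITING)],
    to_be_aborted={"e"},
)
print(state.occupied_slots(), state.can_admit())
```

Counting only `running` here would report two occupied slots and admit another request, even though four slots are effectively taken.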
Modifications
Usage or Command
No new usage is introduced.
Validation performed during git cherry-pick --continue via pre-commit hooks:
Accuracy Tests
This change only affects scheduler accounting logic and does not change model forward, kernel logic, or numerical outputs.
No accuracy impact is expected.
Checklist