
[Cherry-Pick][BugFix] prevent requests from entering running state without a slot (#7141) #7181

Merged
Jiang-Jia-Jun merged 5 commits into PaddlePaddle:release/2.6 from liyonghua0910:release/2.6+20260403_fix_schedule on Apr 3, 2026

@liyonghua0910
Collaborator

Motivation

Cherry-pick the waiting-list preempted-task counting fix from develop PR #7141 into release/2.6.

This cherry-pick keeps the release branch aligned with the scheduler slot-accounting fix that prevents requests from being admitted when effective occupied slots have already reached max_num_seqs.

Modifications

  • Update fastdeploy/engine/sched/resource_manager_v1.py
  • Count RequestStatus.PREEMPTED requests in self.waiting when checking max_num_seqs
  • Preserve the existing release/2.6 slot accounting for running, abort-pending, and reschedule-pending requests
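A minimal Python sketch of the admission check these bullets describe, assuming a simplified scheduler model. Request, SlotGuard, the RequestStatus members other than PREEMPTED, and the container layout of running, waiting, to_be_aborted, and to_be_rescheduled are hypothetical stand-ins, not the actual resource_manager_v1.py code. The sketch only shows how counting preempted requests still sitting in the waiting list changes the max_num_seqs comparison.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class RequestStatus(Enum):
    # Hypothetical stand-in enum; only PREEMPTED is named in this PR.
    WAITING = auto()
    RUNNING = auto()
    PREEMPTED = auto()


@dataclass
class Request:
    request_id: str
    status: RequestStatus = RequestStatus.WAITING


@dataclass
class SlotGuard:
    """Hypothetical model of the max_num_seqs admission check (not FastDeploy's code)."""

    max_num_seqs: int
    running: list = field(default_factory=list)
    waiting: list = field(default_factory=list)
    to_be_aborted: set = field(default_factory=set)  # abort-pending request ids
    to_be_rescheduled: set = field(default_factory=set)  # reschedule-pending request ids

    def occupied_slots(self) -> int:
        # Preempted requests parked in the waiting list are counted toward
        # max_num_seqs, as described for this fix.
        preempted_waiting = sum(
            1 for req in self.waiting if req.status == RequestStatus.PREEMPTED
        )
        # Assumes the four groups are disjoint; whether the real bookkeeping
        # can overlap is not shown in this PR.
        return (
            len(self.running)
            + len(self.to_be_aborted)
            + len(self.to_be_rescheduled)
            + preempted_waiting
        )

    def can_admit(self) -> bool:
        # A new request may enter the running state only if a slot is free.
        return self.occupied_slots() < self.max_num_seqs
```

With one running request and one preempted request still in waiting, a max_num_seqs of 2 is already reached, so can_admit() returns False. In this model, ignoring the preempted entry would report one free slot and let another request in.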

Usage or Command

No new usage is introduced.

Validation performed during git cherry-pick --continue via pre-commit hooks:

  • black
  • isort
  • flake8
  • ruff
  • check for merge conflicts
  • fix end of files
  • trim trailing whitespace
  • detect private key
  • check for added large files

Accuracy Tests

This change only affects scheduler accounting logic and does not change model forward, kernel logic, or numerical outputs.

No accuracy impact is expected.

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. No dedicated unit test is added because this is a targeted scheduler guard fix being cherry-picked to the release branch.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Apr 3, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-03 15:33 CST

📋 Review Summary

PR overview: cherry-picks the scheduler slot-accounting fix into the release/2.6 branch
Scope of changes: resource_manager_v1.py (scheduler), common_engine.py (engine), mooncake_store.py (cache)
Impact tags: Scheduler, Engine

📝 PR Conformance Check

The PR description does not match the actual changes:

  • Description: states that only resource_manager_v1.py is modified
  • Actual changes: also modify common_engine.py and mooncake_store.py

Please confirm whether these additional changes belong in this cherry-pick PR. If they do not, consider moving them to a separate PR.

Issues

Severity | File | Summary
🔴 Bug | common_engine.py:1155 | The exception-handling change alters shutdown behavior and breaks an existing test

Overall Assessment

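The core fix in resource_manager_v1.py is correct: requests in to_be_rescheduled, in to_be_aborted, and in the PREEMPTED state are now counted toward slot occupancy. However, the change to common_engine.py is problematic. It replaces the original graceful shutdown handling with an unconditional re-raise, which conflicts with the behavior expected by the existing test test_schedule_request_to_worker_v1_threadpool_shutdown_breaks. Please fix it or confirm that the change is intentional.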

except RuntimeError as e:
    if "cannot schedule new futures after shutdown" in str(e):
        break
    raise e

🔴 Bug: the exception-handling change introduces a behavioral incompatibility

Analysis

  • Original code: catches RuntimeError and, if the message is "cannot schedule new futures after shutdown", exits the loop gracefully (break)
  • New code: re-raises e directly, so the exception propagates upward

Impact

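  1. The existing test test_schedule_request_to_worker_v1_threadpool_shutdown_breaks expects this exception to be handled gracefully (not raised), so this change will make the test fail
  2. In production, the behavior on ThreadPool shutdown changes from "graceful exit" to "exception raised"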

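Suggestion
This change is unrelated to the PR's stated goal (the "scheduler slot-accounting fix") and appears to have been introduced while resolving conflicts during the cherry-pick. Please confirm: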

  1. If the change is intentional, update the test case accordingly
  2. If it was introduced by mistake, restore the original logic:
except RuntimeError as e:
    if "cannot schedule new futures after shutdown" in str(e):
        break
    raise e
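A minimal Python sketch of why the restored except clause matters, assuming a simplified dispatch loop. dispatch_batches and its arguments are hypothetical and do not reproduce the common_engine.py loop. The one real behavior relied on is that concurrent.futures.ThreadPoolExecutor.submit raises RuntimeError("cannot schedule new futures after shutdown") once shutdown() has been called; with the message check in place, the loop stops cleanly instead of letting that error escape.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_batches(pool: ThreadPoolExecutor, batches: list) -> int:
    """Hypothetical dispatch loop mirroring the restored except clause.

    Returns how many batches were submitted before the pool refused work.
    """
    submitted = 0
    for batch in batches:
        try:
            pool.submit(sum, batch)
            submitted += 1
        except RuntimeError as e:
            # submit() raises this after shutdown(); treat it as a signal
            # to leave the loop rather than crash the scheduling thread.
            if "cannot schedule new futures after shutdown" in str(e):
                break
            # Any other RuntimeError is still a real failure.
            raise
    return submitted
```

On a live pool every batch is submitted; after shutdown() the same call returns 0 without raising. If the message check were removed and the handler only re-raised, the second call would propagate the RuntimeError instead, which is the behavior change the review flags.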

@codecov-commenter

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@b24765a).

Files with missing lines | Patch % | Lines
.../transfer_factory/mooncake_store/mooncake_store.py | 0.00% | 1 Missing ⚠️
fastdeploy/engine/common_engine.py | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7181   +/-   ##
==============================================
  Coverage               ?   73.75%           
==============================================
  Files                  ?      376           
  Lines                  ?    52886           
  Branches               ?     8249           
==============================================
  Hits                   ?    39007           
  Misses                 ?    11152           
  Partials               ?     2727           
Flag | Coverage Δ
GPU | 73.75% <33.33%> (?)

Flags with carried forward coverage won't be shown.
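The percentages above can be checked against the counts in the diff table, assuming Codecov's usual ratio of hits over all tracked lines (hits, misses, and partials) and assuming the patch contains no partial lines. The project figure 0.73757 is displayed as 73.75%, which is consistent with rounding down to two decimals.

```latex
\begin{aligned}
\text{project coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
  = \frac{39007}{39007 + 11152 + 2727} = \frac{39007}{52886} \approx 0.73757 \\
\text{patch coverage} &= \frac{\text{covered}}{\text{covered} + \text{missing}}
  = \frac{1}{1 + 2} \approx 33.33\%
\end{aligned}
```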


Jiang-Jia-Jun merged commit 55dbc83 into PaddlePaddle:release/2.6 on Apr 3, 2026
34 of 38 checks passed
BingooYang pushed a commit to BingooYang/FastDeploy that referenced this pull request Apr 11, 2026
…thout a slot(PaddlePaddle#7141) (PaddlePaddle#7181)

* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (PaddlePaddle#7163)

* Set MC_MAX_MR_SIZE to avoid register hang

* up

* [fix] prevent requests from entering running state without a slot

* [fix] count abort set

* [fix] count preempted task in waiting list

---------

Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>
BingooYang added a commit to BingooYang/FastDeploy that referenced this pull request Apr 11, 2026
BingooYang added a commit to BingooYang/FastDeploy that referenced this pull request Apr 28, 2026
modify test

modify test

support empty tensor and modify test

fix test_linear config issues

modify test name

add edge test case

modify format

fix conflict

modify default max token num in trtllm_allreduce_fusion

add max token num branch for trtllm_allreduce_fusion

fix format

fix rmsnorm config issue

modify 2025 to 2026

enable trtllm_allreduce fusion

Revert "[Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (PaddlePaddle#7186) (PaddlePaddle#7195)"

This reverts commit ca2f38b.

Revert "[Cherry-Pick][BugFix] prevent requests from entering running state without a slot(PaddlePaddle#7141) (PaddlePaddle#7181)"

This reverts commit 80f4a72.

clean flashinfer cache and modify test

fix dumpy patch issue

fix some issues
BingooYang added a commit to BingooYang/FastDeploy that referenced this pull request May 12, 2026
K11OntheBoat pushed a commit that referenced this pull request May 12, 2026
… glm model (#6660) (#7228)

* enable trtllm_all_reduce fusion kernel in glm model

* update flashinfer paddle version

* format update

modify test

modify test

support empty tensor and modify test

fix test_linear config issues

modify test name

add edge test case

modify format

fix conflict

modify default max token num in trtllm_allreduce_fusion

add max token num branch for trtllm_allreduce_fusion

fix format

fix rmsnorm config issue

modify 2025 to 2026

enable trtllm_allreduce fusion

Revert "[Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (#7186) (#7195)"

This reverts commit ca2f38b.

Revert "[Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) (#7181)"

This reverts commit 80f4a72.

clean flashinfer cache and modify test

fix dumpy patch issue

fix some issues

* remove redundent

* enable moe reduce fusion

* fix test

* fix cuda context issue

* update flashinfer version

Labels

None yet

Projects

None yet


5 participants