[Optim][Cherry-pick] Reduce preemption occurrence when blocks not enough(#5696) by rainyfly · Pull Request #5808 · PaddlePaddle/FastDeploy

rainyfly · 2025-12-29T02:34:58Z

Motivation

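When there are not enough blocks for decoding, the scheduler preempts running decode requests and releases their blocks, which are then allocated to the remaining decode requests.
Under the previous scheduling logic, whenever the waiting queue held requests and the free blocks could fit the next chunk of a new request (new_token_num), that request was scheduled for prefill. When blocks are already severely short, this leads to a repeated preempt → schedule prefill → preempt → schedule prefill cycle, which degrades performance.
To fix this, the scheduler now reserves some blocks for running decode requests when scheduling a new prefill. A request is taken from the waiting queue for prefill only if, after subtracting the blocks reserved for every running decode request, the remaining blocks can still hold the entire request that needs prefill.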
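The admission rule described above can be sketched in Python as follows. This is a minimal illustration, not the code in resource_manager_v1.py: the helper names (`blocks_needed`, `can_schedule_prefill`, `old_can_schedule_chunk`) are hypothetical, and the sketch assumes a request occupies ceil(tokens / block_size) KV-cache blocks.

```python
import math


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """KV-cache blocks required to hold num_tokens tokens (assumed ceil division)."""
    return math.ceil(num_tokens / block_size)


def old_can_schedule_chunk(free_blocks: int, chunk_len: int, block_size: int) -> bool:
    """Previous rule (sketch): admit if the next prefill chunk alone fits."""
    return free_blocks >= blocks_needed(chunk_len, block_size)


def can_schedule_prefill(
    free_blocks: int,
    num_running_decode: int,
    tokens_to_prefill: int,
    block_size: int,
    reserve_per_decode: int = 16,
) -> bool:
    """New rule (sketch): hold back a reserve for every running decode request,
    then admit only if the whole remaining prefill fits in what is left."""
    reserved = num_running_decode * reserve_per_decode
    available = free_blocks - reserved  # may be negative when blocks are scarce
    return available >= blocks_needed(tokens_to_prefill, block_size)
```

With 100 free blocks, 4 running decode requests and a 16-block reserve each, only 36 blocks remain for prefill, so a 4096-token prompt (64 blocks at block_size 64) stays in the waiting queue, whereas the old chunk-based check would have admitted a 2048-token chunk of it.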

Modifications

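New environment variable: FD_RESERVE_OUTPUT_BLOCK_NUM_FOR_DECODE_WHEN_SCHEDULE_NEW_PREFILL
Meaning: the number of blocks reserved for each running decode request when a new request is scheduled from the waiting queue for prefill. Default: 16.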
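A sketch of how a scheduler might read this variable, assuming it is parsed as an integer with the documented default of 16. The function name `reserve_blocks_per_decode` and the constants are hypothetical; FastDeploy registers its variables in fastdeploy/envs.py, which is not reproduced here.

```python
import os

ENV_NAME = "FD_RESERVE_OUTPUT_BLOCK_NUM_FOR_DECODE_WHEN_SCHEDULE_NEW_PREFILL"
DEFAULT_RESERVE_BLOCKS = 16  # documented default


def reserve_blocks_per_decode() -> int:
    """Blocks to reserve per running decode request (hypothetical helper).

    Read at call time so a changed environment is picked up.
    """
    return int(os.getenv(ENV_NAME, str(DEFAULT_RESERVE_BLOCKS)))
```

To reserve fewer blocks per decode request, export the variable before launching the service, e.g. `export FD_RESERVE_OUTPUT_BLOCK_NUM_FOR_DECODE_WHEN_SCHEDULE_NEW_PREFILL=8`.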

Usage or Command

Enabled by default under v1.

Accuracy Tests

None

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-12-29T02:35:04Z

Thanks for your contribution!

codecov-commenter · 2025-12-29T03:58:48Z

Codecov Report

❌ Patch coverage is 29.16667% with 17 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.4@fb59f56). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/engine/sched/resource_manager_v1.py	29.16%	17 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.4    #5808   +/-   ##
==============================================
  Coverage               ?   58.81%           
==============================================
  Files                  ?      329           
  Lines                  ?    40836           
  Branches               ?     6221           
==============================================
  Hits                   ?    24019           
  Misses                 ?    14945           
  Partials               ?     1872

Flag	Coverage Δ
GPU	`58.81% <29.16%> (?)`

Copilot

Pull request overview

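This is an optimization PR cherry-picked from #5696 that aims to reduce how often preemption occurs when blocks run short. It introduces a reserve-block mechanism: when scheduling a new prefill request, the scheduler reserves some blocks for the requests that are currently decoding, avoiding the repeated "preempt → schedule prefill → preempt" rescheduling cycle.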

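Main changes:

Added three new environment variables that control the reservation mechanism: the initial reserve, the decay rate, and the minimum reserve
Modified the scheduling logic so that, when checking whether a new prefill request can be scheduled, blocks are reserved for the decode requests currently running
Implemented decay of the reserved block count: the reservation shrinks gradually during normal scheduling and is reset to its initial value when preemption occurs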
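The reserve's life cycle can be sketched as a small state object. The three variables' names are not given in this PR description, so the field names (`init_blocks`, `decay_rate`, `min_blocks`), the multiplicative decay, the default values and the rounding up are all assumptions; the sketch also assumes that a scheduling round in which preemption fired skips that round's decay.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AdaptiveReserve:
    """Hypothetical per-decode block reserve that decays and resets on preemption."""

    init_blocks: int = 16    # initial reserve per running decode request
    decay_rate: float = 0.9  # assumed multiplicative decay per schedule() round
    min_blocks: int = 0      # assumed floor the reserve never decays below
    current: float = field(init=False, default=0.0)
    _preempted_this_round: bool = field(init=False, default=False)

    def __post_init__(self) -> None:
        self.current = float(self.init_blocks)

    def per_decode(self) -> int:
        # Round up so a fractional reserve still holds back a whole block.
        return math.ceil(self.current)

    def on_preempt(self) -> None:
        # Preemption suggests the reserve was too small: restore the initial value.
        self.current = float(self.init_blocks)
        self._preempted_this_round = True

    def on_schedule_end(self) -> None:
        # Decay only after a round that finished without preemption.
        if not self._preempted_this_round:
            self.current = max(float(self.min_blocks), self.current * self.decay_rate)
        self._preempted_this_round = False
```

In a scheduler, `per_decode()` would replace the fixed reserve in the admission check, `on_preempt()` would be called from the preemption path (`_trigger_preempt()` in this PR), and `on_schedule_end()` once at the end of each `schedule()` call.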

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File	Description
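fastdeploy/envs.py	Adds three new environment variables that configure the reservation mechanism (initial value, decay rate, minimum value)
fastdeploy/engine/sched/resource_manager_v1.py	Adds reserve-block instance variables to the ResourceManagerV1 initializer; changes the scheduling logic in schedule() to count reserved blocks when checking whether a new prefill can be scheduled; adds reserve-reset logic to _trigger_preempt(); implements decay of the reserve at the end of each schedule() call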

[Optim] Reduce preemption occurrence when blocks not enough

00c00d8

rainyfly had a problem deploying to Metax_ci December 29, 2025 02:35 — with GitHub Actions Failure

optimize performance using adaptive block reservation

4242162

rainyfly had a problem deploying to Metax_ci January 4, 2026 12:33 — with GitHub Actions Failure

Jiang-Jia-Jun requested a review from Copilot January 4, 2026 12:41

Merge branch 'release/2.4' into optimize_scheduler_for_preemption_release24

1d4ee09

Jiang-Jia-Jun had a problem deploying to Metax_ci January 4, 2026 12:42 — with GitHub Actions Failure

Copilot started reviewing on behalf of Jiang-Jia-Jun January 4, 2026 12:42

Copilot AI reviewed Jan 4, 2026

Merge branch 'release/2.4' of https://github.com/PaddlePaddle/FastDeploy into optimize_scheduler_for_preemption_release24

4eb84ab

rainyfly had a problem deploying to Metax_ci January 5, 2026 09:52 — with GitHub Actions Failure

optimize performance

9eb3674

rainyfly had a problem deploying to Metax_ci January 5, 2026 16:27 — with GitHub Actions Failure

fix

d8501ce

rainyfly had a problem deploying to Metax_ci January 6, 2026 06:30 — with GitHub Actions Failure

fix

431b97f

rainyfly had a problem deploying to Metax_ci January 6, 2026 06:37 — with GitHub Actions Failure

Merge branch 'release/2.4' of https://github.com/PaddlePaddle/FastDeploy into optimize_scheduler_for_preemption_release24

081264f

rainyfly had a problem deploying to Metax_ci January 7, 2026 10:13 — with GitHub Actions Failure

Jiang-Jia-Jun merged commit 1e8de96 into PaddlePaddle:release/2.4 Jan 8, 2026
13 of 19 checks passed

Jiang-Jia-Jun merged 8 commits into PaddlePaddle:release/2.4 from rainyfly:optimize_scheduler_for_preemption_release24

3 participants

Conversation

rainyfly commented Dec 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Dec 29, 2025

Uh oh!

codecov-commenter commented Dec 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rainyfly commented Dec 29, 2025 •

edited

Loading

codecov-commenter commented Dec 29, 2025 •

edited

Loading