[BugFix] [PD Disaggregation] Fix schedule error in splitwise deployment #5149

juncaipeng · 2025-11-20T11:52:54Z

Motivation

Fix a scheduling issue under PD disaggregation. This change is split out from #5027.

Modifications

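Free cache blocks inside the scheduling loop so the scheduler does not run out of blocks.
Make local_scheduler pull at least one request, so it never gets stuck unable to pull any request.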
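The first change, freeing cache blocks inside the scheduling loop, can be illustrated with a minimal sketch. Everything in it (`ToyResourceManager`, `Request`, the free-list layout) is hypothetical and is not the code in `resource_manager_v1.py`; it only shows why reclaiming the blocks of finished requests at the top of each cycle keeps waiting requests from stalling on an apparently full cache.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record, not FastDeploy's Request class
    request_id: str
    num_blocks_needed: int
    block_table: list = field(default_factory=list)
    finished: bool = False


class ToyResourceManager:
    """Hypothetical block allocator used only to illustrate the fix."""

    def __init__(self, total_blocks: int):
        self.free_blocks = deque(range(total_blocks))
        self.running = []
        self.waiting = deque()

    def _release_finished(self) -> None:
        # Return the blocks held by finished requests to the free list
        still_running = []
        for req in self.running:
            if req.finished:
                self.free_blocks.extend(req.block_table)
                req.block_table.clear()
            else:
                still_running.append(req)
        self.running = still_running

    def schedule(self) -> list:
        # Free blocks first so this cycle sees the real remaining capacity
        self._release_finished()
        scheduled = []
        # First-come-first-served: admit waiting requests while blocks suffice
        while self.waiting and self.waiting[0].num_blocks_needed <= len(self.free_blocks):
            req = self.waiting.popleft()
            req.block_table = [self.free_blocks.popleft() for _ in range(req.num_blocks_needed)]
            self.running.append(req)
            scheduled.append(req)
        return scheduled


if __name__ == "__main__":
    rm = ToyResourceManager(total_blocks=4)
    a, b = Request("a", 4), Request("b", 3)
    rm.waiting.extend([a, b])
    print([r.request_id for r in rm.schedule()])  # ['a']
    a.finished = True
    print([r.request_id for r in rm.schedule()])  # ['b']
```

If `_release_finished()` were skipped, request `b` would stay in the waiting queue even after `a` finished, because its blocks would never return to the free list.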

Usage or Command

No change.

Accuracy Tests

Covered by unit tests.

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-20T11:53:04Z

Thanks for your contribution!

Copilot

Pull Request Overview

This PR fixes scheduling issues in PD disaggregation (splitwise deployment) by addressing two key problems: block resource management and request pulling logic.

Ensures cache blocks are freed at the start of each scheduling cycle to prevent resource exhaustion
Guarantees that the local scheduler pulls at least one request before applying token budget constraints
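The second point, pulling at least one request before the token budget is enforced, can be sketched as follows. This is a hypothetical illustration, not the code in `local_scheduler.py`; `PendingRequest`, `pull_requests`, `token_budget` and `max_num_requests` are names invented for the example. The key line is the `if pulled and ...` check: the budget only stops the loop once something has already been taken, so a single prompt larger than the budget can still be scheduled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingRequest:
    # Hypothetical queued request with its prompt length in tokens
    request_id: str
    prompt_tokens: int


def pull_requests(queue: deque, token_budget: int, max_num_requests: int) -> list:
    """Pull requests until the token budget is hit, but always at least one."""
    pulled = []
    used_tokens = 0
    while queue and len(pulled) < max_num_requests:
        req = queue[0]
        # Enforce the budget only after one request has been pulled;
        # otherwise an oversized request would block the queue forever.
        if pulled and used_tokens + req.prompt_tokens > token_budget:
            break
        queue.popleft()
        pulled.append(req)
        used_tokens += req.prompt_tokens
    return pulled


if __name__ == "__main__":
    q = deque([PendingRequest("long", 9000), PendingRequest("short", 100)])
    print([r.request_id for r in pull_requests(q, 8192, 8)])  # ['long']
    print([r.request_id for r in pull_requests(q, 8192, 8)])  # ['short']
```

Without the `pulled and` guard, the first call would break immediately on the 9000-token request and return an empty list on every subsequent call as well.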

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
fastdeploy/scheduler/local_scheduler.py	Modified request pulling logic to ensure at least one request is retrieved before breaking on token budget limit
fastdeploy/engine/sched/resource_manager_v1.py	Added block table cleanup at the beginning of the schedule loop to free cache blocks when needed

Fix schedule error in splitwise deployment

8b512f7

Copilot AI review requested due to automatic review settings November 20, 2025 11:52

Copilot started reviewing on behalf of juncaipeng November 20, 2025 11:53

Copilot finished reviewing on behalf of juncaipeng November 20, 2025 11:55

Copilot AI reviewed Nov 20, 2025

kevincheng2 approved these changes Nov 20, 2025

juncaipeng requested a review from Jiang-Jia-Jun November 20, 2025 13:17

Jiang-Jia-Jun approved these changes Nov 20, 2025

Jiang-Jia-Jun merged commit 01c30f6 into PaddlePaddle:develop Nov 20, 2025
20 of 22 checks passed

juncaipeng commented Nov 20, 2025 • edited by yuanlehome

3 participants