[PD Disaggregation] decode use cpu buffer to receive cache from prefill #5027
base: develop
Conversation
Thanks for your contribution!
Pull Request Overview
This PR implements a feature for decode instances to use a CPU buffer to receive cache from prefill in PD (Prefill-Decode) disaggregation deployments. This optimization allows the decode instance to buffer incoming cache data in CPU memory before swapping it to GPU, potentially improving resource utilization and system throughput.
Key Changes
- Refactored cache information sending logic to separate prefill-to-messager and decode-to-prefill communication paths
- Added CPU buffer allocation and management for splitwise cache in decode mode via the `--splitwise-cache-buffer-size` parameter
- Implemented a CPU-to-GPU cache swapping mechanism for buffered cache data
- Updated resource management to support pre-allocation and deferred GPU resource assignment for prefilled requests
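The receive-then-swap flow described by these changes can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the class name `SplitwiseCpuBuffer`, its methods, and the `swap_to_gpu` callback are all hypothetical.

```python
import threading
import queue


class SplitwiseCpuBuffer:
    """Illustrative sketch of a decode-side CPU staging buffer: cache
    blocks arriving from prefill land in CPU memory first, and a
    background thread drains them to the GPU and recycles the blocks."""

    def __init__(self, num_cpu_blocks, swap_to_gpu):
        self.free_cpu_blocks = list(range(num_cpu_blocks))
        self.pending = queue.Queue()       # (request_id, cpu_block_ids)
        self.swap_to_gpu = swap_to_gpu     # callback performing the CPU->GPU copy
        self.lock = threading.Lock()
        self.worker = threading.Thread(target=self._swap_loop, daemon=True)
        self.worker.start()

    def allocate(self, n):
        """Reserve n CPU blocks for an incoming prefill transfer."""
        with self.lock:
            if len(self.free_cpu_blocks) < n:
                return None                # caller retries once blocks free up
            return [self.free_cpu_blocks.pop() for _ in range(n)]

    def on_cache_received(self, request_id, cpu_block_ids):
        """Called once prefill has finished writing into the CPU blocks."""
        self.pending.put((request_id, cpu_block_ids))

    def _swap_loop(self):
        while True:
            request_id, cpu_block_ids = self.pending.get()
            self.swap_to_gpu(request_id, cpu_block_ids)  # CPU -> GPU copy
            with self.lock:                              # recycle the buffer
                self.free_cpu_blocks.extend(cpu_block_ids)
            self.pending.task_done()
```

Decoupling receipt from GPU assignment this way is what allows the "deferred GPU resource assignment" mentioned above: GPU blocks only need to exist when the swap runs, not when the transfer starts.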
Reviewed Changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| `fastdeploy/splitwise/splitwise_connector.py` | Refactored `send_cache_infos` into two separate methods: `send_cache_info_to_messager` for prefill and `send_cache_info_to_prefill` for decode |
| `fastdeploy/engine/sched/resource_manager_v1.py` | Added splitwise CPU buffer support with `preallocated_reqs` tracking, `pre_recycle_resource`, and `add_prefilled_request` methods |
| `fastdeploy/engine/common_engine.py` | Updated task insertion and prefilled request processing to support the CPU buffer workflow |
| `fastdeploy/cache_manager/cache_messager.py` | Implemented CPU buffer allocation and a CPU-to-GPU swap thread for decode instances |
| `fastdeploy/cache_manager/prefix_cache_manager.py` | Added splitwise CPU buffer management APIs including allocate/recycle/swap operations |
| `fastdeploy/engine/args_utils.py` | Added `--splitwise-cache-buffer-size` parameter with validation for decode mode only |
| `fastdeploy/config.py` | Added cache buffer configuration with block number calculation |
| `fastdeploy/cache_manager/utils.py` | Extracted cache byte size and dtype conversion logic into reusable utility functions |
| `fastdeploy/cache_manager/cache_data.py` | Added `SPLITWISE_CPU2GPU` cache status enum value |
| `examples/splitwise/*.sh` | Updated example scripts with port checking utilities and improved consistency |
| `examples/splitwise/README.md` | Added documentation for running splitwise examples |
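The decode-only validation noted for `args_utils.py` could look roughly like the following sketch. The role flag name, defaults, and error message here are assumptions for illustration, not the PR's actual code:

```python
import argparse


def build_parser():
    """Minimal parser modeling the two flags relevant to this validation."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--splitwise-role",
                        choices=["prefill", "decode", "mixed"],
                        default="mixed")
    parser.add_argument("--splitwise-cache-buffer-size", type=int, default=0,
                        help="CPU buffer size used by decode instances to "
                             "stage cache received from prefill")
    return parser


def validate(args):
    # The CPU staging buffer only makes sense on the decode side,
    # so reject the flag for any other role.
    if args.splitwise_cache_buffer_size > 0 and args.splitwise_role != "decode":
        raise ValueError(
            "--splitwise-cache-buffer-size is only supported when "
            "--splitwise-role is 'decode'")
    return args
```

Failing fast at argument-parsing time keeps a misconfigured prefill instance from silently allocating an unused CPU buffer.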
Motivation
Enable the decode instance to use a CPU buffer to receive cache from the prefill instance.
Modifications
Usage or Command
The decode instance can set the `--splitwise-cache-buffer-size 10` argument.
Accuracy Tests
TODO
Checklist
- PR title tags: `[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`
- Run `pre-commit` before commit.
- For a `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.