
Conversation

@juncaipeng
Collaborator

@juncaipeng juncaipeng commented Nov 14, 2025

Motivation

Let the decode instance use a CPU buffer to receive cache from the prefill instance.

Modifications

  • Add create_pinned_shm and open_pinned_shm helpers (a hedged sketch follows this list).
  • Support the splitwise CPU cache buffer in cache_messager and cache_transfer_manager.
  • Support the splitwise CPU cache buffer in resource_manager_v1 and prefix_cache_manager.py.
  • Fix an error in local_scheduler.
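
The two helpers are sketched below. This is a minimal illustration assuming they wrap Python's multiprocessing.shared_memory so that the cache messager and cache transfer manager processes can attach to one shared host buffer; the real helpers presumably also page-lock (pin) the buffer so device copies are fast, which this sketch omits, and their actual signatures may differ.

```python
# Hedged sketch only: assumes the helpers wrap multiprocessing.shared_memory.
# The real create_pinned_shm/open_pinned_shm may differ, and actual pinning
# (page-locking the host memory for fast GPU copies) is not shown here.
import numpy as np
from multiprocessing import shared_memory


def create_pinned_shm(name: str, num_bytes: int) -> shared_memory.SharedMemory:
    """Create a named shared-memory segment for the splitwise CPU cache buffer."""
    return shared_memory.SharedMemory(name=name, create=True, size=num_bytes)


def open_pinned_shm(name: str) -> shared_memory.SharedMemory:
    """Attach to a segment created by another process (e.g. the cache messager)."""
    return shared_memory.SharedMemory(name=name, create=False)


# Usage: one process creates the buffer, another attaches and reads it.
shm = create_pinned_shm("splitwise_cpu_cache", 1024 * 1024)
writer = np.ndarray((1024 * 1024,), dtype=np.uint8, buffer=shm.buf)
writer[:4] = [1, 2, 3, 4]  # cache bytes received from prefill would land here

peer = open_pinned_shm("splitwise_cpu_cache")
reader = np.ndarray((1024 * 1024,), dtype=np.uint8, buffer=peer.buf)
assert reader[0] == 1

peer.close()
shm.close()
shm.unlink()
```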

Usage or Command

The decode instance can enable the CPU cache buffer by passing, for example, --splitwise-cache-buffer-size 10.
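
For sizing intuition, the sketch below shows one way a buffer size (assumed here to be in GB) could be converted into a number of KV-cache blocks, loosely mirroring the block-number calculation added to config.py; the parameter names and the exact formula are assumptions rather than the PR's actual code.

```python
# Illustrative only: an assumed formula for turning a GB-sized CPU cache buffer
# into a number of KV-cache blocks; the real calculation in config.py may differ.
def splitwise_cache_buffer_blocks(
    buffer_size_gb: float,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    block_size: int,       # tokens per cache block
    dtype_bytes: int = 2,  # e.g. float16/bfloat16
) -> int:
    # One block stores K and V for block_size tokens across all layers.
    bytes_per_block = 2 * num_layers * num_kv_heads * head_dim * block_size * dtype_bytes
    return int(buffer_size_gb * (1 << 30) // bytes_per_block)


# Example with --splitwise-cache-buffer-size 10 and a hypothetical model config.
print(splitwise_cache_buffer_blocks(10, num_layers=32, num_kv_heads=8,
                                    head_dim=128, block_size=64))  # -> 1280
```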

Accuracy Tests

TODO

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please state the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitted to the release branch, make sure it has already been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings November 14, 2025 08:40
@paddle-bot

paddle-bot bot commented Nov 14, 2025

Thanks for your contribution!

Copilot finished reviewing on behalf of juncaipeng November 14, 2025 08:42
Contributor

Copilot AI left a comment


Pull Request Overview

This PR lets decode instances use a CPU buffer to receive cache from prefill in PD (Prefill-Decode) disaggregation deployments. This optimization allows the decode instance to buffer incoming cache data in CPU memory before swapping it to the GPU, potentially improving resource utilization and system throughput.

Key Changes

  • Refactored cache information sending logic to separate prefill-to-messager and decode-to-prefill communication paths
  • Added CPU buffer allocation and management for splitwise cache in decode mode via the --splitwise-cache-buffer-size parameter
  • Implemented a CPU-to-GPU cache swapping mechanism for buffered cache data (a hedged sketch follows this list)
  • Updated resource management to support pre-allocation and deferred GPU resource assignment for prefilled requests
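
A hedged sketch of that swap path: on the decode side, a background thread drains requests whose prefill cache has already landed in the CPU buffer and copies the corresponding blocks into the GPU cache before the request is handed to the scheduler. The class and callback names are illustrative assumptions, not the actual cache_messager code.

```python
# Illustrative CPU-to-GPU swap worker; not the PR's actual implementation.
import queue
import threading


class CpuToGpuSwapWorker:
    def __init__(self, copy_blocks_to_gpu):
        # copy_blocks_to_gpu(cpu_block_ids, gpu_block_ids) is assumed to wrap the
        # engine's existing swap op (a pinned-host-to-device block copy).
        self._copy = copy_blocks_to_gpu
        self._pending = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, request_id, cpu_block_ids, gpu_block_ids, on_done):
        """Called once the prefill cache for a request has arrived in the CPU buffer."""
        self._pending.put((request_id, cpu_block_ids, gpu_block_ids, on_done))

    def _run(self):
        while True:
            request_id, cpu_blocks, gpu_blocks, on_done = self._pending.get()
            self._copy(cpu_blocks, gpu_blocks)  # swap buffered cache into the GPU cache
            on_done(request_id)                 # mark the request ready for decoding
```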

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 14 comments.

Per-file summary:
  • fastdeploy/splitwise/splitwise_connector.py: Refactored send_cache_infos into two methods, send_cache_info_to_messager (used by prefill) and send_cache_info_to_prefill (used by decode); sketched below, after this list.
  • fastdeploy/engine/sched/resource_manager_v1.py: Added splitwise CPU buffer support with preallocated_reqs tracking and new pre_recycle_resource and add_prefilled_request methods.
  • fastdeploy/engine/common_engine.py: Updated task insertion and prefilled-request processing to support the CPU buffer workflow.
  • fastdeploy/cache_manager/cache_messager.py: Implemented CPU buffer allocation and a CPU-to-GPU swap thread for decode instances.
  • fastdeploy/cache_manager/prefix_cache_manager.py: Added splitwise CPU buffer management APIs, including allocate/recycle/swap operations.
  • fastdeploy/engine/args_utils.py: Added the --splitwise-cache-buffer-size parameter, validated for decode mode only.
  • fastdeploy/config.py: Added cache buffer configuration with block-number calculation.
  • fastdeploy/cache_manager/utils.py: Extracted cache byte-size and dtype conversion logic into reusable utility functions.
  • fastdeploy/cache_manager/cache_data.py: Added the SPLITWISE_CPU2GPU cache status enum value.
  • examples/splitwise/*.sh: Updated example scripts with port-checking utilities and improved consistency.
  • examples/splitwise/README.md: Added documentation for running the splitwise examples.
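
To make the splitwise_connector.py refactor above concrete, here is a hedged sketch of how the old send_cache_infos entry point might now dispatch by role; the two method names come from the review summary, while the surrounding structure is an assumption.

```python
# Illustrative dispatch only; the real splitwise_connector.py likely differs.
class SplitwiseConnectorSketch:
    def __init__(self, role: str):
        self.role = role  # "prefill" or "decode"

    def send_cache_infos(self, tasks):
        if self.role == "prefill":
            # Prefill tells its cache messager which blocks to push.
            return self.send_cache_info_to_messager(tasks)
        # Decode tells the prefill instance which CPU buffer blocks to send into.
        return self.send_cache_info_to_prefill(tasks)

    def send_cache_info_to_messager(self, tasks):
        ...

    def send_cache_info_to_prefill(self, tasks):
        ...
```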

@juncaipeng juncaipeng requested a review from Copilot November 14, 2025 09:04
Copilot finished reviewing on behalf of juncaipeng November 14, 2025 09:08
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 17 comments.
