
[BugFix] Add safety checks in recycle_gpu_blocks to prevent block allocation errors #6531

Merged
Jiang-Jia-Jun merged 4 commits into PaddlePaddle:develop from kevincheng2:fix_reset_cache_bug_dev
Mar 2, 2026

kevincheng2 (Collaborator) commented Feb 27, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

  • Check the prefix tree status before recycling GPU blocks, and skip recycling while the tree is being cleared
  • Validate that the gpu_block_ids input is a list
  • Add an overflow check so the free block count cannot exceed the total number of GPU blocks, which avoids potential block allocation errors
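
The three checks above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `BlockPool`, `TreeStatus`, `tree_status`, `gpu_free_block_list`, and `allocate` are hypothetical names, the status values are invented, and wrapping a bare block id in a list is only one assumed way to "validate" the input. Only `recycle_gpu_blocks` and `gpu_block_ids` come from the PR text.

```python
import heapq
from enum import IntEnum


class TreeStatus(IntEnum):
    # Hypothetical status values; the real constants are not shown on this page.
    NORMAL = 0
    CLEARING = 1


class BlockPool:
    """Toy stand-in for the GPU free-block bookkeeping (assumed structure)."""

    def __init__(self, num_gpu_blocks):
        self.num_gpu_blocks = num_gpu_blocks
        self.gpu_free_block_list = list(range(num_gpu_blocks))
        heapq.heapify(self.gpu_free_block_list)
        self.tree_status = TreeStatus.NORMAL

    def allocate(self, n):
        # Pop the n smallest free block ids.
        return [heapq.heappop(self.gpu_free_block_list) for _ in range(n)]

    def recycle_gpu_blocks(self, gpu_block_ids):
        # Check 1: skip recycling while the prefix tree is being cleared.
        if self.tree_status == TreeStatus.CLEARING:
            return False
        # Check 2: make sure we iterate over a list (assumption: wrap a bare id).
        if not isinstance(gpu_block_ids, list):
            gpu_block_ids = [gpu_block_ids]
        # Check 3: never let the free list grow past the total block count.
        if len(self.gpu_free_block_list) + len(gpu_block_ids) > self.num_gpu_blocks:
            return False
        for block_id in gpu_block_ids:
            heapq.heappush(self.gpu_free_block_list, block_id)
        return True
```

In this sketch a rejected call returns `False` and leaves the free list untouched, so a double free cannot push the free count above `num_gpu_blocks`. Whether the real method logs, returns, or raises in these cases is not stated in the PR description.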

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

…ocation errors

- Check prefix tree status before recycling GPU blocks
- Validate gpu_block_ids is a list
- Add overflow check to prevent free block count exceeding total blocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddle-bot bot commented Feb 27, 2026

Thanks for your contribution!

codecov-commenter commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8e67fb4). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
fastdeploy/cache_manager/prefix_cache_manager.py | 66.66% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6531   +/-   ##
==========================================
  Coverage           ?   70.40%           
==========================================
  Files              ?      394           
  Lines              ?    53869           
  Branches           ?     8466           
==========================================
  Hits               ?    37927           
  Misses             ?    13210           
  Partials           ?     2732           
Flag Coverage Δ
GPU 70.40% <66.66%> (?)

…atus_signal not initialized

- Add hasattr check before accessing prefix_tree_status_signal
- The signal is only initialized in launch_cache_messager, not in __init__
- Fixes CI test failure in test_prefix_cache_manager.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
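
The failure mode described in this commit message, and the `hasattr` guard that avoids it, can be illustrated with a small sketch. `ManagerStub`, `StatusSignal`, the `value` list layout, and the `CLEARING` constant are assumptions made for the example; only `prefix_tree_status_signal`, `launch_cache_messager`, and `recycle_gpu_blocks` are names taken from the commit message.

```python
CLEARING = 1  # hypothetical status value


class StatusSignal:
    """Hypothetical stand-in for a shared status signal holding one value."""

    def __init__(self, value):
        self.value = [value]


class ManagerStub:
    def __init__(self):
        # prefix_tree_status_signal is deliberately NOT created here,
        # mirroring the situation described in the commit message.
        self.recycled = []

    def launch_cache_messager(self):
        # The signal only exists after this method has run.
        self.prefix_tree_status_signal = StatusSignal(0)

    def _tree_is_clearing(self):
        # Guard the attribute access so an un-launched manager does not raise.
        return (
            hasattr(self, "prefix_tree_status_signal")
            and self.prefix_tree_status_signal.value[0] == CLEARING
        )

    def recycle_gpu_blocks(self, gpu_block_ids):
        if self._tree_is_clearing():
            return False
        self.recycled.extend(gpu_block_ids)
        return True
```

With the guard in place, a unit test that builds the manager without calling `launch_cache_messager` can still call `recycle_gpu_blocks`; without it, the attribute lookup would raise `AttributeError`.
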

[BugFix] Reset prefix cache when model weights are updating

- Call self.reset() before setting status to NORMAL in UPDATING state
- Ensure cache consistency when model weights change
- Consistent with CLEARING state handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
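
A rough sketch of the state handling this commit describes: when the status is UPDATING, the cache is reset before the status returns to NORMAL. `CacheStub`, `handle_status`, `cached_prefixes`, and the enum values are hypothetical; only `self.reset()` and the state names UPDATING, CLEARING, and NORMAL come from the commit message. The status that follows CLEARING is not stated on this page, so the sketch leaves it unchanged.

```python
from enum import IntEnum


class PrefixTreeStatus(IntEnum):
    # Hypothetical numeric values for the states named in the commit message.
    NORMAL = 0
    CLEARING = 1
    UPDATING = 2


class CacheStub:
    def __init__(self):
        self.status = PrefixTreeStatus.NORMAL
        self.cached_prefixes = {"prompt-a": [0, 1]}  # made-up cached entry
        self.reset_calls = 0

    def reset(self):
        # Drop every cached prefix so stale KV blocks cannot be reused.
        self.cached_prefixes.clear()
        self.reset_calls += 1

    def handle_status(self):
        if self.status == PrefixTreeStatus.UPDATING:
            # Weights changed: cached blocks were computed with the old weights.
            self.reset()
            self.status = PrefixTreeStatus.NORMAL
        elif self.status == PrefixTreeStatus.CLEARING:
            # Same reset as above; the follow-up status is an unknown here.
            self.reset()
```

The ordering matters: resetting first means no request can observe a NORMAL status while entries computed with the previous weights are still cached.
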
Jiang-Jia-Jun pushed a commit that referenced this pull request Mar 2, 2026
…refix_tree_status_signal not initialized(#6531) (#6559)

* fix mtp acceptance rate decline

* [BugFix] Fix AttributeError in recycle_gpu_blocks when prefix_tree_status_signal not initialized

- Add hasattr check before accessing prefix_tree_status_signal
- The signal is only initialized in launch_cache_messager, not in __init__
- Fixes CI test failure in test_prefix_cache_manager.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix] Reset prefix cache when model weights are updating

- Call self.reset() before setting status to NORMAL in UPDATING state
- Ensure cache consistency when model weights change
- Consistent with CLEARING state handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Jiang-Jia-Jun pushed a commit that referenced this pull request Mar 2, 2026
…ent block allocation errors(#6531) (#6530)

* fix mtp acceptance rate decline cp

* [BugFix] Add safety checks in recycle_gpu_blocks to prevent block allocation errors

- Check prefix tree status before recycling GPU blocks
- Validate gpu_block_ids is a list
- Add overflow check to prevent free block count exceeding total blocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix] Fix AttributeError in recycle_gpu_blocks when prefix_tree_status_signal not initialized

- Add hasattr check before accessing prefix_tree_status_signal
- The signal is only initialized in launch_cache_messager, not in __init__
- Fixes CI test failure in test_prefix_cache_manager.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix] Reset prefix cache when model weights are updating

- Call self.reset() before setting status to NORMAL in UPDATING state
- Ensure cache consistency when model weights change
- Consistent with CLEARING state handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit ecfd088 into PaddlePaddle:develop Mar 2, 2026
20 of 24 checks passed
3 participants