[offloader] fix async scheduling support with KV cache offloader #1596

Merged
AlpinDale merged 1 commit into main from offloader-async on Nov 4, 2025

Conversation

@AlpinDale

Member

No description provided.

Signed-off-by: AlpinDale <alpindale@gmail.com>
@AlpinDale AlpinDale merged commit 7016e79 into main Nov 4, 2025
1 check passed
@AlpinDale AlpinDale deleted the offloader-async branch November 4, 2025 11:01

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request addresses a crash in the KV cache offloader during asynchronous scheduling by removing an assertion. My review suggests reintroducing this assertion conditionally. This change ensures that the safety check remains active for synchronous operations, thereby enhancing code robustness, while still accommodating the specific requirements of asynchronous scheduling that led to the original issue.

Comment on lines +280 to +281
# NOTE: In async scheduling, placeholders may temporarily make
# len(req.block_hashes) < num_blocks * self.block_size_factor.

Contributor

Priority: high

While removing the assertion fixes the crash with async scheduling, it also removes a valuable safety check that can catch other potential bugs. A safer approach would be to make the assertion conditional, so it only applies to requests that are not currently undergoing an asynchronous KV cache load. This preserves the safeguard for synchronous cases.

            # The assertion is skipped for requests with an ongoing async load.
            if req_id not in self._reqs_being_loaded:
                num_gpu_blocks = num_blocks * self.block_size_factor
                assert len(req.block_hashes) >= num_gpu_blocks
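
To make the suggested guard concrete, here is a minimal, self-contained sketch of the conditional check. It is not the project's actual code: the `Request` dataclass, the `OffloadCheckSketch` class, the `check_block_hashes` method, and the choice of a `set` for `_reqs_being_loaded` are all assumptions made for illustration. Only the names `block_hashes`, `block_size_factor`, `num_blocks`, and `_reqs_being_loaded` come from the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical stand-in for the scheduler's request object."""
    req_id: str
    block_hashes: list[int] = field(default_factory=list)


class OffloadCheckSketch:
    """Hypothetical holder for the attributes named in the suggestion."""

    def __init__(self, block_size_factor: int) -> None:
        self.block_size_factor = block_size_factor
        # Assumed to be a set of request IDs with an async load in flight.
        self._reqs_being_loaded: set[str] = set()

    def check_block_hashes(self, req: Request, num_blocks: int) -> bool:
        """Return True if the assertion ran, False if it was skipped."""
        # Skip the check while an async load is ongoing for this request,
        # since its block hashes may not be fully populated yet.
        if req.req_id in self._reqs_being_loaded:
            return False
        num_gpu_blocks = num_blocks * self.block_size_factor
        assert len(req.block_hashes) >= num_gpu_blocks, (
            f"expected at least {num_gpu_blocks} block hashes, "
            f"got {len(req.block_hashes)}"
        )
        return True
```

For example, with `block_size_factor=4` and `num_blocks=3` the check requires at least 12 hashes. A request holding only 10 hashes would trip the assertion, unless its ID is in `_reqs_being_loaded`, in which case the check is skipped.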
