[offloader] fix async scheduling support with KV cache offloader#1596
Conversation
Signed-off-by: AlpinDale <alpindale@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request addresses a crash in the KV cache offloader during asynchronous scheduling by removing an assertion. My review suggests reintroducing this assertion conditionally. This change ensures that the safety check remains active for synchronous operations, thereby enhancing code robustness, while still accommodating the specific requirements of asynchronous scheduling that led to the original issue.
| # NOTE: In async scheduling, placeholders may temporarily make | ||
| # len(req.block_hashes) < num_blocks * self.block_size_factor. |
There was a problem hiding this comment.
While removing the assertion fixes the crash with async scheduling, it also removes a valuable safety check that can catch other potential bugs. A safer approach would be to make the assertion conditional, so it only applies to requests that are not currently undergoing an asynchronous KV cache load. This preserves the safeguard for synchronous cases.
# The assertion is skipped for requests with an ongoing async load.
if req_id not in self._reqs_being_loaded:
num_gpu_blocks = num_blocks * self.block_size_factor
assert len(req.block_hashes) >= num_gpu_blocks
No description provided.