[v1] Move block_hashes from KVCacheManager to Request.block_hashes #19728
base: main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.

Force-pushed from 457540f to a34e6ac.

This pull request has merge conflicts that must be resolved before it can be merged.

Force-pushed from a34e6ac to 15001b4.

This pull request has merge conflicts that must be resolved before it can be merged.

Force-pushed from 15001b4 to 914876e.
This looks reasonable to me but would be good to get approval from @heheda12345 and @WoosukKwon.
My major concern is that the `block_hash` list is updated here:

`vllm/v1/core/block_pool.py`, line 177 in d3f05c9:

```python
block_hashes.append(block_hash)
```

This is quite far from the `Request` class. It makes the number of elements in the list look like magic to connectors, and hurts the modularity of the code. Therefore, if you want to put it into the `Request` class, can you move this logic from "update when a block hash is needed" to "update when the token ids of the request are updated"?
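For context, prefix-caching block hashes are chained: each full block's hash folds in the previous block's hash, so equal hashes imply equal prefixes. Below is a minimal sketch of that idea; `hash_fn`, the tuple layout, and the helper signatures are illustrative simplifications, not vLLM's exact helpers:

```python
from collections.abc import Callable, Sequence
from typing import Any, Optional


def hash_block_tokens(
    hash_fn: Callable[[Any], int],
    parent_block_hash: Optional[int],
    block_token_ids: Sequence[int],
    extra_keys: Optional[tuple] = None,
) -> int:
    # Chain the parent hash with this block's token ids (and any extra
    # keys, e.g. LoRA id or multimodal hashes) so that a match implies
    # the whole prefix matches, not just this one block.
    return hash_fn((parent_block_hash, tuple(block_token_ids), extra_keys))


def hash_request_tokens(hash_fn: Callable[[Any], int], block_size: int,
                        token_ids: Sequence[int]) -> list[int]:
    # Hash every *full* block of a token sequence; partial trailing
    # blocks are skipped until they fill up.
    hashes: list[int] = []
    parent: Optional[int] = None
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        parent = hash_block_tokens(hash_fn, parent,
                                   token_ids[start:start + block_size])
        hashes.append(parent)
    return hashes
```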
Today, I'm trying to interpret your suggestion "update when token ids of the request are updated".
Force-pushed from 914876e to 234f334.
Yes, I mean lines 223 to 240 in 72d14d0.
@heheda12345 Consider the following scenario: […]

So to sum up, because of the delay in setting of […]
But I think the hash of the prompt can be initialized when the request is created.
Force-pushed from ac256d3 to 908cb7b.

Force-pushed from 1b6c4b5 to 7403762.
This pull request has merge conflicts that must be resolved before it can be merged.
`vllm/utils/__init__.py` (outdated)

```diff
@@ -3236,6 +3236,25 @@ def sha256_cbor_64bit(input) -> int:
     return full_hash & ((1 << 64) - 1)


+@lru_cache
```
I think we don't need a cache. This function should only be called once during initialization.
`vllm/v1/core/sched/scheduler.py` (outdated)

```python
block_hash = hash_block_tokens(self.caching_hash_fn,
                               prev_block_hash_value,
                               block_tokens, extra_keys)
request.block_hashes.append(block_hash)
```
Is it OK to add the hash function as an attribute of the request? Then we can update the hashes when constructing the request and in `append_output_token_ids`.
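A minimal sketch of that design, assuming a chained `hash_fn` and a stand-in `BlockHash` alias (names are illustrative, not the PR's final code):

```python
from collections.abc import Callable
from typing import Optional

BlockHash = int  # stand-in for vLLM's real BlockHash type


class Request:
    def __init__(self, prompt_token_ids: list[int], block_size: int,
                 hash_fn: Callable) -> None:
        self.block_size = block_size
        self.hash_fn = hash_fn
        self.all_token_ids: list[int] = list(prompt_token_ids)
        self.block_hashes: list[BlockHash] = []
        # Hash the prompt's full blocks as soon as the request is created.
        self._update_block_hashes()

    def append_output_token_ids(self, token_ids: list[int]) -> None:
        self.all_token_ids.extend(token_ids)
        # Keep hashes in sync whenever the token ids are updated.
        self._update_block_hashes()

    def _update_block_hashes(self) -> None:
        # Hash only the newly completed full blocks, chaining each block
        # to its parent so equal hashes imply equal prefixes.
        start = len(self.block_hashes) * self.block_size
        while start + self.block_size <= len(self.all_token_ids):
            parent: Optional[BlockHash] = (
                self.block_hashes[-1] if self.block_hashes else None)
            block = tuple(self.all_token_ids[start:start + self.block_size])
            self.block_hashes.append(self.hash_fn((parent, block)))
            start += self.block_size
```

With this shape, no caller ever appends to `block_hashes` directly; the list is always consistent with the request's token ids.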
Force-pushed from 7403762 to 8d48ef9.
@heheda12345 @njhill Moved all hashing to be triggered by […]
Sorry for my late response. Can you add more comments in `hash_request_tokens` and `cache_full_blocks`? I think there are many tricks that need to be explained here.
`vllm/v1/request.py` (outdated)

```diff
@@ -108,8 +111,17 @@ def __init__(
         # indicates that the output is corrupted
         self.num_nans_in_logits = 0

+        self.block_hashes: list[BlockHash] = []
+        self.block_hasher: Optional[Callable[[Request],
```
Personally I'd prefer `self.block_hasher = partial(block_hasher, request=self)`, so that we can call it with `self.block_hasher()` without passing `self`.
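For illustration, the `functools.partial` binding looks roughly like this (a sketch; `block_hasher` here is a hypothetical free function, not the PR's actual one):

```python
from functools import partial


def block_hasher(request: "Request") -> list[int]:
    # Hypothetical free function: hash any newly completed full blocks
    # of `request` and return the new hashes.
    ...


class Request:
    def __init__(self) -> None:
        # Bind this request as the `request` argument once, up front...
        self.block_hasher = partial(block_hasher, request=self)

    def append_output_token_ids(self, token_ids: list[int]) -> None:
        # ...so later call sites stay short: no need to pass `self`.
        new_hashes = self.block_hasher()
```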
You mean in […]?
I want to say: move some comments in […]
Force-pushed from 5ea914f to 0241ee6.
@heheda12345 I made another set of changes.
LGTM! Thanks for your great work and patience.
@orozery could you rebase on latest main? Should hopefully help with at least some of the CI failures.
This commit moves the request block hashes from the KVCacheManager to the Request object itself. In particular, this will allow connectors to access the request block hashes.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Head branch was pushed to by a user without write access
Force-pushed from 0241ee6 to 564a359.
Done.
This PR moves the request block hashes from the KVCacheManager to the Request object itself. In particular, this will allow connectors to access the request block hashes (see the sketch below).

Part of the work described in RFC #19854.
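To illustrate what the move enables, a hypothetical KV connector could now read the hashes straight off the request. This is a sketch only; the connector shape and `remote_cache` client are assumptions, not vLLM's actual connector API:

```python
class ExternalKVConnector:
    """Toy connector that checks an external KV cache for reusable blocks."""

    def __init__(self, remote_cache) -> None:
        self.remote_cache = remote_cache  # e.g. a key-value store client

    def get_num_external_hits(self, request) -> int:
        # Before this PR, a connector had to re-hash tokens itself or reach
        # into KVCacheManager; now the hashes live on the request.
        hits = 0
        for block_hash in request.block_hashes:
            if not self.remote_cache.contains(block_hash):
                break  # chained prefix hashes: stop at the first miss
            hits += 1
        return hits
```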