[Feature]refactor ucconnector #167
Conversation
I can't find any location where `_load_req_to_blocks` is assigned, so the blocks that failed to load cannot be properly returned.
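The issue raised above can be illustrated with a minimal sketch. All class and method names here are hypothetical, chosen only to show the failure mode; they are not the actual connector API:

```python
# Minimal sketch of the concern above: if _load_req_to_blocks is never
# assigned anywhere, the failure path can only ever return an empty list.
# ConnectorSketch, schedule_load, and failed_blocks are hypothetical names.

class ConnectorSketch:
    def __init__(self):
        # request_id -> list of block ids scheduled for loading
        self._load_req_to_blocks: dict[str, list[int]] = {}

    def schedule_load(self, request_id: str, block_ids: list[int]) -> None:
        # Without an assignment like this somewhere in the connector,
        # the dict stays empty forever, which is the bug being pointed out.
        self._load_req_to_blocks[request_id] = list(block_ids)

    def failed_blocks(self, request_id: str) -> list[int]:
        # Blocks recorded for a request whose load failed; empty if the
        # request was never registered.
        return self._load_req_to_blocks.get(request_id, [])

c = ConnectorSketch()
c.schedule_load("req-1", [3, 5, 7])
print(c.failed_blocks("req-1"))  # [3, 5, 7]
print(c.failed_blocks("req-2"))  # []
```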
Force-pushed 50f88e0 to 50ab55d
The PR title should include a tag such as [feature] / [xx].
assert len(fetch_block_ids) == len(fetch_block_hashes)
blocks_len = len(fetch_block_ids)

storage_block_ids = [block[0] for block in request.load_blocks]
It would be nice to add some comments to help understand this code; the fact that the `block_hash` of `ReqMeta.load_blocks` is actually `storage_block_ids` could be confusing.
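One way to resolve the confusion is an inline comment spelling out the tuple layout. The following is purely illustrative; the `(storage_block_id, vllm_block_id)` layout is an assumption inferred from the snippet above, not taken from the real `ReqMeta` definition:

```python
# Illustrative sketch of the kind of comment being requested.
# The tuple layout (storage_block_id, vllm_block_id) is an assumption
# inferred from the diff, not the actual ReqMeta definition.
load_blocks = [("hash_a", 0), ("hash_b", 1), ("hash_c", 2)]

# Each entry of load_blocks is (storage_block_id, vllm_block_id):
# block[0] is the key used in external storage, so despite being called
# a "block_hash" it is really the storage-side block id.
storage_block_ids = [block[0] for block in load_blocks]
vllm_block_ids = [block[1] for block in load_blocks]
print(storage_block_ids)  # ['hash_a', 'hash_b', 'hash_c']
```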
Already added comments.
)

- if request.load_async:
+ if request.load_async and request.request_id in self.layerwise_load_tasks:
Is the check `request.request_id in self.layerwise_load_tasks` necessary? Same for the check below.
Just in case.
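The defensive check being discussed can be sketched as follows. The shape of the task table is an assumption for illustration only:

```python
# Sketch of the membership guard under discussion: touching the task
# table only when the request id is present avoids a KeyError for a
# request that was never registered (e.g. aborted before loading began).
layerwise_load_tasks: dict[str, list[str]] = {"req-1": ["layer0", "layer1"]}

def finish_load(request_id: str, load_async: bool) -> list[str]:
    # Guard: skip requests that have no registered layerwise tasks.
    if load_async and request_id in layerwise_load_tasks:
        return layerwise_load_tasks.pop(request_id)
    return []

print(finish_load("req-1", True))  # ['layer0', 'layer1']
print(finish_load("req-2", True))  # []  (no KeyError thanks to the guard)
```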
    + save_param.num_blocks_to_save
]
blocks_len = len(vllm_block_ids)
storage_block_ids = [block[0] for block in request.dump_blocks]
It would be nice to add some comments to help understand this code too.
Already added comments.
ucm/integration/vllm/uc_connector.py (outdated)
need_load_tokens = max(num_external_computed_tokens - num_computed_tokens, 0)
- # Load async when Decode instance need to load.
+ # Load async when Decode instance need to load.kv_consumer"
Remove the stray `kv_consumer`.
Fixed.
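The `max(..., 0)` clamp in the snippet above can be shown with a tiny worked example (the function wrapper is just for illustration):

```python
# Worked example of the clamp in the snippet above: the number of tokens
# to load from external storage is the excess of externally computed
# tokens over locally computed ones, floored at zero.
def need_load_tokens(num_external_computed_tokens: int,
                     num_computed_tokens: int) -> int:
    return max(num_external_computed_tokens - num_computed_tokens, 0)

print(need_load_tokens(128, 48))  # 80 -> 80 tokens must be loaded
print(need_load_tokens(48, 128))  # 0  -> local cache already covers it
```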
Force-pushed e475a01 to 628f6fe
Force-pushed 628f6fe to ac7b6a3
The log format should be unified.
Purpose
What this PR does / why we need it?
Fix the hole-match problem during lookup and create
Remove some unnecessary parameters when building metadata
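The PR does not spell out the hole-match fix, but the general idea can be sketched. This is entirely an assumption about hole matching in a block-cache lookup, not this PR's implementation:

```python
# Rough illustration of "hole matching" in a block-cache lookup (an
# assumption about the general idea, not this PR's actual code): only a
# contiguous prefix of cache hits is usable, so the scan must stop at
# the first missing block instead of skipping over holes.
def contiguous_hits(block_hashes: list[str], cache: set[str]) -> list[str]:
    hits: list[str] = []
    for h in block_hashes:
        if h not in cache:
            break  # a hole ends the usable prefix
        hits.append(h)
    return hits

cache = {"a", "b", "d"}
print(contiguous_hits(["a", "b", "c", "d"], cache))  # ['a', 'b'] ('c' is a hole)
```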
Modifications
Does this PR introduce any user-facing change?
No, just a metadata change.
Test
Tested by offline inference (screenshots attached).
Also used a benchmark to test the online service.
How was this patch tested?
No new test patch was added.