Skip to content

Conversation

@qyh111
Copy link
Contributor

@qyh111 qyh111 commented Sep 8, 2025

Purpose

What this PR does / why we need it?

Fix hole match problem when lookup and create
Remove some unnecessary parameters when build metadata

Modifications

Does this PR introduce any user-facing change?

No just metadata change

Test

image

Tested by offline inferrence
image
image
image

also use benchmark to test online service

How was this patch tested?

No new patch added

@qyh111 qyh111 changed the title [WIP]Dev qyh 0908 [WIP]refactor ucconnector Sep 8, 2025
@flesher0813
Copy link
Contributor

Find no location where _load_req_to_blocks is assigned, so the blocks that failed to load cannot be properly returned.

@qyh111 qyh111 changed the title [WIP]refactor ucconnector refactor ucconnector Sep 10, 2025
@ygwpz
Copy link
Contributor

ygwpz commented Sep 10, 2025

pr title should include [feature] / [xx]

@qyh111 qyh111 changed the title refactor ucconnector [Feature]refactor ucconnector Sep 11, 2025
assert len(fetch_block_ids) == len(fetch_block_hashes)
blocks_len = len(fetch_block_ids)

storage_block_ids = [block[0] for block in request.load_blocks]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add some comments to help understand this code, block_hash of ReqMeta.load_blocks is storage_block_ids could be confusing

Copy link
Contributor Author

@qyh111 qyh111 Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already add comments

)

if request.load_async:
if request.load_async and request.request_id in self.layerwise_load_tasks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the judgment of request.request_id in self.layerwise_load_tasks necessary? So as the judgment below

Copy link
Contributor Author

@qyh111 qyh111 Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case

+ save_param.num_blocks_to_save
]
blocks_len = len(vllm_block_ids)
storage_block_ids = [block[0] for block in request.dump_blocks]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add some comments to help understand this code too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already add comments


need_load_tokens = max(num_external_computed_tokens - num_computed_tokens, 0)
# Load async when Decode instance need to load.
# Load async when Decode instance need to load.kv_consumer"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove kv_consumer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ygwpz
Copy link
Contributor

ygwpz commented Sep 11, 2025

log format should be unified

@ygwpz ygwpz merged commit 62ed31f into ModelEngine-Group:develop Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants