[Misc]: Question about prefix cache #304

### Anything you want to discuss about ucm.

Hi, if I disable vLLM's prefix caching (`--no-enable-prefix-caching`) and use UCM's NFS storage to offload the KV cache to an SSD, is there still a KV/prefix cache in HBM?
I also noticed that when I enabled vLLM's prefix caching (`--enable-prefix-caching`) and tested offloading the KV cache to the SSD, KV cache files were still generated on the SSD as soon as the program started, even though my available GPU memory was much larger than the model size.
In that case, is the KV cache on the SSD the same as the KV cache in HBM?
