[Misc]: Question about prefix cache #304

@LIULIU7450

Anything you want to discuss about ucm.

Hi, if I disable vLLM's prefix caching (--no-enable-prefix-caching) and use UCM's NFS storage to offload the KV cache to an SSD, is there still a KV/prefix cache in HBM?
I also noticed that when I enabled vLLM's prefix caching (--enable-prefix-caching) and tested offloading the KV cache to the SSD, KV cache files were still generated on the SSD at the start of the program, even though my available GPU memory was much larger than the model size.
In that case, is the KV cache on the SSD the same as the KV cache in HBM?
