Hi, if I disable vLLM's prefix caching (--no-enable-prefix-caching) and use UCM's NFS storage to offload the KV cache to an SSD, is there still a KV/prefix cache in HBM?
I also noticed that when I enabled vLLM's prefix caching (--enable-prefix-caching) and tested offloading the KV cache to the SSD, KV cache files were generated on the SSD as soon as the program started, even though my available GPU memory was much larger than the model size.
In that case, is the KV cache on the SSD the same as the KV cache in HBM?