In GQA, only one copy of the KV cache is stored per group, but SnapKV stores the KV cache with `num_key_value_heads * num_key_value_groups` heads. It is true that during KV cache eviction the selected tokens may differ between heads in the same group, but keeping a separate copy per head increases memory cost by a factor of `num_key_value_groups`. Is there a way to solve this?
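To make the cost concrete, here is a minimal sketch in plain Python. `kv_cache_bytes` estimates the K+V cache size, assuming fp16/bf16 (2 bytes per element), batch size 1 and a hypothetical Llama-style config (32 query heads, 8 KV heads, head dim 128, 32 layers, 1024 tokens kept after eviction). `group_shared_topk` is a hypothetical illustration of one possible mitigation, not SnapKV's code: average the per-query-head scores within each GQA group and keep the same tokens for every head in that group, so only `num_key_value_heads` copies are stored. It assumes query head `h` maps to KV head `h // num_key_value_groups`.

```python
from typing import List


def kv_cache_bytes(num_heads_stored: int, kept_tokens: int, head_dim: int,
                   num_layers: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Size of the K and V caches over all layers (factor 2 = keys + values)."""
    return 2 * batch * num_layers * num_heads_stored * kept_tokens * head_dim * bytes_per_elem


def group_shared_topk(scores: List[List[float]], num_key_value_groups: int,
                      k: int) -> List[List[int]]:
    """Hypothetical group-shared eviction: pool scores across each GQA group.

    scores[h][t] is the importance of token t for query head h; heads
    g*G .. g*G+G-1 (G = num_key_value_groups) share KV head g.
    Returns one sorted list of kept token indices per KV head.
    """
    num_heads = len(scores)
    assert num_heads % num_key_value_groups == 0
    num_kv_heads = num_heads // num_key_value_groups
    seq_len = len(scores[0])
    selected = []
    for g in range(num_kv_heads):
        group = scores[g * num_key_value_groups:(g + 1) * num_key_value_groups]
        # Mean score of each token across the query heads of this group.
        pooled = [sum(head[t] for head in group) / num_key_value_groups
                  for t in range(seq_len)]
        top = sorted(range(seq_len), key=lambda t: pooled[t], reverse=True)[:k]
        selected.append(sorted(top))
    return selected


# Hypothetical config: 32 query heads, 8 KV heads -> num_key_value_groups = 4.
num_attention_heads, num_key_value_heads = 32, 8
num_key_value_groups = num_attention_heads // num_key_value_heads
per_head = kv_cache_bytes(num_key_value_heads * num_key_value_groups, 1024, 128, 32)
shared = kv_cache_bytes(num_key_value_heads, 1024, 128, 32)
print(per_head // shared)                  # 4
print(shared / 2**20, per_head / 2**20)    # 128.0 512.0 (MiB)

# Toy scores: 4 query heads, 2 groups of 2, 4 tokens, keep 2 per KV head.
scores = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.8, 0.1],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.7, 0.0, 0.3],
]
print(group_shared_topk(scores, 2, 2))     # [[0, 2], [1, 2]]
```

The trade-off is the one raised above: pooling forces every head in a group to keep the same tokens (here head 0 would otherwise have kept token 1), in exchange for cutting the cache back by a factor of `num_key_value_groups`. Mean pooling is only one assumed choice; max pooling over the group would work the same way.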