
Adjust kv_cache_size calculation based on local_attn_size #1008

Merged
gushiqiao merged 1 commit into main from gushiqiao-patch-1 on Apr 14, 2026

Adjust kv_cache_size calculation based on local_attn_size#1008
gushiqiao merged 1 commit intomainfrom
gushiqiao-patch-1

Conversation

@gushiqiao
Contributor

No description provided.

@gushiqiao gushiqiao merged commit 90457e8 into main Apr 14, 2026
2 checks passed
@gushiqiao gushiqiao deleted the gushiqiao-patch-1 branch April 14, 2026 05:04

@gemini-code-assist bot left a comment


Code Review

This pull request updates the KV cache initialization in transformer_infer.py to account for local attention settings. A review comment suggests using self.max_attention_size instead of manual calculation to maintain consistency with configuration overrides and avoid potential runtime crashes or memory waste.

Comment on lines +55 to +59
```python
if self.local_attn_size != -1:
    kv_cache_size = self.local_attn_size * self.frame_seq_length // ws
else:
    kv_cache_size = self._kv_size // ws
self.kv_cache_size = kv_cache_size
```

Severity: medium

The current logic for calculating kv_cache_size ignores the max_attention_size override from the configuration (cfg_max in reinit_caches). If cfg_max is set to a value larger than the calculated local_attn_size * self.frame_seq_length, the attention mechanism will attempt to access indices outside the allocated KV cache, leading to a crash. Conversely, if cfg_max is smaller, memory is wasted. Since self.max_attention_size already correctly accounts for this override and sequence parallelism when local_attn_size != -1, it should be used to set the cache size to ensure consistency and robustness.

Suggested change

```diff
-if self.local_attn_size != -1:
-    kv_cache_size = self.local_attn_size * self.frame_seq_length // ws
-else:
-    kv_cache_size = self._kv_size // ws
-self.kv_cache_size = kv_cache_size
+if self.local_attn_size != -1:
+    self.kv_cache_size = self.max_attention_size
+else:
+    self.kv_cache_size = self._kv_size // ws
```
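To make the reviewer's concern concrete, the sketch below contrasts the two sizing strategies. This is a hypothetical stand-in class, not the real `transformer_infer.py` code: `ws` is assumed to be a sequence-parallel world size, and `max_attention_size` is assumed to already fold in the `cfg_max` override, as the review comment describes.

```python
# Minimal sketch of the two kv_cache_size strategies discussed in the review.
# Assumptions (not from the PR diff itself): `ws` is the sequence-parallel
# world size, and `max_attention_size` already incorporates the cfg_max
# override when local_attn_size != -1.

class KVCacheSizer:
    def __init__(self, local_attn_size, frame_seq_length, kv_size, ws,
                 max_attention_size):
        self.local_attn_size = local_attn_size
        self.frame_seq_length = frame_seq_length
        self._kv_size = kv_size
        self.ws = ws
        self.max_attention_size = max_attention_size

    def size_before_fix(self):
        # Recomputes from local_attn_size, ignoring any cfg_max override.
        if self.local_attn_size != -1:
            return self.local_attn_size * self.frame_seq_length // self.ws
        return self._kv_size // self.ws

    def size_after_fix(self):
        # Reuses max_attention_size, which already accounts for the override.
        if self.local_attn_size != -1:
            return self.max_attention_size
        return self._kv_size // self.ws


# Illustrative numbers: a cfg_max override of 4096 exceeds the local
# window of 8 frames * 256 tokens = 2048.
m = KVCacheSizer(local_attn_size=8, frame_seq_length=256, kv_size=32768,
                 ws=1, max_attention_size=4096)
print(m.size_before_fix())  # 2048: attention addressing up to 4096 overruns
print(m.size_after_fix())   # 4096: cache matches what attention will address
```

With the pre-fix sizing, the cache (2048 slots) is smaller than the attention span the override permits (4096), which is exactly the out-of-bounds scenario the reviewer flags; the post-fix sizing keeps the two in lockstep by construction.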


2 participants