Some questions about the chunk and window #2
Nice work!
I didn't quite understand some of the settings in the paper. What does "chunk" refer to in the paper, and how does it differ from "window"? Doesn't the key-value cache only exist within the window?
Besides, the paper states, "For a non-keyframe i assigned to keyframe k, its tokens attend only to tokens from k and from frames between k and i under the causal or window mask." So non-keyframe k+1 can only interact with the KV cache of frame k? Wouldn't such a small attention window (1?) drift easily? Is it equivalent to restarting attention from k? Am I misunderstanding something?
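To make my reading concrete, here is a minimal sketch of how I currently interpret the quoted rule. It is not taken from the repository: `visible_frames` and the keyframe indices are hypothetical, and I am assuming that each frame is assigned to the most recent keyframe at or before it and that "between k and i" includes frame i itself.

```python
def visible_frames(i, keyframes):
    """Frames that frame i may attend to, under my reading of the quoted rule.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # Assumption: frame i is assigned to the most recent keyframe k <= i.
    k = max(kf for kf in keyframes if kf <= i)
    # Assumption: "frames between k and i" includes frame i itself (causal mask).
    # For a keyframe (i == k) this returns only [k]; the quoted sentence does
    # not say what keyframes attend to, so that case is a placeholder.
    return list(range(k, i + 1))


if __name__ == "__main__":
    keyframes = [0, 4]  # made-up keyframe positions
    for i in range(8):
        print(i, visible_frames(i, keyframes))
```

Under these assumptions, frame 5 with keyframes at 0 and 4 sees only frames 4 and 5, which is the "window of 1" case I am asking about.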
Looking forward to your reply.