Description
In `flash_fwd_mla_kernel.h` there are several double-buffering steps. In each of them the target offset is either `sK_offset / 8` or `sK_offset`, where `sK_offset` is equal to `576 * 64`. I don't understand the purpose of the `/ 8`.

If the iteration is meant to proceed along the N dimension, the offset into matrix K should naturally be `576 * 64`. Why is the additional division by 8 necessary?
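
To make the numbers concrete, here is the arithmetic for the two offsets. This is only a sketch, not code from the kernel: the constant names `kHeadDim`, `kBlockN`, `sK_offset_div8`, and `kBytesPerElem` are made up for illustration, and the 2-byte element size and the 16-byte-unit reading in the last check are my guesses, not something I have confirmed in the source.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not taken from flash_fwd_mla_kernel.h.
constexpr std::size_t kHeadDim = 576;  // row length of K, in elements
constexpr std::size_t kBlockN = 64;    // rows of K per block along N

// Offset of one K buffer, counted in elements.
constexpr std::size_t sK_offset = kHeadDim * kBlockN;

// The second offset that appears in the double-buffering steps.
constexpr std::size_t sK_offset_div8 = sK_offset / 8;

// Assumption: K holds 2-byte elements (fp16 or bf16).
constexpr std::size_t kBytesPerElem = 2;

static_assert(sK_offset == 36864, "576 * 64 elements");
static_assert(sK_offset_div8 == 4608, "576 * 64 / 8");

// Under the 2-byte assumption, sK_offset / 8 equals the size of one
// K buffer counted in 16-byte units (8 elements per unit).
static_assert(sK_offset * kBytesPerElem / 16 == sK_offset_div8,
              "same buffer size, counted in 16-byte units");
```

If that reading is right, both offsets would advance by exactly one K buffer, just measured in different units (elements versus 16-byte chunks). Is that the intent, or is the `/ 8` doing something else?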