Closed
Description
In the "Fully Packed Layout (THD)" under Case 3 on this page, I noticed the following description:
`Q = aabb`
`dimension = [B = 2, H = 1, S = 8, D = 64]`
`stride = [S × H × D = 512, D = 64, H × D = 64, 1]`
What confuses me is that, despite using ragged_tensors, the dimensions still appear the same as they would be without ragged_tensors.
From my understanding, ragged_tensors should offer two key benefits:
- Improved memory access efficiency (due to more compact data arrangement).
- Memory savings (when sequences within a batch have varying lengths, ragged_tensors provide a more compact memory layout, as shown by the example `Q = aabbb` instead of `Q[b=0] = aa000000, Q[b=1] = bbb00000`).
However, in this case, the dimensions are still given as [B, H, S, D], which seems to suggest that the purpose of using ragged_tensors here is purely to improve memory access efficiency, without any memory savings.
Could you kindly clarify whether my understanding is correct?
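To make the question concrete, here is a minimal sketch of the arithmetic I have in mind. It is not cuDNN code: the helper names are hypothetical, and the sequence lengths `[2, 3]` are an assumption taken from the `aabbb` example. Whether the packed count is what actually gets allocated is exactly what I am unsure about.

```python
# Illustrative arithmetic only; helper names are hypothetical, not cuDNN API.

def padded_elements(batch, heads, max_seq, dim):
    """Element count if every sequence is padded to max_seq."""
    return batch * heads * max_seq * dim


def packed_elements(seq_lens, heads, dim):
    """Element count if only the real tokens are stored back to back."""
    return sum(seq_lens) * heads * dim


def ragged_offsets(seq_lens, heads, dim):
    """Start offset (in elements) of each sequence in a packed buffer."""
    offsets = [0]
    for n in seq_lens:
        offsets.append(offsets[-1] + n * heads * dim)
    return offsets


B, H, S, D = 2, 1, 8, 64
seq_lens = [2, 3]  # assumed from the "aabbb" example

# Strides as listed in the documentation, in (B, H, S, D) order.
strides = [S * H * D, D, H * D, 1]

padded = padded_elements(B, H, S, D)        # implied by dimension [B, H, S, D]
packed = packed_elements(seq_lens, H, D)    # what a compact layout would need
offsets = ragged_offsets(seq_lens, H, D)
```

Under these assumptions the declared dimensions imply 1024 elements, while a compact layout would need only 320, which is the gap I am asking about.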