[QST] Why is partition_contiguous_idx needed in the Congruous layout (crosswise is always set to 64)? #511

@peisun1115

Although the layout can still be partitioned into four 4x4 blocks (128 bits per element), this seems unnecessary when crosswise is 64, and it only adds extra computation. Couldn't the tile simply be treated as a single 8x8 block?
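The intuition can be checked as a bit identity. The sketch below is a simplified model, not CUTLASS's actual layout code. It assumes a tile of 8x8 128-bit vectors split into 2x2 partitions of 4x4 vectors, where the vector index inside a partition is XOR-ed with the row inside the partition and the partition index is XOR-ed with the partition row. The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical partitioned view: an 8x8 tile of 128-bit vectors split into
// 2x2 partitions of 4x4 vectors. c is the contiguous vector index and s the
// strided (row) index, both assumed to lie in [0, 8).
int partitioned_swizzle(int c, int s) {
  assert(0 <= c && c < 8 && 0 <= s && s < 8);
  int partition_c = c / 4;  // which 4-wide partition (0 or 1)
  int partition_s = s / 4;  // which 4-tall partition (0 or 1)
  int residual_c = c % 4;   // vector index inside the partition
  int residual_s = s % 4;   // row index inside the partition
  int permuted_partition = partition_c ^ partition_s;  // swizzle the partitions
  int permuted_residual = residual_c ^ residual_s;     // swizzle inside one
  return permuted_partition * 4 + permuted_residual;
}

// Hypothetical single-block view: one XOR of the two 3-bit indices.
int single_block_swizzle(int c, int s) {
  assert(0 <= c && c < 8 && 0 <= s && s < 8);
  return c ^ s;
}

// Counts the (c, s) pairs on which the two views disagree.
int count_swizzle_mismatches() {
  int mismatches = 0;
  for (int s = 0; s < 8; ++s)
    for (int c = 0; c < 8; ++c)
      if (partitioned_swizzle(c, s) != single_block_swizzle(c, s)) ++mismatches;
  return mismatches;
}
```

Because the partition index occupies bit 2 and the in-partition index occupies bits 0 and 1, swizzling each part separately and recombining gives the same result as a single XOR of the 3-bit indices, so `count_swizzle_mismatches()` returns 0 under this model.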

For example, consider these lines:

    if (Policy::LdsmShape::kContiguous == 4) {
      // Matrix multiply 1688 A/B
      // Q0 Q1 Q2 Q3 (Q stands for 1 8x128bit block).
      // Four blocks are next to each other in the contiguous dimension.
      partition_contiguous_idx = ((lane_in_quad_pair >> 2) ^ i);
      access_contiguous_idx = (quad_pair ^ lane_in_quad);
      access_strided_idx = lane_in_quad_pair;
    }
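For reference, here is a host-side model of what the snippet computes for each of the 32 lanes. It assumes, as context not shown in the snippet, that the final vector offset is `partition_contiguous_idx * 4 + access_contiguous_idx` (a 4-vector-wide partition) and that `i` selects one of two pointers (0 or 1). `VectorCoord` and the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical (column, row) position of one 128-bit vector access.
struct VectorCoord {
  int contiguous;  // 128-bit vector column inside the 8-vector-wide tile
  int strided;     // row
};

// Models the snippet above. Assumption: offset = partition * 4 + access.
VectorCoord original_access(int lane_id, int i) {
  assert(0 <= lane_id && lane_id < 32 && (i == 0 || i == 1));
  int quad_pair = lane_id >> 3;         // which group of 8 lanes (0..3)
  int lane_in_quad = lane_id & 3;       // lane within its quad (0..3)
  int lane_in_quad_pair = lane_id & 7;  // lane within its 8-lane group (0..7)

  int partition_contiguous_idx = (lane_in_quad_pair >> 2) ^ i;
  int access_contiguous_idx = quad_pair ^ lane_in_quad;
  int access_strided_idx = lane_in_quad_pair;

  return {partition_contiguous_idx * 4 + access_contiguous_idx,
          access_strided_idx};
}

// True if, for pointer i, the 8 lanes of every quad pair touch 8 distinct
// vector columns (8 distinct 16-byte slots of a 128-byte span).
bool quad_pairs_hit_distinct_columns(int i) {
  for (int quad_pair = 0; quad_pair < 4; ++quad_pair) {
    bool seen[8] = {false};
    for (int k = 0; k < 8; ++k) {
      int column = original_access(quad_pair * 8 + k, i).contiguous;
      if (seen[column]) return false;
      seen[column] = true;
    }
  }
  return true;
}
```

Within each group of 8 lanes, the 8 rows land in 8 distinct 16-byte column slots. If the row pitch is a multiple of 128 bytes, that spreads one 8x128-bit load across all 32 shared-memory banks, which is presumably the purpose of the swizzle.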

Why isn't it just written as the following?

    if (Policy::LdsmShape::kContiguous == 4) {
      // Matrix multiply 1688 A/B
      // Q0 Q1 Q2 Q3 (Q stands for 1 8x128bit block).
      // Four blocks are next to each other in the contiguous dimension.
      partition_contiguous_idx = 0; // not needed, 8x8 block, no partition
      access_contiguous_idx = (quad_pair + (i << 2)) ^ lane_in_quad_pair;
      access_strided_idx = lane_in_quad_pair;
    }
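The two versions can be compared exhaustively on the host. This sketch assumes the offset is formed as `partition_contiguous_idx * 4 + access_contiguous_idx`, with a 4-vector-wide partition, and that `i` is 0 or 1; the function names are hypothetical.

```cpp
#include <cassert>

// Offset from the original snippet (assumed: partition * 4 + access).
int original_offset(int lane_id, int i) {
  assert(0 <= lane_id && lane_id < 32 && i >= 0);
  int quad_pair = lane_id >> 3;
  int lane_in_quad = lane_id & 3;
  int lane_in_quad_pair = lane_id & 7;
  int partition_contiguous_idx = (lane_in_quad_pair >> 2) ^ i;
  int access_contiguous_idx = quad_pair ^ lane_in_quad;
  return partition_contiguous_idx * 4 + access_contiguous_idx;
}

// Offset from the proposed snippet: no partitioning, one 3-bit XOR.
int proposed_offset(int lane_id, int i) {
  assert(0 <= lane_id && lane_id < 32 && i >= 0);
  int quad_pair = lane_id >> 3;
  int lane_in_quad_pair = lane_id & 7;
  int partition_contiguous_idx = 0;  // not needed, 8x8 block
  int access_contiguous_idx = (quad_pair + (i << 2)) ^ lane_in_quad_pair;
  return partition_contiguous_idx * 4 + access_contiguous_idx;
}

// Number of (lane_id, i) combinations on which the two formulas disagree.
int count_offset_mismatches() {
  int mismatches = 0;
  for (int i = 0; i < 2; ++i)
    for (int lane_id = 0; lane_id < 32; ++lane_id)
      if (original_offset(lane_id, i) != proposed_offset(lane_id, i)) ++mismatches;
  return mismatches;
}
```

They agree because `quad_pair` is at most 3, so `quad_pair + (i << 2)` equals `(i << 2) | quad_pair` with no carry. XOR then acts independently on bit 2 and above, reproducing `(lane_in_quad_pair >> 2) ^ i`, and on bits 0 and 1, reproducing `quad_pair ^ lane_in_quad` (since `lane_in_quad_pair & 3 == lane_in_quad`).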

Is there any case where this partitioning is actually required?
