How does each block sync in a cluster? #851

@mengchihe

Hi
In sm90_mma_tma_gmma_ss.hpp, with a cluster shape of <2, 2, 1> for example, it seems that each block in the cluster issues two tma.multicast operations in each stage, for input A and input B, and that its full barrier's tx_count is the total amount of data the block needs in that stage.
My question: since each tma.multicast reduces the tx_count of every barrier in its mask, a block's full barrier would be decremented twice and would complete when only half of the data is ready. This would make sense if only the blocks with cluster.x = 0 or cluster.y = 0 issued tma.multicast. Is there a condition like that somewhere that I have missed? Thanks.
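To make the accounting in the question concrete, here is a small host-side sketch. It is a hypothetical model, not CUTLASS code: `StageModel`, `expected_tx_bytes`, `delivered_bytes`, and the 16 KiB tile sizes are all made up for illustration. It assumes the A tile is multicast among the blocks that differ only in cluster.y and the B tile among those that differ only in cluster.x, and it compares two assumptions about what each multicast carries: the full tile, or a 1/N slice of it. Which of the two matches the actual kernel is exactly what the question asks, so the sketch does not settle it.

```cpp
#include <cstdint>

// Hypothetical model of ONE block's full barrier in ONE pipeline stage.
// None of these names come from CUTLASS.
struct StageModel {
    std::uint32_t cluster_x;     // blocks along cluster.x
    std::uint32_t cluster_y;     // blocks along cluster.y
    std::uint32_t a_tile_bytes;  // bytes of the A tile this block consumes
    std::uint32_t b_tile_bytes;  // bytes of the B tile this block consumes
};

// tx_count the barrier is armed with: all the data this block needs.
constexpr std::uint32_t expected_tx_bytes(StageModel s) {
    return s.a_tile_bytes + s.b_tile_bytes;
}

// Bytes that actually land on this block's barrier.
// Assumption: cluster_y blocks multicast A to it, cluster_x blocks multicast B.
// full_tile_per_sender == true  -> every sender multicasts the whole tile.
// full_tile_per_sender == false -> every sender multicasts a 1/N slice.
constexpr std::uint32_t delivered_bytes(StageModel s, bool full_tile_per_sender) {
    return full_tile_per_sender
        ? s.cluster_y * s.a_tile_bytes + s.cluster_x * s.b_tile_bytes
        : s.cluster_y * (s.a_tile_bytes / s.cluster_y) +
          s.cluster_x * (s.b_tile_bytes / s.cluster_x);
}
```

For a <2, 2, 1> cluster with 16384-byte A and B tiles, `expected_tx_bytes` is 32768. Under the full-tile assumption `delivered_bytes` is 65536, so the barrier would reach zero after half of the multicasts, which is the concern described above. Under the slice assumption it is 32768 and the counts match.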
