[QST] Where is the MMA operation performed? #2187

@IzanCatalan

What is your question?
Hi, I am trying to understand how the MMA operation is performed inside the convolution kernel, specifically in this part of the code:

mma(params.gemm_k_iterations, accumulators, iterator_A, iterator_B, accumulators, params.gemm_k_iterations_per_channel);

What work does each thread do inside this call? Which channel filters does each thread compute using iterator_B? Where is the result stored, and is it in shared memory accessible to all threads?

A brief walkthrough of the workflow from this line of code to the end would greatly help me understand how the convolution is implemented on the GPU.

Thank you.
