[QST] Where is the MMA operation performed?

**What is your question?**
Hi, I am trying to understand how the MMA operation is performed inside the convolution kernel, specifically in this part of the code:

https://github.com/NVIDIA/cutlass/blob/62750a2b75c802660e4894434dc55e839f322277/include/cutlass/conv/kernel/implicit_gemm_convolution.h#L350

What work does each thread do in this code? Which filter channels does each thread compute using iteratorB? Where is the result stored, and is it in shared memory that all threads can access?

A brief walkthrough of the workflow from this line of code to the end would greatly help me understand how the convolution is implemented on the GPU.

Thank you. 
