Closed
Description
What is your question?
Hi, I am trying to understand how the MMA operation is performed inside the convolution kernel, specifically in this part of the code:
`mma(params.gemm_k_iterations, accumulators, iterator_A, iterator_B, accumulators, params.gemm_k_iterations_per_channel);`
What work does each thread do inside this call? Which channel filters does each thread compute using `iterator_B`? Where is the result stored, and is it in shared memory accessible to all threads?
A short walkthrough of the workflow from this line of code to the end of the kernel would greatly help me understand how the convolution is implemented on the GPU.
Thank you.