Conversation
|
What's the argument for precomputing these? I forget, but the modules.cu work isn't included in the timing, right? So are we expecting these quantities to already be on the GPU when we integrate into CMSSW? Edit: Never mind, we discussed this in the meeting. I agree that including the modules.cu calculations in the timing should be handled in a separate PR, since it is outside the scope of this one. |
|
By the way, does the profiler show any benefit to the kernel timing from the reduced stalls? |
Yes, about 7%: from 1.98 ms to 1.84 ms on the A100 of the NVIDIA cluster. |
|
Thank you for these checks, Manos! Gavin has asked some good questions, and I think the commit is clear for me to merge.
As per the title, we identified some variables that were computed inside the kernels even though they are properties of the modules and can therefore be computed once up front. These variables are used in the Triplet and Quintuplet kernels.
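To illustrate the idea, here is a minimal host-side C++ sketch, not the actual code from this PR. The struct `ModuleGeometry`, the derived quantity `moduleRadius`, and all function names are hypothetical stand-ins; the real variables and their layout in modules.cu may look quite different. The "before" function recomputes a module property for every candidate, while the "after" function only looks up a value filled in once when the modules are loaded.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-module geometry; the field names are illustrative only.
struct ModuleGeometry {
    float x, y, z;
};

// Hypothetical derived quantity that depends only on the module itself.
inline float moduleRadius(const ModuleGeometry& m) {
    return std::sqrt(m.x * m.x + m.y * m.y);
}

// One-time precomputation, done when the module data is loaded.
std::vector<float> precomputeModuleRadii(const std::vector<ModuleGeometry>& modules) {
    std::vector<float> radii;
    radii.reserve(modules.size());
    for (const auto& m : modules) {
        radii.push_back(moduleRadius(m));
    }
    return radii;
}

// "Before": every candidate recomputes the property of its module.
float sumRadiiRecomputed(const std::vector<ModuleGeometry>& modules,
                         const std::vector<std::size_t>& candidateModule) {
    float sum = 0.f;
    for (std::size_t idx : candidateModule) {
        assert(idx < modules.size());
        sum += moduleRadius(modules[idx]);  // repeated work per candidate
    }
    return sum;
}

// "After": every candidate reads the precomputed per-module value.
float sumRadiiPrecomputed(const std::vector<float>& radii,
                          const std::vector<std::size_t>& candidateModule) {
    float sum = 0.f;
    for (std::size_t idx : candidateModule) {
        assert(idx < radii.size());
        sum += radii[idx];  // plain lookup, no arithmetic in the hot loop
    }
    return sum;
}
```

In a GPU kernel the same pattern would presumably replace arithmetic inside each thread with a load from a per-module array that is already resident on the device.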
On cgpu-1 (A30)

Before:
After:

I guess the timing difference is within the usual run-to-run variation.
Stall reduction

Before (T3):
After (T3):

Before (T5):

After (T5):
