avoid the dPhi calculation in createTriplet by YonsiG · Pull Request #296 · SegmentLinking/TrackLooper

YonsiG · 2023-06-22T17:09:28Z

The dPhi calculation is very time consuming according to the profiler: it is responsible for the largest warp stalls in the Triplet kernel. The phi information is already stored in mdsInGPU, so part of the dPhi calculation in T3 creation can be avoided by reading the stored variable directly.
After this change, the timing of the T3 kernel decreases. The decrease is visible from single-stream to multi-stream running.
As a validation check, using the stored phi instead of the value computed from x and y gives the same physics performance as before.
master: http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/fix_dphi/master_again_again_PU200_NEVT-1_b498de-PU200/compare/TC_eff_base_0.html
after this PR: http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/fix_dphi/T3T5_removedPhi_PU200_NEVT-1_61a75dD-PU200/compare/TC_eff_base_0.html
The master timing is:

The timing after using anchorPhi in the Triplet kernels:

Furthermore, the same change is applied to the Quintuplet.cu kernels. The timing after using anchorPhi in the Quintuplet kernels:

The Segments kernels do not use deltaPhi much, so optimizing it there does not give an obvious performance gain. Timing after the change:
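The substitution described in this comment can be illustrated with a minimal host-side C++ sketch. The function names below are hypothetical and are not the TrackLooper API; the sketch only assumes that a phi value was stored for each mini-doublet when it was created, and it uses `std::remainder` as a stand-in for whatever angle wrapping the real code applies.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; not the TrackLooper API.
constexpr float kPi = 3.14159265358979323846f;

// Wrap an angle difference into [-pi, pi] using the IEEE remainder.
inline float wrapToPi(float a) {
    return std::remainder(a, 2.0f * kPi);
}

// "Before": phi is recomputed from the coordinates for every pair,
// which costs two atan2 evaluations per call.
inline float deltaPhiFromXY(float x1, float y1, float x2, float y2) {
    return wrapToPi(std::atan2(y2, x2) - std::atan2(y1, x1));
}

// "After": phi was computed once and stored, so the pair-level work
// is a subtraction and a wrap.
inline float deltaPhiFromStored(float phi1, float phi2) {
    return wrapToPi(phi2 - phi1);
}
```

Both functions return the same value when the stored phi was produced by the same `atan2` call, which is consistent with the unchanged physics performance reported above.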

slava77 · 2023-06-22T17:31:14Z

@YonsiG
please prepare/post the profiler reports

YonsiG · 2023-06-22T17:49:12Z

According to the profiling results, the kernel time for createTripletsInGPUv2 decreases by 25%. createQuintupletsInGPUv2 also speeds up by ~25%, and createSegmentsInGPUv2 by ~3%.
Profiler results on cgpu
Profiler results with the simplified dPhi:
http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/fix_dphi/profiling_fixdPhiLST3T5.ncu-rep
Profiler results for master:
http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/fix_dphi/profiling_master_cgpu2.ncu-rep

YonsiG · 2023-06-28T17:42:39Z

Adding the high-edge and low-edge phi does not seem to help according to sdl_timing.

However, according to the profiler reports, it decreases the LS kernel time by ~0.01 ms, the T3 kernel time by ~0.1 ms, and the T5 kernel time by ~0.15 ms, at the cost of increasing the MD kernel time by 0.1 ms. At least it is going in the right direction.
http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/fix_dphi/profiling_precomputedPhi.ncu-rep
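The trade-off reported here (slightly more work in the MD kernel, less in the LS, T3 and T5 kernels) follows from computing phi once per mini-doublet rather than once per candidate pair. The sketch below is a rough host-side illustration under that assumption; `MiniDoubletsSketch` and both functions are hypothetical names, and the call counts are an idealized model, not a measurement of the real kernels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container, loosely modelled on the
// field names mentioned in this PR. Not the real mdsInGPU layout.
struct MiniDoubletsSketch {
    std::vector<float> anchorX;
    std::vector<float> anchorY;
    std::vector<float> anchorPhi;  // filled once per mini-doublet
};

// Producer stage: one atan2 per mini-doublet.
// Returns the number of atan2 evaluations performed.
inline std::size_t fillAnchorPhi(MiniDoubletsSketch& mds) {
    mds.anchorPhi.resize(mds.anchorX.size());
    for (std::size_t i = 0; i < mds.anchorX.size(); ++i) {
        mds.anchorPhi[i] = std::atan2(mds.anchorY[i], mds.anchorX[i]);
    }
    return mds.anchorX.size();
}

// Consumer stage without stored phi: two atan2 evaluations for every
// candidate pair that needs a deltaPhi.
inline std::size_t atan2CallsOnTheFly(std::size_t nPairs) {
    return 2 * nPairs;
}
```

In this model, four mini-doublets combined into all six possible pairs need 12 on-the-fly evaluations but only 4 precomputed ones, and the gap widens as each mini-doublet is reused in more combinations.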

VourMa · 2023-06-30T15:12:46Z

@YonsiG I am unclear on the timeline for this PR. What else needs to be done and when do you foresee it to be ready for review? Or is it ready already?

YonsiG · 2023-06-30T16:00:28Z

Hi Manos, this PR is ready for review. I have finished adding two variables stored in the MD for later use: HighEdgePhi and LowEdgePhi.

VourMa · 2023-06-30T17:18:06Z

@YonsiG As I am starting to review this, could you post some screenshots from the profiler results, so that people have a general picture of what you improved and can know where to look for more in the profiler? For example, it would be interesting to compare how the stalls change in the lines you modified.

YonsiG · 2023-07-04T19:23:02Z

I can paste a few profiler screenshots here for quick reference. This compares the T5 creation kernel in master with the same kernel after the change. Green is the master baseline and blue is the current state after the change. The "stalls wait" and "stalls no instructions" have been reduced a lot, while the third-largest, "stall long scoreboard", increased a bit.

The overall memory throughput has increased.

YonsiG · 2023-07-04T19:31:31Z

This is the atan line in Hit.cuh, shown with its warp stall statistics and instructions executed in the T5 creation kernels. The per-line numbers are a bit hard to read, since they do not precisely match the warp usage of each line and they shift between neighboring lines. Still, the numbers generally decrease after the change.
In master:

After the change:

VourMa · 2023-07-04T21:39:34Z

Thanks for the profiler screenshots, they are useful. Could you modify the comment with the one where you show specific lines, so that you explain the color code and the different numbers?

VourMa

Thanks a lot for the updates, @YonsiG! Merging this...

VourMa · 2023-07-27T00:23:37Z

+    mdsInGPU.anchorHighEdgePhi[idx] = atan2f(mdsInGPU.anchorHighEdgeY[idx], mdsInGPU.anchorHighEdgeX[idx]);
+    mdsInGPU.anchorLowEdgePhi[idx] = atan2f(mdsInGPU.anchorLowEdgeY[idx], mdsInGPU.anchorLowEdgeX[idx]);


Should we be applying the phi function here now?

I think we can, using phi(x,y)
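For readers unfamiliar with the wrapper being discussed, here is a minimal sketch of what a `phi(x, y)` helper with a guarded range reduction could look like. It is written from the commit titles in this PR ("add protection on phi_mpi_pi", "change the atan2 function into a wrapped SDL::phi function") and is an assumption, not the code in SDL/Hit.cuh; the `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // hypothetical namespace, not SDL

constexpr float kPi = 3.14159265358979323846f;

// Reduce an angle to [-pi, pi]. NaN is returned unchanged, and values
// already in range skip the rounding step.
inline float phi_mpi_pi(float a) {
    if (std::isnan(a)) return a;
    if (std::fabs(a) <= kPi) return a;
    const float n = std::round(a / (2.0f * kPi));
    return a - n * 2.0f * kPi;
}

// Azimuthal angle of the point (x, y). The argument order is (x, y),
// the reverse of atan2(y, x).
inline float phi(float x, float y) {
    return phi_mpi_pi(std::atan2(y, x));
}

}  // namespace sketch
```

With a wrapper of this shape, the two assignments quoted above would pass the X array element first and the Y array element second. The swapped argument order relative to `atan2f` is the detail most worth double-checking when making that replacement.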

avoid the dPhi calculation in createTriplet

61a75dc

slava77 reviewed Jun 22, 2023

Comment thread SDL/Hit.cuh Outdated

add anchorPhi in T5 calculation

b63c67a

GNiendorf reviewed Jun 23, 2023

Comment thread SDL/Quintuplet.cu Outdated

YonsiG added 2 commits June 28, 2023 10:00

add pre-computed dPhi in segments

b098104

pre-compute high edge phi and low edge phi

58b9bd3

VourMa reviewed Jun 30, 2023

Comment thread SDL/Hit.cuh Outdated

Comment thread SDL/Quintuplet.cu Outdated

YonsiG added 2 commits July 7, 2023 08:53

add protection on phi_mpi_pi

32c54ff

change the atan2 function into a wrapped SDL::phi function

0acd6fc

VourMa approved these changes Jul 8, 2023

VourMa merged commit 6e6e2d0 into master Jul 8, 2023

slava77 reviewed Jul 26, 2023

Comment thread SDL/Hit.cuh

VourMa reviewed Jul 27, 2023

YonsiG mentioned this pull request Jul 27, 2023

Fixing hackathon bugs #308

Closed

ariostas deleted the fix-dPhi branch May 8, 2024 21:06
