TPCClusterFinder: Improve performance of noisy pad filter on CPU. #5249

Merged
merged 1 commit into from Jan 26, 2021

@fweig (Contributor) commented Jan 21, 2021

This PR adds a separate, faster CPU implementation of the noisy pad filter.
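For context, a noisy pad filter flags pads whose signal looks like noise (for example, a pad that carries charge in an unusually large share of time bins) so that later cluster-finding steps can ignore them. The exact criterion used by `GPUTPCCFCheckPadBaseline` is not stated in this thread. The sketch below uses a hypothetical one; the names `findNoisyPads`, `threshold` and `maxActiveFraction`, and the pad-major layout, are illustrative assumptions rather than O2 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical noisy-pad criterion, for illustration only: a pad is flagged
// when more than `maxActiveFraction` of its time bins carry a charge above
// `threshold`. Charges are assumed to be stored pad-major:
// charges[pad * nTimeBins + timeBin].
std::vector<bool> findNoisyPads(const std::vector<float>& charges,
                                std::size_t nPads, std::size_t nTimeBins,
                                float threshold, float maxActiveFraction)
{
  std::vector<bool> noisy(nPads, false);
  for (std::size_t pad = 0; pad < nPads; ++pad) {
    std::size_t active = 0;
    for (std::size_t t = 0; t < nTimeBins; ++t) {
      if (charges[pad * nTimeBins + t] > threshold) {
        ++active;
      }
    }
    // Compare counts instead of dividing, so nTimeBins == 0 is safe.
    noisy[pad] = static_cast<float>(active) >
                 maxActiveFraction * static_cast<float>(nTimeBins);
  }
  return noisy;
}
```

With two pads and four time bins, a pad active in three bins is flagged at `maxActiveFraction = 0.5`, while a pad active in one bin is not.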

@fweig requested a review from davidrohr as a code owner January 21, 2021 10:58

@davidrohr (Collaborator) left a comment

@fweig
I just tested this on CPU and GPU. On the NVIDIA GPU the output is the same as before and the performance is 50% better, which is quite nice.

But on the CPU something must be wrong. The performance does not seem to have improved; it is more or less identical to before, or sometimes worse.
And on some events I see that the number of clusters changes significantly. This seems to happen when I change the number of threads; if I run with OMP_NUM_THREADS=1, the output seems correct. I also believe this might happen when there are empty TPC sectors. Could you have a look?
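A common way to make a per-pad filter give the same result for any thread count (a general pattern, not necessarily what this PR does) is to give every pad exactly one writer and keep accumulators local. The hypothetical sketch below splits the pads into disjoint, contiguous chunks. Here the chunks run one after another, but each could be handed to its own thread without changing the output. An empty sector (no pads or no time bins) returns early without touching any other state. All names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts, for each pad in [padBegin, padEnd), how many time bins carry charge
// above `threshold`. Each pad's counter is written by exactly one call, so
// calls on disjoint ranges never race. Layout: charges[pad * nTimeBins + t].
void countActiveBins(const std::vector<float>& charges, std::size_t nTimeBins,
                     float threshold, std::size_t padBegin, std::size_t padEnd,
                     std::vector<std::size_t>& activeBins)
{
  for (std::size_t pad = padBegin; pad < padEnd; ++pad) {
    std::size_t active = 0; // local accumulator, no shared state
    for (std::size_t t = 0; t < nTimeBins; ++t) {
      active += charges[pad * nTimeBins + t] > threshold ? 1 : 0;
    }
    activeBins[pad] = active;
  }
}

// Splits the pads into `nWorkers` contiguous, disjoint chunks. The chunks run
// sequentially here; dispatching each chunk to a separate thread (e.g. via an
// OpenMP loop over `w`) would not change the result.
std::vector<std::size_t> countActiveBinsChunked(const std::vector<float>& charges,
                                                std::size_t nPads, std::size_t nTimeBins,
                                                float threshold, std::size_t nWorkers)
{
  std::vector<std::size_t> activeBins(nPads, 0);
  if (nPads == 0 || nTimeBins == 0) {
    return activeBins; // empty sector: nothing to count
  }
  nWorkers = std::max<std::size_t>(nWorkers, 1);
  const std::size_t chunk = (nPads + nWorkers - 1) / nWorkers;
  for (std::size_t w = 0; w < nWorkers; ++w) {
    const std::size_t begin = std::min(nPads, w * chunk);
    const std::size_t end = std::min(nPads, begin + chunk);
    countActiveBins(charges, nTimeBins, threshold, begin, end, activeBins);
  }
  return activeBins;
}
```

Because no two chunks share an output slot, comparing the result for 1, 2, 3 or more workers is a cheap regression check for the kind of thread-count dependence described above.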

@davidrohr (Collaborator)

For reference, here is the performance output I see (obtained with OMP_NUM_THREADS=1 o2-tpc-reco-workflow -b --infile tpcdigits.root --configKeyValues "GPU_rec.TrackReferenceX=83;GPU_proc.debugLevel=1" --output-type clusters,tracks):

[77380:tpc-tracker]: [17:20:30][INFO] Event has 774 8kb TPC ZS pages, 22545 digits
[77380:tpc-tracker]: [17:20:34][INFO] Event has 2898 TPC Clusters, 0 TRD Tracklets
[77380:tpc-tracker]: Execution Time: Task (K      188x):                                      GPUMemClean16 Time:     461893 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):          GPUTPCCFChargeMapFiller_findFragmentStart Time:         61 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                                   GPUTPCCFDecodeZS Time:        771 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                                GPUTPCConvertKernel Time:        323 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCCreateSliceData Time:        169 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                             GPUTPCNeighboursFinder Time:        238 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                            GPUTPCNeighboursCleaner Time:         32 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCStartHitsFinder Time:         27 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):              GPUTPCTrackletConstructor_singleSlice Time:        417 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                             GPUTPCTrackletSelector Time:         18 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                    GPUTPCGlobalTrackingCopyNumbers Time:          2 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                               GPUTPCGlobalTracking Time:         71 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                       GPUTPCGMMergerUnpackResetIds Time:          2 us
[77380:tpc-tracker]: Execution Time: Task (K       73x):                     GPUTPCGMMergerUnpackSaveNumber Time:          2 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                           GPUTPCGMMergerSliceRefit Time:        526 us
[77380:tpc-tracker]: Execution Time: Task (K       36x):                         GPUTPCGMMergerUnpackGlobal Time:          1 us
[77380:tpc-tracker]: Execution Time: Task (K        3x):                           GPUTPCGMMergerClearLinks Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                   GPUTPCGMMergerMergeWithinPrepare Time:          1 us
[77380:tpc-tracker]: Execution Time: Task (K      180x):                   GPUTPCGMMergerMergeBorders_step0 Time:         11 us
[77380:tpc-tracker]: Execution Time: Task (K        6x):                   GPUTPCGMMergerMergeBorders_step1 Time:          6 us
[77380:tpc-tracker]: Execution Time: Task (K      180x):                   GPUTPCGMMergerMergeBorders_step2 Time:          9 us
[77380:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step0 Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step1 Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step2 Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step3 Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step4 Time:          1 us
[77380:tpc-tracker]: Execution Time: Task (K        3x):                   GPUTPCGMMergerMergeSlicesPrepare Time:          5 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                     GPUTPCGMMergerLinkGlobalTracks Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMMergerCollect Time:         21 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMMergerMergeCE Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step0 Time:          0 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                        GPUTPCGMMergerSortTracksQPt Time:          1 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step1 Time:          8 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step2 Time:          5 us
[77380:tpc-tracker]: Execution Time: Task (K        2x):                             GPUTPCGMMergerTrackFit Time:       1235 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step0 Time:          1 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step1 Time:          7 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step2 Time:          3 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):             GPUTPCCompressionKernels_step0attached Time:        204 us
[77380:tpc-tracker]: Execution Time: Task (K        1x):           GPUTPCCompressionKernels_step1unattached Time:        105 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):               GPUTPCCFChargeMapFiller_fillIndexMap Time:      22259 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):                           GPUTPCCFCheckPadBaseline Time:    3528825 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):                                 GPUTPCCFPeakFinder Time:       1634 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):          GPUTPCCFNoiseSuppression_noiseSuppression Time:       1121 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):               GPUTPCCFNoiseSuppression_updatePeaks Time:         28 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):                              GPUTPCCFDeconvolution Time:       1011 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):                                GPUTPCCFClusterizer Time:       2437 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):             GPUTPCCFMCLabelFlattener_setRowOffsets Time:         26 us
[77380:tpc-tracker]: Execution Time: Task (K       23x):                   GPUTPCCFMCLabelFlattener_flatten Time:        114 us
[77380:tpc-tracker]: Execution Time: Step              :       Tasks                     TPC Transformation Time:        323 us ( Total Time :            332 us)
[77380:tpc-tracker]: Execution Time: Step              :       Tasks                    TPC Sector Tracking Time:        977 us ( Total Time :           1171 us)
[77380:tpc-tracker]: Execution Time: Step              :       Tasks              TPC Track Merging and Fit Time:       1858 us ( Total Time :           1953 us)
[77380:tpc-tracker]: Execution Time: Step              :       Tasks                        TPC Compression Time:        309 us ( Total Time :           2438 us)
[77380:tpc-tracker]: Execution Time: Step              :       Tasks                    TPC Cluster Finding Time:    4020185 us ( Total Time :        4037049 us)
[77380:tpc-tracker]: Execution Time: General Step      :                                            Prepare Time:       2509 us
[77380:tpc-tracker]: Execution Time: Total   :                                       Total Kernel Time:    4023654 us
[77380:tpc-tracker]: Execution Time: Total   :                                         Total Wall Time:    4042965 us
[77380:tpc-tracker]: [17:20:34][INFO] found 20 track(s)
[77380:tpc-tracker]: [17:20:34][INFO] sending 20 track label(s)
[77380:tpc-tracker]: [17:20:34][INFO] TPC CATracker time for this TF 4.04 s
[77425:tpc-track-writer]: [17:20:34][INFO] writing 20 track(s)

The baseline filter takes 3.5 seconds, which I would say is much too long compared to the rest. If we cannot get this down significantly, we have to think about a different algorithm.
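To put the 3.5 s in perspective, the shares can be computed directly from the single-thread log above. This trivial sketch only restates numbers from that log; the constant and function names are illustrative.

```cpp
#include <cassert>

// Values copied from the OMP_NUM_THREADS=1 log above, in microseconds.
constexpr double kPadBaselineUs = 3528825.0;    // GPUTPCCFCheckPadBaseline
constexpr double kClusterFindingUs = 4020185.0; // TPC Cluster Finding step
constexpr double kTotalWallUs = 4042965.0;      // Total Wall Time

// Fraction of `totalUs` spent in `partUs`.
constexpr double share(double partUs, double totalUs) { return partUs / totalUs; }
```

Both `share(kPadBaselineUs, kTotalWallUs)` and `share(kPadBaselineUs, kClusterFindingUs)` come out at roughly 0.87 to 0.88, i.e. the filter accounts for close to 90% of the event's processing time in this run.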

@fweig (Contributor, Author) commented Jan 21, 2021

@davidrohr Ugh, it should be much faster than that. I'll have a closer look tomorrow.

@davidrohr (Collaborator)

Thx, the simulation is:

o2-sim -n 20 -g pythia8
o2-sim-digitizer-workflow --TPCtriggered

@fweig force-pushed the pad-baseline-fix branch 3 times, most recently from 054dde9 to 4214815 January 24, 2021 17:03

@fweig (Contributor, Author) commented Jan 24, 2021

@davidrohr Just pushed another update. The CPU pad filter is now vectorized. I'm seeing a speedup of ~6x compared to the version in dev. I hope this also fixes the bugs with multiple threads; at least the number of clusters looks identical when I increase the number of threads.
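The vectorized code itself is not shown in this thread. As a generic illustration of what "vectorized" can mean for such a filter, one option is to store charges time-major so that, for a fixed time bin, neighbouring pads are contiguous, and to update one counter per pad in a branch-free inner loop that compilers can usually auto-vectorize at higher optimization levels. The layout, the counting criterion and the name `countActiveBinsTimeMajor` are assumptions for this sketch, not the PR's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed time-major layout: charges[t * nPads + pad].
// For each pad, counts the time bins whose charge exceeds `threshold`.
// The inner loop walks consecutive pads with a branch-free update, a shape
// that compilers can typically auto-vectorize.
std::vector<std::uint32_t> countActiveBinsTimeMajor(const std::vector<float>& charges,
                                                     std::size_t nPads, std::size_t nTimeBins,
                                                     float threshold)
{
  std::vector<std::uint32_t> active(nPads, 0);
  for (std::size_t t = 0; t < nTimeBins; ++t) {
    const float* row = charges.data() + t * nPads; // all pads of time bin t
    for (std::size_t pad = 0; pad < nPads; ++pad) {
      active[pad] += static_cast<std::uint32_t>(row[pad] > threshold);
    }
  }
  return active;
}
```

A 32-bit counter is used because a time frame can contain more time bins than a 16-bit counter can represent.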

@davidrohr (Collaborator) commented Jan 24, 2021 via email

@fweig (Contributor, Author) commented Jan 24, 2021

@davidrohr I used your commands from above for the simulation, and also copied your command to run the reco-workflow.
Here's an example output I get with my changes:

[63882:tpc-tracker]: [17:46:35][INFO] Event has 1144 8kb TPC ZS pages, 231512 digits
[63882:tpc-tracker]: [17:46:37][INFO] Event has 26571 TPC Clusters, 0 TRD Tracklets
[63882:tpc-tracker]: Execution Time: Task (K      188x):                                      GPUMemClean16 Time:    1451104 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):          GPUTPCCFChargeMapFiller_findFragmentStart Time:        650 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                                   GPUTPCCFDecodeZS Time:       7439 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                                GPUTPCConvertKernel Time:       3622 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCCreateSliceData Time:       1448 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                             GPUTPCNeighboursFinder Time:       2871 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                            GPUTPCNeighboursCleaner Time:        176 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCStartHitsFinder Time:        131 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):              GPUTPCTrackletConstructor_singleSlice Time:       6130 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                             GPUTPCTrackletSelector Time:        200 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                    GPUTPCGlobalTrackingCopyNumbers Time:          2 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                               GPUTPCGlobalTracking Time:        343 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                       GPUTPCGMMergerUnpackResetIds Time:          2 us
[63882:tpc-tracker]: Execution Time: Task (K       73x):                     GPUTPCGMMergerUnpackSaveNumber Time:          4 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                           GPUTPCGMMergerSliceRefit Time:       5617 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                         GPUTPCGMMergerUnpackGlobal Time:          9 us
[63882:tpc-tracker]: Execution Time: Task (K        3x):                           GPUTPCGMMergerClearLinks Time:          1 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                   GPUTPCGMMergerMergeWithinPrepare Time:         16 us
[63882:tpc-tracker]: Execution Time: Task (K      180x):                   GPUTPCGMMergerMergeBorders_step0 Time:         26 us
[63882:tpc-tracker]: Execution Time: Task (K        6x):                   GPUTPCGMMergerMergeBorders_step1 Time:         29 us
[63882:tpc-tracker]: Execution Time: Task (K      180x):                   GPUTPCGMMergerMergeBorders_step2 Time:         26 us
[63882:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step0 Time:          1 us
[63882:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step1 Time:          4 us
[63882:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step2 Time:          0 us
[63882:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step3 Time:          3 us
[63882:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step4 Time:         10 us
[63882:tpc-tracker]: Execution Time: Task (K        3x):                   GPUTPCGMMergerMergeSlicesPrepare Time:         79 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                     GPUTPCGMMergerLinkGlobalTracks Time:          0 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMMergerCollect Time:        440 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMMergerMergeCE Time:          2 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step0 Time:          0 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                        GPUTPCGMMergerSortTracksQPt Time:          8 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step1 Time:        104 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step2 Time:         56 us
[63882:tpc-tracker]: Execution Time: Task (K        2x):                             GPUTPCGMMergerTrackFit Time:      14353 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step0 Time:         13 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step1 Time:        179 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step2 Time:         48 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                           GPUTPCGMO2Output_prepare Time:         26 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMO2Output_sort Time:          2 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                            GPUTPCGMO2Output_output Time:         59 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):                                GPUTPCGMO2Output_mc Time:        157 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):             GPUTPCCompressionKernels_step0attached Time:       1662 us
[63882:tpc-tracker]: Execution Time: Task (K        1x):           GPUTPCCompressionKernels_step1unattached Time:        933 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):               GPUTPCCFChargeMapFiller_fillIndexMap Time:      71152 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                           GPUTPCCFCheckPadBaseline Time:     963685 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                                 GPUTPCCFPeakFinder Time:      20116 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):          GPUTPCCFNoiseSuppression_noiseSuppression Time:      14100 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):               GPUTPCCFNoiseSuppression_updatePeaks Time:        260 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCCFDeconvolution Time:      18030 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                                GPUTPCCFClusterizer Time:      27115 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):             GPUTPCCFMCLabelFlattener_setRowOffsets Time:        119 us
[63882:tpc-tracker]: Execution Time: Task (K       36x):                   GPUTPCCFMCLabelFlattener_flatten Time:        831 us
[63882:tpc-tracker]: Execution Time: Step              :       Tasks                     TPC Transformation Time:       3622 us ( Total Time :           3645 us)
[63882:tpc-tracker]: Execution Time: Step              :       Tasks                    TPC Sector Tracking Time:      11306 us ( Total Time :          11815 us)
[63882:tpc-tracker]: Execution Time: Step              :       Tasks              TPC Track Merging and Fit Time:      21288 us ( Total Time :          21548 us)
[63882:tpc-tracker]: Execution Time: Step              :       Tasks                        TPC Compression Time:       2595 us ( Total Time :           4936 us)
[63882:tpc-tracker]: Execution Time: Step              :       Tasks                    TPC Cluster Finding Time:    2574605 us ( Total Time :        2650640 us)
[63882:tpc-tracker]: Execution Time: General Step      :                                            Prepare Time:       1110 us
[63882:tpc-tracker]: Execution Time: Total   :                                       Total Kernel Time:    2613417 us
[63882:tpc-tracker]: Execution Time: Total   :                                         Total Wall Time:    2692627 us
[63882:tpc-tracker]: [17:46:38][INFO] found 166 track(s)
[63882:tpc-tracker]: [17:46:38][INFO] TPC CATracker time for this TF 2.69 s

Performance output from dev for the same data:

[56155:tpc-tracker]: [17:43:08][INFO] Event has 1144 8kb TPC ZS pages, 231512 digits
[56155:tpc-tracker]: [17:43:16][INFO] Event has 26571 TPC Clusters, 0 TRD Tracklets
[56155:tpc-tracker]: Execution Time: Task (K      188x):                                      GPUMemClean16 Time:    1361579 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):          GPUTPCCFChargeMapFiller_findFragmentStart Time:        604 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                                   GPUTPCCFDecodeZS Time:       7525 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                                GPUTPCConvertKernel Time:      10192 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCCreateSliceData Time:       1426 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                             GPUTPCNeighboursFinder Time:       2867 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                            GPUTPCNeighboursCleaner Time:        187 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCStartHitsFinder Time:        126 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):              GPUTPCTrackletConstructor_singleSlice Time:       6016 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                             GPUTPCTrackletSelector Time:        229 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                    GPUTPCGlobalTrackingCopyNumbers Time:          1 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                               GPUTPCGlobalTracking Time:        349 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                       GPUTPCGMMergerUnpackResetIds Time:          4 us
[56155:tpc-tracker]: Execution Time: Task (K       73x):                     GPUTPCGMMergerUnpackSaveNumber Time:          4 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                           GPUTPCGMMergerSliceRefit Time:       5355 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                         GPUTPCGMMergerUnpackGlobal Time:          7 us
[56155:tpc-tracker]: Execution Time: Task (K        3x):                           GPUTPCGMMergerClearLinks Time:          1 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                   GPUTPCGMMergerMergeWithinPrepare Time:         17 us
[56155:tpc-tracker]: Execution Time: Task (K      180x):                   GPUTPCGMMergerMergeBorders_step0 Time:         21 us
[56155:tpc-tracker]: Execution Time: Task (K        6x):                   GPUTPCGMMergerMergeBorders_step1 Time:         26 us
[56155:tpc-tracker]: Execution Time: Task (K      180x):                   GPUTPCGMMergerMergeBorders_step2 Time:         26 us
[56155:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step0 Time:          1 us
[56155:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step1 Time:          3 us
[56155:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step2 Time:          1 us
[56155:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step3 Time:          3 us
[56155:tpc-tracker]: Execution Time: Task (K        4x):                        GPUTPCGMMergerResolve_step4 Time:          9 us
[56155:tpc-tracker]: Execution Time: Task (K        3x):                   GPUTPCGMMergerMergeSlicesPrepare Time:         75 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                     GPUTPCGMMergerLinkGlobalTracks Time:          0 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMMergerCollect Time:        404 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMMergerMergeCE Time:          3 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step0 Time:          0 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                        GPUTPCGMMergerSortTracksQPt Time:          7 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step1 Time:         97 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                GPUTPCGMMergerPrepareClusters_step2 Time:         58 us
[56155:tpc-tracker]: Execution Time: Task (K        2x):                             GPUTPCGMMergerTrackFit Time:      13407 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step0 Time:         12 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step1 Time:        188 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                       GPUTPCGMMergerFinalize_step2 Time:         45 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                           GPUTPCGMO2Output_prepare Time:         24 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                              GPUTPCGMO2Output_sort Time:          2 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                            GPUTPCGMO2Output_output Time:         57 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):                                GPUTPCGMO2Output_mc Time:        139 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):             GPUTPCCompressionKernels_step0attached Time:       1535 us
[56155:tpc-tracker]: Execution Time: Task (K        1x):           GPUTPCCompressionKernels_step1unattached Time:        913 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):               GPUTPCCFChargeMapFiller_fillIndexMap Time:      69588 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                           GPUTPCCFCheckPadBaseline Time:    6866671 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                                 GPUTPCCFPeakFinder Time:      20491 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):          GPUTPCCFNoiseSuppression_noiseSuppression Time:      14108 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):               GPUTPCCFNoiseSuppression_updatePeaks Time:        261 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                              GPUTPCCFDeconvolution Time:      17096 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                                GPUTPCCFClusterizer Time:      27812 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):             GPUTPCCFMCLabelFlattener_setRowOffsets Time:        131 us
[56155:tpc-tracker]: Execution Time: Task (K       36x):                   GPUTPCCFMCLabelFlattener_flatten Time:        834 us
[56155:tpc-tracker]: Execution Time: Step              :       Tasks                     TPC Transformation Time:      10192 us ( Total Time :          10217 us)
[56155:tpc-tracker]: Execution Time: Step              :       Tasks                    TPC Sector Tracking Time:      11205 us ( Total Time :          11774 us)
[56155:tpc-tracker]: Execution Time: Step              :       Tasks              TPC Track Merging and Fit Time:      20009 us ( Total Time :          20283 us)
[56155:tpc-tracker]: Execution Time: Step              :       Tasks                        TPC Compression Time:       2449 us ( Total Time :           4681 us)
[56155:tpc-tracker]: Execution Time: Step              :       Tasks                    TPC Cluster Finding Time:    8386705 us ( Total Time :        8472795 us)
[56155:tpc-tracker]: Execution Time: General Step      :                                            Prepare Time:       1643 us
[56155:tpc-tracker]: Execution Time: Total   :                                       Total Kernel Time:    8430563 us
[56155:tpc-tracker]: Execution Time: Total   :                                         Total Wall Time:    8519795 us
[56155:tpc-tracker]: [17:43:16][INFO] found 166 track(s)
[56155:tpc-tracker]: [17:43:16][INFO] TPC CATracker time for this TF 8.53 s

@davidrohr
Collaborator

@fweig : With the new version, I see a significant speedup in the CPU version, and the same speedup in the GPU version as before. GPU results seem correct, but the CPU version still loses clusters; now I see it even with 1 OMP thread.
For comparison, here are my logs of the tpc-reco-workflow before and after your commit:
old.log
new.log

I have also tested it in the standalone benchmark, with the same result.
Before:

qon@qon ~/standalone $ ./ca -e o2-pp-10 --debug 1 -c --omp 1
[...]
Event has 46778 TPC Clusters, 0 TRD Tracklets
Output Tracks: 489 (0 / 30436 / 0 / 46778 clusters (fitted / attached / adjacent / total))
[...]
Execution Time: Task (K       36x):                           GPUTPCCFCheckPadBaseline Time:  4.377.006 us

After:

qon@qon ~/standalone $ ./ca -e o2-pp-10 --debug 1 -c --omp 1
[...]
Event has 24574 TPC Clusters, 0 TRD Tracklets
Output Tracks: 147 (0 / 4653 / 0 / 24574 clusters (fitted / attached / adjacent / total))
[...]
Execution Time: Task (K       36x):                           GPUTPCCFCheckPadBaseline Time:    283.996 us

I have uploaded the standalone data I used for the test here: https://qon.jwdt.org/nmls/tmp/o2-pp-10.tar.bz2
Could you please check again?

@davidrohr
Collaborator

For reference, here are two event display screenshots from before and after the commit. There seems to be no geometrically systematic loss of clusters, just a general loss all over the TPC.
before
after

@fweig
Contributor Author

fweig commented Jan 25, 2021

Ok, thx. I'm on it.

@fweig
Contributor Author

fweig commented Jan 25, 2021

I was able to reproduce the missing clusters and have hopefully fixed the problem. Apparently I was initializing the Vc vectors the wrong way, which caused undefined behavior. Compiled with gcc 7, which I used, it still worked fine, but it broke with newer gcc versions.

I tested your dataset with gcc 7 and gcc 10 on the standalone benchmark and got the correct number of clusters with both compilers, with one and with multiple OMP threads, and with the GPU version.

@davidrohr
Collaborator

@fweig : Thx, this seems correct now. One unfortunate point is that there is no fallback without Vc, while the standalone benchmark can still work without Vc at the moment. Could you add fallback CPU code with a

#ifdef GPUCA_NO_VC

protection, so that it also works without Vc?

@fweig
Contributor Author

fweig commented Jan 26, 2021

@davidrohr Ah, I forgot to ask about that. The fallback for when Vc is not available is now up.

@davidrohr davidrohr merged commit cd07082 into AliceO2Group:dev Jan 26, 2021
@fweig fweig deleted the pad-baseline-fix branch January 27, 2021 09:51