Improve Various Patatrack kernels #35835
Conversation
spelling Co-authored-by: Slava Krutelyov <slava77@gmail.com>
enable gpu
@cmsbuild, please test
@@ -135,6 +135,11 @@ namespace gpuClustering {
#ifdef __CUDA_ARCH__
      // assume that we can cover the whole module with up to 16 blockDim.x-wide iterations
      constexpr int maxiter = 16;
      if (threadIdx.x == 0 && (hist.size() / blockDim.x) >= maxiter)
        printf("THIS IS NOT SUPPOSED TO HAPPEN too many hits in module %d: %d for block size %d\n",
there is an assert below but it's compiled away!
We should fix this!
Should we add __trap() here then?
When we have a Heterogeneous version, yes.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35835/26200
@cmsbuild please test to pick up cms-sw/cms-bot#1651 for reco comparisons
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ecaae3/19962/summary.html
GPU Comparison Summary:
Comparison Summary:
+reconstruction
+db
+heterogeneous @fwyzard said he could run timing measurements after signing.
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1
@@ -107,9 +107,9 @@ __global__ void kernel_checkOverflows(HitContainer const *foundNtuplets,
          printf("Tracks overflow %d in %d\n", idx, thisCell.layerPairId());
        if (thisCell.isKilled())
          atomicAdd(&c.nKilledCells, 1);
-       if (thisCell.unused())
+       if (!thisCell.unused())
          atomicAdd(&c.nEmptyCells, 1);
Shouldn't nEmptyCells be renamed to nUsedCells?
I now count "used" cells because there are far fewer of them (so it's faster). Then in the report I take the complement, "1 -".
        }

      if (params_.doStats_) {
        // counters (add flag???)
        std::lock_guard guard(lock_stat);
Why do we need two separate scopes, instead of just one?
Historical reasons.
  // cuda atomics are NOT atomics on CPU so protect stat update with a mutex
  // waiting for a more general solution (including multiple devices) to be proposed and implemented
  std::mutex lock_stat;
In the CPU case, what is doing concurrent updates of the same stat?
Do you mean GPU? On the GPU, atomics on device memory are used. Multiple GPUs would crash.
On the CPU (without the mutex) it was obviously producing wrong (lower) results, as expected.
Maybe for the next iteration, just to make the code look less impenetrable.
      float vcal = float(adc[i]) * gain - pedestal * gain;
      if constexpr (isRun2) {
        float conversionFactor = id[i] < 96 ? VCaltoElectronGain_L1 : VCaltoElectronGain;
        float offset = id[i] < 96 ? VCaltoElectronOffset_L1 : VCaltoElectronOffset;
96 -> phase1PixelTopology::layerStart[1]?
There is an issue open to fix this everywhere (phase1PixelTopology::layerStart[1] may not compile, or may compile and produce wrong results).
Why would it not compile and/or produce wrong results? Can you point me to the issue? (I might have missed it.)
        float offset = id[i] < 96 ? VCaltoElectronOffset_L1 : VCaltoElectronOffset;
        vcal = vcal * conversionFactor + offset;
      }
      adc[i] = std::max(100, int(vcal));
@cms-sw/trk-dpg-l2 it would be nice if this 100 were taken from the same place as the other famous 100 in the regular clusterizer.
@VinInn did you run any checks of the performance or throughput with these changes?
Before the last commit, yes.
Simplify and improve the logic of various kernels.
Some of the math has been synced with the CPU version.
Fixed and improved the statistics collection and printing (off by default).
Technical.
No regression observed; the math has changed, so some regression cannot be excluded (even in CPU workflows).
Includes a bug fix for modules with large occupancy.
Supersedes #35598