Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heterogeneous HGCAL RecHit Calibration #32683

Merged
merged 52 commits into from Mar 29, 2021
Merged

Conversation

bfonta
Copy link
Contributor

@bfonta bfonta commented Jan 19, 2021

Description:

Porting of the HGCAL RecHit calibration code (RecoLocalCalo/HGCalRecProducers/plugins/HGCalRecHitProducer.cc and related) to GPUs using CUDA.
The GPU version introduces a code split between the three subdetectors which comprise HGCAL, so that their respective modules can run independently. Each subdetector includes three modules:

  • UncalibRecHit-to-SoA conversion, host-to-device transfer and GPU kernel execution (RecoLocalCalo/HGCalRecProducers/plugins/{EE,HEF,HEB}RecHitGPU.cc)
  • Output SoA device-to-host transfer (RecoLocalCalo/HGCalRecProducers/plugins/{EE,HEF,HEB}RecHitGPUtoSoA.cc)
  • Output SoA conversion to RecHit class (RecoLocalCalo/HGCalRecProducers/plugins/{EE,HEF,HEB}RecHitFromSoA.cc)

The modules can be run with:

cmsRun RecoLocalCalo/HGCalRecProducers/test/HeterogeneousHGCalRecHit_cfg.py PU=0

where PU can also be 50, 100, 140 and 200.
The input data is currently available under /home/bfontana/ in patatrack02, in both ROOT compressed and uncompressed formats (uncompressed PU200 is missing due to lack of space).

Validation:

Code available in Validation/HGCalValidation/plugins/HGCalRecHitValidation.cc. It produces ROOT histograms with the CPU and GPU outputs, and their residuals. It can be run with

cmsRun Validation/HGCalValidation/test/HeterogeneousHGCalRecHitsValidator_cfg.py PU=0

Timing:

cmsRun RecoLocalCalo/HGCalRecProducers/test/HeterogeneousRecHitsTiming_cfg.py

where PU and which version to use (GPU or CPU) must be specified manually in the above python file.
Several measurements are available here, showing that both the CPU and the GPU versions are mostly I/O bound.

Tests:

The recipe described here was followed. Everything run properly except some of the tests (scram b runtests):

  • test runtestRecoEgammaElectronIdentification had ERRORS
  • test test_CreateFileLists had ERRORS
  • test testAlignmentOfflineValidation had ERRORS
  • test testPVPlotting had ERRORS
  • test testSSTGainPCL_fromRECO had ERRORS
  • test testSSTGainPCL_fromRECO had ERRORS
  • test checkMultiRunHarvestingOutput had ERRORS
  • ...

runTheMatrix.py -l limited -i all --ibeos was successful:

35 34 33 25 17 4 1 1 1 tests passed, 0 0 0 0 0 0 0 0 0 failed

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32683/20786

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@slava77
Copy link
Contributor

slava77 commented Jan 19, 2021

In #32683 (comment)

* Found files with invalid states:

it looks like the commit history includes a lot of unrelated files that were subsequently removed.
Please try to cleanup the history by squashing or removing commits with abandoned files

@felicepantaleo
Copy link
Contributor

@rovere fyi

@kpedro88
Copy link
Contributor

in addition to #32683 (comment):

  • please apply code-checks and code-format
  • please make sure to add any new HGCal packages to the upgrade category in cms-bot

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32683/20830

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @b-fontana for master.

It involves the following packages:

CUDADataFormats/HGCal
CondFormats/DataRecord
CondFormats/HGCalObjects
DataFormats/HGCRecHit
Geometry/HGCalCommonData
HeterogeneousCore/CUDACore
RecoLocalCalo/HGCalRecAlgos
RecoLocalCalo/HGCalRecProducers
Validation/HGCalValidation

The following packages do not have a category, yet:

CUDADataFormats/HGCal
CondFormats/HGCalObjects
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@andrius-k, @ianna, @kpedro88, @ggovi, @fioriNTU, @tlampen, @pohsun, @perrotta, @civanch, @yuanchao, @makortel, @ErnestaP, @cmsbuild, @fwyzard, @Dr15Jones, @cvuosalo, @mdhildreth, @jfernan2, @slava77, @jpata, @kmaeshima, @christopheralanwest, @srimanob can you please review it and eventually sign? Thanks.
@edjtscott, @vandreev11, @lecriste, @sethzenz, @apsallid, @makortel, @felicepantaleo, @rchatter, @rovere, @lgray, @cseez, @bsunanda, @pfs, @thomreis, @hatakeyamak, @tocheng, @mmusich, @argiro, @fabiocos, @seemasharmafnal, @clelange this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@perrotta
Copy link
Contributor

please test

@slava77
Copy link
Contributor

slava77 commented Jan 21, 2021

@b-fontana b-fontana force-pushed the b-fontana:SplitModules branch from 930fce9 to f74167d 1 hour ago

there is still a lot of code from UserCode and files starting with . in the name in the commit history.
Can this be cleaned up?

@perrotta
Copy link
Contributor

+1

  • HGCAL RecHit calibration code ported on GPU
  • Comparison between the GPU/CPU versions of the code have been provided by the author, as well as some timing measurement
  • Jenkins tests pass and show no differences on the CPU output wrt baseline

@jfernan2
Copy link
Contributor

+1

@cvuosalo
Copy link
Contributor

+1

@perrotta
Copy link
Contributor

@cms-sw/db-l2 @cms-sw/alca-l2 @cms-sw/db-l2 @cms-sw/upgrade-l2 do you have any comment?

@silviodonato
Copy link
Contributor

kind reminder @cms-sw/db-l2 @cms-sw/alca-l2 @cms-sw/upgrade-l2

@yuanchao
Copy link
Contributor

+1

@srimanob
Copy link
Contributor

+Upgrade

@ggovi
Copy link
Contributor

ggovi commented Mar 29, 2021

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 655b983 into cms-sw:master Mar 29, 2021
#include "CUDADataFormats/HGCal/interface/HGCConditions.h"
#include "Geometry/HGCalCommonData/interface/HGCalDDDConstants.h"
#include "Geometry/HGCalCommonData/interface/HGCalParameters.h"
#include "RecoLocalCalo/HGCalRecProducers/plugins/KernelManagerHGCalCellPositions.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @b-fontana @cms-sw/reconstruction-l2 @cms-sw/alca-l2 -including stuff from a plugin directory is not allowed in cmssw. (the change also seems contrary to the title of this PR..)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks it could simply be dropped ?
at least, I don't see KernelManagerHGCalCellPositions used anywhere in this file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet