Enable the use of cuDNN in ONNX on ARM for CentOS 8 #7278


fwyzard (Contributor) commented Sep 8, 2021

Enable the use of cuDNN on ARM (aarch64) for CentOS 8, where it is supported starting from CUDA 11.1 and cuDNN 8.

fwyzard (Contributor, Author) commented Sep 8, 2021

please test

cmsbuild (Contributor) commented Sep 8, 2021

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_1_X/master.

@smuzaffar, @mrodozov, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

fwyzard (Contributor, Author) commented Sep 8, 2021

please test for slc7_aarch64_gcc9

cmsbuild (Contributor) commented Sep 8, 2021

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18379/summary.html
COMMIT: d6919fb
CMSSW: CMSSW_12_1_X_2021-09-07-2300/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7278/18379/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

File "./pkgtools/cmsBuild", line 3624, in installPackage
installRpm(pkg, pkg.options.bootstrap)
File "./pkgtools/cmsBuild", line 3372, in installRpm
raise RpmInstallFailed(pkg, output)
RpmInstallFailed: Failed to install package cudnn. Reason:
error: Failed dependencies:
	libm.so.6(GLIBC_2.27)(64bit) is needed by external+cudnn+8.2.2.26-3dd206c070363aea575b46880769a41a-1-1.aarch64

* The action "install-cms+cmssw-tool-conf+52.0-da6641f345fd780b756fd23aa3688455" was not completed successfully because The following dependencies could not complete:
build-cms+cmssw-tool-conf+52.0-da6641f345fd780b756fd23aa3688455
* The action "build-external+python_tools+3.0-aada8f84c922c2ade5f7326141640a88" was not completed successfully because The following dependencies could not complete:
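
The failing dependency is a glibc symbol version: the aarch64 cuDNN 8.2.2 package needs `libm.so.6` exporting `GLIBC_2.27` symbols. For reference, CentOS 7 (the base of slc7) ships glibc 2.17 and CentOS 8 (cc8) ships glibc 2.28, so only the latter can satisfy it. The sketch below illustrates that comparison; the helpers `required_glibc`, `host_glibc` and `satisfies` are hypothetical and are not part of cmsBuild or RPM, and the parser assumes the `name(GLIBC_X.Y)(64bit)` dependency format shown in the log above.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def required_glibc(dep: str) -> Optional[Version]:
    """Return the (major, minor) GLIBC tag of an RPM dependency string such as
    'libm.so.6(GLIBC_2.27)(64bit)', or None if the string has no GLIBC tag.
    Hypothetical helper, assuming the format printed by rpm in the log."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", dep)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_glibc() -> Optional[Version]:
    """Return the glibc version of the running system, or None if the C
    library is not glibc or its version cannot be determined."""
    lib, version = platform.libc_ver()
    parts = version.split(".")
    if lib != "glibc" or len(parts) < 2:
        return None
    return int(parts[0]), int(parts[1])


def satisfies(dep: str, host: Version) -> bool:
    """True if a host glibc of version `host` provides the symbol version
    required by `dep`; dependencies without a GLIBC tag count as satisfied."""
    required = required_glibc(dep)
    # Tuples compare lexicographically, so (2, 28) >= (2, 27) but (2, 17) < (2, 27).
    return required is None or host >= required


if __name__ == "__main__":
    dep = "libm.so.6(GLIBC_2.27)(64bit)"  # dependency string from the RPM error above
    for name, glibc in [("CentOS 7 (slc7)", (2, 17)), ("CentOS 8 (cc8)", (2, 28))]:
        status = "ok" if satisfies(dep, glibc) else "missing GLIBC_2.27"
        print(f"{name}: glibc {glibc[0]}.{glibc[1]} -> {status}")
    print("this host:", host_glibc())
```

Comparing version tuples rather than strings avoids ordering mistakes such as "2.9" sorting after "2.27".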


fwyzard (Contributor, Author) commented Sep 8, 2021

please test for cc8_aarch64_gcc9

cmsbuild (Contributor) commented Sep 8, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18378/summary.html
COMMIT: d6919fb
CMSSW: CMSSW_12_1_X_2021-09-07-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7278/18378/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3001001
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3000973
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

fwyzard force-pushed the IB/CMSSW_12_1_X/master_enable_cuDNN_for_ONNX_on_aarch64 branch from d6919fb to dca0e21 on September 8, 2021 12:05

cmsbuild (Contributor) commented Sep 8, 2021

Pull request #7278 was updated.

fwyzard changed the title from "Enable the use of cuDNN in ONNX on ARM" to "Enable the use of cuDNN in ONNX on ARM for CentOS 8" on Sep 8, 2021

fwyzard (Contributor, Author) commented Sep 8, 2021

please test for cc8_aarch64_gcc9

fwyzard (Contributor, Author) commented Sep 8, 2021

please test

fwyzard (Contributor, Author) commented Sep 8, 2021

please test for slc7_aarch64_gcc9

cmsbuild (Contributor) commented Sep 8, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18397/summary.html
COMMIT: dca0e21
CMSSW: CMSSW_12_1_X_2021-09-08-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7278/18397/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3001001
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3000979
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

cmsbuild (Contributor) commented Sep 9, 2021

-1

Failed Tests: RelVals AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18423/summary.html
COMMIT: dca0e21
CMSSW: CMSSW_12_1_X_2021-09-07-2300/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7278/18423/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

  • 4.53: 4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step2_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log
  • 5.1: 5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log
  • 4.22: 4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log

AddOn Tests

----- Begin Fatal Exception 09-Sep-2021 03:59:07 CEST-----------------------
An exception of category 'DictionaryNotFound' occurred while
   [0] Constructing the EventProcessor
   [1] Calling OutputModuleBase::keepThisBranch, checking dictionaries for kept types
Exception Message:
No data dictionary found for the following classes:

  trigger::TriggerEvent

Most likely each dictionary was never generated, but it may
be that it was generated in the wrong package. Please add
(or move) the specification '<class name="whatever"/>' to
the appropriate classes_def.xml file along with any other
information needed there. For example, if this class has any
transient members, you need to specify them in classes_def.xml.
Also include the class header in classes.h
----- End Fatal Exception -------------------------------------------------
  • fastsim: cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Thu Sep 9 03:58:58 2021-date Thu Sep 9 03:58:55 2021 s - exit: 256
  • fastsim1: cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc_l1stage1 --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_25ns : FAILED - time: date Thu Sep 9 03:58:59 2021-date Thu Sep 9 03:58:58 2021 s - exit: 256

cmsbuild (Contributor) commented Sep 9, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18422/summary.html
COMMIT: dca0e21
CMSSW: CMSSW_12_1_X_2021-09-08-1100/cc8_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7278/18422/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18422/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a19f56/18422/git-merge-result

fwyzard (Contributor, Author) commented Sep 9, 2021

The slc7_aarch64 failures seem unrelated:

ModuleNotFoundError: No module named 'RecoHGCal.TICL.ticlSeedingRegionProducer_cfi'

smuzaffar (Contributor) commented

+externals

smuzaffar merged commit 49636ca into cms-sw:IB/CMSSW_12_1_X/master on Sep 9, 2021

cmsbuild (Contributor) commented Sep 9, 2021

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_1_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
