Cuda time measurements #37141

Merged
merged 4 commits into from Mar 7, 2022

AliinCern

What is the purpose of the program?

A device (GPU) in a local machine can read and write host (CPU) memory in several ways through the CUDA API. This program times each of these methods on a local machine in order to find the fastest way to access memory. The time measurements cover four directions:

  1. From Host To Device.
  2. On the Device.
  3. On the Host.
  4. From Device to Host.

What methods are used in the program?

First direction (Host to Device):
Part 0) The device reads from host memory directly, without using the CUDA API.
Part 1) Time to copy data from host memory to device memory using cudaMemcpy.
Part 2) Time to page-lock host memory so that the device can read/write it, using cudaHostRegister.
Part 3) Time to copy from host memory to page-locked host memory using cudaMemcpy.
Part 4) Time to copy from host memory to page-locked host memory using memcpy, then time to copy from page-locked host memory to device memory using cudaMemcpy.
Part 5) Time to page-lock host memory using cudaHostRegister, then time to copy from page-locked host memory to device memory using cudaMemcpy.

Second direction (On the Device):
All parts) Time the operations performed on the device using cudaEventRecord.

Third direction (On the Host):
All parts) Time the operations performed on the device using elapsed time measured on the host.

Fourth direction (Device to Host):
Part 0) The device writes to host memory directly, without using the CUDA API.
Part 1) Time to copy data from device memory to host memory using cudaMemcpy.
Part 2) Time to unlock page-locked host memory using cudaHostUnregister.
Part 3) Time to copy from page-locked host memory to host memory using cudaMemcpy.
Part 4) Time to copy from device memory to page-locked host memory using cudaMemcpy, then time to copy from page-locked host memory to host memory using memcpy.
Part 5) Time to copy from device memory to page-locked host memory using cudaMemcpy, then time to unlock the host pages using cudaHostUnregister.
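
As a rough illustration of host-side timing (the third direction) applied to the plain memcpy step of Part 4, a minimal sketch could look like the one below. It is not taken from cudaTimeMeasurement.cu: the function name timeHostCopyMs and the use of std::chrono::steady_clock are assumptions made for this example only.

```cpp
#include <chrono>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the PR): copies src into dst with memcpy and
// returns the elapsed wall-clock time in milliseconds, measured on the host.
double timeHostCopyMs(const std::vector<float>& src, std::vector<float>& dst) {
  dst.resize(src.size());
  if (src.empty()) {
    return 0.0;  // nothing to copy, and avoids calling memcpy on null pointers
  }
  const auto start = std::chrono::steady_clock::now();
  std::memcpy(dst.data(), src.data(), src.size() * sizeof(float));
  const auto stop = std::chrono::steady_clock::now();
  // Convert the clock difference to a floating-point number of milliseconds.
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

When the same pattern is wrapped around a kernel launch, the host would also have to wait for the device to finish before reading the stop time, because kernel launches return to the host before the kernel has completed.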

What are the command-line options to run the program?

-p PARTS   select which parts (methods) to run; for example, -p 12345 runs Parts 1 to 5. To run Part 0, use the number 6, for example -p 6.
-a COUNT   repeat each set of tasks COUNT times and compute the average.
-f         save the results to a file.
-q         print the standard deviation of the measurements.
-t TASKS   repeat the task on the device (GPU) TASKS times.
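
For readers who want to see how options of this shape are typically handled, here is a minimal sketch using POSIX getopt. It is an illustration rather than the parser in the PR: the Options struct, its default values, and the function names parseOptions and partSelected are all assumptions. Only the option letters and the rule that the digit 6 selects Part 0 come from the description above.

```cpp
#include <string>
#include <unistd.h>  // POSIX getopt, optarg, optind

// Hypothetical container for the documented options; the defaults are assumptions.
struct Options {
  std::string parts = "123456";  // -p PARTS
  int count = 1;                 // -a COUNT
  bool saveToFile = false;       // -f
  bool printStdDev = false;      // -q
  int tasks = 1;                 // -t TASKS
};

Options parseOptions(int argc, char* argv[]) {
  Options opts;
  optind = 1;  // restart scanning so the function can be called more than once
  int c;
  // A colon after a letter means that option takes an argument.
  while ((c = getopt(argc, argv, "a:fp:qt:")) != -1) {
    switch (c) {
      case 'a': opts.count = std::stoi(optarg); break;
      case 'f': opts.saveToFile = true; break;
      case 'p': opts.parts = optarg; break;
      case 'q': opts.printStdDev = true; break;
      case 't': opts.tasks = std::stoi(optarg); break;
      default: break;  // unknown option: getopt has already reported it
    }
  }
  return opts;
}

// Part 0 is requested with the digit 6; Parts 1 to 5 use their own digit.
bool partSelected(const std::string& parts, int part) {
  const char digit = (part == 0) ? '6' : static_cast<char>('0' + part);
  return parts.find(digit) != std::string::npos;
}
```

With this sketch, an invocation such as -p 16 -a 10 -q would select Part 1 and Part 0, average over 10 repetitions, and request the standard deviation.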

Program validation:

We ran the program with scram b runtests, with the following result:
---> test cudaTimeMeasurement succeeded
TestTime:1
^^^^ End Test cudaTimeMeasurement ^^^^

…task on GPU, -q print the standerDeviation, -p part selection, -f to save the result on a file
The command line options are:
-a COUNT   repeat each set of tasks COUNT times, and compute the average.
-f         to save the result to a file.
-p PARTS   select which parts to run by order, for example -p 1 will run only part 1, while -p 123456 all parts.
-q         print the standard deviation of the measurements.
-t TASKS   repeat the task on the gpu TASKS times.
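
The averaging behind -a and the standard deviation printed by -q can be sketched as follows. This is an assumption about how such statistics might be computed, not the code in the PR: the Stats struct and the summarize function are hypothetical names, and the population form of the standard deviation (dividing by N) is a choice made for this example.

```cpp
#include <cmath>
#include <vector>

// Hypothetical result type: mean and population standard deviation.
struct Stats {
  double mean;
  double stddev;
};

// Computes the mean and the population standard deviation of the samples,
// for example the times collected over COUNT repetitions.
Stats summarize(const std::vector<double>& samples) {
  if (samples.empty()) {
    return {0.0, 0.0};
  }
  double sum = 0.0;
  for (double s : samples) {
    sum += s;
  }
  const double mean = sum / static_cast<double>(samples.size());
  double squaredDiffs = 0.0;
  for (double s : samples) {
    squaredDiffs += (s - mean) * (s - mean);
  }
  // Dividing by N gives the population form; N - 1 would give the sample form.
  return {mean, std::sqrt(squaredDiffs / static_cast<double>(samples.size()))};
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 this yields a mean of 5 and a standard deviation of 2.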

cmsbuild commented Mar 4, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37141/28677

  • This PR adds an extra 24KB to repository

  • Found files with invalid states:

    • HeterogeneousCore/CUDACore/test/cuda_check.h:


cmsbuild commented Mar 4, 2022

A new Pull Request was created by @AliinCern (Marafi) for master.

It involves the following packages:

  • HeterogeneousCore/CUDACore (heterogeneous)

@cmsbuild, @makortel, @fwyzard can you please review it and eventually sign? Thanks.
@makortel, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here


fwyzard commented Mar 4, 2022

enable gpu


fwyzard commented Mar 4, 2022

@cmsbuild, please test


cmsbuild commented Mar 4, 2022

-1

Failed Tests: UnitTests RelVals RelVals-GPU RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-169591/22833/summary.html
COMMIT: d9810c4
CMSSW: CMSSW_12_3_X_2022-03-03-2300/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test cudaTimeMeasurement had ERRORS

RelVals

----- Begin Fatal Exception 04-Mar-2022 12:46:30 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named step2_RAW2DIGI_L1Reco_RECO_DQM.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/beamHaloTaggerID/xgboostToTMVA_BHtagger.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC


At:
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(840): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Mixins.py(376): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(925): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(918): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1143): _insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1399): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 04-Mar-2022 12:46:51 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named TTbar_8TeV_TuneCUETP8M1_cfi_GEN_SIM_RECOBEFMIX_DIGI_L1_DIGI2RAW_L1Reco_RECO_VALIDATION_DQM.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/beamHaloTaggerID/xgboostToTMVA_BHtagger.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS


At:
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(840): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Mixins.py(376): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(925): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(918): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1143): _insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1399): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 04-Mar-2022 12:46:55 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named ZEE_13TeV_TuneCUETP8M1_cfi_GEN_SIM_RECOBEFMIX_DIGI_L1_DIGI2RAW_L1Reco_RECO_VALIDATION_DQM.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/beamHaloTaggerID/xgboostToTMVA_BHtagger.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS


At:
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(840): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Mixins.py(376): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(925): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(918): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1143): _insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1399): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...

RelVals-GPU

  • 11634.522: 11634.522_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step3_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
  • 11634.512: 11634.512_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step3_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
  • 11634.506: 11634.506_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step3_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log

RelVals-INPUT

  • 4.22: 4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log
  • 4.6: 4.6_MinimumBias2010A+MinimumBias2010A+RECOSKIMALCA+HARVESTDR1/step2_MinimumBias2010A+MinimumBias2010A+RECOSKIMALCA+HARVESTDR1.log
  • 134.813: 134.813_RunCosmics2015C+RunCosmics2015C+RECOCOSDRUN2+ALCACOSDRUN2+HARVESTDCRUN2/step2_RunCosmics2015C+RunCosmics2015C+RECOCOSDRUN2+ALCACOSDRUN2+HARVESTDCRUN2.log
Expand to see more relval errors ...

AddOn Tests

----- Begin Fatal Exception 04-Mar-2022 12:46:02 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named TTbar_8TeV_TuneCUETP8M1_cfi_GEN_SIM_RECOBEFMIX_DIGI_L1_DIGI2RAW_L1Reco_RECO_VALIDATION.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/beamHaloTaggerID/xgboostToTMVA_BHtagger.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-addon/addOnTests/fastsim


At:
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(840): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Mixins.py(376): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(925): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(918): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1143): _insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1399): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 04-Mar-2022 12:46:05 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM_RECOBEFMIX_DIGI_L1_DIGI2RAW_L1Reco_RECO_VALIDATION.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/beamHaloTaggerID/xgboostToTMVA_BHtagger.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-addon/addOnTests/fastsim1


At:
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(840): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Mixins.py(376): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(925): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(918): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1143): _insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1399): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 04-Mar-2022 12:46:10 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM_RECOBEFMIX_DIGI_L1_DIGI2RAW_L1Reco_RECO_VALIDATION.py
Exception Message:
 unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/beamHaloTaggerID/xgboostToTMVA_BHtagger.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22833/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/poison:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/src:/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-addon/addOnTests/fastsim2


At:
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(840): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Mixins.py(376): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(925): insertContentsInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Types.py(918): insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1143): _insertInto
  /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-2300/python/FWCore/ParameterSet/Config.py(1399): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

fwyzard left a comment

The new test can only work if the machine has a GPU.
@AliinCern, could you add cms::cudatest::requireDevices(); at the beginning of the main() function?

#include <thrust/device_vector.h>
#include <unistd.h>
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCheck.h"


Suggested change
#include "HeterogeneousCore/CUDAUtilities/interface/requireDevices.h"


void printResultEach(std::vector<Timing> &timing, int type, bool standerDeviationPrint);

int main(int argc, char *argv[]) {

Suggested change
int main(int argc, char *argv[]) {
int main(int argc, char *argv[]) {
cms::cudatest::requireDevices();

…vices.h"" and "cms::cudatest::requireDevices();"

cmsbuild commented Mar 4, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37141/28680

  • This PR adds an extra 24KB to repository

  • Found files with invalid states:

    • HeterogeneousCore/CUDACore/test/cuda_check.h:


cmsbuild commented Mar 4, 2022

Pull request #37141 was updated. @cmsbuild, @makortel, @fwyzard can you please check and sign again.


fwyzard commented Mar 4, 2022

enable gpu

@@ -15,4 +15,11 @@
<use name="cuda"/>
</bin>

<bin name="cudaTimeMeasurement" file="cudaTimeMeasurement.cu">
<use name="cuda"/>

My bad, I forgot that you need to add here:

    <use name="HeterogeneousCore/CUDAUtilities"/>


cmsbuild commented Mar 5, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37141/28696

  • This PR adds an extra 12KB to repository

  • Found files with invalid states:

    • HeterogeneousCore/CUDACore/test/cuda_check.h:


cmsbuild commented Mar 5, 2022

Pull request #37141 was updated. @cmsbuild, @makortel, @fwyzard can you please check and sign again.


fwyzard commented Mar 5, 2022

enable gpu


fwyzard commented Mar 5, 2022

please test


cmsbuild commented Mar 6, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-169591/22876/summary.html
COMMIT: 2b0b124
CMSSW: CMSSW_12_3_X_2022-03-05-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37141/22876/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19811
  • DQMHistoTests: Total failures: 2414
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 17397
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3987741
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3987705
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 204 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found


fwyzard commented Mar 6, 2022

+heterogeneous


fwyzard commented Mar 6, 2022

@smuzaffar is there a way to request running the unit tests on the GPU machine ?


cmsbuild commented Mar 6, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@smuzaffar

@smuzaffar is there a way to request running the unit tests on the GPU machine ?

No @fwyzard, currently there is no way to request that unit tests be run on GPU. For PR tests, we do not deploy the cmssw/tmp directory under /cvmfs/cms-ci.cern.ch, which means one needs to rebuild on a GPU machine to run the tests. Adding support for unit tests on GPUs is on my to-do list, but I will only get to it when I find some time.


perrotta commented Mar 7, 2022

+1

  • It adds measurement tools in the test area
  • Verified working correctly by the author in a private test
  • Unit test not actually tested in bot because those unit tests do not run on GPU machines

cmsbuild merged commit cce3982 into cms-sw:master Mar 7, 2022

5 participants