Update cuda-compatible-runtime to add kernel checks #7419

fwyzard · 2021-10-29T12:16:40Z

No description provided.

fwyzard · 2021-10-29T12:16:48Z

enable gpu

fwyzard · 2021-10-29T12:16:51Z

please test

cmsbuild · 2021-10-29T12:17:02Z

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_2_X/master.

@smuzaffar, @iarspider, @ddaina can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

fwyzard · 2021-10-29T12:27:02Z

please test for slc7_aarch64_gcc9

fwyzard · 2021-10-29T12:27:42Z

please test for cc8_amd64_gcc9

fwyzard · 2021-10-29T12:28:05Z

please test for slc7_amd64_gcc10

fwyzard · 2021-10-29T12:28:21Z

please test for slc7_ppc64le_gcc9

cmsbuild · 2021-10-29T12:37:33Z

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20048/summary.html
COMMIT: b382f86
CMSSW: CMSSW_12_2_X_2021-10-28-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20048/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

+ i=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh
+ '[' -f /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh ']'
+ rm -rf /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build
+ mkdir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -std=c++17 -O2 -g -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets test.cu -I /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64 -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread -static-libgcc -static-libstdc++' -o /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build/cuda-compatible-runtime
gcc: error: test.cu: No such file or directory
gcc: warning: '-x c++' after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.tDbydv (%build)

fwyzard · 2021-10-29T12:39:29Z

please abort

fwyzard · 2021-10-29T12:45:54Z

please test

cmsbuild · 2021-10-29T12:46:10Z

Pull request #7419 was updated.

cmsbuild · 2021-10-29T13:05:06Z

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20056/summary.html
COMMIT: c90d2da
CMSSW: CMSSW_12_2_X_2021-10-28-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20056/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

+ '[' -f /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh ']'
+ rm -rf /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build
+ mkdir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -std=c++17 -O2 -g -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets test.cu -I /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64 -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread -static-libgcc -static-libstdc++' -o /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build/cuda-compatible-runtime
/cvmfs/cms-ib.cern.ch/nweek-02704/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.0EqUMr (%build)


RPM build errors:
Macro %rpmbuild_libdir defined but not used within scope

fwyzard · 2021-10-29T14:11:12Z

@gartung this should be the first part of the fix for the automatic checks, to support the environment at NERSC

fwyzard · 2021-10-29T14:13:47Z

please test

fwyzard · 2021-10-29T17:09:37Z

Thanks for the check!

cmsbuild · 2021-10-29T18:29:38Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20067/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-28-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20067/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 4
DQMHistoTests: Total histograms compared: 19782
DQMHistoTests: Total failures: 7
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 19775
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
Checked 12 log files, 9 edm output root files, 4 DQM output files
TriggerResults: no differences found

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 2 differences found in the comparisons
DQMHistoTests: Total files compared: 42
DQMHistoTests: Total histograms compared: 2901440
DQMHistoTests: Total failures: 5
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 2901412
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.004 KiB( 41 files compared)
DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
Checked 177 log files, 37 edm output root files, 42 DQM output files
TriggerResults: no differences found

cmsbuild · 2021-11-01T08:11:05Z

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20137/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/cc8_amd64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20137/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation warning when building: See details on the summary page.

cmsbuild · 2021-11-01T10:25:13Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20135/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/slc7_ppc64le_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20135/install.sh to create a dev area with all the needed externals and cmssw changes.

cmsbuild · 2021-11-01T10:58:54Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20139/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/slc7_aarch64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20139/install.sh to create a dev area with all the needed externals and cmssw changes.

cmsbuild · 2021-11-01T14:33:29Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20136/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

@cms-sw [git] Added extra perl provides to avoid system package deps #7417

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/git-merge-result

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 58442 differences found in the comparisons
DQMHistoTests: Total files compared: 42
DQMHistoTests: Total histograms compared: 2901890
DQMHistoTests: Total failures: 299962
DQMHistoTests: Total nulls: 71
DQMHistoTests: Total successes: 2601835
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.174 KiB( 41 files compared)
DQMHistoSizes: changed ( 10224.0 ): 0.117 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 250202.181 ): -0.533 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 25202.0 ): 0.246 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
Checked 177 log files, 37 edm output root files, 42 DQM output files
TriggerResults: found differences in 13 / 41 workflows

smuzaffar · 2021-11-01T14:49:57Z

cuda-compatible-runtime.spec

 rm -rf %{_builddir}/build && mkdir %{_builddir}/build
-gcc -std=c99 -O2 -Wall test.c -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs -l cudart_static -l cuda -ldl -lrt -pthread -static-libgcc -o %{_builddir}/build/cuda-compatible-runtime # || true
+$CUDA_ROOT/bin/nvcc %{nvcc_stdcxx} -O2 -g %{cuda_flags_4} test.cu -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread' -o %{_builddir}/build/cuda-compatible-runtime # || true


@fwyzard , isn't it going to fail if gcc version not compatible with cuda?

@fwyzard ping

@fwyzard , isn't it going to fail if gcc version not compatible with cuda?

Yes, good point.
If that happens, there really is no version of the runtime we want to use...

@fwyzard , should we fix this so that it does not fail the build process? I mean, in that case we should still create a cuda-compatible-runtime which exits with a non-zero code, e.g.

if [ $(%{cuda_gcc_support}) = true ] ; then
  $CUDA_ROOT/bin/nvcc %{nvcc_stdcxx} -O2 -g %{cuda_flags_4} test.cu -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread' -o %{_builddir}/build/cuda-compatible-runtime
else
  echo "CUDA ${CUDA_VERSION} is not compatible with GCC ${GCC_VERSION}" > %{_builddir}/build/cuda-compatible-runtime
  echo "false" >> %{_builddir}/build/cuda-compatible-runtime
  chmod +x %{_builddir}/build/cuda-compatible-runtime
fi

@fwyzard , what do you think about https://github.com/cms-sw/cmsdist/pull/7419/files#r741837428 ?

sorry, I forgot :-(
let me see...

If it works, sounds like a good idea.
Can we actually use %{cuda_gcc_support} before CUDA has been set up ?

I guess I can make the changes and try...

In fact, the %install part was already setting up a symlink to /usr/bin/false if the build failed; but I like the idea of a more verbose message.
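
For illustration, the %install fallback described in this comment could look roughly like the sketch below. The spec file's actual %install section is not shown in this thread, so the function name `install_runtime` and its two directory arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not the actual spec file): install the test binary if
# the build step produced one, otherwise install a symlink to /usr/bin/false
# so that anything running the tool sees a non-zero exit status.
install_runtime() {
  build_dir="$1"    # directory where the build step may have left the binary
  install_dir="$2"  # destination directory inside the package
  target="$install_dir/cuda-compatible-runtime"
  mkdir -p "$install_dir"
  # Remove any previous file or symlink so that cp never writes through a link.
  rm -f "$target"
  if [ -x "$build_dir/cuda-compatible-runtime" ]; then
    cp "$build_dir/cuda-compatible-runtime" "$target"
  else
    ln -s /usr/bin/false "$target"
  fi
}
```

With this shape the package always provides a `cuda-compatible-runtime` entry, and a failed build only changes what that entry does when it is run.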

as an alternative - is there a way to use the system compiler for building this, instead of the one bundled with CMSSW ?

fwyzard · 2021-11-25T17:15:32Z

please test

cmsbuild · 2021-11-25T17:15:46Z

Pull request #7419 was updated.

fwyzard · 2021-11-25T17:15:54Z

do we have an architecture that does not support CUDA ?
maybe something with gcc12 ?

smuzaffar · 2021-11-25T17:30:20Z

no, currently we do not have any arch without cuda/gcc support. I have tested your change by forcing the cuda command to fail and generating the script. All looks good

>./test/cuda-compatible-runtime
>echo $?
1
>./test/cuda-compatible-runtime -h
Usage: ./test/cuda-compatible-runtime [-h|-v]

Options:
  -h        Print a help message and exits.
  -v        Be more verbose.
>./test/cuda-compatible-runtime -v
CUDA 11.4.2-0939a3504c82d9c20346029080003d72 is not compatible with GCC 9.3.0
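
The script generated by the PR is not shown in this thread. As a minimal sketch of how a fallback with the behaviour above could be produced, the hypothetical helper `write_fallback_script` below writes a stand-in that always exits with status 1 and prints the reason only with `-v`. The exit status for `-h` is an assumption, since the test above does not show it.

```shell
#!/bin/sh
# Hypothetical sketch: write a stand-in for cuda-compatible-runtime that
# reports failure, and explains why when called with -v.
write_fallback_script() {
  out="$1"           # path of the script to create
  cuda_version="$2"  # CUDA version to mention in the message
  gcc_version="$3"   # GCC version to mention in the message
  # Unquoted heredoc: the version variables are expanded now, while \$0 and
  # \$1 are left for the generated script to expand at run time.
  cat > "$out" <<EOF
#!/bin/sh
case "\$1" in
  -h)
    echo "Usage: \$0 [-h|-v]"
    echo
    echo "Options:"
    echo "  -h        Print a help message and exits."
    echo "  -v        Be more verbose."
    exit 0   # assumption: asking for help is not treated as a failure
    ;;
  -v)
    echo "CUDA ${cuda_version} is not compatible with GCC ${gcc_version}"
    ;;
esac
exit 1
EOF
  chmod +x "$out"
}
```

Unlike writing the message text directly into the file, this keeps the generated file a valid shell script, so the message is printed by `echo` rather than being interpreted as a command.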

smuzaffar · 2021-11-25T17:32:29Z

please test

cmsbuild · 2021-11-25T21:04:57Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/summary.html
COMMIT: bb1cc87
CMSSW: CMSSW_12_2_X_2021-11-25-1100/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20771/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/git-merge-result

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 4
DQMHistoTests: Total histograms compared: 19798
DQMHistoTests: Total failures: 15
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 19783
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
Checked 12 log files, 9 edm output root files, 4 DQM output files
TriggerResults: no differences found

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 42
DQMHistoTests: Total histograms compared: 3247745
DQMHistoTests: Total failures: 0
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3247723
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
Checked 177 log files, 37 edm output root files, 42 DQM output files
TriggerResults: no differences found

smuzaffar · 2021-11-25T21:37:18Z

+externals

cmsbuild · 2021-11-25T21:37:37Z

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_2_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

fwyzard · 2021-11-25T21:57:52Z

@fwyzard, when we see a message like "CUDA runtime version 11.4, driver version 11.5, NVIDIA driver version 495.29.05 CUDA device 0: Tesla T4 (sm_75)", i.e. "CUDA device 0", does this mean we did not find any usable device?

No, it's fine, it means that it found one device. CUDA devices count from 0.
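
To make the zero-based numbering concrete, here is a small sketch that counts the devices listed in such a report. It assumes the tool prints one `CUDA device N: ...` line per device, which is inferred from the message quoted above rather than confirmed; the helper name `count_cuda_devices` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a report on standard input and print how many
# lines of the form "CUDA device N: ..." it contains.
count_cuda_devices() {
  # grep -c prints 0 but exits with status 1 when nothing matches;
  # "|| true" keeps the function from reporting that as an error.
  grep -c '^CUDA device [0-9][0-9]*:' || true
}
```

A report whose only device line is `CUDA device 0: Tesla T4 (sm_75)` therefore yields a count of 1: index 0 is the first device, not an empty list.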

cmsbuild added externals-pending orp-pending pending-signatures tests-started labels Oct 29, 2021

cmsbuild added tests-rejected and removed tests-started labels Oct 29, 2021

cmsbuild added tests-pending and removed tests-rejected labels Oct 29, 2021

fwyzard force-pushed the IB/CMSSW_12_1_X/master_update_cuda-compatible-runtime branch from b382f86 to c90d2da Compare October 29, 2021 12:45

cmsbuild added tests-started and removed tests-pending labels Oct 29, 2021

cmsbuild added tests-rejected and removed tests-started labels Oct 29, 2021

Update cuda-compatible-runtime to add kernel checks

47863b5

fwyzard force-pushed the IB/CMSSW_12_1_X/master_update_cuda-compatible-runtime branch from c90d2da to 47863b5 Compare October 29, 2021 14:13

cmsbuild removed the tests-rejected label Oct 29, 2021

cmsbuild added tests-approved and removed tests-started labels Oct 29, 2021

smuzaffar reviewed Nov 1, 2021

Provide a more informative fallback if CUDA is not supported

bb1cc87

cmsbuild added tests-started and removed tests-approved labels Nov 25, 2021

cmsbuild added tests-approved and removed tests-started labels Nov 25, 2021

smuzaffar merged commit c890ac8 into cms-sw:IB/CMSSW_12_2_X/master Nov 25, 2021

cmsbuild added externals-approved fully-signed and removed externals-pending pending-signatures labels Nov 25, 2021

cmsbuild mentioned this pull request Nov 26, 2021

Update 00-nvidia-drivers to check if the CUDA devices can run a kernel #7473

Merged

fwyzard deleted the IB/CMSSW_12_1_X/master_update_cuda-compatible-runtime branch April 1, 2022 11:58
