Backport CUDA updates from the 12.1.x branch #7340
Update to CUDA 11.4.2 (SDK 11.4.20210830):
* CUDA runtime version 11.4.108
* NVIDIA drivers version 470.57.02

Add support for GCC 11 and clang 12.

See https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.
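For reference, the CUDA runtime reports its version as a single integer, 1000 * major + 10 * minor (the form used by `CUDART_VERSION` and returned by `cudaRuntimeGetVersion()`), so CUDA 11.4 corresponds to 11040. The patch-level number (the 108 in 11.4.108) is not part of that integer. A minimal sketch of the decoding follows; `CudaVersion`, `decodeCudaVersion` and `toString` are hypothetical names for illustration, not part of the CUDA API or of this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type for illustration; not part of the CUDA API.
struct CudaVersion {
  int majorVersion;
  int minorVersion;
};

// Decode the integer form used by CUDART_VERSION and cudaRuntimeGetVersion(),
// which is 1000 * major + 10 * minor.
inline CudaVersion decodeCudaVersion(int encoded) {
  assert(encoded >= 0);  // a negative value is not a valid encoded version
  return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Format the decoded version as "major.minor".
inline std::string toString(const CudaVersion& v) {
  return std::to_string(v.majorVersion) + "." + std::to_string(v.minorVersion);
}
```

With this sketch, `toString(decodeCudaVersion(11040))` yields the string "11.4".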
|
A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_0_X/master. @cmsbuild, @smuzaffar, @iarspider, @mrodozov can you please review it and eventually sign? Thanks. |
|
enable gpu |
|
please test |
|
please test for cc8_aarch64_gcc9 |
|
please test for cc8_amd64_gcc9 |
|
@smuzaffar @perrotta @qliphy this PR should backport all the CUDA related changes from CMSSW 12.1.x to 12.0.x. Please let me know if you think this is OK, or if I should make a more limited backport of the minimal changes required to fix the issue mentioned yesterday (and discussed on this GGUS ticket). |
|
@fwyzard, we also need new build rules with the updated NVIDIA runtime hook to go with this, right?
|
Yes - but the old rules should still work with this update, so we can test in two separate steps. |
|
-1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b61c8f/19222/summary.html
External Build
I found compilation error when building:
+ for FILE in '$FILES'
++ basename src/common.cpp
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -DALPAKA_ACC_GPU_CUDA_ENABLED -DCUPLA_STREAM_ASYNC_ENABLED=1 -DALPAKA_DEBUG=0 -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/tbb/v2021.2.0-fcaf3e8d37e2c0c2807c93f2e5bba226/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/boost/1.75.0-5ba0079faea30e2a96d0dd57a4ddb60f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/alpaka/0.6.0-e2249872940140983aaf1c217659a754/include -Iinclude -std=c++17 -O3 --generate-line-info --source-in-ptx --display-error-number --expt-relaxed-constexpr --extended-lambda -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --cudart shared -Xcompiler '-std=c++17 -O2 -pthread -fPIC -Wall -Wextra' -x cu -c src/common.cpp -o build/cuda/common.cpp.o
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/alpaka/0.6.0-e2249872940140983aaf1c217659a754/include/alpaka/event/EventGenericThreads.hpp: In instantiation of 'void alpaka::traits::generic::currentThreadWaitForDevice(const TDev&) [with TDev = alpaka::DevCpu]':
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/alpaka/0.6.0-e2249872940140983aaf1c217659a754/include/alpaka/dev/cpu/Wait.hpp:33:40: required from here
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/alpaka/0.6.0-e2249872940140983aaf1c217659a754/include/alpaka/event/EventGenericThreads.hpp:280:20: error: '__T30' was not declared in this scope
280 | auto vQueues(dev.getAllQueues());
| ~~~^~~~~~~~~~~~~~~~~~~~~
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.cnlSom (%build)
|
-1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b61c8f/19223/summary.html
External Build
I found compilation error when building:
+ for FILE in $FILES
++ basename src/common.cpp
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/cuda/11.4.2-41ceec0a69aac72c010538b7cd374b9a/bin/nvcc -DALPAKA_ACC_GPU_CUDA_ENABLED -DCUPLA_STREAM_ASYNC_ENABLED=1 -DALPAKA_DEBUG=0 -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/cuda/11.4.2-41ceec0a69aac72c010538b7cd374b9a/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/tbb/v2021.2.0-ea64429748bfcab7118ee07caf0ec8a1/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/boost/1.75.0-a01e7a4f514707c75c38f8f8cc6f5b30/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/alpaka/0.6.0-7658d74aec51e7ed2677aa0655a7c1cb/include -Iinclude -std=c++17 -O3 --generate-line-info --source-in-ptx --display-error-number --expt-relaxed-constexpr --extended-lambda -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --cudart shared -Xcompiler '-std=c++17 -O2 -pthread -fPIC -Wall -Wextra' -x cu -c src/common.cpp -o build/cuda/common.cpp.o
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/alpaka/0.6.0-7658d74aec51e7ed2677aa0655a7c1cb/include/alpaka/event/EventGenericThreads.hpp: In instantiation of 'void alpaka::traits::generic::currentThreadWaitForDevice(const TDev&) [with TDev = alpaka::DevCpu]':
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/alpaka/0.6.0-7658d74aec51e7ed2677aa0655a7c1cb/include/alpaka/dev/cpu/Wait.hpp:33:40: required from here
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/alpaka/0.6.0-7658d74aec51e7ed2677aa0655a7c1cb/include/alpaka/event/EventGenericThreads.hpp:280:20: error: '__T30' was not declared in this scope
280 | auto vQueues(dev.getAllQueues());
| ~~~^~~~~~~~~~~~~~~~~~~~~
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.fbHwdb (%build)
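As a side note on the nvcc command line in these logs: each `-gencode 'arch=compute_XX,code=[sm_XX,compute_XX]'` option asks nvcc to embed both native code for `sm_XX` and PTX for `compute_XX`, once per targeted compute capability (60, 70 and 75 here). The sketch below shows how such a flag string could be assembled from a list of capabilities. `gencodeFlags` is a hypothetical helper written for illustration; it is not taken from the build rules discussed in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build the -gencode options for a list of compute
// capabilities given as integers (60 for sm_60, 75 for sm_75, ...).
inline std::string gencodeFlags(const std::vector<int>& capabilities) {
  std::string flags;
  for (int cc : capabilities) {
    assert(cc > 0);  // compute capabilities are positive numbers
    const std::string n = std::to_string(cc);
    if (!flags.empty()) {
      flags += ' ';  // separate consecutive options with a single space
    }
    // Embed native code (sm_N) and PTX (compute_N) for this capability.
    flags += "-gencode 'arch=compute_" + n + ",code=[sm_" + n + ",compute_" + n + "]'";
  }
  return flags;
}
```

Calling `gencodeFlags({60, 70, 75})` reproduces the three `-gencode` options that appear in the command above.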
|
I forgot that we need the |
|
please test |
|
please test for cc8_amd64_gcc9 |
|
Pull request #7340 was updated. |
|
please test for cc8_aarch64_gcc9 |
|
-1
Failed Tests: Build
Build
I found compilation error when building:
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/Core/Solve.h(72): warning #20011-D: calling a __host__ function("Eigen::PartialPivLU< ::Eigen::Matrix > ::cols() const") from a __host__ __device__ function("Eigen::Solve< ::Eigen::PartialPivLU< ::Eigen::Matrix > , ::Eigen::CwiseNullaryOp< ::Eigen::internal::scalar_identity_op , ::Eigen::Matrix > > ::rows const") is not allowed
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(412): warning #20011-D: calling a __host__ function("Eigen::internal::FixedInt<(int)-1> ::operator ()(int) const") from a __host__ __device__ function("Eigen::internal::partial_lu_impl ::unblocked_lu") is not allowed
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(412): error: identifier "Eigen::fix<(int)-1> " is undefined in device code
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(422): warning #20011-D: calling a __host__ function("Eigen::internal::FixedInt<(int)-1> ::operator ()(int) const") from a __host__ __device__ function("Eigen::internal::partial_lu_impl ::unblocked_lu") is not allowed
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(422): error: identifier "Eigen::fix<(int)-1> " is undefined in device code
|
|
-1
Failed Tests: Build
Build
I found compilation error when building:
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/Core/Solve.h(72): warning #20011-D: calling a __host__ function("Eigen::PartialPivLU< ::Eigen::Matrix > ::cols() const") from a __host__ __device__ function("Eigen::Solve< ::Eigen::PartialPivLU< ::Eigen::Matrix > , ::Eigen::CwiseNullaryOp< ::Eigen::internal::scalar_identity_op , ::Eigen::Matrix > > ::rows const") is not allowed
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(412): warning #20011-D: calling a __host__ function("Eigen::internal::FixedInt<(int)-1> ::operator ()(int) const") from a __host__ __device__ function("Eigen::internal::partial_lu_impl ::unblocked_lu") is not allowed
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(412): error: identifier "Eigen::fix<(int)-1> " is undefined in device code
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(422): warning #20011-D: calling a __host__ function("Eigen::internal::FixedInt<(int)-1> ::operator ()(int) const") from a __host__ __device__ function("Eigen::internal::partial_lu_impl ::unblocked_lu") is not allowed
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/cc8_amd64_gcc9/external/eigen/f612df273689a19d25b45ca4f8269463207c4fee/include/eigen3/Eigen/src/LU/PartialPivLU.h(422): error: identifier "Eigen::fix<(int)-1> " is undefined in device code
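For readers unfamiliar with warning #20011-D in the logs above: nvcc emits it when a function marked `__host__ __device__` calls a function that is only available on the host, and device compilation then fails if that host-only code is actually needed on the GPU. The sketch below shows the usual annotation pattern in isolation. It is independent of Eigen and does not claim to reproduce the cause of the failure here; `HOST_DEVICE`, `clampIndex` and `at` are hypothetical names. The macro expands to nothing under a plain C++ compiler, so the snippet also builds without CUDA.

```cpp
#include <cassert>

// Expand to the CUDA execution-space qualifiers only when compiling with nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper. Without the HOST_DEVICE annotation, nvcc would report
// a call to a __host__ function from the __host__ __device__ function below.
HOST_DEVICE inline int clampIndex(int i, int size) {
  assert(size > 0);  // the array must not be empty
  return i < 0 ? 0 : (i >= size ? size - 1 : i);
}

// Read an element, clamping out-of-range indices to the nearest valid one.
HOST_DEVICE inline int at(const int* data, int size, int i) {
  return data[clampIndex(i, size)];
}
```

Every function reachable from `at` carries the same annotation, which is the property the warnings in the log say is missing for the listed Eigen calls.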
|
|
... and the update to Eigen ...
|
|
Pull request #7340 was updated. |
|
please test |
|
OK, this is starting to accumulate too many changes for a simple backport, and it may also need to bring in TensorFlow and who knows what else... I've made a minimal backport at #7346.
|
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b61c8f/19255/summary.html
GPU Comparison Summary
Comparison Summary
|
|
(sorry, wrong PR) |
Update to CUDA 11.4.2 (SDK 11.4.20210830):
Update cuDNN to v8.2.2.26 for CUDA 11.4:
Update the CUDA external packaging:
Add the cuda-compatible-runtime test as a new external.
Update Eigen.
Backport #7257, #7277, #7278, #7279 et al.