Update to CUDA 11.1 #6267
@cmsbuild, please test |
The tests are being triggered in jenkins.
|
A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_11_2_X/master. @cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks. |
-1 Tested at: d32501c
I found compilation error when building:

+ ln -s ../compute-sanitizer/compute-sanitizer /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/ab0db370d574467e557d482f013ac8dc/opt/cmssw/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/bin/compute-sanitizer
+ mv /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/build/nvvm /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/ab0db370d574467e557d482f013ac8dc/opt/cmssw/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/
+ mv /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/build/EULA.txt /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/ab0db370d574467e557d482f013ac8dc/opt/cmssw/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/
+ mv /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/build/version.txt /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/ab0db370d574467e557d482f013ac8dc/opt/cmssw/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/
mv: cannot stat '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc820/external/cuda/11.1.0-f65abf/build/version.txt': No such file or directory
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.KtHgB2 (%install)

RPM build errors:
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.KtHgB2 (%install)

You can see the results of the tests here: |
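The log above shows the %install step aborting because mv could not find version.txt in the CUDA 11.1.0 build directory. The conversation does not show how the spec file was actually fixed, so the following is only a minimal sketch of one possible guard, assuming the file is optional for the package. The helper name move_if_present and the demo directory layout are made up for illustration and are not taken from the real spec.

```shell
#!/bin/sh
# Hypothetical helper (not from the actual CUDA spec file): move a file or
# directory only if it exists, so that an item missing from a newer toolkit
# layout does not abort the whole install step.
move_if_present() {
  src="$1"
  dest="$2"
  if [ -e "$src" ]; then
    mv "$src" "$dest"
  else
    echo "skipping missing file: $src" >&2
  fi
}

# Demonstration in a throwaway directory; this layout is illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/build" "$demo/install"
touch "$demo/build/EULA.txt"    # present, will be moved
                                # version.txt is deliberately absent

move_if_present "$demo/build/EULA.txt" "$demo/install/"
move_if_present "$demo/build/version.txt" "$demo/install/"
```

With a guard like this, the missing file produces a warning on stderr instead of a bad exit status. Whether silently skipping is acceptable depends on whether anything downstream reads that file, which this thread does not say.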
Update to CUDA 11.1:
* CUDA version 11.1.74
* NVIDIA drivers version 455.23.05

From the release notes:
- add support for GCC 10 and clang 10
- support multi-threaded launch to different CUDA streams
- improve MPS error handling when using multiple GPUs
- various bug fixes

See https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html .
d32501c to 7ccaf24
Pull request #6267 was updated. |
please test
|
The tests are being triggered in jenkins.
|
+1 |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
The tests are being triggered in jenkins.
|
-1 Tested at: 5f70e0d CMSSW: CMSSW_11_2_X_2020-09-27-0000 I found the following errors while testing this PR. Failed tests: UnitTests
I found errors in the following unit tests: ---> test TestCUDATest had ERRORS |
Comparison job queued. |
+1 |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
-1 Tested at: 5f70e0d CMSSW: CMSSW_11_2_X_2020-09-27-0000 I found the following errors while testing this PR. Failed tests: UnitTests
I found errors in the following unit tests: ---> test TestCUDATest had ERRORS |
Comparison job queued. |
@fwyzard, something is wrong with the ppc64le CUDA distribution. We should no longer be using CUDA from the system (PR tests properly use set [a]), but the unit tests still fail/crash the same way [a]
|
Is there a way to build the external (so I can test it exactly as it will be) without including it in the IBs (so they don't get broken)? |
On a Power machine you can run
This should build the externals, create a CMSSW dev area and set up the new tools. Anyway, I have already done it, and the externals are available under ibmminsky-1:/scratch/d/externals. If you want to test the new CUDA, set up /scratch/d/externals/slc7_ppc64le_gcc820/external/cuda-toolfile/2.1-cms/etc/scram.d/*.xml in your CMSSW dev area. |
Thanks, will try to have a look later today. |
@smuzaffar I suspect the problem is actually with the Minsky 1 machine:
The third GPU (number 2) is in ERR! state, and when I try to run simple CUDA jobs they hang. I'll try again on Minsky 2... |
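For telling a hung job apart from a failing one when checking a machine like this, a minimal sketch using GNU coreutils timeout might help. The helper name check_job is hypothetical, and true and sleep stand in for a real CUDA test binary, which is not named in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit (in seconds) and
# report whether it succeeded, failed, or had to be killed as hung.
check_job() {
  limit="$1"
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  case "$status" in
    0)   echo "ok" ;;
    124) echo "hung (killed after ${limit}s)" ;;   # GNU timeout exits with 124 on timeout
    *)   echo "failed with exit status $status" ;;
  esac
  return "$status"
}

# Stand-ins for a real CUDA job: one finishes at once, one exceeds the limit.
result_fast=$(check_job 5 true) || true
result_hang=$(check_job 1 sleep 10) || true
echo "$result_fast"
echo "$result_hang"
```

Note that timeout sends SIGTERM by default; a process blocked inside a driver call may not react to it, in which case the -k option (send SIGKILL after a grace period) is worth trying, though even that is not guaranteed to work.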
On Minsky 2 I can build and run applications using CUDA 11.1:
|
@smuzaffar @silviodonato I'd suggest merging this PR so we can have CUDA 11.1 in pre7? |
OK, merged now. I have disabled Minsky 1 in Jenkins and will ask the Openlab team to look into this issue. |