Update cuda-compatible-runtime to add kernel checks #7419
enable gpu
please test
A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_2_X/master. @smuzaffar, @iarspider, @ddaina can you please review it and eventually sign? Thanks.
please test for slc7_aarch64_gcc9
please test for cc8_amd64_gcc9
please test for slc7_amd64_gcc10
please test for slc7_ppc64le_gcc9
-1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20048/summary.html
External Build
I found compilation error when building:
+ i=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh
+ '[' -f /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh ']'
+ rm -rf /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build
+ mkdir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -std=c++17 -O2 -g -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets test.cu -I /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64 -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread -static-libgcc -static-libstdc++' -o /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build/cuda-compatible-runtime
gcc: error: test.cu: No such file or directory
gcc: warning: '-x c++' after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.tDbydv (%build)
please abort
b382f86 to c90d2da
please test
Pull request #7419 was updated.
-1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20056/summary.html
External Build
I found compilation error when building:
+ '[' -f /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh ']'
+ rm -rf /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build
+ mkdir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -std=c++17 -O2 -g -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets test.cu -I /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64 -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread -static-libgcc -static-libstdc++' -o /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build/cuda-compatible-runtime
/cvmfs/cms-ib.cern.ch/nweek-02704/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.0EqUMr (%build)
RPM build errors:
Macro %rpmbuild_libdir defined but not used within scope
@gartung this should be the first part of the fix for the automatic checks, to support the environment at NERSC
c90d2da to 47863b5
please test
Thanks for the check!
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20067/summary.html
-1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20137/summary.html
External Build
I found compilation warning when building: See details on the summary page.
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20135/summary.html
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20139/summary.html
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/summary.html
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
cuda-compatible-runtime.spec
rm -rf %{_builddir}/build && mkdir %{_builddir}/build
gcc -std=c99 -O2 -Wall test.c -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs -l cudart_static -l cuda -ldl -lrt -pthread -static-libgcc -o %{_builddir}/build/cuda-compatible-runtime # || true
$CUDA_ROOT/bin/nvcc %{nvcc_stdcxx} -O2 -g %{cuda_flags_4} test.cu -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread' -o %{_builddir}/build/cuda-compatible-runtime # || true
@fwyzard, isn't this going to fail if the gcc version is not compatible with CUDA?
@fwyzard ping
@fwyzard, isn't this going to fail if the gcc version is not compatible with CUDA?
Yes, good point.
If that happens, there really is no version of the runtime we want to use...
@fwyzard, should we fix this so that it does not fail the build process? I mean that in that case we should still create a cuda-compatible-runtime which exits with a non-zero code, e.g.
if [ $(%{cuda_gcc_support}) = true ] ; then
  $CUDA_ROOT/bin/nvcc %{nvcc_stdcxx} -O2 -g %{cuda_flags_4} test.cu -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread' -o %{_builddir}/build/cuda-compatible-runtime
else
  # Generate a small script that prints the reason and then fails
  echo '#!/bin/sh' > %{_builddir}/build/cuda-compatible-runtime
  echo "echo 'CUDA ${CUDA_VERSION} is not compatible with GCC ${GCC_VERSION}'" >> %{_builddir}/build/cuda-compatible-runtime
  echo "false" >> %{_builddir}/build/cuda-compatible-runtime
  chmod +x %{_builddir}/build/cuda-compatible-runtime
fi
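The fallback idea in this suggestion can be tried outside of the spec file. The snippet below is only a sketch: the function name `write_fallback_stub`, its arguments, and the version numbers in the usage line are made up for illustration and are not part of cmsdist. It writes a stub that prints the reason to stderr and exits with a non-zero status.

```shell
#!/bin/bash
# Hypothetical helper (not from cmsdist): write a stub executable that
# reports the CUDA/GCC incompatibility and exits with a non-zero status.
write_fallback_stub() {
  local target="$1" cuda_version="$2" gcc_version="$3"
  {
    echo '#!/bin/sh'
    echo "echo 'CUDA ${cuda_version} is not compatible with GCC ${gcc_version}' >&2"
    echo 'exit 1'
  } > "$target"
  chmod +x "$target"
}

# Example usage with made-up version numbers
stub_dir=$(mktemp -d)
write_fallback_stub "$stub_dir/cuda-compatible-runtime" "11.4" "12.0"
"$stub_dir/cuda-compatible-runtime" 2>/dev/null || echo "stub exited with status $?"
```

Writing the message through an `echo` inside the stub, rather than as a bare line, keeps the generated file a valid shell script.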
@fwyzard , what do you think about https://github.com/cms-sw/cmsdist/pull/7419/files#r741837428 ?
sorry, I forgot :-(
let me see...
If it works, sounds like a good idea.
Can we actually use %{cuda_gcc_support} before CUDA has been set up?
I guess I can make the changes and try...
In fact, the %install part was already setting up a symlink to /usr/bin/false if the build failed; but I like the idea of a more verbose message.
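For readers unfamiliar with that pattern, here is a minimal sketch of an install step that falls back to a symlink to /usr/bin/false when no binary was built. The function name `install_runtime_check` and the paths in the usage lines are assumptions for illustration; the actual %install section of the spec file is not shown in this thread.

```shell
#!/bin/bash
# Hypothetical install step (names and layout are assumptions, not the real
# %install section): copy the built checker if it exists, otherwise link to
# /usr/bin/false so that running the checker always fails.
install_runtime_check() {
  local built="$1" dest="$2"
  mkdir -p "$(dirname "$dest")"
  rm -f "$dest"   # never write through a stale symlink
  if [ -x "$built" ]; then
    cp "$built" "$dest"
  else
    ln -s /usr/bin/false "$dest"
  fi
}

# Example usage: the build product is missing, so a symlink is created
work_dir=$(mktemp -d)
install_runtime_check "$work_dir/build/cuda-compatible-runtime" "$work_dir/test/cuda-compatible-runtime"
readlink "$work_dir/test/cuda-compatible-runtime"
```

The symlink fails silently, which is why a generated script with a message is the more informative variant.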
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as an alternative - is there a way to use the system compiler for building this, instead of the one bundled with CMSSW ?
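One possible way to do that is nvcc's `-ccbin` option, which selects the host compiler. The sketch below only prints the command line it would run; the helper name `nvcc_command_with_host_compiler` and the /usr/bin/g++ path are assumptions, and whether a system compiler is suitable for the CMS build is not established in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) an nvcc command line that uses the
# given host compiler through nvcc's -ccbin option.
nvcc_command_with_host_compiler() {
  local host_cxx="$1"
  shift
  printf 'nvcc -ccbin %s' "$host_cxx"
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"
  fi
  printf '\n'
}

# Example usage; /usr/bin/g++ is an assumed location of the system compiler
nvcc_command_with_host_compiler /usr/bin/g++ -std=c++17 -O2 test.cu -o cuda-compatible-runtime
```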
please test
Pull request #7419 was updated.
do we have an architecture that does not support CUDA?
no, currently we do not have any arch without cuda/gcc support. I have tested your change by forcing the cuda command to fail and generate the script. All looks good.
please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/summary.html
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
+externals
This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_2_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
@fwyzard, when we see a message like
CUDA runtime version 11.4, driver version 11.5, NVIDIA driver version 495.29.05
CUDA device 0: Tesla T4 (sm_75)
i.e. "CUDA device 0", does this mean that we did not find any usable device?
No, it's fine, it means that it found one device.
CUDA devices count from 0.
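To make the distinction between device index and device count concrete, the sketch below counts the "CUDA device N:" lines in output of the form quoted above. The function name `count_cuda_devices` is made up for illustration, and the output format is assumed to match the two sample lines from this thread.

```shell
#!/bin/bash
# Hypothetical helper: count the devices reported on stdin, assuming one
# "CUDA device N: ..." line per device as in the sample message above.
count_cuda_devices() {
  # grep -c prints 0 and returns non-zero when nothing matches
  grep -c '^CUDA device [0-9][0-9]*:' || true
}

sample='CUDA runtime version 11.4, driver version 11.5, NVIDIA driver version 495.29.05
CUDA device 0: Tesla T4 (sm_75)'

# One device is listed, and its index is 0
printf '%s\n' "$sample" | count_cuda_devices   # prints 1
```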