Update cuda-compatible-runtime to add kernel checks #7419

Contributor

@fwyzard fwyzard commented Oct 29, 2021

No description provided.

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

enable gpu

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_2_X/master.

@smuzaffar, @iarspider, @ddaina can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test for slc7_aarch64_gcc9

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test for cc8_amd64_gcc9

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test for slc7_amd64_gcc10

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test for slc7_ppc64le_gcc9

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20048/summary.html
COMMIT: b382f86
CMSSW: CMSSW_12_2_X_2021-10-28-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20048/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

+ i=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh
+ '[' -f /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh ']'
+ rm -rf /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build
+ mkdir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -std=c++17 -O2 -g -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets test.cu -I /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64 -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread -static-libgcc -static-libstdc++' -o /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-90ef0bf2ef0f7ff887bd46cd16972023/build/cuda-compatible-runtime
gcc: error: test.cu: No such file or directory
gcc: warning: '-x c++' after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.tDbydv (%build)



@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please abort

@fwyzard fwyzard force-pushed the IB/CMSSW_12_1_X/master_update_cuda-compatible-runtime branch from b382f86 to c90d2da Compare October 29, 2021 12:45
@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test

@cmsbuild
Contributor

Pull request #7419 was updated.

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20056/summary.html
COMMIT: c90d2da
CMSSW: CMSSW_12_2_X_2021-10-28-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20056/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

+ '[' -f /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/./etc/profile.d/init.sh ']'
+ rm -rf /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build
+ mkdir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build
+ /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/bin/nvcc -std=c++17 -O2 -g -gencode 'arch=compute_60,code=[sm_60,compute_60]' -gencode 'arch=compute_70,code=[sm_70,compute_70]' -gencode 'arch=compute_75,code=[sm_75,compute_75]' -Wno-deprecated-gpu-targets test.cu -I /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/include -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64 -L /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/cuda/11.4.2-0939a3504c82d9c20346029080003d72/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread -static-libgcc -static-libstdc++' -o /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/cuda-compatible-runtime/1.0-00d51264154eaa7fcd67151fa4e988f5/build/cuda-compatible-runtime
/cvmfs/cms-ib.cern.ch/nweek-02704/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.0EqUMr (%build)


RPM build errors:
Macro %rpmbuild_libdir defined but not used within scope
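
The `cannot find -lstdc++` failure above is consistent with `-static-libstdc++` being requested where the linker cannot find a static `libstdc++.a`; the thread does not state the cause, but the nvcc line quoted later in the review no longer passes `-static-libgcc -static-libstdc++`. As a rough sketch only (nothing like this is part of the PR), a build script could probe for the static library before asking for it. `has_static_libstdcxx` is a hypothetical helper; it relies on GCC's `-print-file-name` option, which echoes the bare file name back when the library is not found.

```shell
# Sketch: report whether a compiler can locate a static libstdc++.a.
# has_static_libstdcxx is a hypothetical helper, not part of the spec file.
has_static_libstdcxx() {
  compiler="${1:-g++}"
  # -print-file-name prints a full path when the file is found and
  # echoes the bare name back when it is not.
  path="$("$compiler" -print-file-name=libstdc++.a 2>/dev/null)" || return 1
  [ -n "$path" ] && [ "$path" != "libstdc++.a" ]
}
```

A caller could then add `-static-libstdc++` to the host compiler options only when `has_static_libstdcxx g++` succeeds.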


@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

@gartung this should be the first part of the fix for the automatic checks, to support the environment at NERSC

@fwyzard fwyzard force-pushed the IB/CMSSW_12_1_X/master_update_cuda-compatible-runtime branch from c90d2da to 47863b5 Compare October 29, 2021 14:13
@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021

please test

@fwyzard
Contributor Author

fwyzard commented Oct 29, 2021 via email

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20067/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-28-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20067/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19782
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19775
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2901412
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

cmsbuild commented Nov 1, 2021

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20137/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/cc8_amd64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20137/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation warning when building: See details on the summary page.

@cmsbuild
Contributor

cmsbuild commented Nov 1, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20135/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/slc7_ppc64le_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20135/install.sh to create a dev area with all the needed externals and cmssw changes.

@cmsbuild
Contributor

cmsbuild commented Nov 1, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20139/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/slc7_aarch64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20139/install.sh to create a dev area with all the needed externals and cmssw changes.

@cmsbuild
Contributor

cmsbuild commented Nov 1, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/summary.html
COMMIT: 47863b5
CMSSW: CMSSW_12_2_X_2021-10-31-2300/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7419/20136/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20136/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 58442 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901890
  • DQMHistoTests: Total failures: 299962
  • DQMHistoTests: Total nulls: 71
  • DQMHistoTests: Total successes: 2601835
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.174 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): -0.533 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): 0.246 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: found differences in 13 / 41 workflows

rm -rf %{_builddir}/build && mkdir %{_builddir}/build
gcc -std=c99 -O2 -Wall test.c -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs -l cudart_static -l cuda -ldl -lrt -pthread -static-libgcc -o %{_builddir}/build/cuda-compatible-runtime # || true
$CUDA_ROOT/bin/nvcc %{nvcc_stdcxx} -O2 -g %{cuda_flags_4} test.cu -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread' -o %{_builddir}/build/cuda-compatible-runtime # || true
Contributor

@fwyzard, isn't it going to fail if the gcc version is not compatible with CUDA?

Contributor

@fwyzard ping

Contributor Author

> @fwyzard, isn't it going to fail if the gcc version is not compatible with CUDA?

Yes, good point.
If that happens, there really is no version of the runtime we want to use...

Contributor

@fwyzard, should we fix this so that it does not fail the build process? I mean, in that case we should still create a cuda-compatible-runtime that exits with a non-zero code, e.g.

if [ "$(%{cuda_gcc_support})" = true ] ; then
  $CUDA_ROOT/bin/nvcc %{nvcc_stdcxx} -O2 -g %{cuda_flags_4} test.cu -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -L $CUDA_ROOT/lib64/stubs --cudart static -ldl -lrt --compiler-options '-Wall -pthread' -o %{_builddir}/build/cuda-compatible-runtime
else
  # Generate a stub script that reports the incompatibility and exits non-zero
  echo '#!/bin/sh' > %{_builddir}/build/cuda-compatible-runtime
  echo "echo 'CUDA ${CUDA_VERSION} is not compatible with GCC ${GCC_VERSION}'" >> %{_builddir}/build/cuda-compatible-runtime
  echo "false" >> %{_builddir}/build/cuda-compatible-runtime
  chmod +x %{_builddir}/build/cuda-compatible-runtime
fi

Contributor

Contributor Author

sorry, I forgot :-(
let me see...

Contributor Author

If it works, sounds like a good idea.
Can we actually use %{cuda_gcc_support} before CUDA has been set up?

I guess I can make the changes and try...

Contributor Author

In fact, the %install part was already setting up a symlink to /usr/bin/false if the build failed; but I like the idea of a more verbose message.
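
For illustration, a symlink fallback of the kind described here could look roughly like the sketch below. This is not the actual %install section: `install_runtime_check` and its two arguments are hypothetical names, and the real spec may arrange the paths differently.

```shell
# Sketch of an install step with a fallback; install_runtime_check and its
# arguments are hypothetical names, not taken from the spec file.
install_runtime_check() {
  built="$1"      # path where the build step may have left the binary
  destdir="$2"    # installation directory
  mkdir -p "$destdir"
  if [ -x "$built" ]; then
    cp "$built" "$destdir/cuda-compatible-runtime"
  else
    # No usable binary: make the check always report failure.
    ln -sf /usr/bin/false "$destdir/cuda-compatible-runtime"
  fi
}
```

With the symlink in place, callers that only look at the exit status see a failure, but get no explanation, which is the gap the more verbose message would fill.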

Contributor Author

As an alternative: is there a way to use the system compiler for building this, instead of the one bundled with CMSSW?

@fwyzard
Contributor Author

fwyzard commented Nov 25, 2021

please test

@cmsbuild
Contributor

Pull request #7419 was updated.

@fwyzard
Contributor Author

fwyzard commented Nov 25, 2021

Do we have an architecture that does not support CUDA?
Maybe something with gcc12?

@smuzaffar
Contributor

smuzaffar commented Nov 25, 2021

No, currently we do not have any arch without cuda/gcc support. I have tested your change by forcing the CUDA command to fail and generating the script. All looks good:

>./test/cuda-compatible-runtime
>echo $?
1
>./test/cuda-compatible-runtime -h
Usage: ./test/cuda-compatible-runtime [-h|-v]

Options:
  -h        Print a help message and exits.
  -v        Be more verbose.
>./test/cuda-compatible-runtime -v
CUDA 11.4.2-0939a3504c82d9c20346029080003d72 is not compatible with GCC 9.3.0
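
A stand-in script with the behaviour shown in this transcript could be generated along the following lines. This is a sketch under assumptions, not the code merged in the PR: `write_fallback_check` and its arguments are hypothetical, and the exit status after `-h` is a guess, since the transcript only shows the status of the plain invocation.

```shell
# Sketch: write a stand-in cuda-compatible-runtime that always fails.
# write_fallback_check and its arguments are hypothetical names; the option
# handling mirrors the -h / -v output shown in the transcript.
write_fallback_check() {
  out="$1"; cuda_version="$2"; gcc_version="$3"
  # Unquoted heredoc: the version variables expand now, while the
  # escaped \$0 and \$1 are left for the generated script to expand.
  cat > "$out" <<EOF
#!/bin/sh
case "\$1" in
  -h)
    echo "Usage: \$0 [-h|-v]"
    echo
    echo "Options:"
    echo "  -h        Print a help message and exits."
    echo "  -v        Be more verbose."
    exit 0   # assumption: the transcript does not show this status
    ;;
  -v)
    echo "CUDA ${cuda_version} is not compatible with GCC ${gcc_version}"
    ;;
esac
exit 1
EOF
  chmod +x "$out"
}
```

For example, `write_fallback_check ./cuda-compatible-runtime 11.4.2 9.3.0` produces a script that prints nothing and exits with status 1 unless `-v` or `-h` is given.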

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/summary.html
COMMIT: bb1cc87
CMSSW: CMSSW_12_2_X_2021-11-25-1100/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7419/20771/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49f516/20771/git-merge-result

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19798
  • DQMHistoTests: Total failures: 15
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19783
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 3247745
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3247723
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_2_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor Author

fwyzard commented Nov 25, 2021 via email

@fwyzard fwyzard deleted the IB/CMSSW_12_1_X/master_update_cuda-compatible-runtime branch April 1, 2022 11:58

4 participants