Revert to CUDA 10.0, and include CUDA on ARM builds #4815

fwyzard
Contributor

@fwyzard fwyzard commented Mar 27, 2019

For x86_64 (amd64), revert to CUDA 10.0.130

  • CUDA version 10.0.130 for amd64
    Support for gcc 7.x and clang 6.x
    Include Nsight Compute 1.0

See the release notes at https://docs.nvidia.com/cuda/archive/10.0/cuda-toolkit-release-notes/index.html .

For aarch64 (ARMv8 64), update to JetPack 4.2 (L4T R32.1, CUDA 10.0.166)

  • Linux for Tegra L4T R32.1
    GPUDirect RDMA support on Jetson AGX Xavier

  • CUDA version 10.0.166 for ARMv8
    Support for gcc 7.x
    Include Nsight Compute 1.0

  • Build CUDA code for sm_72, found on the NVIDIA Jetson Xavier.

See the release notes at https://docs.nvidia.com/jetson/jetpack/release-notes/index.html .
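As a rough illustration of the sm_72 item above, the helper below builds nvcc `-gencode` options for a list of compute capabilities. The function name `gencode_flags` and the architecture list are hypothetical and are not taken from the actual build recipe, which is not shown in this thread; only the `-gencode=arch=compute_XX,code=sm_XX` option syntax is nvcc's own.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode options, one per compute capability.

    `capabilities` holds strings such as "72" (sm_72, found on the Jetson Xavier).
    """
    flags = []
    for cc in capabilities:
        # Request native code (SASS) for each listed architecture.
        flags.append(f"-gencode=arch=compute_{cc},code=sm_{cc}")
    return flags


# Hypothetical architecture list, for illustration only.
print(" ".join(gencode_flags(["72"])))
```

Keeping the list of capabilities in one place means that adding another target architecture only requires appending one entry.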

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2019

@cmsbuild, please test

@cmsbuild
Contributor

cmsbuild commented Mar 27, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/33790/console

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_10_6_X/gcc700.

@cmsbuild, @smuzaffar, @gudrutis, @mrodozov can you please review it and eventually sign? Thanks.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2019

@smuzaffar @fabiocos as we discussed over email and at the ORP, here is the proposal to revert CUDA to 10.0.x for the gcc 7 builds, and include support for ARMv8.

@cmsbuild
Contributor

@cmsbuild
Contributor

Comparison job queued.

@mrodozov
Contributor

We tested the recipe on an ARM machine and it builds.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4815/33790/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 32
  • DQMHistoTests: Total histograms compared: 3114829
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3114631
  • DQMHistoTests: Total skipped: 197
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 31 files compared)
  • Checked 133 log files, 14 edm output root files, 32 DQM output files

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_10_6_X/gcc700 IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Contributor

+1

@cmsbuild cmsbuild merged commit 85400b9 into cms-sw:IB/CMSSW_10_6_X/gcc700 Mar 28, 2019
@smuzaffar
Contributor

@fwyzard, for the aarch64 IB, we see unit tests failing with the following error. Any idea why we are not copying these libraries from the build/nvidia area?

test_calo_rechit: error while loading shared libraries: libnvrm_gpu.so: cannot open shared object file: No such file or directory

---> test test_calo_rechit had ERRORS

@fwyzard
Contributor Author

fwyzard commented Apr 8, 2019

Is there an ARMv8 machine I can log in to?
The one I have does not have cvmfs.

@fwyzard
Contributor Author

fwyzard commented Apr 8, 2019

Looks like we need to package these as well:

    libnvrm.so
    libnvrm_gpu.so
    libnvrm_graphics.so
    libnvos.so

I will make a PR.
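
A minimal sketch of how the packaging could be checked, assuming the libraries are expected as plain files in a single directory. The helper name `missing_libraries` is hypothetical and not part of any existing tool; the library names are the four listed above.

```python
from pathlib import Path

# Library names taken from the list in the comment above.
REQUIRED_LIBS = (
    "libnvrm.so",
    "libnvrm_gpu.so",
    "libnvrm_graphics.so",
    "libnvos.so",
)


def missing_libraries(lib_dir, required=REQUIRED_LIBS):
    """Return the names in `required` that do not exist in `lib_dir`."""
    lib_dir = Path(lib_dir)
    # Keep the order of `required` so the report is stable.
    return [name for name in required if not (lib_dir / name).exists()]
```

Calling `missing_libraries(some_dir)` on the directory where the driver libraries were packaged returns an empty list when all four are present. This only checks that the files exist; it does not verify that the dynamic loader can resolve them at run time.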

@fwyzard
Contributor Author

fwyzard commented Apr 9, 2019

Here they are:

@fwyzard fwyzard deleted the IB/CMSSW_10_6_X/gcc700_update_CUDA_aarch64 branch May 21, 2019 05:29