Revert to CUDA 10.1 Update 1 (10.1.168), add support for Power (11.0.x backport) #5419

Merged

fwyzard
Contributor

@fwyzard fwyzard commented Dec 12, 2019

Backport of #5418.

CUDA 10.1 Update 2 and later show problems when running under MPS (NVIDIA Multi-Process Service) or when using CUDA Dynamic Parallelism within CMSSW.
While awaiting feedback from NVIDIA, the only solution seems to be to revert to the last working version, 10.1 Update 1.

Drop the Nsight Compute and Nsight Systems tools from the CUDA package, because they are released much more often as external packages.

Add support for CUDA on IBM Power architecture (ppc64le) on Linux.

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_11_0_X/master.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Dec 12, 2019

please test

@cmsbuild
Contributor

cmsbuild commented Dec 12, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3935/console Started: 2019/12/12 09:23

@smuzaffar
Contributor

smuzaffar commented Dec 12, 2019

Use the following comment (as described at http://cms-sw.github.io/cms-bot-cmssw-cmds.html):

please test for <arch>

@fwyzard
Contributor Author

fwyzard commented Dec 12, 2019

please test for slc7_ppc64le_gcc820

@fwyzard
Contributor Author

fwyzard commented Dec 12, 2019

please test for slc7_aarch64_gcc820

@cmsbuild
Contributor

cmsbuild commented Dec 12, 2019

The tests are being triggered in jenkins.
Test Parameters:

@fwyzard fwyzard changed the title from "Revert to CUDA 10.1 Update 1 (10.1.168), add support for Power" to "Revert to CUDA 10.1 Update 1 (10.1.168), add support for Power (11.0.x backport)" Dec 12, 2019

@fwyzard
Contributor Author

fwyzard commented Dec 12, 2019

@smuzaffar is this

please test for slc7_ppc64le_gcc820

correct?
There was no message about additional tests for it.

@fwyzard
Contributor Author

fwyzard commented Dec 12, 2019

enable gpu

@smuzaffar
Contributor

@fwyzard, one needs to explicitly enable additional tests

@fwyzard
Contributor Author

fwyzard commented Dec 12, 2019

> @fwyzard, one needs to explicitly enable additional tests

I am sorry, but I do not understand what that means concretely.
Do you mean that I have to ask @cmsbot something different?
Or do you mean that additional tests are not supported on slc7_ppc64le_gcc820?

If the latter, does it mean that @cmsbot will still run some basic tests?
Or that it is currently not possible to run tests on slc7_ppc64le_gcc820 from a PR?

Bottom line is, please test this PR for

  • Intel/AMD (x86_64) on cc7 and cc8
  • Power (ppc64le) on cc7
  • ARM (aarch64) on cc7
  • with GPU tests when/where possible

Thank you!

@cmsbuild
Contributor

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1693b/3935/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2793840
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2793497
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Dec 15, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3981/console Started: 2019/12/15 16:37

@cmsbuild
Contributor

+1
Tested at: 61f448f
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1693b/3981/summary.html
CMSSW: CMSSW_11_0_X_2019-12-15-0000
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1693b/3981/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2793840
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2793497
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@fabiocos
Contributor

+1

The backport looks consistent with the update in master.

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_11_0_X/master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit dcaf2cd into cms-sw:IB/CMSSW_11_0_X/master Dec 17, 2019
@fwyzard fwyzard deleted the revert_cuda_10.1.168_cmssw_110x branch July 28, 2020 16:11