Add ROCm 5.0.2 for x86_64 #7795


fwyzard
Contributor

@fwyzard fwyzard commented Apr 18, 2022

This PR aims to add AMD ROCm 5.x as an external for CMSSW, without including it in the CMSSW distribution.
Instead, the ROCm tools and libraries are used over CVMFS from the Patatrack repository, similarly to what is done for the Intel tools.

For more information, see the documentation at:

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_4_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release managers for this.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Apr 18, 2022

This is currently a work in progress, in order to test the behaviour of the spec file on different architectures.
The next step will be the creation of the scram tool files.

@fwyzard
Contributor Author

fwyzard commented Apr 18, 2022

@cmsbuild, please test

@fwyzard
Contributor Author

fwyzard commented Apr 18, 2022

@cmsbuild, please test for alma8_amd64_gcc10

@fwyzard
Contributor Author

fwyzard commented Apr 18, 2022

@cmsbuild, please test for slc7_aarch64_gcc11

@fwyzard
Contributor Author

fwyzard commented Apr 18, 2022

@cmsbuild, please test for slc7_ppc64le_gcc11

@fwyzard
Contributor Author

fwyzard commented Apr 18, 2022

@cmsbuild, please test for cs8_amd64_gcc10

@smuzaffar
Contributor

@cmsbuild, please test for el8_amd64_gcc10

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/23993/summary.html
COMMIT: ad930c2
CMSSW: CMSSW_12_4_X_2022-04-18-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7795/23993/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3589937
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3589907
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

please test for el9_amd64_gcc11

@smuzaffar
Contributor

@fwyzard, it adds up to 17GB ( https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/23993/external-tools.html ) to the cmssw distribution... how about we make it an optional cmssw dependency, e.g.:

  • we can build and upload it
  • manually install it under cvmfs
  • build scram toolfiles to point to hardcoded cvmfs locations
  • if someone needs it, they can explicitly set up the tools

@fwyzard
Contributor Author

fwyzard commented Apr 19, 2022

"@fwyzard, it adds up to 17GB ( https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/23993/external-tools.html ) to the cmssw distribution"

That's unexpected - the new package should contain only a symlink, the full installation is under /cvmfs/patatrack.cern.ch .

@smuzaffar
Contributor

Ah OK, it could be that the size-check script is following symlinks. I will check.
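A quick way to test this hypothesis is to compare du with and without symlink dereferencing. The sketch below uses a throw-away directory layout, not the actual CI size-check script (whose implementation is not shown in this thread): the "package" directory holds nothing but a symlink to a 1 MiB payload. By default du does not follow symlinks, while du -L counts everything behind them.

```shell
#!/usr/bin/env bash
# Sketch: a "package" that only holds a symlink looks tiny to du,
# but large once symlinks are dereferenced with -L.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/external" "$work/package"

# 1 MiB of random data (random, so filesystem compression cannot shrink it)
head -c 1048576 /dev/urandom > "$work/external/payload.bin"

# The package directory contains nothing but a symlink to the external tree
ln -s "$work/external" "$work/package/rocm"

# Default behaviour: symlinks are not followed, only the link itself is counted
size_nofollow=$(du -sk "$work/package" | cut -f1)
# -L dereferences all symlinks, so the payload behind the link is counted too
size_follow=$(du -skL "$work/package" | cut -f1)

echo "du -sk : ${size_nofollow} KiB"
echo "du -skL: ${size_follow} KiB"

rm -rf "$work"
```

If the size check behaves like the second call, a symlink-only package pointing into /cvmfs/patatrack.cern.ch would be reported with the full size of the ROCm installation.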

@fwyzard
Contributor Author

fwyzard commented Apr 19, 2022

Also, it looks like at least for some architectures it didn't work as intended :-(

$ ll /cvmfs/cms-ci.cern.ch/week1/PR_4c298f88/slc7_amd64_gcc10/external/rocm/5.0.2-4d7f0a9aeb9846b05c15667ab5d01fa6/
total 1.0K
lrwxrwxrwx. 1 cvmfs cvmfs 66 Apr 18 22:45 '*' -> '/cvmfs/patatrack.cern.ch/externals/x86_64/unknown/amd/rocm-5.0.2/*'
drwxr-xr-x. 3 cvmfs cvmfs 23 Apr 18 22:45  etc

I have no idea why there is a * symlink instead of all the individual folders - it worked locally :-/
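A literal * link is what bash produces by default when a glob matches nothing: the unmatched pattern is passed to the command unchanged, and ln -s then creates a dangling link whose name is the basename of that pattern, i.e. *. A minimal reproduction, using a made-up path as a stand-in for a CVMFS directory that is not visible on the build node:

```shell
#!/usr/bin/env bash
# Reproduce the literal '*' symlink seen in the CI installation area.
set -eu

dest=$(mktemp -d)
# Hypothetical stand-in for a CVMFS path that is not mounted on the build node
missing=/nonexistent/externals/x86_64/unknown/amd/rocm-5.0.2

# Without nullglob/failglob, an unmatched "$missing"/* stays as the literal
# string ".../rocm-5.0.2/*", so ln receives a single target whose basename is '*'
ln -s "$missing"/* "$dest"/

ls -l "$dest"
```

ln -s does not check that its target exists, so the command succeeds silently; the problem only shows up later, when the package contents are inspected.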

@fwyzard
Contributor Author

fwyzard commented Apr 19, 2022

By the way, the plan would be

  • get started with symlinks to the patatrack CVMFS
  • set up all the scram tool files
  • add some tests to CMSSW
  • figure out what is needed and what could be left out
  • present the results, and decide whether to include ROCm in the CMSSW installation or not

There are also some practical issues with the installation itself: it is based on RPMs and does rely on the installation scripts, so I'm playing some fakeroot/fakechroot tricks to extract the content to an arbitrary location.
We can figure out later if there is a better way forward (e.g. building from source?).
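For comparison, a common root-free way to unpack RPMs into an arbitrary prefix is to pipe rpm2cpio into cpio. This is a swapped-in technique, not what the PR does, and it has exactly the limitation mentioned above: it only extracts the payload and never runs the %pre/%post scriptlets, so whatever those scripts set up would have to be replicated by hand. The function name extract_rpm and the example file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: unpack an RPM payload under an arbitrary prefix without root.
# NOTE: scriptlets (%pre, %post, ...) are NOT executed by this approach.
set -euo pipefail

extract_rpm() {
  local rpm=$1 prefix=$2
  if [ ! -f "$rpm" ]; then
    echo "error: $rpm not found" >&2
    return 1
  fi
  mkdir -p "$prefix"
  # rpm2cpio writes the payload as a cpio archive on stdout;
  # cpio -i extracts, -d creates leading directories, -m keeps mtimes
  rpm2cpio "$rpm" | (cd "$prefix" && cpio -idm)
}

# Example (hypothetical file name and prefix):
#   extract_rpm rocm-dev-5.0.2.x86_64.rpm "$HOME/rocm-root"
# Files packaged as /opt/... end up under "$HOME/rocm-root/opt/..."
```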

@smuzaffar
Contributor

smuzaffar commented Apr 19, 2022

"I have no idea why there is a * symlink instead of all the individual folders - it worked locally :-/"

The /cvmfs/patatrack.cern.ch CVMFS repository is not mounted on our cms-ib.cern.ch publisher; that is why the /cvmfs/patatrack.cern.ch/externals/x86_64/rhel7/amd path is not visible, and ln created a symlink named *. Since the link targets are effectively hard-coded anyway, I would suggest moving the contents of the %post section into %install, e.g.:

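%install
# Prefer the OS-specific build, fall back to the generic one
OSDIR=/cvmfs/patatrack.cern.ch/externals/%{_arch}/rhel%{rhel}
if [ ! -d "$OSDIR" ]; then
  OSDIR=/cvmfs/patatrack.cern.ch/externals/%{_arch}/unknown
fi
# Link every top-level entry of the ROCm installation into the package
ln -s "${OSDIR}"/amd/%{n}-%{realversion}/* %{i}/
# Fail the build if the glob did not expand (e.g. the repository is not mounted)
test -L %{i}/bin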
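The same logic can be made to fail with an explicit message at the ln step instead of relying only on the final test -L. The sketch below is a hypothetical stand-alone bash version of this %install section: the RPM macros %{_arch}, %{rhel}, %{n}, %{realversion} and %{i} become function arguments, and link_external and its argument order are illustrative, not part of cmsdist. It enables nullglob so that an unmatched pattern expands to an empty list, and returns an error if nothing was found.

```shell
#!/usr/bin/env bash
# Hypothetical stand-alone version of the %install logic above.
# Arguments replace the RPM macros: root arch rhel name version dest.
set -euo pipefail

link_external() {
  local root=$1 arch=$2 rhel=$3 name=$4 version=$5 dest=$6
  local osdir="$root/$arch/rhel$rhel"
  # Prefer the OS-specific build, fall back to the generic "unknown" one
  if [ ! -d "$osdir" ]; then
    osdir="$root/$arch/unknown"
  fi

  # With nullglob, an unmatched pattern expands to an empty list
  local -a entries
  shopt -s nullglob
  entries=("$osdir/amd/$name-$version"/*)
  shopt -u nullglob

  if [ "${#entries[@]}" -eq 0 ]; then
    echo "error: no files under $osdir/amd/$name-$version" >&2
    return 1
  fi

  mkdir -p "$dest"
  ln -s "${entries[@]}" "$dest"/
  # Same sanity check as the spec file: bin must now be a symlink
  test -L "$dest/bin"
}

# Demo against a throw-away tree that mimics the CVMFS layout
root=$(mktemp -d)
mkdir -p "$root/x86_64/unknown/amd/rocm-5.0.2/bin" \
         "$root/x86_64/unknown/amd/rocm-5.0.2/lib"
dest="$(mktemp -d)/install"
link_external "$root" x86_64 7 rocm 5.0.2 "$dest"
ls -l "$dest"
```

Here no rhel7 directory exists, so the function falls back to the "unknown" tree and links bin and lib individually; pointing root at a missing directory makes it return 1 instead of creating a * link.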

@fwyzard
Contributor Author

fwyzard commented Apr 19, 2022

Thanks, I'll make the change and re-run the tests.

@fwyzard fwyzard force-pushed the IB/CMSSW_12_4_X/master-ROCm-5.0.2 branch from ad930c2 to b684471 on April 19, 2022 12:51
@fwyzard
Contributor Author

fwyzard commented Apr 19, 2022

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24096/summary.html
COMMIT: e7ba230
CMSSW: CMSSW_12_4_X_2022-04-21-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7795/24096/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-1aad47/39434.75_TTbar_14TeV+2026D88_HLT75e33+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HLT75e33+HARVESTGlobal

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3695404
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 205 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@fwyzard fwyzard force-pushed the IB/CMSSW_12_4_X/master-ROCm-5.0.2 branch from e7ba230 to 1de8c41 on April 21, 2022 17:19
@fwyzard
Contributor Author

fwyzard commented Apr 21, 2022

@cmsbuild, please test

@cmsbuild
Contributor

Pull request #7795 was updated.

@fwyzard
Contributor Author

fwyzard commented Apr 21, 2022

@cmsbuild, please test for el8_aarch64_gcc10

@fwyzard
Contributor Author

fwyzard commented Apr 21, 2022

@cmsbuild, please test for el8_amd64_gcc10

@fwyzard
Contributor Author

fwyzard commented Apr 21, 2022

@cmsbuild, please test for el8_ppc64le_gcc10

@fwyzard
Contributor Author

fwyzard commented Apr 21, 2022

@cmsbuild, please test for el9_amd64_gcc11

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24103/summary.html
COMMIT: 1de8c41
CMSSW: CMSSW_12_4_X_2022-04-20-2300/el8_ppc64le_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7795/24103/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24103/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24103/git-merge-result

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24104/summary.html
COMMIT: 1de8c41
CMSSW: CMSSW_12_4_X_2022-04-20-2300/el8_aarch64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7795/24104/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24104/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24104/git-merge-result

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_4_X/master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24100/summary.html
COMMIT: 1de8c41
CMSSW: CMSSW_12_4_X_2022-04-21-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7795/24100/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24100/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24100/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-1aad47/39434.75_TTbar_14TeV+2026D88_HLT75e33+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HLT75e33+HARVESTGlobal

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3695410
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 205 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@fwyzard fwyzard deleted the IB/CMSSW_12_4_X/master-ROCm-5.0.2 branch April 21, 2022 22:07
@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24102/summary.html
COMMIT: 1de8c41
CMSSW: CMSSW_12_4_X_2022-04-20-2300/el9_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7795/24102/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24102/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1aad47/24102/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-1aad47/39434.75_TTbar_14TeV+2026D88_HLT75e33+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HLT75e33+HARVESTGlobal

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 66048 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 473930
  • DQMHistoTests: Total nulls: 348
  • DQMHistoTests: Total successes: 3221134
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.652 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.975 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 11834.0 ): 0.996 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 136.874 ): 0.004 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 25202.0 ): -0.054 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 7.3 ): -0.269 KiB SiStrip/MechanicalView
  • Checked 205 log files, 45 edm output root files, 49 DQM output files

@cmsbuild
Contributor

+1

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-1aad47/39434.75_TTbar_14TeV+2026D88_HLT75e33+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HLT75e33+HARVESTGlobal

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 62744 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 463529
  • DQMHistoTests: Total nulls: 381
  • DQMHistoTests: Total successes: 3231502
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.311 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.063 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 11834.0 ): 2.372 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): 0.006 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): -0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 7.3 ): -2.009 KiB SiStrip/MechanicalView
  • Checked 205 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: found differences in 14 / 48 workflows
