Build ROCM 7.1.0 to check tests on GPU #10600

Merged
akritkbehera merged 5 commits into cms-sw:IB/CMSSW_17_0_X/rocm from akritkbehera:IB_7.1.0_ROCM
Jun 3, 2026

Conversation

@akritkbehera
Contributor

No description provided.

@cmsbuild
Contributor

cmsbuild commented Jun 1, 2026

A new Pull Request was created by @akritkbehera for branch IB/CMSSW_17_0_X/rocm.

@akritkbehera, @cmsbuild, @iarspider, @raoatifshad, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jun 1, 2026

cms-bot internal usage

@akritkbehera
Contributor Author

enable gpu

@akritkbehera
Contributor Author

please test for CMSSW_17_0_ROCM_X

@akritkbehera
Contributor Author

please abort

@akritkbehera
Contributor Author

enable gpu

@akritkbehera
Contributor Author

please test for CMSSW_17_0_ROCM_X

@cmsbuild
Contributor

cmsbuild commented Jun 1, 2026

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63daa1/53617/summary.html
COMMIT: 5f6067c
CMSSW: CMSSW_17_0_ROCM_X_2026-05-31-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10600/53617/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed External Build

I found compilation error when building:

libcudacxx_DIR:PATH=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/cuda/12.9.1-2f902b8cd69fc02665180a65ec16b3a4/lib64/cmake/libcudacxx
nlohmann_json_DIR:PATH=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/json/3.12.0-97f7be797298126e9adee032bbaec39f/share/cmake/nlohmann_json
pybind11_DIR:PATH=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/py3-pybind11/3.0.1-40a12f1b4fe2393aef934ccb44fb2efc/share/cmake/pybind11
rocprim_DIR:PATH=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/rocprim/rocm-7.1.0-8951effe5969e9fbd67ae983b5434da0/lib/cmake/rocprim
rocthrust_DIR:PATH=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/rocthrust/rocm-7.1.0-daff82ac6ea95b1872a079c529915cf8/lib/cmake/rocthrust
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.EjFLhh (%build)

RPM build warnings:
Macro expanded in comment on line 488: %{pkginstroot}/python




@cmsbuild
Contributor

cmsbuild commented Jun 2, 2026

Pull request #10600 was updated.

@akritkbehera
Contributor Author

please test for CMSSW_17_0_ROCM_X

@cmsbuild
Contributor

cmsbuild commented Jun 2, 2026

Pull request #10600 was updated.

@akritkbehera
Contributor Author

please test for CMSSW_17_0_ROCM_X

@cmsbuild
Contributor

cmsbuild commented Jun 2, 2026

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63daa1/53647/summary.html
COMMIT: 4b47232
CMSSW: CMSSW_17_0_ROCM_X_2026-06-01-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10600/53647/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Build

I found compilation error when building:

>> Leaving Package Utilities/RelMon
>> Package Utilities/RelMon built
Copying tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/SoACustomizedMethodsHip/libSoACustomizedMethodsHip_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/SoACustomizedMethodsHip/libSoACustomizedMethodsHip_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/SoACustomizedMethodsHip/libSoACustomizedMethodsHip_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/SoACustomizedMethodsHip/libSoACustomizedMethodsHip_rocm.a] Error 1
Copying tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc13/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a] Error 1
Copying tmp/el8_amd64_gcc13/src/DataFormats/TrivialSerialisation/test/TestDataFormatsTrivialSerialisationPortableROCmAsync/libTestDataFormatsTrivialSerialisationPortableROCmAsync_rocm.a to productstore area:


@akritkbehera
Contributor Author

So libamd_comgr.so.3 is back to being built as a shared library; the patches weren't enough. The remaining errors are HIP not being able to find the device library objects. Hmm.

@cmsbuild
Contributor

cmsbuild commented Jun 2, 2026

Pull request #10600 was updated.

@akritkbehera
Contributor Author

please test for CMSSW_17_0_ROCM_X

@cmsbuild
Contributor

cmsbuild commented Jun 3, 2026

Pull request #10600 was updated.

@akritkbehera
Contributor Author

enable gpu

@akritkbehera
Contributor Author

please test for CMSSW_17_0_ROCM_X

@cmsbuild
Contributor

cmsbuild commented Jun 3, 2026

-1

Failed Tests: nvidia_h100UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63daa1/53670/summary.html
COMMIT: eb709b7
CMSSW: CMSSW_17_0_ROCM_X_2026-06-01-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10600/53670/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63daa1/53670/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63daa1/53670/git-merge-result

Comparison Summary

Summary:

  • You potentially added 215 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4199113
  • DQMHistoTests: Total failures: 30
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4199063
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

@akritkbehera
Contributor Author

gpu relvals didn't run?

@akritkbehera changed the title from "[DO NOT MERGE] Build ROCM 7.1.0 to check tests on GPU" to "Build ROCM 7.1.0 to check tests on GPU" on Jun 3, 2026
@akritkbehera merged commit 4515043 into cms-sw:IB/CMSSW_17_0_X/rocm on Jun 3, 2026
12 of 14 checks passed
3 participants