Update hwloc to version 2.8.0 #7979


@fwyzard fwyzard commented Jul 6, 2022

API changes:

  • Add HWLOC_TOPOLOGY_FLAG_NO_DISTANCES, _NO_MEMATTRS and _NO_CPUKINDS
    to reduce the overhead when unneeded.
  • Add separate Read/Write Bandwidth/Latency memory attributes and
    implement them on Linux.

Backends changes:

  • NUMA nodes may now have a subtype such as DRAM, HBM, SPM, or NVM
    on heterogeneous memory platforms on Linux.
    • Add DAXType and DAXParent attributes on Linux to tell where a
      DAX device or its corresponding NUMA node comes from (SPM for
      Specific-Purpose or NVM for Non-Volatile Memory).
  • Detect heterogeneous caches in hybrid CPUs on MacOS X,
    thanks to Paul Bone for the help.
  • Max frequencies are not ignored in Linux cpukinds anymore (they were
    ignored in hwloc 2.7.0), but they may be slightly adjusted to avoid
    reporting hybrid CPUs because of Intel Turbo Boost Max 3.0.
    • See the documentation of environment variable HWLOC_CPUKINDS_MAXFREQ.
  • Hardwire the PCI locality of HPE Cray EX235a nodes.

Tools:

  • lstopo and other tools may now load Linux and x86 cpuid topology files
    from a tarball.
  • lstopo may now replace the P# and L# index prefixes with custom strings
    thanks to --os-index-prefix and --logical-index-prefix options.

Misc:

  • Add --disable-readme to avoid regenerating the top-level hwloc README
    file from the documentation.

Other changes:

  • Enable ROCm support
  • Rework CUDA support

@fwyzard
Contributor Author

fwyzard commented Jul 6, 2022

please test

@cmsbuild
Contributor

cmsbuild commented Jul 6, 2022

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_5_X/master.

@smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Jul 6, 2022

please test for el8_amd64_gcc11

@fwyzard
Contributor Author

fwyzard commented Jul 6, 2022

please test for el8_aarch64_gcc10

@fwyzard
Contributor Author

fwyzard commented Jul 6, 2022

please test for slc7_amd64_gcc10

@fwyzard
Contributor Author

fwyzard commented Jul 6, 2022

please test for el8_ppc64le_gcc10

@cmsbuild
Contributor

cmsbuild commented Jul 6, 2022

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26023/summary.html
COMMIT: 1440ae1
CMSSW: CMSSW_12_5_X_2022-07-06-2300/el8_aarch64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7979/26023/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

+ OSDIR=/cvmfs/patatrack.cern.ch/externals/aarch64/rhel8
+ '[' -d /cvmfs/patatrack.cern.ch/externals/aarch64/rhel8 ']'
+ BASEDIR=/cvmfs/patatrack.cern.ch/externals/aarch64/rhel8/amd/rocm-5.0.2
+ mkdir /data/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/95c215630c939706b0552e3eee38861c/opt/cmssw/el8_aarch64_gcc10/external/rocm/5.0.2-95c215630c939706b0552e3eee38861c/bin
+ test -d /cvmfs/patatrack.cern.ch/externals/aarch64/rhel8/amd/rocm-5.0.2/bin
error: Bad exit status from /data/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.Xd95Tq (%install)


RPM build errors:
line 35: It's not recommended to have unversioned Obsoletes: Obsoletes: external+rocm+5.0.2-95c215630c939706b0552e3eee38861c
Bad exit status from /data/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.Xd95Tq (%install)
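
The trace stops right after `test -d .../rocm-5.0.2/bin`, which suggests (an inference from the log, not something the bot states) that the ROCm directory does not exist for this architecture and that the failing test aborts the `%install` script. A minimal shell sketch of a more tolerant check follows; the helper name `rocm_bindir_or_skip` is hypothetical and is not taken from the actual spec file.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the real rocm spec file.
# Prints the bin directory if it exists; otherwise writes a note to
# stderr and still returns success, so a caller running under
# "set -e" is not aborted.
rocm_bindir_or_skip() {
  basedir="$1"
  if test -d "$basedir/bin"; then
    printf '%s\n' "$basedir/bin"
  else
    printf 'skip: no ROCm installation under %s\n' "$basedir" >&2
    return 0
  fi
}
```

A caller could then link the tools only when the function prints a path, for example `bindir="$(rocm_bindir_or_skip "$BASEDIR")"` followed by a check that `$bindir` is non-empty.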


@cmsbuild
Contributor

cmsbuild commented Jul 6, 2022

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26024/summary.html
COMMIT: 1440ae1
CMSSW: CMSSW_12_5_X_2022-07-06-2300/el8_ppc64le_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7979/26024/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

+ OSDIR=/cvmfs/patatrack.cern.ch/externals/ppc64le/rhel8
+ '[' -d /cvmfs/patatrack.cern.ch/externals/ppc64le/rhel8 ']'
+ BASEDIR=/cvmfs/patatrack.cern.ch/externals/ppc64le/rhel8/amd/rocm-5.0.2
+ mkdir /scratch/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/95c215630c939706b0552e3eee38861c/opt/cmssw/el8_ppc64le_gcc10/external/rocm/5.0.2-95c215630c939706b0552e3eee38861c/bin
+ test -d /cvmfs/patatrack.cern.ch/externals/ppc64le/rhel8/amd/rocm-5.0.2/bin
error: Bad exit status from /scratch/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.wymOqy (%install)


RPM build errors:
line 35: It's not recommended to have unversioned Obsoletes: Obsoletes: external+rocm+5.0.2-95c215630c939706b0552e3eee38861c
Bad exit status from /scratch/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.wymOqy (%install)


hwloc.spec (outdated)

 BuildRequires: autotools
-Requires: cuda libpciaccess libxml2 numactl
+Requires: cuda libpciaccess libxml2 numactl rocm
Contributor

@fwyzard, rocm is only available for x86_64, so I guess this should be a conditional dependency, i.e.:

%ifarch x86_64
Requires: rocm
%endif

Contributor Author

Yes, and the dependency on cuda should also be conditional. I'll slowly make those changes.

--enable-libxml2 \
--disable-cairo \
--disable-doxygen \
--enable-plugins=cuda,nvml \
--with-cuda=$CUDA_ROOT \
--with-rocm=$ROCM_ROOT \
Contributor

same here (only for x86_64)
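
One way to keep the ROCm option off the other architectures is to assemble the configure arguments per architecture. The sketch below is only an illustration in plain shell: the helper name `hwloc_configure_args` is hypothetical, the flags are the ones quoted above, and it assumes, following the review comment, that ROCm is available only on x86_64. CUDA is left unconditional here because the thread does not say which architectures provide it.

```shell
#!/bin/sh
# Hypothetical helper, not part of hwloc.spec: print the configure
# arguments for a given architecture, one per line.
hwloc_configure_args() {
  arch="$1"
  # Options quoted in the review; emitted for every architecture.
  printf '%s\n' \
    "--enable-libxml2" \
    "--disable-cairo" \
    "--disable-doxygen" \
    "--enable-plugins=cuda,nvml" \
    "--with-cuda=${CUDA_ROOT:-}"
  # Assumption taken from the review: ROCm exists only for x86_64.
  case "$arch" in
    x86_64) printf '%s\n' "--with-rocm=${ROCM_ROOT:-}" ;;
  esac
}
```

It could be used as `./configure $(hwloc_configure_args "$(uname -m)")`, which relies on the root paths containing no whitespace. Inside the spec file itself, an `%ifarch x86_64` block around the extra option would express the same condition.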

Enable ROCm support, and rework CUDA support
@fwyzard fwyzard force-pushed the IB/CMSSW_12_5_X/master-hwlock-2.8.0 branch from 1440ae1 to e5ef1c5 Compare July 7, 2022 08:53
@fwyzard
Contributor Author

fwyzard commented Jul 7, 2022

please test for el8_ppc64le_gcc10

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2022

Pull request #7979 was updated.

@fwyzard
Contributor Author

fwyzard commented Jul 7, 2022

please test for el8_aarch64_gcc10

@fwyzard
Contributor Author

fwyzard commented Jul 7, 2022

please test

@fwyzard
Contributor Author

fwyzard commented Jul 7, 2022

please test for el8_amd64_gcc11

@fwyzard
Contributor Author

fwyzard commented Jul 7, 2022

please test for slc7_amd64_gcc10

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2022

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26045/summary.html
COMMIT: e5ef1c5
CMSSW: CMSSW_12_5_X_2022-07-06-2300/el8_aarch64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7979/26045/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26045/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26045/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test DRNTest had ERRORS
---> test TestFWCoreServicesDriver had ERRORS
---> test testFWCoreUtilities had ERRORS

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2022

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26048/summary.html
COMMIT: e5ef1c5
CMSSW: CMSSW_12_5_X_2022-07-06-2300/el8_ppc64le_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7979/26048/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26048/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26048/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test testONNXRuntime had ERRORS
---> test DRNTest had ERRORS
---> test testFWCoreUtilities had ERRORS

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26042/summary.html
COMMIT: e5ef1c5
CMSSW: CMSSW_12_5_X_2022-07-06-2300/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7979/26042/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26042/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26042/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3654771
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3654741
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

cmsbuild commented Jul 8, 2022

+1

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 59591 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3654771
  • DQMHistoTests: Total failures: 309517
  • DQMHistoTests: Total nulls: 149
  • DQMHistoTests: Total successes: 3345083
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.016999999999999963 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): -0.352 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 11834.0 ): 0.464 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): -0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): -0.012 KiB SiStrip/MechanicalView
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 14 / 49 workflows

@cmsbuild
Contributor

cmsbuild commented Jul 8, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26044/summary.html
COMMIT: e5ef1c5
CMSSW: CMSSW_12_5_X_2022-07-06-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7979/26044/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26044/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a21c20/26044/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 68884 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3654771
  • DQMHistoTests: Total failures: 466546
  • DQMHistoTests: Total nulls: 373
  • DQMHistoTests: Total successes: 3187830
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.8119999999999998 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): -0.063 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 11834.0 ): -0.245 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): -0.006 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): 0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 7.3 ): 2.009 KiB SiStrip/MechanicalView
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 14 / 49 workflows

@smuzaffar
Contributor

+externals
@fwyzard, this looks good to me. Do you want to perform any tests before we merge it?

@cmsbuild
Contributor

cmsbuild commented Jul 8, 2022

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_5_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor Author

fwyzard commented Jul 8, 2022

> @fwyzard, this looks good to me. Do you want to perform any tests before we merge it?

No, let's merge it.

@smuzaffar smuzaffar merged commit 1791b7a into cms-sw:IB/CMSSW_12_5_X/master Jul 8, 2022
@fwyzard fwyzard deleted the IB/CMSSW_12_5_X/master-hwlock-2.8.0 branch July 8, 2022 16:56