Update UCX libraries to v1.10.1, and link to the RDMA core libraries #7170

@fwyzard fwyzard commented Jul 26, 2021

Add the Linux RDMA Core Userspace Libraries and Daemons, v36.0

These are the userspace components for the Linux Kernel's drivers and infiniband subsystem. Specifically this contains the userspace libraries for the following device nodes:

  • /dev/infiniband/uverbsX (libibverbs)
  • /dev/infiniband/rdma_cm (librdmacm)
  • /dev/infiniband/umadX (libibumad)

See https://github.com/linux-rdma/rdma-core/blob/master/README.md for more details.
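A minimal sketch, not part of this PR, of how the node-to-library mapping listed above could be checked on a host. The helper names `library_for_node` and `scan_rdma_nodes` are hypothetical, and the code only inspects file names under `/dev/infiniband`; it does not load or call any of the libraries.

```python
import glob
import os
import re

# Device-node name patterns and the userspace library that serves each one,
# taken from the list in the PR description.
NODE_LIBRARIES = [
    (re.compile(r"^uverbs\d+$"), "libibverbs"),
    (re.compile(r"^rdma_cm$"), "librdmacm"),
    (re.compile(r"^umad\d+$"), "libibumad"),
]


def library_for_node(path):
    """Return the library name for a device node path, or None if unknown."""
    name = os.path.basename(path)
    for pattern, library in NODE_LIBRARIES:
        if pattern.match(name):
            return library
    return None


def scan_rdma_nodes(dev_dir="/dev/infiniband"):
    """Map each recognised node under dev_dir to its library (empty if none exist)."""
    found = {}
    for path in sorted(glob.glob(os.path.join(dev_dir, "*"))):
        library = library_for_node(path)
        if library is not None:
            found[path] = library
    return found


if __name__ == "__main__":
    for node, library in scan_rdma_nodes().items():
        print(f"{node} -> {library}")
```

On a machine without RDMA hardware the scan simply returns an empty mapping.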

Update the UCX libraries to v1.10.1, and link to the RDMA core libraries

For the change log since v1.9.0, see

  • https://github.com/openucx/ucx/releases/tag/v1.10.0
  • https://github.com/openucx/ucx/releases/tag/v1.10.1

Use the RDMA core libraries to implement support for InfiniBand verbs (libibverbs) and RDMA Connection Manager (librdmacm).

@cmsbuild

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_0_X/master.

@cmsbuild, @smuzaffar, @mrodozov, @iarspider can you please review it and eventually sign? Thanks.
@silviodonato, @dpiparo, @qliphy, @perrotta you are the release manager for this.
cms-bot commands are listed here

@fwyzard

fwyzard commented Jul 26, 2021

please test

@fwyzard

fwyzard commented Jul 26, 2021

please test for slc7_aarch64_gcc9

@fwyzard

fwyzard commented Jul 26, 2021

please test for CMSSW_12_0_X/slc7_ppc64le_gcc9

@cmsbuild

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17218/summary.html
COMMIT: f093b97
CMSSW: CMSSW_12_0_X_2021-07-26-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7170/17218/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17218/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17218/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test materialBudgetTrackerPlots had ERRORS
---> test materialBudgetHGCalPlots had ERRORS

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 11 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2998564
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2998528
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@smuzaffar smuzaffar changed the base branch from IB/CMSSW_12_0_X/master to IB/CMSSW_12_1_X/master July 30, 2021 07:43
@smuzaffar

please test

@smuzaffar

please test for slc7_aarch64_gcc9

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17368/summary.html
COMMIT: f093b97
CMSSW: CMSSW_12_1_X_2021-07-30-0900/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7170/17368/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17368/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17368/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2998564
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2998535
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found
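As a quick sanity check on the DQMHistoTests figures, here is a small sketch that verifies the per-category counts add up to the reported total for both x86-64 runs in this thread (17218 and 17368). It assumes the categories are disjoint, which the summaries do not state explicitly; the dictionary layout and the function name `unaccounted` are illustrative only.

```python
# Counts copied from the two slc7_amd64_gcc900 comparison summaries in this thread.
runs = {
    "17218": {"total": 2998564, "failures": 13, "nulls": 1, "successes": 2998528, "skipped": 22},
    "17368": {"total": 2998564, "failures": 7, "nulls": 0, "successes": 2998535, "skipped": 22},
}


def unaccounted(counts):
    """Histograms not covered by any category, assuming the categories are disjoint."""
    categorised = (
        counts["failures"] + counts["nulls"] + counts["successes"] + counts["skipped"]
    )
    return counts["total"] - categorised


for run, counts in runs.items():
    print(run, unaccounted(counts))
```

Both runs leave no histogram unaccounted for, so the drop from 13 to 7 failures is matched by the corresponding rise in successes and the one null that disappeared.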

@smuzaffar

please test for CMSSW_12_1_X/slc7_ppc64le_gcc9

@fwyzard

fwyzard commented Jul 31, 2021

@smuzaffar why do we also need to specify the CMSSW branch for slc7_ppc64le_gcc9?

@smuzaffar

@fwyzard, there are two IBs for ppc64le, 12.1.X and 12.1.DBG.X, and the bot does not know which one to use. That is why the IB name has to be given explicitly for ppc64le.
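To illustrate the ambiguity, here is a hedged sketch of how the three command shapes used in this thread could be parsed and resolved. This is not cms-bot's actual implementation: the regular expression, the function names, the `queues_by_arch` table and the queue name `CMSSW_12_1_DBG_X` are all assumptions made for the example, and the default architecture used by a bare `please test` is left out.

```python
import re

# Hypothetical grammar for the three command shapes seen in this thread:
#   please test
#   please test for <architecture>
#   please test for <release queue>/<architecture>
COMMAND = re.compile(
    r"^please test(?: for (?:(?P<queue>CMSSW_\w+)/)?(?P<arch>\w+))?$"
)


def parse_test_command(comment):
    """Return (release_queue, architecture); a part that was not given is None."""
    match = COMMAND.match(comment.strip())
    if match is None:
        raise ValueError(f"not a test command: {comment!r}")
    return match.group("queue"), match.group("arch")


def resolve_queue(queue, arch, queues_by_arch):
    """Pick the release queue, refusing to guess when several IBs share an architecture."""
    if queue is not None:
        return queue
    candidates = queues_by_arch.get(arch, [])
    if len(candidates) == 1:
        return candidates[0]
    raise LookupError(
        f"{len(candidates)} candidate IBs for {arch}; name one explicitly"
    )


if __name__ == "__main__":
    # Assumed table: one IB for aarch64, two for ppc64le (queue names are illustrative).
    queues_by_arch = {
        "slc7_aarch64_gcc9": ["CMSSW_12_1_X"],
        "slc7_ppc64le_gcc9": ["CMSSW_12_1_X", "CMSSW_12_1_DBG_X"],
    }
    queue, arch = parse_test_command("please test for CMSSW_12_1_X/slc7_ppc64le_gcc9")
    print(resolve_queue(queue, arch, queues_by_arch))
```

With a table like this, `please test for slc7_aarch64_gcc9` resolves on its own, while the same short form for ppc64le raises an error until the release queue is spelled out.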

@cmsbuild

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17385/summary.html
COMMIT: f093b97
CMSSW: CMSSW_12_1_X_2021-07-30-2300/slc7_ppc64le_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7170/17385/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test test_PrepareInputDb had ERRORS
---> test test_MpsWorkFlow had ERRORS
---> test TestHeterogeneousCoreSonicTritonProducerGPU had ERRORS
---> test testAlignmentOfflineValidation had ERRORS
and more ...

@smuzaffar

please test for slc7_aarch64_gcc9

@smuzaffar

please test for slc7_aarch64_gcc9

@cmsbuild

cmsbuild commented Aug 2, 2021

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6b7893/17402/summary.html
COMMIT: f093b97
CMSSW: CMSSW_12_1_X_2021-08-01-2300/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7170/17402/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testFWCoreConcurrency had ERRORS
---> test testFWCoreUtilities had ERRORS
---> test TestFWCoreServicesDriver had ERRORS
---> test testUploadConditions had ERRORS
and more ...

@smuzaffar

+externals

@smuzaffar smuzaffar merged commit af30ec9 into cms-sw:IB/CMSSW_12_1_X/master Aug 2, 2021
@cmsbuild

cmsbuild commented Aug 2, 2021

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_1_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy, @perrotta (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard fwyzard deleted the IB/CMSSW_12_0_X/master_UCX_10.1_with_RDMA branch April 1, 2022 11:59