Introduce edm::Async service, and use it in CUDA and Alpaka modules #44901
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44901/40168
|
A new Pull Request was created by @makortel for master. It involves the following packages:
@cmsbuild, @makortel, @Dr15Jones, @smuzaffar, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable gpu |
@cmsbuild, please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3cb6f/39230/summary.html
Comparison Summary:
GPU Comparison Summary:
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44901/40191
|
Pull request #44901 was updated. @fwyzard, @Dr15Jones, @makortel, @smuzaffar can you please check and sign again. |
-1 Failed Tests: UnitTests
Unit Tests: I found 1 errors in the following unit tests: ---> test RecoPPSLocalNewT2 had ERRORS
Comparison Summary:
GPU Comparison Summary:
|
Unit test fails in IBs #45101 |
+core |
+heterogeneous Based on earlier #44901 (comment), and that on |
This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
ignore tests-rejected with ib-failure |
+1 |
merge |
PR description:
This PR adds the `edm::async()` facility described in #29188, but implemented as an `edm::Async` service base class and an `AsyncService` implementation (the rationale for the service-based approach instead of a free function can be found in #44901 (comment)). This PR also replaces the use of `cudaStreamAddCallback()` with `edm::async()` accompanied by `cudaEventSynchronize()`, and makes the CUDA/Alpaka events be created with the `cudaEventBlockingSync` flag.

Measurements that I showed at CHEP 2023 (https://indico.jlab.org/event/459/contributions/11810/) suggested a possible 1 % throughput improvement at the HLT (of that time; many things have changed since) over `cudaStreamAddCallback()`. Earlier studies done with a prototype in cms-patatrack/pixeltrack-standalone#321 indicated that somehow the thread pool with `cudaEventSynchronize()` used less CPU than `cudaStreamAddCallback()`.

During the CHEP study I also tested polling with `cudaEventQuery()`, but the "waiting thread pool" approach was more performant.

Another benefit over `cudaStreamAddCallback()` is that that function "is slated for eventual deprecation and removal", and the "replacement" `cudaLaunchHostFunc()` does not call the callback function in case of an error in the associated CUDA stream.

Resolves #29188
Resolves cms-sw/framework-team#916
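
For readers unfamiliar with the "waiting thread" pattern described above, here is a minimal standard-library sketch of the idea. It is not the `edm::Async` interface or the code in this PR. The names `WaitingThreads`, `run()`, and `simulateAsyncWait()` are hypothetical, a `std::future` stands in for a CUDA event, and `future.get()` stands in for the blocking `cudaEventSynchronize()` call. A helper thread blocks until the "device" is done and then always invokes the continuation, passing along the status, so the continuation also runs in the error case.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a service that owns helper threads.
// This is NOT the edm::Async interface; it only illustrates the pattern.
class WaitingThreads {
public:
  WaitingThreads() = default;
  WaitingThreads(const WaitingThreads&) = delete;
  WaitingThreads& operator=(const WaitingThreads&) = delete;

  // Join all helper threads so that no continuation outlives the object.
  ~WaitingThreads() {
    for (auto& t : threads_) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

  // Run `blockingWait` on a helper thread and then call `continuation`
  // with the returned status, whether or not that status signals an error.
  template <typename Wait, typename Continuation>
  void run(Wait blockingWait, Continuation continuation) {
    threads_.emplace_back(
        [wait = std::move(blockingWait), next = std::move(continuation)]() mutable {
          int status = wait();  // blocks, similar in spirit to cudaEventSynchronize()
          next(status);         // always called, also for a non-zero status
        });
  }

private:
  std::vector<std::thread> threads_;
};

// Simulates one asynchronous operation and returns the status that the
// continuation observed. `deviceStatus` is an assumed error code (0 = success).
int simulateAsyncWait(int deviceStatus) {
  std::promise<int> deviceSide;  // fulfilled when the simulated device work ends
  std::future<int> event = deviceSide.get_future();  // stands in for a CUDA event
  std::promise<int> continuationSide;
  std::future<int> observed = continuationSide.get_future();
  {
    WaitingThreads waiters;
    waiters.run([&event] { return event.get(); },
                [&continuationSide](int status) { continuationSide.set_value(status); });
    // The calling thread is free to do other work while the helper thread waits.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    deviceSide.set_value(deviceStatus);  // the simulated device work completes
  }  // the destructor joins the helper thread here
  return observed.get();
}
```

Calling `simulateAsyncWait(0)` models a successful operation, and calling it with a non-zero value models a failed one. In both cases the continuation receives the status, which is the behavior the description contrasts with `cudaLaunchHostFunc()`. The sketch spawns one thread per wait for brevity, whereas a real service would presumably reuse threads.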
PR validation:
Unit tests in `FWCore/Concurrency`, `HeterogeneousCore/Alpaka{Core,Test}`, `HeterogeneousCore/CUDA{Utilities,Core,Test}` succeed.

The deployment on CUDA and Alpaka modules still needs performance testing.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Possibly to be backported to 14_0_X.