Fix CachingAllocator debug for non-async operations #45368

fwyzard · 2024-07-03T12:33:40Z

PR description:

@VinInn pointed out that filling memory asynchronously may be incorrect if the memory is later being set synchronously, without using a queue.
This is often the case with pinned host memory buffers, where the allocation and memset may be asynchronous, but the content is accessed directly using host-only operations.

This change makes the allocator wait for the memset to complete before returning the memory buffer to the user code.

It also adds a customisation function to activate memory filling, and uses it in the non-profiling Alpaka workflows.

PR validation:

The new tests pass.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

To be backported to 14.0.x to fix the same bug there.

fwyzard · 2024-07-03T12:33:46Z

type bugfix

cmsbuild · 2024-07-03T12:34:02Z

cms-bot internal usage

cmsbuild · 2024-07-03T12:38:58Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40797

This PR adds an extra 24KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40797/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40797/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

VinInn · 2024-07-03T12:40:10Z

I think it should fix the issue, yes.
It is clearly an overkill and a fix to alpaka memset would be more efficient.

fwyzard · 2024-07-03T12:58:15Z

Actually, I think that the behaviour of alpaka::memset is correct.

The issue is how we handle pinned host memory:

- the buffer needs to be allocated using an asynchronous operation, to guarantee that the memory is freed only after any asynchronous alpaka::memcpy operations are complete; so the queue associated to the buffer is an asynchronous one;
- however, since the host memory may be written to with non-alpaka operations (std::memset, immediate writes, etc.) we configure the allocator to make sure the memory is available immediately after returning from the allocation;
- alpaka::memset relies on the queue to dictate if the operation should be synchronous or asynchronous;
- so we end up with an asynchronous queue, where we want to schedule a (host-only) synchronous operation.

The quick solution implemented here is to wait after the memset.

I agree a better solution is to use an immediate memory write: either std::memset, or alpaka::memset with an immediate host-only queue.
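
The two options discussed above can be illustrated with a minimal sketch that does not use alpaka or the CMSSW allocator at all. `ToyAsyncQueue`, `allocate_with_debug_fill`, `allocate_with_immediate_fill` and `host_write_survives_debug_fill` are hypothetical names introduced only for this illustration; the toy queue merely stands in for an asynchronous queue by running tasks in order on a worker thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for an asynchronous queue (NOT the alpaka API):
// tasks are executed in submission order on a single worker thread.
class ToyAsyncQueue {
public:
  ToyAsyncQueue() : worker_([this] { run(); }) {}
  ~ToyAsyncQueue() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }
  void enqueue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_all();
  }
  // Block until every task enqueued so far has completed.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return tasks_.empty() && !busy_; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
      if (tasks_.empty())
        return;  // stop requested and nothing left to do
      auto task = std::move(tasks_.front());
      tasks_.pop_front();
      busy_ = true;
      lock.unlock();
      task();
      lock.lock();
      busy_ = false;
      cv_.notify_all();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool busy_ = false;
  bool stop_ = false;
  std::thread worker_;  // declared last, so it starts after the other members exist
};

// "Quick solution": schedule the debug fill on the queue, then wait for it,
// so the caller may touch the memory with host-only operations.
std::unique_ptr<unsigned char[]> allocate_with_debug_fill(ToyAsyncQueue& queue, std::size_t size, unsigned char marker) {
  auto buffer = std::make_unique<unsigned char[]>(size);
  unsigned char* ptr = buffer.get();
  queue.enqueue([ptr, size, marker] { std::memset(ptr, marker, size); });
  queue.wait();  // without this wait, the fill could race with the caller's writes
  return buffer;
}

// "Better solution": fill the host memory immediately, bypassing the queue.
std::unique_ptr<unsigned char[]> allocate_with_immediate_fill(std::size_t size, unsigned char marker) {
  auto buffer = std::make_unique<unsigned char[]>(size);
  std::memset(buffer.get(), marker, size);
  return buffer;
}

// Returns true if a host-only write made right after the allocation is still
// intact once the queue has drained.
bool host_write_survives_debug_fill(std::size_t size) {
  ToyAsyncQueue queue;
  auto buffer = allocate_with_debug_fill(queue, size, 0xA5);
  std::memset(buffer.get(), 0x00, size);  // host-only write, not ordered by the queue
  queue.wait();
  for (std::size_t i = 0; i < size; ++i)
    if (buffer[i] != 0x00)
      return false;
  return true;
}
```

If the `queue.wait()` inside `allocate_with_debug_fill` were removed, the queued fill and the caller's `std::memset` would be unordered writes to the same bytes, which is the situation this PR is meant to avoid. The immediate variant sidesteps the problem for host memory, at the cost of not going through the queue.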

cmsbuild · 2024-07-03T21:25:41Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40804

This PR adds an extra 24KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40804/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40804/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

fwyzard · 2024-07-03T21:28:39Z

@VinInn what do you think of this approach?

fwyzard · 2024-07-03T21:28:49Z

enable gpu

fwyzard · 2024-07-03T21:28:54Z

please test

cmsbuild · 2024-07-03T21:31:59Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40805

This PR adds an extra 24KB to repository

fwyzard · 2024-07-04T18:12:48Z

+heterogeneous

cmsbuild · 2024-07-04T18:15:45Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45368/40828

There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): Add run3 tracking low pu era #33532, CMSSW Integration of LST #45117, TICLv5 : Superclustering DNN #45333, Clean up Phase-2 Geometry D86, D88, D91, D92, D93, D94, D97 #45370, Phase2-gex179 Make a new scenario 2026D115 and workflow 32034.0 for a setup with HFNose and V17 geometry version of HGCal #45375
- File HeterogeneousCore/AlpakaCore/README.md modified in PR(s): [RFC] Add {Copy,Move}ToDeviceCache<T> class templates and moveToDeviceAsync function template #43969

cmsbuild · 2024-07-04T18:16:13Z

Pull request #45368 was updated. @AdrianoDee, @kskovpen, @miquork, @srimanob, @subirsarkar, @sunilUIET can you please check and sign again.

cmsbuild · 2024-07-04T22:09:01Z

+1

Size: This PR adds an extra 44KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ab131f/40245/summary.html
COMMIT: 70f422b
CMSSW: CMSSW_14_1_X_2024-07-04-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45368/40245/install.sh to create a dev area with all the needed externals and cmssw changes.

DAS Queries: The DAS query tests failed, see the summary page for details.

Comparison Summary

Summary:

You potentially added 2 lines to the logs
Reco comparison results: 11 differences found in the comparisons
DQMHistoTests: Total files compared: 48
DQMHistoTests: Total histograms compared: 3345088
DQMHistoTests: Total failures: 9
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3345059
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
Checked 202 log files, 165 edm output root files, 48 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

You potentially added 9 lines to the logs
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 90290
DQMHistoTests: Total failures: 287
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 90003
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 27 log files, 32 edm output root files, 7 DQM output files
TriggerResults: no differences found

srimanob · 2024-07-05T13:01:40Z

+Upgrade

AdrianoDee · 2024-07-09T07:17:27Z

+pdmv

cmsbuild · 2024-07-09T07:17:48Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

antoniovilela · 2024-07-09T14:20:42Z

+1

cmsbuild added this to the CMSSW_14_1_X milestone Jul 3, 2024

cmsbuild added pending-signatures tests-pending orp-pending bug-fix code-checks-pending heterogeneous-pending labels Jul 3, 2024

fwyzard mentioned this pull request Jul 3, 2024

HLT crashes in Run 380399 #44923

Closed

cmsbuild added code-checks-rejected and removed code-checks-pending labels Jul 3, 2024

fwyzard force-pushed the Alpaka_CachingAllocator_debug_141x branch from b476030 to 2e5ed47 Compare July 3, 2024 21:20

cmsbuild added code-checks-pending and removed code-checks-rejected labels Jul 3, 2024

cmsbuild added code-checks-rejected and removed code-checks-pending labels Jul 3, 2024

fwyzard force-pushed the Alpaka_CachingAllocator_debug_141x branch from 2e5ed47 to 645f6a6 Compare July 3, 2024 21:27

cmsbuild added code-checks-pending and removed code-checks-rejected labels Jul 3, 2024

cmsbuild added tests-started and removed tests-pending labels Jul 3, 2024

cmsbuild added code-checks-pending heterogeneous-pending and removed heterogeneous-approved labels Jul 4, 2024

cmsbuild added heterogeneous-approved and removed heterogeneous-pending labels Jul 4, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels Jul 4, 2024

cmsbuild added tests-approved and removed tests-started labels Jul 4, 2024

cmsbuild added upgrade-approved and removed upgrade-pending labels Jul 5, 2024

cmsbuild added fully-signed pdmv-approved and removed pending-signatures pdmv-pending labels Jul 9, 2024

cmsbuild added orp-approved and removed orp-pending labels Jul 9, 2024

cmsbuild merged commit 247f6db into cms-sw:master Jul 9, 2024
15 checks passed

This was referenced Jul 9, 2024

Set c++20 as a default standard cms-sw/cmsdist#9288

Merged

[GCC13] Update gcc to 13.3.0 cms-sw/cmsdist#9270

Open

[GCC13] Patch applied for c++20 cms-sw/cmsdist#9295

Merged

fwyzard deleted the Alpaka_CachingAllocator_debug_141x branch July 17, 2024 14:43
