[libcu++] Fix type-erased memory resource default alignment (16 → 256 bytes)#8451
Open
edenfunf wants to merge 1 commit intoNVIDIA:mainfrom
Open
[libcu++] Fix type-erased memory resource default alignment (16 → 256 bytes)#8451edenfunf wants to merge 1 commit intoNVIDIA:mainfrom
edenfunf wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
…o 256 bytes Type-erased wrappers (any_resource, resource_ref, shared_resource, synchronous_resource_adapter) were using alignof(::cuda::std::max_align_t) (16 bytes) as the default alignment for their deprecated no-alignment overloads, while concrete resources (device_memory_pool, legacy_pinned_memory_resource, etc.) use default_cuda_malloc_alignment (256 bytes). This inconsistency silently changed the effective alignment when a concrete resource was wrapped in a type-erased wrapper. Fix by: - Replacing alignof(::cuda::std::max_align_t) with ::cuda::mr::default_cuda_malloc_alignment in the deprecated overloads of __ibasic_resource and __ibasic_async_resource in any_resource.h - Updating the Doxygen stubs (basic_any_resource, basic_resource_ref) to reference default_cuda_malloc_alignment consistently - Fixing the default in shared_resource::allocate_sync/deallocate_sync - Adding default alignment = default_cuda_malloc_alignment to the async allocate/deallocate overloads of shared_resource and synchronous_resource_adapter, which previously required explicit alignment Verified: compiled on Windows (MSVC 19.50, CUDA 12.9, sm_89) Fixes NVIDIA#8063
Contributor
Contributor
Author
|
@griwes Hi, could you please take a look at this PR when you have time? This fixes the default alignment used by deprecated no-alignment overloads Currently, type-erased wrappers (e.g. This creates a behavioral mismatch and can lead to under-aligned allocations This PR:
The change only affects deprecated overloads and default parameters. Tested:
Fixes #8063 Would really appreciate your feedback, thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Fix the default alignment used by deprecated no-alignment overloads in type-erased memory resource wrappers:
Before:
// Calls allocate_sync(bytes, 16) — wrong default any_synchronous_resource<> any{pool}; any.allocate_sync(bytes);After:
// Calls allocate_sync(bytes, 256) — correct default matching concrete resources any_synchronous_resource<> any{pool}; any.allocate_sync(bytes);Why
Concrete resources (
device_memory_pool,legacy_pinned_memory_resource, etc.) usedefault_cuda_malloc_alignment(256 bytes) as their default alignment. However, their type-erased wrappers (any_resource,resource_ref,shared_resource,synchronous_resource_adapter) were usingalignof(::cuda::std::max_align_t)(16 bytes) in their deprecated no-alignment overloads and function parameter defaults.This silently under-aligns device allocations when passing memory to CUDA APIs that require 256-byte alignment, and creates a behavior discrepancy between concrete resources and their type-erased equivalents.
Root cause: wrong alignment constant used across 6 locations in 3 files.
How
alignof(::cuda::std::max_align_t)with::cuda::mr::default_cuda_malloc_alignmentin the deprecated overloads of__ibasic_resourceand__ibasic_async_resource(any_resource.h)basic_any_resourceandbasic_resource_refconsistentlyshared_resource::allocate_sync/deallocate_syncdefault parameter= ::cuda::mr::default_cuda_malloc_alignmentdefault to the asyncallocate/deallocateoverloads inshared_resourceandsynchronous_resource_adapter, which previously required callers to specify alignment explicitly (unlike concrete resources)Test
Fixes #8063