
[libcu++] Fix type-erased memory resource default alignment (16 → 256 bytes)#8451

Open
edenfunf wants to merge 1 commit into NVIDIA:main from edenfunf:fix/memory-resource-type-erased-default-alignment


@edenfunf
Contributor

What

Fix the default alignment used by deprecated no-alignment overloads in type-erased memory resource wrappers:

Before:

// Calls allocate_sync(bytes, alignof(::cuda::std::max_align_t)), typically 16 bytes: wrong default
any_synchronous_resource<> any{pool};
any.allocate_sync(bytes);

After:

// Calls allocate_sync(bytes, 256), matching the default of the concrete resources
any_synchronous_resource<> any{pool};
any.allocate_sync(bytes);

Why

Concrete resources (device_memory_pool, legacy_pinned_memory_resource, etc.) use ::cuda::mr::default_cuda_malloc_alignment (256 bytes) as their default alignment. Their type-erased wrappers (any_resource, resource_ref, shared_resource, synchronous_resource_adapter), however, used alignof(::cuda::std::max_align_t) (typically 16 bytes) in their deprecated no-alignment overloads and in default function arguments.
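A minimal model of the discrepancy, assuming a simplified type-erased reference made of a `void*` plus a function pointer. `concrete_resource` and `erased_ref` are hypothetical names, not libcu++ types; the `DefaultAlignment` template parameter stands in for whatever the wrapper's deprecated no-alignment overload forwards (`alignof(::cuda::std::max_align_t)` before this PR, 256 after it).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical concrete resource: its no-alignment call defaults to 256 bytes,
// like the concrete libcu++ resources, and records the alignment it receives.
struct concrete_resource {
    std::size_t last_alignment = 0;

    void* allocate_sync(std::size_t bytes, std::size_t alignment = 256) {
        (void)bytes;
        last_alignment = alignment;
        return nullptr; // this model hands out no real memory
    }
};

// Hypothetical type-erased reference: it no longer knows the concrete type,
// so its no-alignment overload has to supply a default of its own.
template <std::size_t DefaultAlignment>
class erased_ref {
public:
    template <class Resource>
    explicit erased_ref(Resource& r)
        : obj_(&r),
          alloc_([](void* obj, std::size_t bytes, std::size_t alignment) -> void* {
              return static_cast<Resource*>(obj)->allocate_sync(bytes, alignment);
          }) {}

    // Deprecated-style overload without an alignment argument.
    void* allocate_sync(std::size_t bytes) { return alloc_(obj_, bytes, DefaultAlignment); }

private:
    void* obj_;
    void* (*alloc_)(void*, std::size_t, std::size_t);
};
```

Because the erased reference cannot see the concrete resource's own default argument, the constant it forwards alone decides the alignment the wrapped resource receives; this PR changes that constant.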

This can silently under-align device allocations that are later passed to CUDA APIs requiring 256-byte alignment, and it makes a type-erased wrapper behave differently from the concrete resource it wraps.
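Whether a smaller requested alignment actually yields a misaligned pointer depends on the wrapped resource. The sketch below assumes a hypothetical `arena_resource` (not a CCCL type) that honors exactly the alignment it is asked for; with such a resource, a request using the old default can return a pointer that is not 256-byte aligned, while a 256-byte request cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical arena (not a CCCL type) that honors exactly the requested alignment
// instead of rounding every allocation up to a larger internal granularity.
struct arena_resource {
    alignas(256) unsigned char buffer[4096];
    std::size_t offset = 0;

    // `alignment` must be a power of two.
    void* allocate_sync(std::size_t bytes, std::size_t alignment) {
        std::size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
        if (aligned + bytes > sizeof(buffer)) {
            return nullptr;
        }
        offset = aligned + bytes;
        return buffer + aligned;
    }
};

// Wrapper default before the fix, and the concrete-resource default
// (256 is the value of ::cuda::mr::default_cuda_malloc_alignment).
constexpr std::size_t old_wrapper_default = alignof(std::max_align_t); // typically 16
constexpr std::size_t concrete_default = 256;

inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Resources that always round allocations up to 256 bytes internally would hide the difference, so the mismatch may only surface with resources that honor the request exactly.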

Root cause: the wrong alignment constant was used in 6 locations across 3 files.

How

  • Replace alignof(::cuda::std::max_align_t) with ::cuda::mr::default_cuda_malloc_alignment in the deprecated overloads of __ibasic_resource and __ibasic_async_resource (any_resource.h)
  • Update the Doxygen stubs for basic_any_resource and basic_resource_ref to match
  • Fix the default parameter of shared_resource::allocate_sync / deallocate_sync
  • Add a default argument of ::cuda::mr::default_cuda_malloc_alignment to the async allocate/deallocate overloads in shared_resource and synchronous_resource_adapter, which previously required callers to pass the alignment explicitly (unlike concrete resources)
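The default-argument part of the change can be sketched with a hypothetical wrapper. `wrapper`, `recording_resource`, and `fake_stream` are illustrative names, and the `(stream, bytes, alignment)` parameter order is an assumption rather than the exact libcu++ signature; the point is only that both the synchronous and the stream-ordered overloads fall back to 256 bytes when the caller omits the alignment, while an explicit alignment still passes through unchanged.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for ::cuda::mr::default_cuda_malloc_alignment (256 bytes).
constexpr std::size_t default_cuda_malloc_alignment = 256;

// Hypothetical stream handle; the real overloads take a CUDA stream argument.
struct fake_stream {};

// Hypothetical upstream resource that records the alignment it was asked for.
struct recording_resource {
    std::size_t last_alignment = 0;

    void* allocate_sync(std::size_t, std::size_t alignment) {
        last_alignment = alignment;
        return nullptr;
    }
    void* allocate(fake_stream, std::size_t, std::size_t alignment) {
        last_alignment = alignment;
        return nullptr;
    }
};

// Hypothetical wrapper with the shape of the fix: every overload that lets the
// caller omit the alignment defaults to 256 instead of alignof(max_align_t).
class wrapper {
public:
    explicit wrapper(recording_resource& r) : res_(&r) {}

    void* allocate_sync(std::size_t bytes,
                        std::size_t alignment = default_cuda_malloc_alignment) {
        return res_->allocate_sync(bytes, alignment);
    }

    // Stream-ordered overload; in shared_resource and synchronous_resource_adapter
    // the corresponding overload previously had no default at all.
    void* allocate(fake_stream stream, std::size_t bytes,
                   std::size_t alignment = default_cuda_malloc_alignment) {
        return res_->allocate(stream, bytes, alignment);
    }

private:
    recording_resource* res_;
};
```

The PR gives the deallocation overloads the same default, which keeps an allocate/deallocate pair that omits the alignment consistent.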

Test

  • Compiled on Windows (MSVC 19.50, CUDA 12.9, RTX 5070 / sm_89) with no errors

Fixes #8063

…o 256 bytes

Type-erased wrappers (any_resource, resource_ref, shared_resource,
synchronous_resource_adapter) were using alignof(::cuda::std::max_align_t)
(16 bytes) as the default alignment for their deprecated no-alignment
overloads, while concrete resources (device_memory_pool,
legacy_pinned_memory_resource, etc.) use default_cuda_malloc_alignment
(256 bytes). This inconsistency silently changed the effective alignment
when a concrete resource was wrapped in a type-erased wrapper.

Fix by:
- Replacing alignof(::cuda::std::max_align_t) with
  ::cuda::mr::default_cuda_malloc_alignment in the deprecated overloads of
  __ibasic_resource and __ibasic_async_resource in any_resource.h
- Updating the Doxygen stubs (basic_any_resource, basic_resource_ref) to
  reference default_cuda_malloc_alignment consistently
- Fixing the default in shared_resource::allocate_sync/deallocate_sync
- Adding default alignment = default_cuda_malloc_alignment to the async
  allocate/deallocate overloads of shared_resource and
  synchronous_resource_adapter, which previously required explicit alignment

Verified: compiled on Windows (MSVC 19.50, CUDA 12.9, sm_89)

Fixes NVIDIA#8063
@edenfunf edenfunf requested a review from a team as a code owner April 15, 2026 15:52
@edenfunf edenfunf requested a review from griwes April 15, 2026 15:52
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 15, 2026
@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 15, 2026
@edenfunf
Contributor Author

@griwes Hi, could you please take a look at this PR when you have time?

This fixes the default alignment used by deprecated no-alignment overloads
in type-erased memory resource wrappers.

Currently, type-erased wrappers (e.g. any_resource, resource_ref, shared_resource)
default to alignof(::cuda::std::max_align_t) (16 bytes), while concrete resources
(e.g. device_memory_pool) use ::cuda::mr::default_cuda_malloc_alignment (256 bytes).

This creates a behavioral mismatch and can lead to under-aligned allocations
when using the type-erased interfaces with CUDA APIs expecting 256-byte alignment.

This PR:

  • updates all deprecated no-alignment overloads to use default_cuda_malloc_alignment
  • aligns behavior between concrete and type-erased resources
  • applies consistent defaults across sync/async wrappers

The change only affects deprecated overloads and default parameters.

Tested:

  • Compiles cleanly (CUDA 12.9, MSVC 19.50, sm_89)

Fixes #8063

Would really appreciate your feedback, thanks!


Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

[BUG] Type-erasing a memory resource changes default allocation alignment from 256 to 16 bytes
