[libcu++] Fix type-erased memory resource default alignment (16 → 256 bytes) by edenfunf · Pull Request #8451 · NVIDIA/cccl

edenfunf · 2026-04-15T15:52:51Z

What

Fix the default alignment used by deprecated no-alignment overloads in type-erased memory resource wrappers:

Before:

// Calls allocate_sync(bytes, 16) — wrong default
any_synchronous_resource<> any{pool};
any.allocate_sync(bytes);

After:

// Calls allocate_sync(bytes, 256) — correct default matching concrete resources
any_synchronous_resource<> any{pool};
any.allocate_sync(bytes);

Why

Concrete resources (device_memory_pool, legacy_pinned_memory_resource, etc.) use default_cuda_malloc_alignment (256 bytes) as their default alignment. However, their type-erased wrappers (any_resource, resource_ref, shared_resource, synchronous_resource_adapter) were using alignof(::cuda::std::max_align_t) (16 bytes) in their deprecated no-alignment overloads and function parameter defaults.

This silently under-aligns device allocations whose memory is later passed to CUDA APIs that require 256-byte alignment, and it creates a behavior discrepancy between concrete resources and their type-erased equivalents.

Root cause: wrong alignment constant used across 6 locations in 3 files.
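To make the under-alignment concrete, here is a minimal host-only sketch. It does not use the library: `old_wrapper_default`, `concrete_default`, `is_aligned`, and `align_up` are hypothetical names, and the values 16 and 256 are taken from the description above rather than read from the headers. It shows that an address satisfying the old 16-byte default need not satisfy the 256-byte default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two constants named in the PR description.
// The values come from the description, not from the library headers.
constexpr std::size_t old_wrapper_default = 16;  // alignof(::cuda::std::max_align_t), as reported
constexpr std::size_t concrete_default    = 256; // default_cuda_malloc_alignment, as reported

// Returns true when `address` is a multiple of `alignment`.
// `alignment` must be a nonzero power of two.
inline bool is_aligned(std::uintptr_t address, std::size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (address & (alignment - 1)) == 0;
}

// Rounds `address` up to the next multiple of `alignment` (a power of two).
inline std::uintptr_t align_up(std::uintptr_t address, std::size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (address + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
}
```

For example, the address `0x1010` passes `is_aligned(0x1010, old_wrapper_default)` but fails `is_aligned(0x1010, concrete_default)`, so a resource that honors only the requested 16 bytes could legitimately return it.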

How

- Replace alignof(::cuda::std::max_align_t) with ::cuda::mr::default_cuda_malloc_alignment in the deprecated overloads of __ibasic_resource and __ibasic_async_resource (any_resource.h)
- Update the Doxygen stubs for basic_any_resource and basic_resource_ref consistently
- Fix shared_resource::allocate_sync / deallocate_sync default parameter
- Add = ::cuda::mr::default_cuda_malloc_alignment default to the async allocate/deallocate overloads in shared_resource and synchronous_resource_adapter, which previously required callers to specify alignment explicitly (unlike concrete resources)
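The reason the wrapper's default matters can be sketched without the library. In C++, a default argument is chosen from the declaration visible at the call site, so a wrapper that forwards to a concrete resource passes along its own default, not the wrapped resource's. The types below (`fake_pool`, `erased_wrapper`) and the helper `alignment_seen_by_resource` are hypothetical illustrations, not the libcu++ classes; the template parameter only stands in for whichever constant the wrapper uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical concrete resource: records the alignment it was asked for.
// Its own default is 256, as the PR describes for concrete resources.
struct fake_pool
{
  std::size_t last_alignment = 0;

  void* allocate_sync(std::size_t bytes, std::size_t alignment = 256)
  {
    (void) bytes;
    last_alignment = alignment;
    return nullptr; // no real allocation in this sketch
  }
};

// Hypothetical wrapper: forwards to the wrapped resource, but the default
// argument is the wrapper's own (WrapperDefault), not the resource's.
template <std::size_t WrapperDefault>
struct erased_wrapper
{
  fake_pool* target;

  void* allocate_sync(std::size_t bytes, std::size_t alignment = WrapperDefault)
  {
    assert(target != nullptr);
    return target->allocate_sync(bytes, alignment);
  }
};

// Calls the no-alignment form through the wrapper and reports what the
// underlying resource actually received.
template <std::size_t WrapperDefault>
std::size_t alignment_seen_by_resource()
{
  fake_pool pool;
  erased_wrapper<WrapperDefault> any{&pool};
  any.allocate_sync(1024); // alignment omitted, so the wrapper's default applies
  return pool.last_alignment;
}
```

With these definitions, `alignment_seen_by_resource<16>()` reports 16 even though `fake_pool` itself defaults to 256, while `alignment_seen_by_resource<256>()` matches the concrete default. That is the mismatch this change is meant to remove.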

Test

Compiled on Windows (MSVC 19.50, CUDA 12.9, RTX 5070 / sm_89) — no errors

Fixes #8063

…o 256 bytes

Type-erased wrappers (any_resource, resource_ref, shared_resource, synchronous_resource_adapter) were using alignof(::cuda::std::max_align_t) (16 bytes) as the default alignment for their deprecated no-alignment overloads, while concrete resources (device_memory_pool, legacy_pinned_memory_resource, etc.) use default_cuda_malloc_alignment (256 bytes). This inconsistency silently changed the effective alignment when a concrete resource was wrapped in a type-erased wrapper.

Fix by:
- Replacing alignof(::cuda::std::max_align_t) with ::cuda::mr::default_cuda_malloc_alignment in the deprecated overloads of __ibasic_resource and __ibasic_async_resource in any_resource.h
- Updating the Doxygen stubs (basic_any_resource, basic_resource_ref) to reference default_cuda_malloc_alignment consistently
- Fixing the default in shared_resource::allocate_sync/deallocate_sync
- Adding default alignment = default_cuda_malloc_alignment to the async allocate/deallocate overloads of shared_resource and synchronous_resource_adapter, which previously required explicit alignment

Verified: compiled on Windows (MSVC 19.50, CUDA 12.9, sm_89)

Fixes NVIDIA#8063

copy-pr-bot · 2026-04-15T15:52:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

edenfunf · 2026-04-16T14:08:56Z

@griwes Hi, could you please take a look at this PR when you have time?

This fixes the default alignment used by deprecated no-alignment overloads
in type-erased memory resource wrappers.

Currently, type-erased wrappers (e.g. any_resource, resource_ref, shared_resource)
default to alignof(::cuda::std::max_align_t) (16 bytes), while concrete resources
(e.g. device_memory_pool) use ::cuda::mr::default_cuda_malloc_alignment (256 bytes).

This creates a behavioral mismatch and can lead to under-aligned allocations
when using the type-erased interfaces with CUDA APIs expecting 256-byte alignment.

This PR:

- updates all deprecated no-alignment overloads to use default_cuda_malloc_alignment
- aligns behavior between concrete and type-erased resources
- applies consistent defaults across sync/async wrappers

The change only affects deprecated overloads and default parameters.

Tested:

Compiles cleanly (CUDA 12.9, MSVC 19.50, sm_89)

Fixes #8063

Would really appreciate your feedback, thanks!

edenfunf requested a review from a team as a code owner April 15, 2026 15:52

github-project-automation bot added this to CCCL Apr 15, 2026

edenfunf requested a review from griwes April 15, 2026 15:52

github-project-automation bot moved this to Todo in CCCL Apr 15, 2026

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 15, 2026

edenfunf wants to merge 1 commit into NVIDIA:main from edenfunf:fix/memory-resource-type-erased-default-alignment
