Add temp storage alignment awareness #4726

@tpn

Description

Numba CUDA recently introduced the ability to pass an alignment=N keyword argument to the cuda.(shared|local).array() helpers in this PR. We need to update our cuda.cooperative Algorithm implementation so that returned callables carry a temp_storage_alignment attribute alongside the existing temp_storage_bytes attribute.

Specifically, temp_storage_alignment is equivalent to alignof(Alg::TempStorage), just as temp_storage_bytes is equivalent to sizeof(Alg::TempStorage).
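To make the sizeof/alignof pairing concrete, here is a rough host-side analogy using Python's ctypes module. The TempStorage layout below is purely hypothetical (it is not what any real algorithm's Alg::TempStorage looks like); it only shows that size and alignment are two separate properties of the same struct, and that both are needed to place it correctly.

```python
import ctypes


class TempStorage(ctypes.Structure):
    # Hypothetical layout for illustration only: 32 floats of scratch
    # space followed by a single one-byte flag.
    _fields_ = [
        ("partials", ctypes.c_float * 32),
        ("flag", ctypes.c_uint8),
    ]


# Analogous to sizeof(Alg::TempStorage): total bytes, including tail padding.
temp_storage_bytes = ctypes.sizeof(TempStorage)

# Analogous to alignof(Alg::TempStorage): the strictest alignment of any member.
temp_storage_alignment = ctypes.alignment(TempStorage)

print(temp_storage_bytes, temp_storage_alignment)
```

The size alone says nothing about where the struct may start; the alignment is what a caller would need to forward as alignment=N when allocating the backing array.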

An example of a concrete issue this will fix: create two temp storage arrays in a kernel; the second one gets only byte (1-byte) alignment, so if you're trying to use a cuda.coop primitive with, say, a float, which needs 4-byte alignment, the kernel will trap.
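The failure mode can be sketched without a GPU by modelling how consecutive buffers are placed. The snippet below is a simplified illustration, not how Numba actually lays out arrays: align_up and pack are hypothetical helpers, and the 13-byte and 128-byte sizes are made-up values chosen so the misalignment is visible.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def pack(buffers):
    """Place (nbytes, alignment) buffers back to back; return start offsets."""
    offsets = []
    cursor = 0
    for nbytes, alignment in buffers:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += nbytes
    return offsets


# Two temp storage buffers requested with only byte alignment:
# the second one starts at offset 13, which is not a multiple of 4.
naive = pack([(13, 1), (128, 1)])

# The same buffers requested with the 4-byte alignment a float needs:
# the second one is pushed to the next multiple of 4.
aware = pack([(13, 4), (128, 4)])

print(naive, naive[1] % 4)
print(aware, aware[1] % 4)
```

With temp_storage_alignment available on the callable, the caller would have the value needed to request the second layout rather than the first.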

Metadata

Labels

cuda.coop: For all items related to the cuda.coop Python module

Projects

Status

Done
