Closed
Labels
cuda.coop: For all items related to the cuda.coop Python module
Description
Numba CUDA recently introduced the ability to specify an alignment=N keyword arg to the cuda.(shared|local).array() helpers in this PR. We need to update our cuda.cooperative Algorithm implementation so that returned callables carry a temp_storage_alignment attribute in addition to the existing temp_storage_bytes attribute.
Specifically, temp_storage_alignment is equivalent to alignof(Alg::TempStorage), just as temp_storage_bytes is equivalent to sizeof(Alg::TempStorage).
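To make the sizeof/alignof distinction concrete, here is a minimal host-side sketch using Python's ctypes. The TempStorage structure and its fields are hypothetical stand-ins; the real Alg::TempStorage layout is defined by the C++ algorithm and is not reproduced here.

```python
import ctypes


# Hypothetical stand-in for Alg::TempStorage. The field names and sizes are
# made up for illustration only.
class TempStorage(ctypes.Structure):
    _fields_ = [
        ("partials", ctypes.c_float * 3),  # float members drive the alignment
        ("flag", ctypes.c_ubyte),          # a trailing byte forces tail padding
    ]


# Analogue of sizeof(Alg::TempStorage): total size including padding.
temp_storage_bytes = ctypes.sizeof(TempStorage)

# Analogue of alignof(Alg::TempStorage): the strictest member alignment.
temp_storage_alignment = ctypes.alignment(TempStorage)

print(temp_storage_bytes, temp_storage_alignment)
```

The size alone does not tell a caller where the storage may legally start, which is why the two values need to be exposed separately.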
An example of a concrete issue this will fix: if you create two temp storage arrays in a kernel, the second one gets only byte (1-byte) alignment. If you then try to use a cuda.coop primitive with, say, a float, which needs 4-byte alignment, the kernel will trap.
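The failure mode can be sketched with plain offset arithmetic. This is a simplified model that assumes arrays are placed back to back in one address space; it is not a description of how Numba or the CUDA compiler actually assigns addresses. The helper names align_up and place are hypothetical, as are the byte counts.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def place(arrays):
    """Lay out (nbytes, alignment) pairs back to back and return start offsets."""
    offsets = []
    cursor = 0
    for nbytes, alignment in arrays:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += nbytes
    return offsets


# Two temp storage arrays requested with byte alignment: the second one
# starts right after the first, at an offset that is not a multiple of 4.
byte_aligned = place([(13, 1), (64, 1)])

# The same arrays requested with 4-byte alignment, as a float would need.
float_aligned = place([(13, 4), (64, 4)])

print(byte_aligned, float_aligned)
```

In this model the second byte-aligned array starts at an offset that a 4-byte float access cannot use, while requesting the alignment reported by temp_storage_alignment pads the start to a valid boundary.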
Projects
Status
Done