Open
Labels
performance (Type: Runtime and / or memory behavior)
Description
Once there is a sufficiently advanced prototype that allows realistic profiling, it might be worth thinking about the device memory layout and memory coalescing:
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses
- https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#coalesced-access-to-global-memory
Right now, BlockData stores an array of structures (AoS). When the same field is accessed for all tracks, consecutive accesses are a full structure size apart, so the accesses are strided and the hardware may not be able to coalesce them. If profiling shows that memory bandwidth is an issue for one of the kernels, the data accessed for all tracks simultaneously could instead be stored in a structure of arrays (SoA), provided that memory coalescing is measured to improve performance.
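To make the difference concrete, here is a minimal host-side C++ sketch of the two layouts. The `Track` fields, `TracksSoA`, `toSoA`, and the stride helpers are hypothetical names chosen for illustration; the actual contents of BlockData may look different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical track record (assumption: the real fields in BlockData differ).
struct Track {
  double energy;
  double pos[3];
  double dir[3];
  int status;
};

// Structure of arrays: one contiguous array per field (only two fields shown).
struct TracksSoA {
  std::vector<double> energy;
  std::vector<int> status;
};

// Copy the per-field data out of an array of structures.
inline TracksSoA toSoA(const std::vector<Track>& aos) {
  TracksSoA soa;
  soa.energy.reserve(aos.size());
  soa.status.reserve(aos.size());
  for (const Track& t : aos) {
    soa.energy.push_back(t.energy);
    soa.status.push_back(t.status);
  }
  return soa;
}

// Byte distance between the `energy` of track i and track i + 1.
constexpr std::size_t energyStrideAoS() { return sizeof(Track); }
constexpr std::size_t energyStrideSoA() { return sizeof(double); }

// The same reduction over both layouts: the AoS loop skips over all other
// fields of each track, the SoA loop walks one contiguous array.
inline double sumEnergy(const std::vector<Track>& aos) {
  double sum = 0.0;
  for (const Track& t : aos) sum += t.energy;
  return sum;
}

inline double sumEnergy(const TracksSoA& soa) {
  double sum = 0.0;
  for (double e : soa.energy) sum += e;
  return sum;
}
```

On the device, a warp of 32 threads that each read `energy` of their own track would touch addresses spread over `32 * sizeof(Track)` bytes in the AoS case, versus `32 * sizeof(double)` contiguous bytes in the SoA case. Whether this matters for a given kernel still has to be confirmed by profiling.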