Think about memory coalescing #67

Once there is a sufficiently advanced prototype that allows realistic profiling, it might be worth thinking about the device memory layout and memory coalescing:
 * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses
 * https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#coalesced-access-to-global-memory

Right now, `BlockData` stores an array of structures. Reading the same field of every track therefore produces strided memory accesses, which the hardware may be unable to coalesce. Data that is accessed for all tracks at once could instead be stored as a structure of arrays, provided that profiling shows memory bandwidth to be a bottleneck for one of the kernels and that coalescing is measured to improve performance.
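To make the difference between the two layouts concrete, here is a minimal host-side C++ sketch. It is not the actual `BlockData` code: the `Track` fields and the names `TracksAoS`, `TracksSoA`, `energyStrideAoS` and `energyStrideSoA` are hypothetical and exist only for illustration. The sketch measures the byte distance between the same field of two neighbouring tracks, which is the stride that consecutive GPU threads would see if thread `i` read track `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical track record. The real fields stored by BlockData are not
// listed in this issue; these are assumptions for illustration only.
struct Track {
  double energy;
  double pos[3];
  double dir[3];
  int status;
};

// Array of structures (AoS): all fields of one track are contiguous.
using TracksAoS = std::vector<Track>;

// Structure of arrays (SoA): each field of all tracks is contiguous.
struct TracksSoA {
  std::vector<double> energy;
  std::vector<double> posX, posY, posZ;
  std::vector<double> dirX, dirY, dirZ;
  std::vector<int> status;

  explicit TracksSoA(std::size_t n)
      : energy(n), posX(n), posY(n), posZ(n),
        dirX(n), dirY(n), dirZ(n), status(n) {}
};

// Byte distance between `energy` of track 0 and track 1 in the AoS layout.
// Neighbouring threads reading `energy` would be this many bytes apart.
std::ptrdiff_t energyStrideAoS(const TracksAoS& tracks) {
  assert(tracks.size() >= 2);
  return reinterpret_cast<const char*>(&tracks[1].energy) -
         reinterpret_cast<const char*>(&tracks[0].energy);
}

// Same measurement for the SoA layout: neighbouring values are adjacent.
std::ptrdiff_t energyStrideSoA(const TracksSoA& tracks) {
  assert(tracks.energy.size() >= 2);
  return reinterpret_cast<const char*>(&tracks.energy[1]) -
         reinterpret_cast<const char*>(&tracks.energy[0]);
}
```

In the AoS layout the stride equals `sizeof(Track)`, so a warp reading only `energy` touches one value per structure and skips the rest. In the SoA layout the stride equals `sizeof(double)`, so the same reads fall on consecutive addresses and can be served by fewer memory transactions. Whether this matters for a given kernel still has to be confirmed by profiling, as noted above.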
