A1.3 log

Status: in progress

Device side include headers

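The files containing the CUDA kernels include C standard library headers, as well as many other headers that are only partially needed. OpenCL C device code cannot include the C standard library headers, so these includes have to be removed or reduced to the declarations the kernels actually use.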

static storage class specifier

The static storage class specifier is not supported in OpenCL 1.1 (see chapter 6.8, point g), so it has been removed from the OpenCL device code. It is, however, supported in OpenCL 1.2.

Variadic macros

Variadic macros are not supported in OpenCL 1.1 (see chapter 6.8, point e), and OpenCL 1.2 does not support them either. They have been replaced by macros with a fixed number of parameters.
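A minimal sketch of this kind of replacement, written as plain C so it can be compiled on the host. The macro and function names (APPLY2, APPLY3, add2, add3) are hypothetical and do not come from the ported code; the variadic form is shown only in a comment for comparison.

```c
/* Variadic original (valid C99, but rejected by OpenCL C):
 *   #define APPLY(f, ...) f(__VA_ARGS__)
 */

/* Fixed-arity replacements: one macro per argument count that is actually used. */
#define APPLY2(f, a, b)    f((a), (b))
#define APPLY3(f, a, b, c) f((a), (b), (c))

/* Hypothetical helpers, used only to exercise the macros. */
int add2(int a, int b) { return a + b; }
int add3(int a, int b, int c) { return a + b + c; }
```

The cost of this approach is one macro per arity, which is usually acceptable because device code tends to use only a handful of argument counts.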

CUDA warp shuffle functions

The CUDA kernels use CUDA-specific warp shuffle functions (see the __shfl_down calls and the ***_warp_shfl functions). The code also provides an alternative implementation that is used when REDUCE_SHUFFLE is not defined. The OpenCL kernels use this alternative implementation.
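The alternative implementation is not reproduced here. As an illustration of the general technique, the sketch below emulates on the host a reduction that goes through a scratch buffer instead of exchanging registers with __shfl_down. The function name reduce_sum_local and the buffer name scratch are assumptions; in a kernel, scratch would be a __local array with one slot per work-item, the inner loop would run in parallel across lanes, and a barrier(CLK_LOCAL_MEM_FENCE) would separate the steps.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n values in place, halving the active range at every step.
 * n must be a power of two. The result ends up in scratch[0]. */
float reduce_sum_local(float *scratch, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    for (size_t offset = n / 2; offset > 0; offset /= 2) {
        /* Kernel equivalent: each lane with id < offset executes the body
         * once, then all lanes synchronize before the next step. */
        for (size_t lane = 0; lane < offset; ++lane)
            scratch[lane] += scratch[lane + offset];
    }
    return scratch[0];
}
```

The access pattern (lane reads from lane + offset) mirrors what a __shfl_down based reduction does, which is why the two variants can sit behind the same macro switch.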

CUDA warp vote functions

Warp vote functions are implemented using shared (local) memory. Search for the warp_any variable inside the kernel.
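The idea can be sketched on the host as follows. This is an emulation under stated assumptions, not the kernel code: only the name warp_any comes from the source, while vote_any and predicate are hypothetical. In a kernel, warp_any would live in __local memory, one work-item would clear it, and barriers would separate the clear, the writes, and the reads.

```c
#include <stddef.h>

/* Returns 1 if any lane's predicate is non-zero, 0 otherwise. */
int vote_any(const int *predicate, size_t n)
{
    int warp_any = 0; /* kernel: cleared by one work-item, then a barrier */

    for (size_t lane = 0; lane < n; ++lane) {
        /* Kernel: every lane runs this concurrently. The race is benign
         * because all writers store the same value. */
        if (predicate[lane])
            warp_any = 1;
    }

    /* Kernel: barrier(CLK_LOCAL_MEM_FENCE) before any lane reads the flag. */
    return warp_any;
}
```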

Warp dependent characteristics of the kernel

See #38

3 component vector data type size

For alignment reasons, the size of a 3 component vector data type in OpenCL is the same as the size of a 4 component vector data type. This means that sizeof(float3) = 4 * sizeof(float). In order to keep the original data layout, float3* buffers have been changed to float* buffers. See also #27.
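With a float* buffer, element i occupies indices 3*i, 3*i + 1 and 3*i + 2. The host-side sketch below shows that indexing; the vec3 type and the load_vec3 / store_vec3 helpers are hypothetical names introduced for illustration. In OpenCL C, the built-in vload3 and vstore3 functions address memory in the same way (p + offset * 3), so they may be an option in the kernels as well.

```c
#include <stddef.h>

/* Hypothetical 3-component value type used only for this illustration. */
typedef struct { float x, y, z; } vec3;

/* Reads the i-th tightly packed triple from a plain float buffer. */
vec3 load_vec3(const float *buf, size_t i)
{
    vec3 v = { buf[3 * i], buf[3 * i + 1], buf[3 * i + 2] };
    return v;
}

/* Writes the i-th tightly packed triple into a plain float buffer. */
void store_vec3(float *buf, size_t i, vec3 v)
{
    buf[3 * i]     = v.x;
    buf[3 * i + 1] = v.y;
    buf[3 * i + 2] = v.z;
}
```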

Missing atomic_add functions for float data type

OpenCL does not provide atomic_add functions for the float data type. This operation has been implemented using atomic_cmpxchg.
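The compare-and-swap loop behind such an implementation can be sketched with C11 atomics on the host. This is an illustration of the technique, not the code used in the port: the function names are hypothetical, and atomic_compare_exchange_weak stands in for OpenCL's atomic_cmpxchg, with memcpy taking the role of as_uint / as_float.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer and back. */
uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Atomically adds `value` to the float stored in *addr (kept as raw bits)
 * and returns the value that was there before the addition. */
float atomic_add_float(_Atomic uint32_t *addr, float value)
{
    uint32_t old_bits = atomic_load(addr);
    uint32_t new_bits;

    do {
        new_bits = float_to_bits(bits_to_float(old_bits) + value);
        /* On failure old_bits is refreshed with the current contents,
         * so the sum is recomputed from the latest value. */
    } while (!atomic_compare_exchange_weak(addr, &old_bits, new_bits));

    return bits_to_float(old_bits);
}
```

The loop retries only when another thread changed the location between the read and the swap, so under low contention it usually completes in one iteration.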
