AncaSC edited this page Nov 13, 2014 · 7 revisions

Status: in progress

Device side include headers
The files containing CUDA kernels include C standard library headers, as well as many other headers that are only partially needed.
static storage class specifier
The static storage class specifier is not supported in OpenCL 1.1 (see chapter 6.8, point g), so it has been removed from the OpenCL device code. It is, however, supported in OpenCL 1.2.
variadic macros
Variadic macros are not supported in OpenCL 1.1 (see chapter 6.8, point e), and OpenCL 1.2 does not support them either. They have been replaced by macros with a fixed number of parameters.
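The replacement can be pictured with a small host-side C sketch. The macro names CALL_2 and CALL_3 and the helper functions add2 and add3 are hypothetical and are not taken from the ported code; they only illustrate the pattern of one macro per argument count.

```c
/* Variadic form, rejected by OpenCL C 1.1 and 1.2 (shown as a comment only):
 *   #define CALL(fn, ...) fn(__VA_ARGS__)
 */

/* Fixed-arity replacements: one macro for each argument count that is
 * actually used. All names here are hypothetical. */
#define CALL_2(fn, a, b)    fn((a), (b))
#define CALL_3(fn, a, b, c) fn((a), (b), (c))

/* Hypothetical helpers used to exercise the macros. */
int add2(int a, int b)        { return a + b; }
int add3(int a, int b, int c) { return a + b + c; }
```

The cost of this approach is that every call site must pick the macro matching its argument count, for example CALL_2(add2, 1, 2) or CALL_3(add3, 1, 2, 3).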
CUDA warp shuffle functions
The CUDA kernels use CUDA-specific warp shuffle functions (see the __shfl_down calls and the ***_warp_shfl functions). The code also provides an alternative implementation for the case where REDUCE_SHUFFLE is not defined. The OpenCL kernels use this alternative implementation.
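The page does not show the alternative implementation itself. As a rough illustration of how a reduction can work without shuffles, the following host-side C sketch emulates the common halving pattern over a shared buffer. It is an assumption about the general technique, not a copy of the kernel code; GROUP_SIZE and reduce_local_emulated are hypothetical names.

```c
#include <stddef.h>

#define GROUP_SIZE 64 /* hypothetical work-group size, must be a power of two */

/* Sums GROUP_SIZE values in place and returns the total (also left in
 * scratch[0]). The inner loop stands in for the work-items that would run
 * concurrently on a device; each outer iteration corresponds to one step
 * that a kernel would separate with a local-memory barrier. */
float reduce_local_emulated(float scratch[GROUP_SIZE])
{
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            scratch[lid] += scratch[lid + stride];
        }
        /* In a kernel, a barrier on local memory would go here. */
    }
    return scratch[0];
}
```

In a shuffle-based reduction the partial sums travel between registers of neighbouring lanes; here they pass through the shared buffer instead, which is why each step needs a synchronization point on a real device.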
CUDA warp vote functions
Warp vote functions are implemented using shared (local) memory. Search for the warp_any variable inside the kernel.
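A minimal host-side C sketch of the idea follows, assuming a warp size of 32 and a single shared flag. Only the name warp_any comes from the kernel; WARP_SIZE, warp_any_emulated and the exact logic are assumptions made for illustration.

```c
#define WARP_SIZE 32 /* assumed warp size */

/* Emulates an "any" vote: returns 1 if at least one lane's predicate is
 * non-zero, otherwise 0. The local variable warp_any stands in for a flag
 * kept in shared (local) memory, and the loop stands in for the lanes that
 * would each write the flag concurrently on a device. */
int warp_any_emulated(const int predicate[WARP_SIZE])
{
    int warp_any = 0; /* on a device: cleared, then a barrier */
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        if (predicate[lane]) {
            warp_any = 1; /* every voting lane writes the same value */
        }
    }
    /* on a device: another barrier before any lane reads the flag */
    return warp_any;
}
```

Because every lane that votes writes the same value, concurrent writes to the flag do not conflict; the barriers only ensure the flag is cleared before the vote and complete before it is read.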
Warp dependent characteristics of the kernel
See #38
3 component vector data type size
For alignment reasons, the size of a 3 component vector data type in OpenCL is the same as the size of a 4 component vector data type. This means that sizeof(float3) = 4 * sizeof(float). In order to keep the original data layout, float3* buffers have been changed to float* buffers. See also #27
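With a tightly packed float* buffer, element i occupies indices 3*i to 3*i + 2. The host-side C sketch below shows that index arithmetic; vec3, load_packed3 and store_packed3 are hypothetical names. OpenCL C also offers the built-ins vload3 and vstore3, which address memory at offset * 3 in the same way; whether the port uses them is not stated here.

```c
#include <stddef.h>

/* Hypothetical helper type holding one xyz triple. */
typedef struct { float x, y, z; } vec3;

/* Reads the i-th triple from a tightly packed buffer (3 floats per element,
 * no padding between elements). */
vec3 load_packed3(const float *buf, size_t i)
{
    vec3 v = { buf[3 * i], buf[3 * i + 1], buf[3 * i + 2] };
    return v;
}

/* Writes the i-th triple into a tightly packed buffer. */
void store_packed3(float *buf, size_t i, vec3 v)
{
    buf[3 * i]     = v.x;
    buf[3 * i + 1] = v.y;
    buf[3 * i + 2] = v.z;
}
```

Indexing a float3* pointer in OpenCL would instead step by four floats per element, which is why the buffer type had to change to preserve the 12-byte stride.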
Missing atomic_add functions for float data type
OpenCL does not provide atomic_add functions for the float data type. This operation has been implemented using atomic_cmpxchg.
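The compare-and-swap loop behind such an implementation can be sketched on the host with C11 atomics. This is an analog under stated assumptions, not the kernel code: atomic_compare_exchange_weak plays the role of atomic_cmpxchg, the float is stored as its 32-bit pattern, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Bit-level conversions between float and its 32-bit representation. */
uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Adds value to the float stored (as raw bits) in *target and returns the
 * previous float value. The loop retries whenever another thread changed
 * *target between the read and the compare-and-swap. */
float atomic_add_float(_Atomic uint32_t *target, float value)
{
    uint32_t old_bits = atomic_load(target);
    for (;;) {
        float old_val = bits_to_float(old_bits);
        uint32_t new_bits = float_to_bits(old_val + value);
        /* On failure, old_bits is refreshed with the current contents. */
        if (atomic_compare_exchange_weak(target, &old_bits, new_bits)) {
            return old_val;
        }
    }
}
```

The comparison is done on the integer bit pattern rather than on the float value, so the swap succeeds only if the stored bits are exactly the ones the new sum was computed from.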