ddemidov released this on Nov 15, 2013

A CUDA backend has been added!

As of v1.0.0, VexCL provides two backends: OpenCL and CUDA. To choose one of them, define either the VEXCL_BACKEND_OPENCL or the VEXCL_BACKEND_CUDA macro. If neither is defined, the OpenCL backend is chosen by default. The program also has to be linked against either libOpenCL.so (OpenCL.dll for Windows users) or libcuda.so (cuda.dll).
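The default rule can be pictured with the minimal sketch below. It only mirrors the documented behaviour using the two macro names above; it is a stand-in, not VexCL's actual backend header, and `selected_backend()` is a hypothetical helper. In a real build the macro is typically passed on the command line (e.g. `-DVEXCL_BACKEND_CUDA`) or defined before including the VexCL headers, and the program is linked with `-lcuda` or `-lOpenCL` accordingly.

```cpp
// Hypothetical sketch mirroring the backend-selection rule from the release notes.
// Only the macro names come from VexCL; this is not the library's own header.
#include <string>

// If the user defined neither macro, fall back to the OpenCL backend.
#if !defined(VEXCL_BACKEND_OPENCL) && !defined(VEXCL_BACKEND_CUDA)
#  define VEXCL_BACKEND_OPENCL
#endif

// Name of the backend this translation unit was compiled for.
inline std::string selected_backend() {
#if defined(VEXCL_BACKEND_CUDA)
    return "CUDA";
#else
    return "OpenCL";
#endif
}
```

Compiled without either macro, `selected_backend()` returns "OpenCL"; compiled with `-DVEXCL_BACKEND_CUDA`, it returns "CUDA".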

For the CUDA backend to work, the CUDA Toolkit has to be installed, and the NVIDIA CUDA compiler driver nvcc has to be in the executable PATH and usable at runtime.
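Because nvcc is invoked at runtime, a program may want to check up front that it can be found. The helper below is a hypothetical sketch, not part of VexCL: it assumes a POSIX-style, ':'-separated PATH and only tests that a regular file with the given name exists, not that it is executable.

```cpp
// Hypothetical helper (not part of VexCL): a simplified PATH lookup for nvcc.
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Return the first "<dir>/<name>" that exists as a regular file (symlinks are
// followed), searching the ':'-separated directories of `path_list` in order.
inline std::optional<fs::path> find_in_path(const std::string& name,
                                            const std::string& path_list) {
    assert(!name.empty() && "executable name must not be empty");
    std::istringstream dirs(path_list);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) continue;  // simplification: empty entries are skipped
        fs::path candidate = fs::path(dir) / name;
        std::error_code ec;         // unreadable directories count as "not found"
        if (fs::is_regular_file(candidate, ec)) return candidate;
    }
    return std::nullopt;
}

// Search the current process's PATH for nvcc.
inline bool nvcc_on_path() {
    const char* path = std::getenv("PATH");
    return path != nullptr && find_in_path("nvcc", path).has_value();
}
```

Calling `nvcc_on_path()` before creating a CUDA context gives a clearer early error than a failed kernel compilation later.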

Benchmarks show that the CUDA backend is a couple of percent more efficient than the OpenCL backend, except for matrix-vector multiplication on multiple devices (there are some issues with asynchronous memory transfers in the CUDA driver API). Note that the first run of a program will take longer than usual, because nvcc has to be invoked to compile each of the compute kernels used in the program. Subsequent runs use the offline kernel cache and complete faster.
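The first-run cost and the faster later runs follow from how an offline cache works. The sketch below assumes a cache directory on disk, a hash of the kernel source as the cache key, and a stub in place of the real nvcc invocation; `KernelCache` is a hypothetical class, and VexCL's actual cache location, key, and file format may differ.

```cpp
// Hypothetical on-disk kernel cache; the class name, file layout and the use of
// std::hash as the key are illustrative assumptions, not VexCL's implementation.
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

class KernelCache {
public:
    explicit KernelCache(fs::path dir) : dir_(std::move(dir)) {
        fs::create_directories(dir_);
    }

    // Return the compiled binary for `source`, compiling only on a cache miss.
    std::string get(const std::string& source) {
        fs::path file = dir_ / (key(source) + ".bin");
        if (fs::exists(file)) {                  // hit: reuse an earlier result
            std::ifstream in(file, std::ios::binary);
            std::ostringstream buf;
            buf << in.rdbuf();
            return buf.str();
        }
        std::string binary = compile(source);    // miss: the slow path
        std::ofstream out(file, std::ios::binary);
        out << binary;                           // flushed when `out` is destroyed
        return binary;
    }

    int compilations() const { return compilations_; }

private:
    // Cache key derived from the kernel source text, written as hex.
    static std::string key(const std::string& source) {
        std::ostringstream hex;
        hex << std::hex << std::hash<std::string>{}(source);
        return hex.str();
    }

    // Stand-in for invoking nvcc: counts calls and tags the source.
    std::string compile(const std::string& source) {
        ++compilations_;
        return "compiled:" + source;
    }

    fs::path dir_;
    int compilations_ = 0;
};
```

A second `KernelCache` object pointing at the same directory plays the role of a second program run and performs no compilation. A production cache would use a stable digest and include compiler options in the key, since `std::hash` is not guaranteed to be stable across implementations.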

Also:

  • Added vex::Filter::General: a modifiable container for device filters.
  • vex::Filter::Env now supports the OCL_POSITION environment variable.
  • Vector views (reduction, permutation) now all work with vector expressions.
  • Added vex::reshape() function for reshaping multidimensional expressions.
  • Added vex::cast() function for changing the deduced type of an expression.
  • Added vex::Filter::Extension and vex::Filter::GLSharing filters for the OpenCL backend (thanks, @johneih!)
  • VEXCL_SPLIT_MULTIEXPRESSIONS macro allows componentwise splitting of
    large multiexpressions.
  • Various bug fixes.
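To illustrate what a modifiable container of device filters allows, here is a plain C++ sketch that assumes a simplified `Device` record and a hypothetical `FilterList` class. Treating filters as predicates that must all accept a device is an assumption made for the example; the real vex::Filter::General interface may differ.

```cpp
// Hypothetical model of a modifiable container of device filters.
// `Device`, `FilterList` and `select` are illustrative stand-ins, not VexCL types.
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Device {
    std::string name;
    bool double_precision;
};

using DeviceFilter = std::function<bool(const Device&)>;

class FilterList {
public:
    // Filters can be added at any time, e.g. depending on runtime options.
    void push_back(DeviceFilter f) { filters_.push_back(std::move(f)); }

    // A device is accepted only if every stored filter accepts it;
    // an empty list accepts every device.
    bool operator()(const Device& d) const {
        for (const auto& f : filters_)
            if (!f(d)) return false;
        return true;
    }

private:
    std::vector<DeviceFilter> filters_;
};

// Apply a filter to a device list, keeping accepted devices in their original order.
inline std::vector<Device> select(const std::vector<Device>& all,
                                  const FilterList& filter) {
    std::vector<Device> out;
    for (const auto& d : all)
        if (filter(d)) out.push_back(d);
    return out;
}
```

Each `push_back` narrows the selection, so the filter set can be assembled incrementally (for example, from command-line options) instead of being fixed in a single expression at compile time.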