VC4CL is an implementation of the OpenCL 1.2 standard for the VideoCore IV GPU (found in all Raspberry Pi models).
The implementation consists of:
- The VC4CL OpenCL runtime library, running on the host CPU to compile, run and interact with OpenCL kernels.
- The VC4C compiler, converting OpenCL kernels into machine code. This compiler also provides an implementation of the OpenCL built-in functions.
- The VC4CLStdLib, the platform-specific implementation of the OpenCL C standard library, which is linked in with the kernel by VC4C.
The VC4CL implementation supports the EMBEDDED PROFILE of the OpenCL standard version 1.2.
The cl_khr_icd extension is also supported, which allows VC4CL to be found by an Installable Client Driver (ICD) loader. This enables VC4CL to be used in parallel with another OpenCL implementation, e.g. pocl, which executes OpenCL code on the host CPU.
OpenCL 1.2 was selected as the target version because it is the last version of the standard whose mandatory features can all be supported.
The EMBEDDED PROFILE is a trimmed-down version of the default FULL PROFILE. The most notable features not supported by the VC4CL implementation are images, the double data type, device-side printf and device partitioning. See RuntimeLibrary for more details on supported and unsupported features.
VideoCore IV GPU
The VideoCore IV GPU, in the configuration found in the Raspberry Pi models, has a theoretical maximum performance of 24 GFLOPS and is therefore very powerful in comparison to the host CPU. The GPU (which is located on the same chip as the CPU) has 12 cores, each capable of running independent instructions, natively supports a SIMD vector width of 16 elements and can access the RAM directly via DMA.
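As a rough sanity check on the 24 GFLOPS figure, the arithmetic can be sketched as below. Only the core count comes from the text above. The other factors (4 physical SIMD lanes per core, 2 floating-point operations per lane and cycle, a 250 MHz GPU clock) are assumptions that this document does not state and that may differ between board revisions.

```shell
# Back-of-the-envelope estimate of the theoretical peak throughput.
CORES=12            # from the text above
LANES_PER_CORE=4    # assumption: physical SIMD lanes per core
OPS_PER_LANE=2      # assumption: one add and one multiply per lane and cycle
CLOCK_MHZ=250       # assumption: GPU clock in MHz

# cores * lanes * ops * MHz gives MFLOPS; divide by 1000 for GFLOPS.
PEAK_GFLOPS=$(( CORES * LANES_PER_CORE * OPS_PER_LANE * CLOCK_MHZ / 1000 ))
echo "theoretical peak: ${PEAK_GFLOPS} GFLOPS"
```

Under these assumptions the estimate matches the quoted figure; real kernels will usually stay well below a theoretical peak.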
Building VC4CL requires the following software:
- A C++14-capable compiler (e.g. GCC 6.3 or clang from the Raspbian repositories)
- The VC4C compiler, to compile OpenCL C code
- The Khronos ICD Loader (available in the official Raspbian repository via sudo apt-get install ocl-icd-opencl-dev ocl-icd-dev) for building with ICD support, which allows several OpenCL implementations to run on one machine
- The OpenCL headers in version >= 1.2 (available in the Raspbian repositories via sudo apt-get install opencl-headers)
The following configuration options are available in CMake:
- BUILD_TESTING toggles building of the test program (when configured, it is built as a separate target)
- BUILD_DEBUG toggles between building a debug or a release program
- CROSS_COMPILE toggles whether to cross-compile for the Raspberry Pi. NOTE: The Raspberry Pi cross-compiler is no longer supported!
- CROSS_COMPILER_PATH sets the root path to the Raspberry Pi cross-compiler
- INCLUDE_COMPILER toggles whether to include the VC4C compiler. For the compiler to actually be included, the VC4C headers and library need to be found as well
- VC4C_HEADER_PATH sets the path to the VC4C include headers (a default path is used if unset)
- VC4CC_LIBRARY sets the path to the VC4C compiler library (a default path is used if unset)
- BUILD_ICD toggles whether to build with support for the Khronos ICD loader; this requires the ICD loader to be installed system-wide
- IMAGE_SUPPORT toggles whether to enable the very experimental image support
- REGISTER_POKE_KERNELS toggles the use of register-poking to start kernels (if disabled, the mailbox system calls are used). Enabling this increases performance by up to 10%, but may crash the system if any other application accesses the GPU at the same time!
- BUILD_DEB_PACKAGE toggles whether to create the configuration needed to build the vc4cl-xxx.deb package for installation on Raspbian. The actual packaging is started with cpack -G DEB
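A configure step using these options might look like the sketch below. The option names are the ones listed above. The ON/OFF values, the build/ directory and the use of make are illustrative assumptions, not recommended settings.

```shell
# Option names come from the list above; the ON/OFF values are only examples.
CMAKE_OPTS="-DBUILD_TESTING=ON -DBUILD_DEBUG=OFF -DBUILD_ICD=ON -DINCLUDE_COMPILER=ON -DIMAGE_SUPPORT=OFF -DREGISTER_POKE_KERNELS=OFF -DBUILD_DEB_PACKAGE=ON"

# Print the configure command so it can be reviewed before it is run.
CONFIGURE_CMD="cmake .. $CMAKE_OPTS"
echo "$CONFIGURE_CMD"

# To run it for real (assumes a build/ directory inside the VC4CL source tree):
#   mkdir -p build && cd build
#   cmake .. $CMAKE_OPTS && make
#   cpack -G DEB    # only useful when BUILD_DEB_PACKAGE is enabled
```

Keeping the options in one variable makes it easy to compare configurations, for example with and without REGISTER_POKE_KERNELS.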
Khronos ICD Loader
The Khronos ICD Loader allows multiple OpenCL implementations to be used in parallel (e.g. VC4CL and pocl), but requires a bit of manual configuration:
Create a file /etc/OpenCL/vendors/VC4CL.icd with a single line containing the absolute path to the VC4CL library.
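The registration step above can be sketched as a small shell function. The function name register_icd, its optional second argument and the library path in the usage comment are hypothetical. The actual location of the VC4CL library depends on how it was built and installed.

```shell
# register_icd LIB_PATH [VENDOR_DIR]
# Writes a VC4CL.icd file containing exactly one line: the absolute library path.
# VENDOR_DIR defaults to /etc/OpenCL/vendors, the directory named in the text above.
register_icd() {
    lib_path="$1"
    vendor_dir="${2:-/etc/OpenCL/vendors}"

    # The ICD file must contain an absolute path, so reject anything else.
    case "$lib_path" in
        /*) ;;
        *) echo "error: library path must be absolute: $lib_path" >&2
           return 1 ;;
    esac

    mkdir -p "$vendor_dir" && printf '%s\n' "$lib_path" > "$vendor_dir/VC4CL.icd"
}

# Hypothetical usage (the library path is an assumption):
#   register_icd /usr/local/lib/libVC4CL.so
```

Writing below /etc normally requires root privileges, so the function would have to be called from a root shell.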
The program clinfo can be used to test whether the ICD loader finds the VC4CL implementation. Note: the version in the official Raspbian repository is too old and has a bug (see fix), so it must be compiled from the GitHub repository.
Because the DMA interface has no MMU between the GPU and the RAM, code executed on the GPU can access any part of the main memory! This means that an OpenCL kernel could be used to read sensitive data or write into kernel memory!
Therefore, any program using the VC4CL implementation must be run as root!