
Multithreading

Hüseyin Tuğrul BÜYÜKIŞIK edited this page Apr 25, 2022 · 3 revisions

When single-threaded, auto-vectorized SIMD execution is not fast enough, the run method can be replaced with its multithreaded version, which takes the number of threads as a template parameter:

constexpr int numThreads = 8;
kernel.runMultithreaded<numThreads>(numWorkItems,inputBuffer,outputBuffer);

If the compiler (GCC v10 or any other compiler capable of auto-vectorization) is given the -fopenmp flag, runMultithreaded uses OpenMP to distribute the work. If OpenMP is not enabled, it falls back to a simple array of threads that is launched and joined on every runMultithreaded call. This fallback is planned to be replaced by a load balancer later.
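The two distribution paths described above can be illustrated with a minimal sketch. This is not the library's actual implementation: `runChunked` and `doubleAll` are hypothetical names, and the even split into contiguous chunks is an assumption made for illustration. The sketch only shows the general pattern of using OpenMP when `-fopenmp` defines `_OPENMP`, and otherwise launching and joining plain threads on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the library): splits [0, numWorkItems)
// into NumThreads contiguous chunks and calls work(begin, end) per chunk.
template <int NumThreads, typename Work>
void runChunked(std::size_t numWorkItems, Work work)
{
    static_assert(NumThreads > 0, "need at least one thread");
    // Round up so that the chunks cover every work item.
    const std::size_t chunk = (numWorkItems + NumThreads - 1) / NumThreads;

#ifdef _OPENMP
    // Compiled with -fopenmp: let OpenMP hand one chunk to each thread.
    #pragma omp parallel for num_threads(NumThreads)
    for (int t = 0; t < NumThreads; ++t)
    {
        const std::size_t begin = std::min(numWorkItems, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(numWorkItems, begin + chunk);
        if (begin < end) work(begin, end);
    }
#else
    // No OpenMP: launch plain threads and join them before returning.
    std::vector<std::thread> threads;
    threads.reserve(NumThreads);
    for (int t = 0; t < NumThreads; ++t)
    {
        const std::size_t begin = std::min(numWorkItems, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(numWorkItems, begin + chunk);
        if (begin < end) threads.emplace_back([&work, begin, end] { work(begin, end); });
    }
    for (auto& th : threads) th.join();
#endif
}

// Hypothetical usage: doubles every element of input using 8 threads.
// The inner loop is simple enough for the compiler to auto-vectorize.
void doubleAll(const std::vector<float>& input, std::vector<float>& output)
{
    assert(input.size() == output.size());
    runChunked<8>(input.size(), [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) output[i] = 2.0f * input[i];
    });
}
```

Because the chunks are fixed in size and assigned up front, a thread that finishes early sits idle until the others are done. A load balancer would address this by handing out work dynamically.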
