So what kind of code can be parallelised? It is easier to say what kind of code definitely cannot be.

First case: Loops with dependencies

In [2]:
import numpy as np
import numba
import time

In [3]:
n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)

c = np.zeros(n, dtype='float64')

In [8]:
%%time
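@numba.njit
def numba_fun(arr1, arr2):
    # loop-carried dependency: arr2[i] needs arr2[i-1] from the previous iteration
    n = arr1.shape[0]
    for i in range(1, n):
        arr2[i] = arr2[i-1] + arr1[i] ** 2

# the first call also triggers JIT compilation, which is included in the timing
numba_fun(a, c)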

CPU times: user 77 ms, sys: 3.31 ms, total: 80.3 ms
Wall time: 89.5 ms


In [9]:
%%time
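@numba.njit(parallel=True)
def numba_fun(arr1, arr2):
    # same loop as before: a plain range with a loop-carried dependency
    # gives Numba nothing it can turn into parallel work
    n = arr1.shape[0]
    for i in range(1, n):
        arr2[i] = arr2[i-1] + arr1[i] ** 2

numba_fun(a, c)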

CPU times: user 81.2 ms, sys: 4.59 ms, total: 85.8 ms
Wall time: 94.8 ms


The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics for help.

File "<timed exec>", line 1:
<source missing, REPL/exec in use?>
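To see why the iterations of this loop cannot simply be handed out to different threads, here is a minimal NumPy-only sketch. The helper names serial_recurrence and naive_two_chunks are made up for illustration, and the two-chunk split is only a stand-in for what independent workers would do, not how Numba schedules anything.

```python
import numpy as np

def serial_recurrence(arr1):
    """Reference result: iterations run in order, each using the previous one."""
    n = arr1.shape[0]
    out = np.zeros(n)
    for i in range(1, n):
        out[i] = out[i - 1] + arr1[i] ** 2
    return out

def naive_two_chunks(arr1):
    """Hypothetical split: each half is computed as if it were independent."""
    n = arr1.shape[0]
    mid = n // 2
    out = np.zeros(n)
    for start, stop in ((1, mid), (mid, n)):
        # each chunk works on a private buffer, so the second chunk
        # never sees the values the first chunk produced
        local = np.zeros(n)
        for i in range(start, stop):
            local[i] = local[i - 1] + arr1[i] ** 2
        out[start:stop] = local[start:stop]
    return out

data = np.arange(8.0)
expected = serial_recurrence(data)
chunked = naive_two_chunks(data)

print(expected)
print(chunked)
```

The first half agrees, but every element of the second half is missing the running total accumulated by the first half. The second chunk cannot start until the first one has finished, which is exactly what a dependency between iterations means.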



Second case: Data races

In [10]:
@numba.njit(parallel=False)
def prange_right_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in numba.prange(n):
        # with parallel=False, prange behaves like range: iterations run in
        # order, so reading y[0] and overwriting y is deterministic here
        y[:] = y[0] + x[i]

    return y

In [11]:
@numba.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in numba.prange(n):
        # accumulating into the same element of `y` from different
        # parallel iterations of the loop results in a race condition
        y[:] = y[0] + x[i]

    return y

In [12]:
x = np.random.rand(1000,4)
print(prange_right_result(x))
print(prange_wrong_result(x))

[508.01446498 508.19639403 508.05990544 508.21107149]
[2.29270284 2.47463189 2.3381433  2.48930935]
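The mechanism behind the wrong result is a lost update: several iterations read the same old value of y[0] before any of them has written its result back. The sketch below simulates this deterministically in plain Python with two pretend workers and an explicit schedule. The function run_schedule and the schedules are hypothetical. Real thread interleavings in Numba are not under our control and will differ from run to run.

```python
def run_schedule(values, schedule):
    """Simulate workers doing `y = y + v` as a separate read and write step.

    `schedule` is a list of (worker, step) pairs, step is "read" or "write".
    """
    y = 0.0
    seen = {}  # the value of y each worker saw at its read step
    for worker, step in schedule:
        if step == "read":
            seen[worker] = y
        else:
            # the write uses the (possibly stale) value read earlier
            y = seen[worker] + values[worker]
    return y

values = {0: 1.0, 1: 2.0}

# one worker finishes before the other starts: both updates survive
in_order = run_schedule(
    values, [(0, "read"), (0, "write"), (1, "read"), (1, "write")]
)

# both workers read before either writes: worker 0's update is overwritten
interleaved = run_schedule(
    values, [(0, "read"), (1, "read"), (0, "write"), (1, "write")]
)

print(in_order)     # 3.0
print(interleaved)  # 2.0
```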


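Reductions are an exception to this rule. Numba automatically recognizes and supports a number of NumPy operations and reduction patterns inside a prange loop, such as accumulating into a scalar or into a whole array with +=.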

In [16]:
@numba.njit(parallel=False)
def reduction_in_serial(x):
    n = x.shape[0]
    y = 0.0
    for i in numba.prange(n):
        y += x[i]
    return y

In [17]:
@numba.njit(parallel=True)
def reduction_in_parallel(x):
    n = x.shape[0]
    y = 0.0
    for i in numba.prange(n):
        y += x[i]
    return y

In [18]:
x = np.arange(1000)
print(reduction_in_serial(x))
print(reduction_in_parallel(x))

499500.0
499500.0
