# Parallel functionality of Numba

 - Elwin van 't Wout
 - Pontificia Universidad Católica de Chile
 - IMT3870
 - 26-8-2024

Sum the values of a vector and compare the timings of serial and parallelised versions.

In [1]:
import numpy as np
from numba import njit, prange

In [2]:
def sum_vector_python(a):
    s = 0
    for i in range(a.size):
        s += a[i]
    return s   

In [3]:
def sum_vector_numpy(a):
    s = np.sum(a)
    return s   

We can use Numba to optimise the Python code through its just-in-time (JIT) compilation. Moreover, Numba can automatically parallelise code with multi-threading. As a serial baseline, first disable this feature with the option ```parallel=False```, which is also the default.

In [4]:
@njit(parallel=False)
def sum_vector_numba_serial(a):
    s = 0
    for i in range(a.size):
        s += a[i]
    return s   

Adding the parallel option to the Numba decorator makes Numba search for parts of the code that can be parallelised. Add the option ```parallel=True``` for automatic parallelisation. In earlier versions, this only works when ```nopython=True```, which the ```njit``` decorator already implies.

In [5]:
@njit(parallel=True)
def sum_vector_numba_parallel(a):
    s = 0
    for i in range(a.size):
        s += a[i]
    return s   

Instead of letting Numba search for parallelisation opportunities, you can also explicitly state that a for loop needs to be parallelised. Use the function ```prange()``` instead of the standard ```range()``` in the for loop. In this case, Numba automatically recognises the update of the shared variable ```s``` as a reduction and handles it so that no race condition occurs.

In [6]:
@njit(parallel=True)
def sum_vector_numba_prange(a):
    s = 0
    for i in prange(a.size):
        s += a[i]
    return s   
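
The reduction on `s` can be pictured as follows: each thread accumulates a private partial sum over its own chunk of indices, and the partial sums are combined once at the end, so no two threads update the same variable. The sketch below mimics this idea serially in plain NumPy. The function name `chunked_sum` and the number of chunks are hypothetical, and the code illustrates the concept rather than Numba's actual internal scheduling.

```python
import numpy as np

def chunked_sum(a, n_chunks=4):
    """Sum `a` by combining per-chunk partial sums (conceptual sketch)."""
    # Split the array into contiguous chunks, one per hypothetical thread.
    chunks = np.array_split(a, n_chunks)
    # Each "thread" accumulates into its own private partial sum.
    partial_sums = [int(np.sum(chunk, dtype=np.int64)) for chunk in chunks]
    # The partial sums are combined in a single final step.
    return sum(partial_sums)

a = np.arange(1000)
total = chunked_sum(a)
print(total == int(a.sum(dtype=np.int64)))
```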

Let us create a vector with elements $0,1,2,\dots,n-1$ and calculate the sum.

In [7]:
n = int(1e7)
vec = np.arange(n)
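
Because the vector holds the integers $0$ to $n-1$, the expected sum is known in closed form, $n(n-1)/2$, which gives an independent check of the results. A small sketch, with the names `expected` and `total` introduced here for illustration; the accumulator type is set to 64-bit explicitly because the default integer type of `np.arange` depends on the platform and the NumPy version:

```python
import numpy as np

n = int(1e7)
vec = np.arange(n)

# Closed-form sum of 0, 1, ..., n-1 using exact Python integers.
expected = n * (n - 1) // 2

# Force a 64-bit accumulator so the result cannot overflow a 32-bit default.
total = int(np.sum(vec, dtype=np.int64))

print(expected, total, expected == total)
```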

Before performing the timings, call the Numba functions once, so that they are compiled.

In [8]:
print("Sum of vector with serial Numba:", sum_vector_numba_serial(vec))
print("Sum of vector with parallel Numba:", sum_vector_numba_parallel(vec))
print("Sum of vector with prange Numba:", sum_vector_numba_prange(vec))

Sum of vector with serial Numba: 49999995000000
Sum of vector with parallel Numba: 49999995000000


The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics for help.
File "..\..\..\..\..\..\..\AppData\Local\Temp\ipykernel_13076\1770870890.py", line 1:
<source missing, REPL/exec in use?>


Sum of vector with prange Numba: 49999995000000


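Numba gives a warning when it cannot perform the requested optimisation of the code. Here, the function that combines ```parallel=True``` with a standard ```range()``` loop was compiled without any parallel transformation, whereas the ```prange()``` version raised no warning.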

In [9]:
%%timeit
sum_vector_python(vec)


843 ms ± 35.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [10]:
%%timeit
sum_vector_numpy(vec)

3.64 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [11]:
%%timeit
sum_vector_numba_serial(vec)

2.6 ms ± 73.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [12]:
%%timeit
sum_vector_numba_parallel(vec)

2.59 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [13]:
%%timeit
sum_vector_numba_prange(vec)

760 µs ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
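
The timings above can be condensed into speed-up factors relative to the pure Python loop. The sketch below only does arithmetic on the mean values reported by `%%timeit` in this run; the dictionary names are introduced for illustration, and the numbers will differ on other machines.

```python
# Mean timings reported above, converted to seconds.
timings = {
    "python": 843e-3,
    "numpy": 3.64e-3,
    "numba_serial": 2.6e-3,
    "numba_parallel": 2.59e-3,
    "numba_prange": 760e-6,
}

# Speed-up of each version relative to the pure Python loop.
speedup = {name: timings["python"] / t for name, t in timings.items()}

for name, factor in speedup.items():
    print(f"{name:>15}: {factor:8.1f}x relative to pure Python")

# Gain of the explicit prange loop over the serial Numba version.
prange_gain = timings["numba_serial"] / timings["numba_prange"]
print(f"prange vs serial Numba: {prange_gain:.2f}x")
```

In this run the ```prange()``` version is roughly 3.4 times faster than the serial Numba version. A plain summation does little arithmetic per element, so memory bandwidth and threading overhead likely limit the gain.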


The number of threads available to Numba is stored in variables of the ```numba.config``` module.

In [14]:
from numba import config
print("The number of available CPUs detected by Numba is:", config.NUMBA_DEFAULT_NUM_THREADS)
print("The number of threads used by Numba is:", config.NUMBA_NUM_THREADS)

The number of available CPUs detected by Numba is: 8
The number of threads used by Numba is: 8


The number of threads used by Numba can be changed at runtime with ```set_num_threads()```.

In [15]:
from numba import set_num_threads, get_num_threads
set_num_threads(2)
print("The current number of threads used by Numba is:", get_num_threads())

The current number of threads used by Numba is: 2
