# Working with Numba

This notebook provides some examples of how to work with **Numba** and compares the resulting speed-up with C++.

From the **consav** package we will use the **runtools** module to control the behavior of **Numba**.

**Links:**

- [Supported Python features](https://numba.pydata.org/numba-doc/dev/reference/pysupported.html)
- [Supported Numpy features](https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html)

# Decorating Python functions

Imports and numba settings:

In [1]:
import time
import numpy as np

from consav import runtools
runtools.write_numba_config(threads=8,threading_layer='tbb')
import numba as nb # must be imported after write_numba_config!
#nb.config.__dict__ # see all configuration

## Functions

In [2]:
def test_standard(X,Y,Z,NX,NY):

    # X is length NX
    # Y is length NY
    # Z is length NX

    # plain Python (not decorated): outside jitted code nb.prange behaves like range
    for i in nb.prange(NX):
        for j in range(NY):
            Z[i] += np.exp(np.log(X[i]*Y[j]))/(X[i]*Y[j])-1
            
@nb.njit(parallel=True)
def test(X,Y,Z,NX,NY):
    for i in nb.prange(NX):
        for j in range(NY):
            Z[i] += np.exp(np.log(X[i]*Y[j]))/(X[i]*Y[j])-1

@nb.njit(parallel=True,fastmath=True)
def test_fast(X,Y,Z,NX,NY):
    for i in nb.prange(NX):
        for j in range(NY):
            Z[i] += np.exp(np.log(X[i]*Y[j]))/(X[i]*Y[j])-1

## Settings

Choose settings and make random draws:

In [3]:
# a. settings
NX = 100
NY = 20000

# b. random draws
np.random.seed(1998)
X = np.random.sample(NX)
Y = np.random.sample(NY)
Z = np.zeros(NX)

## Examples

In [4]:
# a. pure Python (no JIT compilation)
tic = time.time()
test_standard(X,Y,Z,NX,NY)
toc = time.time()
print(f'python {np.sum(Z):.8f} in {toc-tic:.1f} secs')

# b. numba
Z = np.zeros(NX)
tic = time.time()
test(X,Y,Z,NX,NY)
toc = time.time()
print(f'numba {np.sum(Z):.8f} in {toc-tic:.1f} secs')

# c. numba with fastmath
Z = np.zeros(NX)
tic = time.time()
test_fast(X,Y,Z,NX,NY)
toc = time.time()
print(f'numba (fastmath=true) {np.sum(Z):.8f} in {toc-tic:.1f} secs')

python 0.00000000 in 6.5 secs
numba 0.00000000 in 0.4 secs
numba (fastmath=true) 0.00000000 in 0.2 secs


# Test parallelization in Numba and C++

Compile C++ function for comparison:

In [5]:
from consav import cpptools
cpptools.compile('test_numba',compiler='vs',dllfilename='test_numba_vs')
cpptools.compile('test_numba',compiler='intel',dllfilename='test_numba_intel')

cpp files compiled
cpp files compiled


Run the tests with different numbers of threads:

In [6]:
for threads in [8,4,2,1]:

    print(f'threads = {threads}')

    print(f' threading_layer = tbb')
    runtools.write_numba_config(threads=threads,threading_layer='tbb')
    !python test_numba.py

    print(f' threading_layer = omp')
    runtools.write_numba_config(threads=threads,threading_layer='omp')
    !python test_numba.py

    print('')

threads = 8
 threading_layer = tbb
  test 0.00000000 in 0.8 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
 threading_layer = omp
  test 0.00000000 in 0.9 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
  test (cpp, vs) 0.00000000 in 0.8 secs
  test (cpp, intel) 0.00000000 in 0.9 secs

threads = 4
 threading_layer = tbb
  test 0.00000000 in 1.7 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
 threading_layer = omp
  test 0.00000000 in 1.7 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
  test (cpp, vs) 0.00000000 in 1.7 secs
  test (cpp, intel) 0.00000000 in 1.5 secs

threads = 2
 threading_layer = tbb
  test 0.00000000 in 3.3 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
 threading_layer = omp
  test 0.00000000 in 3.3 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
  test (cpp, vs) 0.00000000 in 3.3 secs
  test (cpp, intel) 0.00000000 in 3.0 secs

threads = 1
 threading_layer = tbb
  test 0.00000000 in 6.5 secs
  test (fastmath=true) 0.00000000 in 0.0 secs
 thread