### Using `numba.jit` to speed up the computation of the Cityblock distance matrix


In this notebook we implement a function to compute the Cityblock distance matrix using Numba's *just-in-time* compilation decorator. We compare its performance to that of the corresponding non-decorated function.

We will use two Numba features here: the decorator `@numba.jit` and `numba.prange`.

<a href="https://colab.research.google.com/github/Ziaeemehr/workshop_hpcpy/blob/main/notebooks/numba/cityblock-distance-matrix-numba.jit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


In [1]:
import numpy as np
import numba

The Manhattan (or city block) distance between two vectors is defined as the sum of the absolute differences between their corresponding components. For row $i$ of $x$ and row $j$ of $y$, each with $n$ components:
$$
d(x_i, y_j) = \sum_{k=1}^{n} |x_{ik} - y_{jk}|
$$
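As a small worked instance of this definition, the sketch below computes the distance matrix for two tiny hand-picked arrays using NumPy broadcasting. The arrays and the name `dist` are illustrative only and are not part of the benchmark that follows.

```python
import numpy as np

# Two rows in x, one row in y, three components each (illustrative values).
x = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
y = np.array([[4.0, 0.0, 3.0]])

# Broadcast to shape (2, 1, 3), take absolute differences, sum over the last axis.
dist = np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)

# dist[0, 0] = |1-4| + |2-0| + |3-3| = 5
# dist[1, 0] = |0-4| + |0-0| + |0-3| = 7
print(dist)
```

The intermediate array has shape `(rows of x, rows of y, components)`, so this approach trades memory for the absence of explicit loops.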



In [2]:
numba.set_num_threads(4)

In [4]:
def cityblock_python(x, y):
    """Naive python implementation."""

    nrow, ncol = x.shape
    dist_matrix = np.empty((nrow, nrow))
    for i in range(nrow):
        for j in range(nrow):
            r = 0.0
            for k in range(ncol):
                r += np.abs(x[i][k] - y[j][k])
            dist_matrix[i][j] = r

    return dist_matrix


cityblock_numba0 = numba.jit(nopython=True)(cityblock_python)


@numba.jit(nopython=True, parallel=True)
def cityblock_numba1(x, y):
    """Implementation with numba."""

    nrow, ncol = x.shape
    dist_matrix = np.empty((nrow, nrow))
    for i in range(nrow):
        for j in range(nrow):
            r = 0.0
            for k in numba.prange(ncol):
                r += np.abs(x[i][k] - y[j][k])
            dist_matrix[i][j] = r

    return dist_matrix


@numba.jit(nopython=True, parallel=True)
def cityblock_numba2(x, y):
    """Implementation with numba and numpy."""

    nrow, ncol = x.shape
    dist_matrix = np.empty((nrow, nrow))
    for i in range(nrow):
        for j in numba.prange(nrow):
            dist_matrix[i][j] = np.linalg.norm(x[i] - y[j], 1)

    return dist_matrix

### Note
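Observe that in `cityblock_numba1` the innermost loop over `k`, which is a reduction into `r`, is written with `numba.prange`. With `parallel=True`, `numba.prange` automatically takes care of data privatization and reductions. In `cityblock_numba2`, `numba.prange` is applied to the loop over `j` instead.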

In [5]:
# Let's check that they all give the same result
rng = np.random.default_rng()
x = 10. * rng.random((100, 10))

print(np.allclose(cityblock_python(x, x), cityblock_numba0(x, x)))
print(np.allclose(cityblock_python(x, x), cityblock_numba1(x, x)))
print(np.allclose(cityblock_python(x, x), cityblock_numba2(x, x)))

True
True
True


In [6]:
nrow = 200
ncol = 25

x = 10. * rng.random((nrow, ncol))

%timeit cityblock_python(x, x)
%timeit cityblock_numba0(x, x)
%timeit cityblock_numba1(x, x)
%timeit cityblock_numba2(x, x)

1.62 s ± 71.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
853 μs ± 4.21 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
83.8 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.18 ms ± 32.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


### Exercise 1
How do you explain the difference in execution times?

## Conclusions

When an implementation with NumPy vectorized operations is not possible, Numba is worth trying. It offers a significant improvement in performance compared to pure Python, especially in situations where loops are unavoidable.

As we have seen, the speedup doesn't come completely for free: the way the Python function is implemented is crucial to obtaining good performance from Numba. Consider different implementations with and without NumPy operations and measure their execution time.

#### Synchronizing Numba and NumPy RNG States for Consistent Behavior

The Numba and NumPy random number generator (RNG) states are completely separate. As a result, calling `np.random.seed()` from regular Python code only affects the NumPy RNG seed. To seed Numba's RNG, `np.random.seed()` must be called within a JIT-compiled region. In the example below the seed is set through a helper that runs in JIT mode when called from a JIT function, so the JIT-compiled function and its pure-Python counterpart (`py_func`) should produce the same chain of random numbers.


In [12]:
import numpy as np 
from numba import jit 
from numba.extending import register_jitable

# This will run in JIT mode only if called from a JIT function
@register_jitable
def set_seed_compact(x):
    np.random.seed(x)
    
@jit(nopython=True)
def get_random():
    set_seed_compact(42)
    print(np.random.rand(3))
    
def get_random2():
    print(np.random.rand(3))
    
get_random()
get_random.py_func()
get_random2()

[0.37454012 0.95071431 0.73199394]
[0.37454012 0.95071431 0.73199394]
[0.59865848 0.15601864 0.15599452]
