### Numba-Dask Example

Code to demonstrate the use of Numba and Dask functions, and to show that combining the two doesn't guarantee better performance.
Inspired by: https://medium.com/capital-one-tech/dask-numba-for-efficient-in-memory-model-scoring-dfc9b68ba6ce

In [31]:
import numpy as np

# Sample function involving array computation
def predict_over_time(x, y, z, overlay=False):
    "Predicts a quantity at times = 0, 1, ... 14"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out
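For reference, the loop above fills `out[i, t] = t * x[i]**2 + y[i] - 2 * z[i] - 2 * t`, one column per time step. Below is a minimal sketch of the same table computed with NumPy broadcasting and no Python loop. `predict_over_time_broadcast` is a hypothetical name, and this variant is not part of the original comparison.

```python
import numpy as np

def predict_over_time(x, y, z, overlay=False):
    "Loop version from the cell above: one column per time step"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out

def predict_over_time_broadcast(x, y, z, overlay=False):
    "Hypothetical broadcasting variant: same result, no Python loop"
    t = np.arange(15)  # shape (15,)
    # x[:, None] has shape (n, 1), so every term broadcasts to (n, 15):
    # row i corresponds to input i, column t to time step t
    out = t * x[:, None] ** 2 + y[:, None] - 2 * z[:, None] - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out

x = np.random.poisson(lam=5, size=1000)
y, z = np.random.normal(size=(1000, 2)).T
print(np.allclose(predict_over_time(x, y, z), predict_over_time_broadcast(x, y, z)))  # True
```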

In [33]:
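# Basic Numba decorator: @jit
from numba import jit
import numpy as np

# @jit compiles lazily: the function is compiled to machine code on its first call,
# and later calls with the same argument types reuse the compiled version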
@jit
def jitted_func(x, y, z, overlay=False):
    "Predicts a quantity at times = 0, 1, ... 14"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out

In [34]:
# Create some artificial inputs
n = 25000
u = np.random.random(n)
x = np.random.poisson(lam=5, size=n)
y, z = np.random.normal(size=(n, 2)).T

# Original (pure NumPy) function
%timeit predict_over_time(x, y, z)

8.39 ms ± 486 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
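`%timeit` is an IPython magic, so it only works inside a notebook or an IPython shell. Outside one, the standard-library `timeit.repeat` gives a comparable measurement. The sketch below approximates the `mean ± std. dev.` summary. It does not reproduce IPython's automatic choice of loop count, and 10 loops per run is an arbitrary choice here.

```python
import timeit
import numpy as np

def predict_over_time(x, y, z, overlay=False):
    "Predicts a quantity at times = 0, 1, ... 14"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out

n = 25000
x = np.random.poisson(lam=5, size=n)
y, z = np.random.normal(size=(n, 2)).T

number = 10  # calls per run (arbitrary)
# timeit.repeat returns the total time of each run, in seconds
runs = timeit.repeat(lambda: predict_over_time(x, y, z), repeat=7, number=number)
per_call = np.array(runs) / number
print(f"{per_call.mean() * 1e3:.2f} ms ± {per_call.std() * 1e6:.0f} µs per loop "
      f"(mean ± std. dev. of {len(runs)} runs, {number} loops each)")
```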


In [35]:
# @jit-compiled function
%timeit jitted_func(x, y, z)

5.97 ms ± 287 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


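* @jit reduces the time per call by a factor of about 1.4 (8.39 ms vs 5.97 ms). Keep in mind that the first call to a @jit function also includes the one-off compilation cost.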

In [18]:
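# @guvectorize with explicit type signatures compiles eagerly, when the decorator runs
from numba import guvectorize

# Same computation as a generalized ufunc kernel: x, y, z and overlay are scalars,
# the dummy input `_` (core shape (s)) only fixes the output length, and `out` is filled in place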
@guvectorize('i8, f8, f8, b1, f8[:], f8[:]',
             '(), (), (), (), (s) -> (s)')
def fast_predict_over_time(x, y, z, overlay, _, out):
    adj = 1.5 if overlay else 1.0
    for t in range(len(out)):
        out[t] = adj * (t * x ** 2 + y - 2 * z - 2 * t)
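The layout string `'(), (), (), (), (s) -> (s)'` uses NumPy's generalized-ufunc signature syntax. The first four inputs are scalars per element, the dummy input `_` has a core dimension `s`, and the output shares that length. Any leading dimensions (here the `n` rows of `res`) are broadcast and looped over. `np.vectorize` accepts the same signature syntax, so the pure-Python sketch below reproduces the same shape behaviour. It is much slower and only illustrates shapes, and `predict_row` is a hypothetical name.

```python
import numpy as np

def predict_row(x, y, z, overlay, template):
    # Python version of the kernel: scalar x, y, z, overlay;
    # `template` has core shape (s) and only fixes the output length
    adj = 1.5 if overlay else 1.0
    t = np.arange(len(template))
    return adj * (t * x ** 2 + y - 2 * z - 2 * t)

vec_predict = np.vectorize(predict_row, signature="(),(),(),(),(s)->(s)")

n = 5
x = np.arange(n)
y = np.ones(n)
z = np.zeros(n)
template = np.zeros((n, 15))  # plays the role of `res`: loop dim n, core dim s = 15

out = vec_predict(x, y, z, False, template)
print(out.shape)  # (5, 15)
```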

In [37]:
res = np.zeros((n, 15))
%timeit fast_predict_over_time(x, y, z, False, res)

36.6 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


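* @guvectorize cuts the execution time by a factor of roughly 230 compared with the pure NumPy version (8.39 ms vs 36.6 µs), and by roughly 160 compared with @jit.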
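The speedup factors follow directly from the mean times reported in the outputs above. A quick check:

```python
# Mean time per call in seconds, copied from the %timeit outputs above
timings = {
    "NumPy loop": 8.39e-3,
    "@jit": 5.97e-3,
    "@guvectorize": 36.6e-6,
}

baseline = timings["NumPy loop"]
speedup = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedup.items():
    print(f"{name:>13}: {factor:6.1f}x faster than the NumPy loop")
print(f"@guvectorize vs @jit: {timings['@jit'] / timings['@guvectorize']:.0f}x")
```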

In [38]:
# Now try Dask: wrap the @guvectorize function with dask.delayed
from dask import delayed


# Calling the wrapper only builds a task; nothing runs until we call .compute()
delayed_fast_predict = delayed(fast_predict_over_time)

# Reuse the same NumPy arrays from above

%timeit delayed_fast_predict(x, y, z, False, res).compute()

1.29 ms ± 31.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


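* Execution time increases (36.6 µs to 1.29 ms). A single delayed call gives Dask nothing to parallelise, so the only effect is the overhead of building and scheduling the task graph on top of a kernel that @guvectorize already makes very fast.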
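Dask can only help when there is independent work to run concurrently. The following is a minimal sketch, assuming Dask and NumPy are installed, that splits the rows into chunks, builds one delayed task per chunk, and evaluates them together with `dask.compute`. It uses the plain NumPy function so that it does not also require Numba. The chunk count of 4 and the threaded scheduler are arbitrary choices. Whether this beats a single call depends on the array size and on how much of the work releases the GIL; for arrays as small as these, scheduling overhead may still dominate.

```python
import numpy as np
import dask
from dask import delayed

def predict_over_time(x, y, z, overlay=False):
    "Predicts a quantity at times = 0, 1, ... 14"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out

n = 25000
x = np.random.poisson(lam=5, size=n)
y, z = np.random.normal(size=(n, 2)).T

n_chunks = 4  # arbitrary; would normally be tuned to the number of cores
bounds = np.linspace(0, n, n_chunks + 1, dtype=int)

# One independent task per block of rows; nothing runs yet
tasks = [delayed(predict_over_time)(x[a:b], y[a:b], z[a:b])
         for a, b in zip(bounds[:-1], bounds[1:])]

# Evaluate all tasks in one call so the scheduler can run them concurrently
parts = dask.compute(*tasks, scheduler="threads")
result = np.vstack(parts)
```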

In [40]:
# Maybe Dask can improve the @jit version of the function instead
# Again, nothing is evaluated until we call .compute()
delayed_jitted_func = delayed(jitted_func)

# Reuse the same NumPy arrays from above

%timeit delayed_jitted_func(x, y, z).compute()

4.64 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


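* The Dask-wrapped call measured faster than the plain @jit call (4.64 ms vs 5.97 ms). However, a single delayed task still runs jitted_func once, serially, with scheduling overhead on top, so this gap more plausibly reflects run-to-run timing variation than a genuine speedup from Dask.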

##### Thus, choosing the right Numba/Dask decorator for the workload is essential to performance.