## Easy efficiency with `numba` and `dask`

This is a toy notebook with code taken directly from [this blog post](https://medium.com/capital-one-developers/dask-numba-for-efficient-in-memory-model-scoring-dfc9b68ba6ce) on Medium. The idea is to combine the function decorators provided by the `numba` and `dask` libraries to speed up computation non-invasively, that is, with only small changes to the original code.

```python
import numpy as np
import pandas as pd
from numba import jit, guvectorize
from dask import delayed
```

```python
# simple toy function for a simple model
def predict_over_time(x, y, z, overlay=False):
    "Predicts a quantity at times = 0, 1, ... 14"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out
```
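For reference, every implementation in this notebook computes the same quantity. Writing $a$ for the overlay factor, the entry for row $i$ and time step $t$ is:

$$
\text{out}_{i,t} = a\left(t\,x_i^{2} + y_i - 2z_i - 2t\right), \qquad t = 0, 1, \dots, 14,
$$

where $a = 1.5$ if `overlay` is true and $a = 1$ otherwise. Because $a$ multiplies the whole bracket, it can be applied to the finished array (as in the first two versions) or to each element as it is computed (as in the `@guvectorize` version) with the same result.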

```python
@jit
def jitted_predict_over_time(x, y, z, overlay=False):
    "Predicts a quantity at times = 0, 1, ... 14"
    out = np.zeros((x.shape[0], 15))
    for t in range(15):
        out[:, t] = t * x ** 2 + y - 2 * z - 2 * t
    adj = 1.5 if overlay else 1.0
    return adj * out
```

```python
# simulate toy data: create some artificial inputs
n = 25000
u = np.random.random(n)
x = np.random.poisson(lam=5, size=n)
y, z = np.random.normal(size=(n, 2)).T

# data dict
data_dict = {'u': u,
             'x': x,
             'y': y,
             'z': z}

# DataFrame
data = pd.DataFrame(data=data_dict)
data.head()
```

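|   | u        | x | y         | z         |
|---|----------|---|-----------|-----------|
| 0 | 0.112347 | 8 | -0.433749 | 0.388892  |
| 1 | 0.802518 | 5 | 0.224678  | -0.331499 |
| 2 | 0.098574 | 6 | -0.334008 | -0.079211 |
| 3 | 0.560828 | 6 | 0.080884  | -1.283361 |
| 4 | 0.72709  | 5 | 1.071648  | -0.306963 |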


```python
%%timeit -n 100
out_normal = predict_over_time(x, y, z)  # previously reported: 100 loops, best of 3: 3.28 ms per loop
```

9.46 ms ± 422 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


```python
%%timeit -n 100
out_jitted = jitted_predict_over_time(x, y, z)  # previously reported: 100 loops, best of 3: 2.27 ms per loop
```

6.55 ms ± 2.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


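Using the `@jit` decorator from `numba` reduces the mean runtime here from about 9.5 ms to 6.6 ms with no change to the function body. `@jit` compiles lazily: on the first call it infers the types of the arguments, compiles a specialized version of the function for those types, and reuses it on later calls with the same types. If this cell made the first call, the compilation time is part of the measurement, which would explain the large spread (± 2.7 ms).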

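```python
# types: x int64, y and z float64, overlay bool, then two float64 rows;
# layout: four scalars and one length-s row in, one length-s row out
@guvectorize('i8, f8, f8, b1, f8[:], f8[:]',
             '(), (), (), (), (s) -> (s)')
def fast_predict_over_time(x, y, z, overlay, _, out):
    adj = 1.5 if overlay else 1.0
    for t in range(len(out)):
        out[t] = adj * (t * x ** 2 + y - 2 * z - 2 * t)

# shape-only input: its last axis sets s = 15 time steps; its values are never read
res = np.zeros((n, 15))
```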

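* Note that `@guvectorize` requires us to declare the types of all arguments up front (the first string) together with a layout string describing the core dimensions of the inputs and the output (the second string); `(s) -> (s)` says that each output row has the same length `s` as the fifth argument.

* The decorated function is written for a single row of the inputs, and the resulting generalized ufunc applies it to every row in compiled code (conceptually like calling `apply(OBJ, 1, FUN)` in `R`).

* Because of how `@guvectorize` compiles the function, parts of it must be rewritten (compare with the versions above): each call now fills one output row, and the `adj` factor is applied inside the loop. Writing the function still feels natural, since we only need to think about how the computation proceeds for a single row.

* Additionally, breaking with standard Pythonic style, the function has no explicit `return` statement: results are written into the `out` argument. Since the output length `s` cannot be inferred from the scalar inputs, an array of the right shape (`res` here) must also be pre-allocated and passed in; it arrives as the otherwise unused argument `_`, and the returned output array is allocated automatically.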

```python
%%timeit -n 100
_ = fast_predict_over_time(x, y, z, False, res)  # previously reported: 100 loops, best of 3: 575 µs per loop
```

1.2 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## `dask` applied to `numba`

`dask` is a library for building task graphs whose evaluation is deferred until `.compute()` is called. It is widely used for parallel and out-of-core analysis, i.e. computation on data too large to fit in memory.
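A minimal illustration of deferred evaluation, using two hypothetical functions `load` and `combine` that record when they actually run. Building the graph executes nothing; `.compute()` runs the two `load` tasks and then `combine`. The synchronous scheduler is chosen here only to keep the example deterministic.

```python
from dask import delayed

calls = []  # records which task bodies have actually executed


def load(i):
    calls.append(("load", i))
    return i * 10


def combine(a, b):
    calls.append(("combine", a, b))
    return a + b


# Building the graph only records tasks; no function body runs yet.
a = delayed(load)(1)
b = delayed(load)(2)
total = delayed(combine)(a, b)
assert calls == []

# .compute() walks the graph: both loads first, then combine on their results.
result = total.compute(scheduler="synchronous")
print(result)  # 30
```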

```python
# won't be evaluated until we call .compute()
# (this rebinds the name: calls now build a task instead of running the gufunc)
fast_predict_over_time = delayed(fast_predict_over_time)
```

```python
%%timeit -n 100
# using the same numpy arrays from above...
_ = fast_predict_over_time(x, y, z, False, res).compute()
# previously reported: 100 loops, best of 3: 1.04 ms per loop
```

2.04 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
