## Numba - an overview

Numba is a just-in-time compiler for Python that works best on code built from NumPy arrays, NumPy functions, and loops.
ArviZ includes Numba as an optional dependency, and **utils.py** provides a number of helper functions for systems on which Numba is installed. On those systems, Numba can also be disabled and re-enabled at runtime.
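
The optional-dependency pattern described above can be sketched roughly as follows. This is a minimal illustration, not ArviZ's actual implementation: the names `JitSwitch` and `optional_jit` are hypothetical, and the sketch assumes that falling back to the plain Python function is acceptable whenever Numba is missing or switched off.

```python
import functools
import importlib
import importlib.util


class JitSwitch:
    """Hypothetical flag holder for turning JIT compilation on and off."""

    # True only when the numba package can be found on this system.
    enabled = importlib.util.find_spec("numba") is not None

    @classmethod
    def disable(cls):
        cls.enabled = False

    @classmethod
    def enable(cls):
        # Re-enable only if numba is actually installed.
        cls.enabled = importlib.util.find_spec("numba") is not None


def optional_jit(func):
    """Dispatch to a JIT-compiled version of func when the switch is on."""
    compiled = None  # built lazily on the first call with the switch on

    @functools.wraps(func)
    def wrapper(*args):
        nonlocal compiled
        if not JitSwitch.enabled:
            return func(*args)  # plain Python fallback
        if compiled is None:
            numba = importlib.import_module("numba")
            compiled = numba.jit(nopython=True)(func)
        return compiled(*args)

    return wrapper


@optional_jit
def add(a, b):
    return a + b


JitSwitch.disable()
print(add(2, 3))  # runs the plain Python function
```

Checking the flag at call time, rather than at decoration time, is a design choice of this sketch: it lets the switch take effect on functions that were decorated before it was flipped.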

### A simple example to display the effectiveness of Numba

In [1]:
import arviz as az
import numpy as np
import timeit

from arviz.utils import conditional_jit, Numba
from arviz.stats import geweke
from arviz.stats.diagnostics import ks_summary

In [2]:
data = np.random.randn(1000000)

In [3]:
def variance(data, ddof=0):  # Method to calculate variance without using numba
    a_a, b_b = 0, 0
    for i in data:
        a_a = a_a + i
        b_b = b_b + i * i
    var = b_b / (len(data)) - ((a_a / (len(data))) ** 2)
    var = var * (len(data) / (len(data) - ddof))
    return var
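
The loop above relies on the single-pass identity `Var(x) = E[x^2] - (E[x])^2`, rescaled by `n / (n - ddof)`. As a sanity check, the same algorithm can be compared against the standard library's `statistics` module. This is only a sketch: `one_pass_variance` is a hypothetical name, and the short sample list is made up for illustration rather than taken from the notebook.

```python
import math
import statistics


def one_pass_variance(data, ddof=0):
    """Single-pass variance: mean of squares minus square of the mean."""
    total, total_sq = 0.0, 0.0
    for x in data:
        total += x
        total_sq += x * x
    n = len(data)
    var = total_sq / n - (total / n) ** 2
    # Rescale from the population variance to the requested ddof.
    return var * n / (n - ddof)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# ddof=0 corresponds to the population variance, ddof=1 to the sample variance.
assert math.isclose(one_pass_variance(sample), statistics.pvariance(sample))
assert math.isclose(one_pass_variance(sample, ddof=1), statistics.variance(sample))
```

This formula can lose precision when the mean is large compared with the spread of the data, which is worth keeping in mind before reusing it outside a benchmark.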

In [4]:
%timeit variance(data, ddof=1)

327 ms ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
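
`%timeit` is an IPython magic, so it is not available in a plain Python script. A rough equivalent using the standard-library `timeit` module is sketched below. The `sum_of_squares` function, the input size, and the repeat counts are placeholders chosen for illustration, not values from this notebook.

```python
import statistics
import timeit


def sum_of_squares(values):
    """Placeholder workload: a plain Python loop over the input."""
    total = 0.0
    for v in values:
        total += v * v
    return total


values = [float(i) for i in range(10_000)]

# 7 runs of 10 calls each, mirroring the "7 runs, N loops each" report format.
loops = 10
timings = timeit.repeat(lambda: sum_of_squares(values), repeat=7, number=loops)
per_call = [t / loops for t in timings]

mean_ms = statistics.mean(per_call) * 1e3
std_ms = statistics.stdev(per_call) * 1e3
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per loop (7 runs, {loops} loops each)")
```

When timing a JIT-compiled function this way, calling it once beforehand keeps the one-off compilation cost out of the measurement.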


In [5]:
@conditional_jit
def variance_jit(data, ddof=0):  # Calculating variance with numba
    a_a, b_b = 0, 0
    for i in data:
        a_a = a_a + i
        b_b = b_b + i * i
    var = b_b / (len(data)) - ((a_a / (len(data))) ** 2)
    var = var * (len(data) / (len(data) - ddof))
    return var

In [6]:
%timeit variance_jit(data, ddof=1)

1.23 ms ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


**That is roughly 270 times faster (327 ms versus 1.23 ms). Let's compare this to NumPy.**

In [8]:
%timeit np.var(data, ddof=1)

2.13 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


**In certain scenarios, such as this one, Numba outperforms NumPy (1.23 ms versus 2.13 ms).** **Let's see Numba's effect on a few ArviZ functions.**

In [9]:
Numba.disable_numba()  # This disables numba
Numba.numba_flag

False

In [10]:
data = np.random.randn(1000000)
smaller_data = np.random.randn(1000)

In [11]:
%timeit geweke(data)

14.5 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [12]:
%timeit geweke(smaller_data)

908 µs ± 25.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [13]:
Numba.enable_numba()  # This will re-enable numba
Numba.numba_flag  # This indicates the status of Numba

True

In [14]:
%timeit geweke(data)

12.5 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [15]:
%timeit geweke(smaller_data)

470 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [16]:
Numba.enable_numba()
Numba.numba_flag

True

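**For the smaller array, Numba speeds up `geweke` by roughly a factor of two (908 µs versus 470 µs); for the larger array the gain is modest (14.5 ms versus 12.5 ms). Let's check another function.**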

In [17]:
summary_data = np.random.randn(1000, 100, 10)
school = az.load_arviz_data("centered_eight").posterior["mu"].values

In [18]:
Numba.disable_numba()
Numba.numba_flag

False

In [19]:
%timeit ks_summary(summary_data)

52.6 ms ± 765 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [20]:
%timeit ks_summary(school)

1.06 ms ± 89.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [21]:
Numba.enable_numba()
Numba.numba_flag

True

In [22]:
%timeit ks_summary(summary_data)

8.51 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [23]:
%timeit ks_summary(school)

1.18 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


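**For the larger array, Numba provides a substantial speedup once again (52.6 ms versus 8.51 ms, roughly six times faster). For the small `school` array the two timings are about the same (1.06 ms versus 1.18 ms).**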
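
To put the measurements side by side, the speedup ratios can be computed directly from the mean timings printed in the cell outputs above. The snippet below is plain arithmetic on those reported numbers; the dictionary layout and labels are just one convenient way to organise them.

```python
# Mean timings copied from the cell outputs above, in milliseconds:
# (without Numba, with Numba).
timings_ms = {
    "variance loop": (327.0, 1.23),
    "geweke(data)": (14.5, 12.5),
    "geweke(smaller_data)": (0.908, 0.470),
    "ks_summary(summary_data)": (52.6, 8.51),
    "ks_summary(school)": (1.06, 1.18),
}

# A ratio above 1 means the Numba run was faster; below 1 means it was slower.
speedups = {name: before / after for name, (before, after) in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios vary widely between functions and input sizes, so the benefit of enabling Numba is best judged per workload.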