# Numba


Numba is a just-in-time (JIT) compiler for Python.

* Translates Python bytecode to machine code
* Useful for highly mathematical operations, especially those involving `for` loops
* Compatible with `numpy`
* Supports `numpy.linalg` operations when `scipy` is installed


# Example


In [3]:
from numba import jit

In [4]:
def fib(a):
    if (a <= 0):
        return -1
    elif (a == 1):
        return 0
    elif ( a == 2 or a == 3):
        return 1
    else:
        return fib(a-2) + fib(a-1)

In [5]:
@jit(nopython=True)
def fib_numba(a):
    if (a <= 0):
        return -1
    elif (a == 1):
        return 0
    elif ( a == 2 or a == 3):
        return 1
    else:
        return fib_numba(a-2) + fib_numba(a-1)

In [6]:
%timeit fib(10)

13.1 µs ± 1.19 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [7]:
%timeit fib_numba(10)

302 ns ± 7.26 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


# Works with `numpy`


In [8]:
def total_times_array(x):
    sum_multiple = 0
    for item in x:
        sum_multiple += item
    return sum_multiple * x

In [9]:
@jit(nopython=True)
def total_times_array_numba(x):
    sum_multiple = 0
    for item in x:
        sum_multiple += item
    return sum_multiple * x

In [10]:
import numpy as np
x = np.arange(1000)

In [11]:
%timeit total_times_array(x)

104 µs ± 2.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [13]:
%timeit total_times_array_numba(x)

905 ns ± 22.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


# `nopython` mode

 * If not set to `True` and Numba compilation fails, Numba "falls back" to object mode, which runs at roughly plain-Python speed
 * Recommended: always use it when possible
 * `@njit` is shorthand for `@jit(nopython=True)`
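
As a minimal sketch of the shorthand above, the two decorated functions below should behave identically. The function names are made up for illustration, and the `except ImportError` branch is only an assumption so the sketch still runs (uncompiled) where Numba is not installed.

```python
try:
    from numba import jit, njit
except ImportError:
    # Assumption for this sketch only: if Numba is not installed, use a
    # hypothetical no-op stand-in so the example still runs (uncompiled).
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]        # used as bare @njit
        return lambda func: func  # used as @njit(...) or @jit(...)
    jit = njit


@jit(nopython=True)
def sum_squares_jit(n):
    # Plain Python loop over integers, compiled in nopython mode
    total = 0
    for i in range(n):
        total += i * i
    return total


@njit
def sum_squares_njit(n):
    # Same body, using the @njit shorthand
    total = 0
    for i in range(n):
        total += i * i
    return total


# Both spellings should return the same value
print(sum_squares_jit(1000) == sum_squares_njit(1000))
```

With `nopython=True` (or `@njit`), code that Numba cannot compile raises an error instead of silently running slowly, which is the main reason to prefer it.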


# Loop parallelization


In [17]:
from numba import njit, prange

@njit(parallel=True)
def total_times_array_numba_parallel(x):
    sum_multiple = 0
    # prange yields indices, so index into x to sum its values
    for i in prange(x.shape[0]):
        sum_multiple += x[i]
    return sum_multiple * x

x = np.arange(1000000)

In [18]:
%timeit total_times_array_numba(x)

1.07 ms ± 5.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [19]:
%timeit total_times_array_numba_parallel(x)

586 µs ± 37 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Other things of interest

 * `fastmath=True` - removes strict IEEE 754 compliance (relaxes some numerical rigor)
 * `cffi` - functions declared through `cffi` can be called from jitted code
 * [NVIDIA CUDA](https://developer.nvidia.com/cuda-zone) support
 * [Numba Documentation](http://numba.pydata.org/)
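
To illustrate the `fastmath=True` option listed above, here is a small sketch that compiles the same function with and without it and compares the results. The function name and the tolerance are illustrative assumptions, and the `except ImportError` branch exists only so the sketch runs (uncompiled) where Numba is not installed.

```python
import math

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: hypothetical no-op stand-in for njit
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]        # used as njit(func)
        return lambda func: func  # used as njit(...)(func)


def sum_sqrt(n):
    # Floating-point reduction: fastmath lets the compiler reorder these adds
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


sum_sqrt_strict = njit(sum_sqrt)               # IEEE 754 ordering preserved
sum_sqrt_fast = njit(fastmath=True)(sum_sqrt)  # relaxed floating-point rules

strict = sum_sqrt_strict(100_000)
fast = sum_sqrt_fast(100_000)

# The two results may differ in the last few bits, so compare with a tolerance
print(math.isclose(strict, fast, rel_tol=1e-9))
```

Because reordering can change rounding, results from a `fastmath` build are best compared with a tolerance rather than with `==`.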

