# Numba Basics

Numba is a just-in-time compiler for Python functions.  When a function is called, Numba translates it into equivalent machine code that runs anywhere from 2x (simple NumPy operations) to 100x (complex Python loops) faster.  In this notebook, we show some basic examples of using Numba.

In [1]:
import numpy as np
import numba
from numba import jit

Let's check which version of Numba we have:

In [2]:
print(numba.__version__)

0.45.1


Numba uses Python *decorators* to transform Python functions into functions that compile themselves.  The most common Numba decorator is `@jit`, which creates a normal function for execution on the CPU.

Numba works best on numerical functions that make use of NumPy arrays.  Here's an example:

In [3]:
@jit(nopython=True)
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0
    # assuming square input matrix
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

The `nopython=True` option requires that the function be fully compiled, so that all calls into the Python interpreter are removed; if that is not possible, an exception is raised.  These exceptions usually indicate places in the function that need to be modified in order to achieve better-than-Python performance.  We strongly recommend always using `nopython=True`.

The function has not yet been compiled.  To do that, we need to call the function:

In [4]:
x = np.arange(100).reshape(10, 10)
go_fast(x)

array([[  9.,  10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.],
       [ 19.,  20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.],
       [ 29.,  30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.],
       [ 39.,  40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.],
       [ 49.,  50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.],
       [ 59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.],
       [ 69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.],
       [ 79.,  80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.],
       [ 89.,  90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.],
       [ 99., 100., 101., 102., 103., 104., 105., 106., 107., 108.]])

The first time the function was called, a new version of the function was compiled and executed.  If we call it again, the previously generated function executes without another compilation step.

In [5]:
go_fast(2*x)

array([[  9.,  11.,  13.,  15.,  17.,  19.,  21.,  23.,  25.,  27.],
       [ 29.,  31.,  33.,  35.,  37.,  39.,  41.,  43.,  45.,  47.],
       [ 49.,  51.,  53.,  55.,  57.,  59.,  61.,  63.,  65.,  67.],
       [ 69.,  71.,  73.,  75.,  77.,  79.,  81.,  83.,  85.,  87.],
       [ 89.,  91.,  93.,  95.,  97.,  99., 101., 103., 105., 107.],
       [109., 111., 113., 115., 117., 119., 121., 123., 125., 127.],
       [129., 131., 133., 135., 137., 139., 141., 143., 145., 147.],
       [149., 151., 153., 155., 157., 159., 161., 163., 165., 167.],
       [169., 171., 173., 175., 177., 179., 181., 183., 185., 187.],
       [189., 191., 193., 195., 197., 199., 201., 203., 205., 207.]])

To benchmark Numba-compiled functions, it is important to time them without including the compilation step, since the compilation of a given function will only happen once for each set of input types, but the function will be called many times.

In a notebook, the `%timeit` magic function is the best tool to use because it runs the function many times in a loop to get a more accurate estimate of the execution time of short functions.

In [6]:
%timeit go_fast(x)

1.5 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Let's compare to the uncompiled function.  Numba-compiled functions have a special `.py_func` attribute, which holds the original uncompiled Python function.  We should first verify we get the same results:

In [7]:
np.testing.assert_array_equal(go_fast(x), go_fast.py_func(x))

And test the speed of the Python version:

In [8]:
%timeit go_fast.py_func(x)

36.8 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


The original Python function is more than 20x slower than the Numba-compiled version.  However, the Numba function uses explicit loops, which are very fast in Numba and not very fast in Python.  Our example function is so simple that we can create an alternate version of `go_fast` using only NumPy array expressions:

In [9]:
def go_numpy(a):
    return a + np.tanh(np.diagonal(a)).sum()

In [10]:
np.testing.assert_array_equal(go_numpy(x), go_fast(x))

In [11]:
%timeit go_numpy(x)

15.1 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


The NumPy version is more than 2x faster than the pure-Python version, but still about 10x slower than Numba.

### Supported Python Features

Numba works best when used with NumPy arrays, but Numba also supports other data types out of the box:

* `int`, `float`
* `tuple`, `namedtuple`
* `list` (with some restrictions)
* ... and others.  See the [Reference Manual](https://numba.pydata.org/numba-doc/latest/reference/pysupported.html) for more details.

In particular, tuples are useful for returning multiple values from functions:

In [12]:
import random

@jit(nopython=True)
def spherical_to_cartesian(r, theta, phi):
    '''Convert spherical coordinates (physics convention) to cartesian coordinates'''
    sin_theta = np.sin(theta)
    x = r * sin_theta * np.cos(phi)
    y = r * sin_theta * np.sin(phi)
    z = r * np.cos(theta)
    
    return x, y, z # return a tuple
    
@jit(nopython=True)
def random_directions(n, r):
    '''Return ``n`` 3-vectors in random directions with radius ``r``'''
    out = np.empty(shape=(n,3), dtype=np.float64)
    
    for i in range(n):
        # Pick directions randomly in solid angle
        phi = random.uniform(0, 2*np.pi)
        theta = np.arccos(random.uniform(-1, 1))
        # unpack a tuple
        x, y, z = spherical_to_cartesian(r, theta, phi)
        out[i] = x, y, z
    
    return out

In [13]:
random_directions(10, 1.0)

array([[-0.32103975,  0.88317262,  0.34196433],
       [-0.39104033,  0.68741411,  0.61200433],
       [-0.28485983,  0.11513131,  0.95163   ],
       [-0.17054732, -0.55971025,  0.81094886],
       [-0.32740123,  0.70326313,  0.6310542 ],
       [-0.60436668, -0.54019596, -0.5856016 ],
       [ 0.63577627,  0.04126427, -0.77076961],
       [ 0.05729948, -0.07325534, -0.99566582],
       [-0.54414877,  0.74117575,  0.39314199],
       [ 0.74265823, -0.44089836,  0.50405098]])

When Numba is translating Python to machine code, it uses the [LLVM](https://llvm.org/) library to do most of the optimization and final code generation.  This automatically enables a wide range of optimizations that you don't even have to think about.  If we were to inspect the output of the compiler for the previous random directions example, we would find that:

* The function body for `spherical_to_cartesian()` was inlined directly into the body of the for loop in `random_directions`, eliminating the overhead of making a function call.
* The separate calls to `sin()` and `cos()` were combined into a single, faster call to an internal `sincos()` function.

These kinds of cross-function optimizations are one of the reasons that Numba can sometimes outperform compiled NumPy code.