<a href="https://colab.research.google.com/github/cagBRT/PerformanceEnhancement/blob/main/Numba.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/Intro-to-Pandas.git cloned-repo
%cd cloned-repo

In [None]:
import pandas as pd


In [None]:
adult_income = pd.read_csv("adult.csv")

# Numba

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

Let’s improve the runtime even more.

The first candidate is Numba.

We install it via pip (`pip install numba`) and import it. Then we decorate our `crazy_function` with Numba's `njit` decorator, which is shorthand for `jit(nopython=True)`. JIT stands for just in time: Numba translates pure Python and NumPy code to native machine instructions the first time the function is called, which can give large speed-ups on numerical code.

Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.



In [None]:
!pip install numba

# Will Numba work for my code?
This depends on what your code looks like. If your code is numerically oriented (does a lot of math), uses NumPy heavily, and/or has a lot of loops, then Numba is often a good choice.

In these examples we’ll apply the most fundamental of Numba’s JIT decorators, @jit, to try and speed up some functions to demonstrate what works well and what does not.



In [None]:
import numpy as np
import numba
from numba import njit
import random

In [None]:
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True) # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

print(go_fast(x))

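It won’t work very well, if at all, on code that looks like the following. Because nopython=True is set, Numba raises an error here instead of silently running the function in the interpreter: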

In [None]:
from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit(nopython=True)
def use_pandas(a): # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))

Numba uses Python decorators to transform Python functions into functions that compile themselves. The most common Numba decorator is @jit, which creates a normal function for execution on the CPU.

Numba works best on numerical functions that make use of NumPy arrays. Here's an example:

The nopython=True option requires that the function be fully compiled (so that the Python interpreter calls are completely removed); otherwise an exception is raised. These exceptions usually indicate places in the function that need to be modified in order to achieve better-than-Python performance. We strongly recommend always using nopython=True.

In [None]:
@njit # @njit already implies nopython=True
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    # assuming square input matrix
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

The function has not yet been compiled.  To do that, we need to call the function:

In [None]:
x = np.arange(10000).reshape(100, 100)
%time go_fast(x)

In [None]:
%time go_fast(2*x)

To benchmark Numba-compiled functions, it is important to time them without including the compilation step, since the compilation of a given function will only happen once for each set of input types, but the function will be called many times.

In a notebook, the %timeit magic function is the best to use because it runs the function many times in a loop to get a more accurate estimate of the execution time of short functions.

Let's compare to the uncompiled function. Numba-compiled functions have a special `.py_func` attribute, which is the original uncompiled Python function. We should first verify that we get the same results:

In [None]:
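# Confirm the compiled function matches the original Python function, then time both
np.testing.assert_array_equal(go_fast(x), go_fast.py_func(x))
%timeit go_fast.py_func(x)
%timeit go_fast(x)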

In [None]:
def go_numpy(a):
    return a + np.tanh(np.diagonal(a)).sum()

%time np.testing.assert_array_equal(go_numpy(x), go_fast(x))

For this input, the NumPy version is more than 2x faster than the pure Python function, but still about 10x slower than the Numba version. Exact ratios vary from machine to machine.

In [None]:
%timeit go_numpy(x)

In [None]:
@njit
def crazy_function(col1, col2, col3):
    return (col1 ** 3 + col2 ** 2 + col3 * 10) ** 0.5

We achieved a speed-up of about 1.5x.

**Note that Numba works best with functions that involve many native Python loops, a lot of math, and, even better, NumPy functions and arrays.**

# Supported Python Features

Numba works best when used with NumPy arrays, but Numba also supports other data types out of the box:

* int, float
* tuple, namedtuple
* list (with some restrictions)
* ... and others. See the Reference Manual for more details.

In particular, tuples are useful for returning multiple values from functions:

In [None]:
import random

@njit
def spherical_to_cartesian(r, theta, phi):
    '''Convert spherical coordinates (physics convention) to cartesian coordinates'''
    sin_theta = np.sin(theta)
    x = r * sin_theta * np.cos(phi)
    y = r * sin_theta * np.sin(phi)
    z = r * np.cos(theta)

    return x, y, z # return a tuple

@njit
def random_directions(n, r):
    '''Return ``n`` 3-vectors in random directions with radius ``r``'''
    out = np.empty(shape=(n,3), dtype=np.float64)

    for i in range(n):
        # Pick directions randomly in solid angle
        phi = random.uniform(0, 2*np.pi)
        theta = np.arccos(random.uniform(-1, 1))
        # unpack a tuple
        x, y, z = spherical_to_cartesian(r, theta, phi)
        out[i] = x, y, z

    return out

In [None]:
%time random_directions(10, 1.0)