# A Detailed Guide to Numba

### Credits to: [Trigram](https://www.kaggle.com/code/nxrprime/a-detailed-guide-to-numba)

Numba is a just-in-time compiler for Python to get back at those C++ bullies. It is especially helpful with code that uses NumPy arrays.

Numba provides a number of decorators that you apply to your functions to tell Numba to compile them. When a decorated function is called, it is compiled to machine code and can then run at the speed of **machine code**.

Numba works with:
* Windows, OS X and Linux (OS)
* x86, x86_64 (architecture)
* Nvidia CUDA (GPU)
* Latest version of NumPy
* CPython

# 1. How to use it (basic)

If your code involves a lot of mathematical heavy lifting, or a ton of NumPy arrays, then Numba is a good fit. In this example, we'll use the `@jit` decorator, Numba's most basic.

In [None]:
import numpy as np
from numba import jit

In [None]:
x = np.arange(100).reshape(10, 10)

In [None]:
x

Watch `@jit` at work now:

In [None]:
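@jit(nopython=True)
def example1(a: np.ndarray):  # Function is compiled to machine code when called the first time
    trace = 0
    for _ in range(1000):            # repeat the work so the timing is measurable
        for i in range(a.shape[0]):  # sum the diagonal of `a`
            trace += a[i, i]
    return a + trace                 # add the total to every element of `a`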


In [None]:
%%timeit
example1(x) # Time the function call

I can hear you asking, "What does that `nopython=True` do?"

Well, `nopython=True` tells Numba to compile your code in nopython mode, that is, **without** the interference of the Python interpreter, allowing your code to clock C++-level speeds (take that, you `cpp` bullies).

But Numba is horrid at this:

In [None]:
x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

import pandas as pd
@jit
def use_pandas(a): 
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))

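You can see that Numba does not understand Pandas, which means that Pandas code will not benefit from `@jit`. Depending on the Numba version, the call above either falls back to slow object mode with a warning (older releases) or raises a typing error (Numba 0.59 and later, where `@jit` defaults to nopython mode).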

# 2. How to measure the performance of Numba?

Once compilation has taken place, Numba runs the machine code version of your function. If it is called again with the same types, it can reuse the cached version instead of having to compile again.

A common mistake when measuring performance is to ignore this behaviour and time the code only once, with a simple timer that includes the time taken to compile your function in the execution time.

For example:

In [None]:
import time

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a): # Function is compiled and runs in machine code
    trace = 0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (with compilation) = %s" % (end - start))

# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))

# 3. @vectorize

**Numba’s vectorize allows Python functions taking scalar input arguments to be used as NumPy ufuncs** <br><br> Creating a traditional NumPy ufunc is not the most straightforward process and involves writing C code. Numba makes this easy. Using the vectorize() decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional C ufuncs.

The vectorize() decorator has two modes of operation:

* Eager, or decoration-time, compilation
* Lazy, or call-time, compilation

In the basic case, only one signature will be passed:

In [None]:
from numba import vectorize, float64, int32, int64, float32

@vectorize([float64(float64, float64)])
def example2(x, y):
    return x + y

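If you pass several signatures, order them from most specific to least specific (for example, single-precision floats before double-precision floats):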

In [None]:
@vectorize([int32(int32, int32),
            int64(int64, int64),
            float32(float32, float32),
            float64(float64, float64)])
def f(x, y):
    return x + y

In [None]:
start = time.time()
f(9, 9.9)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))

# 4. @jitclass

Numba supports code generation for classes via the `numba.experimental.jitclass()` decorator (older releases exposed it as `numba.jitclass()`). A class can be marked for optimization using this decorator along with a specification of the types of each field. We call the resulting class object a `jitclass`.

All methods of a `jitclass` are compiled into nopython functions. The data of a `jitclass` instance is allocated on the heap as a C-compatible structure so that any compiled functions can have direct access to the underlying data, bypassing the interpreter.

Here's an example of `jitclass`:

In [None]:
import numpy as np
from numba import int32, float32          # field types used in the spec
from numba.experimental import jitclass   # import the decorator

spec = [
    ('value', int32),               # a simple scalar field
    ('array', float32[:]),          # an array field
]

@jitclass(spec)
class Bag(object):
    def __init__(self, value):
        self.value = value
        self.array = np.zeros(value, dtype=np.float32)

    @property
    def size(self):
        return self.array.size

    def increment(self, val):
        for i in range(self.size):
            self.array[i] += val    # add `val` to every element
        return self.array

# 5. @cfunc

The `@cfunc` decorator has a similar usage to `@jit`, but with an important difference: **a single signature is mandatory**. It determines the signature of the C callback:

In [None]:
from numba import cfunc

@cfunc("float64(float64, float64)")
def add(x, y):
    return x + y


# 6. @stencil

Stencils are common computational patterns where array elements are updated according to a **stencil kernel**. Numba provides `@stencil` so users can specify a stencil kernel, and Numba will then update the array elements in accordance with that kernel.

In [None]:
from numba import stencil

@stencil
def kernel1(a):
    return 0.25 * (a[0, 1] + a[1, 0] + a[0, -1] + a[-1, 0])

# 7. Resources

* At SciPy 2017: https://www.youtube.com/watch?v=1AwG0T4gaO0
* By EuroPython: https://www.youtube.com/watch?v=UaFSnaYh2b8
* Medium: https://towardsdatascience.com/speed-up-your-algorithms-part-2-numba-293e554c5cc1