# A Detailed Guide to Numba

### Credits to: [Trigram](https://www.kaggle.com/code/nxrprime/a-detailed-guide-to-numba)

Numba is a just-in-time (JIT) compiler for Python, built to get back at those C++ bullies. It is especially helpful with code that uses NumPy arrays.

Numba provides a number of decorators that you apply to your functions to tell Numba to compile them. When a decorated function is called, it is compiled to machine code just in time and can then run at the speed of **machine code**.

Numba works with:
* Windows, OS X and Linux (OS)
* x86, x86_64 (architecture)
* Nvidia CUDA (GPU)
* Latest version of NumPy
* CPython

# 1. How to use it (basic)

If your code involves a lot of mathematical heavy lifting, or works with a ton of NumPy arrays, then Numba is well suited to speed it up. In this example, we'll use the `@jit` decorator, Numba's most basic.

In [1]:
import numpy as np
from numba import jit

In [2]:
x = np.arange(102).reshape(17, 6)

In [3]:
x

array([[  0,   1,   2,   3,   4,   5],
       [  6,   7,   8,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17],
       [ 18,  19,  20,  21,  22,  23],
       [ 24,  25,  26,  27,  28,  29],
       [ 30,  31,  32,  33,  34,  35],
       [ 36,  37,  38,  39,  40,  41],
       [ 42,  43,  44,  45,  46,  47],
       [ 48,  49,  50,  51,  52,  53],
       [ 54,  55,  56,  57,  58,  59],
       [ 60,  61,  62,  63,  64,  65],
       [ 66,  67,  68,  69,  70,  71],
       [ 72,  73,  74,  75,  76,  77],
       [ 78,  79,  80,  81,  82,  83],
       [ 84,  85,  86,  87,  88,  89],
       [ 90,  91,  92,  93,  94,  95],
       [ 96,  97,  98,  99, 100, 101]])

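Watch `@jit` in action now. Note that `x` is not square, so the loop stops at the shorter dimension: indexing `a[i, i]` past it would read out of bounds, and nopython mode does not check array bounds by default.

In [4]:
@jit(nopython=True)
def example1(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(min(a.shape[0], a.shape[1])):  # walk the diagonal only as far as it exists
        trace += np.tanh(a[i, i])
    return a + trace                              # broadcast the scalar onto every element

print(example1(x))

[[  4.99999834   5.99999834   6.99999834   7.99999834   8.99999834
    9.99999834]
 [ 10.99999834  11.99999834  12.99999834  13.99999834  14.99999834
   15.99999834]
 [ 16.99999834  17.99999834  18.99999834  19.99999834  20.99999834
   21.99999834]
 [ 22.99999834  23.99999834  24.99999834  25.99999834  26.99999834
   27.99999834]
 [ 28.99999834  29.99999834  30.99999834  31.99999834  32.99999834
   33.99999834]
 [ 34.99999834  35.99999834  36.99999834  37.99999834  38.99999834
   39.99999834]
 [ 40.99999834  41.99999834  42.99999834  43.99999834  44.99999834
   45.99999834]
 [ 46.99999834  47.99999834  48.99999834  49.99999834  50.99999834
   51.99999834]
 [ 52.99999834  53.99999834  54.99999834  55.99999834  56.99999834
   57.99999834]
 [ 58.99999834  59.99999834  60.99999834  61.99999834  62.99999834
   63.99999834]
 [ 64.99999834  65.99999834  66.99999834  67.99999834  68.99999834
   69.99999834]
 [ 70.99999834  71.99999834  72.99999834  73.99999834  74.99999834
   75.99999834]
 [ 76.99999834  77.99999834  78.99999834  79.99999834  80.99999834
   81.99999834]
 [ 82.99999834  83.99999834  84.99999834  85.99999834  86.99999834
   87.99999834]
 [ 88.99999834  89.99999834  90.99999834  91.99999834  92.99999834
   93.99999834]
 [ 94.99999834  95.99999834  96.99999834  97.99999834  98.99999834
   99.99999834]
 [100.99999834 101.99999834 102.99999834 103.99999834 104.99999834
  105.99999834]]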

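I can hear you asking: "What does that `nopython=True` do?"

Well, `nopython=True` tells Numba to compile your function entirely to machine code, **without** the interference of the Python interpreter, allowing your code to clock C++-level speeds (take that, you `cpp` bullies). If Numba cannot compile the function this way, it raises an error instead of quietly running slower.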

But Numba struggles with code like this:

In [5]:
x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

import pandas as pd
@jit
def use_pandas(a): 
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))

Compilation is falling back to object mode WITH looplifting enabled because Function "use_pandas" failed type inference due to: non-precise type pyobject
[1] During: typing of argument at <ipython-input-5-9edb3ca9ad66> (6)

File "<ipython-input-5-9edb3ca9ad66>", line 6:
def use_pandas(a):
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    ^

  @jit

File "<ipython-input-5-9edb3ca9ad66>", line 5:
@jit
def use_pandas(a):
^

  state.func_ir.loc))
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "<ipython-input-5-9edb3ca9ad66>", line 5:
@jit
def use_pandas(a):
^


      a      b
a   1.0   10.0
b  10.0  100.0


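You can see that Numba does not understand Pandas: the function falls back to object mode and runs through the Python interpreter, so Pandas code gets no benefit from `@jit`. The warnings above come from an older Numba release. Since Numba 0.59, `@jit` defaults to nopython mode, so this function raises a typing error instead of falling back.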

# 2. How to measure the performance of Numba?

Once compilation has taken place, Numba keeps the machine code version of your function for the particular argument types it was called with. If the function is called again with the same types, Numba reuses that cached version instead of compiling again.

A common mistake when measuring performance is to ignore this behaviour and time the code only once, with a simple timer that includes the compilation time in the execution time.

For example:

In [6]:
import time

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a): # Function is compiled and runs in machine code
    trace = 0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (with compilation) = %s" % (end - start))

# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))

Elapsed (with compilation) = 0.24253249168395996
Elapsed (after compilation) = 8.463859558105469e-05
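The same pattern can be wrapped in a small helper. This is only a sketch: `time_after_warmup` and its `repeats` parameter are names invented here, and it works with any callable, compiled or not.

```python
import time

def time_after_warmup(func, *args, repeats=100):
    """Average seconds per call, excluding the first (compiling) call."""
    func(*args)  # warm-up: for a jitted function this triggers compilation
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats

# Example with a plain Python function; a jitted function such as go_fast
# could be passed in exactly the same way.
def square_sum(n):
    return sum(i * i for i in range(n))

print("Average per call = %s" % time_after_warmup(square_sum, 1000, repeats=50))
```

Averaging over several calls also smooths out timer noise, which matters when a single compiled call takes only microseconds.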


# 3. @vectorize

**Numba’s vectorize allows Python functions taking scalar input arguments to be used as NumPy ufuncs.**

Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the `vectorize()` decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.

The vectorize() decorator has two modes of operation:

* Eager, or decoration-time, compilation
* Lazy, or call-time, compilation

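If you pass one or more type signatures to the decorator, the ufunc is compiled eagerly, at decoration time. In the basic case, only one signature will be passed: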

In [7]:
from numba import vectorize, float64, int32, int64, float32

@vectorize([float64(float64, float64)])
def example2(x, y):
    return x + y

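If you pass several signatures, list the most specific ones before the least specific ones (for example, single-precision floats before double-precision floats), otherwise type-based dispatching will not work as expected: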

In [8]:
@vectorize([int32(int32, int32),
            int64(int64, int64),
            float32(float32, float32),
            float64(float64, float64)])
def f(x, y):
    return x + y

In [9]:
start = time.time()
f(9, 9.9)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))

Elapsed (after compilation) = 0.00014829635620117188


# 4. @jitclass

Numba supports code generation for classes via the `jitclass()` decorator, which lives in `numba.experimental` in current releases (older releases exposed it as `numba.jitclass`). A class can be marked for optimization using this decorator along with a specification of the types of each field. We call the resulting class object a `jitclass`.

All methods of a `jitclass` are compiled into nopython functions. The data of a `jitclass` instance is allocated on the heap as a C-compatible structure so that any compiled functions can have direct access to the underlying data, bypassing the interpreter.

Here's an example of `jitclass`:

In [10]:
import numpy as np
from numba import int32, float32    # field types used in the spec
from numba.experimental import jitclass   # import the decorator

spec = [
    ('value', int32),               # a simple scalar field
    ('array', float32[:]),          # an array field
]

@jitclass(spec)
class Bag(object):
    def __init__(self, value):
        self.value = value
        self.array = np.zeros(value, dtype=np.float32)

    @property
    def size(self):
        return self.array.size

    def increment(self, val):
        for i in range(self.size):
            self.array[i] += val    # add val to every element
        return self.array

# 5. cfunc

The `@cfunc` decorator has a similar usage to `@jit`, but with an important difference: **a single signature is mandatory**. It determines the signature of the C callback:

In [11]:
from numba import cfunc

@cfunc("float64(float64, float64)")
def add(x, y):
    return x + y


# 6. Stencil

Stencils are common computational patterns in which array elements are updated according to a fixed pattern called the **stencil kernel**. Numba provides `@stencil` so that users can specify a stencil kernel, and Numba then applies it to every element of the input array.

In [12]:
from numba import stencil

@stencil
def kernel1(a):
    return 0.25 * (a[0, 1] + a[1, 0] + a[0, -1] + a[-1, 0])

# 7. Resources

* At SciPy 2017: https://www.youtube.com/watch?v=1AwG0T4gaO0
* By EuroPython: https://www.youtube.com/watch?v=UaFSnaYh2b8
* Medium: https://towardsdatascience.com/speed-up-your-algorithms-part-2-numba-293e554c5cc1