<a href="https://colab.research.google.com/github/JacobDowns/CSCI-491-591/blob/main/lecture3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numba




* Numba translates Python functions to optimized machine code at runtime using the LLVM compiler library
* Numba-compiled numerical algorithms can approach the speeds of C or Fortran



## Lazy Compilation
* Numba's central feature is the numba.jit() decorator
* It marks a function for optimization by Numba's JIT compiler
* A decorator is a way to modify functions in a uniform way
* You can think of decorators as functions that take a function as input and produce a function as output
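
To make the decorator idea concrete, here is a minimal sketch in plain Python with no Numba involved. The `count_calls` decorator is a hypothetical helper invented for illustration; it takes a function and returns a wrapped function that also counts how often it is called.

```python
import functools

def count_calls(func):
    """Hypothetical decorator: wrap func and count how many times it is called."""
    @functools.wraps(func)           # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1           # extra behavior added by the decorator
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls                         # equivalent to: add = count_calls(add)
def add(x, y):
    return x + y

print(add(1, 2), add(3, 4))          # 3 7
print(add.calls)                     # 2
```

`@jit` follows the same pattern, except that the object it returns compiles the function instead of counting calls.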

## Bubblesort Example
Here's a naive implementation of bubble sort in Python.

In [18]:
from numba import jit, njit
import numpy as np

def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

Let's do a little basic profiling to get a sense of its performance.

In [19]:
N = 5000
x = np.linspace(0., 1., N)
shuffled = x.copy()
np.random.shuffle(shuffled)
sorted = shuffled.copy()

In [20]:
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)

7.23 s ± 936 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Well, it's pretty slow. Here is the same implementation compiled with Numba. Numba infers the data types from the arguments on the first call and compiles a version of the function specialized for that type signature.

In [3]:
from numba import jit, njit

@jit
def bubblesort_numba(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

In [9]:
%timeit sorted[:] = shuffled[:]; bubblesort_numba(sorted)

21.4 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


If you do not want Numba to infer data types, you can specify them explicitly with a signature of the form return_type(argument_types).

In [5]:
from numba import int32

@jit(int32(int32, int32))
def f(x, y):
    # A somewhat trivial example
    return x + y

In [6]:
a = 1
b = 2
f(a,b)

3

Numba-compiled functions can call other compiled functions, as well as many supported functions from the math module and NumPy.

In [8]:
import math

@jit
def square(x):
    return x ** 2

@jit
def hypot(x, y):
    return math.sqrt(square(x) + square(y))

In [14]:
(hypot(3, 4), square(10))

(5.0, 100)

From the Numba docs, Numba supports a number of types including these common ones:
* void is the return type of functions returning nothing (which actually return None when called from Python)

* intp and uintp are pointer-sized integers (signed and unsigned, respectively)

* intc and uintc are equivalent to C int and unsigned int integer types

* int8, uint8, int16, uint16, int32, uint32, int64, uint64 are fixed-width integers of the corresponding bit width (signed and unsigned)

* float32 and float64 are single- and double-precision floating-point numbers, respectively

* complex64 and complex128 are single- and double-precision complex numbers, respectively

* array types can be specified by indexing any numeric type, e.g. float32[:] for a one-dimensional single-precision array or int8[:,:] for a two-dimensional array of 8-bit integers.



## Compilation Options
* Numba has two compilation modes, nopython mode and object mode
* In nopython mode, Numba generates code that does not access the Python C API
* This yields the highest performance, but requires that a native type can be inferred for every value in the function
* In object mode, the Numba compiler generates code that handles all values as Python objects and uses the C API to perform operations on those objects
* This is typically not much faster than standard Python

In [9]:
@jit("void(f4[:])",nopython=True)
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

There is also a shorthand decorator for nopython mode, `@njit`.

In [12]:
@njit
def add(x, y):
    return x + y

Certain things don't work in nopython mode, including most code that relies on arbitrary Python objects rather than primitive data types and NumPy arrays.

In [14]:
from numba import njit
from decimal import Decimal

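@njit
def bubblesort(X):
    N = len(X)
    # This is an issue: Numba has no native type for Decimal,
    # so nopython compilation fails when the function is first called
    val = Decimal(100)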
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

## NumPy Universal Functions

* Numba's @vectorize decorator converts scalar functions to NumPy ufuncs
* A ufunc or universal function operates on ndarrays in an element-wise fashion
* They support broadcasting, type casting, and other standard NumPy features

In [15]:
import numpy as np
from numba import vectorize, int64
@vectorize([int64(int64, int64)])
def vec_add(x, y):
    return x + y

a = np.arange(6, dtype=np.int64)
b = np.linspace(0, 10, 6, dtype=np.int64)
print(vec_add(a, a))
print(vec_add(b, b))

[ 0  2  4  6  8 10]
[ 0  4  8 12 16 20]


## What Works in Numba?

This works well!

In [16]:
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

print(go_fast(x))

[[  9.  10.  11.  12.  13.  14.  15.  16.  17.  18.]
 [ 19.  20.  21.  22.  23.  24.  25.  26.  27.  28.]
 [ 29.  30.  31.  32.  33.  34.  35.  36.  37.  38.]
 [ 39.  40.  41.  42.  43.  44.  45.  46.  47.  48.]
 [ 49.  50.  51.  52.  53.  54.  55.  56.  57.  58.]
 [ 59.  60.  61.  62.  63.  64.  65.  66.  67.  68.]
 [ 69.  70.  71.  72.  73.  74.  75.  76.  77.  78.]
 [ 79.  80.  81.  82.  83.  84.  85.  86.  87.  88.]
 [ 89.  90.  91.  92.  93.  94.  95.  96.  97.  98.]
 [ 99. 100. 101. 102. 103. 104. 105. 106. 107. 108.]]


This makes Numba sad because it doesn't understand more complex objects such as pandas DataFrames.

In [17]:
from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit(forceobj=True, looplift=True) # Need to use object mode, try and compile loops!
def use_pandas(a): # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))

      a      b
a   1.0   10.0
b  10.0  100.0


## A Mental Model for Numba
It can be helpful to have a general conceptual overview of how Numba performs JIT compilation.

1. You decorate a function (e.g., @njit). Numba doesn't compile immediately (unless you supplied an explicit signature). Instead it creates a dispatcher object that waits for the first call.

2. First call triggers specialization. The dispatcher looks at the argument types (e.g., int64, float64[:]) and either reuses an existing compiled version or builds a new specialization for that exact signature. Future calls with the same types reuse it.

3. Compilation pipeline (per specialization):

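  * Typing (type inference): Numba analyzes the function's bytecode and assigns Numba types to every value and operation. If this succeeds, compilation continues in nopython mode. If it fails, Numba raises an error when nopython mode was requested (as with @njit or nopython=True); only where object mode is permitted (for example with forceobj=True) can it fall back to object mode, which is much slower and goes through the Python C API.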


  * Lowering to IR and LLVM: The typed IR is lowered to LLVM IR via llvmlite; then LLVM optimizes and emits machine code.

  * Boxing/unboxing boundaries: Values are converted to/from Python objects only at the call boundary (entering/exiting compiled code). In nopython mode these conversions are confined to that boundary; in object mode they're everywhere.


4. Execution & caching:

  * The compiled machine code is stored in-memory on the dispatcher keyed by signature.

  * If you pass cache=True, Numba also keeps a disk cache (e.g., under __pycache__) so a new Python process can skip recompilation for the same function/signature/options. (Not every function is cacheable.)
