## What is Numba?

A JIT (Just-in-Time) compiler for Python that:

- generates optimized machine code using the LLVM (Low Level Virtual Machine) compiler infrastructure
- provides a toolbox for different targets and execution models:
    - single-threaded CPU, multi-threaded CPU, GPU
    - regular functions, "universal functions" (ufuncs, i.e. element-wise array functions), etc.
- integrates well with the Scientific Python stack
- with a few annotations, can speed up array-oriented and math-heavy Python code, providing:
    - speedups from about 2x (compared to basic NumPy code) to 200x (compared to pure Python)
    - performance similar to C, C++, or Fortran, without having to switch languages or Python interpreters
- is **totally awesome!**

## Basic Example

### Lazy Compilation

- Use the `@jit` decorator
- Let Numba decide when and how to optimize

In [None]:
import numpy as np 
from numba import jit

In [None]:
@jit
def do_math(x, y):
    return x + y

In this mode:

- The compilation will be deferred until the first execution
- Numba will:
    - infer the argument types at call time
    - generate optimized code based on this information
- Numba will also be able to compile separate specializations depending on the input types. For instance, calling `do_math()` with integer or complex numbers will generate different code paths:

In [None]:
do_math.inspect_types()

In [None]:
%time do_math(1, 2)

In [None]:
%time do_math(1, 2)


**What is Numba doing to make code run quickly?**

Numba examines the Python bytecode and translates it into an intermediate representation (IR), which it then annotates with the inferred types. To view this typed IR after running (and thereby compiling) `do_math()`, call its `inspect_types()` method.

In [None]:
do_math.inspect_types()

In [None]:
%time do_math(1j, 2)

In [None]:
%time do_math(1j, 2)

In [None]:
do_math.inspect_types()

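### Eager Compilation

- Tell Numba the function signature you are expecting
- Numba compiles that signature immediately, when the decorator runs, instead of waiting for the first call
- Calls whose argument types cannot be converted to that signature raise a `TypeError` instead of triggering a new compilation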

In [None]:
from numba import int32

In [None]:
@jit(int32(int32, int32))
def eager_do_math(x, y):
    return x + y

In [None]:
%time eager_do_math(1, 2)

In [None]:
%time eager_do_math(1.0, 2.0)

In [None]:
%time eager_do_math(1j, 2)

## How does Numba work?

![](./images/how-does-numba-work.png)

Source: [Scaling Python Up and Out with Numba and Dask — Travis Oliphant](https://speakerdeck.com/teoliphant/scaling-python-up-and-out-with-numba-and-dask?slide=37)



### What about the actual LLVM code?
You can see the LLVM IR that Numba generates with the `inspect_llvm()` method, which returns a dictionary mapping each compiled signature to its IR.

In [None]:
for key, value in do_math.inspect_llvm().items():
    print(key, value)

**But there's a caveat...**

## Compilation Options

Numba has two compilation modes:

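- **nopython mode (the recommended, best-practice mode)**: produces much faster code by running the compiled code without involving the Python interpreter.

- **object mode (should be avoided)**: Numba falls back to this mode when `nopython` mode fails. Note that in recent Numba releases (0.59 and later), `@jit` defaults to nopython mode and this automatic fallback no longer happens; unsupported code raises an error instead.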

To illustrate the above, let's watch what happens when we try to do something that is natural in Python (concatenating strings), but not particularly mathematically sound:

In [None]:
%time do_math('Hello', 'World')

In [None]:
do_math.inspect_types()

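`do_math (unicode_type, unicode_type)` shows that both arguments were typed as `unicode_type`, Numba's native string type. Numba supports basic string operations such as concatenation in nopython mode, so in current versions this call does not fall back to object mode; code compiled in object mode instead shows its variables typed as `pyobject` in the `inspect_types()` output.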

To prevent Numba from falling back to object mode and make it raise an error instead, pass `nopython=True` to the `@jit` decorator (`@njit` is an equivalent shorthand):

In [None]:
@jit
def f(x, y):  # in object mode, this function gains little or nothing from JIT compilation
    a = str(x) * 10  # str() conversion may not be supported in nopython mode (version-dependent)
    b = str(y)
    return a + b

In [None]:
%timeit f(1, 2)

In [None]:
@jit(nopython=True)  # force nopython mode: no fallback to object mode
def f(x, y):  # unsupported operations now raise a TypingError at compile time
    a = str(x) * 10  # str() conversion may not be supported in nopython mode (version-dependent)
    b = str(y)
    return a + b

In [None]:
%timeit f(1, 2)

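## Benchmarks Using the All-Pairs Distance Function

The function below computes the squared Euclidean distance between every row of `X` and every row of `Y`.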

### Pure Python Version

In [None]:
def allpairs_distances_python(X, Y):
    result = np.zeros((X.shape[0], Y.shape[0]), X.dtype)
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            result[i, j] = np.sum((X[i, :] - Y[j, :]) ** 2)
    return result
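For context on the "2x compared to basic NumPy code" figure, a fully vectorized NumPy baseline can compute the same squared distances without Python-level loops, using the expansion $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x \cdot y$. This is a swapped-in technique, not part of the original benchmark; the function names are hypothetical, and the expansion can return tiny negative values through rounding. The check compares it with the loop version on small random inputs.

```python
import numpy as np

def allpairs_distances_numpy(X, Y):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, evaluated for all pairs at once
    X_sq = np.sum(X ** 2, axis=1)[:, np.newaxis]  # shape (n, 1)
    Y_sq = np.sum(Y ** 2, axis=1)[np.newaxis, :]  # shape (1, m)
    return X_sq + Y_sq - 2.0 * X @ Y.T            # broadcasts to shape (n, m)

def allpairs_distances_loop(X, Y):
    # Same loop as allpairs_distances_python, repeated to keep this cell self-contained
    result = np.zeros((X.shape[0], Y.shape[0]), X.dtype)
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            result[i, j] = np.sum((X[i, :] - Y[j, :]) ** 2)
    return result

rng = np.random.default_rng(0)
A, B = rng.standard_normal((20, 50)), rng.standard_normal((30, 50))
D_vec = allpairs_distances_numpy(A, B)
matches = np.allclose(D_vec, allpairs_distances_loop(A, B))
print(D_vec.shape, matches)
```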

In [None]:
N = 1000 
X, Y = np.random.randn(200, N), np.random.randn(400, N)
X.shape, Y.shape 

In [None]:
pure_python = %timeit -o allpairs_distances_python(X, Y)

In [None]:
%load_ext line_profiler

In [None]:
%lprun -f allpairs_distances_python allpairs_distances_python(X,Y)

### Numba Version

In [None]:
from numba import jit

@jit(nopython=True)
def allpairs_distances_numba(X, Y):
    result = np.zeros((X.shape[0], Y.shape[0]), X.dtype)
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            result[i, j] = np.sum((X[i, :] - Y[j, :]) ** 2)
    return result

I should emphasize that this is the exact same code, except for Numba's `jit` decorator. The results are pretty astonishing:

In [None]:
numba_version = %timeit -o allpairs_distances_numba(X,Y)

In [None]:
pure_python.best / numba_version.best 

<p style="color: red;">This is a ~4x speedup, simply from adding a Numba decorator!</p>

## To be Continued