# Numba Basics

*Parts of this notebook courtesy of Anaconda*

### What is Numba?

Numba is a **just-in-time**, **type-specializing**, **function compiler** for accelerating **numerically-focused** Python.  That's a long list, so let's break down those terms:

 * **function compiler**: Numba compiles Python functions, not entire applications, and not parts of functions
     * Numba does not replace your Python interpreter
     * but is just another Python module that can turn a function into a (usually) faster function. 
 * **type-specializing**: Numba speeds up your function by generating a specialized implementation for the specific data types you are using
     * Python functions are designed to operate on generic data types, which makes them very flexible, but also very slow.
     * In practice, you will only call a function with a small number of argument types, so Numba generates a fast implementation for each set of types you actually use.
 * **just-in-time**: Numba translates functions when they are first called.
     * Ensures the compiler knows what argument types you will be using.
     * Also allows Numba to be used interactively in a Jupyter notebook just as easily as a traditional application
 * **numerically-focused**: Currently, Numba is focused on numerical data types, like `int`, `float`, and `complex`.
     * Very limited string processing support, and many string use cases are not going to work well on the GPU
         * (for GPU string processing, RAPIDS provided a separate high-level library called nvStrings; its functionality has since been folded into cuDF)
     * To get best results with Numba, you will likely be using NumPy arrays.

### Requirements

At the time this notebook was written, Numba supported a wide range of operating systems (check the Numba documentation for current requirements):

 * Windows 7 and later, 32 and 64-bit
 * macOS 10.9 and later, 64-bit
 * Linux (most anything >= RHEL 5), 32-bit and 64-bit

and Python versions:

 * Python 2.7, 3.4+
 * NumPy 1.10 and later

and a very wide range of hardware:

* x86, x86_64/AMD64 CPUs
* NVIDIA CUDA GPUs (Compute capability 3.0 and later, CUDA 8.0 and later)
* AMD GPUs (experimental patches)
* ARM (experimental patches)

### First Steps

Let's write our first Numba function and compile it for the **CPU**.  The Numba compiler is typically enabled by applying a *decorator* to a Python function.  Decorators are functions that transform Python functions.  Here we will use the CPU compilation decorator:

In [None]:
from numba import jit
import math

@jit
def hypot(x, y):
    # Implementation from https://en.wikipedia.org/wiki/Hypot
    x = abs(x)
    y = abs(y)
    t = min(x, y)
    x = max(x, y)
    t = t / x
    return x * math.sqrt(1+t*t)

The above code is equivalent to writing:
``` python
def hypot(x, y):
    x = abs(x)
    y = abs(y)
    t = min(x, y)
    x = max(x, y)
    t = t / x
    return x * math.sqrt(1+t*t)

hypot = jit(hypot)
```
This means that the Numba compiler is just a function you can call whenever you want!

Let's try out our hypotenuse calculation:

In [None]:
hypot(3.0, 4.0)

The first time we call `hypot`, the compiler is triggered and compiles a machine code implementation for float inputs.  Numba also saves the original Python implementation of the function in the `.py_func` attribute, so we can call the original Python code to make sure we get the same answer:

In [None]:
hypot.py_func(3.0, 4.0)

### Benchmarking

An important part of using Numba is measuring the performance of your new code.  Let's see if we actually sped anything up.  The easiest way to do this in the Jupyter notebook is to use the `%timeit` magic function.  Let's first measure the speed of the original Python:

In [None]:
%timeit hypot.py_func(3.0, 4.0)

The `%timeit` magic runs the statement many times to get an accurate estimate of the run time. Older IPython versions reported the best of 3 runs, while recent versions report the mean and standard deviation across several runs. Either way, repeating the measurement reduces the influence of random background events. Compilation time is not included here, because `hypot` was already compiled for `float` arguments by the first call above:

In [None]:
%timeit hypot(3.0, 4.0)

Numba did a pretty good job with this function. When this notebook was originally run, it was more than 4x faster than the pure Python version. The exact ratio depends on your machine.

Of course, a `hypot` function is already present in Python's `math` module:

In [None]:
%timeit math.hypot(3.0, 4.0)

Python's built-in is even faster than Numba! This is because calling a Numba-compiled function from Python has some overhead of its own: Numba has to check the argument types and convert them to native values. That overhead is larger than the cost of calling a built-in C function like `math.hypot`. Extremely fast functions (like the one above) will be hurt by this.

(However, if you call one Numba function from another one, there is very little function overhead, sometimes even zero if the compiler inlines the function into the other one.)

## How does Numba work?

The first time we called our Numba-wrapped `hypot` function, the following process was initiated:

![Numba Flowchart](https://materials.s3.amazonaws.com/i/numba_flowchart.png "The compilation process")

We can see the result of type inference by using the `.inspect_types()` method, which prints an annotated version of the source code:

In [None]:
hypot.inspect_types()

Note that Numba's type names tend to mirror the NumPy type names, so a Python `float` is a `float64` (also called "double precision" in other languages).  Taking a look at the data types can sometimes be important in GPU code because the performance of `float32` and `float64` computations will be very different on CUDA devices.  An accidental upcast can dramatically slow down a function.

## When Things Go Wrong

Numba cannot compile all Python code. Some functions don't have a Numba translation, and some kinds of Python types can't be efficiently compiled at all (yet). For example, Numba cannot compile a function that receives a regular Python dictionary (as of this tutorial):

In [None]:
@jit
def cannot_compile(x):
    return x['key']

cannot_compile(dict(key='value'))

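Wait, what happened?? In older Numba versions (before 0.59), when type inference fails, `@jit` silently falls back to a mode called "object mode," which does not do type-specialization. Object mode exists to enable other Numba functionality, but in many cases you want Numba to tell you if type inference fails. Newer Numba versions make nopython mode the default for `@jit` and raise an error here instead. In any version, you can explicitly force "nopython mode" (the other compilation mode) by passing an argument to the decorator: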

In [None]:
@jit(nopython=True)
def cannot_compile(x):
    return x['key']

try:
    cannot_compile(dict(key='value'))
except Exception as err:
    print(err)

Now we get an exception when Numba tries to compile the function, with an error that says:
```
- argument 0: cannot determine Numba type of <class 'dict'>
```
which is the underlying problem.

We will see other `@jit` decorator arguments in future sections.

### Exercise

Below is a function that loops over two input NumPy arrays and puts their sum into the output array.  Modify this function to call the `hypot` function we defined above.  We will learn a more efficient way to write such functions in a future section.

(Make sure to execute all the cells in this notebook so that `hypot` is defined.)

In [None]:
@jit(nopython=True)
def ex1(x, y, out):
    for i in range(x.shape[0]):
        out[i] = x[i] + y[i]

In [None]:
import numpy as np

in1 = np.arange(10, dtype=np.float64)
in2 = 2 * in1 + 1
out = np.empty_like(in1)

print('in1:', in1)
print('in2:', in2)

ex1(in1, in2, out)

print('out:', out)

In [None]:
# This test will fail until you fix the ex1 function
try:
    np.testing.assert_almost_equal(out, np.hypot(in1, in2))
except AssertionError as ae:
    print(ae)

## Numba + Pandas/NumPy

Note that Numba cannot compile code that manipulates Pandas objects, but Pandas can call Numba-jitted functions.

So trying to `@jit` code that calls `pd.foo(...)` won't improve it (see https://numba.pydata.org/numba-doc/latest/user/5minguide.html)

However, if we jit an expensive operation, we can use it with Pandas. And Numba can create vectorized functions, stencils, etc. for use with NumPy.

In [None]:
import pandas as pd

df = pd.DataFrame(range(100000), columns=['val'])
df

In [None]:
from numba import njit

@njit
def logistic(x):
    return 1 / (1 + math.exp(-x))

In [None]:
%timeit df.applymap(logistic.py_func)

In [None]:
%timeit df.applymap(logistic)

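`applymap` calls the function once per element, so every call pays Python-level overhead, including the Numba dispatch overhead discussed earlier. (In pandas 2.1 and later, `applymap` is deprecated in favor of the equivalent `DataFrame.map`.) Pandas and NumPy work better with vectorized functions that process a whole array at once.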

Numba can create a vectorized version for us:

In [None]:
from numba import vectorize, float64

@vectorize([float64(float64)])
def logistic_vec(x):
    return 1 / (1 + math.exp(-x))

In [None]:
%timeit df.apply(logistic_vec)

Is this faster than just using NumPy itself?

Sometimes.

More importantly, it's more versatile. Because the body is ordinary Python, we can write conditionals and loops directly instead of expressing everything through NumPy masking, vectorized operations, and broadcasting. This can make the code easier to read and port.

In [None]:
def np_logistic_vec(x):
    return 1 / (1 + np.exp(-x))

In [None]:
%timeit df.apply(np_logistic_vec)

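In this case, when this notebook was originally run, the Numba-vectorized version was noticeably, though not dramatically, faster than the NumPy version. Your results may differ.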