# More Numba

https://numba.pydata.org/numba-doc/0.12.2/tutorial_firststeps.html

This example comes from the Numba "first steps" tutorial linked above.

In [1]:
import numpy as np
import numba

We will use a bubble sort as an example of code that Numba can speed up.

In [2]:
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

Don't write your own bubble sort! It relies on nested Python loops, and explicit loops are slow in interpreted Python.

In practice, use a sort function from a library. Even among the simple O(N²) sorts, an insertion sort is usually faster than a bubble sort.
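As a minimal sketch of the library route, assuming NumPy is the library in question: `np.sort` returns a sorted copy, while the `sort` method sorts the array in place. The variable names `sorted_copy` and `in_place` are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded only so the sketch is repeatable
original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
rng.shuffle(shuffled)

sorted_copy = np.sort(shuffled)           # new sorted array; shuffled is left untouched
in_place = shuffled.copy()
in_place.sort()                           # sorts the array itself, as bubblesort(X) does

print(np.array_equal(sorted_copy, original))
print(np.array_equal(in_place, original))
```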

But for demonstration purposes, here we are with the bubble sort.

Set up an array and make a shuffled copy of it. We will sort the shuffled copy and compare the result to the original.

In [3]:
original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)

Here we sort a copy of the shuffled array and then compare it with the original.

In [4]:
sorted = shuffled.copy()
bubblesort(sorted)
print(np.array_equal(sorted, original))

True


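Now we will use the %timeit magic (an IPython command available in Jupyter notebooks, not plain Python) to time the execution of the code. The timed statement resets sorted to the shuffled values before each sort, so every run does the same work.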

In [5]:
sorted[:] = shuffled[:]
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)

307 ms ± 6.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
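Outside a notebook the %timeit magic is not available. A rough equivalent, sketched here with the standard-library `timeit` module, is shown below. The array is deliberately smaller than the one above so the sketch runs quickly, and the names `work` and `run_once` are made up for this example.

```python
import timeit
import numpy as np

def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            if X[i] > X[i + 1]:
                X[i], X[i + 1] = X[i + 1], X[i]

rng = np.random.default_rng(0)
original = np.arange(0.0, 2.0, 0.01, dtype='f4')   # 200 elements, kept small on purpose
shuffled = original.copy()
rng.shuffle(shuffled)
work = shuffled.copy()

def run_once():
    work[:] = shuffled      # reset inside the timed code, as in the %timeit statement
    bubblesort(work)

# Three repeats of one call each; the minimum is the least noisy estimate.
times = timeit.repeat(run_once, repeat=3, number=1)
print(f"best of {len(times)}: {min(times) * 1e3:.2f} ms")
```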


There are two ways to ask Numba to compile a function: explicitly, by calling numba.jit on it, or by "decorating" the function definition with an indicator that Numba should compile it. With a signature, compilation happens right away; without one, it happens the first time the function is called.

Here is the explicit function call to Numba to compile our bubblesort function.

The call to numba.jit (jit stands for just-in-time compilation) takes the string "void(f4[:])" as its argument.

This argument is called a signature. (The object that numba.jit returns, bubblesort_jit below, is what Numba calls a dispatcher.)

The signature reads as void(float32 array): the return type is void, and the input is a one-dimensional array of 4-byte floats.

Here is the relevant quote from the Numba online manual:

"A signature contains the return type as well as the argument types. One way to specify the signature is using a string, like in our example. The signature takes the form: <return type> ( <arg1 type>, <arg2 type>, ... ). The types may be scalars or arrays (NumPy arrays). In our example, void(f4[:]), it means a function with no return (return type is void) that takes as unique argument an one-dimensional array of 4 byte floats f4[:]. Starting with numba version 0.12 the result type is optional. In that case the signature will look like the following: <arg1 type>, <arg2 type>, .... When the signature doesn’t provide a type for the return value, the type is inferred."




In [6]:
bubblesort_jit = numba.jit("void(f4[:])")(bubblesort)



In [7]:
sorted[:] = shuffled[:] # reset to shuffled before sorting
bubblesort_jit(sorted)
print(np.array_equal(sorted, original))

True


Okay, the check above shows that the compiled version sorts correctly, so let's get the timing on it.

In [8]:
sorted[:] = shuffled[:]
%timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)

1.46 ms ± 32.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Holy cow: 1.46 ms per loop versus 307 ms for the non-compiled version.

That's about 210 times as fast.

Here is a "decorated" version of the same function. The decorator is a way to tell Numba that the function defined right after it should be compiled.

In [9]:
@numba.jit("void(f4[:])")
def bubblesort_jit2(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp



In [10]:
sorted[:] = shuffled[:] # reset to shuffled before sorting
bubblesort_jit2(sorted)  # call the decorated version defined in the previous cell
print(np.array_equal(sorted, original))

True


In [12]:
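sorted[:] = shuffled[:]
# Note: as run, this times bubblesort_jit again; substitute bubblesort_jit2 to time the decorated version
%timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)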

787 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# Signature examples

See

https://numba.pydata.org/numba-doc/0.12.2/tutorial_firststeps.html


Some sample signatures follow:

| signature | meaning |
|---|---|
| void(f4[:], u8) | a function with no return value taking a one-dimensional array of single precision floats and a 64-bit unsigned integer. |
| i4(f8) | a function returning a 32-bit signed integer taking a double precision float as argument. |
| void(f4[:,:],f4[:,:]) | a function with no return value taking two 2-dimensional arrays as arguments. |
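The short type names in these signatures match NumPy's dtype codes, so NumPy itself can be used to look up what each one means. This is a small sketch using only NumPy; the arrays `a1` and `a2` are illustrative.

```python
import numpy as np

# Each code is a kind letter (f = float, i = signed int, u = unsigned int)
# followed by the size in bytes.
for code in ("f4", "f8", "i4", "u8"):
    dt = np.dtype(code)
    print(code, "->", dt.name, f"({dt.itemsize} bytes, kind={dt.kind!r})")

a1 = np.zeros(5, dtype="f4")        # a one-dimensional array, the shape f4[:] describes
a2 = np.zeros((3, 4), dtype="f4")   # a two-dimensional array, the shape f4[:,:] describes
print(a1.ndim, a2.ndim)
```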




# Automatic signature estimation

"Starting with numba version 0.12, it is possible to use numba.jit without providing a type-signature for the function. This functionality was provided by numba.autojit in previous versions of numba. The old numba.autojit hass been deprecated in favour of this signature-less version of numba.jit.

When no type-signature is provided, the decorator returns wrapper code that will automatically create and run a numba compiled version when called. When called, resulting function will infer the types of the arguments being used. That information will be used to generated the signature to be used when compiling. The resulting compiled function will be called with the provided arguments.

For performance reasons, functions are cached so that code is only compiled once for a given signature. It is possible to call the function with different signatures, in that case, different native code will be generated and the right version will be chosen based on the argument types.

For most uses, using jit without a signature will be the simplest option."

Why the manual didn't tell us this up front, I don't know.

In [13]:
bubblesort_autojit = numba.jit(bubblesort)



In [14]:
sorted[:] = shuffled[:]
# Note: the timed statement does not reset sorted, so only the first call sorts shuffled data
%timeit bubblesort_autojit(sorted)

240 µs ± 43 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


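Okay, it looks like the signature-less version runs faster than the version where we set the signature, but the comparison is not like-for-like. In [14] does not reset sorted inside the timed statement, so after the first call it is timing a sort of an already sorted array, which needs no swaps. The "1 loop each" in the output also suggests that the first call, which includes compilation, was slow enough to affect how %timeit chose its loop count.

Either way, the manual recommends jit without a signature for most uses, so signature-less jit it is...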

# Static Libraries

You can use Numba to create libraries of compiled functions that can be imported just like other packages.

This would be worth doing in a project that reuses a lot of functions.


# Using Numba compiled functions with Pandas 

Pandas data frames are used more or less constantly; they work much like R data frames.

They make life much easier when working with structured data.

Pandas uses NumPy "under the hood", so Numba-compiled functions work on data frames when we use the to_numpy() member function to do the conversion.

In [15]:
import pandas as pd

We will create a Pandas data frame from the NumPy array sorted.

In [16]:
sorted[:] = shuffled[:] # reset to shuffled before sorting

psorted=pd.DataFrame(sorted,columns=["A"])

psorted.head()

|   | A |
|---|---|
| 0 | 8.07 |
| 1 | 9.91 |
| 2 | 0.42 |
| 3 | 5.42 |
| 4 | 8.32 |


In [17]:
%timeit bubblesort_autojit(psorted.A.to_numpy())

341 µs ± 6.09 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


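The execution time (341 µs) is of the same order as the 240 µs measured on the bare NumPy array. Note that the member function to_numpy() is also being called inside the timing loop, but that doesn't seem to increase the run time by much. As in In [14], the data is not reset between calls, so all but the first call sort already sorted data.

In this run to_numpy() handed back the column's underlying data rather than a copy, so the in-place sort changed the data frame itself, as the next cell shows. Pandas does not guarantee this behaviour in every version or configuration.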

In [18]:
psorted.head()

|   | A |
|---|---|
| 0 | 0.0 |
| 1 | 0.01 |
| 2 | 0.02 |
| 3 | 0.03 |
| 4 | 0.04 |
