# DS-GA-3001 Advanced Python for Data Science

Before you turn this problem in, make sure you **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart). You can then run the cells **in order**, during the class.

Any textual answers that need to be provided will be marked with "YOUR ANSWER HERE". Replace this text with your answer to the question.

Any code answers that need to be provided will be marked with:

```
# YOUR CODE HERE
raise NotImplementedError()
```

Replace all this code with your answer to the question. If you do not answer the question, the `NotImplementedError` exception will be raised, which will indicate to the grader that no answer has been supplied.

In many cases, code answers will also have some associated test code. You should execute the tests after you have entered your code in order to ensure that your answer is correct. You should not proceed to the next question until your answer is correct.

Finally, insert your Net ID and the Net IDs of any collaborators in the cell below.

In [1]:
NET_ID = "jl6583"
COLLABORATORS = ""

---

# Introduction to Numba

Numba speeds up applications by compiling high-performance functions written directly in Python, rather than requiring language extensions such as Cython.

Numba allows the compilation of selected portions of pure Python code to native code, and generates optimized machine code using the [LLVM compiler infrastructure](http://llvm.org/). With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time (JIT) optimized to achieve performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters. Numba works at the function level. From a function, Numba can generate native code for that function as well as the wrapper code needed to call it directly from Python. This compilation is done on-the-fly and in-memory.

Numba’s main features are:

* On-the-fly code generation (at import time or runtime, at the user’s preference)
* Native code generation for the CPU (default) and GPU hardware
* Integration with the Python scientific software stack (thanks to NumPy)

## Compiling code with `@jit`

Numba's central feature is the `numba.jit()` decorator. Using this decorator, it is possible to mark a function for optimization by Numba’s JIT compiler. Various invocation modes trigger differing compilation options and behaviours.

Let's see Numba in action. The following is a Python implementation of bubblesort for NumPy arrays.

In [2]:
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

First we’ll create an array of sorted values and randomly shuffle them.

In [3]:
import numpy as np

original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)

Next, create a copy and do a bubble sort on the copy.

In [4]:
sorted = shuffled.copy()
bubblesort(sorted)
print(np.array_equal(sorted, original))

True


<div class="alert alert-success">
Now let's time the execution. Note that we copy the shuffled array before each run so that every run sorts unsorted data; sorting an already sorted array is faster and would skew the timing.
</div>

In [5]:
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)

1 loops, best of 3: 225 ms per loop
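Outside IPython the `%timeit` magic is not available. As a rough sketch, the same measurement can be made with the standard-library `timeit` module. The array size (200 elements), the `work` buffer, the `run_once` helper, and the repeat counts are arbitrary choices made here to keep the run short; they are not values from the notebook.

```python
import timeit
import numpy as np

def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            if X[i] > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

original = np.arange(200, dtype='f4')   # small array so the pure-Python sort stays quick
shuffled = original.copy()
np.random.shuffle(shuffled)
work = shuffled.copy()

def run_once():
    work[:] = shuffled    # restore the unsorted data before every sort
    bubblesort(work)

# Three batches of five runs each; report the per-run time of the best batch.
best = min(timeit.repeat(run_once, repeat=3, number=5)) / 5
print("best of 3: %.3f ms per loop" % (best * 1e3))
```

Taking the minimum over batches mirrors the "best of 3" figure that `%timeit` reports above.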


The recommended way to use the `@jit` decorator is to let Numba decide when and how to optimize:

In [6]:
from numba import jit
@jit
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

<div class="alert alert-success">
Now let's time the execution of the optimized code. Using the decorator in this way will defer compilation until the first function execution, so the first execution will be significantly slower. Numba will infer the argument types at call time, and generate optimized code based on this information. Numba will also be able to compile separate specializations depending on the input types.
</div>

In [7]:
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)

The slowest run took 78.97 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.86 ms per loop
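The "slowest run" warning above reflects the deferred compilation: the first call pays for compiling the function. The sketch below times the first and second calls separately and then calls the function with a second dtype to trigger another specialization. It passes `nopython=True` explicitly, and the `try`/`except` stand-in for `jit` is an assumption added only so the sketch still runs (slowly, with no compilation) where Numba is not installed.

```python
import time
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: no-op stand-in so the sketch runs without Numba.
    def jit(*args, **kwargs):
        return lambda func: func

@jit(nopython=True)
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            if X[i] > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

shuffled = np.random.permutation(500).astype('f4')

timings = []
for _ in range(2):
    work = shuffled.copy()
    start = time.perf_counter()
    bubblesort(work)
    timings.append(time.perf_counter() - start)

print("first call:  %.4f s (includes compilation when Numba is present)" % timings[0])
print("second call: %.4f s" % timings[1])

# A float64 argument makes Numba compile a second specialization.
bubblesort(np.random.permutation(500).astype('f8'))
# With Numba, `signatures` lists the argument types compiled so far.
print(getattr(bubblesort, "signatures", "not a Numba dispatcher"))
```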


## Function signatures

It is also possible to specify the *signature* of the Numba function. A **function signature** describes the types of the arguments and the return type of the function. This can produce slightly faster code as the compiler does not need to infer the types. However, the function is then no longer able to accept other types. See the [numba.jit()](http://numba.pydata.org/numba-doc/0.24.0/reference/jit-compilation.html#numba.jit) documentation for more information on signatures. For the sort function, this would be:

In [8]:
from numba import jit
@jit("void(f4[:])")
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

<div class="alert alert-success">
Time this code and see if it is any faster than the previous version.
</div>

In [9]:
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)

1000 loops, best of 3: 988 µs per loop
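The `f4` in the signature string `void(f4[:])` is a short name for a 4-byte float, and `[:]` denotes a one-dimensional array. This matches the `dtype='f4'` used when `original` was created. The NumPy-only check below (it does not call Numba) confirms that `'f4'` and `float32` name the same dtype, and shows that an array created without an explicit dtype has a different element type, which is why it would not match the declared signature.

```python
import numpy as np

original = np.arange(0.0, 10.0, 0.01, dtype='f4')
print(original.dtype, original.itemsize, original.ndim)   # float32 4 1

# Without an explicit dtype, NumPy uses 8-byte floats ('f8').
doubles = np.arange(0.0, 10.0, 0.01)
print(doubles.dtype)                                      # float64

print(np.dtype('f4') == np.float32)                       # True
```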


## Compilation options

Numba has two compilation modes: `nopython` mode and `object` mode. In `nopython` mode, the Numba compiler will generate code that does not access the Python C API. This mode produces the highest performance code, **but requires that the native types of all values in the function can be inferred**. In `object` mode, the Numba compiler generates code that handles all values as Python objects and uses the Python C API to perform all operations on those objects. Code compiled in object mode will often run no faster than Python interpreted code. Numba will by default automatically use `object` mode if `nopython` mode cannot be used for some reason. Rather than fall back to `object` mode, it is sometimes preferable to generate an error instead. By adding the `nopython=True` keyword, it is possible to force Numba to do this.

In [10]:
from numba import jit
@jit("void(f4[:])",nopython=True)
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

<div class="alert alert-success">
Notice that this code compiles cleanly. Now let's introduce an object whose type cannot be inferred and see what happens (don't worry, you should see errors).
</div>

In [11]:
from numba import jit
from decimal import Decimal
@jit("void(f4[:])",nopython=True)
def bubblesort(X):
    N = len(X)
    val = Decimal(100)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

TypingError: Caused By:
Traceback (most recent call last):
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 243, in run
    res = stage()
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 458, in stage_nopython_frontend
    self.locals)
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 758, in type_inference_stage
    infer.build_constraint()
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/typeinfer.py", line 494, in build_constraint
    self.constrain_statement(inst)
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/typeinfer.py", line 651, in constrain_statement
    self.typeof_assign(inst)
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/typeinfer.py", line 691, in typeof_assign
    self.typeof_global(inst, inst.target, value)
  File "/Users/luchristopher/anaconda/lib/python2.7/site-packages/numba/typeinfer.py", line 756, in typeof_global
    loc=inst.loc)
TypingError: Untyped global name 'Decimal'
File "<ipython-input-11-707ed1bfaf7c>", line 6

Failed at nopython (nopython frontend)
Untyped global name 'Decimal'
File "<ipython-input-11-707ed1bfaf7c>", line 6

Now when we try to compile this code, Numba complains that `Decimal` is an untyped name. Without `nopython=True`, this code would have compiled in object mode, but would have run much more slowly.

<div class="alert alert-success">
Copy this code into the cell below and remove the `nopython` option. Verify that it compiles cleanly in this case.
</div>

In [12]:
from numba import jit
from decimal import Decimal
@jit("void(f4[:])")
def bubblesort(X):
    N = len(X)
    val = Decimal(100)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
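Whether a bare `@jit` falls back to object mode can depend on the installed Numba version. As a hedged sketch, the two modes can be requested explicitly: `nopython=True` to insist on native types, and `forceobj=True` to ask for object mode. The `scale` function, the `strict` and `relaxed` names, and the no-op `jit` stand-in (used only when Numba is not installed) are hypothetical and not part of the notebook.

```python
from decimal import Decimal
import numpy as np

try:
    from numba import jit
    HAVE_NUMBA = True
except ImportError:
    # Assumption: no-op stand-in so the sketch runs without Numba.
    HAVE_NUMBA = False
    def jit(*args, **kwargs):
        return lambda func: func

def scale(X):
    factor = float(Decimal(2))    # Decimal has no native Numba type
    for i in range(len(X)):
        X[i] = X[i] * factor

strict = jit(nopython=True)(scale)    # must infer native types for everything
relaxed = jit(forceobj=True)(scale)   # explicitly request object mode

data = np.ones(4, dtype='f4')

# Without a signature, compilation happens at the first call,
# so that is where the nopython failure surfaces.
try:
    strict(data.copy())
    strict_error = None
except Exception as exc:
    strict_error = type(exc).__name__
print("nopython mode:", strict_error or "ran without error")

relaxed(data)
print("object mode result:", data)
```

With Numba installed, the `nopython` call is expected to fail with the same kind of typing error shown in the traceback above, while the object-mode version runs.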

## Calling other functions

<div class="alert alert-success">
Numba functions can call other Numba functions. For best performance, both the caller and the callee should carry the `@jit` decorator; otherwise the code will be much slower.
</div>

In [13]:
from numba import jit
@jit("void(f4[:])",nopython=True)
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
                
@jit
def do_sort():
    sorted[:] = shuffled[:]
    bubblesort(sorted)

Time how long it takes to run the `do_sort()` function.

In [14]:
%timeit do_sort()

The slowest run took 128.65 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 971 µs per loop
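As a further sketch of one jitted function calling another, the comparison can be moved into its own small jitted helper, and the arrays can be passed as arguments rather than read from globals. The helper name `out_of_order`, the two-argument `do_sort(src, dst)` form, and the no-op `jit` stand-in (used only when Numba is not installed) are assumptions made for illustration.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: no-op stand-in so the sketch runs without Numba.
    def jit(*args, **kwargs):
        return lambda func: func

@jit(nopython=True)
def out_of_order(a, b):
    # Hypothetical helper: True when two neighbours need swapping.
    return a > b

@jit(nopython=True)
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            if out_of_order(X[i], X[i + 1]):   # jitted function calling a jitted function
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

@jit(nopython=True)
def do_sort(src, dst):
    dst[:] = src        # copy the unsorted data into the output buffer
    bubblesort(dst)

shuffled = np.random.permutation(300).astype('f4')
result = np.empty_like(shuffled)
do_sort(shuffled, result)
print(np.array_equal(result, np.sort(shuffled)))   # True
```

Passing the arrays in explicitly keeps every value the compiler sees typed from the call arguments.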


## NumPy universal functions

Numba’s `@vectorize` decorator allows Python functions taking scalar input arguments to be used as NumPy ufuncs. Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the `@vectorize` decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.

The `@vectorize` decorator has two modes of operation:

* *Eager, or decoration-time, compilation.* If you pass one or more type signatures to the decorator, you will be building a NumPy universal function (ufunc). We're just going to consider eager compilation here.
* *Lazy, or call-time, compilation.* When not given any signatures, the decorator will give you a Numba dynamic universal function (DUFunc) that dynamically compiles a new kernel when called with a previously unsupported input type.
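Although the rest of this section uses eager compilation, the lazy mode can be sketched briefly for contrast: the decorator is applied without signatures, and a kernel is compiled the first time each input type is seen. The fallback to `np.vectorize` is an assumption added only so the sketch runs where Numba is not installed; it is much slower and is not a compiled ufunc.

```python
import numpy as np

try:
    from numba import vectorize
except ImportError:
    # Assumption: slow pure-Python stand-in when Numba is missing.
    vectorize = np.vectorize

@vectorize
def vec_add(x, y):
    return x + y

ints = np.arange(6, dtype=np.int64)
floats = ints.astype(np.float64)

# Each new input type triggers compilation of a matching kernel on first use.
print(vec_add(ints, ints))
print(vec_add(floats, floats))
```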

Using `@vectorize`, you write your function as operating over *input scalars*, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs. The following code defines a function that takes two 64-bit integer arrays and returns a 64-bit integer array.

In [15]:
from numba import vectorize, int64

@vectorize([int64(int64, int64)])
def vec_add(x, y):
    return x + y

In [16]:
a = np.arange(6, dtype=np.int64)
print(vec_add(a, a))
b = np.linspace(0, 10, 6, dtype=np.int64)
print(vec_add(b, b))

[ 0  2  4  6  8 10]
[ 0  4  8 12 16 20]


This works because the elements of both arrays are `int64`, matching the declared signature. If the elements are a different type, and the arguments cannot be safely coerced, then the function will raise an exception:

In [17]:
c = a.astype(float)
print(c)
print(vec_add(c, c))

[ 0.  1.  2.  3.  4.  5.]


TypeError: ufunc 'vec_add' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
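The `'safe'` casting rule named in the error can be queried directly with NumPy's `np.can_cast`. The short check below (NumPy only, no Numba) shows why `float64` inputs are rejected by an `int64` kernel, while narrower integers would be accepted, and why integer inputs can be fed to a `float64` kernel.

```python
import numpy as np

# float64 -> int64 could lose the fractional part, so it is not a safe cast.
print(np.can_cast(np.float64, np.int64, casting='safe'))   # False

# int32 -> int64 always fits, so an int32 array would be accepted.
print(np.can_cast(np.int32, np.int64, casting='safe'))     # True

# int64 -> float64 is treated as safe by NumPy.
print(np.can_cast(np.int64, np.float64, casting='safe'))   # True
```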

<div class="alert alert-success">
Redefine the `vec_add()` function so that it takes `float64` arguments and produces the correct results.
</div>

In [19]:
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def vec_add(x, y):
    return x + y

In [20]:
from nose.tools import assert_equal
c = np.linspace(0, 1, 6)
assert_equal((c * 2 == vec_add(c, c)).all(), True)
print("Correct!")

Correct!
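Because the decorator accepts a list of signatures, one way to support both integer and floating point inputs is to list both, with the integer signature first. This is a sketch rather than the expected answer to the exercise; the `decorator` name and the fallback to `np.vectorize` (used only when Numba is not installed) are assumptions.

```python
import numpy as np

try:
    from numba import vectorize, int64, float64
    # One compiled kernel per signature; NumPy picks the first that matches.
    decorator = vectorize([int64(int64, int64),
                           float64(float64, float64)])
except ImportError:
    # Assumption: slow pure-Python stand-in when Numba is missing.
    decorator = np.vectorize

@decorator
def vec_add(x, y):
    return x + y

a = np.arange(6, dtype=np.int64)
c = np.linspace(0, 1, 6)

int_result = vec_add(a, a)       # uses the integer kernel
float_result = vec_add(c, c)     # uses the floating point kernel
print(int_result.dtype, float_result.dtype)
```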
