# SWD 6 Notebook 2: Compiling Python with Cython and Numba

## Choosing between Cython and Numba

Both Cython and Numba work by creating a compiled version of part of your code that is then called by the main program. As compiled code usually executes must more quickly than interpreted code, this is an easy gain to improve the execution time of your code.

## Which should I use?

I just need a quick way to improve speed: Use **numba**

I need to distribute my code to other people: Use **Cython**

I need to write faster code that uses numpy arrays: Use **numba**

I need to use advanced Python features: Use **Cython**

## Using Cython

Cython is regular Python but (if we do it properly) with C data types defined. 

We write what is termed **type annotated** code.

We'll use a simple example to create a python **module** containing a single function and go through the *Cythonisation* process.

The following cell will use the IPython 'magic' `%%writefile` instruction to write the contents of the file to disk with the **.pyx** extension.

In [1]:
%%writefile helloworld.pyx
def helloworld():
  """
  A simple function to demonstrate the process of Cythonisation

  inputs: none
  outputs: the string 'Hello World!'
  """
  return "Hello World!"


Overwriting helloworld.pyx


Now we need to create a file setup.py which instructs Python how to create the C version of the `helloworld.pyx` file.

Again using `%%writefile` in a cell to write the contents to disk:

In [2]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)

Overwriting setup.py


Use the `ls` command to check that both the `.pyx` file and `setup.py` are on disk:

In [3]:
!ls

build	      helloworld.cpython-37m-x86_64-linux-gnu.so  sample_data
helloworld.c  helloworld.pyx				  setup.py


In [4]:
!rm helloworld.cpython-37m-x86_64-linux-gnu.so helloworld.c

In [5]:
!ls

build  helloworld.pyx  sample_data  setup.py


We now need to use this to build the Cython (compiled Python) version of `helloworld.pyx`.

This will create a file called `helloworld.so` (so means a shared object file) on a Linux machine or a Mac and a `.pyd` file on a Windows machine.

As we are using Colab, we'll need to run Python as a shell command, so use `!`:

In [6]:
!python setup.py build_ext --inplace

Compiling helloworld.pyx because it changed.
[1/1] Cythonizing helloworld.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'helloworld' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fdebug-prefix-map=/build/python3.7-OGiuun/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -g -fdebug-prefix-map=/build/python3.7-OGiuun/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.7m -c helloworld.c -o build/temp.linux-x86_64-3.7/helloworld.o
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fdebug-prefix-map=/build/python3.7-OGiuun/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.7/helloworld.o -o /content/h

Let's see what that produced:

In [7]:
!ls -l

total 176
drwxr-xr-x 3 root root   4096 Jun  3 20:54 build
-rw-r--r-- 1 root root 102336 Jun  3 21:11 helloworld.c
-rwxr-xr-x 1 root root  60200 Jun  3 21:11 helloworld.cpython-37m-x86_64-linux-gnu.so
-rw-r--r-- 1 root root    170 Jun  3 21:10 helloworld.pyx
drwxr-xr-x 1 root root   4096 Jun  1 13:40 sample_data
-rw-r--r-- 1 root root    123 Jun  3 21:10 setup.py


In this case, the long filename:   

`helloworld.cpython-37m-x86_64-linux-gnu.so`

tells us that we created the file using:

* cpython version 3.7
* on a 64 bit x86 architecture machine
* running Linux
* using the GNU compiler

To use the file, we can import the module in the normal way.

In [8]:
import helloworld as hw
text = hw.helloworld()

In [9]:
print (text)

Hello World!


## Exercise 1: Getting a benchmark timing

From the official Python documentation, we can create a simple Fibonnaci function and use it to print the Fibonacci numbers up to 20000:

In [46]:
%%writefile pairwise_pure.py
import numpy as np

def pairwise_python(X, D):
    M = X.shape[0]
    N = X.shape[1]
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)

Overwriting pairwise_pure.py


In [53]:
%%writefile pairwise_pure_test.py
import pairwise_pure as pp
import numpy as np
X = np.random.random((1000, 3))
D = np.empty((1000,1000))
pp.pairwise_python(X, D)

Writing pairwise_pure_test.py


In [54]:
%%timeit
!python pairwise_pure_test.py

1 loop, best of 5: 4.13 s per loop


Use the `%timeit` function to time how long it takes to execute

In [21]:
%%timeit
fib(20000)

100000 loops, best of 5: 2.88 µs per loop


## Exercise 2: Cythonising Pairwise

Follow the steps from the simple helloworld exercise above to:

* create a `fib.pyx` file containing just this function definition
* create a `setup.py` file
* build it
* use this new module in a test program
* time how long the new program takes to execute

Is there a difference between this Cython version and the version in exercise 1?

In [41]:
%%writefile pairwise_cython.pyx
import numpy as np

def pairwise_python(X, D):
    M = X.shape[0]
    N = X.shape[1]
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
            

Overwriting pairwise_cython.pyx


In [42]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("pairwise_cython.pyx")
)

Overwriting setup.py


In [43]:
!python setup.py build_ext --inplace

Compiling pairwise_cython.pyx because it changed.
[1/1] Cythonizing pairwise_cython.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'pairwise_cython' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fdebug-prefix-map=/build/python3.7-OGiuun/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -g -fdebug-prefix-map=/build/python3.7-OGiuun/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.7m -c pairwise_cython.c -o build/temp.linux-x86_64-3.7/pairwise_cython.o
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fdebug-prefix-map=/build/python3.7-OGiuun/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.7/p

In [40]:
!ls

build
fibonacci.c
fibonacci.cpython-37m-x86_64-linux-gnu.so
fibonacci.pyx
helloworld.c
helloworld.cpython-37m-x86_64-linux-gnu.so
helloworld.pyx
pairwise_cython.c
pairwise_cython.cpython-37m-x86_64-linux-gnu.so
pairwise_cython.pyx
pairwise_num.py
pairwise_pure.py
pairwise_pure.pyx
sample_data
setup.py


In [51]:
%%writefile pairwise_cython_test.py
import pairwise_cython as pc
import numpy as np
X = np.random.random((1000, 3))
D = np.empty((1000,1000))
pc.pairwise_python(X, D)

Writing pairwise_cython_test.py


In [52]:
%%timeit
!python pairwise_cython_test.py

1 loop, best of 5: 3.63 s per loop


## Using numba to make your code faster

Whereas Cython can be used to create and build a compiled C version of a module (or any chunk of Python code) as a separate build event, numba uses a process called **Just In Time** compilation (or jit) to compile the required part of a Python program as and when it is needed.

This is a very quick and easy process and just requires that numba is installed on the target computer. It is built on top of the LLVM compiler stack.

Consider this pairwise distance function:

In [55]:
%%writefile pairwise_num.py
import numpy as np
from numba import jit

@jit
def pairwise_python(X, D):
    M = X.shape[0]
    N = X.shape[1]
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)

Overwriting pairwise_num.py


In [56]:
%%writefile pairwise_num_test.py
import pairwise_num as pn
import numpy as np
X = np.random.random((1000, 3))
D = np.empty((1000,1000))
pn.pairwise_python(X, D)

Writing pairwise_num_test.py


In [57]:
%%timeit
!python pairwise_num.py

1 loop, best of 5: 716 ms per loop
