## Cython

In this notebook, we will use the `Cython` IPython extension (formerly called `cythonmagic`) to demonstrate why and how to use Cython.

Note that this is not the typical usage pattern for Cython. We will also look at how to use Cython in the context of modules and libraries.

But for now, let's load the `Cython` extension. This allows us to mark cells as Cython cells by starting them with `%%cython`.

In [4]:
%load_ext Cython

First, let's see what this is good for.

Consider a very simple function defined in Python:

In [5]:
def my_poly(a,b):
    return 10.5 * a + 3 * (b**2)

The equivalent Cython function is defined below in a `%%cython` cell. Note that the only difference is that we declare the two arguments as double-precision floating-point numbers (C `double`).

**Important**: If this code were written in a regular Python cell, it would produce a syntax error. Cython is a 'dialect' of Python, but it is not exactly like Python. In fact, Cython is a proper superset of Python. That means that any Python code is legitimate Cython code, but not the other way around. We will see one way to deal with this issue in a little bit.

In [6]:
%%cython
def my_polyx(double a, double b):
    return 10.5 * a + 3 * (b**2)

In [7]:
%timeit my_poly(10, 2)
%timeit my_polyx(10, 2)

The slowest run took 8.76 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 715 ns per loop
The slowest run took 8.02 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 208 ns per loop


In this run, the Cython version is roughly 3.4 times faster (715 ns vs. 208 ns per loop), so we can gain a speedup even for a trivial piece of code.
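Outside a notebook, a comparable measurement can be made with the standard-library `timeit` module. The following is only a sketch for the pure-Python function: the loop count `number` is an arbitrary choice, the `lambda` wrapper adds a little call overhead of its own, and the absolute timings will differ from machine to machine.

```python
import timeit

def my_poly(a, b):
    return 10.5 * a + 3 * (b**2)

number = 100_000  # calls per timing run (arbitrary choice for this sketch)

# Repeat the measurement 3 times; each entry is the total time for `number` calls.
runs = timeit.repeat(lambda: my_poly(10, 2), repeat=3, number=number)

# Like %timeit, report the fastest of the repeated runs, converted to ns per call.
best_ns = min(runs) / number * 1e9
print(f"best of 3: {best_ns:.0f} ns per loop")
```

Taking the minimum rather than the mean reduces the influence of unrelated activity on the machine, which is also why `%timeit` reports the best run.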

Let's consider an (only slightly) more interesting example, the calculation of the Fibonacci series that we considered for the profiling examples:

In [8]:
def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a

    return a

For the Cython version of this function, we will use the `cdef` keyword (a Cython keyword!) to declare the types of local variables (C integers in this case).

In [9]:
%%cython
def fibx(int n):
    cdef int i, a, b
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a

In [10]:
%timeit fib(10)
%timeit fibx(10)

1000000 loops, best of 3: 1.05 µs per loop
10000000 loops, best of 3: 62.9 ns per loop


In this case, the speedup is already more than 10-fold (roughly 17-fold in this run: 1.05 µs vs. 62.9 ns per loop).
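One caveat of typed variables is worth keeping in mind: a `cdef int` is a fixed-width C integer, whereas a Python `int` grows as needed. The following sketch assumes a 32-bit signed C `int` (common, but platform dependent) and uses plain Python to find the first `n` for which the result of `fib` no longer fits in that range, so that `fibx` could not return the same value. The name `first_overflow_n` is made up for this illustration.

```python
def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a

INT_MAX = 2**31 - 1  # largest 32-bit signed integer (assumed width of a C int)

# Search for the smallest n whose exact result exceeds INT_MAX.
first_overflow_n = 0
while fib(first_overflow_n) <= INT_MAX:
    first_overflow_n += 1

print(first_overflow_n, fib(first_overflow_n))
```

For inputs at or beyond this point, the typed version and the Python version should not be expected to agree. A wider type such as `long long` postpones the problem but does not remove it.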

Let's pause to consider the implications of this. The C code required to perform the same calculation as `fibx` might look something like this: 

        int fib(int n) {
            int tmp, i, a, b;
            a = b = 1;
            for (i = 0; i < n; i++) {
                tmp = a;
                a += b;
                b = tmp;
            }
            return a;
        }

In and of itself, that's not too terrible, but it can get unpleasant once you write more than this trivial function. The main issue is that integrating this code into a Python program is not trivial and requires writing extension code, which also carries overhead that is hard to optimize by hand. Cython generates highly optimized Python extension code for you, making it easy to separate out performance bottlenecks and compile them, while still calling the functions from your Python code.
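To check that the C sketch above computes the same values as `fib`, it can be transliterated line by line into Python. The name `fib_c_style` is made up for this illustration; the temporary variable `tmp` stands in for the tuple assignment that C lacks.

```python
def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a

def fib_c_style(n):
    # Same steps as the C sketch: save a, update a, then move the old a into b.
    a = b = 1
    for i in range(n):
        tmp = a
        a += b
        b = tmp
    return a

# The two formulations agree for small inputs.
results_match = all(fib(n) == fib_c_style(n) for n in range(30))
print(results_match)
```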

## Optional section: writing Cython that also works as Python

Remember that we mentioned that Cython code is not always syntactically correct Python code? Sometimes you might want to write code that can be compiled as Cython, but would also work as Python. If you want to do that, you can use Cython's pure Python mode, in which type information is supplied through the `cython` module (decorators and function calls) rather than through special syntax. The following cell is a simple example. It can be switched between (un-compiled) Python and (compiled) Cython by adding/removing the `%%cython` cell magic command at the top of the cell.

In [21]:
%%cython
import cython
@cython.locals(n=cython.int)
def fib_pure_python(n):
    cython.declare(a=cython.int,
                   b=cython.int,
                   i=cython.int)
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
    return a


In [22]:
%timeit fib_pure_python(10)

The slowest run took 26.78 times longer than the fastest. This could mean that an intermediate result is being cached 
10000000 loops, best of 3: 53.8 ns per loop


In [13]:
from numba import jit

In [14]:
fib_numba = jit(fib)

In [20]:
fib_numba(10)  # Run once first so that the JIT compilation is not included in the timing
%timeit fib_numba(10)

The slowest run took 13.42 times longer than the fastest. This could mean that an intermediate result is being cached 
10000000 loops, best of 3: 169 ns per loop
