# Mixed programming: Cython

## Cython: C-Extensions for Python

## Cython is a **superset** of Python

* Cython is a **superset** of Python, with additional functionality for defining C types and calling C functions
* Cython generates C wrapper code, which is compiled into a Python extension module
* Major advantage: enables incremental code optimization

## `cdef` is used to declare C variables

```cython
cdef int i, j, k
cdef float f, g[42], *h
```

## Cython function definitions

There are three kinds of Cython function definitions: `def`, `cdef` and `cpdef`:

```cython
# Python function: callable from Python, typed arguments are converted on entry.
def foo(int i, char *s):
    ...

# C function: not visible to Python code that imports the module.
cdef int eggs(int i, float f):
    ...

# "Hybrid": generates both a Python wrapper and a C function.
cpdef double foo_2(int i, float f):
    ...
```

**Note**: Argument types may be declared for all three kinds of function; a C return type may be declared for `cdef` and `cpdef` functions.

## Cython optimises based on type definitions  

* If no type is specified for a variable, parameter or return type, it defaults to a Python object
* The standard Python for-loop is used in Cython:

```cython
for i in range(n):
   ...
```   

* If `i` is declared as an integer (with `cdef int i`), this will be optimized into a standard C loop.
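
Declaring `i` with `cdef int` also changes its semantics: a C `int` has a fixed width, whereas a Python integer has arbitrary precision. The sketch below uses the standard-library `ctypes` module purely as a stand-in to show that fixed-width behaviour from plain Python; it illustrates the type, not the code Cython generates.

```python
import ctypes

# Width of a C int on this platform (32 bits on most current systems).
bits = 8 * ctypes.sizeof(ctypes.c_int)
largest = 2 ** (bits - 1) - 1

# A Python integer simply grows past the C limit.
python_next = largest + 1

# ctypes performs no overflow checking, so the value is truncated
# to the fixed width and comes back negative.
c_next = ctypes.c_int(largest + 1).value

print(f"C int width: {bits} bits, largest value: {largest}")
print(f"Python int: {python_next}")
print(f"C int:      {c_next}")
```

For loop counts that may exceed this range, a wider C type such as `long` or `Py_ssize_t` may be the safer declaration.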

## A Cython example

* Approximate the integral of a general function `f(x)`:


<center>
<img src="figs/num_itg.png" style="width: 500px;"/>
Integral of $f(x) = \sin(x^2)$
</center>

* Numerical integration: accuracy increases with number of intervals
* Speed is not a problem in 1D, but may be critical in 3D
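
To make the accuracy claim concrete, here is a minimal sketch of a left Riemann sum applied to a test integrand with a known answer. The choice of `sin(x)` on `[0, pi]` (exact integral 2) and the helper name `left_riemann_sum` are assumptions made for this illustration, not part of the course code.

```python
from math import sin, pi

def left_riemann_sum(func, a, b, n):
    """Approximate the integral of func over [a, b] using n equal intervals."""
    dx = (b - a) / n
    return sum(func(a + i * dx) for i in range(n)) * dx

# Test integrand with a known exact integral: sin(x) over [0, pi] equals 2.
exact = 2.0
errors = {
    n: abs(left_riemann_sum(sin, 0.0, pi, n) - exact)
    for n in (10, 100, 1000)
}

for n, err in errors.items():
    print(f"N = {n:5d}   absolute error = {err:.2e}")
```

The error shrinks as the number of intervals grows, at the price of proportionally more function evaluations.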

## Cython example: Standard Python

Python implementation (not optimized) of the integration:

In [1]:
from math import sin

def f(x):
    return sin(x ** 2)

def integrate_f_python(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

In [2]:
%%timeit -n5
integrate_f_python(0., 7., N=1000000)

416 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
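
Outside a notebook, where `%%timeit` is not available, a comparable measurement can be sketched with the standard-library `timeit` module. The smaller `N` and the repeat count are assumptions chosen to keep the run short, so the absolute numbers will not match the notebook output.

```python
import timeit
from math import sin

def f(x):
    return sin(x ** 2)

def integrate_f_python(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

N = 100_000   # assumption: smaller than in the notebook, to keep this quick
LOOPS = 5     # calls per measurement, like -n5
REPEATS = 3   # assumption: fewer repeats than the 7 runs reported by %%timeit

# Each entry is the total time for LOOPS calls; convert to ms per call.
totals = timeit.repeat(
    lambda: integrate_f_python(0.0, 7.0, N), number=LOOPS, repeat=REPEATS
)
per_call_ms = [1000 * t / LOOPS for t in totals]
print(f"best of {REPEATS}: {min(per_call_ms):.1f} ms per call")
```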


## Cython example: Compilation with setuptools

Our first Cython file, `integral.pyx`, is identical to the Python file, since Python code is legal Cython code. You *could* compile this manually:

```bash
# Translate Cython to C, compile to an object file, then link a shared library.
cython integral.pyx
gcc -fPIC -c $(pkg-config --cflags python3) integral.c
gcc -shared -o integral.so integral.o
```

However, building with `setuptools` is easier (the older `distutils` module was removed from the standard library in Python 3.12).

Make a script `setup.py`:

```python
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="Integration",
    ext_modules=cythonize("*.pyx"),
)
```

and compile the module with

```bash
python setup.py build_ext --inplace
```
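
Once built, the extension is imported like any other Python module. A minimal sketch, assuming the build produced a module named `integral` that exposes `integrate_f_python` (the function name used in the Python file); the `ImportError` fallback is only there so the snippet also runs before the build step.

```python
from math import sin

try:
    # Compiled extension produced by the build step (assumed module name).
    from integral import integrate_f_python as integrate
    backend = "compiled extension"
except ImportError:
    # Pure Python fallback so the sketch runs without the extension.
    def integrate(a, b, N):
        s = 0
        dx = (b - a) / N
        for i in range(N):
            s += sin((a + i * dx) ** 2)
        return s * dx
    backend = "pure Python fallback"

result = integrate(0.0, 1.0, 1000)
print(f"{backend}: integral of sin(x^2) on [0, 1] is approximately {result:.4f}")
```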

## Cython in Notebooks

There is also a Jupyter magic for compiling and running Cython code inside a notebook:

In [3]:
%load_ext Cython

In [4]:
%%cython

from math import sin

def f(x):
    return sin(x**2)

def integrate_f_cython(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

In [5]:
%%timeit -n5
integrate_f_cython(0., 7., N=1000000)

291 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


The `%%cython` magic also accepts the switch `--annotate` (`-a` for short), which displays an annotated view of the code: lines that interact with the Python/C API are highlighted in yellow, and darker shades indicate more interaction.

In [6]:
%%cython --annotate

from math import sin

def f(x):
    return sin(x ** 2)

def integrate_f_cython(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

## Cython example: adding C types

* Simply compiling the Cython file gives only a minor speedup: the loop runs in C, but it still makes numerous calls to the Python/C API
* To get any real speedup, we need to introduce types:

In [7]:
%%cython -a
from libc.math cimport sin

def f(x):            
    return sin(x**2)   

cpdef double integrate_f_cython2(double a, double  b, int N):
    cdef double s = 0
    cdef double dx = (b - a) / N
    cdef int i
    for i in range(N):  # compiles to C loop if i is declared as int
        s += f(a + i * dx)
    return s * dx


In [8]:
%%timeit -n5
integrate_f_cython2(0., 7., N=1000000)

186 ms ± 6.39 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


## Cython example: final version

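A fully typed version runs roughly 8 times faster than the partially typed version, and about 18 times faster than pure Python, in the timings shown here: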

In [11]:
%%cython -a
#from math import sin
from libc.math cimport sin
#cdef extern from "math.h":
#     double sin(double arg)

cdef double f(double x):
    return sin(x ** 2)

cpdef double integrate_f(double a, double b, int N):
    cdef double s = 0
    cdef double dx = (b - a) / N
    cdef int i
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

In [12]:
%%timeit -n5
integrate_f(0., 7., N=1000000)

22.7 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


Speedups can be much higher in other cases, typically when loops are nested.

## Cython example: Adding "more C" gives more speedup

<table border="1">
<thead>
<tr><th align="center">Implementation</th> <th align="center">Timing (normalised to pure Python)</th></tr>
</thead>
<tbody>
<tr><td align="center">Pure Python</td> <td align="center">1.0</td></tr>
<tr><td align="center">Cython, no types</td> <td align="center">0.74</td></tr>
<tr><td align="center"><em>double</em></td> <td align="center">0.64</td></tr>
<tr><td align="center"><em>double</em> + <em>int</em></td> <td align="center">0.40</td></tr>
<tr><td align="center">Types and <em>math.h</em></td> <td align="center">0.12</td></tr>
</tbody>
</table>
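
A normalised timing is each measurement divided by the pure Python measurement. As a sketch, the same arithmetic applied to the `%%timeit` results reported in the notebook cells above gives the values printed below. They differ from the table, which presumably comes from a separate run (an assumption), and the row labels are descriptive names chosen for this example.

```python
# Mean timings in milliseconds, copied from the %%timeit outputs above.
timings_ms = {
    "Pure Python": 416.0,
    "Cython, no types": 291.0,
    "Typed loop, Python-level f": 186.0,
    "Fully typed": 22.7,
}

baseline = timings_ms["Pure Python"]
normalised = {name: t / baseline for name, t in timings_ms.items()}

for name, ratio in normalised.items():
    # Speedup is the inverse of the normalised timing.
    print(f"{name:28s} {ratio:5.2f}   ({1 / ratio:4.1f}x)")
```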

# Cython and numpy

Cython works with numpy arrays as well.

## Example: A pure Python version

Apply `sin` to all numbers in a `numpy` array:

In [None]:
import numpy
from math import sin


def apply_sin_python(a):
    out = numpy.ndarray(len(a), dtype=numpy.double)

    for i in range(len(a)):
        out[i] = sin(a[i])

    return out

## Defining numpy arrays in Cython

```cython
cdef numpy.ndarray[numpy.double_t, ndim=1] out = numpy.ndarray(1000, dtype=numpy.double)
```

Note that the buffer declaration uses the C-level type `numpy.double_t`, made available by `cimport numpy`, rather than the Python-level dtype `numpy.double`.

Translation table:

| Numpy datatype | Cython datatype |
| ------------- |:-------------:|
| numpy.int8 | numpy.int8_t |
| numpy.int16 | numpy.int16_t |
| numpy.float32 | numpy.float32_t |
| numpy.double | numpy.double_t |
| numpy.complex128 | numpy.complex128_t |


## Moving the numpy array to the C side

In [None]:
%%cython -a
import numpy
cimport numpy
cimport cython
from libc.math cimport sin
#cdef extern from "math.h":
#    double sin(double arg)


@cython.boundscheck(False)
@cython.wraparound(False)
cpdef numpy.ndarray[numpy.double_t, ndim=1] apply_sin(numpy.ndarray[numpy.double_t, ndim=1] a):
    cdef int i
    cdef int l = len(a)
    
    cdef numpy.ndarray[numpy.double_t, ndim=1] out = numpy.ndarray(len(a), dtype=numpy.double)

    for i in range(l):
        out[i] = sin(a[i])

    return out

## Comparing performance

In [None]:
a = numpy.linspace(0, 10, 1000000, dtype=numpy.double)

In [None]:
%%timeit -n10
out = apply_sin_python(a)

In [None]:
%%timeit -n10
out = numpy.sin(a)

In [None]:
%%timeit -n10
out = apply_sin(a)

## Cython summary

* Cython pros and cons
    * [+] Allows incremental optimization, easy to access C libraries, active developer community, advanced and flexible.
    * [-] Fully optimized code not as readable as Python.
    * [-] Requires user to have a compiler installed.
* Should be considered (maybe as a first choice?) for mixing Python with C