# Optimisation - Cython
## Martin Robinson
## Oct 2019

# Using lower-level languages

- Interpreted languages are fundamentally speed-limited when they only consider *type* at run-time.
- e.g. consider what happens with the types of the variables in the following function
```python
def norm(arg_list, p):
    sum = 0               # sum is an int here
    for x in arg_list:    # type of x depends on input container
        sum += abs(x)**p  # type of rhs depends on both x and p, sum could *change* type here
    return sum**(1.0/p)   # return value is probably float due to 1.0
```
- how much memory to allocate for sum? does this memory need to be re-allocated during the loop? are conversion routines between types required during the loop?
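
To make the questions above concrete, here is a small sketch that records the type of the accumulator after every loop iteration. The helper name `trace_accumulator_types` is made up for this illustration and is not part of the course code.

```python
def trace_accumulator_types(arg_list, p):
    """Return the type name of the running total before the loop and after each step."""
    total = 0
    seen = [type(total).__name__]
    for x in arg_list:
        total += abs(x) ** p          # may silently promote total from int to float
        seen.append(type(total).__name__)
    return seen


# Mixed int/float input: the accumulator changes type part-way through the loop
print(trace_accumulator_types([1, 2.5, 3], 2))  # ['int', 'int', 'float', 'float']
```

The interpreter has to be ready for this kind of change on every iteration, which is one reason it cannot generate a tight machine-code loop.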

- compare to the equivalent C++ code
```cpp
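#include <cmath>    // std::pow, std::abs
#include <cstddef>  // std::size_t
#include <vector>

float norm(const std::vector<float>& arg_list, float p) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < arg_list.size(); ++i) {
        sum += std::pow(std::abs(arg_list[i]), p);  // index the element, not the container
    }
    return std::pow(sum, 1.0f / p);
}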
```
- compiler can pre-allocate the stack size because the sizes of local variables are known
- compiler can generate efficient machine code because the programmer has provided more information (i.e. types)
- the programmer has put the required effort into making sure no type conversions are needed
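
The point about known sizes can be sketched from the Python side. The snippet below assumes CPython (where `sys.getsizeof` is available) and uses `ctypes` only to ask for the size of a C `float`. The variable names are illustrative.

```python
import ctypes
import sys

# A C float has one fixed size, known before the program runs
c_float_size = ctypes.sizeof(ctypes.c_float)

# A Python int is a heap object whose size grows with its value (CPython behaviour)
small_int_size = sys.getsizeof(1)
large_int_size = sys.getsizeof(2 ** 100)

print(f"C float: {c_float_size} bytes")
print(f"Python int 1: {small_int_size} bytes, 2**100: {large_int_size} bytes")
```

Because the Python object can grow, its storage cannot be reserved on the stack ahead of time in the way the C++ compiler does for `float sum`.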
    

# "Compiling" Python code

- All Python implementations (CPython, PyPy, IronPython) compile to *bytecode*, which is then either interpreted at run-time, or perhaps further compiled to native machine code
- Implementations that compile to native machine code usually implement something close to normal Python, but with restrictions or additions that alter the nature of the language. These include:
    - Cython (Python-to-C)
    - Nuitka (Python-to-C)
    - Numba (Python-to-LLVM IR)
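
The bytecode step can be inspected with the standard-library `dis` module. This is a minimal sketch for CPython. The exact instruction names differ between Python versions, so no expected output is shown.

```python
import dis


def add(a, b):
    return a + b


# Collect the names of the bytecode instructions CPython generated for add()
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Note that the instruction for `+` carries no type information: the interpreter works out what "add" means for `a` and `b` each time it runs.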

# "Wrapping" C and C++ for use in Python

- the compilers in the previous slide implement an altered version of Python, yet another language to learn!
- If you're already comfortable with C, C++ or Fortran, why not use this directly and write a *wrapper* to call from Python?
- Many available wrappers, including:
    - Pybind11 (C++)
    - F2PY (Fortran)
    - CPython's Python/C API (C)

# Cython

- "Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language"
- That is, it can compile standard Python code, as well as a version of Python with extensions for types etc.
- This supports an iterative workflow: start with standard Python and optimise incrementally

# Your first Cython program

- Here is some Python code that calculates the element-wise matrix p-norm of a 2D NumPy array
- Our aim is to speed it up using Cython

In [None]:
import numpy as np

def norm_py(a, p):
    s = 0
    x_max = a.shape[0]
    y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)

# Manual compilation

- Cython source files end in `.pyx`
- you can manually compile Cython source files to C using the `cython` command

```bash
$ cython yourmod.pyx
```

- In Jupyter notebooks, use the `%%cython` magic command

In [None]:
%load_ext Cython

In [None]:
%%cython
def norm_pyx(a, p):
    s = 0
    x_max = a.shape[0]
    y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)

In [None]:
a = np.random.random((1000, 1000))
p = 2
%timeit norm_py(a, p)

In [None]:
%timeit norm_pyx(a, p)
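
`%timeit` is an IPython magic. Outside a notebook, a comparable measurement can be taken with the standard-library `timeit` module. The sketch below times a list-based stand-in, `norm_lists`, which is a hypothetical helper written for this example so that it runs without NumPy or Cython.

```python
import timeit


def norm_lists(rows, p):
    """Element-wise p-norm of a list of lists (pure-Python stand-in for norm_py)."""
    total = 0.0
    for row in rows:
        for value in row:
            total += abs(value) ** p
    return total ** (1.0 / p)


rows = [[float(i + j) for j in range(100)] for i in range(100)]

# Run 10 calls per measurement, repeat 3 times, and keep the best measurement
calls_per_run = 10
timings = timeit.repeat(lambda: norm_lists(rows, 2), number=calls_per_run, repeat=3)
best_per_call = min(timings) / calls_per_run
print(f"best of 3: {best_per_call * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats reduces the influence of other processes on the measurement.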

# Examining the generated code

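- An important part of the Cython workflow is to examine, and optimise, the generated C code
- The `-a` (annotate) flag produces an HTML report showing the C code generated for each source line, with lines that interact with the Python interpreter highlighted in yellow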

```bash
$ cython -a yourmod.pyx
```

- In Jupyter notebooks, just pass the `-a` flag to the `%%cython` magic command; see the next slide for an example

In [None]:
%%cython -a
def norm_pyx(a, p):
    s = 0
    x_max = a.shape[0]
    y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)

# Adding types

- now let's give the Cython compiler some more information on the types we want to use
- this should reduce the amount of Python interaction (yellow lines) and increase speed
- we now break source compatibility: this is no longer pure Python syntax

In [None]:
%%cython -a
cpdef double norm_pyx(a, double p):
    cdef double s = 0.0
    cdef Py_ssize_t x_max = a.shape[0]
    cdef Py_ssize_t y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)

In [None]:
%timeit norm_py(a, p)

In [None]:
%timeit norm_pyx(a, p)

# memoryviews

- The input NumPy array `a` is still a Python object, and indexing this object is slow
- Cython provides typed *memoryviews* to allow efficient access to memory buffers, such as NumPy arrays

```python
cdef int [:] array_1d_of_ints               # identifiers cannot start with a digit
cdef double [:, :, :] array_3d_of_doubles
cdef function_that_takes_a_1d_array_of_floats(float [:] arg):
    pass  # function body goes here
```
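
Typed memoryviews build on Python's buffer protocol, which the built-in `memoryview` type also exposes. The following pure-Python sketch (not Cython code) uses that built-in type to show the underlying idea: a typed, shaped view onto existing memory, with no copy made. The variable names are illustrative.

```python
from array import array

# A contiguous buffer of six C doubles
buf = array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# View the same memory as a 2x3 grid of doubles
# (the cast goes via bytes because a reshape needs a byte format on one side)
grid = memoryview(buf).cast('B').cast('d', shape=[2, 3])

print(grid.format, grid.shape, grid.itemsize)

# The view shares memory with the buffer: a write to buf is visible through grid
buf[5] = 60.0
print(grid[1, 2])  # 60.0
```

A Cython declaration such as `double [:, :] a` gives the compiler the same information (element type and number of dimensions) at compile time, so indexing can be turned into direct memory access.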


In [None]:
%%cython -a
cpdef double norm_pyx(double [:,:] a, double p):
    cdef double s = 0.0
    cdef Py_ssize_t x_max = a.shape[0]
    cdef Py_ssize_t y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)

In [None]:
%timeit norm_py(a, p)

In [None]:
%timeit norm_pyx(a, p)

# Tuning indexing further

- by default, Cython uses Python behaviour for everything
- this means bounds checking when accessing arrays, divide-by-zero checks, and many other checks that slow down your code.
- **once** you are confident that your code is working as expected and you don't need these checks, you can turn them off
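
As a reminder of what these checks do, here is a plain-Python sketch of the behaviours involved. Nothing here is Cython-specific; it only shows the semantics that the compiled code reproduces by default.

```python
values = [10.0, 20.0, 30.0]

# Wraparound: a negative index counts from the end of the sequence
last = values[-1]

# Bounds checking: an out-of-range index raises instead of reading stray memory
try:
    values[3]
    out_of_range_caught = False
except IndexError:
    out_of_range_caught = True

# Division checking: dividing by zero raises an exception
try:
    1.0 / 0.0
    zero_division_caught = False
except ZeroDivisionError:
    zero_division_caught = True

# Python integer division rounds towards negative infinity (C truncates towards zero, giving -3)
floor_result = -7 // 2

print(last, out_of_range_caught, zero_division_caught, floor_result)  # 30.0 True True -4
```

Each of these guarantees costs extra instructions per array access or division, which is why switching them off can speed up an inner loop.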

In [None]:
%%cython -a
from libc.math cimport fabs  # fabs is the libc absolute value for doubles; other libc functions can be cimported the same way
cimport cython

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing
@cython.cdivision(True)     # Deactivate normal Python division checking
cpdef double norm_pyx(double [:, :] a, int p):
    cdef double s = 0.0
    cdef Py_ssize_t i, j    # typed loop indices keep the indexing in C
    cdef Py_ssize_t x_max = a.shape[0]
    cdef Py_ssize_t y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += fabs(a[i, j])**p
    return s**(1.0/p)


In [None]:
%timeit norm_py(a, p)

In [None]:
%timeit norm_pyx(a, p)

# Extension types

- Cython can compile normal Python classes
- It can also define *extension types*, aka cdef classes, which are more efficient
- Take for example this pure Python class that implements a simple ODE model

In [None]:
class Model:
    def __init__(self, dt):
        self._dt = dt
        self._y0 = 1.0
        
    def dydx(self, p, y):
        return -p * y

    def evaluate(self, p, time):
        timesteps = int(time / self._dt)
        y = self._y0
        for i in range(timesteps):
            y += self._dt * self.dydx(p, y)
        return y

- This is the equivalent cdef class

In [None]:
%%cython -a
cdef class Model_pyx:                  # add cdef to convert to extension type
    cdef double _dt                    # define C class variables as attributes using the cdef syntax
    cdef public double _y0             # use public keyword to enable access from python
    
    def __cinit__(self, double dt):    # __cinit__ is equivalent to a C++ constructor (__init__ might not be called)
        self._dt = dt 
        self._y0 = 1.0
        
    cdef double dydx(self, double p, double y): # cdef functions cannot be called from python
        return -p*y

    cpdef double evaluate(self, double p, double time): # cpdef functions *can* be called from python
        cdef int timesteps = int(time / self._dt)
        cdef double y = self._y0
        cdef double tmp
        for i in range(timesteps):
            tmp = self.dydx(p, y)
            y += self._dt * tmp
        return y

In [None]:
%timeit Model_pyx(1e-4).evaluate(1.0, 1.0)

In [None]:
%timeit Model(1e-4).evaluate(1.0, 1.0)
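
Before comparing timings it is worth confirming that the time-stepping loop gives the right answer. The sketch below assumes the decay model dy/dt = -p*y used in the cdef class, whose exact solution is y0*exp(-p*t). The function `euler_decay` is a stand-alone illustration, not part of either class.

```python
import math


def euler_decay(p, dt, time, y0=1.0):
    """Explicit Euler integration of dy/dt = -p*y from t = 0 to t = time."""
    timesteps = int(time / dt)
    y = y0
    for _ in range(timesteps):
        y += dt * (-p * y)
    return y


approx = euler_decay(p=1.0, dt=1e-3, time=1.0)
exact = math.exp(-1.0)
error = abs(approx - exact)
print(f"euler={approx:.6f} exact={exact:.6f} error={error:.2e}")
```

Explicit Euler is first-order accurate, so the error should shrink roughly in proportion to `dt`. The same check can be applied to the results of `Model` and `Model_pyx`.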

# Packaging Cython programs

- The `setup.py` in the provided code is used when you `pip install` the package
- This is where you use Cython to compile any pyx files, e.g.

```python
from setuptools import setup, find_packages
from Cython.Build import cythonize

setup(
    name='test',
    # ...
    packages=find_packages(include=('test',)),  # trailing comma makes this a tuple
    ext_modules=cythonize('test/my_cython_code.pyx'),
    # ...
)
```

- note that `cythonize` is the function equivalent of the command-line `cython`