# Cython

Cython is more than just another Python library for speeding up scientific code. It consists of two closely related tools in one package:
* a __programming language__ that extends Python with a static type system, and
* a __transpiler__ that translates Cython source into C or C++ code that can subsequently be compiled.

Together, these two let source code that still closely resembles Python approach the performance of C.






__References__
* _Cython: A Guide for Python Programmers_ (2015) by Kurt Smith
* _Learning Cython Programming_ (2013) by Philip Herron
* [Cython's Documentation](https://cython.readthedocs.io/en/stable/index.html)

In [1]:
%load_ext cython

In [2]:
import cython

## Motivation

In [3]:
def integral_pi(n):
    from math import sqrt
    s = 0
    for i in range(n):
        s += sqrt(1 - (i/float(n))**2)
    return s/n
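
As a sanity check on what `integral_pi` computes, the sketch below restates the function in plain Python and compares it with the value it appears to approximate. The sum is a left Riemann sum of sqrt(1 - x^2) over [0, 1], i.e. the area of a quarter unit circle, so the result should be close to pi/4 rather than pi. The choice of `n = 1_000` mirrors the benchmark; everything else is illustrative.

```python
import math


def integral_pi(n):
    """Left Riemann sum of sqrt(1 - x^2) on [0, 1] using n equal slices."""
    s = 0.0
    for i in range(n):
        s += math.sqrt(1 - (i / n) ** 2)
    return s / n


approx = integral_pi(1_000)
# The sum approximates the quarter-circle area pi/4,
# so multiplying by 4 gives an estimate of pi.
print(4 * approx, math.pi)
```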

In [4]:
x = 1_000

%timeit -r 30 -n 100 integral_pi(x)

201 µs ± 9.74 µs per loop (mean ± std. dev. of 30 runs, 100 loops each)


In [5]:
%%cython --annotate

import cython
from libc.math cimport sqrt


cdef double csqrt(double x):
    return sqrt(x)


@cython.cdivision(True)
def c_integral_pi(long n):
    cdef long i
    cdef double s = 0.0
    for i in range(n):  # same summation range as the pure Python integral_pi
        s += csqrt(1 - (i/float(n))**2)
        
    return s/n

In [6]:
x = 1_000

%timeit -r 30 -n 1_000 c_integral_pi(x)

3.43 µs ± 82.5 ns per loop (mean ± std. dev. of 30 runs, 1,000 loops each)


## Types

One crucial difference between high-level languages like Python and low-level languages like C is that the former has a __dynamic type system__ while the latter works with a __static type system__. Statically typed languages enforce a __fixed type__ for each variable at compile time that is either declared in the source code or unambiguously inferred by the compiler. Naturally, this gives the compiler more knowledge about the program and thus enables it to perform type checking at compile time, i.e. checking whether the type constraints are satisfied, and to generate optimized machine code. On the other hand, variables in dynamically typed languages can change their type during runtime and thereby require the interpreter to __perform type checking at runtime__ as well as __dynamically dispatch__ function calls, i.e. determining the appropriate low-level implementation for a function call with given types.
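
The following plain Python sketch illustrates the two runtime costs named above. The function `add` and the variable `value` are hypothetical names chosen for illustration. The same `+` is dispatched to a different implementation depending on the operand types, and a type error only surfaces when the offending line is actually executed.

```python
def add(a, b):
    # Which low-level implementation runs is decided at runtime
    # from the types of the operands (dynamic dispatch).
    return a + b


value = 3          # the name is bound to an int ...
value = "three"    # ... and may later be rebound to a str

print(add(1, 2))        # integer addition
print(add(1.5, 2))      # floating-point addition
print(add("ab", "cd"))  # string concatenation

try:
    add(1, "a")         # the type check happens only when this line runs
except TypeError as err:
    print("TypeError:", err)
```

A statically typed language would reject the last call at compile time instead.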

#### Static type declaration of C variables

To give a variable a static type, one uses the keyword `cdef` followed by the desired C type, e.g. `cdef int x`. A variable in Cython without a static type is an untyped dynamic variable and behaves like an ordinary Python variable. The C modifier `static` is not supported by Cython; `const` is supported, but it is mostly useful in function argument declarations rather than for local variables.

In [7]:
%%cython --annotate

import cython


def fib(int n):
    cdef int a = 0
    cdef int b = 1
    cdef int sum = 0
    cdef int count = 1
    while(count <= n):
        count += 1
        a = b
        b = sum
        sum = a + b
    return sum
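
One consequence of declaring `cdef int` is that the values are fixed-width C integers rather than arbitrary-precision Python integers. The sketch below emulates this in plain Python with `ctypes.c_int32`, assuming a 32-bit `int` that wraps around on overflow, which is typical but not guaranteed by the C standard. The helper names `fib_python` and `fib_int32` are hypothetical.

```python
import ctypes


def fib_python(n):
    """Fibonacci number with Python's arbitrary-precision integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_int32(n):
    """Same recurrence, but every sum is truncated to a signed 32-bit integer."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, ctypes.c_int32(a + b).value
    return a


for n in (46, 47):
    print(n, fib_python(n), fib_int32(n))
```

Both versions agree up to `n = 46`; from `n = 47` on, the 32-bit variant no longer holds the true value. In Cython, declaring the variables as `long` or larger postpones the problem but does not remove it.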

#### Static Declaration with Python Types

Cython also allows variables to be statically declared with built-in Python types, such as lists and dictionaries. This again gives the Cython compiler more static type information with which to optimize the generated code. Variables with statically declared Python types can also be used to initialize dynamically typed variables.

In [8]:
%%cython

import cython


def staticList(n):
    cdef list sints = []
    for i in range(n):
        sints.append(i)

In [9]:
%timeit -r 30 -n 100 staticList(10_000)

154 µs ± 4.92 µs per loop (mean ± std. dev. of 30 runs, 100 loops each)


In [10]:
def dynamicList(n):
    dints = []
    for i in range(n):
        dints.append(i)

In [11]:
%timeit -r 30 -n 100 dynamicList(10_000)

485 µs ± 20.4 µs per loop (mean ± std. dev. of 30 runs, 100 loops each)


## Function Definitions

Cython also supports static declarations of functions. Ordinary Python functions continue to work as expected, so that Python and C functions can call each other easily. Three kinds of function declaration are possible:

* __`def`__
Regular Python functions are declared with `def`, just as expected. They take Python objects as parameters and return Python objects. A function that needs to be called from Python code outside its Cython module has to be declared with `def` or `cpdef`.

* __`cdef`__
With this declaration, Cython creates a function with C calling semantics. Such a function can return either static C types or regular Python objects, but it cannot be called directly from Python.

* __`cpdef`__
Hybrid functions for which Cython generates both a C function and a Python wrapper. Calls from Cython code use the C version, while calls from Python go through the wrapper.

In [12]:
%%cython --annotate


def fib(n):
    a = 0
    b = 1
    
    for i in range(n-1):
        sum = a + b
        a = b
        b = sum
        
    return b

In [13]:
%%cython --annotate


def fib(long n):
    cdef long a = 0
    cdef long b = 1
    
    for i in range(n-1):
        sum = a + b
        a = b
        b = sum
        
    return b

In [14]:
%%cython --annotate


cdef fib(long n):
    cdef long a = 0
    cdef long b = 1
    
    for i in range(n-1):
        sum = a + b
        a = b
        b = sum
        
    return b



In [15]:
%%cython --annotate


cpdef fib(long n):
    cdef long a = 0
    cdef long b = 1
    
    for i in range(n-1):
        sum = a + b
        a = b
        b = sum
        
    return b

## Pure Python Mode

Cython source code generally does not run with the Python interpreter, since Cython is, as a language, a superset of Python. Yet sometimes it is beneficial to retain this capability, for example when collaborating with pure Python developers or for testing and debugging purposes.
To this end, Cython provides the __pure Python mode__, which makes it possible to __augment Python code with Cython features__ in a separate `.pxd` file. However, this mode restricts Cython to static type declarations and code that can be expressed in Python alone. The advantage is the performance gained from Cython's static type system, at the cost of maintaining two separate source files.

In [16]:
%%writefile cython/lecture/pure_python_demo.py


def myfunction(x, y=2):
    a = x - y
    return a + x * y

def _helper(a):
    return a + 1

class A:
    def __init__(self, b=0):
        self.a = 3
        self.b = b

    def foo(self, x):
        print(x + _helper(1.0))

Writing cython/lecture/pure_python_demo.py



In [None]:
%%writefile cython/lecture/pure_python_demo.pxd


cpdef int myfunction(int x, int y=*)
cdef double _helper(double a)

cdef class A:
    cdef public int a, b
    cpdef foo(self, double x)

## Compiler Directives a.k.a. Pragmas

Compiler directives are additional, non-executable commands that specify how a compiler should process the given source code. For instance, OpenMP makes heavy use of pragmas to control the compilation of multi-threaded programs. Cython supports numerous compiler directives, most of which affect its type system or toggle safety checks, and a few of them are especially relevant for performance.


* __`boundscheck`__ By default (`True`), Cython assumes that the index operator `[]` may raise an `IndexError` and checks every access. If set to `False`, these checks are removed, and out-of-bounds accesses result in segfaults or data corruption instead of an exception.

* __`wraparound`__ Python allows negative indices to address elements relative to the end of a sequence, which is not possible in C. By default (`True`), Cython checks for negative indices at runtime and handles them as Python does. If set to `False`, Cython neither checks for nor correctly handles negative indices, so using them may cause segfaults or data corruption.


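* __`cdivision`__ By default (`False`), Cython adjusts the quotient and remainder operators on C integers so that they match the behaviour of Python's ints (the two differ when the operands have opposite signs) and raises a `ZeroDivisionError` when the divisor is 0. If set to `True`, no such adjustments or checks are performed.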
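
To make the effect of `cdivision` concrete, the plain Python sketch below contrasts Python's floor division and modulo with an emulation of C's truncating semantics. The helpers `c_quotient` and `c_remainder` are hypothetical names that only imitate what C does for integer operands; they are not part of Cython.

```python
def c_quotient(a, b):
    """Integer division as C performs it: the result is truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def c_remainder(a, b):
    """Remainder as C performs it: the result takes the sign of the dividend."""
    return a - b * c_quotient(a, b)


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(f"{a} and {b}: Python -> ({a // b}, {a % b}), "
          f"C -> ({c_quotient(a, b)}, {c_remainder(a, b)})")
```

The results only differ when the operands have opposite signs, so enabling `cdivision` is harmless for code that divides non-negative integers by positive ones, as long as the divisor is never 0.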


__References:__

* [Cython's Compiler Directives (last checked 08.2022)](https://cython.readthedocs.io/en/stable/src/userguide/source_files_and_compilation.html?highlight=boundscheck#compiler-directives)

## Python's GIL & Parallelization

With enough exposure to Python, there is no way around the famous global interpreter lock (GIL). Contrary to popular belief, the GIL is not a shortcoming of the Python language itself, but a limitation of its reference implementation CPython. Other implementations such as Jython or IronPython do not rely on it. In the words of the [Python Wiki](https://wiki.python.org/moin/GlobalInterpreterLock), the GIL

> "[...] is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. The GIL prevents race conditions and ensures thread safety."

The GIL facilitates memory management of Python objects by ensuring that only one thread executes Python bytecode at any time during the execution of a CPython program. Technically, the GIL is not necessary, but it makes the implementation of CPython much less complex, and over time more of CPython's features have come to depend on it, reinforcing its necessity.
However, [NoGil](https://github.com/colesbury/nogil) is a recent attempt to develop a CPython fork that is completely GIL-free.
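
The practical effect of the GIL can be observed with a small experiment. The sketch below runs the same CPU-bound pure Python function sequentially and in several threads; the function `count_primes`, the runners, and the problem sizes are hypothetical choices for illustration. On a standard GIL-enabled CPython build, the threaded run is not expected to be noticeably faster, because only one thread executes bytecode at a time. Actual timings depend on the machine and interpreter, so none are quoted here.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound pure Python work: count primes below `limit` by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limit, repeats):
    """Run the task `repeats` times, one after another."""
    return [count_primes(limit) for _ in range(repeats)]


def run_threaded(limit, repeats):
    """Run the task `repeats` times, each in its own thread."""
    results = [None] * repeats

    def worker(slot):
        results[slot] = count_primes(limit)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(repeats)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


for runner in (run_sequential, run_threaded):
    start = time.perf_counter()
    runner(20_000, 4)
    print(f"{runner.__name__}: {time.perf_counter() - start:.3f} s")
```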

Cython code, on the other hand, is compiled to native code rather than to bytecode that is executed by the interpreter. As long as it works only with C-level data and does not touch Python objects, it does not rely on the interpreter's memory management. Consequently, the GIL can be released while the program does not interact with Python objects.


__References:__

* [Cython Documentation](https://cython.readthedocs.io/en/stable/src/userguide/parallelism.html)

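The keyword `nogil` is used in two ways to run code without the GIL:
* `nogil` at the end of a function declaration marks the function as safe to call without the GIL; on its own it does not release the GIL, and
* `with nogil:` is a context manager that actually releases the GIL for the enclosed block, for example around a call to such a function.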

In [17]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp
from cython import wraparound, boundscheck
import numpy as np
from math import exp 
from libc.math cimport exp as c_exp

x = np.random.randn(2_000_000)

@boundscheck(False)
@wraparound(False)
def cy_diff(double[:] x):
    cdef int n = x.size
    cdef double[:] y = np.zeros(n-1)
    cdef int i
    for i in range(n-1):
        y[i] = c_exp(x[i+1] - x[i])
    return y

In [18]:
%timeit -r 30 -n 10 cy_diff(x)

36.7 ms ± 775 µs per loop (mean ± std. dev. of 30 runs, 10 loops each)


In [19]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp

from cython import wraparound, boundscheck
from cython.parallel cimport prange
import numpy as np
cimport openmp
from math import exp 
from libc.math cimport exp as c_exp

x = np.random.randn(2_000_000)

@boundscheck(False)
@wraparound(False)
def cy_pardiff(double[:] x):
    cdef int n = x.size
    cdef double[:] y = np.zeros(n-1)
    cdef int i
    for i in prange(n-1, nogil=True, num_threads=2):
        y[i] = c_exp(x[i+1] - x[i])
    return y

In [20]:
%timeit -r 30 -n 10 cy_pardiff(x)

23.8 ms ± 3.81 ms per loop (mean ± std. dev. of 30 runs, 10 loops each)
