# Part 4: Parallel programming with Cython

Cython is an extension of the Python language that can deliver performance close to that of compiled C code.

Cython is sometimes described as a hybrid language mixing Python and C. It uses C-like static type declarations to give the Cython compiler enough extra information to generate efficient C code.

In this mini-tutorial, we are going to look at a particular subset of the Cython project that provides parallel programming support, using the `cython.parallel` module.

## A brief crash course in Cython

Before we look at the `cython.parallel` module, however, we should cover some basic Cython to get a feel for how it looks and works compared to pure Python and C code.

Here is a favourite programming example: a function that computes the n-th Fibonacci number, taking `n` as its input argument.

Here it is in __Python__:

In [1]:
def fib(n):
    a = 0.0
    b = 1.0
    for i in range(n):
        a, b = a + b, a
    return a
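
As a quick sanity check, the sketch below repeats the function and prints the first few values. Note that the results are floats, because `a` and `b` start out as `0.0` and `1.0`.

```python
def fib(n):
    a = 0.0
    b = 1.0
    for i in range(n):
        a, b = a + b, a
    return a

# The first eight Fibonacci numbers, returned as floats
first_values = [fib(k) for k in range(8)]
print(first_values)  # [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
```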

Almost all Python code is also valid Cython code, and it behaves the same whether it is run through the Python interpreter or compiled with the Cython compiler. Compiling the above code with Cython as it stands will not improve its performance much, however. This is where the special __Cython__ syntax comes in.

As Cython is a kind of Python-C hybrid, let's consider what the above code would look like translated into __pure C__ code.

```C
/* C version of the Fibonacci calculation */
double fib(int n)
{
    int i;
    double a = 0.0, b = 1.0, tmp;
    for (i=0; i<n; ++i)
    {
        tmp = a;
        a = a + b;
        b = tmp;
    }
    return a;
}    
```

In the C version, we have to define the types of variables we use (e.g. `int`, `double`), in contrast to the way Python infers the types dynamically at runtime. We also have the usual curly braces and semi-colon syntax not present in Python.
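
To make the contrast concrete, here is a small illustration of Python's dynamic typing (the variable names are only for illustration). The same name can be bound to values of different types as the program runs, which is exactly what a C declaration such as `int i;` rules out.

```python
value = 3                           # value is bound to an int
kind_before = type(value).__name__

value = value / 2                   # true division rebinds value to a float
kind_after = type(value).__name__

print(kind_before, kind_after)      # int float
```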

In the __Cython__ version, we can blend the static types of the C code with more Python-like syntax:

_Note for Jupyter notebook users: When using Cython in a Jupyter notebook, you need the following line added at the start of a Cython session:_

`%load_ext Cython`

In [2]:
%load_ext Cython

_And this line needs to be added for every notebook cell that contains Cython code:_

`%%cython`

__Cython__ version of the Fibonacci function:

In [3]:
%%cython
def fib(int n):
    cdef int i
    cdef double a = 0.0
    cdef double b = 1.0
    for i in range(n):
        a, b = a + b, a
    return a

In Cython we can use `cdef` to declare statically typed variables, just like in C. But note that we are still using Python syntax for `for` loops and function definitions. Think of Cython as a superset of the Python language, giving us some 'extra' syntax and keywords that we can use to help speed up our code and make it more C-like.

We have not yet used any parallelism, but once compiled with the Cython compiler this code typically runs much faster than the dynamically interpreted pure-Python version. For a tight numerical loop like this one, a speed-up of an order of magnitude or more is a reasonable expectation, although the exact figure depends on the machine and the compiler.
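
To check a claim like this on your own machine, you need a baseline timing for the interpreted version. The sketch below uses the standard-library `timeit` module. The argument `90` and the repeat count are arbitrary choices for illustration, not values from this tutorial.

```python
import timeit

def fib(n):
    a = 0.0
    b = 1.0
    for i in range(n):
        a, b = a + b, a
    return a

# Time many calls so the measurement is not dominated by timer overhead.
number_of_calls = 10_000
elapsed = timeit.timeit(lambda: fib(90), number=number_of_calls)

per_call_us = elapsed / number_of_calls * 1e6
print(f"pure Python fib(90): {per_call_us:.2f} microseconds per call")
```

In a notebook, running `%timeit fib(90)` after the pure-Python cell and again after the `%%cython` cell gives a direct comparison of the two versions.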

## Parallelism in Cython

Cython alone may offer sufficient performance gains for an application written in Python. However, since we are here to look at parallel programming in Python, let's look at the `cython.parallel` module.

The `prange` function within this module can be used to iterate through a for loop where each iteration can be executed in parallel:

In [6]:
%%cython
from cython.parallel import prange

# First declare the variables we are going to use with cdefs:
cdef int i
cdef int n = 30
cdef int sum = 0

# Use prange instead of range
for i in prange(n, nogil=True):
    sum += i

print(sum)

435


The `prange` function takes extra arguments in addition to the number of items to iterate over, `n`. Here we have passed `nogil=True`, which tells Cython that it can safely release Python's Global Interpreter Lock (GIL) for the duration of the loop. With the GIL released, Python's usual restriction on thread-based parallelism no longer applies, so the iterations are free to be computed in parallel, exploiting multiple CPU cores if they are available on the system. Note that `prange` is built on OpenMP, so the loop only runs in parallel when the extension is compiled with OpenMP enabled (for example with `-fopenmp` for GCC); without it the loop still works but runs serially.
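
Because `sum` is updated with the in-place operator `+=` inside the `prange` loop, Cython treats it as a reduction variable: each thread accumulates its own partial sum, and the partial sums are combined when the loop finishes. The pure-Python sketch below is only an analogy for that pattern, not how Cython or OpenMP implements it. The helper names `partial_sum` and `chunked_sum` are hypothetical, and since ordinary Python threads still hold the GIL, this version shows the structure of the computation rather than any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(start, stop):
    """Sum the integers in [start, stop): one worker's share of the loop."""
    total = 0
    for i in range(start, stop):
        total += i
    return total

def chunked_sum(n, num_workers=4):
    """Split range(n) into contiguous chunks and combine the partial sums."""
    # Ceiling division, with a floor of 1 so that n = 0 is handled cleanly
    chunk = max(1, (n + num_workers - 1) // num_workers)
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda b: partial_sum(*b), bounds))
    # The final combination step plays the role of the reduction
    return sum(partials)

print(chunked_sum(30))  # 435
```

The result matches the `prange` example above, because addition gives the same total regardless of how the iterations are divided between workers.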