# Interfacing Python with compiled code, Cython

Interfacing Python with compiled code lets you speed up the performance-critical parts of a program.
There are numerous ways to do this:

#### C API to Python and NumPy 

This is a library of C functions and variables that can be used
to create wrapper functions that together with the targeted C code can be compiled into fast
binary Python modules. See: https://docs.python.org/3/extending/extending.html for more information.

#### ctypes module and attribute 

The ctypes module from the Python standard library and the
ctypes attribute of NumPy arrays can be used to create a Python wrapper for an existing
dynamically-loaded library written in C.

#### Cython 

This facilitates the writing of C extensions for Python.

#### weave

This allows the inclusion of C code in Python programs.

#### SWIG 

This automates the process of writing wrappers in C for C functions. SWIG is easy to
use if the argument list is to be limited to builtin Python types but can be cumbersome if
efficient conversion to NumPy arrays is desired. The difficulty is due to the need to match
C array parameters to predefined patterns in the numpy.i interface file.

#### f2py 

This is for interfacing to Fortran.

See http://www.scipy.org/Topical_Software for links to some of these. Presented here is the
use of ctypes. Unlike the use of the C API or SWIG, it permits the interface to be written in
Python.



Let us start by writing some C code, for instance the dot product of two vectors, saved as my_dot.c:

```C
double dot_product(double v[], double u[], int n)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
        result += v[i]*u[i];
    return result;
}
```

Next we compile it and build a shared object (run this in a separate terminal window, not in the notebook):

```bash
gcc -c -Wall -Werror -fpic my_dot.c 
gcc -shared -o my_dot.so my_dot.o
```

The ctypes module of the Python standard library provides definitions of fundamental data types that can be passed to C programs. For example:


In [1]:
import ctypes as C
# These types have names like C.c_int and C.c_double.
# They can be used as constructors, e.g.,
x = C.c_double(2.71828)
# for which x.value returns the corresponding Python object.
print(type(2.71828))
print(type(x))

<type 'float'>
<class 'ctypes.c_double'>


In [2]:
#Fundamental types can be composed to get new types, e.g.,
xp = C.POINTER(C.c_double)()
xp.contents = x
print(xp)
print(x)

<__main__.LP_c_double object at 0x10515f5f0>
c_double(2.71828)


In [3]:
# or simply xp = C.POINTER(C.c_double)(x). You can change the value of x using
xp[0] = 3.14159
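
To make the effect of writing through a pointer visible, here is a minimal self-contained sketch. It uses `C.pointer()`, the ctypes helper that builds a pointer to an existing object, instead of the two-step construction shown above; the variable names simply mirror the cells above.

```python
import ctypes as C

x = C.c_double(2.71828)

# C.pointer() builds a pointer to an existing ctypes object.
xp = C.pointer(x)

# Writing through the pointer changes the object it points to.
xp[0] = 3.14159
print(x.value)

# .contents gives access to the same underlying C double.
xp.contents.value = 1.5
print(x.value)
```

Both `xp[0]` and `xp.contents` refer to the memory owned by `x`, so `x.value` reflects each assignment.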

In [4]:
# Array types can be created by "multiplying" a ctype by a positive integer, e.g.,
ylist = [1.,2.3,4.,5.]
n = len(ylist)
y = (C.c_double*n)()
y[:] = ylist
#or simply
y = (C.c_double*n)(*ylist)
print(ylist)
print(y[0])

[1.0, 2.3, 4.0, 5.0]
1.0
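
The round trip between a Python list and a C array can be sketched as follows. This is only an illustration with the same sample values as above; the names `first`, `back` and `back2` are made up for the example.

```python
import ctypes as C

ylist = [1.0, 2.3, 4.0, 5.0]
y = (C.c_double * len(ylist))(*ylist)   # Python list -> C array

first = y[0]        # indexing with an int gives a Python float
back = y[:]         # slicing gives a Python list
back2 = list(y)     # iterating over the array also works

y[1] = 9.5          # C arrays can be modified in place
print(first, back, back2, y[1], len(y))
```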


The asterisk is a Python operator for expanding the elements of a sequence into the arguments of a
function. Convert a C array back to a Python value or list by indexing it with an int or a slice.
The ctypes module has a utility subpackage to assist in locating a dynamically-loaded library,
e.g.,

In [5]:
import ctypes.util # an explicit import is necessary
C.util.find_library('my_dot')
# searches the system library paths by name and returns None if nothing is found;
# e.g., find_library('m') locates the C math library.
# For loading a library there are constructors, e.g.,
myDL = C.CDLL('./my_dot.so')
print(myDL)
# which makes myDL a module-like object (a CDLL object to be precise).

<CDLL './my_dot.so', handle 105208a50 at 103d6d650>


Similar to a Python module, myDL has as attributes function-like objects (C function pointers to
be precise) which have the same names as the C functions in the library, e.g., myDL.dot_product. These
function-like objects themselves have an attribute restype, which should be set to declare the type
of the result, because ctypes otherwise assumes that the function returns a C int. For a C function whose result type is void, use None.
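
The role of `restype` can be tried out without building `my_dot.so`. The sketch below is an assumption-laden stand-in: instead of our own library it uses `ctypes.pythonapi`, the handle to the C API of the running CPython interpreter, and its function `PyFloat_AsDouble`, which returns a C `double`. It also sets `argtypes`, the companion attribute that declares the argument types.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running interpreter (CPython only).
as_double = ctypes.pythonapi.PyFloat_AsDouble

# Declare the C signature: double PyFloat_AsDouble(PyObject *)
as_double.argtypes = [ctypes.py_object]
as_double.restype = ctypes.c_double   # without this, ctypes assumes int

value = as_double(2.5)
print(value, type(value))
```

With `argtypes` declared, ctypes converts the arguments itself, so the call site needs no explicit wrapping.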

Here is a full example:

In [6]:
from ctypes import CDLL, c_int, c_double
mydot = CDLL('./my_dot.so').dot_product  # './' makes the loader look in the current directory
mydot.restype = c_double  # declare the return type once; the default is int
def dot(vec1, vec2): # vec1, vec2 are Python lists
    n = len(vec1)
    return mydot((c_double*n)(*vec1), (c_double*n)(*vec2), c_int(n))

vec1 = [x for x in range(1000000)]
vec2 = [x for x in range(1000000)]
%timeit dot(vec1,vec2)

1 loop, best of 3: 499 ms per loop


The arguments should be explicitly converted to the appropriate C type. 
The result is automatically converted to a regular Python type, based on the restype attribute.

**Warning.** If you use the extension .so for the name of a file, do not make its stem the same as a
.py file in the same directory, e.g., do not have both a funcs.py and a funcs.so. 


### Repeat the same in Cython

The fundamental nature of Cython can be summed up as follows: Cython is Python with C data types.
As Cython can accept almost any valid Python source file, one of the hardest things in getting started is just figuring out how to compile your extension.

Here is the bare Python implementation of the dot product of two lists/vectors:

In [7]:
#def frange(x, y, jump):
#    while x < y:
#        yield x
#    x += jump
       
def dot_product(vec1,vec2):
    result = 0.0
    n = len(vec1)
    for i in range(n):
        result += vec1[i]*vec2[i]
    return result

vec1 = [x for x in range(1000000)]
vec2 = [x for x in range(1000000)]
%time dot_product(vec1,vec2)

CPU times: user 284 ms, sys: 2.02 ms, total: 286 ms
Wall time: 287 ms


3.3333283333312755e+17

### Prepare cython_dot.pyx file

Let us take the dot_product function and put it in a cython_dot.pyx file:

```python
cimport cython


@cython.boundscheck(False) # Will not check indexing, so ensure indices are valid and non-negative
@cython.wraparound(False)  # Will not allow negative indexing
@cython.cdivision(True)    # Will not check for division by zero
def dot_product(vec1,vec2):
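    # double precision; "cdef float" (single precision) gives a less accurate sum, as in the recorded run below
    cdef double result = 0.0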
    cdef unsigned int n = len(vec1)

    for i in range(n):
        result += vec1[i]*vec2[i]

    return result
```

### Prepare cython_setup.py file

We also need a setup file:

```python
from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(
  name = 'my_dot',
  ext_modules = cythonize("cython_dot.pyx")
)
```

Save it as cython_setup.py and build:

```bash
python cython_setup.py build_ext --inplace
```

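You will now see a `cython_dot.so` file appear in your folder (on recent Python versions the file name also carries a platform tag, e.g. `cython_dot.cpython-*.so`).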

In [13]:
from cython_dot import dot_product
vec1 = [x for x in range(1000000)]
vec2 = [x for x in range(1000000)]
%time dot_product(vec1,vec2)

CPU times: user 41.5 ms, sys: 98 µs, total: 41.6 ms
Wall time: 41.6 ms


3.3338099651261235e+17
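
The value recorded above differs from the pure Python result (3.3333283333312755e+17) in the fourth significant digit, which is what one would expect when the sum is accumulated in a single-precision C `float` rather than a `double`. The sketch below emulates such an accumulator in plain Python by rounding through `ctypes.c_float` after every addition. The helper `to_single` and the smaller `n` are choices made for this illustration only.

```python
import ctypes as C

def to_single(value):
    """Round a Python float (a C double) to the nearest C float."""
    return C.c_float(value).value

n = 100_000  # smaller than in the text, to keep the emulation quick

exact = 0.0      # double-precision accumulator
single = 0.0     # emulated single-precision accumulator
for i in range(n):
    term = float(i * i)
    exact += term
    single = to_single(single + term)

rel_err = abs(single - exact) / exact
print(exact, single, rel_err)
```

The double-precision sum is exact here because every partial sum is an integer below 2**53, while the single-precision accumulator drifts away from it.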

### Use ndarray in Cython code

How can we make the Cython implementation even faster? Use fewer of Python's generic data structures and more NumPy arrays!
Save the following as cython_dot2.pyx:

```python
cimport cython
import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

@cython.boundscheck(False) # Will not check indexing, so ensure indices are valid and non-negative
@cython.wraparound(False) # Will not allow negative indexing
@cython.cdivision(True) # Will not check for division by zero
def dot_product(np.ndarray[DTYPE_t, ndim=1] vec1, np.ndarray[DTYPE_t, ndim=1] vec2):
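    # double precision; "cdef float" (single precision) gives a less accurate sum, as in the recorded run below
    cdef double result = 0.0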
    cdef unsigned int i
    cdef unsigned int n = vec1.shape[0]

    for i in range(n):
        result += vec1[i]*vec2[i]

    return result
```   

and change cython_setup.py to look like this:

```python
#!/usr/bin/env python3

from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize
import numpy as np

setup(
  name = 'my_dot',
  ext_modules = cythonize("cython_dot2.pyx"),
  include_dirs = [np.get_include()]
)
```

Rebuild the shared object and rerun:

In [17]:
import numpy as np
from cython_dot2 import dot_product
vec1 = np.arange(1000000,dtype=float)
vec2 = np.arange(1000000,dtype=float)
%time dot_product(vec1,vec2)

CPU times: user 3.31 ms, sys: 0 ns, total: 3.31 ms
Wall time: 3.32 ms


3.3338099651261235e+17

### Compiler optimization in Cython

Note that we have also included compiler optimization flags in our Cython setup file, through the `extra_compile_args` option (not shown in the listings above):

```python
extra_compile_args = ["-O3", "-ffast-math", "-march=native"],
```
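
For context, one way such flags might be wired into a setup file is sketched below. This is an assumption about the layout, not the file used for the timings in this notebook: it declares the extension explicitly with `setuptools.Extension`, and the flags are in GCC/Clang spelling. Keep in mind that `-ffast-math` lets the compiler reorder floating-point operations, so results may change slightly.

```python
from setuptools import Extension, setup
from Cython.Build import cythonize
import numpy as np

# Hypothetical explicit declaration of the cython_dot2 extension
ext = Extension(
    "cython_dot2",
    sources=["cython_dot2.pyx"],
    include_dirs=[np.get_include()],
    # GCC/Clang flags; other compilers use different spellings
    extra_compile_args=["-O3", "-ffast-math", "-march=native"],
)

setup(name="my_dot", ext_modules=cythonize([ext]))
```

It is built the same way as before, with `python cython_setup.py build_ext --inplace`.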


### OpenMP (demo only)

Finally, we can try improving our Cython code with OpenMP (see https://clang-omp.github.io):

In [18]:
import numpy as np
from cython_dot3 import dot_product
vec1 = np.arange(1000000,dtype=float)
vec2 = np.arange(1000000,dtype=float)
%time dot_product(vec1,vec2)

ImportError: No module named cython_dot3

Continue on to the Data parallelism exercise: [JoblibMultiprocessing.ipynb](JoblibMultiprocessing.ipynb).