### Cython

#### Intro 
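- A superset of Python (almost any Python code is also valid Cython)
- Python code + C data types
- Translated into C, then compiled into a shared library (a Python extension module)
- Benefits:
    - Speed: (1) small gains for plain numerical operations (2) large gains for `for`-loop-heavy code
    - Easy to call into C code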
    
#### Requirements
- Cython + C compiler (http://cython.readthedocs.io/en/latest/src/quickstart/install.html)
- Cython docs: http://cython.readthedocs.io/en/latest/
- http://cython.org/
- https://haoyugsoc.wordpress.com/page/2/

#### Syntax
##### Basic C Types
- char, short, int, long, long long, float, double, long double
- array, pointer, structure, enumeration
- union
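The C standard only guarantees `sizeof(char) == 1` plus minimum ranges for the other types, so actual widths depend on the platform. As a quick way to check them from Python, here is a minimal sketch that uses the standard-library `ctypes` module. `ctypes` is a stand-in here and is not part of Cython, and `c_type_sizes` is a hypothetical helper.

```python
import ctypes

# Basic C types from the list above, mapped to their ctypes equivalents.
C_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "long double": ctypes.c_longdouble,
}


def c_type_sizes():
    """Hypothetical helper: size in bytes of each C type on this platform."""
    return {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES.items()}


if __name__ == "__main__":
    for name, size in c_type_sizes().items():
        print(f"{name:12s} {size} bytes")
```

On 64-bit Linux `long` is usually 8 bytes, while on 64-bit Windows it is 4. That difference is exactly why Cython code that needs a fixed width should use explicit types.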
##### Variable and Type Definitions
- `cdef` <== declares C variables
- `ctypedef` <== like C `typedef`: gives a new name to a C type
    - e.g. `ctypedef int * intPtr`
- Cython vs. C struct, union, enum
<table border="1">
<tr><th>C code</th><th>Cython code</th><th></th><th>C code</th><th>Cython code</th></tr>
<tr><td><pre><code>
struct Grail {
    int age;
    float volume;
};
</code></pre></td>
<td><pre><code>
cdef struct Grail:
    int age
    float volume
</code></pre></td>
<td></td>
<td><pre><code>
union Food {
    char *spam;
    float *eggs;
};
</code></pre></td>
<td><pre><code>
cdef union Food:
    char *spam
    float *eggs
</code></pre></td></tr>
<tr><td><pre><code>
enum CheeseType {
    cheddar, edam,
    camembert
};
</code></pre></td>
<td><pre><code>
cdef enum CheeseType:
    cheddar, edam,
    camembert
</code></pre></td>
<td></td>
<td><pre><code>
enum CheeseState {
    hard = 1,
    soft = 2,
    runny = 3
};
</code></pre></td>
<td><pre><code>
cdef enum CheeseState:
    hard = 1
    soft = 2
    runny = 3
</code></pre></td></tr>
</table>
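To see what these declarations describe at the C level without compiling anything, the same types can be mirrored in plain Python. This sketch uses the standard-library `ctypes` for the typedef, struct and union, and `enum.IntEnum` for the enums. These are stand-ins chosen for illustration, not Cython. It shows that a union's members share memory, so the union is as large as its largest member, and that unassigned C enumerators count up from 0.

```python
import ctypes
from enum import IntEnum

# ctypedef int * intPtr  ->  a named pointer-to-int type
IntPtr = ctypes.POINTER(ctypes.c_int)


# cdef struct Grail: int age; float volume
class Grail(ctypes.Structure):
    _fields_ = [("age", ctypes.c_int), ("volume", ctypes.c_float)]


# cdef union Food: char *spam; float *eggs  (both members share the same memory)
class Food(ctypes.Union):
    _fields_ = [("spam", ctypes.c_char_p), ("eggs", ctypes.POINTER(ctypes.c_float))]


# cdef enum CheeseType: cheddar, edam, camembert  (implicit values start at 0)
class CheeseType(IntEnum):
    cheddar = 0
    edam = 1
    camembert = 2


# cdef enum CheeseState: explicit values
class CheeseState(IntEnum):
    hard = 1
    soft = 2
    runny = 3


if __name__ == "__main__":
    g = Grail(age=3, volume=2.5)
    print(g.age, g.volume)
    # Two pointer members -> the union is exactly one pointer wide.
    print(ctypes.sizeof(Food) == ctypes.sizeof(ctypes.c_void_p))
    x = ctypes.c_int(42)
    print(IntPtr(x).contents.value)  # dereference the typed pointer
```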

##### Functions
- `def`:
    - Python function: takes and returns Python objects
    - Can call `cdef` functions
    - Can be imported from the Cython module where it is defined
    - Can use C **numeric, string and struct** types as parameters (converted automatically); other C types result in a compile-time error.
- `cdef`:
    - C function: takes and returns either Python objects or C values
    - Can call `cdef` functions
    - Cannot be called from Python (it is not exported from the module)
    - Fastest option when the function is only used from Cython code
    - Can use **any** C type as a parameter
    - Returns 0 (i.e. `False`) if there is no explicit `return` (in contrast to C/C++, which leaves the return value undefined)
- `cpdef`:
    - Hybrid function: Cython code calls it as a C function, Python code calls it through a generated Python wrapper
    - Can be called from anywhere
    - Has some overhead compared with a plain `cdef` function, so part of the performance gain is lost

##### Automatic Type Conversions
<pre>
C types                                  | From Python types | To Python types
-----------------------------------------+-------------------+----------------
char, short, int, long                   | int, long         | int
unsigned int, unsigned long, long long   | int, long         | long
float, double, long double               | int, long, float  | float
char*                                    | str               | str
struct, union                            | N/A               | dict
</pre>
(Python 2 type names. Under Python 3 there is a single `int` type, so every integer row converts to and from `int`, and `char*` converts to and from `bytes` rather than `str`.)
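The `struct, union -> dict` row can be pictured with a plain-Python sketch. Here `ctypes` (a stand-in, not Cython) mirrors the `Grail` struct from the table above. The hypothetical helper `struct_to_dict` builds the same kind of dict Cython returns, with one key per field name. The last line shows that under Python 3 a `char*` comes back as `bytes`.

```python
import ctypes


class Grail(ctypes.Structure):
    _fields_ = [("age", ctypes.c_int), ("volume", ctypes.c_float)]


def struct_to_dict(s):
    """Hypothetical helper mimicking Cython's struct -> dict conversion:
    one key per field, each value converted to a Python object."""
    return {name: getattr(s, name) for name, _ in s._fields_}


if __name__ == "__main__":
    print(struct_to_dict(Grail(3, 2.5)))
    # A char* round-trips as bytes under Python 3, not as str.
    print(ctypes.c_char_p(b"spam").value)
```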


##### Statements and Expressions
- **Control structures** and **expressions** follow Python syntax
- Most of the **Python operators** can also be applied to C values, with the obvious semantics.
- When Python objects and C values are mixed in an expression, conversions are performed automatically between Python objects and C numeric or string types.

- Variables must be declared with C data types to get a performance boost.
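The "obvious semantics" of operators on C values has two catches.

1. C integers have a fixed width. Signed overflow is undefined in C, but it wraps around when compiled with `-fwrapv`, as in the gcc command in the Executing section.
2. C's `%` takes the sign of the dividend, while Python's takes the sign of the divisor. By default Cython keeps Python's semantics for `%` and `//` on C integers, because the `cdivision` directive is off.

This plain-Python illustration uses `ctypes.c_int32` and `math.fmod` to emulate the C behaviour. `c_int_add` is a hypothetical helper.

```python
import ctypes
import math


def c_int_add(a, b):
    """Hypothetical helper: add like a 32-bit C int that wraps on overflow
    (ctypes truncates silently, matching gcc -fwrapv behaviour)."""
    return ctypes.c_int32(a + b).value


if __name__ == "__main__":
    print(2**31 - 1 + 1)            # Python int: grows without limit
    print(c_int_add(2**31 - 1, 1))  # C int: wraps to the most negative value
    print(-7 % 3)                   # Python modulo: sign of the divisor
    print(math.fmod(-7, 3))         # C-style remainder: sign of the dividend
```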


#### Executing
- Manual Compilation
    - Cython code is normally saved in files ending with `.pyx`
    - To translate into C: `cython my_module.pyx`
        - Result: `my_module.c`
        - Use the `-a` option to also get an HTML file that annotates the translation line by line
    - Compile the C file:
        - It **must** be compiled into a **shared library**
        - **Include path**: point the compiler at the Python headers with `-I <python_include_dir>`
        - ALWAYS name the output file with the `-o` option
            - (use `.so` as the file extension, following the Unix shared-library convention)
        - e.g. `gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.7 -o my_module.so my_module.c`
        - The above command creates a library called `my_module.so`, which can then be loaded with `import my_module`
- From IPython
    - Load the extension: `%load_ext Cython`
    - Compile:
        - Jupyter: start a cell with `%%cython`; all the code in that cell is compiled into C
        - IPython terminal: use the same `%%cython` cell magic
        - Use `--annotate` to see the code analysis when compiling
            - `%%cython --annotate`

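- Import Cython Module
    - `cimport`: imports C-level declarations (from `.pxd` files, e.g. `cimport numpy as np`) into Cython code at compile time; it does not replace the normal Python `import`
    - pyximport
        1. `import pyximport`     # lets the normal `import` statement load Cython modules
        2. `pyximport.install()`  # installs the import hook
        3. `import <any .pyx module>`  # the .pyx file is compiled on first import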
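For the manual compilation step, the `-I/usr/include/python2.7` path is specific to one machine. This is a minimal sketch, assuming gcc on Linux, that looks up the running interpreter's include directory and extension suffix with the standard library's `sysconfig` instead of hard-coding them. `build_gcc_command` is a hypothetical helper. It only assembles the command and does not run `cython` or `gcc`.

```python
import sysconfig

# Flags copied from the gcc example in the manual-compilation notes.
GCC_FLAGS = ["-shared", "-pthread", "-fPIC", "-fwrapv", "-O2", "-Wall",
             "-fno-strict-aliasing"]


def build_gcc_command(module_name):
    """Hypothetical helper: return the gcc argv that compiles
    <module_name>.c into an importable extension module."""
    include_dir = sysconfig.get_paths()["include"]           # Python headers
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"  # tagged .so name
    return (["gcc"] + GCC_FLAGS
            + ["-I" + include_dir,
               "-o", module_name + suffix,
               module_name + ".c"])


if __name__ == "__main__":
    # Run the printed command after `cython my_module.pyx` has produced my_module.c.
    print(" ".join(build_gcc_command("my_module")))
```

On Linux a plain `my_module.so` is also importable. The tagged `EXT_SUFFIX` name simply records which Python version and ABI the module was built for.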


#### Cython for Numpy
- Indexing improvement for NumPy:
- Indexing a plain `ndarray` goes through Python objects; it would be much more efficient to access the data buffer directly at C speed.
- This is possible by declaring the element type and number of dimensions of the `ndarray` (after `cimport numpy as np`):
    - `np.ndarray[type, ndim=N]`, e.g. `np.ndarray[np.int_t, ndim=2]`
- Some correctness checks can be disabled when maximum speed is required, but at the **cost of safety**.
- Bounds checking can be disabled by adding the decorator `@cython.boundscheck(False)` to the function (after `cimport cython`); `@cython.wraparound(False)` likewise turns off negative-index handling.
- More about compiler directives: http://docs.cython.org/src/reference/compilation.html#compiler-directives
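A typed buffer argument such as `np.ndarray[np.int_t, ndim=2]` accepts only arrays with exactly that dtype and number of dimensions. Cython raises `ValueError` at call time on a mismatch. This minimal sketch shows how a caller might coerce inputs before calling such a function. `as_buffer_arg` is a hypothetical helper, not part of Cython or NumPy.

```python
import numpy as np


def as_buffer_arg(a, dtype, ndim):
    """Hypothetical helper: convert `a` to the exact dtype a typed buffer
    expects, and fail early (like Cython would) on a wrong ndim.
    Note: np.asarray silently truncates floats when dtype is an int type."""
    arr = np.asarray(a, dtype=dtype)
    if arr.ndim != ndim:
        raise ValueError(f"expected {ndim}-d array, got {arr.ndim}-d")
    return arr


if __name__ == "__main__":
    f = as_buffer_arg([[1, 1, 1]], np.int_, 2)
    print(f.dtype, f.shape)
```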

### Some Discussion on Cython
- Turning Python into Cython is a one-way step: once C declarations (`cdef`, typed arguments) are added, the code is no longer valid Python and has to be compiled before it can run.


In [1]:
%load_ext Cython

In [2]:
%%cython --annotate
def cfunc(int n):
    cdef int a = 0
    cdef int i  # typed loop counter so the loop runs as a plain C loop
    for i in range(n):
        a += i
    return a

print(cfunc(10))

45


In [3]:
def primes(kmax):
    p = [None] * 1000 # Initialize the list to the max number of elements
    if kmax > 1000:
        kmax = 1000
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
    
%timeit primes(1000)

1 loop, best of 3: 495 ms per loop


In [10]:
%%cython
def primes_cython(kmax):
    p = [None] * 1000 # Initialize the list to the max number of elements
    if kmax > 1000:
        kmax = 1000
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result



In [14]:
%timeit primes_cython(1000)

1 loop, best of 3: 202 ms per loop


In [16]:
%%cython
def primes_cython_def(int kmax):
    cdef int i, k, n
    cdef int p[1000]
    if kmax > 1000:
        kmax = 1000
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result

In [30]:
%timeit primes_cython_def(1000)

100 loops, best of 3: 5.65 ms per loop


In [31]:
import numpy as np
def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

In [27]:
%%cython
import numpy as np
def convolve1(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

In [26]:
%%cython
cimport numpy as np

import numpy as np
def convolve1_cdef(np.ndarray f, np.ndarray g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
        
        
        
    cdef np.int_t value
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    
    # ATTENTION: cannot declare variables inside loops
    cdef int s_from
    cdef int s_to
    cdef int t_from
    cdef int t_to
    
    # Allocate result image.
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

In [33]:
import numpy as np
f = np.array([[1, 1, 1]], dtype=np.int_)
g = np.array([[1], [2], [1]], dtype=np.int_)
res1 = naive_convolve(f, g)
res2 = convolve1_cdef(f, g)
np.testing.assert_array_equal(res1, res2)  # raises AssertionError if any element differs

In [35]:
%timeit naive_convolve(np.array([[1, 1, 1]], dtype=np.int_), np.array([[1],[2],[1]], dtype=np.int_))
%timeit convolve1(np.array([[1, 1, 1]], dtype=np.int_), np.array([[1],[2],[1]], dtype=np.int_))
%timeit convolve1_cdef(np.array([[1, 1, 1]], dtype=np.int_), np.array([[1],[2],[1]], dtype=np.int_))

1000 loops, best of 3: 161 µs per loop
10000 loops, best of 3: 68.2 µs per loop
10000 loops, best of 3: 53.8 µs per loop
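As an independent cross-check for the Cython versions, the same full (uncropped) convolution can be written with NumPy slicing. It loops only over the small kernel, and each step adds one shifted, weighted copy of the whole image. This is a swapped-in technique for comparison, not one of the original notebook's functions. `convolve_numpy` is a hypothetical name, and the sketch assumes `f` and `g` share a dtype.

```python
import numpy as np


def convolve_numpy(f, g):
    """Hypothetical vectorised reference with the same output as naive_convolve:
    h[x, y] = sum over (i, j) of g[i, j] * f[x - i, y - j],
    output shape (vmax + smax - 1, wmax + tmax - 1)."""
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    vmax, wmax = f.shape
    smax, tmax = g.shape
    h = np.zeros((vmax + smax - 1, wmax + tmax - 1), dtype=f.dtype)
    for i in range(smax):
        for j in range(tmax):
            # Shift the whole image by (i, j) and add it, weighted by g[i, j].
            h[i:i + vmax, j:j + wmax] += g[i, j] * f
    return h


if __name__ == "__main__":
    f = np.array([[1, 1, 1]], dtype=np.int_)
    g = np.array([[1], [2], [1]], dtype=np.int_)
    print(convolve_numpy(f, g))
```

In the notebook it can be compared with the Cython results via `np.testing.assert_array_equal(convolve_numpy(f, g), convolve1_cdef(f, g))`.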
