<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Cython" data-toc-modified-id="Cython-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Cython</a></span><ul class="toc-item"><li><span><a href="#Dot-Product" data-toc-modified-id="Dot-Product-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Dot Product</a></span><ul class="toc-item"><li><span><a href="#Pure-Python" data-toc-modified-id="Pure-Python-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Pure Python</a></span></li><li><span><a href="#Simple-Cython" data-toc-modified-id="Simple-Cython-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Simple Cython</a></span></li><li><span><a href="#Cython-with-Type-Declarations" data-toc-modified-id="Cython-with-Type-Declarations-1.1.3"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Cython with Type Declarations</a></span></li><li><span><a href="#Cython-with-Numpy-and-Type-Declarations" data-toc-modified-id="Cython-with-Numpy-and-Type-Declarations-1.1.4"><span class="toc-item-num">1.1.4&nbsp;&nbsp;</span>Cython with Numpy and Type Declarations</a></span></li></ul></li><li><span><a href="#Conclusion:" data-toc-modified-id="Conclusion:-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Conclusion:</a></span></li></ul></li></ul></div>

# Cython

## Dot Product

The dot product of two vectors $u$ and $v$ of length $N$ is defined as:

$$u \cdot v =  \sum_{i=1}^{N} u_i * v_i $$
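
As a quick check of the formula, here is a minimal sketch that evaluates the sum term by term for two small, hand-picked vectors. The helper name `dot_by_hand` and the values are illustrative only and are not part of the benchmark that follows.

```python
def dot_by_hand(u, v):
    """Sum u[i] * v[i] over all indices, exactly as in the formula above."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    total = 0
    for u_i, v_i in zip(u, v):
        total += u_i * v_i
    return total


# 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
print(dot_by_hand([1, 2, 3], [4, 5, 6]))
```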

In [8]:
%prun dot_product(u, v)

 

In [1]:
%load_ext Cython

In [5]:
# define the size of the vector
N = int(1e5)

In [6]:
# python random library to generate random vectors
import random

In [7]:
u = [random.random() for x in range(N)]
v = [random.random() for x in range(N)]

### Pure Python

In [1]:
# define the dot product in pure Python
dot_product = lambda a, b: sum([i * j for i, j in zip(a, b)])

In [25]:
%timeit -r 7 -n 100 dot_product(u,v)

14 ms ± 835 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Simple Cython

In [17]:
%%cython
dot_product_cy = lambda a, b: sum([i * j for i, j in zip(a, b)])

In [26]:
%timeit -r 7 -n 100 dot_product_cy(u, v)

8.53 ms ± 398 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Cython with Type Declarations

In [33]:
%%cython 
cimport cython


# import malloc (allocates requested memory) and free (releases the memory)
from libc.stdlib cimport malloc, free

# Side note:
# in C, dynamically sized arrays are accessed through pointers and managed by hand;
# in C++, the standard library provides containers such as std::vector,
# which do the memory management for you

cdef double dot_product_cy_typed_eval(double *a, double *b, int size):
    cdef double sum = 0.0
    # Python loops are slow because the interpreter has to work out the type
    # of the loop variable on every iteration; declaring its type fixes that
    cdef int i
    
    for i in range(size):
        sum+= a[i]*b[i]

    return sum


# helper function, copies Python lists to C then runs calculation
def dot_product_cy_typed(list a, list b):
    # get the size of list a, assume b is the same size
    cdef size_t size = len(a)
    
    # allocate memory of a_ and b_ C arrays that will contain a copy of a and b lists, respectively
    cdef double *a_ = <double *> malloc(size * sizeof(double))
    cdef double *b_ = <double *> malloc(size * sizeof(double))
    cdef double result
    

    cdef int i
    # copy python list to C array
    for i in range(size):
        a_[i] = a[i]
        b_[i] = b[i]
    
    # calculate the dot product
    result = dot_product_cy_typed_eval(a_, b_, size)
    
    # Release the memory (IMPORTANT!!)
    free(a_)
    free(b_)
    
    return result
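
For readers without a C compiler at hand, the copy step in the helper above can be mimicked in plain Python. This is only a sketch of the data layout, not of the speed-up: the standard-library `array` module stores C doubles in one contiguous buffer, much like the `malloc`-ed arrays, but the loop is still interpreted. The name `dot_product_buffers` is made up for this illustration.

```python
from array import array


def dot_product_buffers(a, b):
    """Copy two lists into contiguous double buffers, then take the dot product."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # array("d", ...) stores C doubles back to back, like the malloc-ed arrays,
    # but Python releases the memory for us, so no free() is needed.
    a_buf = array("d", a)
    b_buf = array("d", b)
    total = 0.0
    for i in range(len(a_buf)):
        total += a_buf[i] * b_buf[i]
    return total


print(dot_product_buffers([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```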

In [34]:
%timeit -r 7 -n 100 dot_product_cy_typed(u, v)

2.8 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [35]:
import numpy as np

In [36]:
%timeit -r 7 -n 100 np.dot(u, v)

11.7 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


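This is roughly four times faster than `np.dot` here, but the comparison is skewed: `np.dot` is given Python lists and must first convert them to NumPy arrays, and that conversion dominates its running time.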
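
One way to see where `np.dot` spends its time on list inputs is to time the list-to-array conversion separately from the dot product itself. The sketch below does that with the standard `timeit` module. The names `xs`, `ys` and `repeats` are illustrative and chosen so the block does not overwrite `u` and `v`. Absolute numbers will differ from machine to machine.

```python
import random
import timeit

import numpy as np

# Illustrative data, same size as the vectors used above.
n = int(1e5)
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]

repeats = 20

# np.dot on lists: the lists are converted to arrays on every call.
t_lists = timeit.timeit(lambda: np.dot(xs, ys), number=repeats) / repeats

# The list-to-array conversion on its own.
t_convert = timeit.timeit(
    lambda: (np.asarray(xs), np.asarray(ys)), number=repeats
) / repeats

# np.dot once the data already lives in NumPy arrays.
xs_arr, ys_arr = np.asarray(xs), np.asarray(ys)
t_arrays = timeit.timeit(lambda: np.dot(xs_arr, ys_arr), number=repeats) / repeats

print(f"np.dot on lists:    {t_lists * 1e3:.3f} ms")
print(f"list -> array only: {t_convert * 1e3:.3f} ms")
print(f"np.dot on arrays:   {t_arrays * 1e3:.3f} ms")
```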

In [37]:
# ensure that our implementation is correct
np.isclose(dot_product_cy_typed(u, v), np.dot(u, v))

True

### Cython with Numpy and Type Declarations
If we pass NumPy arrays instead of Python lists, we don't have to copy the data into C arrays first, because Cython can read the array's buffer directly. This gives even more of a speed boost.

If, however, we include the time needed to create the NumPy arrays from the lists, we run at almost the same speed as `np.dot(u, v)` on the lists.

In [42]:
%%cython 

cimport cython
cimport numpy as np

# cpdef makes this function callable from both Python and C; the typed buffer
# arguments accept one-dimensional NumPy arrays of doubles (float64)
cpdef double dot_product_cy_np(np.ndarray[double] a, np.ndarray[double] b):
    
    cdef int i
    cdef double result = 0.0
    for i in range(a.size):
        result+= a[i]*b[i]
        
    return result
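
The `np.ndarray[double]` arguments only accept one-dimensional arrays of dtype `float64`; an integer array, for example, is rejected with a buffer dtype mismatch error. A small guard on the Python side can normalise inputs before the call. The helper below is a hypothetical sketch (the name `as_double_vector` is not part of the notebook) and copies only when a conversion is actually needed.

```python
import numpy as np


def as_double_vector(x):
    """Return x as a 1-D float64 array, copying only when a conversion is needed."""
    arr = np.asarray(x, dtype=np.float64)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1-D vector, got {arr.ndim} dimensions")
    return arr


ints = np.arange(5)               # integer dtype, not float64
doubles = as_double_vector(ints)  # converted copy
print(doubles.dtype)              # float64
```

An array that is already one-dimensional `float64` is returned unchanged, so the guard adds no copy on the fast path.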

In [39]:
u_np = np.array(u)
v_np = np.array(v)

In [50]:
%%timeit -r 7 -n 100
# run np.dot with numpy arrays, without accounting for array instantiation time
np.dot(u_np, v_np)

32.8 µs ± 6.54 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [45]:
%%timeit -r 7 -n 100
# run cython function with numpy arrays
dot_product_cy_np(u_np, v_np)

200 µs ± 28.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [47]:
%%timeit -r 7 -n 100
# time with numpy array instantiation
u_np = np.array(u)
v_np = np.array(v)
dot_product_cy_np(u_np, v_np)

14.6 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [48]:
# sanity check
np.isclose(dot_product_cy_np(u_np, v_np), np.dot(u_np, v_np))

True

## Conclusion:

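We've showcased the potential speed gains of Cython: adding type declarations cut the list-based dot product from about 14 ms in pure Python to 2.8 ms. We've also shown NumPy to ultimately be faster and more convenient: on NumPy arrays, `np.dot` took about 33 µs against 200 µs for our typed Cython loop. This reinforces the first rule of optimization: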

**Don't reinvent the wheel. If there exists a sufficient implementation of what you are trying to do, use it.**