<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Goals" data-toc-modified-id="Goals-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Goals</a></span></li><li><span><a href="#Cython-notebook" data-toc-modified-id="Cython-notebook-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Cython notebook</a></span><ul class="toc-item"><li><span><a href="#Iterating-over-a-python-list-containing-python-objects" data-toc-modified-id="Iterating-over-a-python-list-containing-python-objects-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Iterating over a python list containing python objects</a></span><ul class="toc-item"><li><span><a href="#Python-solution" data-toc-modified-id="Python-solution-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>Python solution</a></span></li><li><span><a href="#Cythonizing-the-function" data-toc-modified-id="Cythonizing-the-function-2.1.2"><span class="toc-item-num">2.1.2&nbsp;&nbsp;</span>Cythonizing the function</a></span></li><li><span><a href="#Numpy-array-of-couple-objects" data-toc-modified-id="Numpy-array-of-couple-objects-2.1.3"><span class="toc-item-num">2.1.3&nbsp;&nbsp;</span>Numpy array of couple objects</a></span></li><li><span><a href="#Numpy-solution" data-toc-modified-id="Numpy-solution-2.1.4"><span class="toc-item-num">2.1.4&nbsp;&nbsp;</span>Numpy solution</a></span></li><li><span><a href="#Custom-type--for-couples" data-toc-modified-id="Custom-type--for-couples-2.1.5"><span class="toc-item-num">2.1.5&nbsp;&nbsp;</span>Custom type  for couples</a></span></li><li><span><a href="#Annotations-in-cython" data-toc-modified-id="Annotations-in-cython-2.1.6"><span class="toc-item-num">2.1.6&nbsp;&nbsp;</span>Annotations in cython</a></span></li></ul></li><li><span><a href="#Example-2)--Find-divisors" data-toc-modified-id="Example-2)--Find-divisors-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Example 2)  Find divisors</a></span></li><li><span><a href="#For-loop-counting" 
data-toc-modified-id="For-loop-counting-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>For loop counting</a></span></li></ul></li></ul></div>

In [1]:
%load_ext cython

In [2]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import Cython
import os
import subprocess
import matplotlib
import pandas

matplotlib.style.use('ggplot')


# Goals

This notebook shows how to define structs in Cython and use them from Python code.



# Cython notebook

To use Cython code inside a cell, start the cell with the ``%%cython`` magic, which compiles it.

## Iterating over a python list containing python objects

Suppose we have a Python list of tuples, where each tuple holds the earnings of a "married couple": component 0 is the female's earnings and component 1 is the male's. We want a function that counts the couples in which the female earns more than the male.

A NumPy solution would store all wages in two arrays, `M` and `F`, and simply compute `np.sum(F>M)`.
Let us first try doing it by iterating over the list.
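
As a small illustration of the `np.sum(F>M)` idea, the sketch below uses two hand-written arrays. The sample values are made up for the example and are not the notebook's data.

```python
import numpy as np

# Illustrative earnings; position i in F and M belongs to the same couple.
F = np.array([90_000, 75_000, 100_000, 150_000])
M = np.array([80_000, 120_000, 100_000, 71_000])

# Element-wise comparison gives a boolean mask; summing it counts the True entries.
mask = F > M
n_women_earning_more = int(np.sum(mask))

print(mask)
print(n_women_earning_more)
```

Ties are not counted, because the comparison is strict.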

In [3]:
import random
from collections import namedtuple
# The `verbose` argument of namedtuple was removed in Python 3.7, so it is omitted here.
Couple = namedtuple('Couple', ['female', 'male'])

In [4]:
def create_couples(n_couples: int):
    """Return a list of n_couples Couple tuples with random earnings."""
    couples = []

    for _ in range(n_couples):
        couple = Couple(random.randint(70_000, 200_000), random.randint(70_000, 200_000))
        couples.append(couple)
    return couples

couples = create_couples(1_000_000)

### Python solution

In [5]:
def count_women_earning_more(couples):
    count = 0
    for c in couples:
        count += c.female > c.male
    return count

In [6]:
%%timeit
count_women_earning_more(couples)

179 ms ± 27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [7]:
count_women_earning_more(couples)

500044
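
The loop above relies on `count += c.female > c.male`, which works because Python's `bool` is a subclass of `int`. The following sketch shows this on a tiny hand-made list (the sample values are assumptions for illustration) and cross-checks the loop against a generator expression.

```python
from collections import namedtuple

Couple = namedtuple("Couple", ["female", "male"])

# Four illustrative couples; the third one is a tie.
couples = [
    Couple(90_000, 80_000),
    Couple(75_000, 120_000),
    Couple(100_000, 100_000),
    Couple(150_000, 71_000),
]


def count_women_earning_more(couples):
    count = 0
    for c in couples:
        # The comparison yields True or False, which add as 1 or 0.
        count += c.female > c.male
    return count


loop_count = count_women_earning_more(couples)
generator_count = sum(c.female > c.male for c in couples)
print(loop_count, generator_count)
```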

### Cythonizing the function 

In [8]:
%%cython -a
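cpdef int cy_count_women_earning_more(couples):
    # Type the counter as a C int; a bare `cdef count = 0` declares a Python object.
    cdef int count = 0
    for c in couples:
        # c.female and c.male are still Python attribute lookups on each tuple.
        count += c.female > c.male
    return count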

In [9]:
%%timeit
cy_count_women_earning_more(couples)

138 ms ± 5.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


### Numpy array of couple objects

In [10]:
import numpy as np
np_couples = np.array(couples)
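
It may help to check what `np.array(couples)` produces before handing it to a typed Cython function. The sketch below uses a three-element list (illustrative values) and compares the array's item size with the size of a C `long`, on the assumption that the Cython buffer is declared with `long` elements. The `astype` line is a hypothetical remedy, not part of the original notebook.

```python
import ctypes
from collections import namedtuple

import numpy as np

Couple = namedtuple("Couple", ["female", "male"])
couples = [Couple(90_000, 80_000), Couple(75_000, 120_000), Couple(150_000, 71_000)]

# Each namedtuple is treated as a sequence, so the result is a 2-D integer array.
np_couples = np.array(couples)

print(np_couples.shape)   # one row per couple, one column per field
print(np_couples.dtype)   # NumPy's default integer type, which depends on the platform

# A buffer declared with C `long` elements needs items of exactly this many bytes.
matches_c_long = np_couples.dtype.itemsize == ctypes.sizeof(ctypes.c_long)
print("itemsize matches C long:", matches_c_long)

# Hypothetical fix when they differ: convert explicitly to NumPy's C-long dtype.
as_c_long = np_couples.astype(np.dtype("l"))
```

On platforms where the two sizes differ, passing the unconverted array to a `long`-typed buffer would be expected to fail with a dtype mismatch.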

In [11]:
%%cython -a
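cimport numpy as cnp
cimport cython

cpdef int np_cy_count_women_earning_more(cnp.ndarray[long, ndim=2] couples):
    # Typed index, bound and counter keep the loop free of Python objects.
    cdef Py_ssize_t n, N = couples.shape[0]
    cdef long count = 0

    for n in range(N):
        # Column 0 holds female earnings, column 1 male earnings.
        count += couples[n, 0] > couples[n, 1]
    return count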

In [12]:
%%timeit
np_cy_count_women_earning_more(np_couples)

1.76 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [14]:
%%timeit
np_couples = np.array(couples)

3.59 s ± 725 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [15]:
%%timeit
np_couples = np.array(couples)
np_cy_count_women_earning_more(np_couples)

2.75 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Numpy solution

In [16]:
%%timeit
np.sum(np_couples[:,0]>np_couples[:,1])

2.73 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Custom type  for couples

In [18]:
%%cython
from libc.stdlib cimport malloc, free
from collections import namedtuple

Couple = namedtuple('Couple', ['female', 'male'])

cdef struct CyCouple:
    int female
    int male


cdef void make_CyCouple_array(list py_couples, CyCouple *cy_couples, Py_ssize_t num_cy_couples):
    """
    Fills an array of CyCouple structs from a list of Python Couple objects.
    At most num_cy_couples entries are written.
    """
    cdef Py_ssize_t i

    for i, py_couple in enumerate(py_couples):
        if i >= num_cy_couples:
            break

        cy_couples[i].female = py_couple.female
        cy_couples[i].male   = py_couple.male


cdef list make_Couple_list(CyCouple *cy_couples, Py_ssize_t num_cy_couples):
    """
    Produces a list of Python Couples from an array of CyCouple structs.
    """
    cdef Py_ssize_t i
    py_couples = []

    for i in range(num_cy_couples):
        py_couples.append(Couple(cy_couples[i].female, cy_couples[i].male))

    return py_couples


cpdef int np_cy_count_women_earning_more2(list py_couples) except -1:

    cdef:
        Py_ssize_t n, N = len(py_couples)
        int count = 0
        # Allocate on the heap, sized to the input. A fixed local array of
        # 1,000,000 structs (about 8 MB) sits on the C stack and is overrun
        # as soon as the list is longer than that.
        CyCouple *cy_couples = <CyCouple *> malloc(N * sizeof(CyCouple))

    if N > 0 and cy_couples == NULL:
        raise MemoryError()

    try:
        make_CyCouple_array(py_couples, cy_couples, N)

        for n in range(N):
            count += cy_couples[n].female > cy_couples[n].male
        return count
    finally:
        # Release the buffer even if the conversion raised.
        free(cy_couples)

In [19]:
len(couples)

1000000

In [20]:
%%time 
np_cy_count_women_earning_more2(couples)

CPU times: user 108 ms, sys: 0 ns, total: 108 ms
Wall time: 107 ms


500044
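
For readers who want to experiment with the struct layout without compiling anything, here is a rough pure-Python analogue built on the standard library's `ctypes`. The names `CCouple`, `make_ccouple_array` and `make_couple_list` are hypothetical and only mirror the Cython helpers above. This is a sketch of the data layout, not a performance comparison.

```python
import ctypes
from collections import namedtuple

Couple = namedtuple("Couple", ["female", "male"])


class CCouple(ctypes.Structure):
    """Two C ints side by side, like the Cython struct (illustrative name)."""
    _fields_ = [("female", ctypes.c_int), ("male", ctypes.c_int)]


def make_ccouple_array(py_couples):
    """Copy a list of Couple tuples into a contiguous array of CCouple structs."""
    arr = (CCouple * len(py_couples))()  # array type sized at run time
    for i, c in enumerate(py_couples):
        arr[i].female = c.female
        arr[i].male = c.male
    return arr


def make_couple_list(arr):
    """Convert the struct array back into a list of Couple tuples."""
    return [Couple(s.female, s.male) for s in arr]


couples = [Couple(90_000, 80_000), Couple(75_000, 120_000), Couple(150_000, 71_000)]
arr = make_ccouple_array(couples)

print(ctypes.sizeof(CCouple), ctypes.sizeof(arr))
count = sum(s.female > s.male for s in arr)
print(count)
```

Because the array type is created from `len(py_couples)`, the buffer always matches the input length.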

In [None]:
%%cython 
def cy_fib(int n):
    cdef int i
    cdef double a=0.0, b=1.0
    for i in range(n):
        a, b = a + b, a
    return a

In [None]:
cy_fib

In [None]:
def fib(n):
    a = 0
    b = 1
    for i in range(n):
        a, b = a + b, a
    return a

In [None]:
fib(10)

In [None]:
cy_fib(10)

In [None]:
%timeit cy_fib(1000)

In [None]:
%timeit fib(1000)

In [None]:
# Timings entered by hand from the two %timeit runs above (assumed to share one unit).
df_time = pandas.DataFrame({"python_fib": [73.6],
                            "cython_fib": [1.13]})

In [None]:
df_time.plot(kind='bar', title="Execution time: Python vs Cython")
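
One caveat when comparing the two versions: `cy_fib` accumulates in `cdef double`, whereas the pure-Python `fib` uses arbitrary-precision integers, so for large `n` the two do not return the same value. The sketch below imitates the double-based version with Python floats (the names `fib_int` and `fib_float` are illustrative) and searches for the first `n` at which the results diverge.

```python
def fib_int(n):
    """Same recurrence as fib, using Python's arbitrary-precision ints."""
    a, b = 0, 1
    for _ in range(n):
        a, b = a + b, a
    return a


def fib_float(n):
    """Same recurrence with float accumulators, mirroring `cdef double a, b`."""
    a, b = 0.0, 1.0
    for _ in range(n):
        a, b = a + b, a
    return a


print(fib_int(10), fib_float(10))

# Find the first n where the float result stops matching the exact integer.
first_mismatch = next(n for n in range(2000) if fib_float(n) != fib_int(n))
print(first_mismatch)
```

The mismatch appears once the Fibonacci numbers exceed 2**53, the largest range in which a double represents every integer exactly. Typing the accumulators as C `int` instead would not help, since that type overflows much earlier.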

### Annotations in cython

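You can display Cython's code analysis by passing the `--annotate` option (or `-a`) to `%%cython`. Lines are shaded yellow in proportion to how much they interact with the Python interpreter, so unshaded lines compile to plain C: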

In [None]:
%%cython --annotate

def cy_fib(int n):
    cdef int i
    cdef double a=0.0, b=1.0
    for i in range(n):
        a, b = a + b, a
    return a

## Example 2)  Find divisors


In [None]:
def all_divisors(x):
    """Return the divisors of x from 1 up to x // 2 (x itself is not included)."""
    divisors = []
    for i in range(1, x // 2 + 1):
        if x % i == 0:
            divisors.append(i)

    return divisors


In [None]:
%timeit all_divisors(10000)
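
Before compiling the loop, it is worth noting that the algorithm itself also matters. The sketch below restates the trial-division function and, for comparison, a hypothetical alternative (`divisors_via_sqrt`, not part of the original notebook) that only tests candidates up to the integer square root and pairs each hit with its cofactor. It assumes `x >= 1` and Python 3.8+ for `math.isqrt`.

```python
import math


def all_divisors(x):
    """Trial division over 1..x//2, as in the cell above (x itself is not listed)."""
    return [i for i in range(1, x // 2 + 1) if x % i == 0]


def divisors_via_sqrt(x):
    """Hypothetical alternative: pair each divisor i <= sqrt(x) with x // i."""
    small, large = [], []
    for i in range(1, math.isqrt(x) + 1):
        if x % i == 0:
            small.append(i)
            if i != x // i:  # avoid listing the root of a perfect square twice
                large.append(x // i)
    return small + large[::-1]


print(all_divisors(12))
print(divisors_via_sqrt(12))
```

Unlike `all_divisors`, the alternative includes `x` itself as the last element, so the two results agree once that final entry is dropped.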

In [None]:
%%cython

def cy_all_divisors(int x):
    cdef int i    
    divisors = []
    
    for i in range(1,x//2+1):
        if x%i ==0:
            divisors.append(i)
    
    return divisors

In [None]:
%timeit cy_all_divisors(10000)

In [None]:
%%cython --annotate

def cy_all_divisors(int x):
    cdef int i    
    divisors = []
    
    for i in range(1,x//2+1):
        if x%i ==0:
            divisors.append(i)
    
    return divisors

## For loop counting

In [None]:
import numpy as np

np.random.seed(1234)
vec = np.random.randint(0,5,10000)

In [None]:
def summing(vec):
    total = 0
    for x in vec:
        total+=x
    return total

In [None]:
%%time
summing(vec)

In [None]:
%%cython --annotate

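def cy_summing1(long[:] vec):
    # Typed memoryview: with a C-typed index, vec[i] compiles to a direct buffer access.
    cdef long total = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t len_vec = vec.shape[0]

    for i in range(len_vec):
        total += vec[i]

    return total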

In [None]:
%%time
cy_summing1(vec)
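
A typed memoryview such as `long[:]` is filled through Python's buffer protocol. As a rough pure-Python analogue, the sketch below wraps a standard-library `array` of C `long` values in a `memoryview` and sums it with an indexed loop. The sample values and the name `summing_indexed` are assumptions for illustration. The Python loop is still interpreted, so this only shows the data layout, not the speedup.

```python
from array import array

# Type code "l" stores the values as C `long`, the element type used by `long[:]`.
vec = array("l", [3, 1, 4, 1, 5, 9, 2, 6])
view = memoryview(vec)


def summing_indexed(view):
    """Indexed loop over a buffer, shaped like the Cython version."""
    total = 0
    for i in range(len(view)):
        total += view[i]
    return total


print(view.format, view.itemsize, view.ndim)
total = summing_indexed(view)
print(total)
```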