<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Goals" data-toc-modified-id="Goals-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Goals</a></span></li><li><span><a href="#Iterating-over-a-python-list-containing-python-objects" data-toc-modified-id="Iterating-over-a-python-list-containing-python-objects-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Iterating over a python list containing python objects</a></span><ul class="toc-item"><li><span><a href="#Python-solution" data-toc-modified-id="Python-solution-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Python solution</a></span></li><li><span><a href="#Cythonizing-the-function" data-toc-modified-id="Cythonizing-the-function-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Cythonizing the function</a></span></li><li><span><a href="#Numpy-array-of-couple-objects" data-toc-modified-id="Numpy-array-of-couple-objects-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Numpy array of couple objects</a></span></li><li><span><a href="#Numpy-solution" data-toc-modified-id="Numpy-solution-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Numpy solution</a></span></li><li><span><a href="#Custom-type--for-couples" data-toc-modified-id="Custom-type--for-couples-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Custom type  for couples</a></span></li><li><span><a href="#cdef-struct-objects-are-not-accessible-from-python:-Defining--cdef-class-CyCouple" data-toc-modified-id="cdef-struct-objects-are-not-accessible-from-python:-Defining--cdef-class-CyCouple-2.6"><span class="toc-item-num">2.6&nbsp;&nbsp;</span><code>cdef struct</code> objects are not accessible from python: Defining  <code>cdef class CyCouple</code></a></span></li><li><span><a href="#cdef-classes-also-benefit-standard-python-functions" data-toc-modified-id="cdef-classes-also-benefit-standard-python-functions-2.7"><span class="toc-item-num">2.7&nbsp;&nbsp;</span><code>cdef classes</code> also benefit standard python 
functions</a></span></li></ul></li></ul></div>

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
%load_ext cython
%timeit

import Cython
import os
import subprocess
import matplotlib
matplotlib.style.use('ggplot')
import pandas

# Goals

This notebook shows how to define structs in Cython and use them from your Python code.




To use Cython code inside a cell, start the cell with the ``%%cython`` magic, which compiles it.

# Iterating over a python list containing python objects

Suppose we have a Python list of tuples. Each tuple contains the earnings of a "married couple": component 0 holds the female earnings and component 1 the male earnings. We want a function that counts the number of couples where the female earns more than the male.

A NumPy solution would be to store all wages in two arrays, `M` and `F`, and then simply compute `np.sum(F>M)`.
Here we try instead to do it by iterating over a list of objects.
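For reference, the vectorized approach mentioned above might look like the following sketch. The helper name `np_count_women_earning_more` and the small random arrays are illustrative assumptions, not names defined elsewhere in this notebook; the seed is arbitrary and only makes the run repeatable.

```python
import numpy as np

def np_count_women_earning_more(F, M):
    """Count positions where the female wage exceeds the male wage."""
    # F > M builds a boolean array; summing it counts the True entries.
    return int(np.sum(F > M))

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
F = rng.integers(70_000, 200_001, size=1_000)  # upper bound is exclusive
M = rng.integers(70_000, 200_001, size=1_000)

n_more = np_count_women_earning_more(F, M)
```

The comparison and the sum both run inside NumPy, so no Python-level loop over couples is needed.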

In [2]:
import random
from collections import namedtuple
Couple = namedtuple('Couple', ['female', 'male'])

In [23]:
def create_couples(n_couples:int):
    couples = []

    for n in range(n_couples):
        couple = Couple(random.randint(70_000,200_000),random.randint(70_000,200_000))
        couples.append(couple)
    return couples

couples = create_couples(1_000_000)

## Python solution

In [30]:
def py_count_women_earning_more(couples):
    count = 0
    for c in couples:
        count += c.female > c.male
    return count

In [35]:
%%timeit -o
py_count_women_earning_more(couples)

192 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


<TimeitResult : 192 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)>

In [36]:
timeit_py_count_women_earning_more = _

## Cythonizing the function 

In [37]:
%%cython -a
cpdef int cy_count_women_earning_more(couples):
    cdef int count = 0  # typed as a C int; an untyped cdef declares a Python object
    for c in couples:
        count += c.female > c.male
    return count

In [38]:
%%timeit -o
cy_count_women_earning_more(couples)

137 ms ± 796 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


<TimeitResult : 137 ms ± 796 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)>

In [39]:
timeit_cy_count_women_earning_more = _

In [42]:
speedup = timeit_py_count_women_earning_more.average/timeit_cy_count_women_earning_more.average

In [43]:
speedup

1.404035409444977
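Outside IPython, a similar ratio can be obtained with the standard-library `timeit` module. The sketch below is an assumption about how one might do that: `best_time` and `gen_count_women_earning_more` are hypothetical names (the compiled Cython function is not available in plain Python, so a second pure-Python implementation stands in for it), and the list is kept small so the example runs quickly. It uses the best time rather than the mean reported by `%%timeit`.

```python
import random
import timeit
from collections import namedtuple

Couple = namedtuple('Couple', ['female', 'male'])

def create_couples(n_couples):
    return [Couple(random.randint(70_000, 200_000), random.randint(70_000, 200_000))
            for _ in range(n_couples)]

def py_count_women_earning_more(couples):
    count = 0
    for c in couples:
        count += c.female > c.male
    return count

def gen_count_women_earning_more(couples):
    # Hypothetical alternative implementation, used only as a comparison target.
    return sum(1 for c in couples if c.female > c.male)

def best_time(func, data, repeat=3, number=5):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(lambda: func(data), repeat=repeat, number=number)) / number

small_couples = create_couples(10_000)
t_base = best_time(py_count_women_earning_more, small_couples)
t_alt = best_time(gen_count_women_earning_more, small_couples)
speedup_sketch = t_base / t_alt  # a value above 1 means the alternative is faster
```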

## Custom type  for couples

Now we will study the performance of `cy_count_women_earning_more_from_py_couples` which takes as input `py_couples` (a list of python namedtuples), generates a list of CyCouple structs and then performs the computation.

Notice that this method builds a second container, but because that container is a C array of compact structs rather than a list of Python objects, the counting loop itself should be faster.
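The idea of copying Python objects into a contiguous block of C structs can be tried from plain Python with the standard-library `ctypes` module. This is only an analogy, not the same mechanism as a Cython `cdef struct`: `CCouple`, `make_ccouple_array` and `count_women_earning_more` are hypothetical names, and the loop still runs in the Python interpreter, so the sketch shows the data layout rather than the speed-up.

```python
import ctypes
from collections import namedtuple

Couple = namedtuple('Couple', ['female', 'male'])

class CCouple(ctypes.Structure):
    # Two C ints, mirroring a struct with `int female` and `int male`.
    _fields_ = [("female", ctypes.c_int), ("male", ctypes.c_int)]

def make_ccouple_array(py_couples):
    """Copy a list of Couple namedtuples into a contiguous array of C structs."""
    array_type = CCouple * len(py_couples)
    c_couples = array_type()  # zero-initialised block of len(py_couples) structs
    for i, py_couple in enumerate(py_couples):
        c_couples[i].female = py_couple.female
        c_couples[i].male = py_couple.male
    return c_couples

def count_women_earning_more(c_couples):
    return sum(1 for c in c_couples if c.female > c.male)

py_couples = [Couple(90_000, 80_000), Couple(75_000, 120_000), Couple(150_000, 150_000)]
c_couples = make_ccouple_array(py_couples)
result = count_women_earning_more(c_couples)
```

The array occupies `len(py_couples) * ctypes.sizeof(CCouple)` bytes in one block, which is the property the struct-based version relies on.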


In [165]:
%%cython -a
cimport numpy as cnp
cimport cython


from collections import namedtuple

Couple = namedtuple('Couple', ['female', 'male'])

cdef struct CyCouple:
    int female
    int male


cdef void make_CyCouple_array(list py_couples, CyCouple *cy_couples, int num_cy_couples):
    """
    Produces an array of CyCouple structs from a list of python Couple objects.
    """
    cdef int i  # typed loop index (the unused pointer that shadowed the struct name is dropped)
    
    for i, py_couple in enumerate(py_couples):
        if i>= num_cy_couples:
            break
            
        cy_couples[i].female = py_couple.female
        cy_couples[i].male   = py_couple.male
    
    
cdef make_Couple_list(CyCouple *cy_couples, int num_cy_couples):
    """
    Produces a list of Python Couples from an array of CyCouple structs.
    """
    py_couples = []
    
    for i in range(num_cy_couples):
        py_couple = Couple(cy_couples[i].female, cy_couples[i].male)
        py_couples.append(py_couple)
        
    return py_couples
      
                                             
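# Heap allocation avoids a fixed-size stack array and adapts to any list length.
from libc.stdlib cimport malloc, free

cpdef int cy_count_women_earning_more_from_py_couples(list py_couples):

    cdef:
        int count = 0, n, N = len(py_couples)
        CyCouple *cy_couples = <CyCouple *> malloc(N * sizeof(CyCouple))

    if N > 0 and cy_couples == NULL:
        raise MemoryError()

    try:
        make_CyCouple_array(py_couples, cy_couples, N)

        for n in range(N):
            count += cy_couples[n].female > cy_couples[n].male
    finally:
        # Release the temporary C array even if the conversion raises.
        free(cy_couples)
    return count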

In [166]:
len(couples)

1000000

In [167]:
%%timeit -o
cy_count_women_earning_more_from_py_couples(couples)

98.5 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


<TimeitResult : 98.5 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)>

In [168]:
timeit_cy_count_women_earning_more_from_py_couples = _

In [172]:
timeit_cy_count_women_earning_more_from_py_couples.average

0.09846611492857847

In [173]:
timeit_np_cy_count_women_earning_more.average

1.5748196238572355e-06

## `cdef struct ` objects are not accessible from python: Defining  `cdef class CyCouple`

Notice that in Cython we can write `CyCouple(2,3)` or `CyCouple(female=2,male=3)`, but a `cdef struct` cannot be used directly from Python.

What we can do instead is define a Cython class (usually called an extension type) and instantiate that class from Python. A Cython class can have typed attributes declared with `cdef`; declaring them `public` makes them readable and writable from Python.


In [174]:
%%cython

cdef struct CyCouple:
    int female
    int male

c1 = CyCouple(2,3)
c2 = CyCouple(female=2,male=3)

In [175]:
# Error CyCouple not defined
CyCouple(2,3)

NameError: name 'CyCouple' is not defined

In [178]:
%%cython

cdef class CyCouple:
    
    cdef public int female
    cdef public int male
    
    def __cinit__(self, female, male):
        # __cinit__ runs once when the object is allocated, so a separate
        # __init__ repeating the same assignments is not needed.
        self.female = female
        self.male = male
        
    def __repr__(self):
        'Return a nicely formatted representation string'
        return 'CyCouple[female={}, male={}]'.format(self.female, self.male)

In [179]:
c = CyCouple(2,6)

In [180]:
c.female, c.male

(2, 6)

In [181]:
c

CyCouple[female=2, male=6]

Notice that if we try to instantiate the class with a value of an invalid type, we get an error.

In [182]:
# Error: the typed attributes only accept integers
c = CyCouple(2,"maria")

TypeError: an integer is required

In [183]:
def create_cycouples(n_couples:int):
    couples = []

    for n in range(n_couples):
        couple = CyCouple(random.randint(70_000,200_000),random.randint(70_000,200_000))
        couples.append(couple)
    return couples

cy_couples = create_cycouples(1_000_000)

In [184]:
cy_couples[0]

CyCouple[female=99666, male=130913]

Now we have a class whose fields are typed C integers, so Cython should be able to avoid checking the type of the `CyCouple` fields every time they are accessed.

Let us try again the function that counts how many females earn more than males.

In [185]:
%%cython -a
cimport numpy as cnp
cimport cython

                                             
cpdef int cy_count_women_earning_more_from_cy_couples(list cy_couples):
    cdef:
        int count = 0, r, N
        
    N  = len(cy_couples)         
    for cy_couple in cy_couples:
        r = cy_couple.female > cy_couple.male
        count += r
    return count

In [186]:
%%timeit
cy_count_women_earning_more_from_cy_couples(cy_couples)

84.9 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [187]:
%%cython -a
cimport numpy as cnp
cimport cython

              
@cython.boundscheck(False)
cpdef int cy_count_women_earning_more_from_cy_couples(list cy_couples):
    cdef:
        int count = 0, r, N, female, male
        
    N  = len(cy_couples)         
    for n in range(N):
        female = cy_couples[n].female
        male = cy_couples[n].male
        count += female > male
    return count

In [188]:
%%timeit
cy_count_women_earning_more_from_cy_couples(cy_couples)

69.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


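Giving type information to `cy_couple`. Note that the cell below still leaves the list elements untyped, so attribute access goes through the generic Python object protocol; declaring each element as a `CyCouple` would require the `cdef class` to be defined in the same Cython cell.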

In [198]:
%%cython -a
cimport numpy as cnp
cimport cython

                                             
cpdef int cy_count_women_earning_more_from_cy_couples(list cy_couples):
    cdef:
        int count = 0, r, N
        
    N  = len(cy_couples)         
    for n in range(N):
        r = cy_couples[n].female > cy_couples[n].male
        count += r
    return count

## `cdef classes` also benefit standard python functions

Notice that **if we use a list of `CyCouple` objects** (so every element in the list is an instance of the `cdef class CyCouple`) **standard Python functions also benefit**.

In [190]:
py_couples = create_couples(1_000_000)

In [191]:
%%timeit
py_count_women_earning_more(py_couples)

194 ms ± 3.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [193]:
%%timeit -o
py_count_women_earning_more(cy_couples)

177 ms ± 2.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


<TimeitResult : 177 ms ± 2.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)>

In [195]:
timeit_cycouples = _

In [197]:
timeit_cycouples.average

0.1769310309285564