# Chapter 4. C Performance with Cython

Cython is a language that extends Python by supporting the declaration of types for functions, variables, and classes. These typed declarations enable Cython to compile Python scripts to efficient C code. Cython can also act as a bridge between Python and C as it provides easy-to-use constructs to write interfaces to external C and C++ routines.

## 4.1 Compiling Cython extensions

The Cython syntax is, by design, a superset of Python. With a few exceptions, Cython can compile most Python modules without requiring any change. Cython source files have the .pyx extension and can be compiled to a C file using the cython command. As an example, we save the following function in a file named hello.pyx:

In [1]:
def hello():
    print('Hello, World!')

In [None]:
!cython hello.pyx

In [None]:
!gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -lm -I/usr/include/python3.5/ -o hello.so hello.c

In [None]:
!python -c "import sysconfig; print(sysconfig.get_paths()['include'])"

In [None]:
import hello
hello.hello()

In [None]:
!cython -3 hello.pyx

In [None]:
!gcc -I/usr/include/python3.5 # ... other options

In [None]:
!gcc -I/usr/include/python2.7 # ... other options

In [None]:
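from setuptools import setup  # distutils was removed from the standard library in Python 3.12
from Cython.Build import cythonize

setup(
    name='Hello',
    ext_modules=cythonize('hello.pyx')
)

# cythonize also accepts a list of files and glob patterns:
# cythonize(['hello.pyx', 'world.pyx', '*.pyx'])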

In [None]:
!python setup.py build_ext --inplace

In [None]:
import pyximport
pyximport.install()
import hello # This will compile hello.pyx

In [None]:
conda install Cython

In [None]:
%load_ext Cython

In [None]:
%%cython

def hello_snippet():
    print("Hello, Cython!")
    
hello_snippet()
# Hello, Cython!

## 4.2 Adding static types

In Python, a variable can be associated with objects of different types during the execution of the program. While this feature is desirable, as it makes the language flexible and dynamic, it also adds significant overhead: the interpreter must look up the type and methods of each variable at runtime, which makes many optimizations difficult. Cython extends the Python language with explicit type declarations so that it can generate efficient C extensions through compilation.

The main way to declare data types in Cython is through cdef statements. The *cdef* keyword can be used in multiple contexts, such as variables, functions, and extension types (statically-typed classes).

### 4.2.1 Variables

In Cython, you can declare the type of a variable by prepending cdef and the type to the variable name. For example, we can declare the i variable as a C int (32 bits on most platforms) in the following way:

In [None]:
%load_ext Cython

In [None]:
cdef int i

The cdef statement supports multiple variable names on the same line along with optional initialization, as seen in the following line

In [None]:
cdef double a, b = 2.0, c = 3.0

Typed variables are treated differently in comparison to regular variables. In Python, variables are often described as labels that refer to objects in memory. For example, we could assign the value 'hello' to the a variable at any point in the program without restriction:

In [None]:
a = 'hello'

The a variable holds a reference to the 'hello' string. We can also freely assign another value (for example, the integer 1) to the same variable later in the code:

In [None]:
a = 1

Python will assign the integer 1 to the a variable without any problem.

Typed variables behave quite differently and are usually described as data containers: we can only store values that fit into the container determined by the data type. For example, if we declare the i variable as int and then try to assign a double to it, Cython will raise a compile-time error, as shown in the following code:

In [24]:
%%cython
cdef int i
i = 3.0
# Output has been cut
# ...cf4b.pyx:2:4 Cannot assign type 'double' to 'int'


Error compiling Cython file:
------------------------------------------------------------
...
cdef int i
i = 3.0
   ^
------------------------------------------------------------

/Users/boyuan/.ipython/cython/_cython_magic_720a194146990106a1362e98518b71aa.pyx:2:4: Cannot assign type 'double' to 'int'
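
The container analogy can also be tried from plain Python with the standard ctypes module, which wraps C data types. The following is only a rough sketch of the idea: ctypes rejects the value at runtime, whereas Cython rejects it at compile time, and the size reported for a C int depends on the platform.

```python
import ctypes

# A ctypes.c_int is a fixed-size C container, much like "cdef int i".
i = ctypes.c_int(3)
int_size = ctypes.sizeof(i)   # size in bytes (4 on most platforms)
print(i.value, int_size)

# Storing a Python float in the int container is refused.
try:
    i.value = 3.0
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```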


Static typing makes it easy for the compiler to perform useful optimizations. For example, if we declare a loop index as int, Cython will rewrite the loop in pure C without needing to step into the Python interpreter. The typing declaration guarantees that the type of the index will always be int and cannot be overwritten at runtime so that the compiler is free to perform the optimizations without compromising the program correctness.

In [28]:
%%cython
def example():
    cdef int i, j=0
    for i in range(100):
        j += 1
    return j
example()

# Result:
# 100

In [26]:
def example_python():
    j=0
    for i in range(100):
        j += 1
    return j

In [27]:
%timeit example()
# 10000000 loops, best of 3: 25 ns per loop
%timeit example_python()
# 100000 loops, best of 3: 2.74 us per loop

29.5 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
2.49 µs ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


This works because the Cython loop has first been converted to pure C and then to efficient machine code, while the Python loop still relies on the slow interpreter.

In Cython, it is possible to declare a variable to be of any standard C type, and it is also possible to define custom types using classic C constructs, such as struct, enum, and typedef.

An interesting example is that if we declare a variable to be of the object type, the variable will accept any kind of Python object:

In [30]:
%%cython 
cdef object a_py
# both 'hello' and 1 are Python objects
a_py = 'hello'
a_py = 1

Note that declaring a variable as object has no performance benefits as accessing and operating on the object will still require the interpreter to look up the underlying type of the variable and its attributes and methods.

Sometimes, certain data types (such as float and int numbers) are compatible in the sense that they can be converted into each other. In Cython, it is possible to convert (cast) between types by surrounding the destination type with angle brackets:

In [31]:
%%cython
cdef int a = 0
cdef double b
b = <double> a

### 4.2.2 Functions

Add type information to the arguments of a Python function by specifying the type in front of each of the argument names. Functions specified in this way will work and perform like regular Python functions, but their arguments will be type-checked

In [33]:
%%cython
def max_python(int a, int b):
    return a if a > b else b

A function specified in this way will perform type checking and treat its arguments as typed variables, just like in cdef definitions. However, it is still a Python function, and every call goes through the interpreter. To let Cython optimize the calls themselves, declare the function with a cdef statement and give it a return type:

In [35]:
%%cython
cdef int max_cython(int a, int b):
    return a if a > b else b

Functions declared in this way are translated to native C functions and have much less overhead than Python functions. A substantial drawback is that they can't be called from Python, only from Cython, and their scope is restricted to the same Cython file unless they're exposed in a definition file.

If we declare a function with a cpdef statement, Cython generates two versions of it: a Python version available to the interpreter and a fast C function usable from Cython. The cpdef syntax is the same as that of cdef:

In [36]:
%%cython 
cpdef int max_hybrid(int a, int b):
    return a if a > b else b

When the function body is small, it is convenient to add the inline keyword in front of the function definition; the function call will be replaced by the function body itself. Our max function is a good candidate for inlining:

In [37]:
%%cython 
cdef inline int max_inline(int a, int b):
    return a if a > b else b

### 4.2.3 Classes

We can define an extension type using the cdef class statement and declaring its attributes in the class body. For example, we can create an extension type--Point--as shown in the following code, which stores two coordinates (x, y) of the double type:

In [43]:
%%cython 
cdef class Point:
    cdef double x 
    cdef double y 
    
    def __init__(self, double x, double y):
        self.x = x
        self.y = y

Accessing the declared attributes in the class methods allows Cython to bypass expensive Python attribute look-ups by direct access to the given fields in the underlying C struct. For this reason, attribute access in typed classes is an extremely fast operation.

To use the cdef class in your code, you need to explicitly declare the type of the variables you intend to use at compile time. You can use the extension type name (such as Point) in any context where you will use a standard type (such as double, float, and int). For example, if we want a Cython function that calculates the distance from the origin (in the example, the function is called norm) of a Point, we have to declare the input variable as Point, as shown in the following code

In [45]:
%%cython 

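cdef class Point:
    cdef double x 
    cdef double y 

    def __init__(self, double x, double y):
        self.x = x
        self.y = y

cdef double norm(Point p):
    return (p.x ** 2 + p.y ** 2) ** 0.5

In [None]:
a = Point(0.0, 0.0)
a.x  # raises AttributeError: cdef attributes are not visible from Python by default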

In order to access attributes from Python code, you have to use the public (for read/write access) or readonly specifiers in the attribute declaration

In [49]:
%%cython 
cdef class Point:
    cdef public double x

Additionally, methods can be declared with the cpdef statement, just like regular functions.

Extension types do not support the addition of extra attributes at runtime. In order to do that, a solution is defining a Python class that is a subclass of the typed class and extends its attributes and methods in pure Python
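
The subclassing workaround has a close pure-Python analogy in classes that define `__slots__`. The sketch below is an analogy only, not Cython code: the class names `SlotPoint` and `FlexiblePoint` are made up for illustration, and `__slots__` stands in for the fixed C struct layout of an extension type.

```python
class SlotPoint:
    """Pure-Python stand-in for an extension type: fixed attribute set."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class FlexiblePoint(SlotPoint):
    """Subclass without __slots__: instances get a __dict__ again."""


p = SlotPoint(1.0, 2.0)
try:
    p.label = "origin"      # not one of the declared attributes
    fixed_layout = False
except AttributeError:
    fixed_layout = True

q = FlexiblePoint(1.0, 2.0)
q.label = "origin"          # allowed on the subclass
print(fixed_layout, q.label)
```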

## 4.3 Sharing declarations

When writing your Cython modules, you may want to gather your most frequently used function and class declarations in a separate file so that they can be reused across modules. Cython allows you to put these components in a definition file and access them with cimport statements.

Let's say that we have a module with the max and min functions, and we want to reuse those functions in multiple Cython programs. If we simply write a bunch of functions in a .pyx file, the declarations will be confined to the same file.

Definition files are also used to interface Cython with external C code. The idea is to copy (or, more accurately, translate) the types and function prototypes in the definition file and leave the implementation in the external C code that will be compiled and linked in a separate step

To share the max and min functions, we need to write a definition file with a .pxd extension. Such a file only contains the types and function prototypes that we want to share with other modules--a public interface. We can declare the prototypes of our max and min functions in a file named mathlib.pxd, as follows

In [None]:
cdef int max(int a, int b)
cdef int min(int a, int b)

Only write the function name and arguments without implementing the function body

The function implementation goes into the implementation file with the same base name but the .pyx extension--mathlib.pyx

In [None]:
cdef int max(int a, int b):
    return a if a > b else b
cdef int min(int a, int b):
    return a if a < b else b

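The mathlib module is now importable from another Cython module. For example, a module can cimport max to compute the Chebyshev distance between two points (x1, y1) and (x2, y2), which is the largest absolute difference along either coordinate: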

In [None]:
max(abs(x1 - x2), abs(y1 - y2))

In [None]:
%%cython 
from mathlib cimport max

def chebyshev(int x1, int y1, int x2, int y2):
    return max(abs(x1 - x2), abs(y1 - y2))

## 4.4 Working with arrays

Numerical and high-performance calculations often make use of arrays. Cython provides easy ways to work with them, either directly through low-level C arrays or through the more general typed memoryviews.

### 4.4.1 C arrays and pointers

C arrays are a collection of items of the same type, stored contiguously in memory.

Variables in C are like containers. When creating a variable, a space in memory is reserved to store its value. For example, if we create a variable containing a 64-bit floating point number (double), the program will allocate 64 bits (8 bytes) of memory. This portion of memory can be accessed through its address.

To obtain the address of a variable, we can use the address operator, denoted by the & symbol. We can then print the address with the printf function, available in the libc.stdio Cython module:

In [57]:
%%cython
cdef double a
from libc.stdio cimport printf
printf("%p", &a)
# Output:
# 0x7fc8bb611210

Memory addresses can be stored in special variables, pointers, that can be declared by putting a * prefix in front of the variable name, as follows

In [59]:
%%cython 
from libc.stdio cimport printf
cdef double a
cdef double *a_pointer
a_pointer = &a # a_pointer and &a are of the same type

If we have a pointer and want to grab the value stored at the address it points to, we need to dereference it. C uses the * operator for this, but Cython has no unary * operator; instead, we index the pointer with [0]. Be careful: the * used in a variable declaration has a different meaning, as it only marks the variable as a pointer:

In [None]:
%%cython 
cdef double a
cdef double *a_pointer
a_pointer = &a

a = 3.0
print(a_pointer[0]) # prints 3.0
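
The same ideas (size, address, pointer, and reading or writing through the pointer) can be explored without compiling anything by using the standard ctypes module. This is a sketch for experimentation rather than Cython code; the variable names simply mirror the cells above.

```python
import ctypes

a = ctypes.c_double(0.0)
double_size = ctypes.sizeof(a)       # 8 bytes for a 64-bit double
address = ctypes.addressof(a)        # plays the role of &a; varies per run
print(double_size, hex(address))

a_pointer = ctypes.pointer(a)        # a pointer to the memory held by a
a.value = 3.0
value_through_pointer = a_pointer[0] # read through the pointer
print(value_through_pointer)

a_pointer[0] = 4.0                   # write through the pointer
print(a.value)
```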

When declaring a C array, the program allocates enough space to accommodate all the elements requested. For instance, to create an array that has 10 double values (8 bytes each), the program will reserve 8 * 10 = 80 bytes of contiguous space in memory. In Cython, we can declare such arrays using the following syntax:

In [None]:
cdef double arr[10]

We can also declare a multidimensional array, such as an array with 5 rows and 2 columns, using the following syntax

In [None]:
cdef double arr[5][2]

The memory will be allocated in a single block of memory, row after row. This order is commonly referred to as row-major. Arrays can also be ordered column-major, as is the case for the FORTRAN programming language

Array ordering has important consequences. When iterating a C array over the last dimension, we access contiguous memory blocks (in our example, 0, 1, 2, 3 ...) while when we iterate on the first dimension, we skip a few positions (0, 2, 4, 6, 8, 1 ... ). You should always try to access memory sequentially as this optimizes cache and memory usage
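
The two access patterns can be checked with a few lines of plain Python. The sketch below assumes the 5 x 2 array from the example and computes the position of each element in the flat, row-major memory block; the helper name `row_major_offset` is made up for illustration.

```python
NROWS, NCOLS = 5, 2

def row_major_offset(i, j, ncols=NCOLS):
    """Position of element [i][j] in a flat row-major block."""
    return i * ncols + j

# Inner loop over the last dimension: consecutive memory positions.
last_dim_inner = [row_major_offset(i, j)
                  for i in range(NROWS) for j in range(NCOLS)]

# Inner loop over the first dimension: jumps of NCOLS positions.
first_dim_inner = [row_major_offset(i, j)
                   for j in range(NCOLS) for i in range(NROWS)]

print(last_dim_inner)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(first_dim_inner)  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```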

We can store and retrieve elements from the array using standard indexing; C arrays don't support fancy indexing or slices

arr[0] = 1.0

C arrays have many of the same behaviors as pointers. The arr variable, in fact, points to the memory location of the first element of the array. We can verify that the address of the first element of the array is the same as the address contained in the arr variable using the address operator (&):

In [60]:
%%cython
from libc.stdio cimport printf
cdef double arr[10]
printf("%p\n", arr)
printf("%p\n", &arr[0])

# Output
# 0x7ff6de204220
# 0x7ff6de204220
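
As a cross-check that needs no compilation, the standard ctypes module can build a C array of 10 doubles and report its size and addresses. This is an illustrative sketch; `DoubleArray10` is a name chosen here, not part of any library.

```python
import ctypes

DoubleArray10 = ctypes.c_double * 10   # the C type "double[10]"
arr = DoubleArray10()

array_size = ctypes.sizeof(arr)        # 10 elements * 8 bytes each
print(array_size)

# Treat the array as a pointer to its first element, as C does.
first = ctypes.cast(arr, ctypes.POINTER(ctypes.c_double))
same_address = ctypes.addressof(arr) == ctypes.addressof(first.contents)
print(same_address)

arr[0] = 1.0                           # standard indexing
print(first[0])                        # the pointer sees the same memory
```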

You should use C arrays and pointers when interfacing with the existing C libraries or when you need a fine control over the memory (also, they are very performant). This level of fine control is also prone to mistakes as it doesn't prevent you from accessing the wrong memory locations.

### 4.4.2 NumPy arrays

NumPy arrays can be used as normal Python objects in Cython using their already optimized broadcasted operations. However, Cython provides a numpy module with better support for direct iteration

When we normally access an element of a NumPy array, a few other operations take place at the interpreter level causing a major overhead. Cython can bypass those operations and checks by acting directly on the underlying memory area used by NumPy arrays, and thus obtaining impressive performance gains

NumPy arrays can be declared as the ndarray data type. To use the data type in our code, we first need to cimport the numpy Cython module (which is not the same as the Python NumPy module). We will bind the module to the c_np variable to make the difference with the Python numpy module more explicit:

In [None]:
%%cython 
cimport numpy as c_np
import numpy as np

We can now declare a NumPy array by specifying its type and the number of dimensions between square brackets (this is called buffer syntax). To declare a two-dimensional array of type double, we can use the following code:

In [None]:
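%%cython 
cimport numpy as c_np
def declare_example():  # buffer types are only allowed as function-local variables
    cdef c_np.ndarray[double, ndim=2] arr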

Access to this array will be performed by directly operating on the underlying memory area; the operation will avoid stepping into the interpreter, giving us a tremendous speed boost.

To measure the gain, we first write the numpy_bench_py function, which increments each element of py_arr, an untyped NumPy array. We declare the i index as an integer so that we avoid the for-loop overhead:

In [66]:
%%cython
import numpy as np
def numpy_bench_py():
    py_arr = np.random.rand(1000)
    cdef int i
    for i in range(1000):
        py_arr[i] += 1

Then, we write the same function using the ndarray type. Note that after we define the c_arr variable using c_np.ndarray, we can assign to it an array from the numpy Python module:

In [67]:
%%cython
import numpy as np
cimport numpy as c_np

def numpy_bench_c():
    cdef c_np.ndarray[double, ndim=1] c_arr
    c_arr = np.random.rand(1000)
    cdef int i
    for i in range(1000):
        c_arr[i] += 1

In [68]:
%timeit numpy_bench_c()
# 100000 loops, best of 3: 11.5 us per loop
%timeit numpy_bench_py()
# 1000 loops, best of 3: 603 us per loop

8.22 µs ± 363 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
266 µs ± 2.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


### 4.4.3 Typed memoryviews

C and NumPy arrays as well as the built-in bytes, bytearray, and array.array objects are similar in the sense that they all operate on a contiguous memory area (also called memory buffer). Cython provides a universal interface--the typed memoryview--that unifies and simplifies the access to all these data types

A memoryview is an object that maintains a reference on a specific memory area. It doesn't actually own the memory, but it can read and change its contents; in other words, it is a view on the underlying data. Memoryviews can be defined using a special syntax. For example, we can define a memoryview of int and a two-dimensional memoryview of double in the following way:

In [None]:
cdef int[:] a
cdef double[:, :] b

The same syntax applies to the declaration of any type in variables, function definitions, class attributes, and so on. Any object that exposes a buffer interface (for example, NumPy arrays, bytes, and array.array objects) will be bound to the memoryview automatically. For example, we can bind the memoryview to a NumPy array using a simple variable assignment

In [None]:
import numpy as np
cdef int[:] arr
arr_np = np.zeros(10, dtype='int32')
arr = arr_np # We bind the array to the memoryview

It is important to note that the memoryview does not own the data, but it only provides a way to access and change the data it is bound to; the ownership, in this case, is left to the NumPy array. As you can see in the following example, changes made through the memoryview will act on the underlying memory area and will be reflected in the original NumPy structure (and vice versa):

In [None]:
arr[2] = 1 # Changing memoryview
print(arr_np)
# [0 0 1 0 0 0 0 0 0 0]
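
Python's built-in memoryview follows the same ownership model and can be used to experiment without compiling. The sketch below uses an array.array as the owner of the data; it illustrates view semantics only and is not a Cython typed memoryview.

```python
from array import array

arr_py = array('i', [0] * 10)   # owns the memory
view = memoryview(arr_py)       # refers to the same memory, owns nothing

view[2] = 1                     # change through the view
print(arr_py.tolist())

tail = view[5:]                 # slicing gives another view, not a copy
tail[0] = 7
print(arr_py[5])

arr_py[0] = 9                   # changes to the owner show up in the view
print(view[0])
```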

In a certain sense, the mechanism behind memoryviews is similar to what NumPy produces when we slice an array. Slicing a NumPy array does not copy the data but returns a view on the same memory area, and changes to the view will reflect on the original array

Memoryviews also support array slicing with the standard NumPy syntax

In [None]:
cdef int[:, :, :] arr
arr[0, :, :] # Is a 2-dimensional memoryview
arr[0, 0, :] # Is a 1-dimensional memoryview
arr[0, 0, 0] # Is an int

To copy data between one memoryview and another, you can use syntax similar to slice assignment

In [None]:
import numpy as np

cdef double[:, :] b
cdef double[:] r

b = np.random.rand(10, 3)
r = np.zeros(3, dtype='float64')

b[0, :] = r # Copy the value of r in the first row of b
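
Slice assignment copies values rather than rebinding the view. A one-dimensional sketch with the built-in memoryview shows the difference; it is an analogy to the Cython cell above, with the "first row" of b represented by its first three elements.

```python
from array import array

b = array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
r = array('d', [0.0, 0.0, 0.0])

b_view = memoryview(b)
r_view = memoryview(r)

b_view[0:3] = r_view   # copy the three values of r into the start of b
r[0] = 10.0            # later changes to r do not affect b: it was a copy

print(b.tolist())
print(r.tolist())
```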

## 4.5 Particle simulator in Cython

In [None]:
def evolve_numpy(self, dt):
    timestep = 0.00001
    nsteps = int(dt/timestep)
    
    r_i = np.array([[p.x, p.y] for p in self.particles])
    ang_speed_i = np.array([p.ang_speed for p in self.particles])
    v_i = np.empty_like(r_i)
    
    for i in range(nsteps):
        norm_i = np.sqrt((r_i ** 2).sum(axis=1))
      
        v_i = r_i[:, [1, 0]]
        v_i[:, 0] *= -1
        v_i /= norm_i[:, np.newaxis]
        d_i = timestep * ang_speed_i[:, np.newaxis] * v_i
        r_i += d_i
        
    for i, p in enumerate(self.particles):
        p.x, p.y = r_i[i]

We want to convert this code to Cython. Our strategy will be to take advantage of the fast indexing operations by removing the NumPy array broadcasting, thus reverting to an indexing-based algorithm. Since Cython generates efficient C code, we are free to use as many loops as we like without any performance penalty

In [None]:
# file: simul.py
def evolve_cython(self, dt):
    timestep = 0.00001
    nsteps = int(dt/timestep)

    r_i = np.array([[p.x, p.y] for p in self.particles])
    ang_speed_i = np.array([p.ang_speed for p in self.particles])
    c_evolve(r_i, ang_speed_i, timestep, nsteps)
    for i, p in enumerate(self.particles):
        p.x, p.y = r_i[i]

In [None]:
# file: cevolve.pyx
import numpy as np

def c_evolve(r_i, ang_speed_i, timestep, nsteps):
    v_i = np.empty_like(r_i)
    
    for i in range(nsteps):
        norm_i = np.sqrt((r_i ** 2).sum(axis=1))
        v_i = r_i[:, [1, 0]]
        v_i[:, 0] *= -1
        v_i /= norm_i[:, np.newaxis]
        d_i = timestep * ang_speed_i[:, np.newaxis] * v_i
        r_i += d_i

In [None]:
from random import uniform

def benchmark(npart=100, method='python'):
    particles = [Particle(uniform(-1.0, 1.0),
                          uniform(-1.0, 1.0),
                          uniform(-1.0, 1.0))
                 for i in range(npart)]

    simulator = ParticleSimulator(particles)
    if method == 'python':
        simulator.evolve_python(0.1)
    elif method == 'cython':
        simulator.evolve_cython(0.1)
    elif method == 'numpy':
        simulator.evolve_numpy(0.1)

In [None]:
%timeit benchmark(100, 'cython')
# 1 loops, best of 3: 401 ms per loop
%timeit benchmark(100, 'numpy')
# 1 loops, best of 3: 413 ms per loop

In [None]:
def c_evolve(double[:, :] r_i,
             double[:] ang_speed_i,
             double timestep,
             int nsteps):

In [None]:
cdef int i, j
cdef int nparticles = r_i.shape[0]

In [None]:
for i in range(nsteps):
    for j in range(nparticles):
        x = r_i[j, 0]
        y = r_i[j, 1]
        ang_speed = ang_speed_i[j]
        
        norm = sqrt(x ** 2 + y ** 2)
        vx = (-y)/norm
        vy = x/norm
        dx = timestep * ang_speed * vx
        dy = timestep * ang_speed * vy
        r_i[j, 0] += dx
        r_i[j, 1] += dy

In [None]:
cdef double norm, x, y, vx, vy, dx, dy, ang_speed

In [None]:
from libc.math cimport sqrt
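
To see how the fragments above fit together, here is a plain-Python sketch of the same indexing-based algorithm. It makes some assumptions for the sake of being runnable without Cython: lists of [x, y] pairs replace the typed memoryviews, math.sqrt replaces the libc.math version, and the function name `evolve_loops` is hypothetical. The typed version keeps the same loop structure and adds the cdef declarations.

```python
from math import sqrt

def evolve_loops(r_i, ang_speed_i, timestep, nsteps):
    """Advance every particle in place, one explicit loop per index."""
    nparticles = len(r_i)
    for _ in range(nsteps):
        for j in range(nparticles):
            x, y = r_i[j]
            ang_speed = ang_speed_i[j]

            norm = sqrt(x ** 2 + y ** 2)
            vx = -y / norm      # unit vector perpendicular to the position
            vy = x / norm
            r_i[j][0] += timestep * ang_speed * vx
            r_i[j][1] += timestep * ang_speed * vy

# Two particles: one moving counter-clockwise, one clockwise.
positions = [[1.0, 0.0], [0.0, 2.0]]
evolve_loops(positions, [1.0, -1.0], 0.00001, 1000)
print(positions)
```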

In [None]:
%timeit benchmark(100, 'cython')
# 100 loops, best of 3: 13.4 ms per loop
%timeit benchmark(100, 'numpy')
# 1 loops, best of 3: 429 ms per loop

In [None]:
%timeit benchmark(1000, 'cython')
# 10 loops, best of 3: 134 ms per loop
%timeit benchmark(1000, 'numpy')
# 1 loops, best of 3: 877 ms per loop

## 4.6 Profiling Cython

Cython provides a feature, called annotated view, that helps find which lines are executed in the Python interpreter and which are good candidates for further optimization. We can turn this feature on by compiling a Cython file with the -a option. In this way, Cython will generate an HTML file containing our code annotated with some useful information. The usage of the -a option is as follows:

In [None]:
!cython -a cevolve.pyx
!firefox cevolve.html

Each line in the source code can appear in different shades of yellow. A more intense color corresponds to more interpreter-related calls, while white lines are translated to regular C code. Since interpreter calls substantially slow down execution, the objective is to make the function body as white as possible

In [None]:
cimport cython
@cython.boundscheck(False)
def myfunction():
    # Code here
    pass

In [None]:
with cython.boundscheck(False):
    # Code here
    pass

In [None]:
# cython: boundscheck=False

In [None]:
!cython -X boundscheck=True

In [None]:
cimport cython
@cython.boundscheck(False)
@cython.cdivision(True)
def c_evolve(double[:, :] r_i,
             double[:] ang_speed_i,
             double timestep,
             int nsteps):

In [None]:
%timeit benchmark(100, 'cython')
# 100 loops, best of 3: 13.4 ms per loop

In [None]:
import numpy as np
from distance import chebyshev

def benchmark():
    a = np.random.rand(100, 2)
    b = np.random.rand(100, 2)
    for x1, y1 in a:
        for x2, y2 in b:
            chebyshev(x1, y1, x2, y2)

In [None]:
# cython: profile=True
cdef int max(int a, int b):
    # Code here
    pass

In [None]:
import cheb
%prun cheb.benchmark()

## 4.7 Using Cython with Jupyter

Optimizing Cython code requires substantial trial and error. Fortunately, Cython tools can be conveniently accessed through the Jupyter notebook for a more streamlined and integrated experience.

In [69]:
%load_ext Cython

The %%cython magic can be used to compile and load the Cython code inside the current session

In [70]:
%%cython
import numpy as np

cdef int max(int a, int b):
    return a if a > b else b

cdef int chebyshev(int x1, int y1, int x2, int y2):
    return max(abs(x1 - x2), abs(y1 - y2))

def c_benchmark():
    a = np.random.rand(1000, 2)
    b = np.random.rand(1000, 2)
    
    for x1, y1 in a:
        for x2, y2 in b:
            chebyshev(x1, y1, x2, y2)

A useful feature of the %%cython magic is the -a option that will compile the code and produce an annotated view (just like the command line -a option) of the source directly in the notebook

In [71]:
%%cython -a
import numpy as np

cdef int max(int a, int b):
    return a if a > b else b

cdef int chebyshev(int x1, int y1, int x2, int y2):
    return max(abs(x1 - x2), abs(y1 - y2))

def c_benchmark():
    a = np.random.rand(1000, 2)
    b = np.random.rand(1000, 2)
    
    for x1, y1 in a:
        for x2, y2 in b:
            chebyshev(x1, y1, x2, y2)

This allows you to quickly test different versions of your code and to use the other integrated tools available in Jupyter. For example, we can time and profile the code in the same session with %timeit and %prun (provided that we activate the profile directive in the cell). The %prun magic prints the profiling results:

In [72]:
%prun c_benchmark()

 

         4 function calls in 0.695 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.695    0.695    0.695    0.695 {_cython_magic_f6ea454da547d242733f3c989b443324.c_benchmark}
        1    0.000    0.000    0.695    0.695 {built-in method builtins.exec}
        1    0.000    0.000    0.695    0.695 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
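
Outside a notebook, a similar report can be produced with the standard cProfile module. The sketch below profiles a pure-Python stand-in for the benchmark, so every function appears in the report; the function bodies and the point data are assumptions made for illustration, and the timings will differ from the Cython version.

```python
import cProfile
import io
import pstats

def chebyshev(x1, y1, x2, y2):
    return max(abs(x1 - x2), abs(y1 - y2))

def benchmark(n=50):
    points = [(i * 0.1, i * 0.2) for i in range(n)]
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            total += chebyshev(x1, y1, x2, y2)
    return total

profiler = cProfile.Profile()
profiler.enable()
benchmark()
profiler.disable()

# Collect the report in a string, sorted by internal time like %prun.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats()
report = buffer.getvalue()
print(report)
```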

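It is also possible to use the line_profiler tool directly in the notebook. To support line annotations, enable the linetrace=True and binding=True compiler directives and set the CYTHON_TRACE=1 flag at compile time: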

In [73]:
%%cython -a -f -c=-DCYTHON_TRACE=1
# cython: linetrace=True
# cython: binding=True

import numpy as np

cdef int max(int a, int b):
    return a if a > b else b
def chebyshev(int x1, int y1, int x2, int y2):
    return max(abs(x1 - x2), abs(y1 - y2))
def c_benchmark():
    a = np.random.rand(1000, 2)
    b = np.random.rand(1000, 2)
    for x1, y1 in a:
        for x2, y2 in b:
            chebyshev(x1, y1, x2, y2)

In [75]:
%load_ext line_profiler
%lprun -f c_benchmark c_benchmark()

Timer unit: 1e-06 s

Total time: 1.44541 s
File: /Users/boyuan/.ipython/cython/_cython_magic_8e773318a6cfe215e528e230607508df.pyx
Function: c_benchmark at line 10

Line #      Hits         Time  Per Hit   % Time  Line Contents
    10                                           def c_benchmark():
    11         1         46.0     46.0      0.0      a = np.random.rand(1000, 2)
    12         1         20.0     20.0      0.0      b = np.random.rand(1000, 2)
    13      1001        949.0      0.9      0.1      for x1, y1 in a:
    14   1001000     903535.0      0.9     62.5          for x2, y2 in b:
    15   1000000     540863.0      0.5     37.4              chebyshev(x1, y1, x2, y2)

## 4.8 Summary

Cython is a tool that bridges the convenience of Python with the speed of C. Compared to C bindings, Cython programs are much easier to maintain and debug, thanks to the tight integration and compatibility with Python and the availability of excellent tools.