# Common Gotchas

In [1]:
%load_ext Cython

### Calling NumPy functions

One of the most common beginner mistakes when writing Cython code is to call a NumPy function inside a tight loop that has otherwise been stripped of the Python layer. Below is a simple example: a function that computes, element by element, the square root of one array plus the exponential of another:

In [19]:
%%cython 

import numpy as np 

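def slow_cython(double[:] arr1, double[:] arr2):
    # Typed index and length, so the loop itself compiles to plain C
    cdef int i
    cdef int npts = len(arr1)
    cdef double[:] result = np.zeros(npts, dtype='f8')

    for i in range(npts):
        # np.sqrt and np.exp are Python-level calls, made once per element
        result[i] = np.sqrt(arr1[i]) + np.exp(arr2[i])

    return np.array(result)
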

In [20]:
npts = int(1e6)
x = np.random.rand(npts)
y = np.random.rand(npts)

In [21]:
%timeit slow_cython(x, y)

1 loop, best of 3: 2.99 s per loop


This loop is slow because it calls `numpy` functions on every iteration. Try pasting this code into a Cython module and running `cython -a` on it: you'll see an enormous amount of C code generated by the one line inside the loop. You are probably used to thinking of `numpy` functions as fast because they are "written in C", but `np.sqrt` and `np.exp` are ordinary Python objects, so Cython cannot translate a call to them into a direct C function call. Each call has to go back through the Python layer: the C doubles are converted to Python objects, the function is looked up and called, and the result is converted back to a C double. That round trip happens on every iteration, even though the internals of each `numpy` function are written in C.
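The per-call cost described above can also be seen from plain Python, without compiling anything. The sketch below times `np.sqrt` on a single float against `math.sqrt`, and against one vectorized `np.sqrt` call over a whole array. Here `math.sqrt` is only a stand-in for a lightweight scalar call, not what Cython generates, and the variable names are illustrative. Exact timings will vary by machine and NumPy version, so no numbers are quoted.

```python
import math
import timeit

import numpy as np

value = 0.5

# Both functions return the same number; only the cost of the call differs.
assert math.isclose(float(np.sqrt(value)), math.sqrt(value))

n_calls = 100_000

# One Python-level call per scalar, as in the slow loop.
t_numpy_scalar = timeit.timeit(lambda: np.sqrt(value), number=n_calls)
t_math_scalar = timeit.timeit(lambda: math.sqrt(value), number=n_calls)

# One call on the whole array pays the call overhead only once.
values = np.full(n_calls, value)
t_numpy_vectorized = timeit.timeit(lambda: np.sqrt(values), number=1)

print(f"np.sqrt, one scalar per call  : {t_numpy_scalar:.4f} s")
print(f"math.sqrt, one scalar per call: {t_math_scalar:.4f} s")
print(f"np.sqrt, one call on the array: {t_numpy_vectorized:.4f} s")
```

The comparison to look at is the first line against the third: the same function does the same arithmetic, but calling it once per element pays the Python call overhead `n_calls` times.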

The way around this problem is easy: find the equivalent function in the C standard library, which Cython exposes through its `libc` package, and `cimport` that function instead of calling `numpy`. Here's how this works with our example:

In [26]:
%%cython 

import numpy as np 
from libc.math cimport sqrt as c_sqrt
from libc.math cimport exp as c_exp

def fast_cython(double[:] arr1, double[:] arr2):
    # Same setup as slow_cython; only the loop body changes
    cdef int i
    cdef int npts = len(arr1)
    cdef double[:] result = np.zeros(npts, dtype='f8')

    for i in range(npts):
        # c_sqrt and c_exp are direct C calls on C doubles
        result[i] = c_sqrt(arr1[i]) + c_exp(arr2[i])

    return np.array(result)


In [27]:
%timeit fast_cython(x, y)

10 loops, best of 3: 29.8 ms per loop


### Writing loops the pythonic way

One very nice feature of Python is being able to write loops over array *elements* rather than array *indices*:

In [28]:
arr = np.arange(5)
for x in arr:
    print(x)

0
1
2
3
4


You may have noticed that all the Cythonized loops written in this repo were over the *indices* of the array, not over its elements:

In [30]:
npts = len(arr)
for i in range(npts):
    print(arr[i])

0
1
2
3
4


This is deliberate: when you write a loop the "pythonic" way, each iteration goes through high-level operations in the Python layer that cannot be reduced to an elementary C operation. So when writing loops in Cython, be sure to always write

```
for i in range(some_cdef_integer):
``` 

instead of 

```
for element in array:
```
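To make the "high-level operations" concrete, here is a small plain-Python sketch of what a `for element in array` loop does with a NumPy array. It spells out the iterator protocol the loop relies on and inspects the type of object each iteration hands back. The variable names are illustrative, and this shows Python-level behavior only, not the C code Cython generates.

```python
import numpy as np

arr = np.arange(5, dtype="f8")

# `for element in arr` is shorthand for the iterator protocol:
iterator = iter(arr)    # Python-level call that builds an iterator object
first = next(iterator)  # Python-level call that produces the next element

# Each element arrives as a boxed NumPy scalar object, not a raw C double.
element_types = {type(element) for element in arr}

print(type(first).__name__)
print(element_types)
```

Every pass through the loop therefore creates a new Python object, whereas indexing a typed memoryview with a `cdef` integer lets Cython read the underlying C value directly.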