https://tedboy.github.io/pandas/enhancingperf/enhancingperf1.html

In [1]:
import pandas as pd

In [3]:
import numpy as np

In [4]:
df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': np.random.randn(1000),
                   'N': np.random.randint(100, 1000, (1000)),
                   'x': 'x'})

In [9]:
df['N'].sum()

549643

In [8]:
df

            a         b    N  x
0   -0.419506  0.388628  337  x
1   -0.641013 -1.267908  214  x
2    1.400197  0.934439  262  x
3    0.376756  0.518383  914  x
4    1.041748 -1.944490  524  x
..        ...       ...  ... ..
995  0.006013  0.941230  483  x
996  0.285279  2.589398  981  x
997 -0.023540 -0.364760  642  x
998 -0.463712 -0.475140  601  x


In [13]:
def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

In [6]:
%timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

77.5 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [10]:
%prun -l 8 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

 

In the call above, apply (with axis=1) passes each row to the lambda as a Series x; the lambda then looks up x['a'], x['b'] and x['N'] and passes those three scalars to integrate_f.
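
A quick way to check what the callback receives is to record its argument. The sketch below uses a small stand-in frame (`demo`) and a hypothetical helper (`inspect_row`), neither of which is part of the notebook; it only illustrates that, with `axis=1`, each call gets one row as a Series indexed by column name.

```python
import pandas as pd

# Small stand-in for the notebook's df (same columns, only 3 rows).
demo = pd.DataFrame({'a': [0.0, 0.5, 1.0],
                     'b': [1.0, 1.5, 2.0],
                     'N': [10, 20, 30],
                     'x': 'x'})

seen = []  # records what the callback actually receives


def inspect_row(row):
    # Hypothetical helper: with axis=1, `row` is one Series per DataFrame row.
    seen.append((type(row).__name__, list(row.index)))
    return row['b'] - row['a']


widths = demo.apply(inspect_row, axis=1)
print(seen[0])          # type name and index labels of the first argument
print(widths.tolist())  # one result per row
```

Each `row['a']`-style lookup inside the callback is a separate Series access, which is the overhead the profiler output points at.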

In [15]:
def f2(x):
    return x * (x - 1)

def integrate_f2(x):
    a, b, N = x['a'], x['b'], x['N']
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f2(a + i * dx)
    return s * dx

In [40]:
%timeit df.apply(lambda x: integrate_f2(x), axis=1)

78.6 ms ± 2.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [42]:
%prun  df.apply(lambda x: integrate_f2(x), axis=1)

 

# OK, cython

In [19]:
%load_ext Cython

In [48]:
# add static type declarations: a big gain, roughly 9x faster (77.5 ms -> 8.43 ms per loop)

In [37]:
%%cython
cdef double f_typed(double x) except? -2:
    return x * (x - 1)
cpdef double integrate_f_typed(double a, double b, int N):
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f_typed(a + i * dx)
    return s * dx

In [38]:
%timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)

8.43 ms ± 169 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [39]:
%prun df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)

 

It’s calling Series... a lot! It’s creating a Series from each row, and getting from both the index and the Series (three times for each row). Function calls are expensive in Python, so maybe we could minimise these by cythonizing the apply part.
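
The same idea can be sketched in plain Python before reaching for Cython: loop over the column arrays directly so that no per-row Series is built. `apply_integrate_f_py` and the `demo` frame are hypothetical names used only for this illustration, and the sketch shows equivalence of results rather than any particular speed-up.

```python
import numpy as np
import pandas as pd


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def apply_integrate_f_py(col_a, col_b, col_N):
    # Hypothetical helper: index plain ndarrays instead of per-row Series.
    n = len(col_N)
    res = np.empty(n)
    for i in range(n):
        res[i] = integrate_f(col_a[i], col_b[i], int(col_N[i]))
    return res


# Small stand-in frame with the same columns as the notebook's df.
rng = np.random.default_rng(0)
demo = pd.DataFrame({'a': rng.standard_normal(50),
                     'b': rng.standard_normal(50),
                     'N': rng.integers(100, 1000, 50),
                     'x': 'x'})

# Row-wise apply: one Series per row, three lookups each.
via_apply = demo.apply(
    lambda x: integrate_f(x['a'], x['b'], int(x['N'])), axis=1)

# Column-wise loop: three arrays extracted once, then scalar indexing.
via_arrays = apply_integrate_f_py(
    demo['a'].values, demo['b'].values, demo['N'].values)

print(np.allclose(via_apply.to_numpy(dtype=float), via_arrays))
```

The loop body is unchanged; only the way the three scalars are fetched differs, which is the part the Cython version below moves into compiled code.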

## Now, also explicitly tell our code that the argument type is np.ndarray

A big gain: roughly 13 times faster (8.43 ms down to 660 µs per loop).

In [28]:
%%cython
cimport numpy as np
import numpy as np
cdef double f_typed(double x) except? -2:
    return x * (x - 1)
cpdef double integrate_f_typed(double a, double b, int N):
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f_typed(a + i * dx)
    return s * dx
cpdef np.ndarray[double] apply_integrate_f(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N):
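    # np.float64 / np.dtype(int) replace the np.float / np.int aliases, which were deprecated in NumPy 1.20
    assert (col_a.dtype == np.float64 and col_b.dtype == np.float64 and col_N.dtype == np.dtype(int))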
    cdef Py_ssize_t i, n = len(col_N)
    assert (len(col_a) == len(col_b) == n)
    cdef np.ndarray[double] res = np.empty(n)
    for i in range(len(col_a)):
        res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
    return res


Since pandas 0.13.0, Series has been internally refactored to subclass NDFrame instead of ndarray, so you cannot pass a Series directly as an ndarray-typed parameter to a Cython function. Instead, pass the underlying ndarray using the .values attribute of the Series.

Prior to 0.13.0:

apply_integrate_f(df['a'], df['b'], df['N'])

From 0.13.0 onwards, use .values to get the underlying ndarray:

apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
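
As a small illustration of that distinction (the Series `s` and `n` here are made-up stand-ins, not the notebook's columns), the sketch below checks that a Series is not itself an ndarray, while `.values` returns one with the dtype the Cython function asserts on. In newer pandas versions `Series.to_numpy()` is an alternative way to get the array.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for a float column and an integer column.
s = pd.Series([0.1, 0.2, 0.3])
n = pd.Series([100, 200, 300])

# A Series wraps an array but is not an ndarray subclass (pandas >= 0.13).
series_is_ndarray = isinstance(s, np.ndarray)

# .values exposes the underlying ndarray, which is what a function
# with ndarray-typed parameters expects.
arr = s.values
counts = n.values

print(series_is_ndarray)
print(type(arr).__name__, arr.dtype)
print(type(counts).__name__, counts.dtype)
```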

In [29]:
%timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations


660 µs ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [31]:
%prun apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

 



## Disable bounds checking and wraparound

In [35]:
%%cython
cimport cython
cimport numpy as np
import numpy as np
cdef double f_typed(double x) except? -2:
    return x * (x - 1)
cpdef double integrate_f_typed(double a, double b, int N):
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f_typed(a + i * dx)
    return s * dx

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[double] apply_integrate_f2(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N):
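    # np.float64 / np.dtype(int) replace the np.float / np.int aliases, which were deprecated in NumPy 1.20
    assert (col_a.dtype == np.float64 and col_b.dtype == np.float64 and col_N.dtype == np.dtype(int))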
    cdef Py_ssize_t i, n = len(col_N)
    assert (len(col_a) == len(col_b) == n)
    cdef np.ndarray[double] res = np.empty(n)
    for i in range(len(col_a)):
        res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
    return res


In [36]:
%timeit apply_integrate_f2(df['a'].values, df['b'].values, df['N'].values)



658 µs ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
