# Profiling, Cython, and Numba 🚀
### Zbyszek & Jakob
### ASPP 2022, Bilbao, Spain

## Outline

* Introduction
* Profiling
* Speed up Python code using Cython
  * Basic principles
  * Interacting with NumPy arrays
  * Release the GIL and parallelize easily
  * Wrap C/C++ code
* Using Numba to speed up Python code

## Introduction

* Sometimes the execution speed of a script seems to be *the* thing that keeps you from your next scientific breakthrough
* Both Cython and Numba are tools to make your code faster -> "optimization"
* So when should you optimize your code?

## The three rules of optimization
(adapted from Sebastian Witowski, EuroPython 2016)

#### 1. Don't.
 * Optimization comes with costs.
 * Likely you don't need it.
 * Consider investing in better hardware instead.

#### 2. Don't yet.
 * Is your code finished?
 * Did you write tests?
 * Are you sure it's worth the investment?

#### 3. Profile
* Don't guess which part of your code you should optimize!
* Measure. Measure. Measure.
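
One way to measure instead of guessing is the standard-library profiler `cProfile`. The following is a minimal sketch, not the method used in this lecture: the helper `integrate_midpoint` and the call count of 100,000 are assumptions chosen only for illustration.

```python
import cProfile
import io
import pstats


def f(x):
    return x ** 4 - 3 * x


def integrate_midpoint(f, a, b, n):
    # Hypothetical helper for this sketch: midpoint rule on n sub-intervals.
    dx = (b - a) / n
    s = 0.0
    for i in range(n):
        s += f(a + (i + 0.5) * dx) * dx
    return s


# Record every function call made while the profiler is enabled.
profiler = cProfile.Profile()
profiler.enable()
result = integrate_midpoint(f, 0, 1, 100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists how often each function was called and how much time was spent in it, which shows whether the loop itself or the repeated calls to `f` dominate the run time.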

In [38]:
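def integrate_f(f, a, b, n):
    """Integrate f over [a, b] with the composite trapezoidal rule on n sub-intervals."""
    dx = (b - a) / n
    dx2 = dx / 2
    s = f(a) * dx2  # the two end points carry half weight
    for i in range(1, n):
        s += f(a + i * dx) * dx
    s += f(b) * dx2
    return s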
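
Read as a formula, `integrate_f` appears to implement the composite trapezoidal rule. With $\Delta x = (b - a)/n$ (the variable `dx` in the code), the sum it accumulates can be written as:

$$
T_n = \Delta x \left[ \frac{f(a)}{2} + \sum_{i=1}^{n-1} f(a + i\,\Delta x) + \frac{f(b)}{2} \right]
$$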

In [64]:
f = lambda x: x ** 4 - 3 * x

In [65]:
def integrate_f0(f, a, b, n):
    # Baseline midpoint rule: dx is recomputed on every iteration.
    s = 0.0
    for i in range(n):
        dx = (b - a) / n
        x = a + (i + 0.5) * dx
        s += f(x) * dx
    return s
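
The `integrate_f0` cell, read as a formula, evaluates `f` at the centre of each sub-interval, which is the midpoint rule. With $\Delta x = (b - a)/n$:

$$
M_n = \Delta x \sum_{i=0}^{n-1} f\!\left(a + \left(i + \tfrac{1}{2}\right)\Delta x\right)
$$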

In [66]:
def integrate_f1(f, a, b, n):
    # dx is computed once, outside the loop.
    dx = (b - a) / n
    s = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        s += f(x) * dx
    return s

In [67]:
def integrate_f2(f, a, b, n):
    # x is advanced by addition instead of being recomputed from i.
    dx = (b - a) / n
    x = a + 0.5 * dx
    s = 0.0
    for i in range(n):
        s += f(x) * dx
        x += dx
    return s

In [68]:
def integrate_f3(f, a, b, n):
    # The multiplication by dx is factored out of the loop.
    dx = (b - a) / n
    x = a + 0.5 * dx
    s = 0.0
    for i in range(n):
        s += f(x)
        x += dx
    return s * dx

In [76]:
print(integrate_f0(f, 0, 1, 1000))
print(integrate_f1(f, 0, 1, 1000))
print(integrate_f2(f, 0, 1, 1000))
print(integrate_f3(f, 0, 1, 1000))

-1.300000166666638
-1.300000166666638
-1.3000001666666385
-1.3000001666666394
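
As a sanity check on these numbers, the exact value of the integral follows from the antiderivative of $x^4 - 3x$:

$$
\int_0^1 \left(x^4 - 3x\right)\,dx = \left[\frac{x^5}{5} - \frac{3x^2}{2}\right]_0^1 = \frac{1}{5} - \frac{3}{2} = -1.3
$$

All four variants differ from $-1.3$ by about $1.7 \times 10^{-7}$, which is the discretization error at $n = 1000$. They differ from each other only in the last digits, presumably because the floating-point operations are carried out in a different order.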


In [77]:
%timeit integrate_f0(f, 0, 1, 1000)

326 µs ± 42.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [78]:
%timeit integrate_f1(f, 0, 1, 1000)

275 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [79]:
%timeit integrate_f2(f, 0, 1, 1000)

221 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [80]:
%timeit integrate_f3(f, 0, 1, 1000)

223 µs ± 18.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
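
`%timeit` is an IPython magic, so it is not available in a plain Python script. A rough equivalent can be sketched with the standard-library `timeit` module. The function names `midpoint_naive` and `midpoint_hoisted`, and the repeat and call counts, are assumptions made for this example. Unlike `%timeit`, which reports mean ± standard deviation, the sketch reports the fastest of several repeats.

```python
import timeit


def f(x):
    return x ** 4 - 3 * x


def midpoint_naive(f, a, b, n):
    # Recomputes dx on every iteration (same idea as integrate_f0).
    s = 0.0
    for i in range(n):
        dx = (b - a) / n
        s += f(a + (i + 0.5) * dx) * dx
    return s


def midpoint_hoisted(f, a, b, n):
    # Computes dx once and multiplies by it after the loop (same idea as integrate_f3).
    dx = (b - a) / n
    x = a + 0.5 * dx
    s = 0.0
    for _ in range(n):
        s += f(x)
        x += dx
    return s * dx


CALLS = 200  # calls per repeat; an arbitrary choice for this sketch
timings = {}
for func in (midpoint_naive, midpoint_hoisted):
    # Five repeats of CALLS calls each; keep the least disturbed repeat.
    runs = timeit.repeat(lambda: func(f, 0, 1, 1000), repeat=5, number=CALLS)
    timings[func.__name__] = min(runs) / CALLS

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

Whichever tool is used, differences smaller than the run-to-run spread should not be over-interpreted: in the timings above, `integrate_f2` and `integrate_f3` are indistinguishable within the reported standard deviations.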
