![Is it a bird? Is it a plane? Accelerating Python with numba](img/cover.png)

# Outline

1. "Python is slow"
2. What is numba?
3. Some examples
4. Limitations and workarounds
5. Conclusions

# Who is this guy?

* **Aerospace Engineer** with a passion for orbits 🛰
* Chair of the **Python España** non-profit and co-organizer of **PyCon Spain** 🐍
* **Software Developer** at **Satellogic** 🌍
* Free Software advocate and Python enthusiast 🕮
* Hard Rock lover 🎸

Follow me! https://github.com/Juanlu001/

![Me!](img/juanlu_esa.jpg)

# "Python is slow"

## Dynamic and interpreted (rather than static and compiled)

![Four nested loops](img/loops.png)

_(From https://gist.github.com/Juanlu001/cf19b1c16caf618860fb)_
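Here is a minimal sketch of what "dynamic" means in practice, using only the standard library. The hypothetical `add` function carries no type information. CPython therefore runs one generic bytecode instruction and works out what `+` means for the operands every time that instruction executes.

```python
import dis


def add(a, b):
    # No type declarations: what "+" does is only known at run time
    return a + b


# The same compiled bytecode serves ints, floats and strings
results = [add(1, 2), add(1.5, 2), add("py", "thon")]

# A single generic instruction performs the addition
# (BINARY_ADD before Python 3.11, BINARY_OP from 3.11 on)
binary_ops = [
    ins.opname for ins in dis.get_instructions(add) if ins.opname.startswith("BINARY")
]

print(results)  # [3, 3.5, 'python']
print(binary_ops)
```

A compiler that knows the argument types can instead emit code specialized for, say, two 64-bit integers. numba does this by compiling a separate version for each combination of argument types it sees.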

## Data structures

![array vs list](img/array_vs_list.png)

_(From https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/)_
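The figure compares a NumPy array with a Python list. The sketch below shows the same idea with the standard library only, using `array.array` as a stand-in for a NumPy array (our substitution). The list stores pointers to separately allocated float objects, while the array stores raw 8-byte doubles contiguously. Exact sizes are CPython-specific.

```python
import array
import sys

values = [float(i) for i in range(1000)]
packed = array.array("d", values)  # "d" = C double, stored contiguously

# List: the container holds pointers, and every element is a full
# Python float object with its own header (type pointer, refcount)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Array: one object whose buffer holds the raw 8-byte values
array_bytes = sys.getsizeof(packed)

print(packed.itemsize)  # 8
print(list_bytes > array_bytes)  # True
```

Iterating over `packed` from Python still creates one float object per element. The compact layout only pays off when compiled code, such as NumPy's C loops or a numba-compiled function, reads the raw buffer directly.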

## Introspection

!["Why is checking isinstance(something, Mapping) so slow?"](img/isinstance.png)

_(From https://stackoverflow.com/q/42378726/554319)_
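The question above has one likely cause. `collections.abc.Mapping` is an abstract base class, so `isinstance` cannot simply compare types. Instead it delegates to `ABCMeta.__instancecheck__`, which consults subclass hooks, the registry of virtual subclasses, and its caches. A small sketch with a hypothetical `Settings` class shows that the answer can even change at run time:

```python
from abc import ABCMeta
from collections.abc import Mapping


class Settings:
    """Hypothetical mapping-like class that does NOT inherit from Mapping."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Mapping's metaclass is ABCMeta, so isinstance(obj, Mapping) is routed
# through ABCMeta.__instancecheck__ instead of a plain type comparison
print(type(Mapping) is ABCMeta)  # True

before = isinstance(Settings({}), Mapping)  # False: not registered yet

# Virtual subclassing: registering changes the answer at run time,
# so the check has to consult mutable state, not just the type
Mapping.register(Settings)
after = isinstance(Settings({}), Mapping)  # True
```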

![numba 0.1 is released](img/tweet-travis.png)

_(From https://twitter.com/teoliphant/status/235789560678858752)_

# What is numba?

![numba](img/numba.png)

> Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

* Latest version at the time of writing: 0.44 (released 16 days earlier)
* Documentation https://numba.pydata.org/numba-doc/latest/index.html
* BSD-2 License
* Easy to install:

```
$ pip install numba
$ conda install numba [--channel conda-forge]
```
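Here is a minimal sketch of typical usage, assuming numba is installed. You decorate a plain numerical loop with `numba.njit`, and the first call compiles it for the argument types it receives (here, integers). `sum_of_squares` is a hypothetical example function. The `try`/`except ImportError` fallback is our addition, so the sketch still runs (unaccelerated) without numba.

```python
try:
    from numba import njit
except ImportError:  # fallback (not part of numba): run as plain Python
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # An explicit loop: slow in CPython, fast once numba compiles it
    total = 0
    for i in range(n):
        total += i * i
    return total


# First call: numba infers integer types and compiles; later calls reuse it
print(sum_of_squares(10))  # 285
```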

# Caveats and limitations

## The "nopython mode" is the only way

* Two modes, "object mode" and "nopython mode"; only the latter is truly optimized
* Functions JITted in nopython mode can only call other functions JITted in nopython mode
* _Avoid "object mode"!_ The automatic fallback to it is being deprecated, and numba 0.44 already emits warnings when it happens

![It's nopython all the way down](img/nopython.jpg)

```python
from numba import jit

@jit
def range10():
    l = []
    for x in range(10):
        l.append(x)
    return l

range10()
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

```python
@jit
def reversed_range10():
    l = []
    for x in range(10):
        l.append(x)

    return reversed(l)  # innocuous change, but no reversed support in nopython mode

reversed_range10()
```

```
Compilation is falling back to object mode WITH looplifting enabled because Function "reversed_range10" failed type inference due to: Untyped global name 'reversed': cannot determine Numba type of <class 'type'>

File "<ipython-input-17-2e1fa93510fc>", line 7:
def reversed_range10():
    <source elided>

    return reversed(l)  # innocuous change, but no reversed support in nopython mode
    ^

  @jit
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "reversed_range10" failed type inference due to: cannot determine Numba type of <class 'numba.dispatcher.LiftedLoop'>

File "<ipython-input-17-2e1fa93510fc>", line 4:
def reversed_range10():
    <source elided>
    l = []
    for x in range(10):
    ^

  @jit

File "<ipython-input-17-2e1fa93510fc>", line 2:
@jit
def reversed_range10():
^

  self.func_ir.loc))
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more infor
```

```
<list_reverseiterator at 0x7f668f6f0cc0>
```

```python
@jit(nopython=True)
def reversed_range10():
    l = []
    for x in range(10):
        l.append(x)

    return l[::-1]  # slicing works in nopython mode, unlike reversed()

reversed_range10()
```

```
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

```python
from numba import njit  # shorthand for jit(nopython=True)

@njit
def reversed_range10():
    l = []
    for x in range(10):
        l.append(x)

    return l[::-1]  # slicing works in nopython mode, unlike reversed()

reversed_range10()
```

```
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

## Passing functions as arguments is _slow_

* Since numba 0.38, JITted functions can be passed as arguments to other JITted functions, but the result is even slower than leaving the receiving function unJITted https://github.com/numba/numba/issues/2952
* Arguably the most important blocker to writing reusable numba code

```python
@njit
def func(x):
    return x**3 - 1

@njit
def fprime(x):
    return 3 * x**2
```

```python
@njit
def njit_newton(func, x0, fprime):
    for _ in range(50):
        fder = fprime(x0)
        fval = func(x0)
        newton_step = fval / fder
        x = x0 - newton_step
        if abs(x - x0) < 1.48e-8:
            return x
        x0 = x
```

```python
%timeit njit_newton(func, 1.5, fprime)
%timeit njit_newton.py_func(func, 1.5, fprime=fprime)
```

```
18.7 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.62 µs ± 80.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```


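With a smart combination of closures and caching, we can implement a workaround. The functions are captured by a closure instead of being passed as arguments, and `functools.lru_cache` ensures that each `(func, fprime)` pair builds its closure only once:

```python
from functools import lru_cache

@lru_cache()
def newton_generator(func, fprime):
    @njit
    def njit_newton_final(x0):
        for _ in range(50):
            fder = fprime(x0)
            fval = func(x0)
            newton_step = fval / fder
            x = x0 - newton_step
            if abs(x - x0) < 1.48e-8:
                return x
            x0 = x

    return njit_newton_final
```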

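The first call creates the closure. Compilation is lazy, so nothing is compiled yet:

```python
%timeit -n1 -r1 newton_generator(func, fprime)
```

```
352 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
```

The second call returns the cached closure:

```python
%timeit -n1 -r1 newton_generator(func, fprime)
```

```
2.99 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
```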


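The first call to the returned function triggers the actual compilation. Later calls run the compiled machine code:

```python
newton_func = newton_generator(func, fprime)
%timeit -n1 -r1 newton_func(1.5)
```

```
79.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
```

```python
%timeit newton_func(1.5)
```

```
241 ns ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```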


Retrieving the cached closure is cheap, so a thin wrapper with the original signature stays fast:

```python
%timeit newton_generator(func, fprime)
```

```
124 ns ± 5.77 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```

```python
def newton(func, x0, fprime):
    return newton_generator(func, fprime)(x0)

%timeit newton(func, 1.5, fprime)
```

```
445 ns ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
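The same factory can be reused for other function pairs. The sketch below is self-contained and restates `newton_generator` with the Newton update condensed into one line. It adds hypothetical `g` and `gprime` functions for `x**2 - 2`. It assumes numba is installed; the `ImportError` fallback is our addition. `lru_cache` keys on the function objects, so repeated calls with the same pair return the very same closure and nothing is recompiled.

```python
import math
from functools import lru_cache

try:
    from numba import njit
except ImportError:  # fallback (not part of numba): run as plain Python
    def njit(func):
        return func


@lru_cache()
def newton_generator(func, fprime):
    # func and fprime are captured by the closure, not passed as arguments
    @njit
    def njit_newton_final(x0):
        for _ in range(50):
            x = x0 - func(x0) / fprime(x0)
            if abs(x - x0) < 1.48e-8:
                return x
            x0 = x
        # Like the original, falls through (returns None) if not converged

    return njit_newton_final


def newton(func, x0, fprime):
    return newton_generator(func, fprime)(x0)


@njit
def g(x):
    return x**2 - 2


@njit
def gprime(x):
    return 2 * x


root = newton(g, 1.0, gprime)
print(math.isclose(root, math.sqrt(2)))  # True
# Same pair -> same cached closure, so no recompilation
print(newton_generator(g, gprime) is newton_generator(g, gprime))  # True
```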


## NumPy arrays and nothing else

![Two layers](img/two-layers.png)

High-level API:

* Supports complex data structures (e.g. `astropy.units` or `pint` quantities, NumPy extensions for physical units)
* Converts the inputs to a normalized, simple structure that numba understands

Dangerous™ algorithms:

* Fast (easy to accelerate with `numba.njit`)
* Only care about numbers and make assumptions about their inputs
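Here is a minimal sketch of the two-layer pattern, built on our own assumptions. Instead of `astropy.units` or `pint`, the hypothetical high-level `circular_velocity` takes a radius plus a unit string and validates it. It then hands plain SI floats to a "dangerous" kernel that does no checking at all. Only the kernel is decorated with `njit`. The `ImportError` fallback is ours, so the sketch runs without numba.

```python
import math

try:
    from numba import njit
except ImportError:  # fallback (not part of numba): run as plain Python
    def njit(func):
        return func

# Hypothetical conversion table for the high-level layer
_TO_METERS = {"m": 1.0, "km": 1000.0}


@njit
def _circular_velocity(mu, r):
    # Low level, "dangerous": plain floats in SI units, no validation
    return math.sqrt(mu / r)


def circular_velocity(mu, r, unit="km"):
    """Circular orbit velocity in m/s.

    mu: gravitational parameter in m**3 / s**2; r: orbit radius in `unit`.
    """
    # High level: normalize and validate before calling the kernel
    if unit not in _TO_METERS:
        raise ValueError(f"Unsupported unit: {unit!r}")
    r_m = float(r) * _TO_METERS[unit]
    if mu <= 0 or r_m <= 0:
        raise ValueError("mu and r must be positive")
    return _circular_velocity(float(mu), r_m)


MU_EARTH = 3.986004418e14  # m**3 / s**2
print(round(circular_velocity(MU_EARTH, 7000.0)))  # 7546
```

All the flexibility (units, validation, error messages) lives in ordinary Python. The compiled kernel only ever sees numbers, which is the kind of code numba handles well.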

# Conclusions

# Thank you!

* https://github.com/Juanlu001/talk-numba
* <hello@juanlu.space>

# Backup slides

## Comparison of solutions

| Project | Pros | Cons |
|--------|-----------------------------------------|-----------------------------------------------------------------------------------------------|
| NumPy | Powerful, omnipresent  | Vectorized code is sometimes difficult to read<sup>1</sup>, if you can't vectorize you are out of luck |
| Cython | Gradual, effective, widely used, mature | Tricky if you don't know any C, couldn't make the native debugger work<sup>2</sup> |
| PyPy | General purpose | C extensions still very slow, no wheels on PyPI<sup>3</sup> |
| Numba | Simplest, very effective | Only numerical code, needs special care |

And many others: Pythran, Nuitka, mypyc...

<sup>1</sup>Check out "Integration with the vernacular", by James Powell https://pyvideo.org/pydata-london-2015/integration-with-the-vernacular.html

<sup>2</sup>https://github.com/cython/cython/issues/2699

<sup>3</sup>See https://github.com/antocuni/pypy-wheels for a half-baked effort. Perhaps the future will be brighter with the new manylinux2010 specification? https://bitbucket.org/pypy/pypy/issues/2617/pypy-binary-is-linked-to-too-much-stuff, https://github.com/pypa/manylinux/issues/179