# Profiling and Optimizing NumPy in 10 Minutes

## David Wagner

1. Don't Guess
2. Don't Loop
3. Don't Copy
4. Don't Compile!
5. Don't JIT?
6. Don't Even Try

In [None]:
from __future__ import print_function

def page_printer(data, start=0, screen_lines=0, pager_cmd=None):
    if isinstance(data, dict):
        data = data['text/plain']
    print(data)

import IPython.core.page
IPython.core.page.page = page_printer

In [None]:
import warnings
warnings.filterwarnings('ignore')

# Don't Guess

# Don't Guess - Measure 📏 

In [16]:
import numpy as np
%load_ext line_profiler


In [17]:
x = np.random.randint(1000, size=(1000, 1000), dtype=np.int64)
y = np.random.randint(1000, size=(1000, 1000), dtype=np.int64)

def maybe_slow(x, y):
    add = x + y
    mult = x * y
    exp = x ** y
    return add + mult + exp

In [19]:
%lprun -u 0.001 -f maybe_slow maybe_slow(x, y)

Timer unit: 0.001 s

Total time: 0.030354 s
File: <ipython-input-17-854fa816945a>
Function: maybe_slow at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
     4                                           def maybe_slow(x, y):
     5         1          4.6      4.6     15.2      add = x + y
     6         1          3.4      3.4     11.1      mult = x * y
     7         1         16.9     16.9     55.8      exp = x ** y
     8         1          5.4      5.4     17.9      return add + mult + exp


### snakeviz
![SnakeViz](https://data-profiler.readthedocs.io/en/latest/_images/profiling.png)
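
SnakeViz visualizes statistics collected by the standard-library `cProfile` module. The sketch below is one way to collect those statistics outside the notebook; the smaller array sizes, the seed, and the file name `maybe_slow.prof` in the closing comment are assumptions made for illustration. Unlike `%lprun`, `cProfile` reports per function rather than per line, so `maybe_slow` shows up as a single entry.

```python
import cProfile
import io
import pstats

import numpy as np


def maybe_slow(x, y):
    add = x + y
    mult = x * y
    exp = x ** y
    return add + mult + exp


# Small inputs with small exponents (assumed values) so int64 does not overflow.
rng = np.random.default_rng(0)
x = rng.integers(1, 10, size=(200, 200), dtype=np.int64)
y = rng.integers(1, 10, size=(200, 200), dtype=np.int64)

# Profile a single call.
profiler = cProfile.Profile()
profiler.enable()
maybe_slow(x, y)
profiler.disable()

# Render the most expensive entries, sorted by cumulative time, into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)

# To explore the same data in SnakeViz, write it to disk and open it:
#   stats.dump_stats("maybe_slow.prof")
#   $ snakeviz maybe_slow.prof
```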

# Don't Loop

# Don't Loop - Vectorize!

In [20]:
def so_loopy_slow(x, y):
    rows, cols = x.shape
    for i in range(rows):
        for j in range(cols):
            x[i, j] += y
    return x

In [21]:
%timeit so_loopy_slow(x, 5)

594 ms ± 87.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [22]:
%timeit x + 5

2.77 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
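
Note that the two timings above do slightly different work: `so_loopy_slow` modifies `x` in place, while `x + 5` allocates a new array. The sketch below compares a loop and its vectorized equivalent on equal terms and checks that they agree. The function names `add_scalar_loop` and `add_scalar_vectorized` and the array size are hypothetical, chosen for illustration.

```python
import timeit

import numpy as np


def add_scalar_loop(arr, value):
    """Element-by-element addition in pure Python (works on a copy)."""
    out = arr.copy()
    rows, cols = out.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] += value
    return out


def add_scalar_vectorized(arr, value):
    """Same result; the loop runs inside NumPy's compiled code via broadcasting."""
    return arr + value


arr = np.arange(10_000, dtype=np.int64).reshape(100, 100)

loop_result = add_scalar_loop(arr, 5)
vec_result = add_scalar_vectorized(arr, 5)
same = np.array_equal(loop_result, vec_result)

# Timings vary by machine, so they are printed rather than asserted.
loop_time = timeit.timeit(lambda: add_scalar_loop(arr, 5), number=5)
vec_time = timeit.timeit(lambda: add_scalar_vectorized(arr, 5), number=5)
print(f"equal: {same}, loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```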


# Don't Copy

In [23]:
def in_place_ops(x, y):
    new_array = x + y # Copy
    x += y # In-place
    x = x.flatten() # Copy
    x = x.ravel() # View when possible (no copy)
    return x

In [24]:
%lprun -u 0.001 -f in_place_ops [in_place_ops(x, y) for _ in range(1000)]

Timer unit: 0.001 s

Total time: 14.1561 s
File: <ipython-input-23-2568efd48b16>
Function: in_place_ops at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def in_place_ops(x, y):
     2      1000       3658.9      3.7     25.8      new_array = x + y # Copy
     3      1000       2949.3      2.9     20.8      x += y # In-place
     4      1000       7516.9      7.5     53.1      x = x.flatten() # Copy
     5      1000         28.7      0.0      0.2      x = x.ravel() # View when possible (no copy)
     6      1000          2.3      0.0      0.0      return x
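
Whether an operation copies can be checked directly instead of inferred from timings. The sketch below uses `np.shares_memory` on a small array (an assumed size, for illustration) to show which operations reuse the original buffer. `ravel` returns a view only when the memory layout allows it; on a non-contiguous array such as a transpose it has to copy.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)
y = np.ones_like(x)

# Binary operators allocate a new result array.
summed = x + y
sum_shares = np.shares_memory(x, summed)        # False

# Augmented assignment writes into the existing buffer.
original = x
x += y
still_same_object = original is x               # True

# An explicit `out=` argument does the same for any ufunc.
np.add(x, y, out=x)

# flatten() always copies; ravel() returns a view of a contiguous array.
flat_copy = x.flatten()
flat_view = x.ravel()
flatten_shares = np.shares_memory(x, flat_copy)  # False
ravel_shares = np.shares_memory(x, flat_view)    # True

# On a non-contiguous array, ravel() cannot avoid the copy.
ravel_of_transpose_shares = np.shares_memory(x, x.T.ravel())  # False

print(sum_shares, still_same_object, flatten_shares,
      ravel_shares, ravel_of_transpose_shares)
```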


# Don't Compile, JIT!

In [25]:
import numba
from numba import jit
import numexpr as ne

In [29]:
def maths_py(a, b):
    _ = [x * y for x, y in zip(a, b)]
    _ = [x / y for x, y in zip(a, b)]
    _ = [x + y for x, y in zip(a, b)]
    _ = [x - y for x, y in zip(a, b)]
    _ = [x % y for x, y in zip(a, b)]

def maths_numpy(a, b):
    a *= b
    x1 = a / b
    a += b
    a -= b
    a %= b
    
def maths_numexpr(a, b):
    ne.evaluate('a * b', out=a)
    x1 = ne.evaluate('a / b')
    ne.evaluate('a + b', out=a)
    ne.evaluate('a - b', out=a)
    ne.evaluate('a % b', out=a)

@jit(nopython=True, cache=True, fastmath=True, parallel=True)
def maths_numba(a, b):
    a *= b
    x1 = a / b
    a += b
    a -= b
    a %= b

In [31]:
def all_(a, b):
    x = maths_py(a, b)
    x = maths_numpy(a, b)
    x = maths_numexpr(a, b)
    x = maths_numba(a, b)
    return x

In [32]:
%lprun -u .001 -f all_ [all_(x, y) for _ in range(10)]

Timer unit: 0.001 s

Total time: 0.846089 s
File: <ipython-input-31-3ca3a639746a>
Function: all_ at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def all_(a, b):
     2        10        490.7     49.1     58.0      x = maths_py(a, b)
     3        10        227.9     22.8     26.9      x = maths_numpy(a, b)
     4        10        106.7     10.7     12.6      x = maths_numexpr(a, b)
     5        10         20.7      2.1      2.4      x = maths_numba(a, b)
     6        10          0.0      0.0      0.0      return x


# Don't JIT, Compile?

https://github.com/pybind/pybind11

![PyBind](https://3.bp.blogspot.com/-HGuANPWdKJg/W90CXetle9I/AAAAAAAA5uA/Ie2dNb_pz_s52YUGCOgMqHWKgUo0heXewCLcBGAs/s1600/%25E8%259E%25A2%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7%2B2018-11-03%2B%25E4%25B8%258A%25E5%258D%258810.04.03.png)

# Don't Even Try

https://speakerdeck.com/pycon2018/jake-vanderplas-performance-python-seven-strategies-for-optimizing-your-numerical-code?slide=102

The Python community writes some great packages. Before optimizing your own code, check that you are not reinventing the wheel!