# Efficient NumPy

In [1]:
import numpy as np

## Best practices

### Avoid loops

Explicit Python loops over array elements are costly, because every iteration goes through the interpreter:

In [2]:
def square_loop(a):
    """Square each element of a 1D array with an explicit Python loop."""

    result = np.zeros_like(a)
    for i in range(a.shape[0]):
        result[i] = a[i]*a[i]
    return result

In [3]:
large_arr = np.random.randint(100, size=(100000,))

In [4]:
%timeit -n 10 -r 3 square_loop(large_arr)

35.4 ms ± 549 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)

The vectorized `np.square` is roughly 270 times faster on the same array:


In [5]:
%timeit -n 10 -r 3 np.square(large_arr)

129 µs ± 44.1 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)
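
Before trading a loop for a vectorized call, it is worth confirming that both produce the same values. The sketch below does that on a smaller array; the seed, the array size, and the name `small_arr` are arbitrary choices made for this illustration.

```python
import numpy as np

def square_loop(a):
    """Square each element of a 1D array with an explicit Python loop."""
    result = np.zeros_like(a)
    for i in range(a.shape[0]):
        result[i] = a[i] * a[i]
    return result

# Seed and size are arbitrary, chosen only to make the check reproducible.
rng = np.random.default_rng(0)
small_arr = rng.integers(100, size=1000)

loop_result = square_loop(small_arr)
vec_result = np.square(small_arr)      # vectorized ufunc
op_result = small_arr * small_arr      # element-wise product, same values

same = (np.array_equal(loop_result, vec_result)
        and np.array_equal(vec_result, op_result))
print(same)
```

All three variants should agree element by element; only the time they take differs.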


### Use broadcasting

Broadcasting provides an efficient way of handling operations on arrays of different dimensionality, and it is usually more readable and concise than an explicit loop. For comparison, here is how to add a `1D` array `b` to every row of a `2D` array `a` with a loop:

In [6]:
def row_loop(a, b):
    """Add a vector to each row of a matrix with an explicit loop."""

    result = np.zeros_like(a)
    for i in range(a.shape[0]):
        result[i] = a[i] + b
    return result

In [7]:
large_arr = np.random.randint(100, size=(1000,1000))
large_b = np.random.randint(100, size=(1000,))

In [8]:
%timeit -n 10 -r 3 row_loop(large_arr, large_b)

6.36 ms ± 361 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)


Broadcasting is about `4X` faster:

In [9]:
%timeit -n 10 -r 3 large_arr + large_b

1.51 ms ± 258 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)
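
Why `large_arr + large_b` works without a loop comes down to how shapes are aligned: NumPy compares shapes starting from the last axis, and two axes are compatible when they are equal or one of them is 1. The following is a minimal sketch on tiny arrays; the names `row` and `col` are illustrative only.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,): matches the last axis of a
col = np.array([100, 200])          # shape (2,): matches the first axis of a

# row is treated as shape (1, 3) and stretched along axis 0.
row_sum = a + row

# To add along the other axis, give col an explicit trailing axis: (2, 1).
col_sum = a + col[:, np.newaxis]

# Without the extra axis the trailing dimensions are 3 and 2, which clash.
try:
    a + col
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(row_sum)
print(col_sum)
print(mismatch_raised)
```

The same rule explains the example above: shape `(1000,)` lines up with the last axis of `(1000, 1000)`, so the vector is added to every row.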


In-place addition with broadcasting is even faster, because no new result array has to be allocated (note that it overwrites `large_arr`):

In [10]:
%timeit -n 10 -r 3 np.add(large_arr, large_b, out=large_arr)

1.05 ms ± 205 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)
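
To make the in-place behaviour concrete, here is a small sketch showing that `np.add(..., out=a)` writes into the existing array and returns that same object, and that `a += b` behaves the same way. The array sizes are arbitrary and chosen only so the result is easy to read.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.int64)
b = np.arange(4)

out = np.add(a, b, out=a)    # result is written into a's existing buffer
returned_same = out is a     # the ufunc hands back the array passed as out

a += b                       # augmented assignment is also in-place

print(returned_same)
print(a)
```

After both steps every row of `a` holds `2 * b`, and no intermediate `(3, 4)` array was created along the way.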


Broadcasting also makes it possible to build patterned arrays, such as an addition table, in a single line (you may leverage this in one of the problems in Homework #2):

In [11]:
np.arange(10) + np.expand_dims(np.arange(10), axis=-1)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]])
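
The same table can be written in a couple of other ways. The sketch below compares the `np.expand_dims` form with `np.newaxis` indexing and with `np.add.outer`; which one reads best is largely a matter of taste.

```python
import numpy as np

n = 10
via_expand = np.arange(n) + np.expand_dims(np.arange(n), axis=-1)
via_newaxis = np.arange(n) + np.arange(n)[:, np.newaxis]   # (n,) + (n, 1)
via_outer = np.add.outer(np.arange(n), np.arange(n))       # all pairwise sums

all_equal = (np.array_equal(via_expand, via_newaxis)
             and np.array_equal(via_expand, via_outer))
print(all_equal)
```

In each case a length-`n` row is combined with a length-`n` column, giving an `(n, n)` result.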

### Beware!

In-place operations are prone to bugs when the output array does not have the shape of the broadcast result:

In [12]:
A = np.random.randint(10, size=(10,10))
B = np.random.randint(10, size=(10,))

In [13]:
A

array([[8, 3, 1, 2, 7, 7, 8, 1, 7, 7],
       [7, 0, 2, 8, 0, 3, 1, 4, 3, 9],
       [8, 2, 4, 1, 5, 2, 1, 7, 6, 5],
       [9, 6, 2, 6, 7, 8, 5, 0, 8, 8],
       [9, 9, 7, 2, 7, 3, 6, 2, 3, 9],
       [5, 5, 0, 0, 8, 0, 6, 2, 0, 3],
       [3, 3, 2, 9, 3, 4, 6, 0, 1, 2],
       [2, 1, 5, 1, 0, 3, 1, 1, 6, 1],
       [0, 3, 3, 1, 8, 0, 0, 9, 6, 3],
       [9, 4, 0, 2, 1, 4, 0, 5, 5, 0]])

In [14]:
B

array([9, 6, 6, 2, 5, 9, 5, 7, 9, 1])

In [15]:
A+B

array([[17,  9,  7,  4, 12, 16, 13,  8, 16,  8],
       [16,  6,  8, 10,  5, 12,  6, 11, 12, 10],
       [17,  8, 10,  3, 10, 11,  6, 14, 15,  6],
       [18, 12,  8,  8, 12, 17, 10,  7, 17,  9],
       [18, 15, 13,  4, 12, 12, 11,  9, 12, 10],
       [14, 11,  6,  2, 13,  9, 11,  9,  9,  4],
       [12,  9,  8, 11,  8, 13, 11,  7, 10,  3],
       [11,  7, 11,  3,  5, 12,  6,  8, 15,  2],
       [ 9,  9,  9,  3, 13,  9,  5, 16, 15,  4],
       [18, 10,  6,  4,  6, 13,  5, 12, 14,  1]])

In [16]:
np.add(A, B)

array([[17,  9,  7,  4, 12, 16, 13,  8, 16,  8],
       [16,  6,  8, 10,  5, 12,  6, 11, 12, 10],
       [17,  8, 10,  3, 10, 11,  6, 14, 15,  6],
       [18, 12,  8,  8, 12, 17, 10,  7, 17,  9],
       [18, 15, 13,  4, 12, 12, 11,  9, 12, 10],
       [14, 11,  6,  2, 13,  9, 11,  9,  9,  4],
       [12,  9,  8, 11,  8, 13, 11,  7, 10,  3],
       [11,  7, 11,  3,  5, 12,  6,  8, 15,  2],
       [ 9,  9,  9,  3, 13,  9,  5, 16, 15,  4],
       [18, 10,  6,  4,  6, 13,  5, 12, 14,  1]])

This one works, because `A` already has the broadcast shape `(10, 10)`:

In [17]:
np.multiply(A, B, out=A)

array([[72, 18,  6,  4, 35, 63, 40,  7, 63,  7],
       [63,  0, 12, 16,  0, 27,  5, 28, 27,  9],
       [72, 12, 24,  2, 25, 18,  5, 49, 54,  5],
       [81, 36, 12, 12, 35, 72, 25,  0, 72,  8],
       [81, 54, 42,  4, 35, 27, 30, 14, 27,  9],
       [45, 30,  0,  0, 40,  0, 30, 14,  0,  3],
       [27, 18, 12, 18, 15, 36, 30,  0,  9,  2],
       [18,  6, 30,  2,  0, 27,  5,  7, 54,  1],
       [ 0, 18, 18,  2, 40,  0,  0, 63, 54,  3],
       [81, 24,  0,  4,  5, 36,  0, 35, 45,  0]])

This one does not, even though `A` and `B` broadcast fine for addition: the result has shape `(10, 10)` and cannot be stored in `B`, which has shape `(10,)`:

In [18]:
np.add(A, B, out=B)

ValueError: non-broadcastable output operand with shape (10,) doesn't match the broadcast shape (10,10)
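
One way to catch this kind of mistake early is to compare the output array's shape with the broadcast shape before calling the ufunc. The helper `add_into` below is hypothetical (it is not part of NumPy) and deliberately requires an exact shape match; treat it as a sketch of the idea rather than a recommended API.

```python
import numpy as np

def add_into(x, y, out):
    """Hypothetical helper: add x and y into out after checking shapes."""
    target_shape = np.broadcast(x, y).shape   # shape the sum will have
    if out.shape != target_shape:
        raise ValueError(
            f"out has shape {out.shape}, expected {target_shape}"
        )
    return np.add(x, y, out=out)

A = np.ones((10, 10), dtype=np.int64)
B = np.arange(10)

add_into(A, B, out=A)        # fine: A already has the broadcast shape

try:
    add_into(A, B, out=B)    # B has shape (10,), so the check fails first
    failed = False
except ValueError as err:
    failed = True
    message = str(err)

print(failed)
print(message)
```

The check reports both shapes in its own words, which can be easier to act on than the error raised from inside the ufunc.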