# Efficient NumPy

In [7]:
import numpy as np

## Best practices

### Avoid loops

Python loops are costly: every iteration goes through the interpreter, and they were never designed to be efficient for numerical work. It is significantly more efficient to use NumPy (or PyTorch) vectorized operations.

In [8]:
def square_loop(a):
    """Calculate the square of an array in a loop. We assume a 1D array here."""

    result = np.zeros_like(a)
    # Iterating over the elements in Python is what makes this inefficient.
    for i in range(a.shape[0]):
        result[i] = a[i] * a[i]
    return result

In [9]:
large_arr = np.random.randint(100, size=(100000,))

In [10]:
%timeit -n 10 -r 3 square_loop(large_arr)

59.9 ms ± 7.51 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)


In [11]:
%timeit -n 10 -r 3 np.square(large_arr)

119 µs ± 31.4 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)


We see that `np.square` is roughly 500 times faster than the loop (119 µs versus 59.9 ms), almost three orders of magnitude.
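
Speed only matters if the results agree. The sketch below checks that the loop and three vectorized spellings give the same answer on a small array. It redefines `square_loop` so that it runs on its own; `small_arr` and the `via_*` names are illustrative and not used elsewhere in this notebook.

```python
import numpy as np

def square_loop(a):
    """Square a 1D array element by element (same logic as the loop version above)."""
    result = np.zeros_like(a)
    for i in range(a.shape[0]):
        result[i] = a[i] * a[i]
    return result

# Illustrative input; any 1D integer array works.
small_arr = np.arange(6)

expected = square_loop(small_arr)

# Three equivalent vectorized forms: the ufunc, the power operator, plain multiplication.
via_ufunc = np.square(small_arr)
via_power = small_arr ** 2
via_product = small_arr * small_arr

print(np.array_equal(expected, via_ufunc))    # True
print(np.array_equal(expected, via_power))    # True
print(np.array_equal(expected, via_product))  # True
```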

### Use broadcasting

The broadcasting mechanism provides an efficient way of handling operations on arrays of different dimensionality, and it is usually more readable and concise than an explicit loop. For example, to add a `1D` array `b` to a `2D` array `a` row-wise with a loop:

In [12]:
def row_loop(a, b):  # a is 2D, b is 1D
    """Add a vector to each row of a matrix in a loop."""

    result = np.zeros_like(a)
    for i in range(a.shape[0]):
        result[i] = a[i] + b
    return result

In [13]:
large_arr = np.random.randint(100, size=(1000,1000))
large_b = np.random.randint(100, size=(1000,))

In [14]:
%timeit -n 10 -r 3 row_loop(large_arr, large_b)

9.37 ms ± 1.8 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)


Broadcasting is about `2.4X` faster (3.88 ms versus 9.37 ms):

In [15]:
%timeit -n 10 -r 3 large_arr + large_b

3.88 ms ± 846 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)


In-place addition with broadcasting avoids allocating a new array, so it can be faster still:

In [None]:
%timeit -n 10 -r 3 np.add(large_arr, large_b, out=large_arr)  # No new memory is allocated; note that large_arr is overwritten on every run
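
To make the effect of `out=` concrete, here is a minimal sketch on small arrays. The names `a`, `b`, `fresh`, and `returned` are illustrative. It shows that the out-of-place sum lives in new memory, while `np.add(..., out=a)` writes into `a` and returns that same array object.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # illustrative 2D array
b = np.array([10, 20, 30, 40])    # illustrative 1D array, broadcast along rows

# Out-of-place: the result is stored in a newly allocated array.
fresh = a + b
print(np.shares_memory(fresh, a))  # False

# In-place: np.add writes into `a` and returns that same array object.
returned = np.add(a, b, out=a)
print(returned is a)  # True

# The augmented assignment is the operator form of the same in-place update.
a += b
print(a[0])  # [20 41 62 83]
```

Because `a` is overwritten, any later code that still needs the original values has to work from a copy taken beforehand.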

Broadcasting also allows for creating fancy structures in a single line (you may leverage this in one of the problems in Homework #2):

In [19]:
np.arange(10) + np.arange(10)[:, np.newaxis]  # (10,) + (10, 1) -> (1, 10) + (10, 1) -> (10, 10)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]])

In [None]:
np.arange(10) + np.expand_dims(np.arange(10), axis=-1)  # Same result as above; the np.newaxis form is arguably clearer
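
The shape comments above follow the broadcasting rule: shapes are aligned from the trailing axis, and two axis lengths are compatible when they are equal or one of them is 1. The sketch below uses `np.broadcast` to inspect the resulting shape without computing the sum. The names `row`, `col`, and `table` are illustrative.

```python
import numpy as np

row = np.arange(10)                 # shape (10,)
col = np.arange(10)[:, np.newaxis]  # shape (10, 1)

# np.broadcast reports the broadcast shape without doing the arithmetic.
print(np.broadcast(row, col).shape)  # (10, 10)

table = row + col
# Each entry is the sum of its row index and its column index.
print(table[3, 4])  # 7

# Incompatible trailing axes (10 versus 3) raise a ValueError.
try:
    np.arange(10) + np.arange(3)
except ValueError as err:
    print("cannot broadcast:", err)
```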

### Beware!

In-place operations are prone to bugs when the result container has the wrong shape:

In [20]:
A = np.random.randint(10, size=(10,10)) #2-D
B = np.random.randint(10, size=(10,)) #1-D

In [21]:
A

array([[7, 8, 3, 2, 1, 8, 3, 6, 9, 0],
       [8, 2, 1, 2, 5, 5, 5, 6, 0, 6],
       [8, 0, 4, 8, 1, 3, 3, 6, 6, 7],
       [3, 0, 6, 7, 2, 4, 3, 5, 6, 9],
       [8, 7, 3, 4, 1, 7, 1, 5, 6, 5],
       [1, 8, 3, 1, 6, 3, 2, 0, 8, 8],
       [5, 7, 9, 6, 1, 9, 4, 4, 4, 2],
       [8, 6, 0, 7, 3, 9, 3, 7, 4, 7],
       [7, 6, 8, 8, 1, 5, 8, 7, 3, 9],
       [9, 4, 6, 7, 3, 7, 4, 8, 2, 1]])

In [22]:
B

array([8, 2, 0, 5, 7, 7, 0, 6, 4, 9])

In [23]:
A+B

array([[15, 10,  3,  7,  8, 15,  3, 12, 13,  9],
       [16,  4,  1,  7, 12, 12,  5, 12,  4, 15],
       [16,  2,  4, 13,  8, 10,  3, 12, 10, 16],
       [11,  2,  6, 12,  9, 11,  3, 11, 10, 18],
       [16,  9,  3,  9,  8, 14,  1, 11, 10, 14],
       [ 9, 10,  3,  6, 13, 10,  2,  6, 12, 17],
       [13,  9,  9, 11,  8, 16,  4, 10,  8, 11],
       [16,  8,  0, 12, 10, 16,  3, 13,  8, 16],
       [15,  8,  8, 13,  8, 12,  8, 13,  7, 18],
       [17,  6,  6, 12, 10, 14,  4, 14,  6, 10]])

In [24]:
np.add(A, B)

array([[15, 10,  3,  7,  8, 15,  3, 12, 13,  9],
       [16,  4,  1,  7, 12, 12,  5, 12,  4, 15],
       [16,  2,  4, 13,  8, 10,  3, 12, 10, 16],
       [11,  2,  6, 12,  9, 11,  3, 11, 10, 18],
       [16,  9,  3,  9,  8, 14,  1, 11, 10, 14],
       [ 9, 10,  3,  6, 13, 10,  2,  6, 12, 17],
       [13,  9,  9, 11,  8, 16,  4, 10,  8, 11],
       [16,  8,  0, 12, 10, 16,  3, 13,  8, 16],
       [15,  8,  8, 13,  8, 12,  8, 13,  7, 18],
       [17,  6,  6, 12, 10, 14,  4, 14,  6, 10]])

This one will work:

In [25]:
np.multiply(A, B, out=A)  # The result has the same shape as A, (10, 10), so it can be written into A

array([[56, 16,  0, 10,  7, 56,  0, 36, 36,  0],
       [64,  4,  0, 10, 35, 35,  0, 36,  0, 54],
       [64,  0,  0, 40,  7, 21,  0, 36, 24, 63],
       [24,  0,  0, 35, 14, 28,  0, 30, 24, 81],
       [64, 14,  0, 20,  7, 49,  0, 30, 24, 45],
       [ 8, 16,  0,  5, 42, 21,  0,  0, 32, 72],
       [40, 14,  0, 30,  7, 63,  0, 24, 16, 18],
       [64, 12,  0, 35, 21, 63,  0, 42, 16, 63],
       [56, 12,  0, 40,  7, 35,  0, 42, 12, 81],
       [72,  8,  0, 35, 21, 49,  0, 48,  8,  9]])

This one will not (although the operands themselves broadcast fine for addition):

In [26]:
np.add(A, B, out=B)  # The result has shape (10, 10) but B has shape (10,), so B cannot hold it

ValueError: non-broadcastable output operand with shape (10,) doesn't match the broadcast shape (10,10)
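
One way to catch this before it fails, sketched here with a hypothetical helper `fits_in_place` (it is not a NumPy function), is to compare the broadcast shape of the operands with the shape of the intended output. The sketch checks shapes only and uses its own small `A` and `B`.

```python
import numpy as np

def fits_in_place(x, y, out):
    """Hypothetical helper: True if the broadcast result of x and y has out's shape."""
    return np.broadcast(x, y).shape == out.shape

A = np.ones((10, 10), dtype=int)  # illustrative 2D array
B = np.arange(10)                 # illustrative 1D array

print(fits_in_place(A, B, out=A))  # True: the (10, 10) result fits into A
print(fits_in_place(A, B, out=B))  # False: the (10, 10) result does not fit into B

# When the result does not fit, fall back to allocating a new array.
C = A + B
print(C.shape)  # (10, 10)
```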