# Losing your Loops: Fast Numerical Computing with NumPy

YouTube link: https://youtu.be/EEUXKG97YRw

In [8]:
import numpy as np
# Values chosen to match the printed output below
a = np.array([1, 3, 2, 4, 3, 1, 4, 2])

b = a + 5
print(b)

[6 8 7 9 8 6 9 7]


In [9]:
a = list(range(100000))
%timeit [val + 5 for val in a]

6.05 ms ± 292 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [10]:
a = np.array(a)
%timeit a + 5

49.6 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
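
`%timeit` is an IPython magic, so the two cells above only run inside a notebook or IPython shell. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The names `values`, `arr`, `loop_seconds`, and `vec_seconds` are illustrative, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

values = list(range(100000))
arr = np.array(values)

# Both approaches compute the same numbers
loop_result = [val + 5 for val in values]
vec_result = arr + 5
results_match = np.array_equal(vec_result, loop_result)

# Time 20 repetitions of each approach
loop_seconds = timeit.timeit(lambda: [val + 5 for val in values], number=20)
vec_seconds = timeit.timeit(lambda: arr + 5, number=20)

print("results match:", results_match)
print(f"list comprehension: {loop_seconds / 20 * 1e3:.3f} ms per run")
print(f"NumPy ufunc:        {vec_seconds / 20 * 1e3:.3f} ms per run")
```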


### NumPy aggregations are much faster than Python built-ins...

In [11]:
from random import random
c = [random() for i in range(100000)]

%timeit min(c)

1.31 ms ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [12]:
c = np.array(c)

%timeit c.min()

26.4 µs ± 268 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


### NumPy aggregations also work on multi-dimensional arrays...

In [13]:
M = np.random.randint(0, 10, (3, 5))
M

array([[8, 2, 7, 7, 1],
       [9, 7, 7, 0, 9],
       [4, 6, 7, 6, 8]])

In [14]:
M.sum()

88

In [15]:
M.sum(axis=0)

array([21, 15, 21, 13, 18])

In [16]:
M.sum(axis=1)

array([25, 32, 31])
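
Because `M` above is random, rerunning the cells gives different numbers. The sketch below hard-codes the matrix shown in the output so the results are reproducible, and it makes the role of `axis` explicit: `axis=0` collapses the rows (one sum per column), while `axis=1` collapses the columns (one sum per row). The names `total`, `col_sums`, and `row_sums` are illustrative.

```python
import numpy as np

# The same values as the random matrix printed above
M = np.array([[8, 2, 7, 7, 1],
              [9, 7, 7, 0, 9],
              [4, 6, 7, 6, 8]])

total = M.sum()            # sum over every element
col_sums = M.sum(axis=0)   # collapse axis 0 (rows): one value per column
row_sums = M.sum(axis=1)   # collapse axis 1 (columns): one value per row

print(M.shape, col_sums.shape, row_sums.shape)  # (3, 5) (5,) (3,)
print(col_sums)                                 # [21 15 21 13 18]
print(row_sums)                                 # [25 32 31]
print(total, col_sums.sum(), row_sums.sum())    # 88 88 88
```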

## Strategy 3
### Broadcasting is a set of rules by which ufuncs operate on arrays of different sizes and/or dimensions

For a visualization of broadcasting, see: https://youtu.be/EEUXKG97YRw?t=915

#### Broadcasting Rules
1. If the arrays differ in number of dimensions, left-pad the shorter shape with 1s.
2. If the sizes in a dimension do not match, stretch the array whose size is 1 in that dimension to match the other.
3. If the sizes in a dimension do not match and neither is 1, raise an error.
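
The three rules can be checked on small arrays. This is a minimal sketch with illustrative names (`row`, `col`, `grid`); the comments trace how each shape is padded and stretched.

```python
import numpy as np

grid = np.ones((2, 3))            # shape (2, 3)
row = np.arange(3)                # shape (3,)
col = np.arange(3).reshape(3, 1)  # shape (3, 1)

# Rule 1: (3,) is left-padded to (1, 3).
# Rule 2: the size-1 dimension is stretched, giving (2, 3).
grid_plus_row = grid + row
print(grid_plus_row.shape)        # (2, 3)

# Rule 1: (3,) becomes (1, 3).
# Rule 2: (3, 1) and (1, 3) are both stretched, giving (3, 3).
col_plus_row = col + row
print(col_plus_row.shape)         # (3, 3)

# Rule 3: (2, 3) against (2,) padded to (1, 2):
# the last dimensions are 3 and 2, and neither is 1, so NumPy raises.
raised = False
try:
    grid + np.arange(2)
except ValueError as err:
    raised = True
    print("broadcast error:", err)
```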

Screenshots: https://jmp.sh/cJ9wIqT, https://jmp.sh/GMlWzbn


## Strategy 4
### Using NumPy's slicing, masking and fancy indexing

#### With Python Lists, indexing accepts integers or slices.

In [18]:
L = [2, 3, 5, 7, 11]
print(L[0]) # Integer Index
print(L[1:3]) # Slice for multiple elements

2
[3, 5]


#### NumPy arrays are similar...

In [19]:
L = np.array(L)
L

array([ 2,  3,  5,  7, 11])

In [20]:
L[0]

2

In [21]:
L[1:3]

array([3, 5])

#### ... but NumPy offers other fast and convenient indexing options as well

##### "Masking": indexing with boolean masks

In [22]:
L

array([ 2,  3,  5,  7, 11])

###### A mask is a boolean array:

In [24]:
mask = np.array([False, True, True, False, True])
L[mask]

array([ 3,  5, 11])

###### Masks are often constructed using comparison operators and boolean logic, e.g.:

In [26]:
mask = (L < 4) | (L > 8) # "|" = "bitwise OR"
L[mask]

array([ 2,  3, 11])
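
The comparison above generalizes to the other element-wise boolean operators: `&` (AND) and `~` (NOT). The sketch below uses illustrative names (`outside`, `inside`, `middle`). The parentheses around each comparison are required, because `|` and `&` bind more tightly than `<` and `>` in Python.

```python
import numpy as np

L = np.array([2, 3, 5, 7, 11])

outside = (L < 4) | (L > 8)   # element-wise OR
inside = ~outside             # element-wise NOT
middle = (L > 2) & (L < 11)   # element-wise AND

print(outside)      # [ True  True False False  True]
print(L[outside])   # [ 2  3 11]
print(L[inside])    # [5 7]
print(L[middle])    # [3 5 7]
```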

###### Fancy indexing: Passing a list/array of indices

In [27]:
L

array([ 2,  3,  5,  7, 11])

In [28]:
ind = [0, 4, 2]
L[ind]

array([ 2, 11,  5])

###### Multiple dimensions: use commas to separate indices!

In [30]:
M = np.arange(6).reshape(2, 3)
M

array([[0, 1, 2],
       [3, 4, 5]])

In [31]:
M[0, 1]

1

In [32]:
M[:, 1]

array([1, 4])

In [36]:
# Masking the full array
M[abs(M - 3) < 2]

array([2, 3, 4])

###### Mixing fancy indexing and slicing...

In [37]:
M = np.arange(6).reshape(2, 3)
M

array([[0, 1, 2],
       [3, 4, 5]])

In [38]:
# mixing fancy indexing and slicing
M[[1, 0], :2]

array([[3, 4],
       [0, 1]])

###### Mixing masking and slicing...

In [39]:
M = np.arange(6).reshape(2, 3)
M

array([[0, 1, 2],
       [3, 4, 5]])

In [40]:
# mixing masking and slicing
M[M.sum(axis=1) > 4, 1:]

array([[4, 5]])

#### All of these operations can be composed and combined in nearly limitless ways

In [41]:
# 1000 points in 3 dimensions
x = np.random.random((1000, 3))
x.shape

(1000, 3)

In [42]:
# Broadcasting to find pairwise differences
diff = x.reshape(1000, 1, 3) - x
diff.shape

(1000, 1000, 3)

In [43]:
# Aggregate to find pairwise squared distances
D = (diff ** 2).sum(2)
D.shape

(1000, 1000)

In [45]:
# Set diagonal of matrix to infinity to skip self-neighbors
i = np.arange(1000)
D[i, i] = np.inf

In [49]:
# Print the indices of the nearest neighbor
i = np.argmin(D, 1)
print(i[:10])

[283 565  29 814 972 293 991 476 994 519]
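
The cells above can be collected into one function. This is a sketch under a few assumptions: the function name `nearest_neighbors` and the four test points are made up for illustration, and the points are hand-picked so the answer can be verified by eye rather than drawn at random. Squared distances are enough here because taking the square root would not change which neighbor is closest.

```python
import numpy as np


def nearest_neighbors(points):
    """Return, for each row of `points`, the index of the closest other row."""
    n = points.shape[0]
    # Broadcasting: (n, 1, d) - (1, n, d) -> (n, n, d) pairwise differences
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    # Aggregation: sum squared differences over the coordinate axis -> (n, n)
    sq_dist = (diff ** 2).sum(axis=2)
    # Fancy indexing: put infinity on the diagonal to skip self-neighbors
    idx = np.arange(n)
    sq_dist[idx, idx] = np.inf
    # Aggregation: index of the smallest squared distance in each row
    return np.argmin(sq_dist, axis=1)


# Four hand-picked points in 3 dimensions (hypothetical data)
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 3.0, 0.0],
                [1.1, 0.0, 0.0]])

nn = nearest_neighbors(pts)
print(nn)  # [1 3 0 1]
```

Note that the intermediate `diff` array holds n × n × d values, so memory use grows quadratically with the number of points.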


## Summary ...
- Writing Python is fast; loops can be slow
- NumPy pushes loops into its compiled layer:
    - fast development time of Python
    - fast execution time of compiled code

### Strategies
1. **ufuncs** for element-wise operations
2. **aggregations** for array summarization
3. **broadcasting** for combining arrays
4. **slicing**, **masking**, and **fancy indexing** for selecting and operating on subsets of arrays