# Unlocking the power of numpy

"Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today."

"NumPy’s high level syntax makes it accessible and productive for programmers from any background or experience level."



### OK, let's get started

I'm going to assume a bit of familiarity with numpy; specifically that you have used arrays, and figured out some of their properties, but don't really have a deep grasp of the underlying principles.

In [None]:
import numpy as np

## Vectorization

"Vectorization" refers to setting up computations so that a single operation applies to many elements of a large array at once.
This allows large speedups because the per-element loop runs inside numpy's compiled code instead of in the Python interpreter, which would otherwise have to interpret every iteration separately.

To demonstrate, here is a loop that someone new to python might write, if asked to generate an array that contains the cumulative sum of the first N integers.

We are going to use the notebook built-in "magic" command "%timeit" to run and time several iterations of each version of the function.

In [22]:
def cumul_sum(N):
    out_vals = []
    running_sum = 0
    for i in range(N):
        running_sum += i
        out_vals.append(running_sum)
    return np.array(out_vals)

In [23]:
N = 100000
print("Version 1, using the function above with the loop")
%timeit v = cumul_sum(N)
print("Version 2, using a numpy function to do the cumulative sum")
a = np.arange(N)
%timeit v = np.cumsum(a)

Version 1, using the function above with the loop
14.9 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Version 2, using a numpy function to do the cumulative sum
153 µs ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


So, for that particular case, the "vectorized" version is roughly 100x faster (14.9 ms versus 153 µs).
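`%timeit` only exists inside IPython/Jupyter. Outside a notebook, the standard-library `timeit` module can make the same comparison. This is a sketch (the repeat counts are arbitrary choices), and it also checks that both versions produce the same array before comparing their speed.

```python
import timeit

import numpy as np


def cumul_sum(N):
    """Loop version from the cell above: running total of 0..N-1."""
    out_vals = []
    running_sum = 0
    for i in range(N):
        running_sum += i
        out_vals.append(running_sum)
    return np.array(out_vals)


N = 100_000
a = np.arange(N)

# Both versions must agree before their speed is worth comparing
assert np.array_equal(cumul_sum(N), np.cumsum(a))

# Best of 3 repeats, 5 calls each, converted to seconds per call
t_loop = min(timeit.repeat(lambda: cumul_sum(N), number=5, repeat=3)) / 5
t_numpy = min(timeit.repeat(lambda: np.cumsum(a), number=5, repeat=3)) / 5
print(f"loop: {t_loop * 1e3:.2f} ms  numpy: {t_numpy * 1e3:.3f} ms  "
      f"ratio: {t_loop / t_numpy:.0f}x")
```

The exact ratio will depend on the machine and on `N`.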

### Your intuition about processing time is probably wrong.

I have been repeatedly surprised by the relative amount of time it takes to do things different ways in Python.
Some things that I thought would be very slow are actually reasonably quick, and some things that seem like they should be relatively quick are actually very slow.

Pretty much the only constants seem to be:
  1. If you use the right numpy function it will be quick
  2. If you do anything else it will be slower, possibly much slower
  
To illustrate this sort of variation, I wrote several versions of functions to do 4 simple operations:
  1. Summing all the integers up to N
  2. Filling an array with the cumulative sum of all the integers up to N
  3. Filling an array with the squares of all the integers up to N
  4. Matrix multiplication
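The `xipe.funcs_to_profile` module used in the next cell isn't shown in this notebook, so the following is only a hypothetical sketch of what a loop version and a numpy version of each operation might look like. The function names are made up for illustration; they are not the ones that were profiled.

```python
import numpy as np

# Hypothetical loop and numpy versions of the four operations;
# these are illustrations, not the xipe functions that were timed.


# 1. Sum of the integers 0..N-1
def sum_loop(N):
    total = 0
    for i in range(N):
        total += i
    return total


def sum_numpy(N):
    return int(np.arange(N).sum())


# 2. Array of cumulative sums of 0..N-1
def cumul_sum_loop(N):
    out = np.empty(N, dtype=np.int64)
    running = 0
    for i in range(N):
        running += i
        out[i] = running
    return out


def cumul_sum_numpy(N):
    return np.cumsum(np.arange(N, dtype=np.int64))


# 3. Array of squares of 0..N-1
def squares_loop(N):
    out = np.empty(N, dtype=np.int64)
    for i in range(N):
        out[i] = i * i
    return out


def squares_numpy(N):
    return np.arange(N, dtype=np.int64) ** 2


# 4. Matrix multiplication with a naive triple loop (i, j, k ordering)
def matmul_loop(A, B):
    n, m = A.shape
    m2, p = B.shape
    if m != m2:
        raise ValueError("inner dimensions do not match")
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C


def matmul_numpy(A, B):
    return A @ B


# Check that each pair agrees
N = 1000
assert sum_loop(N) == sum_numpy(N) == N * (N - 1) // 2
assert np.array_equal(cumul_sum_loop(N), cumul_sum_numpy(N))
assert np.array_equal(squares_loop(N), squares_numpy(N))
rng = np.random.default_rng(0)
A, B = rng.random((20, 30)), rng.random((30, 10))
assert np.allclose(matmul_loop(A, B), matmul_numpy(A, B))
```

Timing each pair (for example with `%timeit`) is one way to reproduce a comparison like the one below; the exact ratios will depend on the machine and on `N`.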

In [21]:
from xipe.funcs_to_profile import matmul_v0, matmul_v6

<function sum_v0 at 0x7ffbb58b2b80>(1000000) = 499999500000, dt=2.8610e-06, factor=1.0
<function sum_v1 at 0x7ffbb6f14f70>(1000000) = 499999500000, dt=5.3041e-03, factor=1853.9
<function sum_v2 at 0x7ffb8fadd040>(1000000) = 499999500000, dt=1.3193e-01, factor=46113.2
<function sum_v3 at 0x7ffb8fadd0d0>(1000000) = 499999500000, dt=2.0415e-01, factor=71356.3
<function sum_v4 at 0x7ffb8fa1a8b0>(1000000) = 499999500000, dt=4.9607e-02, factor=17338.9
<function cumul_sum_v0 at 0x7ffb8fa1a820>(1000000) = 499999500000, dt=7.6571e-03, factor=1.0
<function cumul_sum_v1 at 0x7ffb8f030ee0>(1000000) = 499999500000, dt=1.0444e-02, factor=1.4
<function cumul_sum_v2 at 0x7ffb8fadc160>(1000000) = 499999500000, dt=2.7365e-01, factor=35.7
<function cumul_sum_v3 at 0x7ffb8fadc280>(1000000) = 499999500000, dt=1.2948e-01, factor=16.9
<function cumul_sum_v4 at 0x7ffb8fadc0d0>(1000000) = 499999500000, dt=4.4804e-01, factor=58.5
<function cumul_sum_v5 at 0x7ffb8fadc310>(1000000) = 499999500000, dt=1.9642e-01, 

It was interesting to see some discussion in the #software-dev channel about the effect of loop ordering in matrix multiplication.

Loop ordering was a big deal to people my age when we were writing FORTRAN, C and C++ code back in grad school.
Although loop ordering does have an effect at the 10-50% level, simply using loops at all, instead of the built-in numpy functions, _has already slowed the code down by a factor of between 10x and 1000x_ in each of these example cases.

### The single most useful thing you can do to improve your numpy experience

Just have a look at the available functions in numpy. There are a lot. There is a very good chance that the one you need is on the list somewhere, and you will be much better off using it.

In [202]:
dir(np)

['ALLOW_THREADS',
 'AxisError',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'MachAr',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'Tester',
 'TooHardError',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__dir__',
 '__doc__',
 '__file__',
 '__getattr__',
 '__git_revision__',
 '__loader__',
 '__mkl_version__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_distributor_init',
 '_globals',
 '_mat',
 '_pytesttester',
 'abs',
 'absol
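Scrolling through the full `dir(np)` listing gets tedious. A small helper can filter the namespace for public callables whose names contain a substring. `find_numpy_functions` is a hypothetical helper written for this notebook, not part of numpy.

```python
import numpy as np


def find_numpy_functions(substring):
    """Return sorted names of public callables in numpy containing `substring`."""
    substring = substring.lower()
    return sorted(
        name
        for name in dir(np)
        # check the name first so getattr only runs on matching names
        if substring in name.lower()
        and not name.startswith("_")
        and callable(getattr(np, name))
    )


sum_like = find_numpy_functions("sum")
print(sum_like)
```

Searching for a word like `"sum"`, `"sort"` or `"mean"` is often a quicker way to find a function such as `np.cumsum` than reading the whole list.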

Arithmetic operations on numpy arrays are all vectorized.

In [165]:
def square(vect):
    return np.array([val*val for val in vect])
vect = np.linspace(0, 1, 10001)
print("Time using arithmetic operation")
%time v2 = vect*vect
print("Time using function")
%time v2 = square(vect)

Time using arithmetic operation
CPU times: user 110 µs, sys: 52 µs, total: 162 µs
Wall time: 128 µs
Time using function
CPU times: user 2.44 ms, sys: 399 µs, total: 2.84 ms
Wall time: 2.85 ms
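The same holds for numpy's universal functions ("ufuncs") such as `np.sqrt`, and for comparisons, which also operate element by element on whole arrays. A short sketch checking that the vectorized expressions match their list-comprehension equivalents:

```python
import math

import numpy as np

vect = np.linspace(0, 1, 10001)

# Arithmetic: these three spellings all square every element
squared = vect * vect
assert np.allclose(squared, vect ** 2)
assert np.allclose(squared, np.square(vect))

# Ufuncs apply a math function to every element without a Python loop
roots = np.sqrt(vect)
roots_loop = np.array([math.sqrt(val) for val in vect])
assert np.allclose(roots, roots_loop)

# Comparisons are vectorized too and return boolean arrays
above_half = vect > 0.5
print(above_half.dtype, above_half.sum())
```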


## Indexing

In short, numpy gives you a very flexible array indexing syntax that allows you to do many very clever things relatively easily.  

Once you learn the syntax.

Before that you will probably feel like you are randomly trying things until you hit on the one that works.  That's fine, we've all been there.

Some key points:

  1. The syntax for array indexing along one axis is [start:stop:step].
  2. The syntax for indexing along multiple axes is to use commas to separate the axes, e.g., [i,j,k].
  3. Numpy tries to be efficient: basic slicing returns a "view" into the same block of memory rather than copying it each time you change the indexing. Indexing with a list of integers or a boolean mask ("advanced" indexing) returns a copy instead.
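Point 3 is easiest to see with `np.shares_memory`: a basic slice shares memory with the original array, while indexing with a list of integers produces a new array. A short sketch:

```python
import numpy as np

a = np.arange(10)

view = a[2:8:2]      # basic slice: a view onto a's memory
copy = a[[2, 4, 6]]  # list of integers (advanced indexing): a new array

print(view.base is a)                                         # True
print(np.shares_memory(a, view), np.shares_memory(a, copy))   # True False

# Writing through the view changes the original array...
view[0] = -1
print(a[2])     # -1
# ...but the copy made earlier is unaffected
print(copy[0])  # 2
```

When an independent array is needed from a slice, `a[2:8:2].copy()` makes one explicitly.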



In [113]:
a_vect = np.arange(360)

In [200]:
def print_array_info(an_array):
    base = an_array.base
    if base is None:
        base_shape = None
    else:
        base_shape = "array%s" % str(base.shape)
    print("Array of %s: n=%i, nb=%i, shape=%s, strides=%s -> %s" % (an_array.dtype, an_array.size, an_array.nbytes,
                                                                        str(an_array.shape), str(an_array.strides),
                                                                        str(base_shape)))

In [203]:
print_array_info(a_vect)

Array of int64: n=360, nb=2880, shape=(360,), strides=(8,) -> None


In [204]:
print_array_info(a_vect)
v = np.expand_dims(a_vect, 0)
print_array_info(v)
v2 = np.expand_dims(a_vect, -1)
print_array_info(v2)
v3 = a_vect.reshape(12,5,6)
print_array_info(v3)
print_array_info(v3[0])
print_array_info(v3[:,:,0])
print_array_info(v3[:,:,0:3])
print_array_info(v3[:,:,None,:])
print_array_info(v3[:,:,np.newaxis,:])

Array of int64: n=360, nb=2880, shape=(360,), strides=(8,) -> None
Array of int64: n=360, nb=2880, shape=(1, 360), strides=(2880, 8) -> array(360,)
Array of int64: n=360, nb=2880, shape=(360, 1), strides=(8, 8) -> array(360,)
Array of int64: n=360, nb=2880, shape=(12, 5, 6), strides=(240, 48, 8) -> array(360,)
Array of int64: n=30, nb=240, shape=(5, 6), strides=(48, 8) -> array(360,)
Array of int64: n=60, nb=480, shape=(12, 5), strides=(240, 48) -> array(360,)
Array of int64: n=180, nb=1440, shape=(12, 5, 3), strides=(240, 48, 8) -> array(360,)
Array of int64: n=360, nb=2880, shape=(12, 5, 1, 6), strides=(240, 48, 0, 8) -> array(360,)
Array of int64: n=360, nb=2880, shape=(12, 5, 1, 6), strides=(240, 48, 0, 8) -> array(360,)
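The strides are the number of bytes to step in memory to move one position along each axis. For a C-ordered (row-major) array like `v3`, each axis's stride is the item size times the product of the lengths of all later axes, which is where `(240, 48, 8)` comes from for shape `(12, 5, 6)` with 8-byte integers. Slicing, `expand_dims` and `np.newaxis` only change the shape and strides, which is why every line above points back to the same `array(360,)` base. A sketch that recomputes the strides (`c_order_strides` is a made-up helper):

```python
import numpy as np


def c_order_strides(shape, itemsize):
    """Byte strides of a C-contiguous array: the last axis steps one item,
    and each earlier axis steps over all of the later axes."""
    strides = []
    step = itemsize
    for length in reversed(shape):
        strides.append(step)
        step *= length
    return tuple(reversed(strides))


# dtype set explicitly so the item size is 8 bytes on every platform
v3 = np.arange(360, dtype=np.int64).reshape(12, 5, 6)
print(c_order_strides(v3.shape, v3.itemsize))  # (240, 48, 8)
print(v3.strides)                              # (240, 48, 8)

# Transposing copies nothing: it just reverses the shape and the strides
print(v3.T.shape, v3.T.strides)                # (6, 5, 12) (8, 48, 240)
```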


#### Advanced indexing

In [107]:
print(v3[(1,2,3)])    # Gets element [1,2,3]
print(v3[1,2,3])      # Same element, written without the parentheses
print(v3[(1,2,3),])   # Gets sub-arrays 1, 2 and 3 along axis 0

45
45
[[[ 30  31  32  33  34  35]
  [ 36  37  38  39  40  41]
  [ 42  43  44  45  46  47]
  [ 48  49  50  51  52  53]
  [ 54  55  56  57  58  59]]

 [[ 60  61  62  63  64  65]
  [ 66  67  68  69  70  71]
  [ 72  73  74  75  76  77]
  [ 78  79  80  81  82  83]
  [ 84  85  86  87  88  89]]

 [[ 90  91  92  93  94  95]
  [ 96  97  98  99 100 101]
  [102 103 104 105 106 107]
  [108 109 110 111 112 113]
  [114 115 116 117 118 119]]]
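The first two lines are the same because Python turns `v3[1,2,3]` into `v3.__getitem__((1,2,3))`: a tuple index means one index per axis. The trailing comma in the third line wraps that tuple inside another tuple, so numpy sees a single index for axis 0 that happens to be a sequence. A quick check of these equivalences:

```python
import numpy as np

v3 = np.arange(360).reshape(12, 5, 6)

# A tuple supplies one index per axis
assert v3[(1, 2, 3)] == v3[1, 2, 3] == 45

# A sequence inside the outer tuple selects several entries along axis 0
picked = v3[(1, 2, 3),]
print(picked.shape)  # (3, 5, 6)
assert np.array_equal(picked, v3[[1, 2, 3]])
assert np.array_equal(picked, np.stack([v3[1], v3[2], v3[3]]))
```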


#### Indexing using a sequence of integers

In [142]:
idx = [1,3,34,21,113]
print(a_vect[idx])
idx = list((1,3,34,21,113))
print(a_vect[idx])

[  1   3  34  21 113]
[  1   3  34  21 113]


In [143]:
idx = (1,3,34,21,113)
print(a_vect[idx])

IndexError: too many indices for array

In [144]:
print(a_vect[idx,])

[  1   3  34  21 113]


#### Indexing using a mask

In [149]:
short_vect = a_vect[idx,]
mask = [False, True, False, True, True]
short_vect[mask]

array([  3,  21, 113])

In [156]:
randoms = np.random.uniform(size=(1000))
print(randoms.min(),randoms.max())
mask = randoms > 0.5
masked_randoms = randoms[mask]
print(masked_randoms.shape, masked_randoms.min(), masked_randoms.max())

7.758478230091015e-06 0.9959555852393748
(464,) 0.5018564219894138 0.9959555852393748


In [160]:
rand3d = randoms.reshape((10,10,10))
mask3d = mask.reshape((10,10,10))
masked_randoms = rand3d[mask3d]
print(masked_randoms.shape, masked_randoms.min(), masked_randoms.max())

(464,) 0.5018564219894138 0.9959555852393748
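Whatever the shape of the array, a boolean mask of the same shape returns a flat 1-D array of the selected elements, taken in row-major (C) order. A sketch confirming this against `ravel()` and `np.nonzero`; it uses a seeded generator rather than `np.random.uniform`, so the selection is reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)
rand3d = rng.uniform(size=(10, 10, 10))
mask3d = rand3d > 0.5

selected = rand3d[mask3d]
print(selected.shape)  # 1-D, with one entry per True in the mask

# Same elements, in the same order, as masking the flattened arrays
assert np.array_equal(selected, rand3d.ravel()[mask3d.ravel()])

# np.nonzero turns the mask into one index array per axis
idx = np.nonzero(mask3d)
assert len(idx) == 3
assert np.array_equal(selected, rand3d[idx])
assert selected.size == mask3d.sum()
```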


In [164]:
mask3d[:,1,:].shape

(10, 10)

In [147]:
print(v3[(3,1,2),(3,4,3),(1,2,3)])

[109  56  81]
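When several axes are indexed with sequences, numpy pairs the sequences up element by element: `v3[(3,1,2),(3,4,3),(1,2,3)]` picks `v3[3,3,1]`, `v3[1,4,2]` and `v3[2,3,3]`, not a 3x3x3 block. If the block is what you want, `np.ix_` builds index arrays that broadcast into one. A sketch contrasting the two:

```python
import numpy as np

v3 = np.arange(360).reshape(12, 5, 6)
idx0, idx1, idx2 = (3, 1, 2), (3, 4, 3), (1, 2, 3)

# Pointwise: the i-th result is v3[idx0[i], idx1[i], idx2[i]]
picked = v3[idx0, idx1, idx2]
manual = np.array([v3[i, j, k] for i, j, k in zip(idx0, idx1, idx2)])
print(picked)  # [109  56  81]
assert np.array_equal(picked, manual)

# Outer product of the index sets: every combination, giving a 3x3x3 block
block = v3[np.ix_(idx0, idx1, idx2)]
print(block.shape)  # (3, 3, 3)
assert block[0, 1, 2] == v3[3, 4, 3]
```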


## Array Broadcasting

### broadcasting
Broadcasting is a way of performing operations on numpy arrays of different shapes. 

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

* they are equal, or
* one of them is 1

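If these conditions are not met, a `ValueError` is raised.

If the two arrays have different numbers of dimensions, the shorter shape is treated as if it were padded with 1s on the left, and the result takes the larger size along each axis. That is why multiplying a `(3,4)` array by a `(1,1,4)` array in the next cells gives a `(1,3,4)` result.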
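The rule is short enough to write out by hand. This sketch (`broadcast_shape` is a made-up helper, not a numpy function) pads the shorter shape with 1s on the left, then applies the two compatibility conditions axis by axis. `np.broadcast_shapes`, available in NumPy 1.20 and later, gives the same answers and is used here to check it.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Apply numpy's broadcasting rule to two shapes (hand-written sketch)."""
    ndim = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with 1s on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(result)


for a, b in [((3, 4), (1,)), ((3, 4), (1, 4)), ((3, 4), (1, 1, 4)), ((3, 4), (4, 1, 1))]:
    print(a, b, "->", broadcast_shape(a, b))
    assert broadcast_shape(a, b) == np.broadcast_shapes(a, b)

try:
    broadcast_shape((3, 4), (4, 3))
except ValueError as err:
    print("ValueError:", err)
```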

In [183]:
values = np.ones((3,4))

In [184]:
print("Original array", values.shape)
print("Array * scalar", (values * np.ones((1))).shape)
print("Array * array(1,4)", (values * np.ones((1,4))).shape)
print("Array * array(3,4)", (values * np.ones((3,4))).shape)
print("Array * array(1,1,4)", (values * np.ones((1,1,4))).shape)
print("Array * array(4,1,1)", (values * np.ones((4,1,1))).shape)


Original array (3, 4)
Array * scalar (3, 4)
Array * array(1,4) (3, 4)
Array * array(3,4) (3, 4)
Array * array(1,1,4) (1, 3, 4)
Array * array(4,1,1) (4, 3, 4)


In [182]:
np.ones((2,1,1)) + np.ones((3)) + np.ones((1,2,1)) 

array([[[3., 3., 3.],
        [3., 3., 3.]],

       [[3., 3., 3.],
        [3., 3., 3.]]])
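Broadcasting does not need to copy the smaller array. `np.broadcast_to` makes this visible: it returns a read-only view in which the stretched axis has a stride of 0, so every "row" reads the same memory (compare the 0 strides that `np.newaxis` produced in the indexing section). A sketch:

```python
import numpy as np

row = np.arange(4, dtype=np.int64)        # shape (4,), strides (8,)
stretched = np.broadcast_to(row, (3, 4))

print(stretched.shape, stretched.strides)  # (3, 4) (0, 8)
print(np.shares_memory(row, stretched))    # True
print(stretched.flags.writeable)           # False: broadcast views are read-only
```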

In [185]:
np.ones((3,4)) * np.ones((4,3))

ValueError: operands could not be broadcast together with shapes (3,4) (4,3) 
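Here the trailing dimensions are 4 and 3, which are neither equal nor 1. Which fix is right depends on what the calculation is meant to do: for element-wise multiplication one operand needs transposing, for an outer-style product inserting axes with `np.newaxis` lets broadcasting do the work, and for a matrix product the `@` operator is the tool. A sketch of all three:

```python
import numpy as np

# Element-wise: transposing the second operand makes the shapes match, (3,4) * (3,4)
fixed = np.ones((3, 4)) * np.ones((4, 3)).T
print(fixed.shape)    # (3, 4)

# Outer product via broadcasting: (3,1) * (1,4) -> (3,4)
x = np.arange(3)
y = np.arange(4)
outer = x[:, np.newaxis] * y[np.newaxis, :]
print(outer.shape)    # (3, 4)
assert np.array_equal(outer, np.outer(x, y))

# Matrix product: (3,4) @ (4,3) -> (3,3)
product = np.ones((3, 4)) @ np.ones((4, 3))
print(product.shape)  # (3, 3)
```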