# Unlocking the power of numpy

"Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today."

"NumPy’s high level syntax makes it accessible and productive for programmers from any background or experience level."



### OK, let's get started

I'm going to assume a bit of familiarity with numpy; specifically that you have used arrays, and figured out some of their properties, but don't really have a deep grasp of the underlying principles.

In [None]:
import numpy as np

## Vectorization

"Vectorization" refers to setting up computations so that they can operate in parallel on many elements of a large array.
This allows many speedups both in interpreting / compiling the code and in executing it.

To demonstrate, here is a loop that someone new to python might write, if asked to generate an array that contains the cumulative sum of the first N integers.

We are going to use the notebook built-in "magic" command "%timeit" to run and time several iterations of each version of the function.

In [None]:
def cumul_sum(N):
    out_vals = []
    running_sum = 0
    for i in range(N):
        running_sum += i
        out_vals.append(running_sum)
    return np.array(out_vals)

In [None]:
N = 100000
print("Version 1, using the function above with the loop")
%timeit v = cumul_sum(N)
print("Version 2, using a numpy function to do the cumulative sum")
a = np.arange(N)
%timeit v = np.cumsum(a)

So, for that particular case the "vectorized" version is about 10x faster.  
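As a quick sanity check, the sketch below confirms that the two versions produce the same values, and compares the last entry with the closed form N(N-1)/2 for the sum 0 + 1 + ... + (N-1). The function `cumul_sum` is repeated from the cell above so that the block stands alone, and a smaller N is used here only to keep it fast.

```python
import numpy as np

def cumul_sum(N):
    """Loop version, repeated from the cell above."""
    out_vals = []
    running_sum = 0
    for i in range(N):
        running_sum += i
        out_vals.append(running_sum)
    return np.array(out_vals)

N = 1000
loop_result = cumul_sum(N)
vectorized_result = np.cumsum(np.arange(N))

# Both versions should agree element by element
same_values = np.array_equal(loop_result, vectorized_result)

# The last entry is the sum 0 + 1 + ... + (N-1) = N*(N-1)/2
closed_form = N * (N - 1) // 2
print(same_values, vectorized_result[-1], closed_form)
```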

### Your intuition about processing time is probably wrong.

I have been repeatedly surprised by the relative amount of time that it takes to do things different ways in Python.
Some things that I thought would be very slow are actually reasonably quick, and some things that seem like they should be relatively quick are actually very slow.

Pretty much the only constants seem to be:
  1. If you use the right numpy function it will be quick
  2. If you do anything else it will be slower, possibly much slower
  
To illustrate this sort of variation I wrote several versions of functions to do 4 simple operations:
  1. Summing all the integers up to N
  2. Filling an array with the cumulative sum of all the integers up to N
  3. Filling an array with the squares of all the integers up to N
  4. Matrix multiplication

In [None]:
from xipe import funcs_to_profile

It was interesting to see some discussion on the #software-dev channel about the effect of loop ordering in matrix multiplication.

Loop ordering was a big deal to people my age when we were writing FORTRAN, C and C++ code back in grad school.
Although loop ordering does have an effect at the 10-50% level, simply using loops at all, instead of the built-in numpy functions, _has already slowed the code down by a factor of between 10x and 1000x_ in each of these example cases.
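To make the loop-ordering discussion concrete, here is a minimal sketch of two hand-written matrix multiplications that differ only in the order of the two inner loops, checked against `np.matmul`. The function names `matmul_ijk` and `matmul_ikj` are hypothetical and are not the versions in `funcs_to_profile`. Timings are left out because they depend on the machine; in a notebook you could wrap each call in `%timeit` to compare.

```python
import numpy as np

def matmul_ijk(a, b):
    """Hypothetical loop version: the innermost loop runs over the shared axis k."""
    n, m = a.shape
    m2, p = b.shape
    assert m == m2, "inner dimensions must match"
    out = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                out[i, j] += a[i, k] * b[k, j]
    return out

def matmul_ikj(a, b):
    """Same arithmetic with the two inner loops swapped."""
    n, m = a.shape
    m2, p = b.shape
    assert m == m2, "inner dimensions must match"
    out = np.zeros((n, p))
    for i in range(n):
        for k in range(m):
            for j in range(p):
                out[i, j] += a[i, k] * b[k, j]
    return out

rng = np.random.default_rng(seed=0)   # seed chosen only for repeatability
a = rng.uniform(size=(4, 5))
b = rng.uniform(size=(5, 3))

reference = np.matmul(a, b)           # the built-in version
print(np.allclose(matmul_ijk(a, b), reference))
print(np.allclose(matmul_ikj(a, b), reference))
```

Both loop orderings give the same answer as the built-in function; only the running time differs.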

### The single most useful thing you can do to improve your numpy experience

Just have a look at the available functions in numpy.  There are a lot.  There is a very good chance that the one you need is there on the list somewhere.  You will be much better off using it.

In [None]:
dir(np)
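The full listing is long, so it can help to filter it. The sketch below keeps only the public names that contain a keyword; the keyword `"sum"` is just an example. In a notebook, `np.cumsum?` or `help(np.cumsum)` then shows the documentation for any name that looks promising.

```python
import numpy as np

keyword = "sum"   # example keyword, change it to whatever you are looking for

# Keep only the public names in the numpy namespace that mention the keyword
matches = [name for name in dir(np)
           if keyword in name and not name.startswith("_")]
print(matches)
```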

Arithmetic operations on numpy arrays are all vectorized.

In [None]:
def square(vect):
    return np.array([val*val for val in vect])
vect = np.linspace(0, 1, 10001)
print("Time using arithmetic operation")
%time v2 = vect*vect
print("Time using function")
%time v2 = square(vect)
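As a small illustration of what "vectorized arithmetic" means, the sketch below shows that the `*` operator on arrays gives the same result as the numpy function `np.multiply` and as the explicit Python loop, and that a whole expression can be written without any loop. A short array is used here only to keep the example small.

```python
import numpy as np

vect = np.linspace(0, 1, 11)

# The * operator on arrays applies np.multiply element by element
by_operator = vect * vect
by_ufunc = np.multiply(vect, vect)
by_loop = np.array([val * val for val in vect])

# Whole expressions are vectorized too, with no explicit loop
poly = 3 * vect**2 - 2 * vect + 1

print(np.allclose(by_operator, by_ufunc), np.allclose(by_operator, by_loop))
```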

## Indexing

In short, numpy gives you a very flexible array indexing syntax that allows you to do many very clever things relatively easily.  

Once you learn the syntax.

Before that you will probably feel like you are randomly trying things until you hit on the one that works.  That's fine, we've all been there.

Some key points:

    1. The syntax for array indexing along one axis is [start:stop:step]
    2. The syntax for indexing along multiple axes is to use commas to separate the axes, e.g., [i,j,k]
    3. Numpy tries to be efficient by making arrays "views" into a block of memory, rather than recopying the memory each time you change the indexing.
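The three points above can be sketched on a small array. The array and variable names here are chosen for illustration only.

```python
import numpy as np

a = np.arange(12)

# 1. [start:stop:step] along one axis
every_third = a[1:10:3]          # elements at positions 1, 4, 7
print(every_third)

# 2. Commas separate the axes of a multi-dimensional array
grid = a.reshape(3, 4)
print(grid[1, 2])                # row 1, column 2
print(grid[:, ::2])              # all rows, every other column

# 3. A slice is a view: writing through it changes the original array
every_third[0] = -1
print(a[1], np.shares_memory(a, every_third))
```

If you need an independent array, call `.copy()` on the slice.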



In [None]:
a_vect = np.arange(360)

In [None]:
def print_array_info(an_array):
    base = an_array.base
    if base is None:
        base_shape = None
    else:
        base_shape = "array%s" % str(base.shape)
    print("Array of %s: n=%i, nb=%i, shape=%s, strides=%s -> %s" % (an_array.dtype, an_array.size, an_array.nbytes,
                                                                        str(an_array.shape), str(an_array.strides),
                                                                        str(base_shape)))

In [None]:
print_array_info(a_vect)

In [None]:
print_array_info(a_vect)
v = np.expand_dims(a_vect, 0)
print_array_info(v)
v2 = np.expand_dims(a_vect, -1)
print_array_info(v2)
v3 = a_vect.reshape(12,5,6)
print_array_info(v3)
print_array_info(v3[0])
print_array_info(v3[:,:,0])
print_array_info(v3[:,:,0:3])
print_array_info(v3[:,:,None,:])
print_array_info(v3[:,:,np.newaxis,:])
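The strides printed above say how many bytes to step in memory to move by one element along each axis. The sketch below works this out by hand for the reshaped array. The dtype is fixed to `int64` here, which is an assumption made so that the byte counts are the same on every platform.

```python
import numpy as np

a_vect = np.arange(360, dtype=np.int64)   # 8 bytes per element
v3 = a_vect.reshape(12, 5, 6)

# For a C-ordered array, the stride of an axis is the item size times the
# product of the lengths of all later axes
expected = (5 * 6 * 8, 6 * 8, 8)
print(v3.strides, expected)

# Slicing only changes the shape and strides; the block of memory is shared
sub = v3[:, :, 0]
print(sub.shape, sub.strides, np.shares_memory(sub, a_vect))
```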

#### Advanced indexing

The cell below shows that numpy has a couple of very different ways to deal with array indexing. In the numpy documentation they are referred to as 'basic indexing' and 'advanced indexing'.

  1. Basic indexing uses the [start:stop:step,start:stop:step] conventions.   
  2. Advanced indexing uses either:
     1. Arrays of integers to select particular elements by index
     2. Arrays of booleans that act as a mask to select particular elements
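One practical difference between the two is worth seeing early: basic indexing returns a view into the original memory, while advanced indexing returns a copy. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(10)

basic = a[2:8:2]                                # basic indexing: a slice
by_index = a[[2, 4, 6]]                         # advanced: list of integers
by_mask = a[(a % 2 == 0) & (a > 0) & (a < 8)]   # advanced: boolean mask

# All three select the same values
print(basic, by_index, by_mask)

# Only the slice shares memory with the original array
print(np.shares_memory(a, basic),
      np.shares_memory(a, by_index),
      np.shares_memory(a, by_mask))

# Writing to the copy leaves the original untouched
by_index[0] = 99
print(a[2])
```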

In [None]:
print(v3[(1,2,3)])    # Gets element [1,2,3]
print(v3[1,2,3])      # Also gets element [1,2,3]; a bare tuple is the same as comma-separated indices
print(v3[(1,2,3),])   # Gets sub-arrays 1, 2 and 3 along axis 0

#### Advanced indexing using a sequence of integers

In [None]:
idx = [1,3,34,21,113]  # a list of integers
print(a_vect[idx])
idx = list((1,3,34,21,113))  # a tuple of integers converted to a list 
print(a_vect[idx])
idx = np.array([1, 3, 34, 21, 113]) # an array of integers
print(a_vect[idx])

In [None]:
idx = (1,3,34,21,113)  # A tuple of integers is treated as indices for the individual axes, so this raises an IndexError on a 1D array
print(a_vect[idx])

In [None]:
print(a_vect[idx,])  # A tuple containing a tuple does work: the inner tuple indexes axis 0

#### Advanced indexing using a mask

In [None]:
short_vect = a_vect[idx,]  # make a small array
mask = [False, True, False, True, True] # make a mask that is the same shape
short_vect[mask]

In [None]:
# You can also use numpy to easily create masks  
randoms = np.random.uniform(size=(1000))
print(randoms.min(),randoms.max())
mask = randoms > 0.5
masked_randoms = randoms[mask]
print(masked_randoms.shape, masked_randoms.min(), masked_randoms.max())

In [None]:
# Indexing with a mask of the same shape as the array always results in a flattened (1D) array
rand3d = randoms.reshape((10,10,10))
mask3d = mask.reshape((10,10,10))
masked_randoms = rand3d[mask3d]
print(masked_randoms.shape, masked_randoms.min(), masked_randoms.max())
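If you want to keep the original shape instead of a flattened selection, two common options are `np.where` and assignment through the mask. The sketch below assumes a seeded random generator purely for repeatability; the fill values 0.0 and 0.5 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)     # seed chosen only for repeatability
rand3d = rng.uniform(size=(10, 10, 10))
mask3d = rand3d > 0.5

# Indexing with the mask: 1D array holding only the selected values
selected = rand3d[mask3d]

# np.where keeps the shape and fills in 0.0 wherever the mask is False
kept_shape = np.where(mask3d, rand3d, 0.0)

# Assigning through a mask changes values in place and keeps the shape
clipped = rand3d.copy()
clipped[~mask3d] = 0.5

print(selected.ndim, kept_shape.shape, clipped.shape)
```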

## Array Broadcasting

### broadcasting
Broadcasting is a way of performing operations on numpy arrays of different shapes. 

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

* they are equal, or
* one of them is 1

If these conditions are not met, a ```ValueError``` is raised.
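You can check the rule on shapes alone, without building any arrays. The sketch below uses `np.broadcast_shapes`, which assumes NumPy 1.20 or later; on older versions you would have to create the arrays and look at the shape of the result instead.

```python
import numpy as np

# Shapes are compared from the trailing dimension backwards; missing
# leading dimensions are treated as length 1
shape_a = np.broadcast_shapes((3, 4), (4,))
shape_b = np.broadcast_shapes((3, 4), (3, 1))
print(shape_a, shape_b)

# (3, 4) against (4, 3): the trailing dimensions are 4 and 3, which are
# not equal and neither of them is 1, so the shapes are incompatible
try:
    np.broadcast_shapes((3, 4), (4, 3))
    compatible = True
except ValueError as err:
    compatible = False
    print("ValueError:", err)
```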

In [None]:
# Let's start with a small 2D array
values = np.ones((3,4))

In [None]:
print("Original array", values.shape)  # same shape
print("Array * scalar", (values * np.ones((1))).shape) # np.ones((1)) is a length-1 array that broadcasts like a scalar, same shape
print("Array * array(1,4)", (values * np.ones((1,4))).shape) # broadcast over the first axis, same shape
print("Array * array(3,4)", (values * np.ones((3,4))).shape) # both arrays have the same shape, output has the same shape
print("Array * array(1,1,4)", (values * np.ones((1,1,4))).shape) # add a new leading axis, broadcast over the first original axis, match elements along the last axis, output shape will be (1,3,4)
print("Array * array(4,1,1)", (values * np.ones((4,1,1))).shape) # add a new leading axis, then broadcast over the two original axes, output shape will be (4,3,4)


In [None]:
# Can you figure out what the resulting value will be?
np.ones((2,1,1)) + np.ones((3)) + np.ones((1,2,1)) 

In [None]:
# Array broadcast is NOT matrix multiplication; these two shapes are incompatible, so this raises a ValueError
np.ones((3,4)) * np.ones((4,3))

In [None]:
# You can use broadcasting and np.newaxis (which is actually just 'None') to do matrix multiplication.
# It is a useful exercise for understanding broadcasting, but np.matmul is much faster.
m1 = np.ones((3,4))
m2 = np.ones((4,3))
m_prod_v0 = np.sum(m1[:,np.newaxis,:] * m2.T[np.newaxis,:,:], axis=2)
m_prod_v1 = np.matmul(m1,m2)
np.allclose(m_prod_v0, m_prod_v1)