# NumPy - Multidimensional data arrays

The NumPy library is the foundation for most numerical and data-analytic computing in Python.
It is a package that provides high-performance vector, matrix and higher-dimensional
data structures for Python. It is implemented in C and Fortran, so when calculations
are vectorized (formulated with vectors and matrices), performance is very good. 

To use NumPy we need to start by importing the NumPy library.  You can
import the module namespace as usual

```python
import numpy
```

though by convention, scripts and notebooks that perform a lot of NumPy
computations will often import NumPy with an alias as follows

In [1]:
import numpy as np

We will follow this convention and import numpy using the alias `np` for use
in all of our notebooks for this class.

This notebook lecture uses materials from past versions of this course
(see the archive).  We have also based many of our new examples on
the
[Quickstart tutorial](https://numpy.org/doc/stable/user/quickstart.html)
provided on the official NumPy website.

# The Basics

NumPy's main object is the homogeneous multidimensional array.  It is a table
of elements (usually numbers), all of the same type, indexed by a tuple
of non-negative integers. 

Because the main NumPy array is a homogeneous collection of elements, it
is more similar to plain arrays you may be familiar with from C or Java, than
the `List` type from the main Python language.  Homogeneous means
that all of the items in the array must be of the same type.  So typically
we will work with a homogeneous array of all floats or all int types for
various machine learning tasks.

In NumPy dimensions are called **axes**. For example, the coordinates of a point
in 3D space, `[1, 2, 1]`, form an array with one axis.  That axis has 3 elements in it, so we
say it has a length of 3.  A table of values like this:

```python
[[1., 2., 1.],
 [0., 1., 3.]]
```

is said to have 2 axes.  The first axis (the rows, and referred to numerically
as axis 0) has a length of 2 (e.g. there are 2 rows).  The second axis
(the columns, numerically referred to as axis 1) has a length of 3
(there are 3 columns).

The vast majority of the data we use in this class will be arranged in a table
like this with 2 axes, where each row holds a sample of data, and each column
represents an attribute of the sampled data.  For example, for an experiment
each row might hold the information for a participant or subject of the experiment,
while the columns might be attributes like the subject's age, weight, height,
and results like how they responded to the experiment.

Let's look at an example array in NumPy and some of its attributes.

In [2]:
# create an array with elements ranging from 0 to 14, and reshape into
# an array with 2 axes, with 3 rows along axis 0 and 5 columns along axis 1
a = np.arange(15).reshape(3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
a.ndim

In [None]:
a.shape

In [None]:
a.size

In [4]:
a.dtype

dtype('int64')

In [None]:
a.itemsize

In [None]:
a.data

In [3]:
type(a)

numpy.ndarray

In [12]:
l = [[0, 'string', 2, 3, 4, 5],
     [6, 7, 8, 9, True, 11]]
l

[[0, 'string', 2, 3, 4, 5], [6, 7, 8, 9, True, 11]]

list

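NumPy's array class is called `ndarray`, as the output of `type(a)` above shows.  An `ndarray`
is a general n-dimensional array.  The plain Python list `l`, by contrast, can
freely mix strings, integers and booleans.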

In the previous example we demonstrated many of the basic attributes of
a NumPy `ndarray`

- **ndarray.ndim**: The number of axes (or dimensions) of the array.  Our array
  has 2 dimensions, axis 0 (rows) and axis 1 (columns)
- **ndarray.shape**: The length of each dimension of the array.  Since we have 3
  rows and 5 columns, axis 0 has a length of 3 and axis 1 has a length of 5.
- **ndarray.size**: The total number of elements of the array, has to be equal to
  the product of the shape (e.g. $3 \times 5 = 15$ here).
- **ndarray.dtype**: The type of elements in the array.  `ndarray`s are **homogeneous**,
  which means all elements are of the same type, the type described here.  You
  will often see types like this `int64`, where `int` is the basic type and
  the number 64 indicates the number of bits being used for the data type
  representation.
- **ndarray.itemsize**: The size in bytes of each element of the array.  Since
  a byte uses 8 bits, a size of 8 bytes here equals 64 bits of memory for each of
  the 15 elements in the array.
- **ndarray.data**: The buffer in memory holding the actual values
  of this array.  We don't normally access this directly.
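
The relationships between these attributes can be checked directly.  The following is a small sketch (the variable name `arr` is chosen here only for illustration) that builds the same 3 by 5 layout and confirms that `size` is the product of the axis lengths, and that `itemsize` times `size` gives the total number of bytes, which NumPy also reports as `nbytes`.

```python
import numpy as np

# a 3 x 5 array of 64-bit integers (the dtype is given explicitly so the
# result does not depend on the platform's default integer size)
arr = np.arange(15, dtype=np.int64).reshape(3, 5)

print(arr.ndim)      # number of axes
print(arr.shape)     # length of each axis
print(arr.size)      # total number of elements
print(arr.itemsize)  # bytes used by each element

# size is the product of the lengths of all axes
assert arr.size == arr.shape[0] * arr.shape[1]

# total memory for the elements is itemsize * size, also available as nbytes
assert arr.nbytes == arr.itemsize * arr.size
```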
  
Another quick example of an array, compare this one to the first one.

In [13]:
# set the display options so that our arrays are displayed in plain decimal notation
# rather than scientific notation in the rest of the notebook
np.set_printoptions(suppress=True)

In [14]:
# an array of random floating point numbers that range from -5 to 5
# the 3rd parameter, a tuple, gives the shape of the array to randomly create

# I think of this as axis 0 is the number of regular 2-d tables, so here we have
# 5 tables, where each table has 4 rows and 3 columns
b = np.random.uniform(-5.0, 5.0, (5, 4, 3))
b

array([[[ 4.91463478,  2.72014166,  2.20270517],
        [-2.40172018, -0.42262995, -0.04752124],
        [ 1.4126038 , -3.23217043, -0.9959877 ],
        [-1.20286033,  0.95707324, -3.20667402]],

       [[-1.97808616,  0.04099682, -4.7763905 ],
        [ 4.09483831,  4.14678955, -1.61166196],
        [-2.18955469,  2.2690551 ,  1.79952707],
        [-2.51534874,  3.88123434, -3.0941251 ]],

       [[ 4.22020354, -0.79768108,  2.95552203],
        [ 2.32351891, -1.29975097, -3.26606597],
        [ 1.86737578,  3.92134814, -0.78048388],
        [-0.57348928, -1.59772117,  1.93861645]],

       [[-1.77912517,  1.39773974,  0.7133414 ],
        [ 1.79129032,  4.44480709, -0.6605687 ],
        [-1.51176746,  0.0205056 , -0.8332939 ],
        [ 4.09586537, -4.44110834, -0.27024187]],

       [[-1.33939577,  3.2867668 , -3.72083489],
        [-1.85922831,  1.58190146, -2.2160052 ],
        [-4.67054374, -3.54378683,  0.25158285],
        [-4.42705511,  3.48176689,  2.96905678]]])

In [15]:
b.ndim

3

In [16]:
b.shape

(5, 4, 3)

In [17]:
b.size

60

In [18]:
b.dtype

dtype('float64')

In [19]:
b.itemsize

8

# Array creation

There are a number of ways to create a NumPy `ndarray`.  We saw two
array generation functions in the previous example, one creating
an array with a range of values, and another using the NumPy `random`
submodule to generate arrays of random values.

One basic way to create an array is by initializing it from a regular
Python list or tuple.

In [None]:
a = np.array([2, 3, 4])
print(a)
print(a.dtype)
print(a.shape)

In [None]:
b = np.array([[1.2, 3.5, 5.1], 
              [3.8, 4.6, 2.7]])
print(b)
print(b.dtype)
print(b.shape)

In [None]:
c = np.array(['gouda', 42, False, 3-2j])
print(c)
print(c.dtype)
print(c.shape)

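Notice that when you give a nonhomogeneous list of values, NumPy converts them all to
the most general type, which in this case is a string representation of each element.
The `dtype` is reported as a fixed-width Unicode string type (a name such as `<U` followed
by a maximum character count).  NumPy picks a width large enough to hold the string form
of every element, so it can be wider than the longest string you typed.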
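
To see how the element type is chosen, here is a minimal sketch with a few small lists (the variable names are illustrative only).  Each array ends up with a single type general enough to represent every value it was given.

```python
import numpy as np

ints = np.array([1, 2, 3])           # all ints -> an integer dtype
mixed = np.array([1, 2.5, 3])        # int and float -> float64
cplx = np.array([1, 2.5, 3 - 2j])    # float and complex -> complex128
text = np.array(['gouda', 42])       # str and int -> fixed-width Unicode strings

# print the dtype NumPy selected next to the converted values
for arr in (ints, mixed, cplx, text):
    print(arr.dtype, arr)
```

Note that in the last array the integer 42 has been stored as the string `'42'`, so arithmetic on it is no longer possible.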

We saw an example of array `b` where a sequence of lists is transformed into
an array with 2 axes.  This can be done for 3 or even higher numbers of axes, though
of course it gets complicated to construct such a list by hand in Python.

The type of the array can be explicitly specified at creation time.

In [None]:
d = np.array( [[1, 2], [3, 4]], dtype=complex)
print(d)
print(d.dtype)
print(d.shape)

## Array generation functions

We saw some array generating functions above.  The `arange()` function
is analogous to the `range()` built-in function.  It creates arrays with
1 axis holding a range of values.

In [None]:
# array of values from 0 up to but not including 12
np.arange(12)

In [None]:
# range of even values from 10 to 50
np.arange(10, 51, 2)

In [None]:
# stepping backwards
np.arange(100, -1, -5)

In [27]:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(x)
print(x.shape)

y = x.reshape((4,2))
print(y.shape)
print(y)

[1 2 3 4 5 6 7 8]
(8,)
(4, 2)
[[1 2]
 [3 4]
 [5 6]
 [7 8]]


Another function for generating sequences that might be more useful in this
class is `linspace()`.  This function can be used to generate an array of
a particular size with points linearly spaced over some range.  The `linspace()`
cells further below show two examples.

In [48]:
x = np.array([1, 2, 3, 4, 5])

y = x**2 + 3 * x - 5

print(x**2)
print(3*x)

print(x**2 + 3*x)
print(x**2 + 3*x - 5)
print(y)

[ 1  4  9 16 25]
[ 3  6  9 12 15]
[ 4 10 18 28 40]
[-1  5 13 23 35]
[-1  5 13 23 35]


In [49]:
mask = (y > 5) & (y <= 23)
print(mask)

[False False  True  True False]


In [50]:
y[mask] = y[mask]**2 + 5 * y[mask]
print(y)

[ -1   5 234 644  35]


In [45]:
indexes = [1, 4]
y[indexes]

array([ 5, 35])

In [46]:
i = 3+2j
print(i)
print(type(i))

(3+2j)
<class 'complex'>


In [None]:
# 10 points linearly spaced from 0.0 to 1.0
# notice that this function is inclusive of the endpoints, the first and last points
# will be the stated begin and end of the range
np.linspace(0.0, 1.0, 10)

In [None]:
# 100 points linearly spaced from -10 to 10
np.linspace(-10.0, 10.0, 100)
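
The endpoint behaviour is the main practical difference between `linspace()` and `arange()`.  The following sketch (with illustrative variable names) uses 11 points so that the spacing comes out as 0.1, and contrasts this with `arange()`, which stops before its end value.

```python
import numpy as np

# 11 points from 0.0 to 1.0: both endpoints are included
pts = np.linspace(0.0, 1.0, 11)
print(pts)
assert pts[0] == 0.0 and pts[-1] == 1.0

# arange() excludes its stop value, like the built-in range()
steps = np.arange(0, 10, 2)
print(steps)
assert 10 not in steps
```

With `linspace()` you choose the number of points and the spacing follows from it.  With `arange()` you choose the spacing and the number of points follows from it.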

The `np.random.uniform()` function we used before shows a common pattern for many other
array generating functions.  Functions in NumPy's `random` submodule can be used
to generate arrays of random numbers with different distributions (uniform,
normal/Gaussian, etc.).  For `uniform()` the 3rd parameter is a tuple indicating
the shape of the array to be generated, and most of the other generators accept a
shape through a `size` parameter in the same way.
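
As a rough illustration of that pattern, the sketch below calls three generators from `np.random` with the same shape tuple.  The particular bounds and the variable names are arbitrary choices made for this example.

```python
import numpy as np

shape = (2, 3)

# uniform(low, high, size): floats drawn uniformly between low and high
u = np.random.uniform(-5.0, 5.0, shape)

# normal(loc, scale, size): floats drawn from a normal (Gaussian) distribution
# with mean loc and standard deviation scale
n = np.random.normal(0.0, 1.0, shape)

# randint(low, high, size): integers from low up to but not including high
r = np.random.randint(1, 11, shape)

print(u.shape, n.shape, r.shape)
```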

Often we know what shape of array we need and the type of the values that will
be in it, but we will generate the data later.  We can use functions
like `zeros()` or `empty()` to create empty arrays of a particular shape and type.

In [None]:
# array of zeros with 3 rows and 4 columns
a = np.zeros( (3, 4) )
print(a)
print(a.shape)
print(a.dtype)

Notice that if you don't specify a datatype, many array generation functions default
to generating arrays of floating point numbers (with the standard number of bits 
for a float type used).

`zeros()` is useful if you need to ensure all values are initially 0, but if
you know you will be filling in all values, you can just get an empty array.
This sometimes has all 0's anyway, but the memory is not initialized so garbage
values can also be in the array you get back using `empty()`.
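
A minimal sketch of the difference follows.  It also uses `np.ones()` and `np.full()`, two related creation functions from the same NumPy reference page that are not otherwise covered in this notebook.  All variable names are illustrative.

```python
import numpy as np

z = np.zeros((3, 4))    # every element is guaranteed to be 0.0
e = np.empty((3, 4))    # contents are arbitrary until we assign them

# fill every element of the uninitialized array before reading from it
e[:] = 7.5

o = np.ones((2, 2), dtype=int)   # the dtype can be given explicitly
f = np.full((2, 2), 3.14)        # constant fill value of our choosing

print(z.dtype, e.dtype, o.dtype, f.dtype)
```

The safe habit is to treat the result of `empty()` as write-only until every position has been assigned.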

In [None]:
# empty array of 6 rows and 8 columns
a = np.empty( (6, 8) )
print(a)
print(a.shape)
print(a.dtype)

There are many other
[array generation functions](https://numpy.org/doc/stable/reference/routines.array-creation.html) 
that can be used.  

# Basic operations (Vectorized computations)

NumPy arrays support a style of programming known as vectorized computation,
popularized (I believe) by the MATLAB programming environment.

What this means is that, instead of using loops to access and perform
operations on the elements of an array, we can treat an array as a single
value/variable and apply an operation to all elements with a single statement.

An example should make this clearer.

## Elementwise arithmetic

For example, you can easily apply simple operations to all elements of an array,
like addition or multiplication, by performing a scalar operation with an
`ndarray` object.  A **scalar** value is simply a fancy way of saying a
single value, e.g. a regular Python variable holding 1 value, instead of an
array that holds many values.

So for any array we have, we can perform elementwise scalar operations on it.
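
To make the contrast with loops concrete, here is a small sketch that computes the same result both ways.  The array contents and names are made up for the example.

```python
import numpy as np

values = np.array([3, 8, 1, 10])

# loop version: visit each element and fill in the result one item at a time
looped = np.empty_like(values)
for i in range(values.size):
    looped[i] = values[i] * 10

# vectorized version: one expression applied to every element
vectorized = values * 10

print(looped)
print(vectorized)
```

Both produce the same array, but the vectorized form is shorter and the per-element work happens inside NumPy's compiled code rather than in the Python interpreter.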

In [None]:
# 100 random integers in range from 1 to 10
a = np.random.randint(1,11,(10,10))
a

In [None]:
# subtract 10 from each element of array using vectorized operation
a - 10

In [None]:
# multiply each array element by 10
10 * a

In [None]:
# divide all elements by 10, result should be floats in range 0.1 to 1.0
a / 10

## Vectorized functions (Universal Functions)

Many functions have been provided in NumPy to perform vectorized operations
on NumPy arrays.  These are also called Universal functions (ufunc).

For example, all of the math functions in the standard `math` library are
scalar functions; they expect a single value.  NumPy provides vectorized
versions of most of these math functions that perform the elementwise
operation on all of the values (see the
[NumPy mathematical functions reference](https://numpy.org/doc/stable/reference/routines.math.html)).
These are all in the top-level NumPy namespace.  So for example


In [None]:
# array of random numbers in range -pi to pi
b = np.random.uniform(-np.pi, np.pi, (10,10))
b

In [None]:
# sin of each value
np.sin(b)

In [None]:
# logarithm of each value, need to take absolute value because logarithm
# not defined for negative values
np.log(np.abs(b))

So far all of the previous examples have applied a scalar or a function
transformation to each element of an array.

Operations between 2 arrays are also defined, and by default perform elementwise
operations.  They are defined normally when the arrays are of the same shape.
An example should make this clearer.

In [None]:
# arrays of same size and shape
a = np.array([1, 5, 7, 9])
b = np.array([-1, 4, -10, 6])

In [None]:
a + b

In [None]:
a - b

In some vectorized languages, the multiplication operator `*` is defined to perform
matrix multiplication by default.  For NumPy arrays, it performs elementwise
multiplication by default.

In [None]:
A = np.array( [ [1, 1],
                [0, 1] ])

B = np.array( [ [2, 0],
                [3, 4] ])

# this is just the elementwise product of the two matrices
A * B

If you need true matrix multiplication, use the `@` operator (available
since Python 3.5), or the `np.dot()` function, which for 2-dimensional arrays
computes the matrix product.
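
A worked sketch of the difference, using the same two small matrices, is shown here with the resulting values written out in the comments.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiplies matching positions: rows [2, 0] and [0, 4]
matrix = A @ B        # rows of A combined with columns of B: rows [5, 4] and [3, 4]

print(elementwise)
print(matrix)

# for 2-dimensional arrays these spellings all give the matrix product
assert np.array_equal(matrix, np.dot(A, B))
assert np.array_equal(matrix, np.matmul(A, B))
```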

In [None]:
# matrix multiplication
A @ B

In [None]:
# also the dot() member function performs matrix multiplication
A.dot(B)

## Combining vectorized operations

In the previous examples we showed that elementwise vectorized operations can be chained.
The result of taking the absolute value of the array was then fed to the
`log()` function, so we applied absolute value followed by log
successively to each value of array `b`.

We also showed that when 2 arrays are used on both sides of an operation, the
result is an elementwise vectorized operation.

In general more complex vectorized expressions can be built up by composing
many basic vectorized operations together.  So for example, jumping ahead a bit,
but a very common thing we might do is something like the following.

In [None]:
# x values range from -4 to 4
x = np.linspace(-4.0, 4.0, 1000)

Calculate and display the function

$$
f(x) = 50 \sin(x^2) + \frac{1}{2} x^3 - 10 \cos(\frac{x}{2})
$$

In [None]:
# calculate the function f(x) composed of elementwise transformations
y = 50.0 * np.sin(x**2.0) + 0.5 * x**3.0 - 10.0 * np.cos(x/2.0)

In [None]:
# display the resulting function as a plot
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8))
plt.plot(x, y);

## Array member functions

In addition to the many universal functions (like the vectorized math
functions) provided by NumPy, there are many member functions of
an array itself that we will use.  Many of these member functions
implement unary operations, such as computing the sum or minimum of an array
(see the [ndarray member function reference](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)).

A few examples of some useful array member functions.

In [None]:
# array of random integers
a = np.random.randint(1, 6, (3,6))
a

In [None]:
# minimum value of the array
a.min()

In [None]:
# minimum of each column, use axis=0
a.min(axis=0)

In [None]:
# minimum of each row, use axis=1
a.min(axis=1)

In [None]:
# mean of all values
a.mean()

In [None]:
# sum of all values
a.sum()

In [None]:
# sum of each column
a.sum(axis=0)

In [None]:
# reshape the array
a.reshape( (6,3) )

In [None]:
a.reshape( (2,3,3) )

In [None]:
a.flatten()
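
Because the cells above use random values, it can be hard to check the `axis` argument by eye.  The following sketch repeats a few of the reductions on a small fixed array (named `t` here purely for illustration) so the results can be verified by hand.

```python
import numpy as np

# a fixed 2 x 3 table: rows [0, 1, 2] and [3, 4, 5]
t = np.arange(6).reshape(2, 3)

col_min = t.min(axis=0)   # collapse the rows: one value per column
row_min = t.min(axis=1)   # collapse the columns: one value per row
col_sum = t.sum(axis=0)   # sum down each column
row_sum = t.sum(axis=1)   # sum across each row

print(col_min, row_min)
print(col_sum, row_sum)
```

A useful way to remember it: the axis you name is the one that disappears from the result's shape.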

## Elementwise logical expressions

Logical expressions work elementwise for NumPy arrays, just like numerical
expressions.  The result will be an array of boolean values.

So for example you can use the logical operators like `<`, `>=`, `==`, etc.
to get a result of booleans.

In [None]:
a

In [None]:
a == 1

In [None]:
a <= 2

We will see later on when we perform boolean indexing how and why this
might be useful.

You can make more complex boolean expressions.  However, the logical operators
`and`, `or` and `not` do not work elementwise on arrays, so the following
fails if you want the locations in `a` where the value is either a 1 or a 5.

In [None]:
try:
    (a == 1) or (a == 5)
except ValueError:
    print("ValueError was generated: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()")

The error message here is misleading, since `a.any()` and `a.all()` reduce an array
to a single boolean, which is not what we want.  You should use the NumPy logical
functions to perform these operations, like this.

In [None]:
np.logical_or(a == 1, a == 5)

Also, although `and` and `or` were not overloaded to perform elementwise logical
operations, the bitwise operators were overloaded for NumPy arrays
for this purpose.  These are the operators `&` for and, `|` for or,
`~` for not and `^` for xor.  Because they bind more tightly than comparisons,
each comparison needs its own parentheses.

So you will commonly see these operators used to create logical
expressions between multiple NumPy arrays.
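
Here is a short sketch of these operators on a fixed array (the name `v` and its values are invented for the example).  Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `==`, `<` and `>`.

```python
import numpy as np

v = np.array([1, 5, 2, 5, 3, 1])

either = (v == 1) | (v == 5)            # elementwise "or"
same = np.logical_or(v == 1, v == 5)    # equivalent function form
neither = ~either                       # elementwise "not"
middle = (v > 1) & (v < 5)              # elementwise "and"

print(either)
print(neither)
print(middle)
```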

In [None]:
(a == 1) | (a == 5)

# Indexing, Slicing and Iterating

It is important you understand array slicing.  A one-dimensional array
can be indexed, sliced and iterated over in pretty much an identical
manner to lists and other Python sequences.

In [None]:
# a 1-dimensional array of 10 values
a = np.arange(10)**3
a

In [None]:
# index the 3rd element at index 2
a[2]

In [None]:
# get a slice from element 2 up to but not including element 5
a[2:5]

In [None]:
# use a step size of 2 to get the elements at even indexes from the start up to element 6
a[:7:2]

In [None]:
# reverse the array
a[::-1]

In [None]:
# iterate over elements in the array
for val in a:
    print(val**(1.0/3.0))

As we noted in a previous video, though, there is one big difference between
arrays and regular Python sequences when it comes to slicing.

When you slice an array, you get a view of the array.

In [None]:
b = a[2:5]
b

In [None]:
a[2:5] = 0
a

In [None]:
b

So notice that by changing the values to 0 in `a`, the view in `b` now also shows them as 0.

But be aware that some operations work in place, so the view will stay, but some
operations end up creating a new copy of the values.  For example.

In [None]:
# b is still a view
b[1] = 10
b

In [None]:
# so the change to a value of 10 shows up in a as well
a

In [None]:
# operators +=, *=, etc. will work in place, so b remains a view of a
b += 5
b

In [None]:
a

In [None]:
# but performing the same operation again, but using just a regular assignment
# causes a copy to be made
b = b + 5
b

In [None]:
a

The lesson of the last example is you do have to be a bit careful.  I usually
assume slices will be views.  Making a view rather than a copy is done for
efficiency reasons.  A primary goal of NumPy is to provide efficient numerical
in memory calculations.  So copying arrays when evaluating results is avoided
if at all possible, thus views are the normal result from a slice.
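
When it is unclear whether two arrays share data, it can be checked rather than guessed.  The sketch below uses `np.shares_memory()` and the `copy()` method.  The names `base`, `view` and `independent` are illustrative.

```python
import numpy as np

base = np.arange(10)

view = base[2:5]                  # basic slice: shares memory with base
independent = base[2:5].copy()    # explicit copy: has its own data

base[2:5] = 0                     # modify the original through a slice assignment

print(view)          # reflects the change
print(independent)   # still holds the old values 2, 3, 4

print(np.shares_memory(base, view))
print(np.shares_memory(base, independent))
```

Calling `copy()` on a slice is the usual way to ask for an independent array when you do not want later changes to show up in the original.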

## Multidimensional indexing and slicing

When your `ndarray` has 2 or more dimensions, you can still perform indexing
and slicing operations.  All of the same indexing and slicing syntax is
available on a NumPy array for every dimension of the array.

So for a multidimensional array you can have one indexing/slicing expression
per axis.  These indices are given in a tuple separated by commas.


In [None]:
# construct array as a function of each coordinate.  Since
# the array is 2 dimensional, the function will receive two coordinates,
# the row number and column number
def f(row, col):
    return 3*row + col**2

# create a 2-dimensional array of integers
b = np.fromfunction(f, (5,4), dtype=int)
b

In [None]:
# item at row 2, column 3
b[2,3]

In [None]:
# item at row 3, column 1
b[3,1]

We can extract a column or set of columns.  This is common if/when we want to
perform an operation on some attribute or feature of a data set.

In [None]:
# all rows, but only column 1
b[0:5, 1]

In [None]:
# since there are only 5 rows, this is an equivalent way to extract column 1
b[:, 1]

Likewise we can extract rows.  This is common to get a subset of samples from
a data set.  

In [None]:
# extract rows 2 and 3, all columns
b[2:4, :]

If you want all columns, you can omit any final dimensions where you want all values
in them.  So to get rows 2 and 3 we simply do the following.  We will use this
type of slice frequently to extract samples from a table of data values we are
manipulating.
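
The equivalence can be confirmed with a quick sketch on a small fixed array (named `grid` here for illustration).  It also shows the `...` (Ellipsis) shorthand, which is not used elsewhere in this notebook and stands for "all of the remaining leading axes".

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)

rows_explicit = grid[2:4, :]   # rows 2 and 3, all columns spelled out
rows_short = grid[2:4]         # trailing axis omitted, same result

last_col = grid[:, -1]         # all rows, last column
last_col_alt = grid[..., -1]   # Ellipsis stands in for the leading axes

print(rows_short)
print(last_col)
```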

In [None]:
b[2:4]

We can get even more complex, like get the values in the middle and use
steps, which sometimes can be useful.

In [None]:
# clip off the first and last row and column to get the middle values
b[1:-1, 1:-1]

In [None]:
# get every other row
b[::2]

In [None]:
# get every other column, start at column 1
b[:,1::2]

And notice that, like we did before for a single dimension, we can use a slice to
manipulate only the values in a row or column, or in general some slice.

In [None]:
# make column 2 be the negative of its current value
b[:,2] = -b[:,2]

# subtract 10 from values in row 3
b[3] -= 10

b

We almost never iterate over NumPy arrays.  If you are iterating over a NumPy
array you are probably not using more efficient vectorized elementwise operations.

However, we can do it like we did for a single dimensional array.  If we do it for
the 2-dimensional array, we will iterate over each row 

In [None]:
for row in b:
    print(row)

If you want to iterate over every element, you can flatten the array using
either the `flatten()` method (which makes a copy), or better use the `flat` attribute
of arrays, which gives a 1-dimensional iterator over the array's elements without copying.
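
The practical difference between the two shows up when you assign.  In this sketch (names are illustrative), writing into the result of `flatten()` leaves the original alone, while writing through `flat` changes the original array.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

flat_copy = m.flatten()   # a new 1-dimensional array with its own data
flat_copy[0] = 99         # does not touch m

m.flat[1] = -1            # assigning through .flat writes into m itself

print(m)
print([int(x) for x in m.flat])
```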

In [None]:
for element in b.flat:
    print(element)

If you want to iterate over columns, you can use an explicit index and slice it.

In [None]:
# get the number of rows and columns in b
rows,cols = b.shape
for col in range(cols):
    print(b[:,col])

# Reshaping and Stacking arrays

We already showed some examples of reshaping an array.  In general you can
always flatten an array to a one-dimensional vector if needed.  All
reshaping will always end up with the same number of elements, just shaped into
different dimensions.

In [None]:
b.shape

In [None]:
b.flatten()

In [None]:
# reshape as 3-dimensional, 2 tables of 5 rows and 2 columns
b.reshape((2,5,2))

Several arrays can be stacked together along different axes.  The
sizes of all the axes other than the one being stacked along need to match for
the stacking to work.
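
A sketch of which sizes have to agree is given here, using small arrays of zeros and ones with made-up names.  Vertical stacking needs the same number of columns, horizontal stacking needs the same number of rows, and a mismatch raises a `ValueError`.

```python
import numpy as np

top = np.zeros((2, 3))
bottom = np.ones((1, 3))     # different number of rows, same number of columns
stacked_v = np.vstack((top, bottom))

left = np.zeros((2, 3))
right = np.ones((2, 1))      # same number of rows, different number of columns
stacked_h = np.hstack((left, right))

print(stacked_v.shape)
print(stacked_h.shape)

# top has 2 rows and bottom has 1, so they cannot be placed side by side
try:
    np.hstack((top, bottom))
except ValueError as err:
    print("cannot hstack:", err)
```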

In [None]:
a = np.random.randint(0, 5, (2,2))
a

In [None]:
b = np.random.randint(0, 5, (2,2))
b

In [None]:
# vertical stacking, put a on top of b
# notice you need to pass in the arrays as a tuple here, because you can actually
# stack 3 or more arrays of the same number of columns if you wish.
np.vstack((a,b))

In [None]:
# horizontal stacking, stack b beside a
np.hstack((a,b))

# Advanced Indexing and Index Tricks

NumPy offers more indexing facilities than regular Python sequences.
In addition to indexing by integers and slices (as we saw above), arrays
can be indexed by arrays of integers and arrays of booleans.

## Indexing with arrays of indices

If we have an array of indexes (also works with a regular Python sequence of 
indexes, like a list), we can use this to index into an array.

For example.

In [None]:
# use our 2-dimensional array b again from before 

# construct array as a function of each coordinate.  Since
# the array is 2 dimensional, the function will receive two coordinates,
# the row number and column number
def f(row, col):
    return 3*row + col**2

# create a 2-dimensional array of integers
b = np.fromfunction(f, (5,4), dtype=int)
b

In [None]:
# selection rows 1 3 and 4 only
rows = np.array([1,3,4])
b[rows]

In [None]:
# works with a regular python list as well
rows = [1, 3, 4]
b[rows]

In [None]:
# works to select columns
cols = [0, 2]
b[:,cols]

In [None]:
# and we can select only some columns from some rows.  But when you supply
# both an index for rows and columns, the lists must have the same shape
# so to get rows 1,3,4 and only the columns 0 and 2 from them we need
# to do something like this

# first start simply, get column 0 and 2 from row 1 only
rows = np.array([1, 1])
cols = np.array([0, 2])
b[rows,cols]

In [None]:
# now we want those columns from rows 1 and 3
# notice the arrays have to be of the same shape
rows = np.array([[1, 1],
                 [3, 3]])
cols = np.array([[0, 2],
                 [0, 2]])

b[rows,cols]

In [None]:
# finally what we really wanted, rows 1,3 and 4 but only columns 0 and 2.
# result of this is 3 rows and 2 columns, so our rows and columns
# indexes need to be 3x2 to match the shape we are trying to extract
rows = np.array([[1, 1],
                 [3, 3],
                 [4, 4]])
cols = np.array([[0, 2],
                 [0, 2],
                 [0, 2]])

b[rows,cols]

This latter example is less common; we almost always need some particular rows
or some particular columns.  And if you do need some rows and only some columns
from those rows, it can be easier to perform the slicing in two steps instead.

In [None]:
# first extract the sample rows
rows = [1, 3, 4]
samples = b[rows]
samples

In [None]:
# now keep only the attributes we need
cols = [0, 2]
samples = samples[:,cols]
samples
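
NumPy also provides a helper, `np.ix_()`, that builds the matching index arrays from plain lists of rows and columns.  It is not used elsewhere in this notebook, so treat the following as an optional sketch.  It rebuilds the same array `b` and compares the result with the two-step approach.

```python
import numpy as np

def f(row, col):
    return 3*row + col**2

# same 5 x 4 array of integers used in the cells above
b = np.fromfunction(f, (5, 4), dtype=int)

rows = [1, 3, 4]
cols = [0, 2]

# two steps: pick the rows first, then the columns
two_step = b[rows][:, cols]

# one step: np.ix_ turns the two lists into index arrays that select
# every combination of the given rows and columns
block = b[np.ix_(rows, cols)]

print(block)
```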

Here is a more complex example of using lists to index into an array.
We often might want to find the particular value in each row or column
that is special or of some interest.

Say we have a data set of 4 time dependent series.
Here each series is a row, and the column would represent the common sample
time in each case, e.g. t=0, 1, 2, 3, 4, 5, 6 (though we are just making
up some example data here).

In [None]:
# 4 time-dependent series sampled in each row of 7 time steps each
data = np.sin(np.arange(28)).reshape(4,7)
data

In [None]:
# index of maxima for each series,
# argmax returns the index where the maximum occurs, 
# using axis=1 means we get the index for each of our 4 sample rows where 
# maximum value occurs
ind = data.argmax(axis=1)
ind

In [None]:
# ind holds column indexes, not times.  Say the columns correspond to samples
# taken at 7 evenly spaced times from time 20 to time 120.  This array
# gives the time at which each column was sampled
time = np.linspace(20, 120, 7)
time

In [None]:
# so we can use the indexes to find the actual time where the maximum
# occurred for each time series sample
time[ind]

## Indexing with boolean arrays

When we index arrays with arrays of (integer) indices we are providing the list of
indices to pick.

With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don't.

The most natural way to think of and use boolean indexing is to use an array
of boolean values that has the *same shape* as the original array.

For example, we can create a boolean array using boolean operations, as we
discussed earlier.

In [None]:
b

In [None]:
# boolean array of those items divisible by 3
b % 3 == 0

In [None]:
# use boolean indexing to extract only those values divisible by 3
b[ (b % 3 == 0) ]

You will notice the result is a 1-dimensional vector, even though in this
case we are selecting exactly column 0 and the last column, so we might have
wanted a `(5,2)` shaped result.

But this happens because a boolean selection can pick arbitrary values from
the array.  For example.

In [None]:
# notice you cannot use `and` or `or` to combine boolean expressions between arrays;
# use & and | instead (a.any() and a.all() reduce an array to a single boolean)
(b >= 4) & (b <= 10)

In [None]:
# values between 4 and 10
b[ (b >= 4) & (b <= 10) ]
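
To make the shape of the result concrete, this sketch rebuilds the same array `b` and checks that the selection is 1-dimensional, with one entry for every `True` in the mask, taken in row-by-row order.  The names `mask` and `picked` are illustrative.

```python
import numpy as np

def f(row, col):
    return 3*row + col**2

# same 5 x 4 array of integers used in the cells above
b = np.fromfunction(f, (5, 4), dtype=int)

mask = (b >= 4) & (b <= 10)   # boolean array with the same shape as b
picked = b[mask]              # 1-dimensional array of the selected values

print(mask.shape, picked.shape)
print(picked)

# one selected value for every True in the mask
assert picked.size == mask.sum()
```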

Boolean indexing can be very useful for assignments, for example if we
want to zero-out all of the numbers divisible by 3.


In [None]:
# assign something into the locations selected by our boolean index
b[ (b % 3 == 0) ] = 0
b

The second way of indexing with booleans is more similar to integer
indexing: for each dimension of the array we give a 1D boolean array selecting
the slices we want.

This implies that the boolean array has to have the same size as the size for the
dimension you are slicing on.

So we can get the same rows and columns we did previously but using
boolean indexing like the following.

In [None]:
# use our 2-dimensional array b again from before 

# construct array as a function of each coordinate.  Since
# the array is 2 dimensional, the function will receive two coordinates,
# the row number and column number
def f(row, col):
    return 3*row + col**2

# create a 2-dimensional array of integers
b = np.fromfunction(f, (5,4), dtype=int)
b

In [None]:
# select rows 1, 3 and 4 again but with a boolean index
rows = [False, True, False, True, True]
b[rows]

In [None]:
# select columns 0 and 2
cols = [True, False, True, False]
b[:,cols]