<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
	* [Some Simple Setup](#Some-Simple-Setup)
* [Additional Selection from Arrays](#Additional-Selection-from-Arrays)
* [Boolean Reduction](#Boolean-Reduction)
* [Grid/Window/Neighbor Operations](#Grid/Window/Neighbor-Operations)
* [Window Ops with Pandas](#Window-Ops-with-Pandas)
* [Fancy Indexing](#Fancy-Indexing)
* [Fancy Indexing with Pandas](#Fancy-Indexing-with-Pandas)
* [Gotchas](#Gotchas)

# Learning Objectives:

After completion of this module, learners should be able to:

* use and explain *boolean indexing* and *fancy indexing* in NumPy

## Some Simple Setup

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import os.path as osp
import numpy.random as npr
import pandas as pd  # needed for the pandas cells later in this notebook
vsep = "\n-------------------\n"

def dump_array(arr):
    print("%s array of %s:" % (arr.shape, arr.dtype))
    print(arr)

# Additional Selection from Arrays

Boolean operations and comparisons on an array proceed elementwise and result in an array of bool values.

In [None]:
# an elementwise comparison produces a boolean array of the same shape
arr = 2 ** np.arange(5)
print("arr:")
dump_array(arr)

print("\nafter a boolean test:")
tested = arr > 4
dump_array(tested)

We can also use the boolean array as an indexer back into the original array.

In [None]:
arr[tested]

In [None]:
arr[arr < 4]
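
Conditions can also be combined before they are used as an indexer. The following is a minimal sketch, assuming NumPy's elementwise operators `&`, `|` and `~`; the names `powers` and `in_range` are illustrative and do not come from the cells above. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`.

```python
import numpy as np

powers = 2 ** np.arange(5)          # [ 1  2  4  8 16]

# combine two elementwise tests with & (and); | (or) and ~ (not) work the same way
in_range = (powers > 1) & (powers < 16)
print(powers[in_range])             # elements strictly between 1 and 16
print(powers[~in_range])            # everything else
```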

Sometimes we want to keep only the elements of an array that meet a certain criterion.  We can do that with `np.where`, which is a function defined in the NumPy module, not a method on NumPy arrays.  We will encounter more of these functions shortly.  One note:  called with a single argument, `np.where` is designed to work on N-dimensional arrays.  Because of that, it always returns a tuple of arrays: one array of indices per dimension of the original array.  This makes perfect sense when the number of dimensions is greater than one.  However, it can be a *gotcha* when there is only one dimension.  In either case, the tuple of indices can be used directly as an array index.

This is our first look at fancy indexing.  We'll go into more detail later.  

In [None]:
# pick the indices of the even elements of arr 
arr = np.arange(10, 20)
indices = np.where(arr % 2 == 0)
print("the indices:  ", indices)
print("the elements: ", arr[indices])
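
The one-dimensional gotcha can be sketched as follows. This is a small illustration rather than part of the original notebook; the names `values`, `result` and `even_idx` are made up for the example. It shows that the return value is a 1-tuple and that `np.flatnonzero` is one way to get the index array directly.

```python
import numpy as np

values = np.arange(10, 20)
result = np.where(values % 2 == 0)

print(type(result), len(result))   # a tuple holding a single index array
even_idx = result[0]               # unpack the one array of positions
print(even_idx)

# np.flatnonzero returns the flat index array without the tuple wrapper
print(np.flatnonzero(values % 2 == 0))
```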

When `np.where` is applied to a multi-dimensional array, the returned index arrays are lined up pairwise (one row index with one column index), and each pair selects the element at that position.

In [None]:
arr = np.arange(20).reshape(5,4)
print("a 2D array:")
print(arr, end=vsep)

indices = np.where(arr % 2 == 0)
print("the indices:  ", indices)
print("the elements: ", arr[indices])
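
To make the pairing explicit, the index arrays can be zipped into `(row, column)` tuples. This is a hedged sketch with illustrative names (`grid`, `row_idx`, `col_idx`, `pairs`); `np.argwhere` is shown as an alternative that returns the same positions as one array.

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)
row_idx, col_idx = np.where(grid % 2 == 0)

# pair up the row and column indices: one (row, col) per selected element
pairs = list(zip(row_idx.tolist(), col_idx.tolist()))
print(pairs[:4])

# np.argwhere gives the same positions as an (N, 2) array
print(np.argwhere(grid % 2 == 0)[:4])
```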

In [None]:
np.where(np.logical_not(np.isnan(arr)))

`np.where` also has an alternate usage pattern:  `np.where(CONDITION, VALUE_WHEN_TRUE, VALUE_WHEN_FALSE)`.  Here, two arrays provide the values when the condition is met or fails.  If `VALUE_WHEN_TRUE/FALSE` is not of the right shape, it will be *broadcast* (expanded) into a compatible shape, if possible.

In [None]:
# np.where can also be used to select elements out of arrays (directly)
arr = np.arange(10, 20)
print("                        original:", arr, end=vsep)
print("evens -> 1.0, odds -> -99.0:", 
      np.where(np.logical_not(arr % 2 == 1), 1.0, -99.0), end=vsep)

# either or both VALUEs can be arrays
print("odds -> 0.0, evens stay same:", 
      np.where(arr % 2 == 0, arr, 0.0))
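
Broadcasting of the value arguments can be sketched like this. The example assumes a 2D condition and a 1D fill array whose length matches the number of columns; `grid`, `row_fill` and `filled` are names invented for the illustration.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
row_fill = np.array([-1, -2, -3, -4])   # shape (4,): one fill value per column

# the condition has shape (3, 4); row_fill is broadcast across the three rows
filled = np.where(grid % 2 == 0, grid, row_fill)
print(filled)
```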

In [None]:
arr = np.arange(20, dtype=float).reshape(5,4)
masked = np.where(arr % 3, arr, float('nan'))
masked

In [None]:
np.where(np.logical_not(np.isnan(masked)))

In [None]:
plt.plot(arr.flatten(), masked.flatten(), "r-");

# Boolean Reduction

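NumPy arrays with more than one element cannot be converted to a single boolean value: `bool(arr)` raises a `ValueError` because the result would be ambiguous.  The reduction methods `any()` and `all()` are provided instead.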

In [None]:
bool(arr)

In [None]:
arr

In [None]:
arr.any()  # True if any value casts to True

In [None]:
arr.all()  # True if ALL values cast to True
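
The failure of `bool()` on a multi-element array can be caught and inspected. A minimal sketch, using an illustrative array named `values` in place of `arr`:

```python
import numpy as np

values = np.arange(20, dtype=float).reshape(5, 4)

message = None
try:
    bool(values)
except ValueError as err:
    # NumPy refuses to guess whether "any" or "all" was meant
    message = str(err)
    print("ValueError:", message)

# the explicit reductions state the intent
print(values.any(), values.all())
```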

Reductions can be applied to any expression that returns a NumPy array.

In [None]:
arr2 = 2 ** np.arange(5)
print(arr2>4)
print((arr2>4).any())
print((arr2>4).all())
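
Both reductions also accept an `axis` argument, which reduces along one dimension instead of over the whole array. The sketch below uses a small illustrative array (`grid`, `big`) that is not part of the notebook.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)     # [[0 1 2], [3 4 5]]
big = grid > 1

print(big.any(axis=0))   # per column: is any entry > 1?
print(big.all(axis=1))   # per row: are all entries > 1?
```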

Remember that even arrays created with `np.empty` are not empty but filled with uninitialized data.

In [None]:
arr3=np.empty(100)
print(arr3)
print(arr3.any())

# Grid/Window/Neighbor Operations

Sometimes, clever slicing and mathematics can implement "window" style operations.  Note that each of the slices used below is two elements shorter than the original array, and so is the result.

<center>
![](img/numpygrid.scaled-noalpha.png)
</center>

In [None]:
test = np.random.randint(10, size=(10,))
print("some sample data:")
dump_array(test)

firstVals  = test[ :-2]  # how many values are in each of these?
secondVals = test[1:-1]
thirdVals  = test[2:  ]

movingWindowAverage = (firstVals + secondVals + thirdVals) / 3.0 
print("\na 3-element moving average")
dump_array(movingWindowAverage)

# we can write that more compactly as:
# (test[:-2] + test[1:-1] + test[2:]) / 3.0
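
The same three-element average can also be written as a convolution with a box kernel. This is an alternative sketch, not the notebook's method; it assumes `np.convolve` with `mode="valid"`, and the fixed sample `data` is invented so that the result is reproducible.

```python
import numpy as np

data = np.array([4, 8, 6, 1, 9, 3])

# slicing version: three shifted views added together
sliced = (data[:-2] + data[1:-1] + data[2:]) / 3.0

# the same average as a convolution with a length-3 box kernel
kernel = np.ones(3) / 3.0
convolved = np.convolve(data, kernel, mode="valid")

print(sliced)
print(convolved)
```

The convolution form generalizes to other window lengths by changing the kernel size, whereas the slicing form needs one extra slice per window element.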

# Window Ops with Pandas

We will visit pandas later, but here is the same moving-window operation expressed with pandas.

In [None]:
s = pd.Series(test)
s

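Notice that the rolling result fully preserves the length of the input and is mapped back to the original index.  The first two entries are `NaN` because a complete three-element window is not yet available there.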

In [None]:
s.rolling(window=3).mean()
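
As a hedged check that the two approaches agree, the rolling mean can be compared with the sliced average once the leading `NaN` entries are dropped. The fixed sample `data` and the names `rolled` and `sliced` are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([4, 8, 6, 1, 9, 3])
rolled = pd.Series(data).rolling(window=3).mean()
print(rolled)

# the first window - 1 entries are NaN; the rest match the sliced average
sliced = (data[:-2] + data[1:-1] + data[2:]) / 3.0
print(np.allclose(rolled.dropna().to_numpy(), sliced))
```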

# Fancy Indexing

There are a few forms of *fancy indexing* in NumPy.  The first is indexing an array by other arrays (we've seen one example of this).  The result has the same shape as the indexing arrays.

In [None]:
arr = np.arange(15).reshape((3,5))
print(arr)

# select 0,0  0,2  1,3   2,4  as a 2x2 array
rows = np.array([[0, 0], 
                 [1, 2]])
cols = np.array([[0, 2], 
                 [3, 4]])

# values are lined up pairwise (from inputs) to positions of output
print("\nSelect the position pairs:")
print("I.e. (0,0) (0,2) (1,3) (2,4) as a 2x2 array")
print(arr[rows, cols]) # ---> in shape of row,col index matrices

In [None]:
arr[np.array([0,0,1,2]), np.array([0,2,3,4])]

In [None]:
arr[rows.reshape(1,4), np.array([0,2,3,4])]

In [None]:
arr[2, np.array([0,2,3,4])]
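
Because index arrays are matched up pairwise, selecting a full block of rows and columns needs indexers that broadcast against each other. A minimal sketch, assuming `np.ix_` for building them; `grid`, `pairwise` and `block` are illustrative names.

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)

# pairwise: picks the two elements at (0, 1) and (2, 3)
pairwise = grid[[0, 2], [1, 3]]

# np.ix_ builds indexers that broadcast to the full rows x columns block
block = grid[np.ix_([0, 2], [1, 3])]

print(pairwise)   # shape (2,)
print(block)      # shape (2, 2)
```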

Another form of fancy indexing uses an array of `np.bool_` values.  Here, the indexing array must have an element (True/False) for *every* position along the dimensions it indexes; in the example below, that is every position in the base array.

In [None]:
evens = (arr % 2 == 0) # note, arr % 2 is not boolean. 
                       # quick question:  how can we check that?
print(arr, end=vsep)
print(evens, end=vsep)
print(arr[evens]) # boolean array: yes or no for each element
                  # compare with indices: pick elements
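
One way to answer the quick question is to inspect the `dtype` of each expression. The sketch below also shows a mask on the left-hand side of an assignment; the names `grid`, `evens` and `changed` are illustrative.

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)

print((grid % 2).dtype)        # an integer dtype, not bool
print((grid % 2 == 0).dtype)   # bool

evens = grid % 2 == 0
print(evens.shape == grid.shape)   # the mask covers every position

# a mask can also select the targets of an assignment
changed = grid.copy()
changed[evens] = -1
print(changed)
```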

# Fancy Indexing with Pandas

In [None]:
df = pd.DataFrame(arr)
df

pandas will preserve the input shape, replacing the value with ``NaN`` where the condition is not ``True``. Notice the dtype change as well.

In [None]:
df.where(arr % 2 == 0)
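
The dtype change can be checked directly. A small sketch with illustrative names (`grid`, `frame`, `kept`): the integer columns become floating point because `NaN` is a float value.

```python
import numpy as np
import pandas as pd

grid = np.arange(15).reshape(3, 5)
frame = pd.DataFrame(grid)
kept = frame.where(grid % 2 == 0)

print(frame.dtypes.unique())   # integer columns
print(kept.dtypes.unique())    # float columns, because NaN is a float
print(kept.isna().sum().sum(), "entries replaced by NaN")
```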

# Gotchas

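One other gotcha:  Python's `bool` type is *not* the same as `np.bool_`.  In fact, Python's `bool`s are Python `int`s (you can check it below).  Older NumPy releases therefore used a list of Python `bool`s as the values `0` and `1`, which means numerical indexes.  Current releases treat such a list as a boolean mask, so both selections below return the same rows; converting to a NumPy boolean array makes the intent explicit either way.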

In [None]:
arr = np.arange(35).reshape(5,7)
print("arr:")
print(arr, end=vsep)

b = arr > 20
print("a boolean selection:\n", arr[b], end=vsep)
print("some of the boolean indices:\n", b[:,5], end=vsep)

# compare: Python's True/False (Python bool) are -ints-.
# prove it: print(isinstance(True, int))
# older NumPy releases used a list of Python bools as 0/1 integer indices
#      ... unless they were in a np.array of type np.bool_
# current releases treat the list as a boolean mask instead
print("raw Python bools as indices")
# older NumPy: rows 0, 0, 0, 1, 1; current NumPy: rows 3 and 4
pyBools = [False, False, False, True, True]
print(arr[pyBools], end=vsep)

print("np.bools")
npBools = np.array(pyBools, dtype=np.bool_)
# the mask is applied along the first axis, so whole rows are selected
print(arr[npBools])
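
The relationship between `bool` and `int` can be verified, and an explicit conversion keeps the indexing behavior independent of how a given NumPy release treats plain lists. This is a hedged sketch; `grid`, `py_bools` and `mask` are illustrative names.

```python
import numpy as np

print(isinstance(True, int))   # Python bools are a subclass of int
print(True + True)             # so they take part in integer arithmetic

grid = np.arange(35).reshape(5, 7)
py_bools = [False, False, False, True, True]

# converting explicitly makes the intent (a row mask) unambiguous
mask = np.asarray(py_bools, dtype=bool)
print(mask.dtype)
print(grid[mask])              # rows 3 and 4
```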

<img src='img/copyright.png'>