In [73]:
%matplotlib inline
import numpy as np
from IPython.core.display import HTML
HTML('<link href="https://fonts.googleapis.com/css?family=Cabin|Quicksand" rel="stylesheet"><style>.container{width:90% !important; font-family: "Cabin", sans-serif;}em{color: red !important;}</style><style>.output_png {display: table-cell;text-align: center;vertical-align: middle;}</style>')

# Indexing Arrays
## Numerical Elaboration in Python


## Boolean Indexing

- if one writes a logical test (e.g. X == 1) then 
  - every element of the np array X is tested
  - the result is a Boolean array (True or False) with the same shape as X



In [74]:
X = np.array([[0,1,2],[3,4,5]])
print(X)
X >= 2

[[0 1 2]
 [3 4 5]]


array([[False, False,  True],
       [ True,  True,  True]])

## Fancy Indexing

- indexing that uses boolean arrays (also called *masks*) or integer arrays
- the resulting array will be a *copy*, **not** a *view*
- Question: what is the shape of the returned array?

In [75]:
X = np.array([[0,1,2],[3,4,5],[6,7,8]])
print(X)
M = X >= 2
M

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([[False, False,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [77]:
X[M]

array([2, 3, 4, 5, 6, 7, 8])

In [78]:
X[X >= 2]

array([2, 3, 4, 5, 6, 7, 8])

**Note:** the shape of the original array is lost!

The result is a 1-dimensional array.

The meaning of the selection procedure is: 

- pick certain elements and not others.
- the elements that are not selected do not exist in the result: they cannot be referenced and they do not have a position.
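The copy semantics and the flattened shape described above can be checked directly. The following is a minimal sketch; the variable name `selected` is only illustrative.

```python
import numpy as np

X = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

selected = X[X >= 2]      # boolean indexing returns a new 1-D array
selected[0] = 100         # changing the copy ...

print(selected.shape)                 # (7,)
print(X[0, 2])                        # ... leaves X untouched: still 2
print(np.shares_memory(X, selected))  # False: the result is not a view
```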

In [79]:
# we can however modify the selected elements! 

X[X>=2]=-1
X

array([[ 0,  1, -1],
       [-1, -1, -1],
       [-1, -1, -1]])

The meaning of the operation is: 

- act on the selected elements of the array
- do not act on the non-selected elements

So the original array shape can be preserved.

Can we use the result of boolean conditions on array A to index array B?

Yes, just be careful with the shapes!

In [80]:
A = np.arange(20).reshape(4,5)
B = np.arange(20,40).reshape(4,5)
print(A)
print('-'*20)
print(B)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
--------------------
[[20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]]


In [81]:
B[A%2==0]

array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38])

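What happens if A and B do not have the same shape? The boolean mask must match the shape of the indexed array (or of its leading dimensions), otherwise NumPy raises an `IndexError`. With matching shapes the selection works as before: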

In [83]:
A = np.arange(20).reshape(4,5)
B = np.arange(20,40).reshape(4,5)
B[A%2==0]

array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38])
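To see the failure case, here is a small sketch in which `A` is deliberately given a different shape from `B`. The smaller `A` and the names `mismatch_raised` and `row_mask` are assumptions made for this illustration. The exact error message depends on the NumPy version, so it is printed rather than quoted.

```python
import numpy as np

A = np.arange(10).reshape(2, 5)       # hypothetical smaller array
B = np.arange(20, 40).reshape(4, 5)

try:
    B[A % 2 == 0]                     # mask is (2, 5), B is (4, 5)
    mismatch_raised = False
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)

# A 1-D mask whose length matches the first axis selects whole rows.
row_mask = np.array([True, False, True, False])
print(B[row_mask])
```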

## Indexing with an Integer Array (or list)

In [19]:
X = np.arange(20).reshape(4,5)
X

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [44]:
ids = [1,3]
X[ids]

array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])

**Behaviour:** an array (or list) of integers selects *rows*, i.e. entries along the first axis, of a multi-dimensional array

How can I select desired columns?

In [46]:
X = np.arange(20).reshape(4,5)
print(X)
ids = [1,2]
X[:, ids]

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


array([[ 1,  2],
       [ 6,  7],
       [11, 12],
       [16, 17]])

Ok then, how about selecting specified rows and columns?

In [47]:
X = np.arange(20).reshape(4,5)
print(X)
ids = [1,3]
X[ids, ids]

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


array([ 6, 18])

This is not what one expects!

The selection with `[ids, ids]` is asking to use the tuple `(ids,ids)`: the two lists are paired element by element, so the result contains only `X[1,1]` and `X[3,3]`.

In [36]:
ids = [1,3]
(ids, ids)

([1, 3], [1, 3])

In [37]:
X[([1, 3], [1, 3])]

array([ 6, 18])
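The pairing can be made explicit by building the same result by hand. This is only a sketch; the names `rows`, `cols`, `paired` and `by_hand` are illustrative.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)
rows = [1, 3]
cols = [1, 3]

paired = X[rows, cols]
# Same result, written as an explicit walk over (row, column) pairs.
by_hand = np.array([X[r, c] for r, c in zip(rows, cols)])

print(paired)    # [ 6 18]
print(by_hand)   # [ 6 18]
```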

How can we ask to select rows 1 and 3 and columns 1 and 3?

In [49]:
X = np.arange(20).reshape(4,5)
print(X)
r_ids = [1,3]
c_ids = [1,3]
X[r_ids][:,c_ids]

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


array([[ 6,  8],
       [16, 18]])
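An alternative to chaining two indexing steps is `np.ix_`, which builds index arrays that select every combination of the given rows and columns. The sketch below compares the two and also shows one practical difference: assignment through `np.ix_` modifies `X`, whereas assignment through the chained form would only modify the temporary copy produced by `X[r_ids]`.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)
r_ids = [1, 3]
c_ids = [1, 3]

block = X[np.ix_(r_ids, c_ids)]      # rows 1 and 3 crossed with columns 1 and 3
chained = X[r_ids][:, c_ids]         # the two-step selection, for comparison
print(block)
print(np.array_equal(block, chained))  # True

# Assignment works in place with np.ix_.
X[np.ix_(r_ids, c_ids)] = -1
print(X)
```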

## Where

- called with a single condition, the function `np.where` tests the condition on each element of the array
- it returns a tuple of arrays with the indices of the elements that satisfy the statement
- similar to boolean masking but exposes the selected ids

In [90]:
X = np.arange(20).reshape(4,5)
print(X)
np.where(X>3)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


(array([0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]),
 array([4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))

In [52]:
r_ids, c_ids = np.where(X>3)
print(r_ids)
print(c_ids)
print(X[r_ids,c_ids])
print('-'*50)
ids = np.where(X>3)
print(X[ids])

[0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]
[4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4]
[ 4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
--------------------------------------------------
[ 4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [71]:
X = np.arange(20).reshape(4,5)
print(X)
ids = np.where(X>3)
X[ids]=-1
X

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


array([[ 0,  1,  2,  3, -1],
       [-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1]])
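When the goal is a modified array rather than the indices themselves, `np.where` also accepts three arguments, `np.where(condition, x, y)`, and returns a new array that takes values from `x` where the condition holds and from `y` elsewhere. A minimal sketch follows; the name `Y` is illustrative.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)

# New array: -1 where X > 3, the original value elsewhere.
Y = np.where(X > 3, -1, X)

print(Y)
print(X[1, 0])   # 5: X itself is left unchanged
```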

## Stacking together different arrays

- arrays can be stacked together along different axes
- `vstack` for vertical stacking and `hstack` for horizontal stacking
- when the arrays have more than two dimensions, `hstack` stacks along their second axis and `vstack` stacks along their first axis

In [8]:
X = np.arange(10).reshape(2,-1)
Z = (np.arange(10)+20).reshape(2,-1)
print(X)
print('-'*20)
print(Z)

[[0 1 2 3 4]
 [5 6 7 8 9]]
--------------------
[[20 21 22 23 24]
 [25 26 27 28 29]]


In [13]:
np.hstack([X,Z])

array([[ 0,  1,  2,  3,  4, 20, 21, 22, 23, 24],
       [ 5,  6,  7,  8,  9, 25, 26, 27, 28, 29]])

In [14]:
np.vstack([X,Z])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])
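For 2-D arrays such as these, both helpers can be expressed with `np.concatenate` and an explicit `axis` argument. The sketch below checks the equivalence.

```python
import numpy as np

X = np.arange(10).reshape(2, -1)
Z = (np.arange(10) + 20).reshape(2, -1)

h = np.concatenate([X, Z], axis=1)   # side by side, like hstack
v = np.concatenate([X, Z], axis=0)   # one on top of the other, like vstack

print(h.shape, np.array_equal(h, np.hstack([X, Z])))   # (2, 10) True
print(v.shape, np.array_equal(v, np.vstack([X, Z])))   # (4, 5) True
```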

## Splitting arrays

- arrays can be split along different axes
- `vsplit` and `hsplit` for vertical and horizontal splitting
- e.g. `hsplit` can split an array along its horizontal axis
  - either by specifying the number of equally shaped arrays
  - or by specifying the columns after which to do the division

In [58]:
X = np.arange(12).reshape(2,-1)
print(X)
L = np.hsplit(X,3) # split in 3
L

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]


[array([[0, 1],
        [6, 7]]),
 array([[2, 3],
        [8, 9]]),
 array([[ 4,  5],
        [10, 11]])]

In [25]:
L = np.hsplit(X,(3,4))   # split after the third and the fourth column
print(L[0])
print(L[1])
print(L[2])

[[0 1 2]
 [6 7 8]]
[[3]
 [9]]
[[ 4  5]
 [10 11]]
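When the number of columns is not divisible by the requested number of parts, `hsplit` raises a `ValueError`. `np.array_split` (one of the functions listed below) accepts an uneven division instead. A minimal sketch; the names `unequal_raised` and `parts` are illustrative.

```python
import numpy as np

X = np.arange(12).reshape(2, -1)     # 6 columns

try:
    np.hsplit(X, 4)                  # 6 columns cannot be split into 4 equal parts
    unequal_raised = False
except ValueError as err:
    unequal_raised = True
    print("ValueError:", err)

parts = np.array_split(X, 4, axis=1)
print([p.shape for p in parts])      # [(2, 2), (2, 2), (2, 1), (2, 1)]
```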


- check other manipulations offered by numpy
- array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack


# Operations between arrays

All arithmetic operations (+, -, ...) between two arrays are applied element-wise and are supported if both arrays have the same shape

In [66]:
A = np.arange(10).reshape(2,5)
B = np.arange(10,20).reshape(2,5)
print(A)
print('-'*30)
print(B)

[[0 1 2 3 4]
 [5 6 7 8 9]]
------------------------------
[[10 11 12 13 14]
 [15 16 17 18 19]]


In [73]:
A+B

array([[10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

# Broadcasting

- broadcasting is used to decide how to handle arrays with different shapes
- e.g. all arithmetic operations (+, -, ...) between ndarrays *broadcast* the arrays before operation

- only specific cases are allowed
- it obeys four rules

# Broadcasting Rules

|#|what|
|---|:---|
| 1| all input arrays with ndim smaller than the input array of largest ndim, have 1’s *prepended* to their *shapes* |
| 2| the size in each dimension of the output shape is the *maximum* of all the input sizes in that dimension |
| 3| an input can be used in the calculation *if* its size in a particular dimension either matches the output size in that dimension or has value exactly 1 |
| 4| if an input has a dimension *size of 1* in its shape, the first data entry in that dimension will be used for all calculations along that dimension |


In [59]:
a = np.arange(5).reshape(5,1)
a

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [60]:
b = np.arange(6).reshape(1,6)
b

array([[0, 1, 2, 3, 4, 5]])

In [61]:
c = np.arange(6).reshape(6,)
c

array([0, 1, 2, 3, 4, 5])

In [62]:
d = np.array(7)
d

array(7)

Let's say we want to do `a*b`
- `a` has shape (5,1), `b` (1,6)
- Rule 2: shape is (5,6)
- for `a` Rule 4 so that `a[:,0]` is broadcast to the other columns
- for `b` Rule 4 so that `b[0,:]` is broadcast to the other rows

In [39]:
a*b

array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4,  5],
       [ 0,  2,  4,  6,  8, 10],
       [ 0,  3,  6,  9, 12, 15],
       [ 0,  4,  8, 12, 16, 20]])

Let's say we want to do `a*c`
- `a` has shape (5,1), `c` (6,)
- for `c` Rule 1, so `c` acts like a `(1,6)`
- Rule 2: shape is (5,6)
- Rule 4: `c[:]` is broadcast to every row

In [118]:
a*c

array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4,  5],
       [ 0,  2,  4,  6,  8, 10],
       [ 0,  3,  6,  9, 12, 15],
       [ 0,  4,  8, 12, 16, 20]])

Let's say we want to do `a*d`
- `a` has shape (5,1), `d` ()
- for `d` Rule 1, so `d` acts like a `(1,)`
- again Rule 1, so `d` acts like a `(1,1)`
- Rule 2: shape is (5,1)
- Rule 4: `d` is broadcast to every row

In [75]:
a*d

array([[ 0],
       [ 7],
       [14],
       [21],
       [28]])

Let's say we want to do `b*d`
- `b` has shape (1,6), `d` ()
- for `d` Rule 1, so `d` acts like a `(1,)`
- again Rule 1, so `d` acts like a `(1,1)`
- Rule 2: shape is (1,6)
- Rule 4: `d` is broadcast to every column

In [122]:
b*d

array([[ 0,  7, 14, 21, 28, 35]])
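The output shapes predicted by the rules can be checked without doing any arithmetic, using `np.broadcast(...).shape`. The sketch below also shows a case that Rule 3 forbids: shapes `(5,)` and `(6,)` differ in a dimension where neither size is 1. The flag `incompatible_raised` is an illustrative name.

```python
import numpy as np

a = np.arange(5).reshape(5, 1)
b = np.arange(6).reshape(1, 6)
c = np.arange(6)
d = np.array(7)

for name, other in [("b", b), ("c", c), ("d", d)]:
    print("a *", name, "->", np.broadcast(a, other).shape)
# a * b -> (5, 6)
# a * c -> (5, 6)
# a * d -> (5, 1)

try:
    np.arange(5) * np.arange(6)      # shapes (5,) and (6,) are incompatible
    incompatible_raised = False
except ValueError as err:
    incompatible_raised = True
    print("ValueError:", err)
```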


# Summary

1. Arrays can be indexed in many ways and offer support for bulk updates
2. Broadcasting allows compact expressions to manipulate arrays without resorting to loops

Create a one-dimensional array from 1 to 99.

Choose a random number n in the interval from 0 to 100.

Find the value closest to n in the one-dimensional array.

In [40]:
import numpy as np
X = np.random.rand(100)*100
np.sort(X)[:10]

array([0.00622131, 0.2442277 , 0.32437554, 0.52134989, 0.7220988 ,
       1.07085177, 2.13058833, 4.27559007, 4.79619539, 5.61173482])

In [51]:
n = np.random.rand()*100

closest = X[0]
for value in X:
    # compare distances to n, not the values themselves
    if abs(value - n) < abs(closest - n):
        closest = value
print(n, closest)
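The same search can be written without a loop: subtracting `n` from the whole array relies on broadcasting, and `argmin` returns the position of the smallest distance. This is a sketch on freshly generated data; the seed and the names `rng` and `idx` are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
X = rng.random(100) * 100
n = rng.random() * 100

idx = np.abs(X - n).argmin()         # index of the smallest distance to n
closest = X[idx]
print(n, closest)
```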


In [72]:
X = np.random.randint(0,100,40).reshape(10,-1)

newIndices = np.argsort(X, axis=0)
print(X[newIndices].shape)

(10, 4, 4)
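The shape `(10, 4, 4)` arises because `X[newIndices]` treats every entry of `newIndices` as a row index, so each entry pulls in a whole row of 4 values. To reorder each column by its own indices, one option is `np.take_along_axis`, sketched below; the name `sorted_cols` is illustrative.

```python
import numpy as np

X = np.random.randint(0, 100, 40).reshape(10, -1)
newIndices = np.argsort(X, axis=0)

print(X[newIndices].shape)           # (10, 4, 4): each index selects a full row

# Apply the indices column by column instead.
sorted_cols = np.take_along_axis(X, newIndices, axis=0)
print(sorted_cols.shape)             # (10, 4)
print(np.array_equal(sorted_cols, np.sort(X, axis=0)))   # True
```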
