# CPSC380: 2_numpy_2_indexing

In this notebook, you will learn:
 - Array indexing and slicing
 - Fancy indexing: integer array indexing and boolean array indexing
 - Concatenation and splitting

Read more:
 - the textbook, [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html), and
 - the [NumPy website](https://numpy.org/).

In [2]:
import numpy as np

## 1. Array indexing and slicing

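NumPy offers several ways to index into arrays.

**Slicing**: Similar to Python lists, NumPy arrays can be sliced with `start:stop:step`. Since arrays may be multidimensional, you can give one slice per dimension, separated by commas; any trailing dimensions you leave out are taken in full. A negative index counts from the end, and a negative step walks backward: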

In [3]:
a = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]
print(a[-1])    # last item in the array
print(a[-2:])   # last two items in the array
print(a[:-2])   # everything except the last two items
print()

print(a[::-1])    # all items in the array, reversed
print(a[1::-1])   # the first two items, reversed
print(a[:-3:-1])  # the last two items, reversed
print(a[-3::-1])  # everything except the last two items, reversed
print()

print(a[::-2])    # every second item, starting from the last one
print(a[1::-2])   # only a[1]: stepping back 2 from index 1 leaves the array
print(a[:-3:-2])  # only the last item: the next step lands on index -3, the (exclusive) stop
print(a[-3::-2])  # every second item, starting from a[-3]

print(a[-1:0:-1])  # reversed, but stops before index 0, so 0 is excluded
print(a[::-1])     # reversed, including index 0
print(a[:0:-1])    # same as a[-1:0:-1]


9
[8 9]
[0 1 2 3 4 5 6 7]

[9 8 7 6 5 4 3 2 1 0]
[1 0]
[9 8]
[7 6 5 4 3 2 1 0]

[9 7 5 3 1]
[1]
[9]
[7 5 3 1]
[9 8 7 6 5 4 3 2 1]
[9 8 7 6 5 4 3 2 1 0]
[9 8 7 6 5 4 3 2 1]
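To see how the negative-step cases above are resolved, the sketch below uses Python's built-in `slice.indices(n)`, which turns omitted and negative bounds into concrete `(start, stop, step)` values for an axis of length `n`. Slicing a 1-D NumPy array selects the same positions as `range(start, stop, step)`. Note that a `stop` of `-1` returned by `indices` is a literal range bound meaning "just past index 0", not a from-the-end index.

```python
import numpy as np

a = np.arange(10)

# Each a[start:stop:step] corresponds to a Python slice object.
# slice.indices(n) resolves negative and omitted bounds for a length-n axis.
for s in [slice(None, None, -2), slice(1, None, -2),
          slice(None, -3, -2), slice(-3, None, -2)]:
    start, stop, step = s.indices(a.size)
    resolved = list(range(start, stop, step))
    print(s, "->", (start, stop, step), resolved)
    # The positions NumPy picks match the resolved range.
    assert a[s].tolist() == resolved
```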


In [4]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2,1:3]
print (b)


[[2 3]
 [6 7]]


**Note**: A slice of an array is a view into the same data, so modifying it will modify the original array.

In [5]:
print(a[0, 1])  # 2
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])  # 77: a changed too

2
77
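One way to check whether two arrays share data is `np.shares_memory`; calling `.copy()` on a slice gives independent data when you do not want writes to propagate. The names `sub_view` and `sub_copy` below are just for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

sub_view = a[:2, 1:3]         # basic slicing: a view onto a's data
sub_copy = a[:2, 1:3].copy()  # explicit copy: independent data

print(np.shares_memory(a, sub_view))  # True
print(np.shares_memory(a, sub_copy))  # False
print(sub_view.base is a)             # True: the view borrows a's buffer

sub_copy[0, 0] = -1  # leaves a unchanged
sub_view[0, 0] = 77  # writes through to a[0, 1]
print(a[0, 1])       # 77
```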


**Important**:

There are two ways to access the data in the middle row of the array:
 - Mixing integer indexing with slices yields an array of **lower rank**.
 - Using only slices yields an array of the **same rank** as the original array.

In [6]:
# Create the following rank 2 array with shape (3, 4)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print (a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [7]:
# example on row slicing
row_r1 = a[1, :]    # Rank 1 view of the second row of a  
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
row_r3 = a[[1], :]  # Rank 2 copy of the second row (integer array indexing copies)

print (row_r1, row_r1.shape) 
print (row_r2, row_r2.shape)
print (row_r3, row_r3.shape)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[[5 6 7 8]] (1, 4)
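Although `row_r2` and `row_r3` print identically, only slicing produces a view. The list index `[1]` triggers integer array indexing (covered in section 2), which returns a copy. A quick check:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_r1 = a[1, :]    # integer + slice: rank 1 view
row_r2 = a[1:2, :]  # slices only: rank 2 view
row_r3 = a[[1], :]  # integer array index: rank 2 copy

for name, r in [("row_r1", row_r1), ("row_r2", row_r2), ("row_r3", row_r3)]:
    print(name, r.ndim, np.shares_memory(a, r))
# row_r1 1 True
# row_r2 2 True
# row_r3 2 False

row_r3[0, 0] = 0  # modifies only the copy
print(a[1, 0])    # 5
```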


In [8]:
print (a, '\n')

# example on column slicing
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]

print (col_r1, col_r1.shape,'\n') # Rank 1 view of the second column of a  
print (col_r2, col_r2.shape) # Rank 2 view of the second column of a  

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]] 

[ 2  6 10] (3,) 

[[ 2]
 [ 6]
 [10]] (3, 1)
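If you end up with the rank 1 result but need the `(3, 1)` column shape, you can add the axis back. This is a small sketch using `reshape` (where `-1` lets NumPy infer that dimension) and `np.newaxis`; `ravel` goes the other way.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
col_r1 = a[:, 1]    # shape (3,)
col_r2 = a[:, 1:2]  # shape (3, 1)

# Add a trailing axis to turn the rank 1 column back into rank 2.
via_reshape = col_r1.reshape(-1, 1)
via_newaxis = col_r1[:, np.newaxis]
print(via_reshape.shape, via_newaxis.shape)  # (3, 1) (3, 1)

# ravel flattens the rank 2 column back to rank 1.
print(col_r2.ravel())  # [ 2  6 10]
```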


## 2. Fancy indexing

### 2.1 Integer array indexing

When you index into NumPy arrays using slicing, the result is always a view onto a **subarray** of the original array.

In contrast, integer array indexing lets you construct **arbitrary arrays** from the data of another array. The result is a new array (a copy), not a view.

Here is an example:

In [9]:
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)
print()

# An example of integer array indexing.
# The returned array has shape (3,).
# The first list gives the row indices and the second list gives the column indices, so
# a[[r1, r2, ...], [c1, c2, ...]] is equivalent to np.array([a[r1, c1], a[r2, c2], ...])
print (a[[0, 1, 2], [0, 1, 0]])

# The above example of integer array indexing is equivalent to this:
print (np.array([a[0, 0], a[1, 1], a[2, 0]]))

[[1 2]
 [3 4]
 [5 6]]

[1 4 5]
[1 4 5]
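The "arbitrary arrays" claim goes beyond picking a flat list: the result takes the shape of the index arrays, not the shape of `a`. In this sketch the index arrays `rows` and `cols` (illustrative names) are 2 × 2, so the output is 2 × 2, and writing to it leaves `a` untouched because it is a copy.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

rows = np.array([[0, 1],
                 [2, 0]])
cols = np.array([[0, 1],
                 [0, 1]])

# Output shape follows the index arrays: element [i, j] is a[rows[i, j], cols[i, j]].
out = a[rows, cols]
print(out.shape)  # (2, 2)
print(out)
# [[1 4]
#  [5 2]]

# Element-by-element equivalent, pairing rows[i, j] with cols[i, j].
manual = np.array([[a[r, c] for r, c in zip(rr, cc)] for rr, cc in zip(rows, cols)])
assert np.array_equal(out, manual)

# The result is a copy: changing it does not change a.
out[0, 0] = 100
print(a[0, 0])  # 1
```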


In [10]:
# When using integer array indexing, you can reuse the same
# element from the source array:
print (a[[0, 0], [1, 1]])

# Equivalent to the previous integer array indexing example
print (np.array([a[0, 1], a[0, 1]]))

[2 2]
[2 2]


**Selecting** or **mutating** one element from each row of a matrix:

In [11]:
# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print (a)
print()

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print (a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

print (a[[0,1,2,3], [0,2,0,1]])  # equivalent to the above one

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

[ 1  6  7 11]
[ 1  6  7 11]
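NumPy also provides `np.take_along_axis`, which expresses the same "one element per row" selection. It requires the index array to have as many dimensions as `a`, so `b` is reshaped into a column first. Using `a.shape[0]` instead of the hard-coded `4` is a small design choice so the row indices always match the array.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
b = np.array([0, 2, 0, 1])

# Pair each row number with its column number from b.
picked = a[np.arange(a.shape[0]), b]

# Same selection with take_along_axis; b[:, np.newaxis] has shape (4, 1).
picked2 = np.take_along_axis(a, b[:, np.newaxis], axis=1).ravel()

print(picked)   # [ 1  6  7 11]
assert np.array_equal(picked, picked2)
```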


In [12]:
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10
print (a)

[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]
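One caveat when mutating with integer arrays: if the same position appears more than once in the index, `+=` applies the update only once, because the assignment is buffered. `np.add.at` performs unbuffered updates, so every occurrence counts. The arrays below are a made-up example.

```python
import numpy as np

rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])  # position (0, 1) appears twice

a = np.zeros((2, 3), dtype=int)
a[rows, cols] += 1  # buffered: (0, 1) is incremented only once
print(a)
# [[0 1 0]
#  [0 0 1]]

b = np.zeros((2, 3), dtype=int)
np.add.at(b, (rows, cols), 1)  # unbuffered: every occurrence is applied
print(b)
# [[0 2 0]
#  [0 0 1]]
```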


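**One-hot encoded array**

Integer array indexing can set one entry per row in a single step. Here each label in `a` becomes a row of zeros with a 1 in the column for that label: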

In [30]:
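a = np.array([1, 5, 3, 4, 3, 2, 1, 3, 4, 5, 2, 4, 5, 2, 1])  # integer class labels 1..5
b = np.zeros((a.size, a.max() + 1))  # one row per label; columns for values 0..a.max()
b[np.arange(a.size), a] = 1          # row i gets a 1 in column a[i]
c = b[:, 1:]                         # drop column 0, which no label uses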
print(c)

[[1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]]
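An alternative sketch uses integer array indexing on the identity matrix: row `k` of `np.eye(n)` is already the one-hot vector for class `k`. This assumes, as in the example above, that the labels are the contiguous integers `1..a.max()`, hence the shift by one. `argmax` recovers the labels.

```python
import numpy as np

a = np.array([1, 5, 3, 4, 3, 2, 1, 3, 4, 5, 2, 4, 5, 2, 1])

# Select one identity row per label (labels start at 1, rows at 0).
num_classes = a.max()
c = np.eye(num_classes)[a - 1]
print(c.shape)  # (15, 5)

# Decoding: the column holding the 1 gives back the original label.
labels = c.argmax(axis=1) + 1
assert np.array_equal(labels, a)
```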


### 2.2 Boolean array indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [14]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 2.
print (bool_idx)
print()

print (a[bool_idx])

[[False False]
 [ True  True]
 [ True  True]]

[3 4 5 6]


**Note**: Boolean array indexing returns a rank 1 array containing the elements of `a` at the positions where `bool_idx` is `True`, in row-major order.

In [15]:
# We can do all of the above in a single concise statement:
print (a[a > 2])

[3 4 5 6]
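Conditions can be combined with the element-wise operators `&` (and), `|` (or), and `~` (not). The parentheses are required because `&` and `|` bind more tightly than comparisons. A boolean mask also works on the left side of an assignment.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

print(a[(a > 2) & (a % 2 == 0)])  # [4 6]
print(a[(a < 2) | (a > 5)])       # [1 6]
print(a[~(a > 2)])                # [1 2]

# Masked assignment: set every element greater than 2 to 0 (on a copy).
b = a.copy()
b[b > 2] = 0
print(b)
# [[1 2]
#  [0 0]
#  [0 0]]
```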


## 3. Concatenation and splitting

### 3.1 Concatenation
- ``np.concatenate``: joins a sequence of arrays along an existing axis (``axis=0`` by default)
- ``np.vstack``: stacks arrays vertically (row-wise)
- ``np.hstack``: stacks arrays horizontally (column-wise)

In [16]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
print(x.shape, y.shape)
print(np.concatenate((x, y), axis=0))

z = np.array([99, 99, 99])
print(np.concatenate([x, y, z]))

(3,) (3,)
[1 2 3 3 2 1]
[ 1  2  3  3  2  1 99 99 99]


In [17]:
# two-dimensional arrays:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [18]:
# concatenate along the first axis (along row axis)
np.concatenate([grid, grid], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [19]:
# concatenate along the second axis (along column axis)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
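The shape rule behind these examples: sizes add up along the concatenation axis, and every other axis must already match. A mismatch raises a `ValueError`, as this sketch shows.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
col = np.array([[99],
                [99]])

print(np.concatenate([grid, grid], axis=0).shape)  # (4, 3)
print(np.concatenate([grid, col], axis=1).shape)   # (2, 4)

raised = False
try:
    np.concatenate([grid, col], axis=0)  # column counts differ: 3 vs 1
except ValueError as err:
    raised = True
    print("ValueError:", err)
```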

For working with arrays of **mixed dimensions**, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [20]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
print(np.vstack([x, grid]), '\n')

# equivalent to np.concatenate
# Question: why do we need to wrap x in [] here?
print (np.concatenate([[x], grid], axis=0))

[[1 2 3]
 [9 8 7]
 [6 5 4]] 

[[1 2 3]
 [9 8 7]
 [6 5 4]]


In [21]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
print (np.hstack([grid, y]), '\n')

# equivalent to np.concatenate
print (np.concatenate([grid, y], axis=1))

[[ 9  8  7 99]
 [ 6  5  4 99]] 

[[ 9  8  7 99]
 [ 6  5  4 99]]
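For 2-D inputs, `np.hstack` joins along axis 1, as above. With 1-D inputs there is no axis 1, so it joins along axis 0 and the result stays 1-D, the same as `np.concatenate` with the default axis:

```python
import numpy as np

x = np.array([1, 2, 3])

h = np.hstack([x, x])
print(h, h.shape)  # [1 2 3 1 2 3] (6,)

assert np.array_equal(h, np.concatenate([x, x], axis=0))
```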


### 3.2 Splitting of arrays
 - ``np.split``: given N split points, returns N + 1 subarrays.
 - ``np.vsplit``: splits vertically, i.e., along axis 0 (by rows).
 - ``np.hsplit``: splits horizontally, i.e., along axis 1 (by columns).

In [22]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [23]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [24]:
upper, lower = np.vsplit(grid, [2])
print(upper,'\n')
print(lower)

[[0 1 2 3]
 [4 5 6 7]] 

[[ 8  9 10 11]
 [12 13 14 15]]


In [25]:
left, right = np.hsplit(grid, [2])
print(left,'\n')
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]] 

[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
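For 2-D arrays, `np.vsplit` and `np.hsplit` give the same result as `np.split` with `axis=0` and `axis=1`. Passing an integer instead of a list of split points asks for that many equal parts; if the axis length is not divisible, `np.split` raises a `ValueError`, whereas `np.array_split` allows unequal parts. A sketch:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# vsplit/hsplit versus np.split with an explicit axis.
upper, lower = np.split(grid, [2], axis=0)
left, right = np.split(grid, [2], axis=1)
assert np.array_equal(upper, np.vsplit(grid, [2])[0])
assert np.array_equal(right, np.hsplit(grid, [2])[1])

# An integer asks for that many equal parts along the axis.
quarters = np.split(grid, 4, axis=1)
print([q.shape for q in quarters])  # [(4, 1), (4, 1), (4, 1), (4, 1)]

# Length 8 is not divisible by 3: np.split raises, np.array_split does not.
x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
try:
    np.split(x, 3)
except ValueError as err:
    print("ValueError:", err)
print([p.tolist() for p in np.array_split(x, 3)])  # [[1, 2, 3], [99, 99, 3], [2, 1]]
```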
