## 3. NumPy Indexing and Selection

`ndarray`s can be indexed using the standard Python `x[obj]` syntax, where `x` is the array and `obj` the selection. There are three kinds of indexing available:
   - field access,
   - basic slicing,
   - advanced indexing.

Which one occurs depends on `obj`. Further reading:
   - https://docs.python.org/release/2.3.5/whatsnew/section-slices.html
   - https://realpython.com/pandas-settingwithcopywarning/

**Referencing `ndarray`s follows two principles:**
- Slicing arrays returns views.
- Indexing with integer arrays or boolean masks returns copies.
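Both principles can be checked with `np.shares_memory`, which reports whether two arrays overlap in memory. Below is a minimal sketch on a fresh array; the variable names are just for illustration:

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]          # basic slice: a view
picked = arr[[2, 3, 4]]  # integer index array: a copy
masked = arr[arr > 6]    # boolean mask: a copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, picked))  # False
print(np.shares_memory(arr, masked))  # False
```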

### 3.1. One-dimensional arrays
One-dimensional NumPy arrays can be accessed more or less like regular Python lists:

In [1]:
import numpy as np
#Creating sample array
arr = np.arange(0,11)
#Show
print(arr)
#Get a value at an index
arr[8]

[ 0  1  2  3  4  5  6  7  8  9 10]


8

In [2]:
#Set values in a range (the single value fills the whole slice)
arr[1:5] = 5 

#Get values in a range
arr[0:5]

array([0, 5, 5, 5, 5])

#### Differences with regular Python lists
Unlike regular Python lists, if you assign a single value to an `ndarray` slice, it is copied across the whole slice, thanks to the broadcasting rules discussed above.
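The following short sketch shows the broadcasting behaviour on a fresh 10-element array. A scalar fills the slice, and an array of matching length is copied element by element. A length mismatch raises `ValueError`. A Python list, by contrast, lets a slice assignment change its length.

```python
import numpy as np

arr = np.arange(10)

# A scalar is broadcast to every position in the slice
arr[0:3] = -1
print(arr)  # [-1 -1 -1  3  4  5  6  7  8  9]

# An array whose length matches the slice is copied element by element
arr[3:6] = np.array([30, 40, 50])
print(arr)  # [-1 -1 -1 30 40 50  6  7  8  9]

# A length mismatch cannot be broadcast and raises ValueError
try:
    arr[6:9] = np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)

# A list slice assignment may change the list's length instead
lst = list(range(5))
lst[0:3] = [9]
print(lst)  # [9, 3, 4]
```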

In [3]:
#Assigning a single value to a list slice raises an error
a = [1,2,2,4,5,6,7,8]
a[0:5] = 100

TypeError: can only assign an iterable

In [38]:
#Setting a value with index range (Broadcasting)
arr[0:5] = 100
#Show
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])

In [39]:
# Reset array, we'll see why I had to reset in a moment
arr = np.arange(0,11)

#Important notes on Slices
slice_of_arr = arr[:]

#Change Slice
slice_of_arr[1:4]=99

#Show slice
slice_of_arr

array([ 0, 99, 99, 99,  4,  5,  6,  7,  8,  9, 10])

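The `base` attribute holds the object the array's memory comes from; it is `None` when the array owns its data. For a slice, it points back to the original array: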

In [40]:
slice_of_arr.base is None

False

Now note the changes also occur in our original array!

In [41]:
arr

array([ 0, 99, 99, 99,  4,  5,  6,  7,  8,  9, 10])

But if we index with a list of integers (or a boolean mask), we end up with a copy instead:

In [42]:
arr = np.arange(0,11)

#Indexing with a list of integers returns a copy
mask_of_arr = arr[[0,1,2,3]]

#Change the copy
mask_of_arr[1:4]=9999

#Show the copy
mask_of_arr

array([   0, 9999, 9999, 9999])

In [43]:
mask_of_arr.base is None

True

In [44]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

#### Slices are Views
`ndarray` **slices are actually *views*** on the same data buffer. This means that if you create a slice and modify it, you also modify the original `ndarray`!
The data is not copied; the slice is only a view on the original array. This avoids unnecessary copies, which matters for large arrays.

In [45]:
#To get a copy, you need to be explicit
arr_copy = arr.copy()

arr_copy

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
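Here is a quick sketch showing that writing to `arr_copy` leaves `arr` untouched, because the copy owns its own buffer:

```python
import numpy as np

arr = np.arange(0, 11)
arr_copy = arr.copy()

# Writing to the copy leaves the original untouched
arr_copy[0:3] = 100
print(arr[0:3])       # [0 1 2]
print(arr_copy[0:3])  # [100 100 100]

# The copy owns its data and shares no memory with arr
print(arr_copy.base is None)            # True
print(np.shares_memory(arr, arr_copy))  # False
```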

### 3.2. Indexing a 2D array (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**. I usually recommend the comma notation for clarity; `arr_2d[row][col]` first builds an intermediate array for the row and then indexes into it.
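The two notations agree when both indices are integers, but they diverge as soon as a slice is involved. Here is a small sketch using the same `arr_2d` values as below:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# With plain integers both forms agree
print(arr_2d[1][0], arr_2d[1, 0])  # 20 20

# With slices they do not: arr_2d[:] is the whole array,
# so arr_2d[:][0] is row 0, not column 0
print(arr_2d[:][0])  # [ 5 10 15]
print(arr_2d[:, 0])  # [ 5 20 35]
```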

In [4]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [5]:
#Indexing row
arr_2d[1]

array([20, 25, 30])

In [6]:
# Getting individual element value
arr_2d[1][0]

20

In [7]:
arr_2d[(1,0)] # same as arr_2d[1,0]: the comma form already passes a tuple

20

In [8]:
# Getting individual element value
arr_2d[1,0]

20

In [9]:
# 2D array slicing

#Shape (2,2) from top right corner
arr_2d[:2,1:]

array([[10, 15],
       [25, 30]])

In [10]:
#Shape bottom row: 1D-array
arr_2d[2]

array([35, 40, 45])

In [7]:
#Last column: 1D-array
arr_2d[:,-1]

array([15, 30, 45])

In [11]:
#Last column: 2D-array with shape (3, 1)
arr_2d[:,-1:]

array([[15],
       [30],
       [45]])
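2D slices follow the same view rule as 1D slices. The sketch below works on a freshly created `arr_2d`, so the cells above are unaffected:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# The (2, 2) top-right block is a view on arr_2d's buffer
corner = arr_2d[:2, 1:]
corner[:] = 0  # writes through to arr_2d

print(arr_2d)
# [[ 5  0  0]
#  [20  0  0]
#  [35 40 45]]
```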

### 3.3. Fancy Indexing

Fancy indexing allows you to select entire rows or columns out of order:

In [8]:
#Set up matrix
arr2d = np.zeros((10,10))

In [9]:
#Number of rows
arr_length = arr2d.shape[0]

In [10]:
#Set up array

for i in range(arr_length):
    arr2d[i] = i
    
arr2d

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

Passing a list of row indices selects exactly those rows:

In [39]:
arr2d[[2,4,6,8]]

array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])

In [40]:
#Allows in any order
arr2d[[6,4,2,7]]

array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
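Fancy indexing also works on columns, and index lists can be combined across axes. Passing two lists pairs them element by element. To get a full rows-by-columns block, `np.ix_` is one option. The sketch uses a small hypothetical matrix `m`, chosen so every entry is distinct:

```python
import numpy as np

m = np.arange(16).reshape(4, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]]

# Columns can be picked out of order too
print(m[:, [3, 0]])
# [[ 3  0]
#  [ 7  4]
#  [11  8]
#  [15 12]]

# Two index lists are paired element-wise: picks m[1, 0] and m[3, 2]
print(m[[1, 3], [0, 2]])  # [ 4 14]

# np.ix_ builds an open mesh, selecting every (row, column) combination
print(m[np.ix_([1, 3], [0, 2])])
# [[ 4  6]
#  [12 14]]
```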

### 3.4. More Indexing Help
Indexing a 2D matrix can be a bit confusing at first, especially once you add a step size. Try a Google image search for "NumPy indexing" to find useful diagrams, like this one:
![image.png](attachment:image.png)
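Step sizes work on each axis independently with the usual `start:stop:step` syntax. Here is a sketch on a hypothetical 4x4 matrix:

```python
import numpy as np

m = np.arange(16).reshape(4, 4)

# Every other row and every other column
print(m[::2, ::2])
# [[ 0  2]
#  [ 8 10]]

# Column 1 with the rows in reverse order
print(m[::-1, 1])  # [13  9  5  1]
```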

### 3.5. Boolean selection

Let's briefly go over how to use brackets for selection based on comparison operators.

In [13]:
arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [14]:
arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

In [15]:
bool_arr = arr>4

In [16]:
bool_arr

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

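Assigning through a boolean mask modifies the original array in place. This is not because the selection is a view: indexed assignment (`arr[mask] = value`) writes directly into `arr`.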

In [17]:
arr[bool_arr] = 99

In [18]:
arr

array([ 1,  2,  3,  4, 99, 99, 99, 99, 99, 99])

However, *reading* with a mask returns a copy, so changing the result does not affect `arr`:

In [22]:
b = arr[arr>2] 
b

array([ 3,  4, 99, 99, 99, 99, 99, 99])

In [25]:
b[0] = 5
b

array([ 5,  4, 99, 99, 99, 99, 99, 99])

In [26]:
arr

array([ 1,  2,  3,  4, 99, 99, 99, 99, 99, 99])

In [27]:
b = 5 # rebinding the name b does not touch arr
b

5

In [28]:
arr

array([ 1,  2,  3,  4, 99, 99, 99, 99, 99, 99])

In [29]:
x = 2
arr[arr>x]

array([ 3,  4, 99, 99, 99, 99, 99, 99])

In [9]:
data = np.random.randn(7,4)
data

array([[ 0.43212991, -0.11997689,  0.16571208, -0.67141264],
       [-0.43794276,  1.53036645, -0.04936644, -0.5034855 ],
       [-0.69016976,  0.2238644 ,  1.12138806, -1.18660284],
       [-1.82814243,  0.09321716,  0.49068386, -1.41635057],
       [ 0.95239241,  1.75698074, -1.07719084,  0.82529906],
       [-0.45936231, -1.40980872, -1.22443406, -1.21790745],
       [-1.17908565, -0.03544574, -0.98794145, -0.01366365]])

In [10]:
names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

In [11]:
data[names == 'Bob'] # the boolean array must have the same length as the axis it indexes

array([[ 0.43212991, -0.11997689,  0.16571208, -0.67141264],
       [-1.82814243,  0.09321716,  0.49068386, -1.41635057]])

In [12]:
data[names == 'Bob',2:] 

array([[ 0.16571208, -0.67141264],
       [ 0.49068386, -1.41635057]])

In [13]:
data[names != 'Bob',2:] 

array([[-0.04936644, -0.5034855 ],
       [ 1.12138806, -1.18660284],
       [-1.07719084,  0.82529906],
       [-1.22443406, -1.21790745],
       [-0.98794145, -0.01366365]])

In [14]:
data[ ~(names == 'Bob')] 

array([[-0.43794276,  1.53036645, -0.04936644, -0.5034855 ],
       [-0.69016976,  0.2238644 ,  1.12138806, -1.18660284],
       [ 0.95239241,  1.75698074, -1.07719084,  0.82529906],
       [-0.45936231, -1.40980872, -1.22443406, -1.21790745],
       [-1.17908565, -0.03544574, -0.98794145, -0.01366365]])

In [16]:
data[data >1]

array([1.53036645, 1.12138806, 1.75698074])
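Several conditions can be combined with `&` (and) and `|` (or). Python's `and`/`or` keywords do not work element-wise on arrays and raise a `ValueError`. The original `data` is random, so the sketch below uses a deterministic stand-in instead to make the output reproducible:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# Hypothetical deterministic stand-in for the random `data` above
data = np.arange(28).reshape(7, 4)

# Parentheses are required: & and | bind more tightly than ==
mask = (names == 'Bob') | (names == 'Will')
print(mask)           # [ True False  True  True  True False False]
print(data[mask, 0])  # first column of rows 0, 2, 3, 4: [ 0  8 12 16]

# `or` tries to reduce the whole array to a single bool and fails
try:
    (names == 'Bob') or (names == 'Will')
except ValueError as err:
    print("ValueError:", err)

# Assigning through a mask writes into data in place
data[data < 10] = 0
print(data[:3].tolist())  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 10, 11]]
```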

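### 3.6. Logical expressions: `np.where`

`np.where(cond, x, y)` is a vectorized version of the expression `x if cond else y`. First, the pure-Python version with a list comprehension, which is slow for large arrays and does not work with multidimensional arrays:
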
In [31]:
xarr = np.array([1.1,1.2,1.3,1.4,1.5])
yarr = np.array([2.1,2.2,2.3,2.4,2.5])
cond = np.array([True, False,True,True,False])

l = [(x if c else y) for x,y,c in zip(xarr,yarr,cond)]
l

[1.1, 2.2, 1.3, 1.4, 2.5]

In [32]:
l = np.where(cond,xarr,yarr)
l

array([1.1, 2.2, 1.3, 1.4, 2.5])

In [38]:
#This can be used on multidimensional arrays too:
np.where([[True, False], [True, True]], [[1, 2], [3, 4]], [[9, 8], [7, 6]])

array([[1, 8],
       [3, 4]])
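Called with only a condition, `np.where` returns the indices of the `True` entries instead, as a tuple with one index array per dimension (the same result as `np.nonzero`). Here is a minimal sketch:

```python
import numpy as np

cond = np.array([True, False, True, True, False])

# One argument: a tuple holding one index array per dimension
(idx,) = np.where(cond)
print(idx.tolist())  # [0, 2, 3]

# On a 2D condition you get row and column indices
a2 = np.array([[1, -2], [-3, 4]])
rows, cols = np.where(a2 < 0)
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 1), (1, 0)]
```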

In [46]:
a = np.random.randn(4,4)
a

array([[-1.14434619,  1.09011097, -0.07725356, -0.07833098],
       [ 0.17459116, -2.07300534, -1.76722861, -0.1250024 ],
       [ 0.40839744, -1.124789  , -0.12591595, -0.48047874],
       [-0.42826187, -0.56999814, -0.32481036,  0.66153819]])

In [47]:
a < 1

array([[ True, False,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [49]:
np.where(a<1, 2, 1)

array([[2, 1, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

In [50]:
np.where(a<1, 2, a)

array([[2.        , 1.09011097, 2.        , 2.        ],
       [2.        , 2.        , 2.        , 2.        ],
       [2.        , 2.        , 2.        , 2.        ],
       [2.        , 2.        , 2.        , 2.        ]])

In [54]:
#On boolean arrays, ~ is element-wise logical NOT (on Python ints it is the bitwise complement, -x - 1)

l1 = a<-1

l2 = ~(a>1)

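print(l2)
# Note: l1 & ~l2 is (a < -1) & (a > 1), which is never True,
# so only the inner np.where(a > 1, 1, a) changes the result
np.where(l1 & ~l2 ,2,np.where(a>1,1,a))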

[[ True False  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]


array([[-1.14434619,  1.        , -0.07725356, -0.07833098],
       [ 0.17459116, -2.07300534, -1.76722861, -0.1250024 ],
       [ 0.40839744, -1.124789  , -0.12591595, -0.48047874],
       [-0.42826187, -0.56999814, -0.32481036,  0.66153819]])
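Nested `np.where` calls get hard to read as cases pile up. One alternative is `np.select`, which takes a list of conditions and a list of matching choices. For each element it uses the first condition that is true. The sketch below runs on a small hypothetical array and maps values above 1 to 1, values below -1 to -1, and keeps everything else:

```python
import numpy as np

a = np.array([[-1.5, 0.3], [2.0, -0.2]])

# One condition per case; where several match, the first one in the list wins
result = np.select([a > 1, a < -1, np.abs(a) <= 1], [1.0, -1.0, a])
print(result.tolist())  # [[-1.0, 0.3], [1.0, -0.2]]

# For this particular rule, np.clip gives the same result
print(np.array_equal(result, np.clip(a, -1, 1)))  # True
```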