## 3. NumPy Indexing and Selection

An `ndarray` can be indexed using the standard Python `x[obj]` syntax, where `x` is the array and `obj` is the selection. Three kinds of indexing are available:
   - field access,
   - basic slicing,
   - advanced indexing.

Which one occurs depends on the type of `obj`. Further reading:
   - https://docs.python.org/release/2.3.5/whatsnew/section-slices.html
   - https://realpython.com/pandas-settingwithcopywarning/s
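
As a rough illustration of the three kinds of indexing listed above, the sketch below applies each one to a small array. The structured dtype with `name` and `age` fields is a made-up example, not something used elsewhere in this section.

```python
import numpy as np

x = np.arange(10)

# Basic slicing: obj is a slice (or an integer, or a tuple of them)
basic = x[2:7:2]            # elements at positions 2, 4, 6

# Advanced indexing: obj is a list or array of integers or booleans
advanced = x[[1, 1, 8]]     # positions may repeat and come in any order

# Field access: obj is a field name of a structured array
# (hypothetical dtype, for illustration only)
people = np.array([("Ann", 31), ("Bo", 25)],
                  dtype=[("name", "U10"), ("age", "i4")])
ages = people["age"]

print(basic)      # [2 4 6]
print(advanced)   # [1 1 8]
print(ages)       # [31 25]
```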

Whether an indexing expression gives a view or a copy depends on which side of the assignment it appears on.

On the right-hand side (`y = x[obj]`):
- Slicing returns a view, so modifying `y` also modifies the original `ndarray`.
- Indexing with an index array or a mask array returns a copy.

On the left-hand side (`x[obj] = value`):
- Slices, index arrays and mask arrays all write into the original `ndarray` in place.
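
One way to check these principles is `np.shares_memory`, which reports whether two arrays overlap in memory. The following is a minimal sketch; the variable names are chosen for illustration only.

```python
import numpy as np

arr = np.arange(6)

# Right-hand side: the indexing expression produces a new array object
view = arr[1:4]            # basic slice -> view
picked = arr[[1, 2, 3]]    # index array -> copy
masked = arr[arr > 2]      # boolean mask -> copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, picked))  # False
print(np.shares_memory(arr, masked))  # False

# Left-hand side: assignment writes into arr itself in all three cases
arr[1:4] = -1
arr[[0, 5]] = -2
arr[arr == -1] = -3
print(arr)    # [-2 -3 -3 -3  4 -2]
print(view)   # [-3 -3 -3], the view follows the original
```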


### 3.1. One-dimensional arrays
One-dimensional NumPy arrays can be accessed more or less like regular Python lists:

In [2]:
import numpy as np
#Creating sample array
arr = np.arange(0,11)
#Show
print(arr)
#Get a value at an index
arr[8]

[ 0  1  2  3  4  5  6  7  8  9 10]


8

In [3]:
#Set values in a range
arr[1:5] = 5 

#Get values in a range
arr[0:5]

array([0, 5, 5, 5, 5])

#### Differences from regular Python lists
Unlike regular Python lists, if you assign a single value to an `ndarray` slice, it is copied across the whole slice, thanks to the broadcasting rules discussed above.

In [4]:
#Assigning a single value to a list slice raises an error
a = [1,2,2,4,5,6,7,8]
a[0:5] = 100

TypeError: can only assign an iterable

In [5]:
#Setting a value with index range (Broadcasting)
arr[0:5] = 100
#Show
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])
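
To round out the comparison, the sketch below shows what happens when the right-hand side is a sequence rather than a single value. It is only an illustration; `lst` is a hypothetical name.

```python
import numpy as np

arr = np.arange(0, 11)

# A scalar is broadcast to every position of the slice
arr[0:5] = 100

# An array with the same length as the slice is assigned element by element
arr[5:8] = np.array([-1, -2, -3])

# A list slice accepts any iterable; a one-element list replaces the
# five-element slice, so the list itself becomes shorter
lst = [1, 2, 2, 4, 5, 6, 7, 8]
lst[0:5] = [100]

print(arr)   # [100 100 100 100 100  -1  -2  -3   8   9  10]
print(lst)   # [100, 6, 7, 8]
```

The array keeps its length whatever is assigned, while a list slice assignment can resize the list.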

In [5]:
import numpy as np
# Reset array, we'll see why I had to reset in a moment
arr = np.arange(0,11)
print (arr)

[ 0  1  2  3  4  5  6  7  8  9 10]


In [6]:

#Important notes on Slices
slice_of_arr = arr[:]
print(slice_of_arr)

[ 0  1  2  3  4  5  6  7  8  9 10]


In [7]:
#Change Slice
slice_of_arr[1:4] = 99

#Show slice
slice_of_arr

array([ 0, 99, 99, 99,  4,  5,  6,  7,  8,  9, 10])

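Because `slice_of_arr` is a view, the change is visible in the original array. The `base` attribute confirms this: it holds the base object if the array's memory comes from some other object, and is `None` otherwise.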

In [8]:
print(arr)

[ 0 99 99 99  4  5  6  7  8  9 10]


In [7]:
slice_of_arr.base is None

False

Now note the changes also occur in our original array!

In [8]:
arr

array([ 0, 99, 99, 99,  4,  5,  6,  7,  8,  9, 10])

However, if we index with an index array or a mask, we end up with a copy:

In [9]:
arr = np.arange(0,11)

#Important note on index arrays: the result is a copy
mask_of_arr = arr[[0,1,2,3]]

#Change the copy
mask_of_arr[1:4]=9999

#Show the copy
mask_of_arr

array([   0, 9999, 9999, 9999])

In [10]:
mask_of_arr.base is None

True

In [11]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

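The cells below were run later in the session, when `arr` held the ten-element array `[1, 2, 3, 4, 99, 99, 99, 99, 99, 99]` from section 3.5. They show that changing an array selected with a boolean mask leaves `arr` untouched.

In [36]: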
arr>4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

In [42]:
mask_arr = arr[[True,False,True,False,False,False,False,False,False,True]]
mask_arr[:] = 10
mask_arr

array([10, 10, 10])

In [43]:
arr

array([ 1,  2,  3,  4, 99, 99, 99, 99, 99, 99])

In [44]:
mask_arr = arr[arr>4]
mask_arr[:] = 100
mask_arr

array([100, 100, 100, 100, 100, 100])

In [45]:
arr

array([ 1,  2,  3,  4, 99, 99, 99, 99, 99, 99])

#### Slices are Views
`ndarray` **slices are actually *views*** on the same data buffer. If you create a slice and modify it, you also modify the original `ndarray`!
The data is not copied, which avoids the memory and time cost of duplicating large arrays.

In [None]:
#To get a copy, you need to be explicit
arr_copy = arr.copy()

arr_copy
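
The difference between a view and an explicit copy can be checked with the `base` attribute. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(0, 11)
arr_view = arr[2:5]          # view: shares arr's buffer
arr_copy = arr[2:5].copy()   # explicit copy: owns its own buffer

arr_view[:] = 0              # writes through to arr
arr_copy[:] = -1             # does not touch arr

print(arr)                   # [ 0  1  0  0  0  5  6  7  8  9 10]
print(arr_view.base is arr)  # True
print(arr_copy.base is None) # True
```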

### 3.2. Indexing a 2D array (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**. I usually recommend the comma notation for clarity.

In [14]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [15]:
#Indexing row
arr_2d[1]

array([20, 25, 30])

In [16]:
# Getting individual element value
arr_2d[1][0]

20

In [17]:
arr_2d[(1,0)] # an explicit tuple index, equivalent to arr_2d[1,0]

20

In [18]:
# Getting individual element value
arr_2d[1,0]

20
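
The two notations agree for a pair of integers, but not once slices are involved. The sketch below, using the same `arr_2d` values, shows where they diverge; `chained` and `comma` are illustrative names.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# With two integers, both notations select the same element
print(arr_2d[1][0] == arr_2d[1, 0])   # True

# With slices they differ: arr_2d[:2] is still 2D, so the second [...]
# indexes rows again instead of columns
chained = arr_2d[:2][1:]    # rows 0-1, then row 1 of that result
comma = arr_2d[:2, 1:]      # rows 0-1, columns 1-2

print(chained)   # [[20 25 30]]
print(comma)     # [[10 15]
                 #  [25 30]]
```

This is one practical reason to prefer the comma notation: each position in the index always refers to its own axis.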

In [19]:
# 2D array slicing

#Shape (2,2) from top right corner
arr_2d[:2,1:]

array([[10, 15],
       [25, 30]])

In [20]:
#Shape bottom row: 1D-array
arr_2d[2]

array([35, 40, 45])

In [21]:
#Last column: 1D-array
a = arr_2d[:,-1]
print(a.shape,'\n', a)

(3,) 
 [15 30 45]


In [22]:
#Last column: 2D-array
b = arr_2d[:,-1:]
print(b.shape,'\n', b)

(3, 1) 
 [[15]
 [30]
 [45]]


### 3.3. Fancy Indexing

Fancy indexing allows you to select entire rows or columns out of order:

In [23]:
#Set up matrix
arr2d = np.zeros((10,10))

In [24]:
#Number of rows
arr_length = arr2d.shape[0]

In [25]:
#Set up array

for i in range(arr_length):
    arr2d[i] = i
    
arr2d

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

Fancy indexing with a list of row indices selects those rows:

In [26]:
arr2d[[2,4,6,8]]

array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])

In [27]:
#Allows in any order
arr2d[[6,4,2,7]]

array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
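
The same idea extends to columns and to individual elements. The sketch below uses a small matrix `m` with distinct values (an assumption made here so that the selected columns are easy to recognise) and shows three common forms, including `np.ix_` for selecting a sub-block.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Columns 3 and 0, in that order
cols = m[:, [3, 0]]

# Two index lists are paired up: elements (0, 1) and (2, 3)
pairs = m[[0, 2], [1, 3]]

# np.ix_ builds the cross product: rows 0 and 2, columns 1 and 3
block = m[np.ix_([0, 2], [1, 3])]

print(cols)    # [[ 3  0] [ 7  4] [11  8]]
print(pairs)   # [ 1 11]
print(block)   # [[ 1  3] [ 9 11]]
```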

### 3.4. More Indexing Help
Indexing a 2D matrix can be a bit confusing at first, especially when you start to add in step size. Try a Google image search for NumPy indexing to find useful images, like this one:

![indexing with numpy](https://i.stack.imgur.com/Q3kpa.png)
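
A few slices with a step size, written out on a small 5x5 matrix, may help alongside the picture. The matrix `m` below is a made-up example and is not the one shown in the image.

```python
import numpy as np

m = np.arange(25).reshape(5, 5)   # row i holds 5*i ... 5*i + 4

# Every second row, and every second column starting at column 1
print(m[::2, 1::2])
# [[ 1  3]
#  [11 13]
#  [21 23]]

# Rows 1 to 3, every third column
print(m[1:4, ::3])
# [[ 5  8]
#  [10 13]
#  [15 18]]

# First column, read from the bottom up (negative step)
print(m[::-1, 0])   # [20 15 10  5  0]
```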

### 3.5. Boolean selection

Let's briefly go over how to use brackets for selection based on comparison operators.

In [56]:
arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [57]:
arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

In [58]:
bool_arr = arr>4

In [59]:
bool_arr

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

On the left-hand side of an assignment, a boolean mask modifies the original array in place:

In [60]:
arr[bool_arr] = 99

In [61]:
arr

array([ 1,  2,  3,  4, 99, 99, 99, 99, 99, 99])

But a mask selection on the right-hand side returns a copy, so changing `b` below leaves `arr` unchanged:

In [62]:
arr[arr>2] = 100
arr

array([  1,   2, 100, 100, 100, 100, 100, 100, 100, 100])

In [63]:
b = arr[arr>2] 
b

array([100, 100, 100, 100, 100, 100, 100, 100])

In [64]:
b[0] = 5
b

array([  5, 100, 100, 100, 100, 100, 100, 100])

In [65]:
arr

array([  1,   2, 100, 100, 100, 100, 100, 100, 100, 100])

In [None]:
#Rebinding the name b to a new object does not affect arr
b = 5
b

In [None]:
arr

In [66]:
x = 2
arr[arr>x]

array([100, 100, 100, 100, 100, 100, 100, 100])

In [67]:
data = np.random.randn(7,4)
data

array([[ 1.01549579, -0.36563129,  0.50562643, -0.91787201],
       [-0.0881094 , -0.23797371, -0.80590176, -0.43411639],
       [-1.32342317, -0.20444442,  0.2276854 ,  0.41738631],
       [-1.90720916, -0.89328934, -0.92957773,  0.87455662],
       [ 0.95939917, -2.22242996,  1.31493978,  0.62981948],
       [-1.04039248, -0.80932181,  0.38766721, -0.7503536 ],
       [ 1.45015601, -0.19372728, -1.42229341,  1.09177803]])

In [68]:
names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

In [69]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [70]:
data[names == 'Bob'] # same index

array([[ 1.01549579, -0.36563129,  0.50562643, -0.91787201],
       [-1.90720916, -0.89328934, -0.92957773,  0.87455662]])

In [71]:
data[names == 'Bob',2:] 

array([[ 0.50562643, -0.91787201],
       [-0.92957773,  0.87455662]])

In [72]:
data[names != 'Bob',2:] 

array([[-0.80590176, -0.43411639],
       [ 0.2276854 ,  0.41738631],
       [ 1.31493978,  0.62981948],
       [ 0.38766721, -0.7503536 ],
       [-1.42229341,  1.09177803]])

In [73]:
data[ ~(names == 'Bob')] 

array([[-0.0881094 , -0.23797371, -0.80590176, -0.43411639],
       [-1.32342317, -0.20444442,  0.2276854 ,  0.41738631],
       [ 0.95939917, -2.22242996,  1.31493978,  0.62981948],
       [-1.04039248, -0.80932181,  0.38766721, -0.7503536 ],
       [ 1.45015601, -0.19372728, -1.42229341,  1.09177803]])

In [74]:
data[data >1]

array([1.01549579, 1.31493978, 1.45015601, 1.09177803])
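
Several conditions can be combined into a single mask with `&` (and), `|` (or) and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `==` or `>`. The sketch below uses a deterministic stand-in for `data` (an assumption, so that the result is reproducible) with the same `names` array.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)   # stand-in for the random data above

# Rows belonging to Bob or Will
mask = (names == 'Bob') | (names == 'Will')
print(data[mask].shape)              # (4, 4)

# np.isin builds the same mask from a list of accepted values
same_mask = np.isin(names, ['Bob', 'Will'])
print(np.array_equal(mask, same_mask))   # True

# Element-wise conditions: even values greater than 20
print(data[(data > 20) & (data % 2 == 0)])   # [22 24 26]

# On the left-hand side, the mask assigns in place
data[data < 4] = 0
print(data[0])                       # [0 0 0 0]
```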

### 3.6. Logical expression: where

In [75]:
xarr = np.array([1.1,1.2,1.3,1.4,1.5])

yarr = np.array([2.1,2.2,2.3,2.4,2.5])

cond = np.array([True, False,True,True,False])

l = [(x if c else y) for x,y,c in zip(xarr,yarr,cond)]
l

[1.1, 2.2, 1.3, 1.4, 2.5]

In [76]:
l = np.where(cond,xarr,yarr)
l

array([1.1, 2.2, 1.3, 1.4, 2.5])

In [77]:
#This can be used on multidimensional arrays too:
np.where([[True, False], [True, True]], [[1, 2], [3, 4]], [[9, 8], [7, 6]])

array([[1, 8],
       [3, 4]])
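
When `np.where` is called with the condition only, it returns the indices at which the condition is true instead of choosing between two arrays. A short sketch reusing the arrays defined above (`idx` is an illustrative name):

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
cond = np.array([True, False, True, True, False])

# One-argument form: a tuple with one index array per dimension
idx = np.where(cond)
print(idx)          # (array([0, 2, 3]),)

# Indexing with those positions matches indexing with the mask itself
print(xarr[idx])    # [1.1 1.3 1.4]
print(np.array_equal(xarr[idx], xarr[cond]))   # True
```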

In [82]:
a = np.random.randn(4,4)
a

array([[-0.232338  ,  1.52654927,  0.05712923,  0.63117179],
       [ 0.09959236,  0.19381587,  1.33033599,  0.16617993],
       [ 0.12866979, -1.60648407, -0.8760068 , -0.2987458 ],
       [-0.16735369, -0.62298384, -0.75365832,  0.14848889]])

In [83]:
a < 1

array([[ True, False,  True,  True],
       [ True,  True, False,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [84]:
np.where(a<1, 2, 1)

array([[2, 1, 2, 2],
       [2, 2, 1, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

In [85]:
np.where(a<1, 2, a)

array([[2.        , 1.52654927, 2.        , 2.        ],
       [2.        , 2.        , 1.33033599, 2.        ],
       [2.        , 2.        , 2.        , 2.        ],
       [2.        , 2.        , 2.        , 2.        ]])

In [86]:
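#~ is the bitwise complement operator; on boolean arrays it acts as an element-wise logical NOT
#(on Python integers it computes -x - 1)

l1 = a<-1

l2 = ~(a>1)

print(l2)
#Note: l1 & ~l2 is (a<-1) & (a>1), which is never True, so only the inner np.where changes values
np.where(l1 & ~l2 ,2,np.where(a>1,1,a))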

[[ True False  True  True]
 [ True  True False  True]
 [ True  True  True  True]
 [ True  True  True  True]]


array([[-0.232338  ,  1.        ,  0.05712923,  0.63117179],
       [ 0.09959236,  0.19381587,  1.        ,  0.16617993],
       [ 0.12866979, -1.60648407, -0.8760068 , -0.2987458 ],
       [-0.16735369, -0.62298384, -0.75365832,  0.14848889]])
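
In the last cell the outer condition `l1 & ~l2` amounts to `(a < -1) & (a > 1)`, which no value can satisfy, so `-1.60648407` is left unchanged in the output. If the intent was to replace values below -1 with 2 and values above 1 with 1 (an assumption, since the original does not state it), a nested `np.where` or `np.select` could be written as in the sketch below. The small array `a` is a made-up sample.

```python
import numpy as np

a = np.array([[-0.2, 1.5],
              [-1.6, 0.3]])   # made-up sample values

low = a < -1
high = a > 1

# Nested where: first test low, then high, otherwise keep the value
result = np.where(low, 2, np.where(high, 1, a))
print(result)
# [[-0.2  1. ]
#  [ 2.   0.3]]

# np.select expresses the same rule as a list of conditions and choices
alt = np.select([low, high], [2, 1], default=a)
print(np.array_equal(result, alt))   # True
```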