# Indexing and Slicing

- indexing
- slicing
- boolean indexing
- fancy indexing

In [1]:
import numpy as np

regular indexing works just like indexing a Python list

In [2]:
arr = np.array([1, 2, 3, 4, 5])
arr[2]

3

slices work almost like Python list slices too

for example
```python
arr[2:4]
# selects from index 2 up to, but not including, index 4: [2, 4)
# so it returns the elements at indices 2 and 3
```

In [3]:
arr[2:4]

array([3, 4])

just like regular Python, `[:]` returns everything from start to end

In [4]:
arr[:]

array([1, 2, 3, 4, 5])

the important difference is that the value returned by a slice is a `view`: it does NOT duplicate the data, it only references the original array's memory.  
in other words, if I modify this view, the original array changes too:

In [5]:
original_arr = np.array([1, 2, 3, 4, 5, 6, 7])
slice_arr = original_arr[1:4]
slice_arr[:] = 42
copied_slice_arr = slice_arr.copy()         # unless you copy the slice like this
copied_slice_arr[:] = 20
original_arr

array([ 1, 42, 42, 42,  5,  6,  7])
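
a quick way to check whether two arrays share data is `np.shares_memory`. the sketch below uses its own small array (the names `view` and `copy` are just for illustration) to show that a slice shares the original buffer while `.copy()` does not:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5, 6, 7])

view = original[1:4]          # basic slice: shares the original buffer
copy = original[1:4].copy()   # explicit copy: owns its own buffer

print(np.shares_memory(original, view))   # True
print(np.shares_memory(original, copy))   # False
print(view.base is original)              # True

view[0] = 42    # visible through `original`
copy[1] = -1    # not visible through `original`
print(original.tolist())                  # [1, 42, 3, 4, 5, 6, 7]
```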

### but how can we do it in higher dimensions?

with higher dimensional arrays, you have many more options. in a two-dimensional array, the elements at each index are no longer scalars but one-dimensional arrays:

In [6]:
arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
arr_2d[2]

array([7, 8, 9])

### then how can we access individual elements?

first we can chain two indexing operations, which is a bit too much work
```python
arr_2d[2][1]
```
but you can also use a comma-separated list of indices to access an individual element like this:
```python
arr_2d[2, 1]
```

In [7]:
print(arr_2d[2][1])
print(arr_2d[2, 1])

8
8


in higher dimensional arrays, if you omit later indices, the returned object is a lower dimensional array that holds all the data along the omitted dimensions,
for example:

In [8]:
arr_3d = np.arange(8).reshape(2, 2, 2)
result = arr_3d[0]
print(arr_3d.ndim)
print(result.ndim)

3
2
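
to make the "omitted indices" idea concrete, here is a small sketch on the same kind of 2x2x2 array. each omitted index behaves as if a full slice `:` had been written in its place (the variable names are just illustrative):

```python
import numpy as np

arr_3d = np.arange(8).reshape(2, 2, 2)

first_block = arr_3d[0]       # same as arr_3d[0, :, :]
first_row = arr_3d[0, 1]      # same as arr_3d[0, 1, :]
element = arr_3d[0, 1, 1]     # all three indices given: a single value

print(first_block.shape)      # (2, 2)
print(first_row.shape)        # (2,)
print(element)                # 3
print(np.array_equal(first_block, arr_3d[0, :, :]))  # True
```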


### modifying multi-dimensional arrays:
just like 1d arrays:
```python
arr = np.array([1, 2, 3, 4, 5])
arr[3:] = 50
# arr -> [1, 2, 3, 50, 50]
```
we can do the same here too, but unlike a Python list, where slice assignment needs an iterable, NumPy lets us assign a single number and broadcasts it across the whole selection

```python
my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_list[1][:] = 0, 0, 0

# but in numpy we can do this:
my_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my_arr[1] = 0
```

In [9]:
my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_list[1][:] = (0 for _ in range(len(my_list[1])))
print(my_list)

my_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my_arr[1] = 0
print(my_arr)

[[1, 2, 3], [0, 0, 0], [7, 8, 9]]
[[1 2 3]
 [0 0 0]
 [7 8 9]]
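
the scalar assignment above is one case of broadcasting. as a small sketch (assuming the usual broadcasting rules apply to assignment), a one-dimensional sequence with one value per column can also be assigned to several rows at once:

```python
import numpy as np

my_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

my_arr[0] = 0                 # scalar is broadcast across the row
my_arr[1:] = [10, 20, 30]     # 1-D sequence is broadcast across both remaining rows

print(my_arr.tolist())        # [[0, 0, 0], [10, 20, 30], [10, 20, 30]]
```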


### slices in higher dimensions

we can mix slices and plain indices, separated by commas, with no problems

In [10]:
print(arr_2d, end='\n\n')
print(arr_2d[1:2, 1], end='\n\n')
print(arr_2d[1:2, 1:2], end='\n\n')
print(arr_2d[1:, 1:], end='\n\n')

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[5]

[[5]]

[[5 6]
 [8 9]]
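
the outputs `[5]` and `[[5]]` above hold the same value but have different shapes. a short sketch of the rule: every integer index drops one dimension, while every slice keeps it, even when the slice has length 1:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr_2d[1, 1].shape)        # ()      integer, integer -> single value
print(arr_2d[1:2, 1].shape)      # (1,)    slice, integer   -> 1-D
print(arr_2d[1:2, 1:2].shape)    # (1, 1)  slice, slice     -> 2-D
print(arr_2d[1:, 1:].shape)      # (2, 2)
```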



naturally, assigning to one of these slices will affect the whole selection

In [11]:
my_arr = np.arange(1, 10).reshape(3, 3)
print(my_arr)
my_arr[1:, 1:] = 0
print(my_arr)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 0 0]
 [7 0 0]]


## Boolean Indexing

Let's consider an example where we have some data in an array and an array of names with duplications.  

In [12]:
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.random.randn(7, 4)
print(names)
data

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']


array([[ 0.89030258,  0.02117903, -1.18856433,  0.23383206],
       [ 1.14724446, -1.29656201,  0.23961757, -2.94887384],
       [-0.41078135,  0.83265186,  1.13218476, -0.38457714],
       [ 0.60818738, -0.50456145,  0.69322144,  1.10503669],
       [ 0.79312669, -0.39042135,  1.02387612, -0.63844818],
       [ 0.39060587, -1.64286146, -0.44139983, -0.80814333],
       [ 0.10996834, -1.58563587,  1.4920875 , -1.32958791]])

Suppose each name corresponds to a row in the data array and we want to select all the rows whose name is `Bob`.  
To do so, we can compare the `names` array with the string `"Bob"`; the result will be a boolean array
```python
names == 'Bob'
# -> array([True, False, False, True, False, False, False])
```

In [13]:
mask = names == 'Bob'
print(mask)
data[mask]  # returns only the 1st and 4th rows (indices 0 and 3)

[ True False False  True False False False]


array([[ 0.89030258,  0.02117903, -1.18856433,  0.23383206],
       [ 0.60818738, -0.50456145,  0.69322144,  1.10503669]])
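
a mask can also be inspected directly. the sketch below uses stand-in data (`np.arange(28).reshape(7, 4)` is an assumption, not the random data above) to count the selected rows and to recover their positions with `np.flatnonzero`:

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.arange(28).reshape(7, 4)   # stand-in data, one row per name

mask = names == "Bob"

print(mask.sum())             # 2, since True counts as 1
print(np.flatnonzero(mask))   # [0 3], the positions of the True entries
print(data[mask].shape)       # (2, 4)
```

the mask must have the same length as the axis it indexes, here 7 rows.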

we can combine this with other selections too (and we can also use `!=` instead of `==` when we are creating the mask)

In [14]:
data[names != "Bob", :2]

array([[ 1.14724446, -1.29656201],
       [-0.41078135,  0.83265186],
       [ 0.79312669, -0.39042135],
       [ 0.39060587, -1.64286146],
       [ 0.10996834, -1.58563587]])

however, instead of using `!=` when creating the mask, we could apply `~` to the mask when we use it:

In [15]:
data[~mask, :2]

array([[ 1.14724446, -1.29656201],
       [-0.41078135,  0.83265186],
       [ 0.79312669, -0.39042135],
       [ 0.39060587, -1.64286146],
       [ 0.10996834, -1.58563587]])

we can select more than one name in our mask,  
but for that we need to use the element-wise operators `|` and `&` instead of Python's `or` and `and`, and wrap each comparison in parentheses

In [16]:
mask2 = (names == "Bob") | (names == "Will")
mask2

array([ True, False,  True,  True,  True, False, False])
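
to see why the Python keywords do not work here, the sketch below tries `or` on two boolean arrays and catches the resulting `ValueError`. it also shows `np.isin` as one possible alternative when matching against several values (the variable names are illustrative):

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])

try:
    # `or` asks for a single truth value of a whole array, which is ambiguous
    bad_mask = (names == "Bob") or (names == "Will")
    or_worked = True
except ValueError:
    or_worked = False
print(or_worked)                            # False

good_mask = (names == "Bob") | (names == "Will")   # element-wise OR
also_ok = np.isin(names, ["Bob", "Will"])          # membership test

print(np.array_equal(good_mask, also_ok))   # True
```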

- unlike the other methods we discussed so far, reading with boolean indexing does not return a `view`; it returns a copy (assigning through a mask, e.g. `data[mask] = 0`, still writes into the original array)
- we can also use other bitwise operators such as XOR (`^`), NOT (`~`), and AND (`&`) while creating the mask

In [17]:
mask3 = data < 0
data[mask3] = 0
data

array([[0.89030258, 0.02117903, 0.        , 0.23383206],
       [1.14724446, 0.        , 0.23961757, 0.        ],
       [0.        , 0.83265186, 1.13218476, 0.        ],
       [0.60818738, 0.        , 0.69322144, 1.10503669],
       [0.79312669, 0.        , 1.02387612, 0.        ],
       [0.39060587, 0.        , 0.        , 0.        ],
       [0.10996834, 0.        , 1.4920875 , 0.        ]])
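
reading with a mask and assigning through a mask behave differently, which is easy to mix up. a minimal sketch on a small hand-made array (not the random data above):

```python
import numpy as np

data = np.array([[1.0, -2.0], [-3.0, 4.0]])
negative = data < 0

selected = data[negative]   # reading with a mask: a new 1-D array (a copy)
selected[:] = 99.0          # changes only the copy
print(data.tolist())        # [[1.0, -2.0], [-3.0, 4.0]]

data[negative] = 0.0        # assigning through a mask: writes into `data`
print(data.tolist())        # [[1.0, 0.0], [0.0, 4.0]]
```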

## Fancy indexing

Fancy indexing is a name adopted by NumPy to describe indexing using integer arrays.  

with this method we pass a list or array of indices, and NumPy returns a new array (a copy, not a view) whose rows follow the order of those indices

In [18]:
arr = np.empty((8, 4), dtype=np.int8)
for i in range(8):
    arr[i] = i
arr


array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4],
       [5, 5, 5, 5],
       [6, 6, 6, 6],
       [7, 7, 7, 7]], dtype=int8)

In [19]:
arr[[0, 3, 5, 1]]

array([[0, 0, 0, 0],
       [3, 3, 3, 3],
       [5, 5, 5, 5],
       [1, 1, 1, 1]], dtype=int8)

naturally we can use negative indices too:

In [20]:
arr[[-1, 0]]

array([[7, 7, 7, 7],
       [0, 0, 0, 0]], dtype=int8)
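
like boolean indexing, fancy indexing gives back a copy rather than a view. the sketch below checks this with `np.shares_memory` and compares it with a basic slice (it builds its own array so it can run on its own):

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)

picked = arr[[0, 3, 5, 1]]     # fancy indexing: a new array
sliced = arr[0:4]              # basic slice: a view

print(np.shares_memory(arr, picked))  # False
print(np.shares_memory(arr, sliced))  # True

picked[0, 0] = -1              # only the copy changes
print(arr[0, 0])               # 0
```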

and we can use this for other dimensions too: 

In [21]:
arr = np.arange(32).reshape(8, 4)
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [22]:
arr[:, [0, 3, 1, 2]] 

array([[ 0,  3,  1,  2],
       [ 4,  7,  5,  6],
       [ 8, 11,  9, 10],
       [12, 15, 13, 14],
       [16, 19, 17, 18],
       [20, 23, 21, 22],
       [24, 27, 25, 26],
       [28, 31, 29, 30]])

but if we try something like this:  
```python
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
```
we will end up with a one-dimensional array, since the two index lists are paired up and each pair points at one specific element:

In [23]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]] 

array([ 4, 23, 29, 10])
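
the pairing can be spelled out by hand. this sketch rebuilds the same result with `zip`, picking the elements at (1, 0), (5, 3), (7, 1) and (2, 2); the names `rows`, `cols`, `paired` and `by_hand` are illustrative:

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

paired = arr[rows, cols]                                      # fancy indexing on both axes
by_hand = np.array([arr[r, c] for r, c in zip(rows, cols)])   # same pairs, one at a time

print(paired.tolist())                   # [4, 23, 29, 10]
print(np.array_equal(paired, by_hand))   # True
```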

however, we can select the rows first and then reorder the columns, which keeps the whole rows:

In [24]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
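
another way to get the same rectangular selection in a single indexing step is `np.ix_`, which builds index arrays that combine every chosen row with every chosen column. this is offered as an alternative sketch, not as the approach used above:

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

chained = arr[rows][:, cols]           # two steps: pick rows, then reorder columns
with_ix = arr[np.ix_(rows, cols)]      # one step: all row/column combinations

print(with_ix.shape)                       # (4, 4)
print(np.array_equal(chained, with_ix))    # True
```

the chained form builds an intermediate copy of the selected rows, so the single-step form may be preferable on large arrays.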