# Indexing and Slicing

NumPy array indexing is a rich topic, as there are many ways you may want to select
a subset of your data or individual elements. So far we have studied the various types of arrays and how to create them. Now let us look at how we can work with these arrays.

Once you have created an array, you will almost certainly want to fetch data from it, and there are several ways to do so. This chapter covers some of the most useful indexing operations you can perform on arrays.

Let us start with a 1D array, as shown in the example below.

![](./images/1.JPG)

In [1]:
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8])

The first element of the array is marked by index `0` and the last element of the array is marked by index `7`. This is shown in the above representation for reference. 

We can directly query a single cell of the array by specifying the index we want to select. Let us try a few examples.

In [2]:
arr[7]

8

In [3]:
arr[1]

2

Now what if we want to query a range of the array, that is, fetch a slice out of the primary array? We can do so by specifying an index range in the form `start:stop`. Here `start` is the first index to fetch (inclusive) and `stop` is the index up to which we want to fetch (exclusive). This means the element at `start` is included in the slice, but the element at `stop` is not.

![](./images/2.JPG)

In [4]:
arr[2:5]

array([3, 4, 5])

As can be seen in this example, we fetched the values from index 2 up to, but not including, index 5. The value at index 5 is not part of the result because the `stop` value is exclusive, while the `start` value is inclusive.
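
As a quick check of the inclusive/exclusive rule, the sketch below recreates the same array and confirms that a `start:stop` slice holds `stop - start` elements. The names `start`, `stop` and `part` are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

start, stop = 2, 5
part = arr[start:stop]             # indexes 2, 3 and 4; index 5 is excluded

print(part)                        # [3 4 5]
print(len(part) == stop - start)   # True

# Leaving out start or stop means "from the beginning" or "to the end".
print(arr[:3])                     # [1 2 3]
print(arr[5:])                     # [6 7 8]
```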

## Values fetched are by reference

There is one very important concept we need to understand here. The slice fetched is not a new array; it is a view that references the data of the primary array. This means any change in the primary array is reflected in the slice, and vice versa.

![](./images/3.JPG)

In [5]:
arr_slice = arr[2:5]

In [6]:
arr_slice[:] = 25

In [7]:
arr

array([ 1,  2, 25, 25, 25,  6,  7,  8])

We can see that the 3 corresponding values within the array also got set to 25. It is important to note that we made the change to `arr_slice`, but the change also showed up inside `arr`. 
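
The reverse direction can be checked the same way. The following sketch, which assumes a fresh copy of the same array, uses `np.shares_memory` and the `base` attribute to confirm that the slice is a view, and then changes the primary array to see the effect through the slice.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr_slice = arr[2:5]

# The slice is a view: it shares its memory with arr.
print(np.shares_memory(arr, arr_slice))  # True
print(arr_slice.base is arr)             # True

# A change made through arr is visible through the slice as well.
arr[3] = 99
print(arr_slice)                         # [ 3 99  5]
```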

# Copying Values

Let us say we do not want to pick up the slice by reference, but actually create a copy of the slice. We can do so by using the `copy()` method. Let us take a look.

In [8]:
arr = np.array([1,2,3,4,5,6,7,8])
arr_slice = arr[2:5].copy()
arr_slice[:] = 25

In [9]:
print(arr)

[1 2 3 4 5 6 7 8]


In [10]:
print(arr_slice)

[25 25 25]


As we can see in the above example, the elements within `arr` did not change. The `arr_slice` however got the updated value of 25.

Keep in mind that `copy()` creates a new array from the slice, so it allocates memory and copies every element, which can be slow for very large arrays. Prefer a view when you only need to read the data or when you want changes to reach the original array; use `copy()` when the slice must be modified independently.
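
To make the difference concrete, here is a minimal sketch that compares a view and a copy of the same slice. The names `view` and `dup` are hypothetical and only used for this illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

view = arr[2:5]          # no data is copied
dup = arr[2:5].copy()    # a new, independent array

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, dup))   # False

dup[:] = 0               # only the copy changes
print(arr)               # [1 2 3 4 5 6 7 8]
```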

# Broadcasting

``` Python
arr[2:5] = 25
```

This line is an example of broadcasting. Here the value 25 is propagated, or in other words broadcast, to the entire selection.

In data science, broadcasting is used very commonly in data operations. NumPy makes it easy to change the values of several elements with a single command.
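
A short sketch of what assignment to a slice accepts, assuming the same 1D array as before: a single value is spread over the whole selection, a sequence of matching length is assigned element by element, and a sequence of any other length is rejected with a `ValueError`. The `message` variable is only there to capture the error text.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

arr[2:5] = 25                 # one value is spread over three positions
print(arr)                    # [ 1  2 25 25 25  6  7  8]

arr[2:5] = [10, 20, 30]       # three values for three positions
print(arr)                    # [ 1  2 10 20 30  6  7  8]

message = None
try:
    arr[2:5] = [10, 20]       # two values cannot fill three positions
except ValueError as err:
    message = str(err)
print("ValueError:", message)
```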

# 2D Array

Now that we have seen how a 1D array looks, let us perform some quick operations on 2D arrays.

![](./images/4.2.jpg)

In [11]:
arr2d = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

![](./images/4.1.jpg)

## Fetching a single element

Let us use the row and column indexes of the array to fetch a single element. It is important to remember here that this 2D array can be thought of as 3 row arrays that hold the cell values, plus an outer array that holds the 3 rows. Keep this in mind when you work out the indexes: the first index selects the row and the second selects the column within that row.


<img src="./images/5.1.JPG" alt="drawing" width="800"/>

In [12]:
arr2d[0][3]

4

We have successfully fetched row = 0 and column = 3. The cell value of `4` gets returned. 
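
NumPy also accepts both indexes inside a single pair of brackets, separated by a comma. The sketch below, using the same `arr2d`, shows that the two spellings return the same cell.

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

a = arr2d[0][3]   # first select row 0, then element 3 of that row
b = arr2d[0, 3]   # select row 0 and column 3 in one step

print(a, b)       # 4 4
```

The comma form avoids creating the intermediate row and is the form used for multi-dimensional slices later in this chapter.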

Let us try to fetch an entire row.

In [13]:
arr2d[1]

array([5, 6, 7, 8])

We can see that we fetched the slice of an entire row. This is row #2 or rather the row at index 1.

# 3D Array

Let us take a look at a 3D array, and some of the operations we can perform on the same. 

In [14]:
arr3d = np.array([[[1, 2, 3],
                   [4, 5, 6]], 
                  
                  [[7, 8, 9], 
                   [10, 11, 12]]])

arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])


<img src="./images/6.JPG" alt="drawing" width="600"/>

The 3D array shown above, is actually 2 arrays, each of shape `(2,3)`.

In [15]:
arr3d.shape

(2, 2, 3)

## Extracting a 2D Slice

Let us try to extract one of the 2D arrays as a slice of the 3D array.

<img src="./images/7.JPG" alt="drawing" width="600"/>

In [16]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

We can see that `arr3d[0]` got us the row at index 0. And the row at index 0 is actually a 2D array, so `arr3d[0]` actually represents a 2D array.

## Copy / Broadcasting

In [17]:
vals = arr3d[0].copy() # Saving a copy of the first 2D array
arr3d[0] = 31 # Replacing the array elements of the first 2D array

arr3d

array([[[31, 31, 31],
        [31, 31, 31]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [18]:
arr3d[0] = vals  # Replacing the elements of the first 2D array with the copy we made earlier

arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

By default, a selection such as `arr3d[0]` is a view of the same primary array, unless we use the `copy()` method to create a new array. Broadcasting also works on an entire 2D slice: above, all elements of the first 2D array inside the 3D array were set to the value `31`.
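
Broadcasting is not limited to a single value. As a small sketch on a fresh `arr3d`, a 1D sequence of length 3 can be assigned to the `(2, 3)` block, in which case it is repeated for every row of that block.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])

arr3d[0] = [31, 32, 33]   # shape (3,) is broadcast over the (2, 3) block
print(arr3d[0])
# [[31 32 33]
#  [31 32 33]]

print(arr3d[1])           # the second block is unchanged
# [[ 7  8  9]
#  [10 11 12]]
```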

# Advanced Operations

<img src="./images/8.JPG" alt="drawing" width="600"/>

In [19]:
arr3d = np.array([[[1, 2, 3],
                   [4, 5, 6]], 
                  
                  [[7, 8, 9], 
                   [10, 11, 12]]])
arr3d[0][0]

array([1, 2, 3])

The above example shows how to fetch the first row of the first 2D array that is present within the 3D array. Once we extract `arr3d[0]`, anything we do after that operates the same way as a 2D array. We should remember this for ease of understanding. 
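
One way to keep track of this is to watch the shape: every integer index removes one dimension. The sketch below prints the shape after each step and then fetches single cells.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])

print(arr3d.shape)         # (2, 2, 3)
print(arr3d[0].shape)      # (2, 3)  one index  -> a 2D array
print(arr3d[0][0].shape)   # (3,)    two indexes -> a 1D array
print(arr3d[0][0][2])      # 3       three indexes -> a single value
print(arr3d[1, 0, 2])      # 9       same idea, written with commas
```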

## Indexing with Slices

Let us start with a 2D array. Here we will try and use some advanced slicing operations.

In [20]:
arr2d = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Now let us pick out the first 2 rows from the 2D array.

In [21]:
arr2d[:2]

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

You can read this expression as

> select the first 2 rows of arr2d

We must become fully comfortable with this syntax and how it operates. We use such slicing operations extensively in machine learning problems. 

We can also pass the slice ranges for each dimension. 

In [22]:
arr2d[:2, 1:]

array([[2, 3, 4],
       [6, 7, 8]])

In [23]:
arr2d[:2, :1]

array([[1],
       [5]])

First of all, we did an index and slice extraction across both the dimensions. But what is the difference between `:1` and `1:`? 

It is simple actually. 

**:1**

The `:` before the index means get everything before the specified index. Data at the specified index *IS NOT* included. This means `:0` is still a valid operation, but it selects nothing and returns an empty array.

**1:**

The `:` after the index means get everything from the specified index up to the end of the array. Data at the specified index *IS* included.
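
Because one form excludes the index and the other includes it, `:k` and `k:` split an array into two parts with no overlap and no gap. A minimal sketch, with a hypothetical array and split point `k`:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
k = 2

head = arr[:k]    # indexes 0 .. k-1
tail = arr[k:]    # indexes k .. end

print(head)       # [10 20]
print(tail)       # [30 40 50]

# Putting the two parts back together gives the original array.
print(np.array_equal(np.concatenate([head, tail]), arr))  # True
```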

## Few more examples

In [24]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [25]:
arr2d[2, :] # same as arr2d[2]

array([7, 8, 9])

In [26]:
arr2d[2:, :] # same values as above, but the slice keeps the row axis: shape (1, 3)

array([[7, 8, 9]])

In [27]:
arr2d[1, :2]

array([4, 5])

In [28]:
arr2d[1:2, :2] # same values as above, but the slice keeps the row axis: shape (1, 2)

array([[4, 5]])
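
The outputs above differ in their number of brackets. The sketch below prints the shapes for the same four expressions: an integer index drops its axis, while a slice keeps it, even when the slice selects only one row.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[2, :].shape)     # (3,)    integer row index: 1D result
print(arr2d[2:, :].shape)    # (1, 3)  row slice: still 2D
print(arr2d[1, :2].shape)    # (2,)
print(arr2d[1:2, :2].shape)  # (1, 2)
```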

# Boolean Indexing



In [29]:
names = np.array(['Tom', 'Amy', 'Will', 'Tom', 'Will', 'Will', 'Amy'])
data = np.array([[ 1,  2 , 3,  4],
       [ 5, 6, 7, 8],
       [ 9, 10, 11, 12],
       [ 13, 14, 15, 16],
       [ 17, 18, 19, 20],
       [ 21, 22, 23, 24],
       [ 25, 26, 27, 28]])

Let us take 2 arrays. The `names` array holds the names of persons, whereas the `data` array holds some activity data for those persons.

We have a total of 7 names inside the `names` array and one row of data per name, so there are 7 rows inside the `data` array.

In [30]:
names

array(['Tom', 'Amy', 'Will', 'Tom', 'Will', 'Will', 'Amy'], dtype='<U4')

In [31]:
data

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24],
       [25, 26, 27, 28]])

Now let us say we want to fetch all the rows of data belonging to people with the name `Amy`. How could we quickly and efficiently do this? 

The answer is to create an array of booleans by running a comparison operation on the primary `names` array.

In [32]:
names == 'Amy'

array([False,  True, False, False, False, False,  True])

The above code produces an array of booleans. The value is `True` where the condition is satisfied and `False` otherwise.

In essence, this is simply an array of booleans. However, NumPy supports a concept called boolean indexing, where we pass an array of booleans as the index. Wherever the boolean array holds `True` the element is returned, and wherever it holds `False` the element is left out.

Let us try querying the `data` array with a boolean index. 

In [33]:
boolArray = names == 'Amy'
data[boolArray]

array([[ 5,  6,  7,  8],
       [25, 26, 27, 28]])

Well, as we can see above, we got 2 rows in response. These rows correspond to indexes in the `names` array that had the name `Amy`.

We can actually perform the same operation without using the extra variable `boolArray`. Let us see how. 

In [34]:
data[names == 'Amy']

array([[ 5,  6,  7,  8],
       [25, 26, 27, 28]])

As you can see, the result is the same. The convention used above is the one that is commonly followed.

In fact, for this condition, using the boolean index gives the same result as passing the matching row indexes (1 and 6) directly.

In [35]:
data[[1,6]]

array([[ 5,  6,  7,  8],
       [25, 26, 27, 28]])

Which is also the same as:

In [36]:
data[[1,6], :]

array([[ 5,  6,  7,  8],
       [25, 26, 27, 28]])

What we just saw above is called ***Fancy Indexing***. More on this in the next section. 
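
The link between the two forms can be made explicit. As a sketch, `np.flatnonzero` returns the positions at which a boolean array is `True`, and indexing with those positions selects the same rows as the boolean array itself. The `data` array is rebuilt here with `np.arange` so that the block is self-contained; it holds the same values as above.

```python
import numpy as np

names = np.array(['Tom', 'Amy', 'Will', 'Tom', 'Will', 'Will', 'Amy'])
data = np.arange(1, 29).reshape(7, 4)   # same values as the data array above

mask = names == 'Amy'
idx = np.flatnonzero(mask)              # positions where mask is True

print(idx)                                       # [1 6]
print(np.array_equal(data[mask], data[idx]))     # True
```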

### Rule
> The boolean array must be of the same length as the array axis it’s indexing.
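
The following sketch shows what happens when the rule is broken, assuming a recent NumPy release: a boolean array of the wrong length raises an `IndexError`. It also shows that a boolean array of length 4 is valid on the column axis, because `data` has 4 columns. The names `short_mask`, `col_mask` and `message` are hypothetical.

```python
import numpy as np

data = np.arange(1, 29).reshape(7, 4)   # 7 rows, 4 columns

short_mask = np.array([True, False, True])   # length 3, but axis 0 has 7 rows
message = None
try:
    data[short_mask]
except IndexError as err:
    message = str(err)
print("IndexError:", message)

# A mask of length 4 matches the column axis instead.
col_mask = np.array([True, False, False, True])
print(data[:, col_mask].shape)          # (7, 2)
```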

Let us try some more examples. Say we want only the columns from index 2 onward, for the rows where the name is `Amy`:

In [37]:
data[names == 'Amy', 2:]


array([[ 7,  8],
       [27, 28]])

We can also use a not-equals comparison to get all rows corresponding to people who are not named `Amy`.

In [38]:
data[names != 'Amy']

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])

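The operation also supports combining multiple conditions. Use `|` (or) and `&` (and) with each condition wrapped in parentheses; the Python keywords `or` and `and` do not work with boolean arrays.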

In [39]:
data[(names == 'Amy') | (names == 'Will')]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [17, 18, 19, 20],
       [21, 22, 23, 24],
       [25, 26, 27, 28]])
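
Conditions can also be combined with `&` (and) and inverted with `~` (not). The sketch below is one possible illustration on the same arrays, with `data` rebuilt through `np.arange`; the condition on the first column is made up for the example.

```python
import numpy as np

names = np.array(['Tom', 'Amy', 'Will', 'Tom', 'Will', 'Will', 'Amy'])
data = np.arange(1, 29).reshape(7, 4)   # same values as the data array above

# & keeps a row only if both conditions hold:
# rows that belong to Will and whose first column is greater than 10.
sel = (names == 'Will') & (data[:, 0] > 10)
print(data[sel])
# [[17 18 19 20]
#  [21 22 23 24]]

# ~ inverts a boolean array, so this matches names != 'Amy'.
print(np.array_equal(data[~(names == 'Amy')], data[names != 'Amy']))  # True
```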

## Setting values conditionally

We can also use boolean indexes when setting the values of elements in an array. Every element for which the boolean condition is `True` gets the new value.

Let us try to set all even numbers in our data array to `0`. We can do this in a single line of code.

In [40]:
data[data % 2 == 0] = 0

In [41]:
data

array([[ 1,  0,  3,  0],
       [ 5,  0,  7,  0],
       [ 9,  0, 11,  0],
       [13,  0, 15,  0],
       [17,  0, 19,  0],
       [21,  0, 23,  0],
       [25,  0, 27,  0]])
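
The condition does not have to come from the array being changed. As a sketch, the `names` array can drive the assignment: whole rows can be overwritten, or only one column of the matching rows. The values `-1` and `100` are arbitrary, and `data` is rebuilt with its original contents.

```python
import numpy as np

names = np.array(['Tom', 'Amy', 'Will', 'Tom', 'Will', 'Will', 'Amy'])
data = np.arange(1, 29).reshape(7, 4)   # original values of the data array

data[names == 'Tom'] = -1       # every column of rows 0 and 3
data[names == 'Amy', 0] = 100   # only the first column of rows 1 and 6

print(data[:2].tolist())        # [[-1, -1, -1, -1], [100, 6, 7, 8]]
print(data[6].tolist())         # [100, 26, 27, 28]
```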

# Fancy Indexing

**Fancy indexing** simply means passing an array of indices to access multiple array elements at once. If we know that we want to pick up elements at a specific index, then we can simply specify that index rather than passing a Boolean array with `True` / `False` values. 

In [42]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Suppose we want to access three different elements. We could do it like this:

In [43]:
[arr[3], arr[5], arr[7]]

[4, 6, 8]

Alternatively, we can pass a single list or array of indices to obtain the same result:

In [44]:
ind = [3, 5, 7]
arr[ind]

array([4, 6, 8])

When using fancy indexing, the shape of the result reflects the shape of the index arrays rather than the shape of the array being indexed:

In [45]:
ind = np.array([[3, 7],
                [4, 5]])
arr[ind]

array([[4, 8],
       [5, 6]])
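
Unlike a slice, the result of fancy indexing is a new array and not a view. The sketch below checks this with `np.shares_memory`, and then shows that assigning through a fancy index still updates the original array. The name `picked` is only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

picked = arr[[3, 5, 7]]                  # fancy indexing returns a copy
print(np.shares_memory(arr, picked))     # False

picked[:] = 0                            # changes the copy only
print(arr)                               # [ 1  2  3  4  5  6  7  8  9 10]

arr[[3, 5, 7]] = 0                       # assignment writes into arr itself
print(arr)                               # [ 1  2  3  0  5  0  7  0  9 10]
```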

## Negative indices

We can use a negative index to count from the end of the array. A value of -1 means the last index of the array, -2 means the second-to-last index, and so on.

In [46]:
arr[[-1, -2, -3]]


array([10,  9,  8])
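
Negative indices are not specific to fancy indexing. As a brief sketch on the same array, they work for single elements and inside slices too.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(arr[-1])                       # 10, the last element
print(arr[-1] == arr[len(arr) - 1])  # True

print(arr[-3:])                      # [ 8  9 10]  the last three elements
print(arr[:-2])                      # [1 2 3 4 5 6 7 8]  all but the last two
```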