# 3) Accessing subsets of arrays

In [None]:
import numpy as np

## NumPy documentation

**Note: NumPy has very good and extensive documentation, which you can find at https://numpy.org/doc/stable/. Refer to it whenever you need further details about NumPy arrays.**

## Slicing arrays

Note that NumPy indexing (which includes slicing) is very powerful and that we can cover only a tiny fraction of this topic here. To learn more, you can for example consult the [NumPy documentation on indexing](https://numpy.org/devdocs/user/basics.indexing.html).

### One-dimensional arrays

Let's create a simple 1D array first:

In [None]:
my_array = np.random.normal(size=10)
my_array

Remember that when we want to access a single element, we use the index of that element in square brackets:

In [None]:
my_array[1]

Remember that **we start counting from 0** in Python, which is why with index 1 we access the *second* element.

For so-called "slicing", we extend this notation and extract a range of elements with the ```from_index:to_index (excluded)``` notation. Here ```excluded``` means that the element at the **last index** specified is **not included**. For example, to recover the elements with indices 1 to 3, we write:

In [None]:
my_array[1:4]

We can also set values in the array in the same manner. For example, let's set the above elements to 10:

In [None]:
my_array[1:4] = 10

In [None]:
my_array

Note that you can use default values to simplify the notation. For example, if you want to extract all elements from index 4 (i.e. the fifth element) **to the last one**, you don't have to specify the last index:

In [None]:
my_array[4:]
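
Because `my_array` is random, it can be hard to see which positions a slice picks. The following minimal sketch uses a hypothetical array `demo` (not part of the notebook above) whose values equal their indices, so each slice shows directly which indices it covers:

```python
import numpy as np

# Hypothetical example array: each value equals its own index
demo = np.arange(10)          # array([0, 1, 2, ..., 9])

first_four = demo[:4]         # start omitted -> begins at index 0
middle = demo[1:4]            # indices 1, 2 and 3 (index 4 is excluded)
from_four = demo[4:]          # stop omitted -> runs to the last element

print(first_four)             # [0 1 2 3]
print(middle)                 # [1 2 3]
print(from_four)              # [4 5 6 7 8 9]

# For in-range indices, a slice start:stop holds stop - start elements
print(len(middle) == 4 - 1)   # True
```

Note that `demo[:4]` and `demo[4:]` together cover the whole array without overlap, which is one practical benefit of excluding the stop index.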

### Higher dimensions

We have seen before that we can create arrays with more than one dimension (think e.g. of the pixels of an image). For example:

In [None]:
array2D = np.random.normal(size=(3,5))
array2D

The indexing system works in the same way here. We just have to specify now for each dimension which rows/columns we want to extract with ```my_array[start_row:end_row, start_column:end_column]```:

In [None]:
array2D[1:3, 0:2]

Here again, we can simplify the notation. If we want to select a few columns but **want to keep all rows**, we can leave out the start (defaults to 0) *and* the end (defaults to the length of that dimension) and simply put ```:```, which means "all" for that dimension:

In [None]:
array2D[:, 1:3]  # Take all rows, but only columns 1 and 2

Conversely, we can reduce the result to a single row by specifying an index (instead of a range) for the first dimension, thereby combining simple indexing and slicing:

In [None]:
array2D[0, 1:3]

Sidenote: With indexing, we remove the respective dimension, so the result is a 1D array. If we want to keep a 2D array, we can instead use a range along the first dimension that covers only the first row:

In [None]:
array2D[0:1, 1:3]
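
The difference between the two variants is easiest to see in the `shape` attribute. Here is a small sketch, assuming a hypothetical 3x5 array `grid` with known values in place of the random `array2D`:

```python
import numpy as np

# Hypothetical 3x5 array holding the values 0..14
grid = np.arange(15).reshape(3, 5)

row_indexed = grid[0, 1:3]     # plain index on the first axis -> axis is dropped
row_sliced = grid[0:1, 1:3]    # length-1 slice on the first axis -> axis is kept

print(row_indexed, row_indexed.shape)   # [1 2] (2,)
print(row_sliced, row_sliced.shape)     # [[1 2]] (1, 2)
```

Both results contain the same values; only the number of dimensions differs.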

### Slicing with images

Let's apply the slicing notation to images, which, as we've seen, can be treated as 2D (or higher-dimensional) arrays.

First, let's load an image as a numpy array and display it:

In [None]:
import skimage.io
from matplotlib import pyplot as plt

img_path = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Apollo_dan_Pithon.jpg/500px-Apollo_dan_Pithon.jpg"
image_array = skimage.io.imread(img_path, as_gray=True)  # Use `as_gray=True` to convert it to grayscale (no channel dimension)

plt.imshow(image_array, cmap='gray')

Let's now extract and display the columns from index 300 up to (but not including) 500 of the image:

In [None]:
mid_cols = image_array[:, 300:500]  # Extract columns 300 to 499 (but keep all rows)
plt.imshow(mid_cols, cmap='gray')

We can also set values in the array in the same manner. For example, let us select the top 50 rows and set them to black (i.e. set all pixel values to 0):

In [None]:
image_array[:50, :] = 0  # Set the top 50 rows to black
plt.imshow(image_array, cmap='gray')

Now, let's combine the two ideas to do the same on the bottom: Let's first save the bottom part of the image in a new variable and then set it to black:

In [None]:
bot50 = image_array[-50:, :] # Remember that negative indices count from the end
bot50[:] = 0  # Set all pixels of the slice to black
# NOTE: we need to use `[:]` because `bot50 = 0` would just overwrite the variable with the value 0

Let's check the original image:

In [None]:
plt.imshow(image_array, cmap='gray')

As we can see, the bottom **pixels of the original image also changed** to black. This is because we did **not create an independent copy** of the bottom part, but just created a new variable that points to the same data in memory (a "view"). It is still linked to the original one.

Depending on the application, this behavior can be useful or not. If we want to create a new independent copy, we can use the ```copy()``` method:

In [None]:
left50 = image_array[:, :50].copy()  # Extract the first 50 columns AS A COPY
left50[:] = 0  # Set the slice to black

Let's verify that the pixels in the left50 slice did indeed change, but that this did not affect the original image:

In [None]:
plt.imshow(left50, cmap='gray')
plt.show()
plt.imshow(image_array, cmap='gray')
plt.show()
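
The same view-versus-copy behavior can be checked without plotting. The sketch below assumes a small hypothetical array `canvas` as a stand-in for the image and uses `np.shares_memory` to test whether two arrays are backed by the same data:

```python
import numpy as np

# Hypothetical stand-in for an image: 4 rows, 6 columns, all ones
canvas = np.ones((4, 6))

bottom_view = canvas[-1:, :]        # basic slicing returns a view
left_copy = canvas[:, :2].copy()    # .copy() allocates independent data

print(np.shares_memory(canvas, bottom_view))  # True
print(np.shares_memory(canvas, left_copy))    # False

bottom_view[:] = 0    # writes through to canvas
left_copy[:] = 0      # leaves canvas untouched

print(canvas[-1, 0])  # 0.0 (changed through the view)
print(canvas[0, 0])   # 1.0 (the copy did not affect it)

# Without [:], the assignment only rebinds the name to the integer 0
top_view = canvas[:1, :]
top_view = 0
print(canvas[0, 0])   # 1.0 (canvas is unchanged)
```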

### Skipping and reversing elements

The general slicing notation ```my_array[start:stop:step]``` contains a third parameter ```step```. This allows us to skip elements. For example, we can extract every second column of the image array (while keeping all rows). What result do you expect?

In [None]:
img_skipped = image_array[:, ::2]  # Skip every second column
plt.imshow(img_skipped, cmap='gray')

The step parameter can also be negative, which allows us to reverse the order of the elements. For example, we can reverse the order of the columns in the image:

In [None]:
img_reversed = image_array[:, ::-1]  # Reverse the order of the columns
plt.imshow(img_reversed, cmap='gray')
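
To see exactly which elements a given `step` selects, it can help to try it on small arrays first. This is a minimal sketch with two hypothetical arrays, `seq` and `table`, that are not part of the notebook above:

```python
import numpy as np

seq = np.arange(10)                   # hypothetical 1D example: 0..9
table = np.arange(24).reshape(4, 6)   # hypothetical 2D example: 4 rows, 6 columns

print(seq[::2])      # [0 2 4 6 8]  every second element
print(seq[1:8:3])    # [1 4 7]      start, stop and step combined
print(seq[::-1])     # [9 8 7 6 5 4 3 2 1 0]  reversed order

print(table[:, ::2].shape)   # (4, 3)  half of the columns remain
print(table[:, ::-1][0])     # [5 4 3 2 1 0]  first row with columns reversed
```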

## Boolean indexing (masking)

Instead of using numerical indices to extract values from the array, we can also select them by some criteria. Let's create a new random array:

In [None]:
my_array2 = np.random.normal(size=10)
my_array2

How do we proceed if, for example, we only want to recover the elements that are larger than 0?

Let's see what happens when we just write the condition down as we would in regular mathematics with a single number:

In [None]:
above_zero = my_array2 > 0
above_zero

We see that the **output is again an array**, but instead of being filled with numbers, it contains only ```False``` and ```True```. Those values also exist in plain Python and are called booleans. For example:

In [None]:
a = 3
a > 10

What we've created is a so-called **boolean array**. It is an array of the same shape as the original one, but with ```True``` where the condition is met and ```False``` where it is not.

We can now use this **boolean array** ```above_zero``` as a so-called **"mask" to extract values** from any array of the same shape. A natural candidate is the original array itself: if we overlay ```above_zero``` on ```my_array2```, we select only those values of ```my_array2``` at positions where ```above_zero``` is ```True```. We do this by passing the entire boolean array in square brackets (instead of a single index as in ```my_array[i]```):

In [None]:
my_array2[above_zero]

Naturally, the output array is typically smaller than the original one, as it only contains the values that actually meet the condition (here "larger than 0").

<img src="https://github.com/guiwitz/ISDAwPython_day2/raw/master/images/logical_indexing.jpeg" alt="masking" width="500">
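
Since `my_array2` is random, the result differs on every run. The sketch below repeats the masking steps on a hypothetical array `values` with fixed entries, so the outcome is predictable; it also shows that summing a boolean array counts its `True` entries:

```python
import numpy as np

# Hypothetical fixed values instead of random numbers
values = np.array([-2, 3, 0, 5, -1])

positive = values > 0        # boolean array with the same shape as values
print(positive)              # [False  True False  True False]
print(values[positive])      # [3 5]
print(positive.sum())        # 2  (True counts as 1, False as 0)

# The mask does not need its own variable; the condition can go in the brackets
print(values[values > 0])    # [3 5]
```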

### Masking in images

Masking is also very useful for images. Let's take the image from above (with the black top part) and create a boolean mask that selects only the pixels that are very dark.

*Note that this image was imported as a float data type with values between 0.0 and 1.0, so we can define dark pixels as those with a value below 0.1.*

Then we use this mask to extract the black pixels from the image and set them to white instead (i.e. to 1.0):

In [None]:
black = image_array < 0.1  # Create a boolean mask for black pixels
image_array[black] = 1  # Set black pixels to white
plt.imshow(image_array, cmap='gray')

We can see now that the top/bottom parts - and some other pixels - have been set to white.
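
The same mask-and-assign pattern can be traced value by value on a tiny array. This is a sketch under the assumption of a hypothetical 2x3 float "image" `tiny` with values between 0.0 and 1.0:

```python
import numpy as np

# Hypothetical 2x3 grayscale "image" with float values between 0.0 and 1.0
tiny = np.array([[0.00, 0.50, 0.05],
                 [0.90, 0.02, 0.70]])

dark = tiny < 0.1                  # 2D boolean mask
print(dark.shape == tiny.shape)    # True: one boolean per pixel
print(dark.sum())                  # 3 pixels are below 0.1

tiny[dark] = 1.0                   # assign to the masked positions only
print(tiny.min())                  # 0.5: no dark pixels are left
```

Assigning through a mask modifies the array in place, just like assigning through a slice.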

# Exercises

1. Create a one-dimensional numpy array of length 10 with random integers between -10 and 10.

2. Extract the last three elements of the array using slicing.

3. Create a boolean array telling which values in the array from (1.) are smaller than 0.

4. Recover only those values in a new array via indexing.

In [None]:

### YOUR CODE HERE


5. Load the following image as a numpy array (using `skimage.io.imread()`), check its shape and display it: https://upload.wikimedia.org/wikipedia/en/0/04/Snake_trs-80.jpg

In [None]:

### YOUR CODE HERE


6. Take the right half of the image and save it in a new variable.

7. Squeeze the extracted slice along the vertical axis by skipping every second row, and reverse it at the same time (also vertically). Show the result.

In [None]:

### YOUR CODE HERE


8. Find out the image's data type and the range of pixel values. 

9. Set all pixels with a value larger than a certain threshold to white. Find out yourself what this threshold should be and what "white" means in this context.

In [None]:

### YOUR CODE HERE
