![NumPy logo](img/numpylogo.svg)

# Selecting and modifying data

In the first module we saw a variety of ways to create arrays, with various shapes and dimensions.  We also had a brief introduction to the idea that a reshaped array is a *view* into the same data.

In order to perform meaningful computation on data we need to be able to do two main things:

* Select only a portion of a larger data collection
* Modify values in some systematic way

# Slicing

NumPy provides a flexible indexing notation that lets you select only part of an array.  We saw a very simple version of this notation in the first module, when we used a comma to separate the index for each dimension.  But we can do much more than that by using *slices*.

Let us contrast Python nested lists with NumPy arrays.  There are many differences, but also some superficial similarities.

In [None]:
import numpy as np
from pprint import pprint
lst = [[1, 2, 3, 4],
       [5, 9, 7, 8],
       [9, 0, 1, 2]]
pprint(lst, width=20)

In [None]:
# We might even make an array out of nested lists
arr = np.array(lst)
pprint(arr)

In [None]:
# Could double index into list-of-lists
print("Index twice in list:", lst[1][2])

# SOMETIMES you'll get the same result with array
print("Index twice in array:", arr[1][2])

## Indexing Dimensions

Almost always you want to use comma-separated indices, one for each dimension.  With arrays this always works, and it is faster even where it is equivalent to nested indexing.  The same syntax raises an exception for lists.

In [None]:
# Better
print("Index into dimensions of array:", arr[1, 2])
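The "SOMETIMES" above matters once slices are involved.  The sketch below reuses the same `lst` values and shows one case where nested and comma indexing agree and one where they silently differ, plus the error a list gives for comma indexing.  The variable names are illustrative only.

```python
import numpy as np

lst = [[1, 2, 3, 4],
       [5, 9, 7, 8],
       [9, 0, 1, 2]]
arr = np.array(lst)

# With plain integers, nested and comma indexing agree (both give 7)
same = arr[1][2] == arr[1, 2]

# With a slice they differ: arr[:] is the whole array, so arr[:][2] is ROW 2
row_two = arr[:][2]      # [9, 0, 1, 2]
# ...while arr[:, 2] is COLUMN 2 across all rows
col_two = arr[:, 2]      # [3, 7, 1]

# Lists do not accept a tuple of indices
list_error = None
try:
    lst[1, 2]
except TypeError as err:
    list_error = str(err)

print(same, row_two, col_two)
print("List error:", list_error)
```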

Showing this a bit more visually:

![Dimension indexing](img/numpy-selection.png)

## Indexing Slices

Just as with Python lists, the index for each dimension may be a slice.  The Python syntax `[<start>:<stop>:<step>]` applies here, and each component is optional.  As with Python lists, negative indices count from the end rather than the start, and a negative step walks through the dimension in reverse.

Visualization of selecting a row:

![Row indexing](img/numpy-row.png)

Visualization of selecting a column:

![Col indexing](img/numpy-col.png)
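The row and column selections pictured above, along with negative indices and steps, can be written roughly as follows.  This is a small sketch on a 4x5 `np.arange` array chosen for illustration; the expected values are noted in the comments.

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

row = arr[1, :]               # second row: [5, 6, 7, 8, 9]
col = arr[:, 2]               # third column: [2, 7, 12, 17]
last_row = arr[-1]            # negative index counts from the end; trailing ":" may be omitted
rev_row = arr[0, ::-1]        # negative step walks backward: [4, 3, 2, 1, 0]
every_other = arr[::2, 1::2]  # rows 0 and 2, columns 1 and 3: [[1, 3], [11, 13]]

print(row, col, last_row, rev_row, sep="\n")
print(every_other)
```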

## Indexing Regions

By extension of the above principles, we can also select a *region* of an array.  For example:

In [None]:
# A 4x5 array
arr = np.arange(20).reshape(4, 5)
print("The 4x5 array:")
print(arr)

# Select a middle section of the array
print("\nA 2x2 view of the middle:")
print(arr[1:3, 2:4])
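Like a reshaped array, a region selected with basic slicing is a *view* into the same data rather than a copy.  A minimal way to check this is `np.shares_memory`, and writing through the view shows that the original array changes too.  The name `middle` is illustrative.

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)
middle = arr[1:3, 2:4]                  # basic slicing returns a view

# The view and the original share the same underlying buffer
shares = np.shares_memory(arr, middle)
print("Shares memory:", shares)

# Changing a value through the view changes the original array too
middle[0, 0] = -1
print(arr[1, 2])                        # -1
```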

## Assigning to Regions

In the first module, we saw examples of assigning one value into an array.  We can also assign multiple values into regions.  In some cases, this will involve a concept we will look at later, called *broadcasting*.

In [None]:
arr = np.arange(20).reshape(4, 5)
arr[1:3, 2:4] = np.random.randint(100, 200, size=4).reshape(2, 2)
arr

In [None]:
arr = np.arange(20).reshape(4, 5)
arr[1:3, 2:4] = 999   # Just a scalar?
arr
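Assigning a scalar is the simplest case of broadcasting.  The sketch below previews a couple of other cases, assuming only NumPy's standard broadcasting rules: a 1-D row is repeated down the rows of a region, a single-column array is repeated across its columns, and incompatible shapes raise a `ValueError`.

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

# A length-2 row is broadcast across both rows of the 2x2 region
arr[1:3, 2:4] = np.array([100, 200])

# A 2x1 column is broadcast across both columns of another 2x2 region
arr[1:3, 0:2] = np.array([[-1], [-2]])
print(arr)

# Shapes that cannot be broadcast are rejected; arr is left unchanged
broadcast_failed = False
try:
    arr[1:3, 2:4] = np.array([1, 2, 3])
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```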

## Ellipsis and Open Ranges

Sometimes you would like to select all the index positions in some dimension(s).  Doing this is no different from doing it with plain Python lists, just with the extra commas to separate dimensions.

In [None]:
# Create a 3-D array
arr = np.linspace(10., 47/3, 18).reshape(2, 3, 3)
print(arr)

In [None]:
# One panel, some columns, all rows
arr[1, :, 1:3]

In [None]:
# One panel, all columns, all rows
arr[0, :, :]

In [None]:
# All panels, all rows, one column
arr[:, :, 2]

A special symbol, the Ellipsis `...`, stands in for as many full slices (`:`) as are needed to cover every dimension not indexed explicitly.  In principle this lets you write the same selection without knowing the exact number of dimensions of a given array (but it is only infrequently needed or desirable).

In [None]:
# All leading dimensions, one column
arr[..., 2]

In [None]:
# One from first dimension, all in subsequent dimensions
arr[1, ...]
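To make the "without knowing the number of dimensions" point concrete, here is a small sketch.  It first confirms that the two Ellipsis selections above equal their explicit `:` forms, then defines a hypothetical helper, `last_column`, that works unchanged on 2-D and 3-D arrays.

```python
import numpy as np

arr = np.linspace(10., 47/3, 18).reshape(2, 3, 3)

# `...` expands to however many ":" are needed
assert np.array_equal(arr[..., 2], arr[:, :, 2])
assert np.array_equal(arr[1, ...], arr[1, :, :])

def last_column(a):
    """Hypothetical helper: select the last index of the final axis,
    keeping every leading dimension, whatever a.ndim is (2 or more)."""
    return a[..., -1]

print(last_column(np.arange(6).reshape(2, 3)))         # [2 5]
print(last_column(np.arange(24).reshape(2, 3, 4)).shape)  # (2, 3)
```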

# Exercises

The exercises below can each be done with a provided Python object.  These objects have a few properties.  Simply echoing the object in a cell produces a "pretty" display that may emphasize some aspect of the data of interest.

Positive numbers are used to indicate "interesting" cells for the purpose of the exercise, and negative numbers are used to indicate the "background" data.  Colors further emphasize this.

Each object has an `obj.arr` attribute containing the actual array you should work with.  Each also contains an `obj.result` attribute that contains another array that is some sort of transformation of the original array which you are trying to match.

In [None]:
from src.numpy_exercises import *
ex0

In [None]:
# The array to work with
ex0.arr

In [None]:
# A transformation we are trying to match
ex0.result

**Hint**: To verify your answers, you may want to compare your work to the provided result.  NumPy allows comparison of arrays, but the result is not a single yes-or-no answer; it is an array of Booleans, one per element.  For example:

In [None]:
ex0.arr == ex0.result

Fortunately, there is also an `np.all()` function that asks whether every Boolean in an array of comparisons is true.  There is also `np.any()`, which asks whether at least one of them is true.

In [None]:
print("Any the same:", np.any(ex0.arr == ex0.result))
print("All the same:", np.all(ex0.arr == ex0.result))
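One caution when checking answers this way: `==` broadcasts, so an answer with the wrong shape can still pass `np.all()`.  The sketch below, using small made-up arrays rather than the exercise objects, shows `np.any()` and `np.all()` on an element-wise comparison, and `np.array_equal()`, which also requires the shapes to match.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 5]])

same_cells = a == b                    # element-wise Booleans
print(same_cells)
print("Any:", np.any(same_cells))      # True: at least one cell matches
print("All:", np.all(same_cells))      # False: one cell differs

# Broadcasting pitfall: a (1, 2) array compared against a (2, 2) array
c = np.array([[1, 2], [1, 2]])
print(np.all(c == c[:1]))              # True, despite different shapes
print(np.array_equal(c, c[:1]))        # False: shapes (2, 2) and (1, 2)
```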

## Selection exercises

In each of the next exercises, your goal is to select the highlighted (positive) values.  All results must be 2-dimensional arrays, so be careful not to drop a dimension when you index.
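The usual way a dimension gets lost is by using an integer index where a length-1 slice was needed.  A quick sketch on an illustrative 4x5 array:

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

# An integer index removes that dimension...
row_1d = arr[1, :]
# ...while a length-1 slice keeps it
row_2d = arr[1:2, :]

# The same applies to columns
col_1d = arr[:, 3]
col_2d = arr[:, 3:4]

print(row_1d.shape, row_2d.shape)   # (5,) (1, 5)
print(col_1d.shape, col_2d.shape)   # (4,) (4, 1)
```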

In [None]:
# Select the highlighted values
ex2_1

In [None]:
# Select the highlighted values
ex2_2

In [None]:
# Select the highlighted values
ex2_3

In [None]:
# Select the highlighted values
ex2_4

In [None]:
# Select the highlighted values
# ...make your solution to this problem and the previous problem identical!
ex2_5

## Assignment exercises

The next few exercises build on the selection ones.  You will probably need to solve those problems first.

**Note**: To avoid overwriting the original data, it is a good idea to make an explicit copy of the attached array and work with that.  Making a fresh copy lets you start over whenever you want to experiment.
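Why the explicit copy matters: plain assignment such as `arr = ex2_6.arr` would only bind a second name to the same array, so edits would leak back into the exercise object.  The sketch below uses a stand-in array named `original` in place of an exercise's `.arr` attribute.

```python
import numpy as np

original = np.arange(12).reshape(3, 4)   # stand-in for an exercise's .arr

# Plain assignment binds a second name to the SAME array
alias = original
alias[0, 0] = -1
print(original[0, 0])                    # -1: the original changed

# .copy() allocates new memory, so edits stay local
work = original.copy()
work[:] = 0
print(original[0, 0])                    # still -1
print(np.shares_memory(work, original))  # False
```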

In [None]:
# Make column with index 4 contain the same values as the column with index 2
arr = ex2_6.arr.copy()
arr      # ... modify this somehow

In [None]:
# Make the first row contain the first 5 prime numbers
arr = ex2_7.arr.copy()
print(arr)
ex2_7

In [None]:
# Make every cell of the array contain the value 999
arr = ex2_8.arr.copy()
print(arr)
ex2_8

In [None]:
# Make the top left 2x2 section of the array contain values 100-103
arr = ex2_9.arr.copy()
print(arr)
ex2_9

In [None]:
# Solve the same problem as the previous exercise in a DIFFERENT way
# Make the top left 2x2 section of the array contain values 100-103
arr = ex2_9.arr.copy()
ex2_9

In [None]:
# Make all the positive values be 200-205 rather than 100-105
arr = ex2_10.arr.copy()
print(arr)
ex2_10

In [None]:
# EXTRA credit: solve the problem in the previous exercise in a DIFFERENT way
# Make all the positive values be 200-205 rather than 100-105
arr = ex2_10.arr.copy()
ex2_10

In [None]:
# ADVANCED: solve the problem in the previous exercise in a THIRD way
# Make all the positive values be 200-205 rather than 100-105
arr = ex2_10.arr.copy()
ex2_10