![Erudio logo](img/Erudio-logo.png)

---

![NumPy logo](img/numpylogo.svg)

# Selecting and modifying data

In the first module we saw a variety of ways to create arrays, with various shapes and dimensions.  We also had a brief introduction to the idea that a reshaped array is a *view* into the same data.

In order to perform meaningful computation on data, we need to be able to do two main things:

* Select only a portion of a larger data collection
* Modify values in some systematic way

# Vectorization

NumPy is an *array library*. In computer-programming terms, this means not only that it stores multidimensional collections of elements that share the same datatype, but also that operations on those elements are performed concurrently.†

NumPy achieves this concurrency by performing most operations *elementwise*. That is, each cell or element is modified in the same fashion, with no dependency between the operations on different elements.

†This footnote is a reminder that concurrency is not necessarily parallelism. That said, some NumPy operations (notably linear algebra routines backed by a multithreaded BLAS library) can also take advantage of multiple cores for actual parallelism.
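As a small illustration (not part of the original materials), the loop and the vectorized expression below compute the same values. The vectorized form states one operation that NumPy applies to every element independently, so no Python-level loop is needed. The name `values` is purely illustrative.

```python
import numpy as np

values = np.arange(1.0, 6.0)  # [1., 2., 3., 4., 5.]

# Plain-Python loop: results are computed one element at a time
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] ** 2 + 1

# Vectorized: one expression applied to every element,
# with no dependency between the elements
vectorized = values ** 2 + 1

print(looped)
print(vectorized)
```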


## Universal functions

NumPy uses what it calls *ufuncs* for functions that operate on arrays element-by-element.  Most of these functions also work on scalars directly, but with more overhead than equivalent operations from the Python `math` module where both exist.
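A minimal sketch of the difference between a ufunc and its `math` counterpart: both handle a single float, but only the ufunc accepts an array. This does not measure the overhead mentioned above; it only shows the behavioral difference.

```python
import math
import numpy as np

# Both accept a single Python float and return the same value
print(np.sqrt(2.0), math.sqrt(2.0))

# Only the ufunc accepts an array, applying sqrt to each element
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
print(roots)  # [1. 2. 3.]

# math.sqrt works on single numbers only, so a multi-element array fails
failed = False
try:
    math.sqrt(np.array([1.0, 4.0, 9.0]))
except TypeError:
    failed = True
print("math.sqrt rejected the array:", failed)
```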

In many cases, ufuncs rely on Python's operator overloading (special methods such as `__add__`) so that operator symbols call the appropriate functions behind the scenes.

A simple plus sign can add arrays:

![Sum arrays](img/numpy-sum-arrs.png)

In [None]:
import numpy as np
arr1 = np.linspace(1., 5, 5)
arr2 = np.arange(10., 50.1, 10)
print(arr1)
print(arr2)
arr1 + arr2

In [None]:
# Other spellings of same operation
print("np.add(arr1, arr2):", np.add(arr1, arr2))
print("arr1.__add__(arr2):", arr1.__add__(arr2))

## Types of ufuncs

The comparisons and predicates return *boolean arrays*. Most of the other ufuncs return the same dtype as the original array(s), but sometimes type promotion will occur (usually from an integer to a floating point number). Many of these ufuncs are binary (taking two inputs), but many others are unary (taking one).

**comparison:** `<`, `<=`, `==`, `!=`, `>=`, `>`

**arithmetic:** `+`, `-`, `*`, `/`, `reciprocal`, `square`

**exponential:** `exp`, `expm1`, `exp2`, `log`, `log10`, `log1p`, `log2`, `power`, `sqrt`

**trig:** `sin`, `cos`, `tan`, `arcsin`, `arccos`, `arctan`, `sinh`, `cosh`, `tanh`, `arcsinh`, `arccosh`, `arctanh`

**bitwise:** `&`, `|`, `~`, `^`, `left_shift`, `right_shift`

**logical operations:** `logical_and`, `logical_xor`, `logical_not`, `logical_or`

**predicates:** `isfinite`, `isinf`, `isnan`, `signbit`

**other:** `abs`, `ceil`, `floor`, `mod`, `modf`, `round`, `sinc`, `sign`, `trunc`
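The short sketch below (an illustrative example, with made-up input values) shows a few of these categories and the result dtypes described above: comparisons and predicates give booleans, integer arithmetic stays integer, and `sqrt` promotes integers to floats.

```python
import numpy as np

ints = np.array([1, 4, 9, 16])

# Comparison (binary): returns a boolean array of the same shape
mask = ints > 5
print(mask, mask.dtype)     # [False False  True  True] bool

# Arithmetic (binary) on two integer arrays keeps an integer dtype
doubled = np.add(ints, ints)
print(doubled.dtype)        # platform integer, e.g. int64

# Exponential (unary) with type promotion: integers become floats
roots = np.sqrt(ints)
print(roots, roots.dtype)   # [1. 2. 3. 4.] float64

# Predicate (unary): also returns booleans
nan_flags = np.isnan(np.array([1.0, np.nan]))
print(nan_flags)            # [False  True]
```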

## Functions and methods

Many mathematical operations on arrays are defined as functions in the NumPy module.  A subset of these are also methods on NumPy arrays. For a complete list, see  http://docs.scipy.org/doc/numpy/reference/routines.math.html.

Notice that some functions/methods are elementwise, but others are reductions, which combine many elements into fewer values (for example, a whole array into a single mean).

In [None]:
print("Mean as method/function:", arr1.mean(), np.mean(arr1))
print("Clip as method/function:", arr1.clip(2.5, 4.5), np.clip(arr1, 2.5, 4.5))
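One way to tell the two kinds apart is by the shape of the result. In this sketch (the 2-D array `grid` is illustrative), elementwise operations keep the input shape while reductions collapse the whole array to a single value.

```python
import numpy as np

grid = np.arange(1.0, 7.0).reshape(2, 3)   # 2 rows, 3 columns

# Elementwise: the output has the same shape as the input
print(np.sqrt(grid).shape)    # (2, 3)
print(grid.clip(2, 5).shape)  # (2, 3)

# Reductions: the whole array collapses to a single value
print(grid.mean())            # 3.5
print(grid.sum())             # 21.0
```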

## Elementwise vs. matrix operations

Most operations on NumPy arrays are elementwise, but a few, especially in linear algebra, are overall transformations. The latter typically change the shape of the result and hence produce new arrays rather than modifying elements in place. See: [Linear algebra (numpy.linalg)](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)

In [None]:
# Several ways to spell dot product
print("Dot product function:", np.dot(arr1, arr2))
print("Dot product method:", arr1.dot(arr2))
print("Dot product operator:", arr1 @ arr2)
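On 2-D arrays the distinction matters even more, since `*` and `@` give different answers. A minimal sketch with two small illustrative matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Elementwise product: each cell multiplied by the matching cell
elementwise = A * B   # [[5, 12], [21, 32]]

# Matrix product: rows of A dotted with columns of B
matrix = A @ B        # [[19, 22], [43, 50]]

print(elementwise)
print(matrix)
```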

In [None]:
# More linear algebra operations
np.outer(arr1, arr2)

In [None]:
np.linalg.eigvals(np.outer(arr1, arr2))
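The eigenvalues above follow a pattern from standard linear algebra (not stated in the original text): an outer product $u v^T$ has rank one, so it has a single nonzero eigenvalue equal to the dot product $v \cdot u$. The sketch below checks this numerically; the tolerance `1e-9` for "effectively zero" is an arbitrary choice for these small values.

```python
import numpy as np

u = np.linspace(1., 5, 5)       # same values as arr1 above
v = np.arange(10., 50.1, 10)    # same values as arr2 above
M = np.outer(u, v)

eigs = np.linalg.eigvals(M)

# The largest eigenvalue matches the dot product (up to rounding)
largest = eigs[np.argmax(np.abs(eigs))]
print(np.isclose(largest, np.dot(u, v)))

# All other eigenvalues are zero up to floating-point error
print(np.sum(~np.isclose(eigs, 0, atol=1e-9)), "nonzero eigenvalue(s)")
```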

## Shifting and windows


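We can use negative indices to indicate positions from the end of a Python list or NumPy array. Moreover, slices are *half-open* intervals, which gives them nice additive properties. E.g. for a Python list, `x == x[:N] + x[N:]` (for both positive and negative N). For a NumPy array, `+` adds elementwise rather than concatenating, so rejoining the pieces takes `np.concatenate([x[:N], x[N:]])` instead.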
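A quick sketch of the half-open property, using an arbitrary illustrative list. For lists, `+` concatenates; for arrays, `np.concatenate` plays that role, because `+` would try to add the two pieces elementwise.

```python
import numpy as np

x = [3, 1, 4, 1, 5, 9, 2, 6]

# For Python lists, + concatenates, so the two pieces rebuild x
for N in (3, -3):
    assert x == x[:N] + x[N:]

arr = np.array(x)
# For NumPy arrays, + adds elementwise, so concatenate explicitly
for N in (3, -3):
    assert np.array_equal(arr, np.concatenate([arr[:N], arr[N:]]))

# Negative indices count from the end
print(arr[-1], arr[-3:])   # 6 [9 2 6]
```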

We can use indexing from both ends of an array to create windows that operate on nearby values.

In [None]:
arr = np.random.randint(0, 100, 10)
arr.sort()
arr

In [None]:
# Compute the difference between neighboring values.
s1 = arr[1:]
s2 = arr[:-1]
out = s1 - s2  # each value minus the one before it
print("s1", s1)
print("s2", s2)
print("out", out)

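For this specific operation, there is a handy shortcut: `np.diff(arr, n=1)`. Other adjacency operations (such as averaging neighboring values) may have no dedicated shortcut, in which case you can build them from shifted slices yourself.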

In [None]:
np.diff(arr, n=1)
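For example, a 3-point moving average can be built from three shifted slices of equal length. The sketch below uses a small illustrative array and, as a cross-check, compares the result to `np.convolve` with equal weights, which is a different technique that happens to compute the same thing here.

```python
import numpy as np

data = np.array([2., 4., 6., 8., 10., 12.])

# Each output averages three consecutive values; the slices
# data[:-2], data[1:-1], data[2:] all have length len(data) - 2
window3 = (data[:-2] + data[1:-1] + data[2:]) / 3
print(window3)   # [ 4.  6.  8. 10.]

# Same result via convolution with three equal weights
conv = np.convolve(data, np.ones(3) / 3, mode="valid")
print(np.allclose(window3, conv))
```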

# In-place operations versus copying

We showed earlier that you can always explicitly request a copy of an array with `arr.copy()`. However, most operations also allocate a new array "behind the scenes." Reshaping and basic slicing generally return views rather than copies, but numeric operations allocate a new array for their result.
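`np.shares_memory()` offers one way to check which results are views. In this sketch (array names are illustrative), the reshaped and sliced arrays share the original buffer, while the arithmetic result does not; writing through a view also changes the original.

```python
import numpy as np

base = np.arange(12)

reshaped = base.reshape(3, 4)   # a view: same underlying buffer
sliced = base[2:8]              # basic slicing: also a view
doubled = base * 2              # arithmetic: a new array is allocated

print(np.shares_memory(base, reshaped))  # True
print(np.shares_memory(base, sliced))    # True
print(np.shares_memory(base, doubled))   # False

# Writing through a view changes the original array
sliced[0] = 100
print(base[2])                           # 100
```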

Sometimes copying is desirable, sometimes it is not. For large arrays where you do not need to retain intermediate results, allocating new memory to copy into is unnecessarily expensive (in time and in finite computer memory).

For example, here we do several operations that use three memory allocations: one for `arr1`, one for `arr2`, and a third for `result`.

In [None]:
# allocate initial arrays
arr1 = np.logspace(1, 5, 5, base=np.e)
arr2 = np.arange(100, 10, -np.pi * 6)
print(arr1)
print(arr2)

In [None]:
# Ways of adding into newly allocated array
result = arr1 + arr2
print(result)
result = np.add(arr1, arr2)
print(result)

While we are unlikely to care about extra 5-element arrays, we might not want to allocate extra 100,000,000-element arrays if we do not need to. For example, maybe we just want to update the data in `arr1` in a way that uses the values in `arr2`.

In [None]:
# Ways of modifying arr1 in place
arr1 = np.logspace(1, 5, 5, base=np.e)
arr1 += arr2
print(arr1)

# Augmented assignment is elegant, but many functions do not have operators
arr1 = np.logspace(1, 5, 5, base=np.e)
np.add(arr1, arr2, out=arr1)
print(arr1)
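The `out=` argument also lets several steps be chained without temporaries. A minimal sketch with illustrative arrays: computing `a * 2 + b` entirely inside `a`. It also shows one caveat: by default, ufuncs refuse to write a float result into an integer array in place (NumPy's `same_kind` casting rule), raising a `TypeError` subclass rather than silently truncating.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Compute a * 2 + b with each ufunc writing straight back into a
np.multiply(a, 2, out=a)
np.add(a, b, out=a)
print(a)   # [12. 24. 36.]

# In-place results must be castable to the target dtype
ints = np.arange(3)
refused = False
try:
    ints += 0.5
except TypeError:   # NumPy raises a TypeError subclass here
    refused = True
print("float into int array refused:", refused)
```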

# Exercises

The exercises below can each be done with a provided Python object.  These objects have a few properties.  Simply echoing the object in a cell produces a "pretty" display that may emphasize some aspect of the data of interest.

Positive numbers are used to indicate "interesting" cells for the purpose of the exercise, and negative numbers are used to indicate the "background" data. Colors further emphasize this.

Each object has an `obj.arr` attribute containing the actual array you should work with.  Each also contains an `obj.result` attribute that contains another array that is some sort of transformation of the original array which you are trying to match.

In [None]:
from src.numpy_exercises import *
ex0

In [None]:
# The array to work with
ex0.arr

In [None]:
# A transformation we are trying to match
ex0.result

**Hint**: To verify your answers, you may want to compare your work to the provided result. NumPy allows comparison of arrays, but the result is not a single yes-or-no answer; it is an array of elementwise Booleans. E.g.

In [None]:
ex0.arr == ex0.result

Fortunately, there is also an `np.all()` function that asks whether every Boolean in an array of comparisons is true. There is also `np.any()`, which asks whether at least one of them is true.

In [None]:
print("Any the same:", np.any(ex0.arr == ex0.result))
print("All the same:", np.all(ex0.arr == ex0.result))
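Two related helpers may also be handy when checking answers: `np.array_equal()` (exact equality, including matching shapes) and `np.allclose()` (equality within a floating-point tolerance). The latter may matter for exercises involving functions such as `sin` and `cos`, where rounding can make results differ in the last few bits. The arrays below are illustrative only.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

# Exact comparison: rounding makes 0.1 + 0.2 differ from 0.3
print(np.all(a == b))          # False
print(np.array_equal(a, b))    # False

# Tolerance-based comparison treats them as equal
print(np.allclose(a, b))       # True
```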

## Elementwise Exercises

In each of the next exercises, you will need to transform some or all of the elements of an array in the described fashion.

Transform each of the byte values in this array to contain only its "low-order bits." In other words, make the values "7-bit clean."

In [None]:
arr = ex2_11.arr.copy()
ex2_11

In [None]:
ex2_11.result

In [None]:
# Solve the same problem as the previous exercise using only
# operations of a different type than in your first solution
# (see the section "types of ufuncs" above)
arr = ex2_11.arr.copy()

In [None]:
# Solve the same problem as the previous exercise using only
# operations of a THIRD type, different from those in your first two solutions
# (see the section "types of ufuncs" above)
arr = ex2_11.arr.copy()

---

Each column in `ex2_12` contains angles (in radians) from successive quadrants of a circle. Apply the $\sin$ function to the first two quadrants and the $\cos$ function to the last two quadrants.

In [None]:
arr = ex2_12.arr.copy()
ex2_12

In [None]:
# Solve the same problem as the previous exercise, but transform
# the array in-place rather than allocating another array for the results
# HINT: rebinding a name, as in `arr = np.sin(arr)`, still makes a copy
#       that is garbage collected soon afterward, but exists temporarily
arr = ex2_12.arr.copy()

---

Suppose we have a series of temperatures in a 1-D NumPy array, spanning just under a year.  You wish to calculate the mean weekly high for each of the 52 weeks.

**Note:** In general, we try to avoid looping in NumPy, but you might first try looping over the weeks.

In [None]:
%matplotlib inline
arr = ex2_13.arr.copy()
ex2_13

In [None]:
ex2_13.graph

Now try to solve the same problem as the previous exercise without using any loops.

**Hint:** Reduction operations in NumPy contain an `axis` argument to control their operation over multidimensional arrays.

In [None]:
# Reduce over an axis
arr = ex2_13.arr.copy()

## Vectorization exercise

In these exercises, we look at ways of writing algorithms using vectorization.

### The Wallis formula for pi

One way of calculating $\pi$ is with the 1655 Wallis formula:

$$\pi=2\prod_{i=1}^{\infty}\frac{4i^2}{4i^2-1}$$

In plain Python, we could write this as below.  Write the same algorithm using NumPy and **not using any loops** in your implementation.

In [None]:
# Play around with number of terms
terms = 1000
cumprod = 2.0
for n in range(1, terms+1):
    foursquare = 4 * n**2
    cumprod *= foursquare / (foursquare - 1)
print(cumprod)

In [None]:
# Calculate pi using NumPy and Wallis formula
ex2_14

---

Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors