# SAO/LIP Python Primer Course Lecture 5

In this notebook, you will learn about:
- Universal Functions
- Manipulating Arrays
- Array Operations
- `numpy` I/O

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/acorreia61201/SAOPythonPrimer/blob/main/lectures/Lecture5.ipynb)

In the last lecture, we introduced `numpy` as part of a discussion on data structures. Here we'll continue with the `numpy` library and some of the powerful things you can do with it.

## Universal Functions

Recall that the `math` library provides many common mathematical functions:

In [67]:
import math as m

m.sin(m.pi/2)

1.0

If we want to apply an operation to a list of values, we can't just input the list into the function; we have to iterate over it using a loop:

In [68]:
import numpy as np

test = np.linspace(-m.pi, m.pi, 11)
results = np.empty(11)

for i in range(test.shape[0]):
    results[i] = m.sin(test[i])
    
results

array([-1.22464680e-16, -5.87785252e-01, -9.51056516e-01, -9.51056516e-01,
       -5.87785252e-01,  0.00000000e+00,  5.87785252e-01,  9.51056516e-01,
        9.51056516e-01,  5.87785252e-01,  1.22464680e-16])

`numpy` provides a faster and more concise way to apply an operation to every element of an array: the *universal functions* it supplies. Applied to a single value, these functions return the same result as their `math` counterparts:

In [69]:
print(m.sin(m.pi))
print(np.sin(m.pi))

1.2246467991473532e-16
1.2246467991473532e-16


However, look what happens when we input `test` in `numpy.sin()`:

In [70]:
np.sin(test)

array([-1.22464680e-16, -5.87785252e-01, -9.51056516e-01, -9.51056516e-01,
       -5.87785252e-01,  0.00000000e+00,  5.87785252e-01,  9.51056516e-01,
        9.51056516e-01,  5.87785252e-01,  1.22464680e-16])

It automatically takes the sine of all the values in `test`, with no extra effort on our part. `numpy` has a universal function counterpart for nearly every `math` function that works in the same way:

In [71]:
np.ceil(test)

array([-3., -2., -1., -1., -0.,  0.,  1.,  2.,  2.,  3.,  4.])

In [72]:
np.rad2deg(test) # comparable to math.degrees()

array([-180., -144., -108.,  -72.,  -36.,    0.,   36.,   72.,  108.,
        144.,  180.])

In [73]:
np.exp(test)

array([ 0.04321392,  0.08100259,  0.1518358 ,  0.28460954,  0.53348809,
        1.        ,  1.87445609,  3.51358562,  6.58606196, 12.34528394,
       23.14069263])
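To make the comparison with the loop concrete, here is a minimal sketch that computes the sine of `test` both ways and checks that the results agree. The variable names `looped` and `vectorized` are my own, chosen for illustration.

```python
import math
import numpy as np

test = np.linspace(-math.pi, math.pi, 11)

# Explicit loop: call math.sin() once per element.
looped = np.array([math.sin(value) for value in test])

# Universal function: a single call handles the whole array.
vectorized = np.sin(test)

# The two approaches agree to within floating-point rounding.
print(np.allclose(looped, vectorized))  # True
```

I use `np.allclose()` rather than `==` here because floating-point results should generally be compared with a tolerance.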

`numpy` also provides the fundamental constants found in `math`. These are ordinary Python floats with the same values as their `math` counterparts:

In [74]:
np.pi

3.141592653589793

In [75]:
np.sin(np.pi)

1.2246467991473532e-16

In [76]:
np.e

2.718281828459045

In [77]:
np.log(np.e)

1.0
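As a quick check, the sketch below compares the `numpy` constants against the `math` ones. It also shows why `np.sin(np.pi)` above is tiny but not exactly zero, and how `np.isclose()` can handle that kind of comparison. The variable names are illustrative only.

```python
import math
import numpy as np

# The constants hold exactly the same values in both libraries.
same_pi = (np.pi == math.pi)
same_e = (np.e == math.e)
print(same_pi, same_e)          # True True

# pi cannot be stored exactly, so sin(pi) comes out as a tiny nonzero number.
sin_pi = np.sin(np.pi)
print(sin_pi == 0.0)            # False
print(np.isclose(sin_pi, 0.0))  # True: equal within a small tolerance
```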

Recall my warning about using import statements of the form `from library import function`. Here is the reason. Let's say I imported `exp()` directly from both `numpy` and `math`:

In [78]:
from numpy import exp
from math import exp

Both imports bind the same name, `exp`, so the second import silently replaces the first. If I try to input an array into `exp()`, nothing tells me that I'm using the `math` version until I get an error:

In [79]:
exp(test)

TypeError: only size-1 arrays can be converted to Python scalars

There are some instances when it's convenient to import functions directly from packages. In longer programs, however, it can get hard to track down where each function was imported from. Use statements like this sparingly when writing your own code.
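Here is a short sketch of the name clash and of the namespaced alternative. The array `values` is an example of my own, not something defined earlier in the lecture.

```python
import math
import numpy as np

from numpy import exp
from math import exp   # silently replaces the numpy version imported above

# The bare name now refers to the math function, because it was imported last.
print(exp is math.exp)   # True

values = np.array([0.0, 1.0, 2.0])

# With the library prefix, it is always clear which function is called.
print(math.exp(1.0))     # scalar version
print(np.exp(values))    # array version
```

Keeping the `math.` and `np.` prefixes costs a few characters but removes any doubt about which function runs.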

## Manipulating `numpy` Arrays

Now that we know how to generate arrays, let's look into how we can modify them. Let's start with two arrays:

In [80]:
x = np.array([1, 4, 6, 3, 5, 2])
y = np.array([[7, 8]]) # The double bracket makes this a 2D array

First, let's order the elements of `x` by using the `numpy.sort()` function:

In [81]:
np.sort(x)

array([1, 2, 3, 4, 5, 6])

Let's print `x` to check:

In [82]:
x

array([1, 4, 6, 3, 5, 2])

What happened? `numpy.sort()` returns a sorted copy and leaves the original array untouched, as do most `numpy` functions that manipulate arrays. If we want to overwrite the array with its sorted counterpart, we have to reassign `x` to the function's output:

In [83]:
x = np.sort(x)
x

array([1, 2, 3, 4, 5, 6])
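For comparison, arrays also have a `sort()` method, which does modify the array in place. The sketch below contrasts the two; the names `sorted_copy` and `before` are only for illustration.

```python
import numpy as np

x = np.array([1, 4, 6, 3, 5, 2])

sorted_copy = np.sort(x)   # the function returns a new, sorted array
before = x.tolist()        # x itself is still in its original order
print(before)              # [1, 4, 6, 3, 5, 2]

x.sort()                   # the method sorts x in place and returns None
print(x.tolist())          # [1, 2, 3, 4, 5, 6]
```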

We can also change the shape of an existing array using the `reshape()` method. Its input is a comma-separated list representing the shape you want the output to be. `numpy` will automatically distribute the elements to fit the new shape:

In [84]:
x.reshape(2, 3)

array([[1, 2, 3],
       [4, 5, 6]])

In [85]:
x.reshape(6, 1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])

In [86]:
x.reshape(3, 1, 2)

array([[[1, 2]],

       [[3, 4]],

       [[5, 6]]])

The shape passed to `reshape()` must hold exactly as many elements as the original array. That is, the product of the new dimensions must equal `x.size`:

In [87]:
x.reshape(4, 2)

ValueError: cannot reshape array of size 6 into shape (4,2)
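The size rule can be checked before calling `reshape()`. In the sketch below, `can_reshape()` is a hypothetical helper I wrote for this example, not a `numpy` function. The last lines use `-1` as one dimension, which asks `numpy` to work out that dimension from the array's size.

```python
import numpy as np

x = np.arange(1, 7)   # six elements: 1 through 6

def can_reshape(array, shape):
    """Hypothetical helper: True if `shape` holds exactly `array.size` elements."""
    return array.size == int(np.prod(shape))

print(can_reshape(x, (2, 3)))   # True:  2 * 3 == 6
print(can_reshape(x, (4, 2)))   # False: 4 * 2 != 6

# Passing -1 for one dimension lets numpy infer it from the total size.
inferred = x.reshape(-1, 2)
print(inferred.shape)           # (3, 2)
```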

Let's use `x.reshape(3, 2)` so `x` and `y` will have the same number of columns:

In [88]:
x = x.reshape(3,2)
x

array([[1, 2],
       [3, 4],
       [5, 6]])

Some common manipulations have their own dedicated methods. The method `flatten()` reshapes the array to one dimension:

In [89]:
x.flatten()

array([1, 2, 3, 4, 5, 6])

The method `transpose()` takes the *transpose* of an array. In two dimensions, this equates to $x[a, b] \rightarrow x[b, a]$ for all possible indices $a, b$.

In [90]:
x.transpose()

array([[1, 3, 5],
       [2, 4, 6]])

We can also use the attribute `T` as a shorthand for the above statement:

In [91]:
x.T

array([[1, 3, 5],
       [2, 4, 6]])
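The index swap in the definition of the transpose can be verified directly. This is a small sketch using the same values as `x` above; `xt` and `matches` are illustrative names.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
xt = x.T                                  # shape (2, 3)

# Check x[a, b] == xt[b, a] for every valid pair of indices.
matches = all(
    x[a, b] == xt[b, a]
    for a in range(x.shape[0])
    for b in range(x.shape[1])
)
print(xt.shape)   # (2, 3)
print(matches)    # True
```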

We can combine two existing arrays using `numpy.concatenate()`. The only required input is a tuple containing the arrays you wish to combine. There's also a parameter `axis` that controls which axis to combine the arrays along. It defaults to 0, the first axis (the rows of a 2D array). For example, using `axis=0` to combine `x` and `y` will yield the following:

In [57]:
np.concatenate((x, y))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

There are some constraints to which arrays we can concatenate. All of the input arrays must have the same `ndim` value. Notice that when I defined `y`, I used a double bracket to denote it as a 2D array, even though it looks like a 1D array since it has one row. Let's see what happens if I redefine `y` as a 1D array:

In [58]:
y_eg = np.array([7, 8])
np.concatenate((x, y_eg))

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Fortunately, if you didn't initially define all your arrays to have the same number of dimensions, you can just use `reshape()`:

In [61]:
y_eg = y_eg.reshape(1, 2)
y_eg

array([[7, 8]])

In [62]:
np.concatenate((x, y_eg))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

What will happen if I try to change the `axis` argument?

In [63]:
np.concatenate((x, y), axis=1)

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 3 and the array at index 1 has size 1

This is another constraint: the sizes of the arrays along all axes except the axis of concatenation must be equal. That is, if we want to concatenate `x` and `y` along axis 1, then both arrays must have the same size along axis 0. In this case, as the error above says, `x` has an axis 0 size of 3 and `y` has an axis 0 size of 1. If we want to add a new column to `x` as-is, we have to define a new array whose axis 0 has the correct size:

In [65]:
z = np.array([9, 10, 11])
z = z.reshape(3, 1)
z

array([[ 9],
       [10],
       [11]])

In [69]:
comb = np.concatenate((x, z), axis=1)
comb

array([[ 1,  2,  9],
       [ 3,  4, 10],
       [ 5,  6, 11]])
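Looking at the `shape` of each array makes both constraints easier to see. The sketch below rebuilds `x`, `y`, and `z` with the same values as above and records the shape of each result.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
y = np.array([[7, 8]])                    # shape (1, 2)
z = np.array([[9], [10], [11]])           # shape (3, 1)

# Along axis 0 the arrays must agree on axis 1 (both have 2 columns).
rows = np.concatenate((x, y), axis=0)
print(rows.shape)   # (4, 2)

# Along axis 1 the arrays must agree on axis 0 (both have 3 rows).
cols = np.concatenate((x, z), axis=1)
print(cols.shape)   # (3, 3)
```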

If we want to remove elements from an array, we can use `numpy.delete()`. The syntax is as follows:

In [70]:
np.delete(comb, 1, axis=0) # delete the second row

array([[ 1,  2,  9],
       [ 5,  6, 11]])

In [71]:
np.delete(comb, 2, axis=1) # delete the last column

array([[1, 2],
       [3, 4],
       [5, 6]])

The first argument is the array you wish to modify. The second argument is the index or indices you want to remove. The keyword argument `axis` again specifies on which axis you wish to apply the deletion. 

Note that slicing also works on the second argument, albeit with a slightly different syntax. We need to use the `slice()` built-in function, which has the same input syntax as `range()`:

In [74]:
np.delete(comb, slice(1, 3), axis=1) # delete the last two columns

array([[1],
       [3],
       [5]])

In [75]:
np.delete(comb, slice(0, 3, 2), axis=0) # delete the first and the last row

array([[ 3,  4, 10]])
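Since the second argument can hold several indices, a list of indices should give the same result as the equivalent `slice()`. The sketch below checks this and also confirms that `numpy.delete()` returns a new array rather than changing `comb`. The names `by_slice` and `by_list` are illustrative.

```python
import numpy as np

comb = np.array([[1, 2, 9], [3, 4, 10], [5, 6, 11]])

# Two ways of removing columns 1 and 2.
by_slice = np.delete(comb, slice(1, 3), axis=1)
by_list = np.delete(comb, [1, 2], axis=1)
print(np.array_equal(by_slice, by_list))   # True

# The original array still has all three columns.
print(comb.shape)                           # (3, 3)
```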

## Array Operations

One powerful use of arrays is to do linear algebra in Python. You can apply many common matrix operations to `numpy` arrays. The simplest of these are element-wise operations, which we can do with the basic arithmetic operators `+, -, *, /`.

In [17]:
vec_1 = np.array([1, 2, 3])
vec_2 = np.copy(vec_1) # numpy.copy() is used to copy arrays similar to built-in list()
ones = np.ones(3)

In [18]:
vec_1 + ones

array([2., 3., 4.])

In [19]:
vec_1 - ones

array([0., 1., 2.])

In [20]:
vec_1 * vec_2

array([1, 4, 9])

In [21]:
ones / vec_1

array([1.        , 0.5       , 0.33333333])

`numpy` also supports *broadcasting* operations with scalars. That is, I can operate on an array with a scalar, and the operation will be applied to all elements in the array.

In [92]:
vec_1 + 2.4

array([3.4, 4.4, 5.4])

In [93]:
vec_1 * np.pi

array([3.14159265, 6.28318531, 9.42477796])
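One way to picture scalar broadcasting is that the scalar behaves as if it were stretched into an array of the same shape. The sketch below builds that stretched array explicitly with `np.full()` and checks that both routes give the same answer; `stretched` is an illustrative name.

```python
import numpy as np

vec_1 = np.array([1, 2, 3])

# Broadcasting: the scalar is applied to every element.
broadcast = vec_1 + 2.4

# Equivalent explicit version: an array filled with the scalar.
stretched = np.full(vec_1.shape, 2.4)
explicit = vec_1 + stretched

print(np.allclose(broadcast, explicit))   # True
```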

Element-wise operations and broadcasting are useful in their own right, but as you probably know, there are many matrix operations that don't behave like this. `numpy` has built-ins for the most common vector and matrix operations. For example, we can take the *dot product* of two arrays, recalling that for two vectors $u$ and $v$ with $N$ elements each:

\begin{equation}
u \cdot v = \sum_{i=0}^{N-1} u_i v_i
\end{equation}

We can implement this with the function `numpy.dot()`:

In [22]:
np.dot(vec_1, vec_2)

14
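To connect the function to the formula, here is a sketch that computes the same dot product three ways: with an explicit sum over indices, with an element-wise product followed by a sum, and with `numpy.dot()`. The names `u`, `v`, `manual`, `elementwise`, and `builtin` are my own.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([1, 2, 3])

# Direct translation of the sum in the formula.
manual = sum(u[i] * v[i] for i in range(len(u)))

# Element-wise product, then add up the results.
elementwise = np.sum(u * v)

# The built-in function.
builtin = np.dot(u, v)

print(manual, elementwise, builtin)   # 14 14 14
```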

We can also take a *cross product* of two arrays using `numpy.cross()`:

In [24]:
np.cross(vec_1, ones)

array([-1.,  2., -1.])

Notice we've only dealt with 1D arrays, which stand in for vectors. We can do full matrix multiplication as well. If we define a matrix (i.e. a 2D array), we can use the above functions just as well as with vectors. `numpy.dot()` will return the result of regular matrix multiplication:

In [44]:
mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mat2 = np.array([[7, 8, 9], [4, 5, 6], [1, 2, 3]])

In [45]:
np.dot(mat1, mat2)

array([[ 18,  24,  30],
       [ 54,  69,  84],
       [ 90, 114, 138]])

This works for matrix multiplication, but for arrays with more than one dimension it's generally better to use the `numpy.matmul()` function:

In [48]:
np.matmul(mat1, mat2)

array([[ 18,  24,  30],
       [ 54,  69,  84],
       [ 90, 114, 138]])
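Python also has the `@` operator, which for arrays performs the same operation as `numpy.matmul()`. The sketch below checks that, and contrasts it with `*`, which multiplies element by element. The variable names are illustrative.

```python
import numpy as np

mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mat2 = np.array([[7, 8, 9], [4, 5, 6], [1, 2, 3]])

product = mat1 @ mat2        # matrix multiplication
elementwise = mat1 * mat2    # element-wise multiplication

print(np.array_equal(product, np.matmul(mat1, mat2)))   # True
print(product[0, 0])       # 18 = 1*7 + 2*4 + 3*1
print(elementwise[0, 0])   # 7  = 1*7
```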

`numpy.cross()` has a special behavior. It treats each 2D array as a "list" of 1D arrays and calculates the cross product across each row of the matrices:

In [46]:
np.cross(mat1, mat2)

array([[ -6,  12,  -6],
       [  0,   0,   0],
       [  6, -12,   6]])

We can use the keyword arguments `axisa` and `axisb` to control along which axis of each array the vectors are taken:

In [47]:
np.cross(mat1, mat2, axisa=0, axisb=0)

array([[-24,  48, -24],
       [-30,  60, -30],
       [-36,  72, -36]])
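The default row-by-row behavior can be reproduced by looping over matching rows and taking one cross product per pair. This is only a sketch to confirm the interpretation; `rowwise` and `perpendicular` are illustrative names.

```python
import numpy as np

mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mat2 = np.array([[7, 8, 9], [4, 5, 6], [1, 2, 3]])

# One cross product per pair of matching rows.
rowwise = np.array([np.cross(a, b) for a, b in zip(mat1, mat2)])
print(np.array_equal(rowwise, np.cross(mat1, mat2)))   # True

# Each result is perpendicular to the row of mat1 it came from.
perpendicular = all(np.dot(c, a) == 0 for c, a in zip(rowwise, mat1))
print(perpendicular)                                    # True
```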

In addition to these operations, `numpy` has a multitude of *aggregation functions*, which take in an array and output a single value. Two examples of this are `numpy.max()` and `numpy.min()`, which respectively return the maximum or minimum value of an array:

In [49]:
np.max(mat1)

9

In [51]:
np.min(mat2)

1

These aggregation functions accept an optional `axis` keyword argument that aggregates along one axis instead of over the whole array:

In [52]:
np.max(mat1, axis=0)

array([7, 8, 9])

In [54]:
np.min(mat1, axis=1)

array([1, 4, 7])
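It can help to think of `axis` as the axis that gets collapsed. In the sketch below, which uses `numpy.sum()` as an example, `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns and leaves one value per row.

```python
import numpy as np

mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = np.sum(mat1, axis=0)   # one value per column: 12, 15, 18
row_sums = np.sum(mat1, axis=1)   # one value per row:    6, 15, 24

print(col_sums.shape, row_sums.shape)   # (3,) (3,)
print(col_sums.tolist())                # [12, 15, 18]
print(row_sums.tolist())                # [6, 15, 24]
```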

Here are some examples of other aggregation functions:

In [55]:
np.sum(mat1) # sum up all elements

45

In [57]:
np.prod(mat1) # multiply all elements together

362880

In [60]:
np.prod(mat1, axis=0) # to check this, multiply down each column separately

array([ 28,  80, 162])

In [61]:
np.mean(mat1) # take the arithmetic mean of all elements

5.0

In [62]:
np.std(mat1) # calculate the standard deviation of all elements

2.581988897471611
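The value above is the *population* standard deviation, i.e. the squared deviations are divided by the number of elements. The sketch below reproduces it step by step. The last line uses the `ddof` keyword of `numpy.std()`, which I'm bringing in here for comparison, to divide by one less than the number of elements instead.

```python
import numpy as np

mat1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

mean = mat1.sum() / mat1.size                       # arithmetic mean
variance = ((mat1 - mean) ** 2).sum() / mat1.size   # mean squared deviation
manual_std = np.sqrt(variance)

print(np.isclose(mean, np.mean(mat1)))        # True
print(np.isclose(manual_std, np.std(mat1)))   # True

# Sample standard deviation: divide by (size - 1) instead of size.
sample_std = np.std(mat1, ddof=1)
```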

If you'd prefer, aggregation functions are also available as array methods:

In [63]:
mat1.max() # identical to np.max(mat1)

9

## I/O with `numpy`

One important aspect of programming is *I/O*, short for *input/output*. As the name suggests, it deals with reading and writing files external to your program, for example to save results long-term. This matters most for longer programs that produce large amounts of data. I've dedicated an entire lecture to I/O in Python using base functions as well as another library, `pandas`, but to finish our discussion of `numpy` (at least, the most essential bits) I'll introduce functions for saving and reading arrays.

Let's generate an array below:

In [94]:
a = np.array([1, 2, 3, 4, 5, 6])

If I want to save this array to an external file, I can use the function `numpy.save()`, with inputs `(filename, array)`.

In [95]:
np.save('example', a)

This command created a new file, `example.npy`, in your current working directory (i.e. the directory from which you're currently running this notebook). If we want to load this array into a different variable, we can simply use `numpy.load()`, with the only required argument being the name of the file we're loading from:

In [97]:
b = np.load('example.npy')
b

array([1, 2, 3, 4, 5, 6])
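Here is a self-contained sketch of the same round trip. To avoid leaving files behind, it writes to a temporary directory; that choice, along with the names `tmp_dir`, `path`, and `saved_files`, is mine and not part of the example above.

```python
import os
import tempfile
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example')   # no extension given
    np.save(path, a)                          # numpy appends '.npy'
    saved_files = os.listdir(tmp_dir)
    b = np.load(path + '.npy')

print(saved_files)            # ['example.npy']
print(np.array_equal(a, b))   # True
print(b.dtype == a.dtype)     # True: the data type is preserved
```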

That's the basics of it; as with most `numpy` functions, there are additional keyword arguments that give you more control over how they work. Two further functions, `numpy.savetxt()` and `numpy.loadtxt()`, write and read arrays as plain text instead of the binary `npy` format (e.g. if we wanted to generate `txt` or `csv` files). For now, I'll leave the discussion on file types for when we discuss general I/O.
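As a small preview, here is a sketch of a text round trip with `numpy.savetxt()` and `numpy.loadtxt()`. The array `table`, the file name `example.csv`, and the use of a temporary directory are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.csv')
    np.savetxt(path, table, delimiter=',')       # one text line per row
    with open(path) as handle:
        n_lines = len(handle.read().splitlines())
    restored = np.loadtxt(path, delimiter=',')   # values come back as floats

print(n_lines)                        # 2
print(np.allclose(table, restored))   # True
print(restored.dtype)                 # float64
```

Note that the values come back as floats even though `table` holds integers, since a plain text file does not store the array's data type.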