# NumPy and Matplotlib

This lecture introduces NumPy and Matplotlib, two of the most fundamental parts of the scientific Python "ecosystem".
Almost everything else is built on top of them.

<img src="https://github.com/numpy/numpy/blob/main/branding/logo/primary/numpylogo.png?raw=true" width="200px" />

**Numpy**: _The fundamental package for scientific computing with Python_

- Website: <https://numpy.org/>
- GitHub: <https://github.com/numpy/numpy>

<img src="https://matplotlib.org/_static/logo2_compressed.svg" width="300px" />

**Matplotlib**: _Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python._

- Website: <https://matplotlib.org/>
- GitHub: <https://github.com/matplotlib/matplotlib>
 


# Numpy features

* Mathematical functions
  - trigonometry
  - statistics
  - random numbers
  - linear algebra
* Multidimensional arrays
* Masked arrays (for missing data)

There is far too much to cover in one lecture. For the full list, see the Numpy reference documentation:

https://numpy.org/doc/stable/reference/

In [None]:
import numpy as np

## NDArrays

The core class is the numpy ndarray (n-dimensional array).

Comparing ndarrays and lists:
- Arrays hold many values of the same type (e.g. `int`, `float`), while lists can contain anything
- Arrays can have N dimensions (e.g. a grid over x, y, z coordinates), while lists and tuples have only one.
- Numpy optimizes numerical operations on arrays. Numpy is _fast!_
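
To make the comparison concrete, here is a small sketch. The names `values_list`, `values_array`, and `mixed` are illustrative and are not used elsewhere in this lecture. It shows that the same operator behaves differently on a list and on an array, and that an array settles on a single dtype for all of its elements.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# Multiplying a list repeats it; multiplying an array scales each element.
doubled_list = values_list * 2
doubled_array = values_array * 2

# Mixing ints and floats upcasts every element to one common dtype.
mixed = np.array([1, 2.5, 3])

print(doubled_list)
print(doubled_array)
print(mixed.dtype)
```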

In [None]:
# create an array from a list
a = np.array([9,0,2,1,0])

print(type(a))
print(a)

In [None]:
# find the datatype of each element
a.dtype

In [None]:
# find the shape
a.shape

In [None]:
# another array with a different datatype and shape
b = np.array([[1., 2, 3],
              [4, 5, 6]])

# check dtype and shape
print(b.dtype)
print(b.shape)

In [None]:
# Math with arrays
2 * a + 1

In [None]:
# Square every element of an array
a**2

In [None]:
x = np.array([[0,1],
              [1,0]])
y = np.array([[2,0],
              [1,1]])
# Elementwise multiplication
x * y

In [None]:
# Matrix multiplication
x @ y
# Equivalent: np.matmul(x,y)
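
The difference between `*` and `@` is easy to miss, so here is a hedged side-by-side sketch using the same values as above. The names `left` and `right` are illustrative, chosen so the lecture's `x` and `y` are not overwritten.

```python
import numpy as np

left = np.array([[0, 1],
                 [1, 0]])
right = np.array([[2, 0],
                  [1, 1]])

# Elementwise: each entry is left[i, j] * right[i, j].
elementwise = left * right

# Matrix product: each entry is a row of `left` dotted with a column of `right`.
matrix_product = left @ right

print(elementwise)
print(matrix_product)
```

Because `left` is a permutation matrix, the matrix product simply swaps the two rows of `right`, while the elementwise product keeps only the entries where `left` is 1.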

In [None]:
# Apply functions to each element of an array
np.exp(x)
np.sin(y)

## Array Creation

There are lots of ways to create arrays.

In [None]:
# create some uniform arrays
c = np.zeros((9,9))
d = np.ones((3,6,3))
e = np.full((3,3), np.pi)
e = np.ones_like(c)  # same shape as c, but all ones
f = np.zeros_like(d) # same shape as d, but all zeros

`arange` works very similarly to `range`, but it populates the array "eagerly" (i.e. immediately), rather than generating the values upon iteration.

In [None]:
np.arange(10)

`arange` is left inclusive, right exclusive, just like `range`, but also works with floating-point numbers.

In [None]:
np.arange(2,4,0.25)

A frequent need is to generate an array of N numbers, evenly spaced between two values. That is what `linspace` is for.

In [None]:
np.linspace(2,4,20)
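
The two functions differ in what you specify: `arange` takes a step and excludes the right endpoint, while `linspace` takes a number of points and includes it by default. The sketch below uses illustrative names (`by_step`, `by_count`) and the `retstep=True` option of `linspace` to report the spacing it chose.

```python
import numpy as np

# Step of 0.25, right endpoint (4) excluded.
by_step = np.arange(2, 4, 0.25)

# Nine points, right endpoint (4) included; retstep also returns the spacing.
by_count, step = np.linspace(2, 4, 9, retstep=True)

print(by_step.size, by_step[-1])
print(by_count.size, by_count[-1], step)
```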

In [None]:
# log spaced
np.logspace(1,2,10)
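
The arguments to `logspace` are exponents, not endpoints: with the default base of 10, `np.logspace(1, 2, 10)` runs from 10 to 100. A minimal sketch of that relationship, with illustrative variable names:

```python
import numpy as np

log_points = np.logspace(1, 2, 10)
exponents = np.linspace(1, 2, 10)

# logspace is equivalent to raising the base (10 by default) to evenly spaced exponents.
same = np.allclose(log_points, 10.0 ** exponents)

print(log_points[0], log_points[-1])
print(same)
```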

### Exercises

Use `np.array` to create an array with the following values:
```1000, 950, 800, 700, 500, 300```

Use `np.arange` to create an array with the following values:
```2, 2.5, 3, 3.5, 4, 4.5, 5```

Use `np.linspace` to create an array with the same values

Create an array with 25 elements covering the range 5 to 10. Will you use `np.arange` or `np.linspace`?

In [None]:
# Write your code here

In [None]:
x = np.arange(10)[:,np.newaxis]
y = np.ones((1,5))
x

Numpy has functions to help build multi-dimensional arrays.
`meshgrid` creates 2D arrays out of a combination of 1D arrays.

In [None]:
x = np.linspace(-180, 180, 200)
y = np.linspace(-90,  90,  100)
xx, yy = np.meshgrid(x, y)
xx.shape, yy.shape
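
With 200 by 100 points it is hard to see what `meshgrid` actually produced. The sketch below repeats the idea on a tiny grid; the names with the `_small` suffix are illustrative so that the lecture's `x`, `y`, `xx`, and `yy` stay intact.

```python
import numpy as np

x_small = np.array([0, 1, 2])
y_small = np.array([10, 20])

# Output shape is (len(y_small), len(x_small)).
xx_small, yy_small = np.meshgrid(x_small, y_small)

print(xx_small)  # x values repeated along each row
print(yy_small)  # y values repeated down each column
```

Every position `[j, i]` in the pair of outputs holds the coordinates `(x_small[i], y_small[j])`.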

## Indexing

Basic indexing is similar to indexing lists.

In [None]:
# get some individual elements of xx
xx[0,0], xx[-1,-1], xx[3,-5]

In [None]:
# get some whole rows and columns
xx[0].shape, xx[:,-1].shape

In [None]:
# get some ranges
xx[3:10,40:70].shape

There are many advanced ways to index arrays. You can [read about them](https://numpy.org/doc/stable/reference/arrays.indexing.html) in the manual. Here is one example.

In [None]:
# use a boolean array as an index
idx = xx<0
yy[idx].shape

In [None]:
# the array got flattened
xx.ravel().shape
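
Boolean indexing returns a 1D array because the selected elements generally do not form a rectangle. Here is a small sketch with an illustrative array `arr`; it also shows `np.where`, which is one way to keep the original shape when that matters.

```python
import numpy as np

arr = np.array([[-1, 2],
                [3, -4]])
mask = arr < 0

# Selecting with a boolean mask flattens the result to 1D.
negatives = arr[mask]

# np.where keeps the 2D shape, replacing masked entries instead of dropping them.
clipped = np.where(mask, 0, arr)

print(negatives, negatives.shape)
print(clipped)
```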

## Visualizing Arrays with Matplotlib

It can be hard to work with big arrays without actually seeing anything with our eyes!
We will now bring in Matplotlib to start visualizing these arrays.
For now we will just skim the surface of Matplotlib.
Much more depth will be provided in the next chapter.

In [None]:
import matplotlib.pyplot as plt

For plotting a 1D array as a line, we use the `plot` command.

In [None]:
plt.plot(x)

There are many ways to visualize 2D data.
Here we use `pcolormesh`.

In [None]:
plt.pcolormesh(xx)

In [None]:
plt.pcolormesh(yy)

## Array Operations ##

There are a huge number of operations available on arrays. All the familiar arithmetic operators are applied on an element-by-element basis.

### Basic Math

In [None]:
f = np.sin(xx*np.pi/180) * np.cos(0.5*yy*np.pi/180)

In [None]:
plt.pcolormesh(f)

## Manipulating array dimensions ##

Swapping the dimension order is accomplished by calling `transpose`.

In [None]:
f_transposed = f.transpose()
plt.pcolormesh(f_transposed)

We can also manually change the shape of an array...as long as the new shape has the same number of elements.

In [None]:
# this raises a ValueError: f has 100 * 200 = 20000 elements, but shape (8,9) holds only 72
g = np.reshape(f, (8,9))
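
The element-count rule can be checked on a small array. This is a sketch with illustrative names (`data`, `grid`, `auto`); it also uses `-1`, which asks Numpy to infer one dimension from the total size.

```python
import numpy as np

data = np.arange(12)

# 3 * 4 == 12, so this works.
grid = data.reshape(3, 4)

# -1 lets Numpy work out the missing dimension (here 6).
auto = data.reshape(2, -1)

# 5 * 5 == 25 != 12, so this raises ValueError.
try:
    data.reshape(5, 5)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

print(grid.shape, auto.shape)
```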

However, be careful with reshaping data!
You can accidentally lose the structure of the data.

In [None]:
g = np.reshape(f, (800,25))
plt.pcolormesh(g)

We can also "tile" an array to repeat it many times.

In [None]:
f_tiled = np.tile(f,(3, 2))
plt.pcolormesh(f_tiled)
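
The second argument to `np.tile` gives the number of repetitions along each axis. A tiny sketch with an illustrative array `block` makes the resulting layout visible:

```python
import numpy as np

block = np.array([[1, 2],
                  [3, 4]])

# Repeat 3 times along axis 0 (rows) and 2 times along axis 1 (columns).
tiled = np.tile(block, (3, 2))

print(tiled.shape)
print(tiled)
```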

Another common need is to add an extra dimension to an array.
This can be accomplished via indexing with `np.newaxis`.

In [None]:
x.shape

In [None]:
x[np.newaxis, :].shape

In [None]:
x[np.newaxis, :, np.newaxis, np.newaxis].shape
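
There are a few equivalent ways to add a length-1 dimension. The sketch below compares `np.newaxis` with `np.expand_dims` and `reshape` on an illustrative 1D array `v`; which one to use is mostly a matter of readability.

```python
import numpy as np

v = np.arange(4)

# New axis in front: shape (1, 4).
row = v[np.newaxis, :]
row_alt = np.expand_dims(v, axis=0)

# New axis at the end: shape (4, 1).
col = v[:, np.newaxis]
col_alt = v.reshape(-1, 1)

print(row.shape, row_alt.shape)
print(col.shape, col_alt.shape)
```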

## Broadcasting


Not all the arrays we want to work with will have the same size.
One approach would be to manually "expand" our arrays to all be the same size, e.g. using `tile`.
_Broadcasting_ is a more efficient way to multiply arrays of different sizes.
Numpy has specific rules for how broadcasting works.
These can be confusing but are worth learning if you plan to work with Numpy data a lot.

The core concept of broadcasting is telling Numpy which dimensions are supposed to line up with each other.

Example 1: Array + Scalar

<img src="https://numpy.org/doc/stable/_images/broadcasting_1.png" style="background-color:white;">

Example 2: Array + Array

<img src='http://scipy-lectures.github.io/_images/numpy_broadcasting.png'
     width=720 />

[Numpy broadcasting documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html)

Dimensions are automatically aligned _starting with the rightmost (last) dimension_.
If dimensions have the same length or one of them is length 1, then the two arrays can be broadcast.
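
One way to check this rule without allocating any data is `np.broadcast_shapes`, which takes shapes and returns the broadcast result. This sketch assumes NumPy 1.20 or newer, where that function is available, and uses shapes matching the arrays in this lecture.

```python
import numpy as np

# (100, 200) with (200,): the trailing dimensions match.
print(np.broadcast_shapes((100, 200), (200,)))

# (100, 200) with (100, 1): the length-1 dimension stretches to 200.
print(np.broadcast_shapes((100, 200), (100, 1)))

# (100, 200) with (100,): trailing dimensions 200 and 100 differ and neither is 1.
try:
    np.broadcast_shapes((100, 200), (100,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```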

In [None]:
print(f.shape, x.shape)
g = f * x
print(g.shape)

In [None]:
plt.pcolormesh(g)

However, if the last dimensions of the two arrays are _not_ the same (and neither is 1), Numpy cannot just automatically figure it out.

In [None]:
# multiply f by y
print(f.shape, y.shape)
h = f * y

We can help Numpy by adding an extra dimension to `y` at the end.
Then the length-100 dimensions will line up.

In [None]:
print(f.shape, y[:, np.newaxis].shape)
h = f * y[:, np.newaxis]
print(h.shape)

In [None]:
plt.pcolormesh(h)

## Reduction Operations

In scientific data analysis, we usually start with a lot of data and want to reduce it down in order to make plots or summary tables.
Operations that reduce the size of numpy arrays are called "reductions".
There are many different reduction operations. Here we will look at some of the most common ones.

In [None]:
# sum
g.sum()

In [None]:
# mean
g.mean()

In [None]:
# standard deviation
g.std()

A key property of numpy reductions is the ability to operate on just one axis.

In [None]:
# apply on just one axis
g_ymean = g.mean(axis=0)
g_xmean = g.mean(axis=1)
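
The `axis` argument names the dimension that gets collapsed, which is easier to see on a small array. In this sketch the array `table` is illustrative, and `keepdims=True` is shown as an optional way to keep the reduced axis with length 1 so the result still broadcasts against the original.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 collapses the rows, leaving one value per column.
col_means = table.mean(axis=0)

# axis=1 collapses the columns, leaving one value per row.
row_means = table.mean(axis=1)

# keepdims=True keeps the collapsed axis with length 1: shape (2, 1).
row_means_2d = table.mean(axis=1, keepdims=True)
anomalies = table - row_means_2d

print(col_means, row_means)
print(anomalies)
```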

In [None]:
plt.plot(x, g_ymean)

In [None]:
plt.plot(g_xmean, y)

## Data Files

It can be useful to save numpy data into files.

In [None]:
np.save('g.npy', g)

```{warning}
Numpy `.npy` files are a convenient way to store temporary data, but they are not considered a robust archival format.
Later we will learn about NetCDF, the recommended way to store earth and environmental data.
```

In [None]:
g_loaded = np.load('g.npy')

np.testing.assert_equal(g, g_loaded)
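
When several arrays belong together, `np.savez` can store them in a single `.npz` file under names you choose. The sketch below is a minimal round trip; the array names, the file name `example_arrays.npz`, and the use of a temporary directory are all illustrative choices, not part of the lecture.

```python
import os
import tempfile

import numpy as np

first = np.arange(6).reshape(2, 3)
second = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_arrays.npz")
    # Keyword names become the keys used to look the arrays up later.
    np.savez(path, first=first, second=second)
    with np.load(path) as archive:
        names = sorted(archive.files)
        first_loaded = archive["first"]
        second_loaded = archive["second"]

np.testing.assert_equal(first, first_loaded)
np.testing.assert_equal(second, second_loaded)
print(names)
```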

### Exercise

Make an array containing a multiplication table using broadcasting.

1. Create an array named `x` containing the whole numbers 1-5
2. Make `x` have shape (1,5)
3. Create an array named `y` containing the whole numbers 1-4
4. Make `y` have shape (4,1)
5. Compute `z = x * y` and print it.

<details>
    <summary>Hint</summary>
    Use `np.arange` to create the arrays. Use `[np.newaxis,:]` or `reshape()` to reshape x.
</details>

<details>
    <summary>Solution</summary>

    x = np.arange(1, 6)[np.newaxis,:]  # whole numbers 1-5, shape (1,5)
    y = np.arange(1, 5)[:,np.newaxis]  # whole numbers 1-4, shape (4,1)
    x.shape,y.shape
    z = x * y
    print(z)
</details>

In [None]:
# Write your code here