# NumPy

## Why NumPy? <img src="https://github.com/numpy/numpy/blob/main/branding/logo/logomark/numpylogoicon.png?raw=true" width=50 align="right"/>

<img src="https://github.com/Hem-W/python-climate-visuals/blob/HMW-tutor/assets/images/tutor/numpy_why.png?raw=true" width="200" align="center"/>

In [5]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
import numpy as np
arr = np.array(list_of_lists)
arr[:2, 1]

array([2, 5])

In [None]:
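list_of_lists[1][:2]  # Nested lists: pick one row first, then slice inside it
import numpy as np
arr = np.array(list_of_lists)
arr[:2, 1]            # NumPy: slice the rows and pick a column in one index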

To use NumPy, we first need to import the `numpy` package:

In [6]:
import numpy as np
print(np.__version__)

1.24.3


## Array and its Creation

A NumPy array (a.k.a. [`ndarray`](https://numpy.org/doc/stable/reference/arrays.ndarray.html)) is the core of this package. 

An `ndarray` is a grid of values, all of **the same type**. 

We can create a NumPy array by passing a Python list to `np.array()`. 

In [9]:
a = np.array([1, 2, 3])  # Create a rank 1 array from a list
a.size

3

In [None]:
print(a.shape, a[0])
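
To illustrate the "same type" rule stated above, here is a minimal sketch. The variable names (`ints`, `mixed`, `as_float`) are made up for this example. It shows that NumPy picks one `dtype` for the whole array, and that mixing integers with a float stores every element as a float.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers -> an integer dtype (exact width is platform dependent)
mixed = np.array([1, 2.5, 3])    # one float -> every element is stored as float64
as_float = np.array([1, 2, 3], dtype=np.float64)  # request a dtype explicitly

print(ints.dtype, mixed.dtype, as_float.dtype)
print(mixed)  # the integers 1 and 3 are now the floats 1.0 and 3.0
```

Checking `.dtype` is a quick way to confirm what an array actually holds before doing arithmetic on it.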

<img src="https://xiaoganghe.github.io/python-climate-visuals/_images/numpy_ndarray.png" width="900" align="center"/>

Using nested lists whose sublists all have the same length, we can create 2D, 3D, or even higher-dimensional arrays.

In [11]:
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # Create a rank 2 array

c = np.array([[[111, 112, 113, 114], [121, 122, 123, 124]],
              [[211, 212, 213, 214], [221, 222, 233, 234]],
              [[311, 312, 313, 314], [321, 322, 323, 324]]])
                                            # Create a rank 3 array
c.shape

(3, 2, 4)

In [None]:
print(b, b.shape, b[1, 2])
print(c, c.shape, c[0, 1, 2])

NumPy also provides many useful [functions](https://numpy.org/doc/stable/reference/routines.array-creation.html) to create arrays for specific purposes.

In [15]:
np.random.random((3, 2))

array([[0.64568187, 0.3860173 ],
       [0.76419866, 0.75369836],
       [0.71342207, 0.94886103]])

In [None]:
d = np.arange(5, 50, 10)  # Create an array starting at 5, stopping before 50, with a step of 10
d = np.zeros((2, 2))      # Create an array of all zeros with shape (2, 2)
d = np.ones((1, 2))       # Create an array of all ones with shape (1, 2)
d = np.random.random((3, 1))  # Create an array of random values with shape (3, 1)
# Try printing them
print(d)
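
The cell above overwrites `d` each time, so only the last array is printed. The sketch below keeps the results in separate, hypothetical variables to make the behavior visible. `np.linspace` and `np.full` are not used in the cell above; they are included here as two further creation functions from the same documentation page.

```python
import numpy as np

stepped = np.arange(5, 50, 10)   # start at 5, step by 10, stop before 50
print(stepped)                   # [ 5 15 25 35 45]

spaced = np.linspace(0, 1, 5)    # 5 evenly spaced values; both endpoints are included
print(spaced)                    # 0, 0.25, 0.5, 0.75, 1

filled = np.full((2, 3), 7)      # shape (2, 3), every element equal to 7
print(filled)
```

Note that `np.arange` excludes its end value, whereas `np.linspace` includes it by default.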

## Array Indexing

There are several ways to pull out a section of arrays:
+ **slicing**,
+ **integer array indexing**,
+ **Boolean array indexing**, and
+ many other fancy ways.

+ ***Slicing***

    We can specify a slice in the form `start:end` or `start:end:step` for each dimension of the array to access subarrays, much like slicing a Python list.

In [17]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a[:, ::2]

array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])


<img src="https://xiaoganghe.github.io/python-climate-visuals/_images/numpy_slice.png" width="800" align="center"/>


In [None]:
print(a[:2, 1:3])  # Slice 1st to 2nd rows and 2nd to 3rd columns
print(a[:, ::2])   # Slice every other column, starting from the first (1st and 3rd)
print(a[:2, ])     # Slice 1st to 2nd rows, all columns (same as a[:2, :])
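
One difference from Python lists is worth a short sketch: basic slicing of a NumPy array returns a view onto the same data, not a copy. The names `view` and `independent` below are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]      # basic slicing returns a view onto a's data
view[0, 0] = 100       # writing through the view changes a as well
print(a[0, 1])         # 100

independent = a[:2, 1:3].copy()  # .copy() gives an independent array
independent[0, 0] = -1
print(a[0, 1])         # still 100
```

When a slice will be modified and the original must stay intact, calling `.copy()` first is the safer choice.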

+ ***Integer array indexing***

    Integer array indexing lets you select arbitrary elements of an array by giving a separate list of indices for each dimension.

In [18]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
row = [0, 1, 2]  # Explicitly express row indices
col = [0, 1, 0]  # and col indices
print(a[row, col])

[1 5 7]


In [20]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
a_col1 = a[:, 1]  # A slice combined with an integer index: selects the second column
a_col1

array([ 2,  5,  8, 11])
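
To make the pairing of `row` and `col` explicit, the sketch below rebuilds the same selection one element at a time. The names `picked` and `by_hand` are made up for this example. It also shows that integer array indexing returns a copy, unlike slicing.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
row = [0, 1, 2]
col = [0, 1, 0]

picked = a[row, col]   # selects the pairs (0, 0), (1, 1), (2, 0)
by_hand = np.array([a[r, c] for r, c in zip(row, col)])
print(picked, by_hand)  # [1 5 7] [1 5 7]

# Integer array indexing returns a copy, so editing the result leaves a unchanged
picked[0] = 999
print(a[0, 0])          # 1
```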

+ ***Boolean array indexing***
    
    Boolean array indexing lets you pick out the elements of an array for which a Boolean array of the same shape is `True`.

In [22]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
a[a > 8]

array([ 9, 10, 11, 12])

In [None]:
a[a > 8]
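
The expression `a > 8` is itself an array, which the following sketch makes visible. It also combines two conditions and uses a mask for assignment. The names `mask`, `even_and_large`, and `clipped` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

mask = a > 8           # Boolean array with the same shape as a
print(mask.shape)      # (4, 3)

# Combine conditions with & (and) and | (or); the parentheses are required
even_and_large = a[(a > 3) & (a % 2 == 0)]
print(even_and_large)  # [ 4  6  8 10 12]

# A mask can also select the elements to overwrite
clipped = a.copy()
clipped[clipped > 8] = 8
print(clipped.max())   # 8
```

The result of Boolean indexing is always one-dimensional, because the selected elements need not form a rectangle.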

## Array Manipulation

+ Change the shape of an array with the `reshape()` method

In [30]:
a = np.arange(12)
np.reshape(a, (5, -1))

ValueError: cannot reshape array of size 12 into shape (5,newaxis)

In [None]:
print(a)
print(a.reshape((3, 4)))
print(np.reshape(a, (3, -1)))  # The function form takes the array as its 1st argument and gives the same result
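
The `-1` in a shape asks NumPy to infer that dimension from the total number of elements. A minimal sketch of when this works and when it fails, as in the `ValueError` shown above:

```python
import numpy as np

a = np.arange(12)

print(a.reshape((2, -1)).shape)   # (2, 6): 12 / 2 = 6
print(a.reshape((-1, 4)).shape)   # (3, 4): 12 / 4 = 3

try:
    a.reshape((5, -1))            # 12 is not divisible by 5
except ValueError as err:
    print("reshape failed:", err)
```

The total number of elements must stay the same, so the inferred dimension has to be a whole number.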

+ Join multiple arrays with 
    1. `hstack()` – horizontally concatenate arrays
    2. `vstack()` – vertically concatenate arrays
    3. `concatenate()` – concatenate arrays across the specified axis

In [28]:
a = np.arange(12).reshape((3, 4))
c = np.arange(6).reshape((3, 2))
print(a)
print(c)
np.hstack((a, c))

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[0 1]
 [2 3]
 [4 5]]


array([[ 0,  1,  2,  3,  0,  1],
       [ 4,  5,  6,  7,  2,  3],
       [ 8,  9, 10, 11,  4,  5]])

In [None]:
ac = np.hstack((a, c))
print(ac)
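
The cell above only exercises `hstack()`. As a sketch of the other two functions in the list, the example below adds a hypothetical one-row array `r` (not part of the original cells) and shows how `concatenate()` with an explicit `axis` relates to `hstack()` and `vstack()`.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
c = np.arange(6).reshape((3, 2))
r = np.arange(4).reshape((1, 4))   # hypothetical extra row with 4 columns, like a

side_by_side = np.concatenate((a, c), axis=1)  # same result as np.hstack((a, c))
stacked = np.vstack((a, r))                    # same result as np.concatenate((a, r), axis=0)

print(side_by_side.shape)   # (3, 6)
print(stacked.shape)        # (4, 4)
```

The arrays must agree in every dimension except the one being joined: `a` and `c` share 3 rows, while `a` and `r` share 4 columns.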

Arrays can also be **split**, **tiled**, and **rearranged in other ways**. 

Please refer to the official documentation on [array manipulation](https://numpy.org/doc/stable/reference/routines.array-manipulation.html) when you need them.

## Array Math

The real power of NumPy is that arrays support mathematical operations directly, and the package provides a large set of mathematical functions to go with them. Let's see some examples.

### Basic Arithmetic

In [34]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise natural logarithm; produces an array
np.log(x)

array([[0.        , 0.69314718],
       [1.09861229, 1.38629436]])

In [None]:
# Elementwise sum; produces an array
print(x + y)
# Elementwise square root; produces an array
print(np.sqrt(x))
# Elementwise natural logarithm; produces an array
print(np.log(x))
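
Because every operator here works elementwise, `*` is not matrix multiplication. The following sketch reuses `x` and `y` from above and contrasts `*` with the `@` operator, which the cells above do not cover.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

print(x * y)   # elementwise product: [[5, 12], [21, 32]]
print(x @ y)   # matrix product:      [[19, 22], [43, 50]]
```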

### Aggregation Calculations

We can compute basic statistics of an array along different axes with aggregation functions. 
+ `min()`/`max()` – get minimum/maximum value
+ `sum()` – summation
+ `mean()` – average
+ `std()` – standard deviation, and
+ [plenty of others](https://jakevdp.github.io/PythonDataScienceHandbook/02.04-computation-on-arrays-aggregates.html).

In [37]:
x = np.array([[1, 2, 3], [4, 5, 6]])
x.sum(axis=1)

array([ 6, 15])

In [None]:
print(np.sum(x))          # Sum of all elements; produce a value
print(np.sum(x, axis=0))  # Sum along axis 0 (one sum per column); produce a lower rank array
print(x.sum(axis=1))      # Sum along axis 1 (one sum per row); produce a lower rank array
# Try others!
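
Following the "Try others!" prompt, here is a short sketch with the other aggregation functions from the list. The `keepdims` argument is an addition not mentioned above. The axis passed to the function is the one that disappears from the result.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

print(x.mean(axis=0))   # [2.5 3.5 4.5]  one mean per column, shape (3,)
print(x.max(axis=1))    # [3 6]          one maximum per row, shape (2,)
print(x.min())          # 1              over all elements

# keepdims=True keeps the reduced axis with length 1
print(x.sum(axis=1, keepdims=True).shape)   # (2, 1)
```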

Which array dimension does the axis number refer to?

<img src="https://xiaoganghe.github.io/python-climate-visuals/_images/numpy_axis.png" width="1000" align="center"/>

🤔 Let's say we have temperature data like this ⬇️, where axis=0 is the day, axis=1 is the latitude, and axis=2 is the longitude.
<img src="https://github.com/Hem-W/python-climate-visuals/blob/HMW-tutor/assets/images/tutor/numpy_3dslice.png?raw=true" width="400" align="center"/>

In [38]:
Temp = np.array([[[111, 112, 113, 114, 115], [121, 122, 123, 124, 125], [131, 132, 133, 134, 135], [141, 142, 143, 144, 145]],
                 [[211, 212, 213, 214, 215], [221, 222, 223, 224, 225], [231, 232, 233, 234, 235], [241, 242, 243, 244, 245]],
                 [[311, 312, 313, 314, 315], [321, 322, 323, 324, 325], [331, 332, 333, 334, 335], [341, 342, 343, 344, 345]]])
Temp[0, :, :]

array([[111, 112, 113, 114, 115],
       [121, 122, 123, 124, 125],
       [131, 132, 133, 134, 135],
       [141, 142, 143, 144, 145]])

🤔 What if I wish to get the values for the first day?

➡️ time snapshot / grey-scale image of earth

🤔 What if I wish to get the values for the point in the second latitude and third longitude?

➡️ time series
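
Both questions can be answered with plain indexing. The sketch below rebuilds the values of `Temp` compactly, since each entry equals 100 × day + 10 × latitude + longitude. This construction relies on broadcasting, which is explained in the next section, and the names `snapshot` and `series` are made up for illustration.

```python
import numpy as np

# Rebuild the same values as Temp above: 100 * day + 10 * latitude + longitude
day = np.arange(1, 4).reshape((3, 1, 1))
lat = np.arange(1, 5).reshape((1, 4, 1))
lon = np.arange(1, 6).reshape((1, 1, 5))
Temp = 100 * day + 10 * lat + lon   # shape (3, 4, 5)

snapshot = Temp[0, :, :]   # first day: a 2D (latitude, longitude) grid
series = Temp[:, 1, 2]     # second latitude, third longitude: one value per day

print(snapshot.shape)      # (4, 5)
print(series)              # [123 223 323]
```

Fixing the index on the time axis gives a map, while fixing the two spatial indices gives a time series.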

## One more thing: Broadcasting

Broadcasting allows arrays of different shapes to work together. Let's see two examples.

In [39]:
x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
x / 10

array([[0.1, 0.2, 0.3, 0.4, 0.5],
       [0.6, 0.7, 0.8, 0.9, 1. ]])

In [41]:
x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
sign = np.array([-1, 1]) #.reshape((2, 1))  # `reshape` is important❗️
x * sign

ValueError: operands could not be broadcast together with shapes (2,5) (2,) 

<p>
<img src="https://xiaoganghe.github.io/python-climate-visuals/_images/numpy_broadcast1-1.png" width="800" align="center"/>
</p>
<p>
<img src="https://xiaoganghe.github.io/python-climate-visuals/_images/numpy_broadcast1-2.png" width="800" align="center"/>
</p>
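
The `ValueError` above arises because `sign` has shape `(2,)`, and its only dimension is compared with the trailing dimension of `x`, which is 5. A minimal sketch of the fix, applying the `reshape` that is commented out in the cell above:

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])   # shape (2, 5)
sign = np.array([-1, 1]).reshape((2, 1))            # shape (2, 1): one factor per row

print(x * sign)
# [[-1 -2 -3 -4 -5]
#  [ 6  7  8  9 10]]
```

With shape `(2, 1)`, the length-1 dimension of `sign` is stretched across the 5 columns of `x`.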

### How can we know array shapes are compatible? 🧐

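NumPy compares array shapes dimension by dimension, starting from the trailing (rightmost) one and working backward. Two dimensions are compatible when
1. they are equal, or
2. one of them is 1

A dimension that is missing from the shorter shape is treated as 1.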

Let's say we combine arrays A and B that have the following shapes.
``` python
A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5 # also works for higher dimensions
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 1 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5
```

Here are examples that are not compatible.
``` python
A      (1d array):  3
B      (1d array):  4 # trailing dimensions do not match ❌

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3 # second-to-last dimensions do not match ❌
```
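
These rules can be checked without building any arrays. The sketch below uses `np.broadcast_shapes`, which is available in NumPy 1.20 and later, so it is assumed here that a sufficiently recent version is installed. It reproduces some of the cases listed above.

```python
import numpy as np

print(np.broadcast_shapes((5, 4), (1,)))         # (5, 4)
print(np.broadcast_shapes((5, 4), (4,)))         # (5, 4)
print(np.broadcast_shapes((15, 1, 5), (3, 1)))   # (15, 3, 5)

try:
    np.broadcast_shapes((2, 1), (8, 4, 3))       # 2 vs 4 in the second-to-last position
except ValueError as err:
    print("not compatible:", err)
```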

One successful and interesting broadcasting example is calculating the outer product of two vectors.

In [42]:
A = np.arange(1, 6)
B = np.arange(1, 3).reshape((2, -1))  # -1 lets NumPy infer the column count, giving shape (2, 1)
A * B

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10]])


<img src="https://xiaoganghe.github.io/python-climate-visuals/_images/numpy_broadcast2.png" width="800" align="center"/>


Broadcasting typically makes your code more ***concise***, ***readable***, and more importantly, ***faster***.
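
As a small illustration of the conciseness point, the sketch below subtracts each column's mean from a 2D array twice: once with an explicit loop and once with broadcasting. The task and the names `looped` and `broadcast` are made up for this example, and no timing is measured here.

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=np.float64)
col_mean = x.mean(axis=0)   # shape (5,)

# Explicit loop: subtract the column means row by row
looped = np.empty_like(x)
for i in range(x.shape[0]):
    looped[i, :] = x[i, :] - col_mean

# Broadcasting: in (2, 5) - (5,), col_mean is stretched across the rows automatically
broadcast = x - col_mean

print(np.array_equal(looped, broadcast))   # True
```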

# 🎉 Happy Coding 🎉