#  Review of Numpy Arrays <br> ATSC 405: Numerical Methods in Meteorology <br> University of North Dakota Atmospheric Sciences
# Written by: Aaron Scott (aaron.scott@und.edu) <br> Updated: 01 September 2022

____

## Here we are going to explore how to create Numpy arrays
[Click here for the Numpy Documentation](https://numpy.org/doc/stable/)
<p> There are multiple ways to create numpy arrays and we will look at a few. </p>
First, we must import numpy.

In [1]:
import numpy as np #The standard alias is np 

<p> The np.array() function takes a list or a tuple (or any other array-like object) as its argument. Arrays can be multidimensional, but we will start with the simple 1-D case. </p>

In [2]:
temps_array = np.array([82,83,84,89,81])
temps_array

array([82, 83, 84, 89, 81])
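As a small sketch of the point above, the same array can be built from a tuple instead of a list. The `dtype` keyword shown at the end is an optional argument of `np.array` that is not used elsewhere in this notebook; it is included only to illustrate how floating-point values could be requested. The variable names are made up for this example.

```python
import numpy as np

temps_from_list = np.array([82, 83, 84, 89, 81])    # built from a list
temps_from_tuple = np.array((82, 83, 84, 89, 81))   # built from a tuple

# Both inputs give arrays with identical contents
print(np.array_equal(temps_from_list, temps_from_tuple))

# Optional: request floats instead of the default integer type
temps_float = np.array([82, 83, 84, 89, 81], dtype=float)
print(temps_float)
```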

<p> We can check the size and shape of an array by using the np.shape() and np.size() functions. </p>

In [3]:
print('shape:',np.shape(temps_array)) #notice these functions come from the Numpy module (not built-in to Python directly)
print('size:',np.size(temps_array))

shape: (5,)
size: 5


The array also has attributes that can be used to find the shape and size. Note: In Python, attributes are variables that belong to an object of a given class. We won't go into details on classes within this course, but you should at least know that virtually everything in Python belongs to a certain class. For example, when we create a Python list, we create an instance of the list class. These instances are called objects, so the list we create is an object. <br>
The numpy array belongs to the ndarray class and has several attributes. These attributes can be thought of simply as variables that contain information within or about the object. So temps_array is an instance (object) of the ndarray class, and its size and shape variables are attributes.

In [4]:
print('shape:',temps_array.shape) #notice these attributes use the dot notation. 
print('size:',temps_array.size)   #object.attribute where object is temps_array and attribute is size

shape: (5,)
size: 5
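Beyond shape and size, a couple of other ndarray attributes are often useful. The sketch below uses `ndim` and `dtype`, which are real ndarray attributes but are not otherwise covered in this notebook. The integer type that gets printed can differ between platforms.

```python
import numpy as np

temps_array = np.array([82, 83, 84, 89, 81])

print('ndim:', temps_array.ndim)    # number of dimensions (axes)
print('dtype:', temps_array.dtype)  # data type of the elements (integer width is platform dependent)

# shape is a tuple with one entry per axis, so its length equals ndim
print(len(temps_array.shape) == temps_array.ndim)
```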


Array indexing and slicing are similar to those of Python lists and tuples. <br> Note the syntax for slicing: [start:end:step]. The defaults are start=0, end=length of the array in that dimension, and step=1.

In [5]:
print('Entire array:',temps_array)
print('Second element:',temps_array[1])
print('First three elements:',temps_array[0:3]) #Note it goes up to but not including the end value
print('Every other element:',temps_array[::2]) #Note the defaults for slicing here 
print('The second and third:',temps_array[1:3])

Entire array: [82 83 84 89 81]
Second element: 83
First three elements: [82 83 84]
Every other element: [82 84 81]
The second and third: [83 84]
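To make the slicing defaults concrete, here is a short sketch that writes the same slice with and without the defaults spelled out. It also shows negative indices and a negative step, which behave as they do for Python lists. The variable names are introduced here for illustration only.

```python
import numpy as np

temps_array = np.array([82, 83, 84, 89, 81])

every_other_default = temps_array[::2]      # default start and end, step=2
every_other_explicit = temps_array[0:5:2]   # same slice with start and end written out
print(np.array_equal(every_other_default, every_other_explicit))

print('Last element:', temps_array[-1])     # negative indices count from the end
print('Reversed:', temps_array[::-1])       # a negative step walks backwards
```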


Now we are going to look at 2-D arrays and how we may use them in typical atmospheric sciences problems. <br> 
Assume we have a list of temperatures for each of three cities: Grand Forks, Fargo, and Bismarck, ND. Each list contains that city's temperatures at 00, 01, 02, 03, and 04 UTC.

In [6]:
ND_temps = np.array([[75,74,74,73,71],[81,79,78,76,73],[79,78,76,75,74]])
print(ND_temps) #3 rows and 5 columns 
print('Shape:',ND_temps.shape) #Notice it returns a tuple (num rows, num columns)

[[75 74 74 73 71]
 [81 79 78 76 73]
 [79 78 76 75 74]]
Shape: (3, 5)


Notice that each row holds the temperatures for one city. The first row is Grand Forks, the second is Fargo, and the last row is Bismarck.  <br> For multidimensional arrays, you must supply the index for each dimension. Note that each dimension is called an axis in numpy. The row axis is 0 and the axis that runs along the columns is 1. For more information, [here](https://www.sharpsightlabs.com/blog/numpy-axes-explained/) is a good link on numpy axes. This understanding will come in handy when you need to do operations such as taking the sum of the data along a certain dimension (axis) of an array. <br> **How can we print out only the 00 UTC temperatures for all three cities?** <br> Remember the [start:end:step] syntax from above, as it works the same way for each dimension. If just the : is given, then it takes all of the data along that dimension (axis).


In [7]:
print('00 UTC temps:',ND_temps[:,0]) #this takes all rows and the first column of the array 

00 UTC temps: [75 81 79]
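The same idea works in the other direction. As a minimal sketch, the code below fixes the row index to pull out every hour for one city, and then supplies both indices to pull out a single value. The variable names are introduced here for illustration.

```python
import numpy as np

ND_temps = np.array([[75, 74, 74, 73, 71],
                     [81, 79, 78, 76, 73],
                     [79, 78, 76, 75, 74]])

fargo_temps = ND_temps[1, :]    # row 1 (Fargo), all columns (hours)
print('Fargo temps:', fargo_temps)

bismarck_04 = ND_temps[2, 4]    # row 2 (Bismarck), column 4 (04 UTC)
print('Bismarck at 04 UTC:', bismarck_04)
```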


What about the temperatures from 01-03 UTC for just Fargo and Bismarck?

In [8]:
print('01,02,03 temps for Fargo and Bismarck:',ND_temps[1:,1:4]) #How else could this be written to get the same result?

01,02,03 temps for Fargo and Bismarck: [[79 78 76]
 [78 76 75]]


Numpy comes with several easy ways to compute statistics of the data within an array. This is one reason that numpy arrays are preferred over lists and tuples for number crunching. Many of the statistics calculations can be done with the methods of the arrays themselves. Remember that a method is just a function that belongs to a particular object (in this case the array object). Numpy also has many functions for computing statistics from arrays. In general, there is no major difference between the two approaches, but there are more options with the numpy functions. Some examples are given below using both ways. An important key here is the use of the **axis** argument to specify how to calculate the statistics. <br> The cell below uses the max method and the max function to find the max value of the entire array.

In [9]:
print('Using the max methods from the numpy array:',ND_temps.max())
print('Using the max function from numpy:',np.max(ND_temps))

Using the max methods from the numpy array: 81
Using the max function from numpy: 81


We can use the axis keyword argument to specify an axis to calculate the max along. For example, say we want to find the max temperature for each city during the 5-hour period (00-04 UTC). Axis will be 1 since we want to find the max value in each row. In other words, numpy will go row by row and calculate the max value along the columns.

In [10]:
np.max(ND_temps,axis=1) #using the max numpy function 

array([75, 81, 79])

We can also calculate the highest temperature among the three cities at each hour. To do this we want to find the max value in each column, so we specify **axis=0**. In other words, numpy will go column by column and calculate the max value along the rows.

In [11]:
ND_temps.max(axis=0) #using the max method from the numpy array

array([81, 79, 78, 76, 74])
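The max along axis=0 gives the highest temperature at each hour, but not the city it belongs to. A minimal sketch of one way to recover the city, assuming the row order given earlier (Grand Forks, Fargo, Bismarck): `np.argmax` returns the index of the maximum along an axis, which can then be mapped to a name. The `cities` list and the other variable names are helpers introduced here for illustration. `np.mean` is included to show that the axis argument works the same way for other statistics.

```python
import numpy as np

ND_temps = np.array([[75, 74, 74, 73, 71],
                     [81, 79, 78, 76, 73],
                     [79, 78, 76, 75, 74]])
cities = ['Grand Forks', 'Fargo', 'Bismarck']  # assumed row order (helper list for this example)

# Row index of the max in each column (one index per hour)
warmest_row = np.argmax(ND_temps, axis=0)
warmest_city = [cities[i] for i in warmest_row]
print('Warmest city each hour:', warmest_city)

# Mean temperature for each city over the five hours (one value per row)
mean_per_city = np.mean(ND_temps, axis=1)
print('Mean temp per city:', mean_per_city)
```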

[Look here at some of the other array statistics that can be calculated with Numpy](https://numpy.org/doc/stable/reference/routines.statistics.html)

--------

Sometimes we don't yet have the data that we want to store in an array, but we must create the array and give it a shape before we can use it. This usually happens when our code calculates values one piece at a time and we want to place each result in an array as it is computed. There are a few ways to create a numpy array with a specified shape. We can use the zeros() and ones() functions from numpy to fill an array with zeros or ones, respectively. You can pass in a list or tuple to specify the shape of the array.

In [12]:
A_array = np.zeros((2,3)) #notice we pass in a tuple for the shape of the array (rows,columns)
A_array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [13]:
B_array = np.ones((3,4))
B_array

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
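The pattern described above, creating an array first and filling it as values are calculated, might look like the following sketch. It assumes the ND temperatures from earlier are in degrees Fahrenheit (the notebook does not state the units) and converts them to Celsius one row at a time. The names `ND_temps_F` and `ND_temps_C` are introduced here for illustration.

```python
import numpy as np

# Assumption: these values are in degrees Fahrenheit
ND_temps_F = np.array([[75, 74, 74, 73, 71],
                       [81, 79, 78, 76, 73],
                       [79, 78, 76, 75, 74]])

# Create the output array first, with the same shape, filled with 0.0
ND_temps_C = np.zeros(ND_temps_F.shape)

# Fill the array one row (city) at a time as each result is calculated
for i in range(ND_temps_F.shape[0]):
    ND_temps_C[i, :] = (ND_temps_F[i, :] - 32) * 5 / 9

print(np.round(ND_temps_C, 1))
```

For a simple conversion like this the loop is not strictly needed, since the arithmetic can be applied to the whole array at once. The loop is kept here only to show how a pre-created array is filled in.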

There is also a numpy function called *empty* that creates an uninitialized array of a specified shape. It does not set the elements to any particular value; however, it should be made clear that this array is not *really* empty, as it will likely contain whatever data was previously at that location in memory.

In [14]:
C_array = np.empty((4,2))
C_array

array([[1.49166815e-154, 1.49166815e-154],
       [1.97626258e-323, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000],
       [1.49166815e-154, 5.60608914e-309]])
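Because the starting contents of an array made with empty are arbitrary, every element should be assigned before the array is read. A minimal sketch, in which `n_hours` and the placeholder calculation are hypothetical values chosen only for illustration:

```python
import numpy as np

n_hours = 5                          # hypothetical number of values to compute
hourly_values = np.empty(n_hours)    # contents are arbitrary at this point

# Assign every element before using the array
for hour in range(n_hours):
    hourly_values[hour] = 70.0 + hour   # hypothetical placeholder calculation

print(hourly_values)
```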