<br>We import NumPy under a shortened alias, `np`, which is the conventional nickname for the package.

In [None]:
import numpy as np

<br>*If you are new to Jupyter notebooks, each gray cell is a piece of code. To run the code, click inside the gray cell and either click the triangle button up top, or press shift+return (or shift+enter) on your keyboard. If you are using Google Colab, shift+return should also work.*

# <br><br>An introduction to NumPy arrays

NumPy is a Python package that allows you to do fast operations on numerical data, including "mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more."

<br>The heart of NumPy is the **array** object.

<br>NumPy arrays are used behind-the-scenes of many Python packages, including:
- pandas, GeoPandas
- matplotlib, plotly, seaborn, bokeh
- SciPy, scikit-learn, statsmodels
- TensorFlow, PyTorch
- Jupyter
- Biopython
- many, many more

<br>Some packages will output an array, even if you gave the function a list or other object. ***This is one reason why you should learn to recognize and index an array, even if you don't think it is an object you need to create in your work.***

<br>**Topics for today:**
- What is a NumPy array?
- recognizing arrays
- indexing arrays
- looping through arrays
- changing elements in arrays
- **Where to learn more:** https://numpy.org/devdocs/user/quickstart.html
- *Bonus section*: creating empty arrays

### <br><br>What is a NumPy array and how do we recognize it?

<br>The array is a *multidimensional* object.

It can have any number of dimensions, which is why NumPy calls it an *n*-dimensional array (`ndarray`).

There are lots of applications for multidimensional objects in scientific research (see https://numpy.org/ for a few case studies), but let's think about just one case. Let's say you have a series of lat/long points for the state of Illinois - you would have 2-dimensional data. For each lat/long, you also have air quality data - layers of several different air pollutants at each point - you now have 3-dimensional data. You also have a full set of these measurements for every hour of the day - a 4th dimension.
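As a rough sketch of the air quality example, the cell below builds a 4-dimensional array of zeros to stand in for the measurements. The axis lengths (5 latitudes, 5 longitudes, 3 pollutants, 24 hours) are made-up numbers chosen only for illustration, and `np.zeros()` is covered in the bonus section at the end.

```python
import numpy as np

# Hypothetical axis lengths, chosen only for illustration
n_lat, n_long, n_pollutants, n_hours = 5, 5, 3, 24

# One placeholder value for every lat/long/pollutant/hour combination
measurements = np.zeros((n_lat, n_long, n_pollutants, n_hours))

print(measurements.ndim)   # number of dimensions
print(measurements.shape)  # length of each dimension
```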

<br>An array looks like a list of lists, but on its own a list of lists is not multidimensional; it is simply a collection of multiple one-dimensional objects. We can use the function `np.array()` to define our list of lists as an array:

In [None]:
x = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
print(x)

In [None]:
x

<br>You can see that an array looks different from a list, both when you print it and when you return it.

In [None]:
list_x = [[1, 2, 3], [4, 5, 6]]

In [None]:
print(list_x)

In [None]:
list_x

Take a few seconds to look at the difference between the two objects above, so that you will recognize an array when you see it.
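If you are ever unsure which kind of object you have, you can also ask Python directly. This is a small sketch using the built-in `type()` and `isinstance()` functions; the variable names `is_array` and `is_list_an_array` are just for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
list_x = [[1, 2, 3], [4, 5, 6]]

# type() reports the class of each object
print(type(x))       # <class 'numpy.ndarray'>
print(type(list_x))  # <class 'list'>

# isinstance() gives a True/False answer you can use in your own code
is_array = isinstance(x, np.ndarray)
is_list_an_array = isinstance(list_x, np.ndarray)
print(is_array, is_list_an_array)  # True False
```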

<br><br><br>Each number in the array is called an **element**.

<br>***Unlike a list, all of the elements in an array must be the same data type.***

Let's run the same code, but this time we'll make the last element a float instead of an integer:

In [None]:
x = np.array([[1, 2, 3], [4, 5, .6]])

In [None]:
print(x)

<br>Notice that NumPy converted all of the elements to floats and added a decimal point after each integer.

We can check the data type of the array with the `dtype` attribute:

In [None]:
x.dtype
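To see the conversion side by side, here is a short sketch comparing the all-integer version of the array with the mixed version. The exact name of the integer type (for example `int64`) can vary by platform, so the check below uses `np.issubdtype()` rather than a specific name.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])
mixed = np.array([[1, 2, 3], [4, 5, .6]])

print(ints.dtype)   # an integer type; the exact name depends on your platform
print(mixed.dtype)  # float64

# Check the general kind of data type without relying on its exact name
ints_are_integers = np.issubdtype(ints.dtype, np.integer)
mixed_are_floats = np.issubdtype(mixed.dtype, np.floating)
print(ints_are_integers, mixed_are_floats)  # True True
```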

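<br><br>***Another difference between a list and an array is that an array has a set size when it is created.*** You cannot add elements to an existing array - arrays have no `append()` method. (NumPy does provide a function called `np.append()`, but it returns a new, larger copy instead of growing the original array.)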
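A quick sketch of what the fixed size means in practice. The array `small` and the name `bigger` are made up for this example.

```python
import numpy as np

small = np.array([1, 2, 3])

# Arrays do not have the append() method that lists have
print(hasattr(small, "append"))  # False

# np.append() builds a brand new array; the original is left unchanged
bigger = np.append(small, 4)
print(small)   # [1 2 3]
print(bigger)  # [1 2 3 4]
```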

<br><br>Each dimension of an array is called an **axis**.

`x` is a 2-dimensional array with 2 axes:

In [None]:
print(x)

<br>The first axis has a length of **2**.
<br>The second axis has a length of **3**.

<br><br>Let's make an array with 3 axes. We use additional sets of square brackets to group our dimensions:

In [None]:
y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]], 
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

In [None]:
print(y)

<br>The first axis has a length of **2**.
<br>The second axis has a length of **3**.
<br>The third axis has a length of **4**.


#### <br><br>Array attributes

**Try out these attributes, which are especially handy with large arrays:**

How many dimensions (axes) are in your array?

In [None]:
y.ndim

What are the lengths of each axis?

In [None]:
y.shape

How many total elements are in the array?

In [None]:
y.size

The `size` is equal to the product of the lengths of all the axes in the array.
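You can confirm this yourself. The sketch below multiplies the axis lengths with `np.prod()` and compares the result to `size`; the variable name `product` is just for illustration.

```python
import numpy as np

y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]],
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

# Multiply the lengths of all the axes together
product = int(np.prod(y.shape))

print(y.shape)   # (2, 3, 4)
print(product)   # 24
print(y.size)    # 24
```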

<br><br>You can also use the `reshape()` method to get a copy of the array's data arranged in a different shape:

In [None]:
print(y)

In [None]:
y2 = y.reshape(4,1,6)

In [None]:
print(y2)

<br>You can only reshape an array to another array of the same size. Again, the size is the same as the product of the lengths of all the axes.

size y = 2 x 3 x 4 = 24
<br>size y2 = 4 x 1 x 6 = 24
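Here is a sketch of what happens when the sizes do not match. A shape of 5 x 5 would need 25 elements, so NumPy raises a `ValueError`. The flag `reshape_failed` is a made-up name used only to record the outcome.

```python
import numpy as np

y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]],
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

# 4 x 1 x 6 = 24, so this works
y2 = y.reshape(4, 1, 6)
print(y2.shape)

# 5 x 5 = 25, which does not match the 24 elements in y
reshape_failed = False
try:
    y.reshape(5, 5)
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)

# reshape() returns a new arrangement; y itself keeps its original shape
print(y.shape)
```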

### <br><br>Exercise 1

In [None]:
my_array = np.array([[.1, .2, .3, .4, .5, .6], [.01, .02, .03, .04, .05, .06]])

Run the cell above to store the array. Write code to return the length of all the axes in `my_array`:

Change `my_array` so that it has 3 axes of lengths 2, 3, and 2:

### <br><br>Indexing arrays

Arrays are indexed in a similar way to other Python objects. You can index individual points or a range of points on each axis. If you want all points in an axis, use `:`.

Let's take another look at the array `y`:

In [None]:
print(y)

#### <br>For each indexed array below, try to guess what will be returned before you run the code:

In [None]:
y[0, 0, 0]

In [None]:
y[0, 0]

In [None]:
y[0]

In [None]:
y[-1, 1, 2]

In [None]:
y[:, 0, 0]

Notice that if you index multiple elements in an array, the answer is returned as an array.

In [None]:
y[:, :, -1]

In [None]:
y[:, :, :]

In [None]:
y[0, 0, 1:3]

In [None]:
y[0, 0:2, 0:2]
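One way to check your guesses is to look at the `shape` of each result. In this sketch, every axis you index with a single number disappears from the result, while every axis you index with `:` or a range is kept. The dictionary `shapes` is just a convenient container for this example.

```python
import numpy as np

y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]],
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

shapes = {
    "y[0, 0, 0]": y[0, 0, 0].shape,          # a single element has an empty shape
    "y[0, 0]": y[0, 0].shape,
    "y[0]": y[0].shape,
    "y[:, 0, 0]": y[:, 0, 0].shape,
    "y[:, :, -1]": y[:, :, -1].shape,
    "y[0, 0:2, 0:2]": y[0, 0:2, 0:2].shape,
}

for expression, shape in shapes.items():
    print(expression, "has shape", shape)
```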

### <br><br>Exercise 2

In [None]:
my_array = np.array([[.1, .2, .3, .4, .5, .6], [.01, .02, .03, .04, .05, .06]])

Run the line of code above to store the array. Write code to index the element .4:

Write code to index the elements .04, .05, and .06:

### <br><br>Looping through arrays

When you loop through an array, the default is to loop through the first axis.

Another reminder of array `y`:

In [None]:
print(y)

In [None]:
for i in y:
    print("A loop:")
    print(i)

<br>To loop through multiple levels, you have to write loops within loops:

In [None]:
for i in y:
    print("AN OUTER LOOP:")
    for j in i:
        print("An inner loop:")
        print(j)
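To connect the loops back to the axes, this sketch counts how many times each loop runs and prints the shape of what it receives. The counters `outer_count` and `inner_count` are made-up names for illustration.

```python
import numpy as np

y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]],
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

outer_count = 0
inner_count = 0

for i in y:          # runs once per item on the first axis
    outer_count += 1
    print("Outer item shape:", i.shape)
    for j in i:      # runs once per item on the second axis
        inner_count += 1
        print("    Inner item shape:", j.shape)

print(outer_count)   # 2, the length of the first axis
print(inner_count)   # 6, which is 2 x 3
```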

<br>You can also index the part of the array that you want to loop through:

In [None]:
for i in y[1, 2]:
    print(i)

<br>To loop through every element in the array, you can use the attribute `.flat`:

In [None]:
for i in y.flat:
    print(i)

<br>The `.flat` attribute will also allow you to make a list out of all the elements in an array, if that is something you ever need to do:

In [None]:
list(y.flat)
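As a quick check, the flattened list should contain every element exactly once, so its length matches `size`. The sketch also shows the order: `.flat` walks along the last axis first. The name `flat_list` is just for illustration.

```python
import numpy as np

y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]],
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

flat_list = list(y.flat)

# One entry per element in the array
print(len(flat_list))   # 24
print(y.size)           # 24

# The first row comes out in full before the second row starts
print(flat_list[:5])
```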

### <br><br>Exercise 3

This sample array has 2 axes. The first axis has a length of three: one row each for latitude, longitude, and an air quality index score.

In [None]:
air_quality = np.array([[41.8781, 42.0451, 41.8850, 41.7606, 42.0324, 41.5250, 42.3636, 42.0884], 
                        [87.6298, 87.6877, 87.7845, 88.3201, 87.7416, 88.0817, 87.8448, 87.9806], 
                        [59, 80, 101, 92, 120, 153, 94, 110]])

Write code to loop through the array and print each air quality index score that is 101 or higher:

### <br><br>Assigning a new value to an element

You can use indexing to set the value of a particular element in an array.

In [None]:
print(y)

Let's change 70 to 700 by indexing 70 and setting it equal to 700:

In [None]:
y[1, 0, 2] = 700

In [None]:
print(y)
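Two related things are worth knowing, sketched below. First, you can assign to a whole range of elements at once. Second, because every element must share the array's data type, assigning a float into an integer array like `y` drops the decimal part instead of storing it. The values 0 and 99.9 are arbitrary choices for this example.

```python
import numpy as np

y = np.array([[[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]],
              [[50, 60, 70, 80], [51, 61, 71, 81], [52, 62, 72, 82]]])

# Change a single element, as above
y[1, 0, 2] = 700

# Change every element in one row at once
y[0, 1, :] = 0

# y holds integers, so the .9 is dropped when this value is stored
y[0, 0, 0] = 99.9

print(y)
```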

### <br><br>Exercise 4

Write code to change 42 in the `y` array to 4000:

In [None]:
print(y)

### <br><br>Bonus section: Creating an empty array

Sometimes you will need to create an empty array of a set shape, and then add numbers to certain positions in the array at a later time. There are several ways to do this.

The function `np.zeros()` will create an array of a given shape with all elements equal to zero. `np.ones()` does the same thing, but all the elements will be equal to 1.

Both functions take the array shape as a single argument, so the axis lengths must be grouped inside a second set of parentheses:

In [None]:
zero_array = np.zeros((4, 2, 2, 1))

In [None]:
print(zero_array)

In [None]:
one_array = np.ones((2, 3, 2, 7))

In [None]:
print(one_array)
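Here is a small sketch of the "fill it in later" workflow described above. By default both functions create floats; the optional `dtype` argument asks for a different type. The names `grid` and `int_zeros`, and the values stored in them, are made up for this example.

```python
import numpy as np

# Start with a placeholder array, then fill in positions later
grid = np.zeros((2, 3))
grid[0, 1] = 5.5
grid[1, 2] = 7.25
print(grid)
print(grid.dtype)       # float64 by default

# Ask for integers instead of floats
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)
print(int_zeros.dtype)  # an integer type; the exact name depends on your platform
```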

<br><br>You can also create an array full of random data using the `np.random.rand()` function. It takes the length of each axis as a separate argument, so you do not wrap the shape in a second set of parentheses:

In [None]:
np.random.rand(2, 2, 1)

<br>By default, `np.random.rand()` fills in floats. However, you can also make an array of random integers using `np.random.randint()`. It takes three arguments - the lowest integer to be drawn from, one past the highest integer to be drawn from, and the shape of the array in parentheses. For example, to create an array of shape 2, 3 of random integers from 1 to 100:

In [None]:
np.random.randint(1, 101, (2,3))
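NumPy's documentation recommends a newer interface, the random `Generator`, for new code. The sketch below is an alternative to the two functions above, not a replacement for what this notebook uses. Passing a `seed` (0 here, an arbitrary choice) makes the "random" numbers repeatable from run to run.

```python
import numpy as np

# Create a random number generator; the seed value is arbitrary
rng = np.random.default_rng(seed=0)

# Random floats between 0 and 1; here the shape IS given as a tuple
random_floats = rng.random((2, 2, 1))
print(random_floats)

# Random integers from 1 to 100 (the upper bound, 101, is excluded)
random_ints = rng.integers(1, 101, size=(2, 3))
print(random_ints)
```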

### <br><br>Bonus Exercise

Create an array with three axes. The first axis should have a length of 2, the second axis should have a length of 2, and the third axis should have a length of 6. Fill the array with the integer 1.

Create an array of the same shape as above, but fill the array with random floats.

### <br><br>Where to learn more

I'd recommend starting with the quickstart guide: https://numpy.org/devdocs/user/quickstart.html. The guide reviews some of today's topics, but also covers using basic arithmetic operators and functions with arrays, as well as splitting and joining arrays.