# Introduction to numpy

Credit: Dr. Andrew Bennett

The numpy library is short for "numerical python". In this notebook I will motivate numpy and demonstrate aspects of its functionality. Numpy is a huge library with tons of capabilities, so there will be a lot we don't cover. The numpy documentation is excellent for when you need to dive in deeper: https://numpy.org/doc/stable/

There is also a really nice quickstart guide that covers many of the things we will see here and helps to reinforce key concepts in a concise way: https://numpy.org/doc/stable/user/quickstart.html 

To start with numpy you need to import it. The line `import numpy as np` is the standard way to do so; it simply renames `numpy` to `np` to save you some typing. I'm also importing `time` so that we can time our code, `math` so that we can compare against it, and finally redefining the function from the end of the [introduction to python](...) notebook from this course.


In [None]:
import time
import math
import numpy as np

def air_pressure_at_height(h):
    p0 = 101325      # reference pressure in pascals
    M = 0.02896968   # molar mass of air kg/mol
    g = 9.81         # gravity m/s2
    R0 = 8.314462618 # gas constant J/(mol·K) 
    T = 273          # temp in kelvin

    ratio = -(g * h * M) / (R0 * T)
    # NOTE: here I changed math.exp -> np.exp, 
    #       you will see why in a minute
    p_h = p0 * np.exp(ratio)
    return p_h

With the same function defined as before, we can go ahead and calculate the pressure for many different altitudes. We'll do this for 20 thousand evenly spaced altitudes using a `for` loop. Additionally, we'll time how long this takes to run by using the `time` library, which is included in your base python installation.

::: {note}
There are many ways to measure the performance of python programs, and simply timing how long certain portions take to run is the most basic. This is a very common "idiom" that you will find repeated in many different contexts, so it's worth pointing out its basic structure so that you can easily use it in your own code. Given that you have the `time` module imported, the way to measure the elapsed time for a section of code is simply:

```
# Possibly some setup code, that you don't want to measure

t0 = time.time()

# --> Your code here <--

t1 = time.time()
delta_t = t1 - t0
print(f"Your code took: {delta_t:06f} seconds")
```
:::
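
The same idiom can be wrapped in a small helper so it is easy to reuse. The sketch below is only an illustration: `time_call` is a hypothetical name, not something from this notebook, and it uses `time.perf_counter`, a standard-library clock intended for measuring short durations, in place of `time.time`.

```python
import time

def time_call(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds).

    Hypothetical helper for illustration only.
    """
    t0 = time.perf_counter()
    result = func(*args)
    t1 = time.perf_counter()
    return result, t1 - t0

# Time the built-in sum over one million integers
total, elapsed = time_call(sum, range(1_000_000))
print(f"sum took: {elapsed:.6f} seconds")
```

A single measurement like this can be noisy, so for careful comparisons it is usually worth repeating the call a few times.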

In [None]:
# Heights to calculate pressure at
start = 0
end = 20000
step = 1

In [None]:
t0 = time.time() 
h_list = range(start, end, step)
# for loop in one line
p_list = [air_pressure_at_height(height) for height in h_list]

t1 = time.time()
base_python_time = t1-t0
print(f"With plain python this took: {base_python_time:06f} seconds")

Okay, so you might be quite happy with that result: we ran a decently complex calculation many thousands of times and it took well under a second. But the reality is that python is generally considered a slow programming language, and as your scripts become more advanced you will likely be interested in speeding up your calculations. This is one of the main reasons that you will want to become fluent in using `numpy`. Additionally, using `numpy` allows you to avoid writing many loops or comprehensions, which often makes your code shorter and more easily understood. Below is the `numpy` version, which you will see is many times faster.

In [None]:
t0 = time.time()
h_array = np.arange(start, end, step)
p_array = air_pressure_at_height(h_array)

t1 = time.time()
numpy_time = t1-t0
print(f"With numpy this took: {numpy_time:06f} seconds")

print(f"Numpy version is  {base_python_time/numpy_time:06f} times faster")
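
Speed only matters if both versions give the same answer. As a quick sanity check, the sketch below recomputes the pressures both ways and compares them with `np.allclose`. The function is repeated (in condensed form) so the example runs on its own; the names `p_loop` and `p_vec` are just illustrative.

```python
import numpy as np

def air_pressure_at_height(h):
    # Same constants as the function defined at the top of this notebook
    p0 = 101325      # reference pressure in pascals
    M = 0.02896968   # molar mass of air kg/mol
    g = 9.81         # gravity m/s2
    R0 = 8.314462618 # gas constant J/(mol·K)
    T = 273          # temp in kelvin
    return p0 * np.exp(-(g * h * M) / (R0 * T))

heights = np.arange(0, 20000, 1)

# One height at a time, then collected into an array
p_loop = np.array([air_pressure_at_height(h) for h in range(0, 20000, 1)])
# All heights at once
p_vec = air_pressure_at_height(heights)

print(np.allclose(p_loop, p_vec))
```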

Okay, so how did that work? Numpy is an "array-based" library, meaning it defines the "array" type. By printing out the `type` you can see that `h_array` is an `ndarray`, which means N-dimensional array. In our case N=1, which you may know as a vector from math class. We can also look at the dimensionality and shape of our arrays using the `.ndim` and `.shape` attributes. These are very handy for getting summaries of what is stored in your arrays as your programs grow in complexity. Note that the length of the shape is always equal to the number of dimensions.

In [None]:
print(type(h_array))
print(h_array.ndim)
print(h_array.shape) 
print(len(h_array.shape) == h_array.ndim)
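
To see how these attributes change with dimensionality, here is a small sketch comparing a 1-d and a 2-d array. The arrays are made up for illustration, and `.size` (the total number of elements) is an extra attribute not discussed above.

```python
import numpy as np

vector = np.array([0, 1, 2, 3, 4, 5])        # 1-d: six elements
matrix = np.array([[0, 1, 2], [3, 4, 5]])    # 2-d: two rows, three columns

for name, arr in [("vector", vector), ("matrix", matrix)]:
    # ndim counts the axes, shape gives the length along each axis
    print(name, arr.ndim, arr.shape, arr.size)
```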

## What else can you do with numpy? Basically anything with numbers!

As mentioned, `numpy` means `numerical python` and has become basically *the* base of the scientific python stack of tools. Almost everything we will use from here on out uses `numpy` under the hood, and in many cases as you write custom algorithms it will be useful for you to know how to "drop down" from more advanced tools into the `numpy` way of writing code. With that in mind, let's get on with our super-brief overview.

When creating new arrays with `numpy` you will probably be using a built-in function where you specify what values to initialize your new array with, as well as the "shape". In this case the "shape" is specified by a sequence (i.e. either a list or tuple) of values where each of the values corresponds to a length along a particular dimension/axis. This is all rather abstract, so let's make a concrete example. If you wanted a 2 by 5 array of ones you would simply do the following:

In [None]:
array_shape = (2, 5)

# Create an array of all ones with a specific shape
ones_matrix = np.ones(array_shape)
print(ones_matrix)
print(ones_matrix.shape)
print()

Similarly, if you want a 3d cube of zeros where each side has 3 values you could do the following:

In [None]:
array_shape = (3, 3, 3)
zero_cube = np.zeros(array_shape)
print(zero_cube)

With all of these numbers inside of our `numpy` arrays you might begin to worry about how we are going to operate on all of these values without writing tons of complicated for loops and conditionals. Fortunately, `numpy` also makes math easier to write with python. As you already saw in the very first example, `numpy` does this by applying functions and operations to all elements in an array. At its most basic level this happens element-wise. To demonstrate this again, we can simply multiply our `ones_matrix` by a scalar value and it will automatically do the right thing. Operating on the arrays directly, rather than manually iterating over the values individually, is commonly referred to as "vectorization" in the `numpy` world.

In [None]:
print(0.1 * ones_matrix)
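
Element-wise behavior is not limited to scalars. As a small sketch with two made-up arrays of the same shape, the usual arithmetic and comparison operators also act on matching pairs of elements:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

total = a + b       # element-wise sum
product = a * b     # element-wise product (not a dot product)
squared = a ** 2    # each element squared
mask = a > 1.5      # comparisons are element-wise too, giving booleans

print(total, product, squared, mask)
```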

Similarly, since we're talking about matrices, you can also perform common operations from linear algebra, such as computing the product of the transpose of a matrix with the matrix itself (i.e. $A^TA$). Here the `.T` attribute gives the transpose and the `@` operator performs matrix multiplication, and `numpy` will figure out the rest for you.

In [None]:
ones_matrix.T @ ones_matrix
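
It can help to check the shapes involved to see that `@` really is a matrix product rather than an element-wise one. In this sketch `A` is just a stand-in for a 2 by 5 matrix of ones like the one used above:

```python
import numpy as np

A = np.ones((2, 5))

gram = A.T @ A        # (5, 2) @ (2, 5) gives a (5, 5) matrix
elementwise = A * A   # element-wise product keeps the (2, 5) shape

print(gram.shape, elementwise.shape)
print(gram[0, 0])     # each entry is a sum of two products of ones
```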

Element-wise operations between arrays of different shapes, such as the scalar multiplication above, are handled by `numpy` using rules for "broadcasting" computations across differently shaped arrays. The topic of how broadcasting works is beyond the scope of this course, but as usual, [the documentation](asdf) provides a good starting point for a deeper dive.

# TODO: Update link ^^^
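
Without going into the full rules, a minimal sketch can give a feel for what broadcasting does. The arrays here are made up for illustration: a 1-d row is reused for every row of a matrix, and a single column is reused for every column.

```python
import numpy as np

matrix = np.ones((2, 3))
row = np.array([0.0, 10.0, 20.0])    # shape (3,)
column = np.array([[1.0], [2.0]])    # shape (2, 1)

shifted = matrix + row      # row is added to each of the 2 rows
scaled = matrix * column    # column scales each of the 3 columns

print(shifted)
print(scaled)
```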

The benefits of broadcasting and vectorization extend to many of the built in functions that `numpy` provides, making it fast and easy to perform numeric computations with python. We will show a quick sampler of some of the more commonly used functions so you can get an idea of what's available.

## Array creation functions

We've already seen the `np.ones` and `np.zeros` functions, which create arrays filled with ones and zeros, respectively, but there are other handy functions for creating arrays that you might be interested in using:

 * `np.eye`: Creates the identity matrix.
 * `np.tri`: Creates a lower-triangular matrix of ones (ones on and below the diagonal, zeros above).
 * `np.arange`: A `numpy` counterpart to the built-in `range` function.
 * `np.linspace`: Similar to `np.arange`, except you can specify the number of elements, rather than the step between consecutive elements.
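
As a quick sampler of these four functions, here is a sketch with small, arbitrary sizes (deliberately different from the practice exercises):

```python
import numpy as np

identity = np.eye(3)             # 3x3 identity matrix
lower = np.tri(3)                # ones on and below the diagonal
evens = np.arange(0, 10, 2)      # start, stop, step: the stop value is excluded
grid = np.linspace(0, 1, 5)      # start, stop, count: the stop value is included

print(identity)
print(lower)
print(evens)
print(grid)
```

Note the difference in how the endpoint is treated: `np.arange` stops before `stop`, while `np.linspace` includes it by default.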

In [None]:
# Practice: use np.eye to create a 10*10 identity matrix


In [None]:
# Practice: use np.arange to create an array from 2 to 10 (10 included)


In [None]:
# Practice: use np.linspace to create an array
# starting at 0 and ending at 50, with 51 numbers in total


## Array transformation functions

You may have noticed in the `air_pressure_at_height` function we replaced the `math.exp` function call with `np.exp` - this is what we'll call an "array transformation". These types of functions take a numpy array as an input and then return a new array with the same shape but different values. This includes many of the common mathematical operations you're already familiar with:

 * `np.exp`: Takes the exponential of the input array.
 * `np.log`: Takes the (natural) logarithm of the input array.
 * `np.sin`: Calculates the sine function on each element of the input array.
 * `np.abs`: Takes the absolute value of each element of the input array.
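
A short sketch of a few of these on a made-up array shows the key property: the output always has the same shape as the input.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

abs_x = np.abs(x)                # absolute value of each element
sin_x = np.sin(x)                # sine of each element (in radians)
round_trip = np.log(np.exp(x))   # log undoes exp, up to rounding error

print(abs_x)
print(sin_x.shape == x.shape)
print(np.allclose(round_trip, x))
```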

In [None]:
# Practice: calculate the exponential of the number 10


In [None]:
# Practice: try calculating the exponential of array x with both np.exp and math.exp. What happens?
x=np.array([10,5,3])


## Aggregation functions

We refer to functions which take an array as input and then return either a scalar value or another array of reduced size (usually by removing one of the dimensions) as "aggregation functions". Often, the default behavior of these functions is to aggregate over the full array and return a scalar value; if you only want to apply the aggregation along a certain dimension/axis, you provide that explicitly as an additional argument. We'll demonstrate this in a moment. For now, some commonly used aggregations are:

 * `np.sum`: Takes the sum of elements of a given array.
 * `np.max`: Takes the maximum value of elements of a given array.
 * `np.histogram`: Calculates a histogram from a given array using a "bin counting" strategy.

Of course, this is only a small sample of the available functionality within numpy and doesn't cover many of the specialized "submodules" such as `np.linalg` for linear algebra functions (such as matrix decompositions), or `np.fft` for calculating Fourier transforms, and `np.random` for random number generation.
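
To give a rough idea of what these submodules look like in use, here is a minimal sketch with one call from each. The inputs are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

# np.random: draw 1000 samples from a standard normal distribution.
# Passing a seed makes the draws reproducible.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

# np.linalg: invert a small diagonal matrix
A = np.array([[2.0, 0.0], [0.0, 4.0]])
A_inv = np.linalg.inv(A)

# np.fft: the Fourier transform of a constant signal has
# only a zero-frequency component
signal = np.ones(8)
spectrum = np.fft.fft(signal)

print(samples.shape)
print(A_inv)
print(spectrum[0])
```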

Because we haven't seen aggregation functions yet, we will take a moment to highlight how that part works for aggregations over full arrays and specific axes. We will use a contrived array, which is a 3x3 matrix whose entries increase linearly from 1 to 9 as defined below.

In [None]:
sample_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

Then, we can take the total sum of the array in the obvious way:

In [None]:
np.sum(sample_array)

But, as we mentioned, aggregation functions can take arguments which apply the aggregation to specific axes. For instance, if we wanted to take the totals along each of the columns we would specify the `axis=0` argument as below:

In [None]:
np.sum(sample_array, axis=0)

And similarly, we can take the sum across rows by specifying `axis=1`:

In [None]:
np.sum(sample_array, axis=1)
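
The `axis` argument works the same way for other aggregations. As a sketch using the same 3x3 array, here is `np.max` over the full array and along each axis, plus `np.histogram` with an arbitrary choice of 3 bins:

```python
import numpy as np

sample_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

overall_max = np.max(sample_array)          # largest value in the whole array
column_max = np.max(sample_array, axis=0)   # largest value in each column
row_max = np.max(sample_array, axis=1)      # largest value in each row

# np.histogram flattens the array, then counts values per bin;
# it returns the counts and the bin edges
counts, bin_edges = np.histogram(sample_array, bins=3)

print(overall_max, column_max, row_max)
print(counts, bin_edges)
```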

### Array indexing and slicing

This first taste of specifying axes to perform computation on is a key skill to have when using `numpy` for numerical calculations. From the above example of calculating sums we can see that `numpy` indexes 2-d arrays in "row-major" order. That is, the first index (axis 0) runs along the rows, and the second (axis 1) runs along the columns. Of course, as you go into higher dimensions this intuition breaks down, but the key thing to keep in mind is to be careful about which axes of your arrays you are operating on and to be diligent in making sure that your algorithms work as expected. Operating on array axes in the wrong order will often produce a result, but that doesn't guarantee it is the correct one.

To get more familiar with how indexing and slicing on array axes works let's go over some of the basics. Much of what you already know from the basic python sequence types (recall, lists and tuples) will still hold with numpy arrays. For example, getting the first and last element of a 1-d array works exactly as you would expect.

In [None]:
sequence_1 = np.arange(0, 11)
sequence_1

In [None]:
print("First element: ", sequence_1[0])
print(" Last element: ", sequence_1[-1])

And just like lists and tuples you can use the slicing operators, either via the `start:stop:step` syntax or by creating the `slice` object explicitly.

In [None]:
start = 0
stop = 8
step = 2
print(sequence_1[start:stop:step])

my_slice = slice(start, stop, step)
print(sequence_1[my_slice])
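
Any part of `start:stop:step` can be left out, and negative values count from the end, exactly as with lists. A few more variations, as a sketch on the same kind of array:

```python
import numpy as np

sequence_1 = np.arange(0, 11)

first_three = sequence_1[:3]     # omit start: begin at the first element
last_three = sequence_1[-3:]     # negative start: count from the end
odds = sequence_1[1::2]          # omit stop: run to the end, every 2nd element
backwards = sequence_1[::-1]     # negative step: reverse the array

print(first_three, last_three, odds, backwards)
```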

Indexing on multidimensional arrays works similarly to indexing on sequences or 1-d arrays, but the indices for the various axes are separated by commas inside of the brackets. For instance, to get the first row and the first column from our 2-d array you can do the following (recall indexing with `:` just means "get everything"):

In [None]:
print('   First row: ', sample_array[0])
print('First column: ', sample_array[:, 0])
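
Integer indices and slices can be mixed freely across axes. The sketch below, again using the 3x3 sample array, picks out a single element, a column counted from the end, and a 2x2 sub-block:

```python
import numpy as np

sample_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

center = sample_array[1, 1]         # row 1, column 1 (zero-based)
last_column = sample_array[:, -1]   # every row, last column
top_left = sample_array[:2, :2]     # first two rows, first two columns

print(center)
print(last_column)
print(top_left)
```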

## Only ONE datatype is allowed in one array

In [None]:
my_list = ['Yifan', 31, 'want to be rich']

In [None]:
my_array = np.array(my_list)

Even though we passed a list with mixed data types to `np.array`, it did not report an error! But what does this array look like?

In [None]:
my_array

In [None]:
list(np.array(my_list)) == my_list
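
One way to see what happened is to inspect the array's `.dtype` attribute, which reports the single data type shared by all elements. In the sketch below, the mixed list ends up as an array of strings, so the integer 31 is stored as the text `'31'`. The second array, `numbers`, is an extra made-up example showing that mixing integers and floats gives an all-float array.

```python
import numpy as np

my_list = ['Yifan', 31, 'want to be rich']
my_array = np.array(my_list)

print(my_array.dtype)       # a unicode string dtype
print(repr(my_array[1]))    # the integer was converted to a string

numbers = np.array([1, 2.5, 3])
print(numbers.dtype)        # integers were converted to floats
```

This is why converting the array back to a list does not give back the original list: the conversion to a common type is not undone.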