# Loops & Orbits &mdash; Week 2 &mdash; Day 4 &mdash; Numpy Tutorial

There is a [numpy Quickstart tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html). I borrowed from it to get started writing a tutorial for you. Personally, I think that what I have created for you emphasizes the concepts more clearly. However, theirs is longer and more complete and it is part of scipy's official documentation, so perhaps you'll want to look at it after working through my quickstart below.

## Rationale for Numpy

The Python List is not designed for the highest level of numerical performance.

You can tell this because things like this are possible:

In [None]:
my_list = [1, "Loki", 2.0, ("Physics", "Math", "Computer Science")]

my_list

As you can see a list can take inhomogeneous mixtures of items &mdash; items of different types. Above, we have an integer, a string, a float, and a tuple which in turn contains three strings.

If you are very familiar with how things are implemented on computers, you know this is not optimal for speed even though it is optimal for flexibility.

People who do scientific computing in Python generally switch over from lists to numpy's ndarrays, both to make their programs run faster and to work with all their colleagues who are using numpy.

## Numpy Basics

### Multi-Dimensional and High Performance

NumPy’s main object is called `ndarray`, which stands for n-dimensional array. It is homogeneous and multi-dimensional. Because it is homogeneous, it is almost as fast as C-language arrays. Fortran is still the gold standard for speed, but few people know Fortran nowadays, and fewer and fewer people know C. So numpy and ndarray exist because Python users need a high-performance array.
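Here is a small sketch of what "homogeneous" means in practice. The names `mixed_list`, `list_types`, and `homogeneous` are mine, and `np.array` and `.dtype` are numpy features that are not otherwise used at this point in the tutorial, so treat this as a preview rather than the official way to inspect arrays.

```python
import numpy as np

# A Python list lets each item keep its own type.
mixed_list = [1, 2.0, 3]
list_types = [type(item).__name__ for item in mixed_list]

# An ndarray converts every item to one common type (here a 64-bit float).
homogeneous = np.array(mixed_list)

print(list_types)         # ['int', 'float', 'int']
print(homogeneous.dtype)  # float64
print(homogeneous)        # [1. 2. 3.]
```

Because every element has the same type and size, numpy can store the whole array in one block of memory, which is a large part of why it is fast.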

### Contrast with Ordinary List

By contrast with ndarray, Python lists are one-dimensional. That just means they take a single index:

In [None]:
my_list[1] ## a list takes one index

In [None]:
my_list[2] ## the index can be anything from 0 to len(my_list) - 1

In [None]:
my_list[-1] ## a special notation for getting the last item

### Example: An `ndarray` of Zeros

Let's make a numpy array that takes two indices and starts off with all zeros.

In [None]:
# You know that import brings in optional libraries.
# By bringing in the library "as np" our prefixes for library functions aren't as long.

import numpy as np

all_zeros = np.zeros((3, 4))  ## The np library function that creates an array of all zeros.

all_zeros[1, 2] # check that out -- it takes two indices!

Just like Python lists, the indices number from 0.

So in the `all_zeros` array, the first index goes from 0 to 2 and the second index goes from 0 to 3.

In [None]:
# Let's put the second index out of bounds:

all_zeros[1, 4] # should fail
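If you would rather see the failure without a red traceback, something like the following should work. It assumes you are comfortable with `try`/`except`, which this tutorial does not cover, and the variable `out_of_bounds` is just a name I made up for illustration.

```python
import numpy as np

all_zeros = np.zeros((3, 4))

try:
    all_zeros[1, 4]          # valid second indices are 0, 1, 2, 3
    out_of_bounds = False
except IndexError as error:
    out_of_bounds = True
    print("IndexError:", error)

# Negative indices still work, just as they do for lists.
last_in_row = all_zeros[1, -1]   # same element as all_zeros[1, 3]
```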

### Rows and Columns

An array with two indices is said to be two-dimensional.

For two-dimensional arrays it is easy to have a mental picture of a grid of numbers. (For 3-dimensional and higher it is harder to have a mental picture.)

The rows in the grid are specified by the first axis (which is axis 0).

The columns of the grid are specified by the second axis (which is axis 1).

So our all_zeros array has 3 rows and 4 columns. *Don't picture this as 3 columns and 4 rows or you will not be in agreement with the convention.*

Here is how Python and Jupyter display our 3x4 all_zeros array:

In [None]:
all_zeros
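To make the rows-versus-columns convention concrete, here is a minimal sketch. The name `grid` is mine, and the `:` slice notation is something this tutorial has not introduced, so take it as a preview: a `:` in one position means "every index along that axis."

```python
import numpy as np

grid = np.zeros((3, 4))

# shape is reported as (rows, columns), that is (axis 0, axis 1)
n_rows, n_cols = grid.shape

grid[0, :] = 1.0   # fill the whole first row (first index fixed at 0)
grid[:, 3] = 7.0   # fill the whole last column (second index fixed at 3)

print(n_rows, n_cols)  # 3 4
print(grid)
```

After this runs, the first row holds ones except for its last entry, and the last column holds sevens in all three rows.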

### An `ndarray` of Ascending Integers

In [None]:
# The following is intentionally similar to the range function:
ascending = np.arange(0, 15)

ascending

## Array Shape and Reshaping

Apparently `ascending` is a lot like a 1-axis list that runs from 0 to 14.

We can reshape it!

If we make it have three rows and five columns, will we get:
    
```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
```

or will we get

```
array([[0, 3, 6,  9, 12],
       [1, 4, 7, 10, 13],
       [2, 5, 8, 11, 14]])
```

We just try it:

In [None]:
ascending = ascending.reshape(3, 5)

ascending

This is very significant. It tells us that ndarray thinks of the last axis as the one that is changing most rapidly. It's just like digits when you count. The last digit changes most rapidly.

```
6458
6459
6460
6461
6462
```

That's just a convention. We could have had the first digit change the most rapidly:

```
6458
7458
8458
9458
0558
```

That looks weird, but it is only a convention. We put the least-significant digit on the right, not the left.

So does numpy. The elements in ascending order are:

```
ascending[0, 0]
ascending[0, 1]
ascending[0, 2]
ascending[0, 3]
ascending[0, 4]
ascending[1, 0]
ascending[1, 1]
ascending[1, 2]
ascending[1, 3]
ascending[1, 4]
ascending[2, 0]
ascending[2, 1]
ascending[2, 2]
ascending[2, 3]
ascending[2, 4]
```

So the sixth element is the [1, 0] element, and it had better be 5.

In [None]:
ascending[1, 0]
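The "last axis changes fastest" rule can be turned into arithmetic: for an array with `n_cols` columns, the element at `[row, col]` sits at position `row * n_cols + col` in the flattened sequence. Here is a sketch that checks this. The names `n_rows`, `n_cols`, and `flat_position` are mine, and `ravel()` is a numpy method (not covered above) that returns the elements as a 1-axis array.

```python
import numpy as np

ascending = np.arange(0, 15).reshape(3, 5)
n_rows, n_cols = ascending.shape

row, col = 1, 0
flat_position = row * n_cols + col   # 1 * 5 + 0 = 5

print(flat_position)                     # 5
print(ascending[row, col])               # 5
print(ascending.ravel()[flat_position])  # 5
```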

### Reshaping Returns a New Array

Let's make ascending have 5 rows and 3 columns.

Then let's access element [4, 1]. Something will go wrong.

In [None]:
ascending.reshape(5, 3)

In [None]:
ascending[4, 1]

What went wrong?

```
ascending.reshape(5, 3)
```

seemed like it did what we wanted.

However, it did not change `ascending` itself. It built a new array with the requested shape (numpy usually makes this a view onto the same data rather than a full copy). So

```
ascending.reshape(5, 3)
```

returned the new array and we did not save it anywhere.

Usually when you reshape you want the reshaped array, not the original, so you reshape and overwrite the original name as follows:

In [None]:
ascending = ascending.reshape(5, 3)

ascending

Now we are able to access row 4, column 1 of ascending:

In [None]:
ascending[4, 1]
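Here is a small sketch that checks what `reshape` actually hands back. The names `original` and `reshaped` are mine, and `np.shares_memory` is a numpy function not used elsewhere in this tutorial. One subtlety it shows: strictly speaking, the array returned by `reshape` is usually a view that shares its data with the original rather than an independent copy, so writing into one changes the other.

```python
import numpy as np

original = np.arange(0, 15).reshape(3, 5)
reshaped = original.reshape(5, 3)

print(original.shape)   # (3, 5), the original keeps its shape
print(reshaped.shape)   # (5, 3)

# Both arrays look at the same underlying numbers.
print(np.shares_memory(original, reshaped))  # True

# Position [4, 1] in the 5x3 array is the same stored number
# as position [2, 3] in the 3x5 array (both are flat position 13).
reshaped[4, 1] = -1
print(original[2, 3])   # -1
```

If you need a truly independent array, calling `.copy()` on the result is one way to get it.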

### Transposition

It seems a little annoying that we don't yet have a way of getting this array:

```
array([[0, 3, 6,  9, 12],
       [1, 4, 7, 10, 13],
       [2, 5, 8, 11, 14]])
```

Here is how to do it:

In [None]:
ascending = np.arange(0, 15)

ascending = ascending.reshape(5, 3)

ascending

In [None]:
np.transpose(ascending)

As with reshape, transpose returned a new array and left `ascending` itself unchanged.

So we have lost our change:

In [None]:
ascending

Here is how to keep the transposed array:

In [None]:
ascending = np.transpose(ascending)

ascending
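For what it's worth, there are a couple of other routes to the same array. This sketch assumes two features not covered above: the `.T` attribute, which is shorthand for the transpose, and the `order="F"` argument to `reshape`, which asks numpy to fill the first axis fastest (the Fortran convention) instead of the last. The names `via_transpose`, `via_attribute`, and `via_order` are mine.

```python
import numpy as np

ascending = np.arange(0, 15)

via_transpose = np.transpose(ascending.reshape(5, 3))
via_attribute = ascending.reshape(5, 3).T
via_order = ascending.reshape(3, 5, order="F")   # fill down the columns first

print(via_order)
print(np.array_equal(via_transpose, via_attribute))  # True
print(np.array_equal(via_transpose, via_order))      # True
```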

## Vectorized Operations

I haven't given you one of the main reasons why numpy is high-performance!

It supports vectorized operations: you apply one operation to a whole array at once, and the loop over the elements runs inside numpy's compiled code instead of in slow, interpreted Python.

For the Python list, this is illegal:

In [None]:
times = [0.0, 0.1, 0.2, 0.3]

stopwatch_start_time = 5.0

times = times + stopwatch_start_time

You wish that had returned

```
[5.0, 5.1, 5.2, 5.3]
```
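For comparison, here is roughly what you have to do with a plain list. This is a sketch in ordinary Python with no numpy. The names `longer` and `shifted` are mine, and it uses `try`/`except` and a list comprehension, which you may not have seen yet.

```python
times = [0.0, 0.1, 0.2, 0.3]
stopwatch_start_time = 5.0

# list + float is not defined, so Python raises a TypeError
try:
    times + stopwatch_start_time
    addition_failed = False
except TypeError as error:
    addition_failed = True
    print("TypeError:", error)

# list + list is defined, but it concatenates instead of adding
longer = times + [stopwatch_start_time]
print(len(longer))  # 5

# To add element by element, you have to loop yourself
shifted = [t + stopwatch_start_time for t in times]
print(shifted)
```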

Let's try the same thing with ndarray:

In [None]:
times = np.arange(0.0, 0.4, 0.1)

times

You can already tell that richer things are happening than can happen in standard Python.

This wasn't even legal with Python's built-in `range`:

In [None]:
range(0.0, 0.4, 0.1)

Now let's see what else we can get away with using the numpy library:

In [None]:
times = times + stopwatch_start_time

times

That's exactly what we wanted.

Here's more stuff you can do that cannot be done with the standard Python lists:

In [None]:
# the following illustrates a new function that converts a list to an ndarray

some_numbers_to_cube = np.array([2.0, 4.0, 6.0, 8.0]) 
some_numbers_to_cube

In [None]:
some_numbers_to_cube**3

You can see that ndarray allows you to do things very cleanly and clearly that aren't even legal with the ordinary Python list.

As usual, you get a copy and the original is unmodified.

In [None]:
some_numbers_to_cube
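To see what the one-line `**3` is replacing, here is a sketch that does the same cubing with an explicit loop and checks that the two answers agree. The names `cubes_loop` and `cubes_vectorized` are mine. On an array this small the speed difference does not matter, but on large arrays the vectorized form is typically much faster.

```python
import numpy as np

some_numbers_to_cube = np.array([2.0, 4.0, 6.0, 8.0])

# The explicit way: visit each element in a Python loop.
cubes_loop = np.zeros(len(some_numbers_to_cube))
for i in range(len(some_numbers_to_cube)):
    cubes_loop[i] = some_numbers_to_cube[i] ** 3

# The vectorized way: one expression, and numpy does the looping internally.
cubes_vectorized = some_numbers_to_cube ** 3

print(np.array_equal(cubes_loop, cubes_vectorized))  # True
```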

## Random Numbers and Histograms

Almost anything you could want to do in science including statistical analysis is possible with numpy.

Let's make a lot of random numbers:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Sample a normal distribution with a mean of 2.0 and a standard deviation of 0.5

mu = 2.0
sigma = 0.5

samples = np.random.normal(mu, sigma, 10000)

samples

We just made 10,000 samples. Fortunately Jupyter knows not to print them all. It omitted a lot with ... notation.
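A quick sanity check is to ask numpy for the mean and standard deviation of the samples and see whether they land near `mu` and `sigma`. This sketch makes one substitution you should be aware of: instead of `np.random.normal` it uses `np.random.default_rng` with a seed (the value 12345 is arbitrary) so that the run is repeatable. The names `rng`, `sample_mean`, and `sample_std` are mine.

```python
import numpy as np

mu = 2.0
sigma = 0.5

# A seeded generator gives the same "random" numbers on every run.
rng = np.random.default_rng(seed=12345)
samples = rng.normal(mu, sigma, 10000)

sample_mean = samples.mean()
sample_std = samples.std()

print(sample_mean)  # should be close to 2.0
print(sample_std)   # should be close to 0.5
```

With 10,000 samples you should expect agreement to roughly two decimal places, not an exact match.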

Let's plot the samples:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Use matplotlib histogram function
plt.hist(samples)
plt.show()

In [None]:
# Want more detail?
plt.hist(samples, bins=50)
plt.show()

In [None]:
# Want it normalized?
plt.hist(samples, bins=50, density=True)
plt.show()
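"Normalized" here means that the total area of the bars is 1. Here is a sketch that checks this without drawing anything, using `np.histogram`, which computes bar heights and bin edges without plotting them. As an assumption for repeatability it draws its own seeded samples rather than reusing the ones above. The names `heights`, `bin_edges`, `bin_widths`, and `total_area` are mine.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)   # arbitrary seed, for repeatability
samples = rng.normal(2.0, 0.5, 10000)

# Same binning as the plot: 50 bins, normalized to unit area.
heights, bin_edges = np.histogram(samples, bins=50, density=True)

bin_widths = np.diff(bin_edges)             # 50 widths from 51 edges
total_area = np.sum(heights * bin_widths)   # area = sum of height * width

print(total_area)  # 1.0 up to rounding error
```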

## Conclusion


You have seen a ton of new stuff. If you want to learn more, the SciPy [numpy Quickstart tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html) is one way to continue.

**Stop here. Below is what I am going to discuss once everyone is done.**

### Library Functions, Objects, and the Dot Operator

You have seen the `.` operator being used in two quite different ways. This operator is the "dot" operator. It accesses the functions inside a module as well as the attributes and methods of objects. So we need to discuss what these examples are doing as our computer science topic for today:

* Library Functions
  * Example: `ascending = np.arange(0, 15)`
* Objects
  * Example: `ascending = ascending.reshape(3, 5)`
  
`np` is a module that was imported using:

```
import numpy as np
```

`ascending` is an object that was created using:

```
ascending = np.arange(0, 15)
```

`np.arange` is a library function.

`reshape` is a method. A method is a special type of function that can act on the object it is called with.
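To make the distinction concrete, here is a short sketch showing the same reshaping done both ways. numpy happens to provide `reshape` as a library function too, so `np.reshape(ascending, (3, 5))` and `ascending.reshape(3, 5)` should give equal results. The names `by_method` and `by_function` are mine.

```python
import numpy as np

ascending = np.arange(0, 15)       # library function: lives in the np module

by_method = ascending.reshape(3, 5)             # method: called on the object
by_function = np.reshape(ascending, (3, 5))     # function: the object is an argument

print(np.array_equal(by_method, by_function))   # True
print(by_method.shape)                          # (3, 5), an attribute reached with the dot

print(type(np).__name__)          # module
print(type(ascending).__name__)   # ndarray
```

In both cases the dot means "look inside the thing on the left": inside the module `np` for a function, or inside the object `ascending` for a method or attribute.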