# NumPy

While base Python has many useful tools for a wide variety of purposes, open-source external libraries greatly expand Python's uses. NumPy (Numerical Python) is a critical library for manipulating numbers, performing matrix operations, and doing mathematics in general.

To use this library, we first have to **import** it with the keyword `import`.

We now have access to the various tools and functions that NumPy has to offer. The foundation of NumPy is the array, a data structure designed for holding numbers and doing math on them.

The simplest array is a one-dimensional vector, essentially a Python list that we can do math with. To make an array, we usually create a list and convert it to an array with `numpy.array()`.
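
As a minimal sketch of the import and the list-to-array conversion described above (the values and the names `my_list` and `arr` are arbitrary, chosen only for illustration):

```python
import numpy

# Start from an ordinary Python list...
my_list = [1, 2, 3, 4]

# ...and convert it to a 1D NumPy array.
arr = numpy.array(my_list)

print(arr)        # [1 2 3 4]
print(type(arr))  # <class 'numpy.ndarray'>
```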

You can always do simple math operations between a number (int or float) and an array.

If we have arrays of the same length, we can apply the same operations element by element: each value is combined with the value at the same position in the other array.

You can also use NumPy functions like `add()` and `multiply()` to perform these operations.
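
The three cases above can be sketched as follows. The arrays `a` and `b` are made-up examples, and the results in the comments are what these particular values produce:

```python
import numpy

a = numpy.array([1, 2, 3, 4])
b = numpy.array([10, 20, 30, 40])

# A single number is applied to every element of the array.
doubled = a * 2      # [2 4 6 8]
shifted = a + 0.5    # [1.5 2.5 3.5 4.5]

# Arrays of the same length are combined position by position.
total = a + b        # [11 22 33 44]
product = a * b      # [ 10  40  90 160]

# The function forms give the same results as the operators.
same_total = numpy.add(a, b)
same_product = numpy.multiply(a, b)
```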

#### Bonus
It is common practice in Python to use `import numpy as np` when importing NumPy. This allows you to only need to type `np.` (e.g., `np.add()`) when using a tool within NumPy, which is a bit less clunky and faster.

You could technically import NumPy under any name, but **DO NOT DO THIS**: `np` is the alias that other Python users expect, and anything else causes confusion.
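
A short sketch of the aliased import in use (the array `values` is a made-up example):

```python
import numpy as np

values = np.array([1, 2, 3])

# np.add is the same function as numpy.add, just reached through the alias.
print(np.add(values, 1))  # [2 3 4]
```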

#### Question: NumPy math

Create two NumPy arrays of the same length and subtract one from the other.

In [1]:
### your code here:

## 2D Array

NumPy arrays really come into their own when they're used as matrices. Let's first make a 3 x 3 array. To do this, we will call `numpy.array()` with a list that contains other lists, also called a **nested list**. 

In [None]:
numpy.array([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]])

We can get the dimensions of the array by checking the `.shape` attribute.

In [None]:
a = numpy.array( [[1, 1, 1],
                  [1, 1, 1],
                  [1, 1, 1]] )
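
A minimal sketch of checking the dimensions of the array defined in the cell above:

```python
import numpy

a = numpy.array([[1, 1, 1],
                 [1, 1, 1],
                 [1, 1, 1]])

# .shape is an attribute, so it takes no parentheses.
# For a 2D array it is a tuple of (rows, columns).
print(a.shape)  # (3, 3)
```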



Just like the single dimensional array, you can use the standard math operators between 2D arrays, though they have to be of the same shape.

In [None]:
b = numpy.array( [[10, 10, 10],
                  [10, 10, 10],
                  [10, 10, 10]] )
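
As a sketch, here are two operators applied to the same-shaped arrays `a` and `b` from the cells above. Note that `*` between arrays multiplies matching elements; it is not matrix multiplication.

```python
import numpy

a = numpy.array([[1, 1, 1],
                 [1, 1, 1],
                 [1, 1, 1]])
b = numpy.array([[10, 10, 10],
                 [10, 10, 10],
                 [10, 10, 10]])

total = a + b     # every element is 1 + 10 = 11
product = a * b   # element-wise: every element is 1 * 10 = 10
```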



Like a 1D vector, you can also do math operations with a single number.

NumPy comes with many tools to do various more complicated math operations as well. For instance, `numpy.matmul` can be used for matrix multiplication. 
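
The sketch below contrasts math with a single number, matrix multiplication with `numpy.matmul`, and element-wise multiplication with `*`. The 2 x 2 arrays are made-up examples kept small so the results can be checked by hand.

```python
import numpy

a = numpy.array([[1, 2],
                 [3, 4]])
b = numpy.array([[5, 6],
                 [7, 8]])

# A single number is applied to every element.
scaled = a * 10
# [[10 20]
#  [30 40]]

# Matrix multiplication: rows of a combined with columns of b.
matrix_product = numpy.matmul(a, b)
# [[19 22]
#  [43 50]]

# Element-wise multiplication gives a different answer.
elementwise = a * b
# [[ 5 12]
#  [21 32]]
```

The `@` operator (`a @ b`) is a shorthand for the same matrix multiplication.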

Here is a non-exhaustive list of other useful operations you can calculate with NumPy. Many of them use the submodule `linalg`, which specializes in linear algebra operations.
- Natural logarithm: `numpy.log()`
- Base 10 log: `numpy.log10()`
- Exponential ($e^x$): `numpy.exp()`
- Mean: `numpy.mean()`
- Median: `numpy.median()`
- Maximum: `numpy.max()`
- Minimum: `numpy.min()`
- Standard deviation: `numpy.std()`
- Variance: `numpy.var()`
- Dot product: `numpy.dot()`
- Determinant: `numpy.linalg.det()`
- Vector/matrix norm: `numpy.linalg.norm()`
- Matrix rank: `numpy.linalg.matrix_rank()`
- Matrix inverse: `numpy.linalg.inv()`
- Eigenvalues/eigenvectors: `numpy.linalg.eig()`
- Solutions to linear equations: `numpy.linalg.solve()`

For full usage of these functions and more, please visit the [NumPy reference manual](https://numpy.org/doc/stable/reference/routines.linalg.html).
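
As a brief sketch of a few of the `linalg` functions listed above, using a made-up 2 x 2 matrix `m` and right-hand side `rhs`:

```python
import numpy

m = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
rhs = numpy.array([3.0, 5.0])

# Determinant: 2*3 - 1*1 = 5 (up to floating-point rounding).
det = numpy.linalg.det(m)

# Rank: both rows are independent, so the rank is 2.
rank = numpy.linalg.matrix_rank(m)

# Inverse: multiplying m by its inverse gives the identity matrix.
inverse = numpy.linalg.inv(m)

# Solve m @ x = rhs for x, i.e. 2x + y = 3 and x + 3y = 5.
solution = numpy.linalg.solve(m, rhs)  # approximately [0.8 1.4]
```

Because these functions work in floating point, compare results with `numpy.allclose()` rather than `==`.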

#### Question: NumPy operations

Create a 1D array called `a` with at least 5 values. Find its mean, median, min, max, and standard deviation.

Create another 1D array called `b` with the same length as `a`. Use `numpy.dot(a,b)` to find the dot product of `a` and `b`. 

In [2]:
### your code here:

### Question

A common task in data analysis is **normalizing** data (this particular form is often called standardization). This consists of subtracting the mean from the data and dividing by the standard deviation.

Define a function that takes in a NumPy array and calculates its mean (`numpy.mean()`) and standard deviation (`numpy.std()`). Subtract the mean from the original array and divide the result by the standard deviation. Return the normalized array.

Test your function on a NumPy array with several values. Compare the mean and standard deviation of the array before and after normalization.

In [None]:
### your code here:


### Indexing and slicing in NumPy

Selecting a value in a 1D array is just like indexing in a Python list. If the array has a length of 4, indexes begin at 0 and end at 3. 

2D arrays are indexed in a similar way, with a row index followed by a column index: `array[row, col]`. Both row and column numbers begin at 0.

![array indexing](https://swcarpentry.github.io/python-novice-inflammation/fig/python-zero-index.svg)
*Credit to [Software Carpentry](https://swcarpentry.github.io/python-novice-inflammation/02-numpy/index.html)*.

In [None]:
y = numpy.array([[1., 2., 3., 4.], # adding the decimal makes them floats
              [5., 6., 7., 8.]]) 
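
A minimal sketch of `array[row, col]` indexing on the array `y` from the cell above:

```python
import numpy

y = numpy.array([[1., 2., 3., 4.],
                 [5., 6., 7., 8.]])

first = y[0, 0]   # row 0, column 0 -> 1.0
value = y[1, 2]   # row 1, column 2 -> 7.0
row = y[1]        # a single index selects a whole row -> [5. 6. 7. 8.]
```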



Also like lists, we can use **negative indexing** to get the last values of a column and/or row.
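
As a sketch, negative indexes count back from the end of a row or column. The array `y` is the same as in the indexing cell; the variable names are illustrative only.

```python
import numpy

y = numpy.array([[1., 2., 3., 4.],
                 [5., 6., 7., 8.]])

last_in_first_row = y[0, -1]   # row 0, last column -> 4.0
bottom_right = y[-1, -1]       # last row, last column -> 8.0
second_to_last = y[1, -2]      # row 1, second-to-last column -> 7.0
```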

We can also use **slicing** to return portions of an array -> `array[i:j]`. Slicing is **inclusive** for the first index (`i`) and **exclusive** for the last index (`j`). `array[i:j]` returns values from `i` to `j-1`.
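
A short sketch of the inclusive/exclusive rule on a made-up 1D array `v`:

```python
import numpy

v = numpy.array([10, 20, 30, 40, 50])

middle = v[1:4]   # indexes 1, 2, 3 (4 is excluded) -> [20 30 40]
start = v[:2]     # leaving out i starts at the beginning -> [10 20]
end = v[3:]       # leaving out j runs to the end -> [40 50]
```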

We can use this for 2D arrays, as well. We can slice rows, columns, or both at once.

In [None]:
z = numpy.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9,10,11,12],
                 [13,14,15,16]])
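
The sketch below slices rows, a column, and both at once, using the array `z` from the cell above. A bare `:` means "everything along this dimension".

```python
import numpy

z = numpy.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12],
                 [13, 14, 15, 16]])

top_rows = z[0:2, :]   # rows 0 and 1, all columns
# [[1 2 3 4]
#  [5 6 7 8]]

one_column = z[:, 1]   # all rows, column 1 -> [ 2  6 10 14]

block = z[1:3, 1:3]    # rows 1-2 and columns 1-2
# [[ 6  7]
#  [10 11]]
```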



### Question: Slicing

Using slicing, create a variable containing the first two columns of `data`, and another variable containing the last two columns. Subtract the two sets of columns from each other and square the difference. 

In [3]:
data = numpy.array([[0.37568486, 0.39360456, 0.83055883, 0.67256725],
                    [0.68017832, 0.90546118, 0.79336985, 0.80561814],
                    [0.31127419, 0.29518634, 0.48364838, 0.56015636],
                    [0.75994716, 0.01312868, 0.15958863, 0.98516761],
                    [0.76733493, 0.19900552, 0.03471678, 0.06886277]])

### your code here:




### NumPy constants

Math has many constants and important terms that are not present in vanilla Python. Here is a short list of some important ones:

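- Positive infinity ($+\infty$): `numpy.inf` (the aliases `numpy.Inf`, `numpy.Infinity`, `numpy.PINF`, and `numpy.infty` were removed in NumPy 2.0)
- Negative infinity ($-\infty$): `-numpy.inf` (the alias `numpy.NINF` was removed in NumPy 2.0)
- Euler's number $e$: `numpy.e`
- Missing values / Not a Number (NaN): `numpy.nan` (older releases also accepted capitalized aliases; `numpy.NaN` was removed in NumPy 2.0, so prefer the lowercase form)
- pi ($\pi$): `numpy.pi`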
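
A brief sketch of these constants in use. It sticks to the lowercase names, which should work in both older and newer NumPy releases:

```python
import numpy

print(numpy.pi)  # 3.141592653589793
print(numpy.e)   # 2.718281828459045

# Infinity compares as larger (or smaller) than any ordinary number.
print(-numpy.inf < 0 < numpy.inf)  # True

# NaN is not equal to anything, including itself,
# so test for it with numpy.isnan() instead of ==.
print(numpy.nan == numpy.nan)   # False
print(numpy.isnan(numpy.nan))   # True
```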

## NumPy random

NumPy contains a submodule called `random`. It contains powerful tools for random sampling, randomizing list orders, and random number generation.

We'll go through a few examples of how to use `numpy.random`.

`np.random.rand()` generates random floats between 0 and 1 (0 is a possible value; 1 is not).

We can provide one number for a 1D array output, or several numbers, one per dimension, to get an output of that shape.
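
A minimal sketch of the three call styles. The values differ on every run, so no outputs are shown; the variable names are illustrative only.

```python
import numpy as np

single = np.random.rand()       # no arguments: one float
vector = np.random.rand(5)      # one number: 1D array of 5 floats
matrix = np.random.rand(2, 3)   # two numbers: 2 rows x 3 columns

print(vector.shape)  # (5,)
print(matrix.shape)  # (2, 3)
```

Note that `rand()` takes the dimensions as separate arguments rather than as a single tuple.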

`np.random.randint()` gives us back random integers between a low and a high number. It includes the low number and excludes the high number. The third argument, `size`, is the shape of the output.
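
As a sketch, simulating die rolls shows the inclusive low and exclusive high: to get values from 1 to 6, the high number has to be 7. The names `rolls` and `grid` are made up for this example.

```python
import numpy as np

# Ten integers from 1 to 6 (7 is excluded).
rolls = np.random.randint(1, 7, 10)

# The shape can also be a tuple: 3 rows x 4 columns of integers from 0 to 99.
grid = np.random.randint(0, 100, (3, 4))

print(rolls.shape)  # (10,)
print(grid.shape)   # (3, 4)
```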

`np.random.uniform()` gives you random floats between a low and a high bound. All values between those bounds are equally likely.
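
A short sketch with made-up bounds of -5 and 5; the third argument is the number of values to draw:

```python
import numpy as np

# 1000 floats spread evenly between -5 and 5.
samples = np.random.uniform(-5.0, 5.0, 1000)

print(samples.shape)                  # (1000,)
print(samples.min(), samples.max())   # both fall inside the bounds
```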

`np.random.normal()` draws numbers from a normal (bell-curve) distribution. The first argument is the mean that the values are centered on. The second defines the spread, or how far from the mean the values typically fall. The last argument is the shape of the output.

You can get dramatically different values by changing the spread, or **standard deviation**.

With a mean of 0 and a standard deviation of 40, about 2/3 of all values (roughly 68%) will fall within 40 of 0.
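
The sketch below draws two samples with the same mean but different standard deviations, then checks what fraction of the wider sample lands within one standard deviation of the mean. The sample size of 10000 is an arbitrary choice, and the exact numbers change on every run.

```python
import numpy as np

narrow = np.random.normal(0, 1, 10000)   # mean 0, standard deviation 1
wide = np.random.normal(0, 40, 10000)    # mean 0, standard deviation 40

# The wide sample reaches much further from 0 than the narrow one.
print(narrow.min(), narrow.max())
print(wide.min(), wide.max())

# Fraction of the wide sample within 40 of 0; expect a value close to 0.68.
within_one_sd = np.mean(np.abs(wide) < 40)
print(within_one_sd)
```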

`np.random.shuffle()` randomly rearranges the order of a sequence. Here we use a list.

It works **in place**: it reorders the list we pass in, overwriting the original order, and returns `None`.

In [2]:
my_list = [0,1,2,3,4,5,6,7,8]
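
A minimal sketch of shuffling the list from the cell above. The name `result` is only there to show what the call returns.

```python
import numpy as np

my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# shuffle() changes my_list itself and returns None,
# so do not write my_list = np.random.shuffle(my_list).
result = np.random.shuffle(my_list)

print(result)    # None
print(my_list)   # the same nine values in a new random order
```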



`shuffle()` works on lists of strings, too.

In [1]:
string_list = ["first", "second", "third", "fourth", "fifth"]



`np.random.choice()` by default takes a random item from a list that we give it.

We can ask for more than one item, as well.

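We sample from the list **with replacement** by default, meaning that an item stays available after it has been picked, so the same item can appear more than once.

If we say `replace=False`, then each item can be picked at most once, so we cannot ask for more items than the list contains.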
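
As a sketch of the three ways of calling `np.random.choice()` described above, using a made-up list `colors`:

```python
import numpy as np

colors = ["red", "green", "blue", "yellow"]

# One random item.
one = np.random.choice(colors)

# Six items with replacement: repeats are allowed, so asking for
# more items than the list holds is fine.
with_repeats = np.random.choice(colors, 6)

# Three items without replacement: every item is different.
without_repeats = np.random.choice(colors, 3, replace=False)

print(one)
print(with_repeats)
print(without_repeats)
```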

### Question 

Define a function called `sample_size_testing()` with one parameter called `n`.

In this function, use `np.random.normal()` to make an array with a mean of 40, and a standard deviation of 20 with a sample size of `n`. 

Then, calculate the mean and standard deviation of the random array you have generated.

As you increase n, do you notice any change in the sample mean or standard deviation?

In [None]:
### Your code here: 

## Resources

- [NumPy docs](https://numpy.org/doc/stable/index.html)
- [NumPy getting started](https://numpy.org/doc/stable/user/quickstart.html)
- [Random samples with NumPy](https://numpy.org/doc/stable/reference/random/index.html)

This lesson is adapted from 
[Software Carpentry](http://swcarpentry.github.io/python-novice-gapminder/design/).