# Introduction to Numpy

One of the most popular libraries in Python is `numpy`, which provides classes and functions for numerical work and for organizing data.  By convention, nearly all documentation and examples use `np` as the shorthand for `numpy`.

In [None]:
import numpy as np

The base class in `numpy` is the array (`ndarray`), which many `numpy` functions return when called.  The first function we will look at is `arange()`, which works similarly to `range()` in base Python.

In [None]:
np.arange(10)

The next function is `linspace(X,Y,N)`, which returns an array of $N$ linearly-spaced values between two endpoints, $X$ and $Y$.  Note that while `range()` and `arange()` end before the actual endpoint value, `linspace()` uses inclusive endpoints in the array.

Another way to think of it is that `linspace()` provides the values which enclose $N-1$ equally-sized spaces between the endpoints.  In the example below, the 10 values from 0 to 1 are separated by 9 gaps, each of width $1/9$.

In [None]:
np.linspace(0,1,10)
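
To make the endpoint difference concrete, here is a minimal sketch comparing the two functions over the same interval.  The variable names are illustrative only.

```python
import numpy as np

# arange(start, stop, step) stops before the endpoint, like range().
stepped = np.arange(0, 10, 2)

# linspace(start, stop, num) includes both endpoints.
spaced = np.linspace(0, 10, 6)

# N points enclose N-1 gaps, so the spacing is (Y - X) / (N - 1).
gap = (10 - 0) / (6 - 1)

print(stepped)
print(spaced)
print(gap)
```

Both arrays step by 2, but only the `linspace()` result contains the value 10.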

Another powerful tool in the `numpy` toolbox is the `genfromtxt` function, which reads datafiles like `.csv` files or other formatted text files and converts the values automatically into arrays with built-in organizational tools.  The next cell uses the `!` prefix, a Jupyter notebook feature that runs a shell command directly and displays the output.  I've done this just to show the first few lines of the datafile we're using to demonstrate.

In [None]:
!head ../Datafiles/rmsd.dat

We can store our loaded data into a variable, which lets us then use the built-in class functions more easily.

In [None]:
my_data = np.genfromtxt("../Datafiles/rmsd.dat")

In [None]:
my_data

If we look carefully, we can see that what we have is organized like a list of lists (strictly speaking, it is a 2-dimensional `numpy` array rather than nested Python lists).  That is, our 2-dimensional array of information from the file behaves like a list of smaller, two-element lists, one per row.  This means if we want one column only, we have to be careful how we slice the array.
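
A small, self-contained sketch of that layout, using a made-up array rather than the datafile: the first index selects along the outer (row) axis and the second along the inner (column) axis.

```python
import numpy as np

# A made-up 3x2 array standing in for (frame, value) pairs from a file.
pairs = np.array([[1.0, 0.10],
                  [2.0, 0.25],
                  [3.0, 0.40]])

first_row = pairs[0, :]     # index 0 of the outer axis, all of the inner axis
first_col = pairs[:, 0]     # all of the outer axis, index 0 of the inner axis

print(pairs.shape)          # (rows, columns)
print(first_row)
print(first_col)
```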


In [None]:
my_data[0,:]

In [None]:
my_data[:,0]

The two cells above show us how the first index in our slice corresponds to the major list (the 'outer' list), while the second index corresponds to the minor list ('inner' list).  The comma (`,`) separates these indices, allowing us to select a slice from the 2D array fairly easily.  In the first example, we've requested the 0-index of the major list, and all elements of the minor list.  In the second example, we've requested all elements of the major list, and only the first element of each minor list obtained.

However, the `genfromtxt` function has some additional keywords that can make this much easier.  The first line of our datafile has the names of each column, which we can use during loading.  Then we can use these names similar to how we use keys in a dictionary.

In [None]:
my_data = np.genfromtxt("../Datafiles/rmsd.dat",names=True)

In [None]:
my_data["RMSD_00002"]

Using column names can be very helpful when working with larger datasets such as what we might obtain from analysis programs (like `cpptraj`).
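
As a self-contained sketch of name-based access, the example below feeds `genfromtxt` a few lines of made-up text through `io.StringIO` instead of a file on disk.  The header and values are invented for illustration and are not taken from `rmsd.dat`.

```python
import io
import numpy as np

# Made-up text standing in for a small two-column datafile with a header line.
text = io.StringIO("#Frame RMSD_00002\n"
                   "1 0.000\n"
                   "2 0.512\n"
                   "3 0.734\n")

# names=True takes the field names from the first line of the input.
table = np.genfromtxt(text, names=True)

print(table.dtype.names)        # the column names that were picked up
print(table["RMSD_00002"])      # one column, selected by name
```

The result is a structured array: one record per row, with each column reachable by its name.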

We can also skip lines in a file (with `skip_header`) or load in only specific columns (with `usecols`).


In [None]:
my_data = np.genfromtxt("../Datafiles/rmsd.dat",skip_header=9)

In [None]:
my_data

In [None]:
my_data = np.genfromtxt("../Datafiles/rmsd.dat",usecols=1)

In [None]:
my_data

In the `skip_header` example, we instruct `numpy` to ignore the first nine lines in the file, which correspond to the column names and then the first eight value-pairs in the file.  In the `usecols` example, we are specifying which column (with the first column set as 0) we want to load.  We can also combine keywords to get more specific portions of the file, or modify the behavior in general.

In [None]:
my_data = np.genfromtxt("../Datafiles/rmsd.dat",names=True,skip_header=9)

In [None]:
my_data

Note that in the cell above, we no longer get column names of "Frame" and "RMSD_00002", but rather the values from the first non-skipped line.  It is important to be aware of which keywords take priority over others and to be careful to check your data as you go.
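
The sketch below shows `skip_header` and `usecols` on their own, again with made-up text in place of the datafile, so the effect of each keyword is easy to check by eye.

```python
import io
import numpy as np

# Made-up file contents: one header line followed by four value-pairs.
contents = ("#Frame RMSD_00002\n"
            "1 0.10\n"
            "2 0.20\n"
            "3 0.30\n"
            "4 0.40\n")

# Skip the header line plus the first two value-pairs (three lines in total).
tail = np.genfromtxt(io.StringIO(contents), skip_header=3)

# Load only the second column (columns are counted from 0).
second_col = np.genfromtxt(io.StringIO(contents), usecols=1)

print(tail)
print(second_col)
```

Printing the result after each load, as done here, is a simple way to confirm that the keywords selected the rows and columns you intended.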

`Numpy` has a number of more complex and useful functions for math, such as those found in the `.linalg` submodule.

Let's consider a $3 \times 3$ matrix and a 3-element vector.

What if we wanted the determinant of the matrix?  The dot product of the matrix with the vector?  The cross product?  `Numpy` can help!  The following cells show the results of each of these different functions.  Note that the `det` function is found inside the `.linalg` submodule, whereas `.cross` and `.dot` are both in the main `numpy` module.

In [None]:
my_matrix = np.array([[3,1,2],
             [3,5,5],
             [6,7,4]])
my_vector = np.array([1,
             3,
             5])

In [None]:
# Determinant of a matrix is a scalar value.
np.linalg.det(my_matrix)

In [None]:
# Cross product is defined for 3-element vectors; given a 3x3 matrix, each row is crossed with the vector (3x3 result).
np.cross(my_matrix,my_vector)

In [None]:
# Dot product of MxN and NxP matrices is MxP; here a 3x3 matrix and a 3-element vector give a 3-element vector.
np.dot(my_matrix,my_vector)
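
To see what these two functions do with the shapes involved, here is a minimal sketch.  It starts with two unit vectors chosen for illustration and then reuses the same matrix and vector values as above under different names.

```python
import numpy as np

x_hat = np.array([1, 0, 0])
y_hat = np.array([0, 1, 0])

# Cross product of two 3-element vectors is another 3-element vector.
z_hat = np.cross(x_hat, y_hat)

matrix = np.array([[3, 1, 2],
                   [3, 5, 5],
                   [6, 7, 4]])
vector = np.array([1, 3, 5])

# With a 2D first argument, np.cross treats each row as a vector.
row_crosses = np.cross(matrix, vector)

# Matrix-vector product; the @ operator gives the same result as np.dot here.
product = matrix @ vector

print(z_hat)
print(row_crosses[0])
print(product)
```

Each row of `row_crosses` equals the cross product of the matching matrix row with the vector, while `product` has one element per matrix row.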

We can also perform more complicated arithmetic functions like matrix diagonalization.

In [None]:
# This is NOT diagonalization of the matrix, this simply returns the diagonal OF the matrix as it currently is.
np.diag(my_matrix)

Matrix diagonalization with `numpy` returns two datasets - the eigenvalues and their corresponding eigenvectors.

In chemistry, these are both pretty important sets of numbers to be able to obtain...

In [None]:
# To diagonalize a general square matrix, we use the linalg.eig() function.
# (linalg.eigh() is meant for symmetric/Hermitian matrices, which my_matrix is not.)
eigvals,eigvecs = np.linalg.eig(my_matrix)

In [None]:
eigvals

In [None]:
eigvecs
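
One way to check a diagonalization is to confirm that each eigenvector $v$ satisfies $Av = \lambda v$.  The sketch below does this for a small symmetric matrix made up for illustration; `np.linalg.eigh()` assumes a symmetric (Hermitian) input, while `np.linalg.eig()` handles general square matrices.

```python
import numpy as np

# A small symmetric matrix, chosen so that eigh() applies.
sym = np.array([[2.0, 1.0],
                [1.0, 2.0]])

vals, vecs = np.linalg.eigh(sym)   # eigenvalues come back in ascending order

# Each column vecs[:, i] satisfies  sym @ v = vals[i] * v.
lhs = sym @ vecs
rhs = vecs * vals

# The matrix can be rebuilt from its eigenvectors and eigenvalues.
rebuilt = vecs @ np.diag(vals) @ vecs.T

print(vals)
print(np.allclose(lhs, rhs))
print(np.allclose(rebuilt, sym))
```

Note that the eigenvectors are stored as the columns of the returned array, not the rows.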

We can also perform other matrix operations like the transpose (any shape) or the inverse (only for square, non-singular matrices).

In [None]:
my_matrix.T

In [None]:
np.linalg.inv(my_matrix)
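
A quick sanity check for an inverse is to multiply it by the original matrix and compare against the identity.  The sketch below does that with the same matrix values as above (under a different name), and also shows the transpose of a non-square array.

```python
import numpy as np

square = np.array([[3, 1, 2],
                   [3, 5, 5],
                   [6, 7, 4]])

inverse = np.linalg.inv(square)

# A matrix times its inverse should give the identity, up to rounding error.
check = square @ inverse
print(np.allclose(check, np.eye(3)))

# The transpose swaps rows and columns, so it works for any 2D shape.
wide = np.arange(6).reshape(2, 3)
print(wide.T.shape)
```

Because of floating-point rounding, `np.allclose` is used for the comparison rather than an exact equality test.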

Now, let's say we want to get an array of values, but we don't know what those values are to begin with.  We can simply initialize an array of zeroes of any length or even dimensional shape.

In [None]:
np.zeros(10)

In [None]:
np.zeros([3,4])

What if we have a list of numbers that should really be a 2D array?
We can `.reshape()` an array pretty easily.

In [None]:
my_array = np.arange(1,13)
print(my_array)

In [None]:
my_array.reshape(3,4)

What if we have a matrix that needs to be flattened into a 1-dimensional array, or reshaped from one 2D shape to another?


In [None]:
my_matrix = np.array([[ 1,  2,  3,  4],
             [ 5,  6,  7,  8],
             [ 9, 10, 11, 12]])

In [None]:
my_matrix.flatten()

In [None]:
my_matrix.reshape([6,2])
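
Reshaping only works when the new shape holds exactly the same number of elements.  The sketch below illustrates this, along with the `-1` placeholder that lets `numpy` infer one dimension; the variable names are illustrative.

```python
import numpy as np

values = np.arange(1, 13)            # 12 elements

# -1 asks numpy to work out that dimension from the total size.
three_rows = values.reshape(3, -1)   # becomes 3 x 4

# The new shape must hold exactly the same number of elements.
try:
    values.reshape(5, 3)             # 15 slots for 12 elements
    mismatch_error = False
except ValueError:
    mismatch_error = True

print(three_rows.shape)
print(mismatch_error)
```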

Similar to the `zeros` function, there are the `ones` (an $N$-length array of ones) and `full` (an $N$-length array filled with the value $X$) functions as well.

In [None]:
np.ones(10)

In [None]:
np.full(10,5)
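
Like `zeros`, both functions also accept a multi-dimensional shape in place of a single length.  A minimal sketch, with illustrative names and values:

```python
import numpy as np

ones_grid = np.ones((2, 3))        # 2 x 3 array of 1.0
sevens = np.full((2, 3), 7)        # 2 x 3 array filled with 7

print(ones_grid)
print(sevens)
print(ones_grid.dtype, sevens.dtype)
```

Note that `ones` defaults to floating-point values, while `full` takes its data type from the fill value.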

How about if we need some quick, random test data (perhaps for a future notebook demonstration)?

For $N$ random values drawn uniformly from 0 (inclusive) up to 1 (exclusive), we use

```python
np.random.rand(N)
```

In [None]:
np.random.rand(10)

For $N$ random integers from $X$ (inclusive) up to $Y$ (exclusive), we use
```python
np.random.randint(X,Y,N)
```

In [None]:
np.random.randint(50,100,15)
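
As with `arange()`, the upper bound of `randint()` is excluded.  A quick sketch that checks this on a larger sample (the sample size of 1000 is an arbitrary choice):

```python
import numpy as np

draws = np.random.randint(50, 100, 1000)

# The lower bound can appear in the output; the upper bound cannot.
print(draws.min() >= 50)
print(draws.max() <= 99)
```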

We can get more complicated, too, such as requesting random numbers drawn from a normal distribution (bell curve).  A set of $N$ values with mean $A$ and standard deviation $S$ is requested like this:
```python
np.random.normal(A, S, N)
```
In the cell below, notice how most of the values lie within the range of $A\pm S$, though there may be some that lie outside of that range.  In fact, if you were to use a larger random set (say 100,000 elements), you should get pretty close to the established values for normal distributions, where 68% of values fall within $1S$, 95% fall within $2S$, and 99.7% fall within $3S$.

Testing that is left as an exercise for later.

In [None]:
my_normal_random = np.random.normal(10, 1.0, 10)
print(my_normal_random)

The `numpy` library is expansive and covers far more than what we can really get into over the course of a single day.  However, the library is well-documented and that documentation is readily accessible online:

[Numpy Routines](https://numpy.org/doc/stable/reference/routines.html)

A good habit to form and follow with Python programming is to check if `numpy` has a function to do the thing you need before you spend time developing something yourself.  You may be pleasantly surprised!