# Programming For Chemists: Basic NumPy and arrays 

**Importance for scientists:**
* Fast numerical computation.
* Handling and manipulation of scientific data.
* Pandas, a library we discuss in the next session, is heavily reliant on NumPy.
* Nearly all mathematical programs written in Python will inevitably use NumPy. 

NumPy stands for *numerical python* and is the universal standard for working with numerical data in Python. It is **largely responsible for the popularity of the Python language within the sciences** due to its multidimensional array and the ability to perform a wide variety of mathematical operations on arrays. 

* It supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices, built on very fast compiled numerical libraries including [BLAS](http://www.netlib.org/blas/) and [LAPACK](http://www.netlib.org/lapack/). **Result:** You get close to the speed of C with the friendlier syntax of Python.

* NumPy is possibly **the** most important Python library we will cover in this course, although we will unfortunately only scratch the surface. 
* We will cover some of the important and useful functionality, then apply what we learn to some worked examples. 
* We will focus mainly on arrays and how they relate to matrices; one of the most useful tools in the sciences. 

## Importing NumPy

In order to start using NumPy and all of the functions available in it, you’ll need to import it, which can be done in several ways:

* Import the library as-is:

In [None]:
import numpy

* This works fine, but when we reference functions from the library in the next section we will have to write `numpy.name_of_func` each time. 
* We can instead import the library and assign it to a different name:

In [None]:
import numpy as np

* Where we have assigned `numpy` to `np`. Now we can reference the methods as `np.name_of_func` which is faster to write and less prone to mis-typing! **This code line has to be run to load the NumPy library for the rest of the worksheet!**

* You can also import specific functions from a library using the following syntax, e.g. importing the dot product function:

In [None]:
from numpy import dot

## NumPy Arrays

An array is the central data structure of the NumPy library, representing a grid of values containing information about the raw data, how to locate an element, and how to interpret an element. 

* The elements are all of the same type, referred to as the array `dtype` (data type). 
* One way to initialize NumPy arrays is using Python lists, using nested lists for two- or higher-dimensional data. <font color='red'>Check the data type of the following array using the `type()` function:</font>

In [None]:
# Build a 1D array from a python list
a = np.array([1, 2, 3, 4, 5, 6])

print(a)

In [None]:
# Build a 2D array from nested python lists
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(b)

* We can access the elements in the array using square brackets `[]`. 
    * Remember that **indexing in NumPy starts at 0.** That means that if you want to access the first element in your array, you’ll be accessing element `0` not `1`.

In [None]:
# Get the first element from the numpy array, a. 
print(a[0])

* To extract a specific element in a 2D array, we can use the notation `[row,column]`:

In [None]:
# Extract element from first row and fourth column
print(b[0,3])
# Extract element from third row and third column
print(b[2,2])

* You might occasionally hear an array referred to as an `ndarray`, which is shorthand for 'N-dimensional array.' 
    * An N-dimensional array is simply an array with any number of dimensions; 1-D, 2-D, 3-D etc... The NumPy `ndarray` class is used to *represent* vectors, matrices and more generally tensors.

### Exercise

<font color='red'>Turn the following 1D-array into a 2D-array and print to screen:</font>

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

### Useful Array Commands

**np.zeros():**

* Besides creating an array from a sequence of elements, you can easily create an array filled with zeros:

In [None]:
# Create 5 x 5 array of zeros
np.zeros([5,5])

**np.ones():**

* Or an array filled with 1’s:

In [None]:
np.ones([5,5])

**np.empty():**

* The `empty` function creates an array without initialising its entries, so the initial content is arbitrary and depends on the state of the memory. The reason you may want to use `empty` over `zeros` is speed:

In [None]:
np.empty([5,5])

* If using `empty` make sure to **fill all of the elements.** 
* You can also create an array with a range of elements: 

In [None]:
np.arange(0, 11, 2)
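
As a minimal sketch of how the arguments of `np.arange` behave (the variable names are illustrative and not part of the worksheet): the call takes `start`, `stop` and `step`, and the `stop` value itself is excluded from the result.

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded from the result
evens = np.arange(0, 11, 2)
print(evens)        # [ 0  2  4  6  8 10]

# With a single argument, start defaults to 0 and step to 1
first_five = np.arange(5)
print(first_five)   # [0 1 2 3 4]

# stop=10 is excluded, so the last value is 8
up_to_ten = np.arange(0, 10, 2)
print(up_to_ten)    # [0 2 4 6 8]
```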

* NumPy arrays are not limited to the `np.float64` **(double precision)** data type that functions such as `zeros` and `ones` use by default; you can explicitly specify which data type you want using the `dtype` keyword:

In [None]:
# Construct a 5 x 5 array of non-numeric 1's (string data type, str)
# Note: the old alias np.str has been removed from NumPy; use the built-in str
m = np.ones([5,5], dtype=str)

print(m)
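
The following is a small sketch, with illustrative variable names, of how the `dtype` attribute and the `dtype` keyword fit together. It also shows what happens when a list mixes integers and floats. The exact integer type NumPy picks can depend on the platform, so no output is stated for it.

```python
import numpy as np

# Without a dtype keyword, np.ones returns a float64 array
default_ones = np.ones([2, 2])
print(default_ones.dtype)   # float64

# Request integers or single-precision floats explicitly
int_ones = np.ones([2, 2], dtype=int)
float32_zeros = np.zeros([2, 2], dtype=np.float32)
print(int_ones.dtype, float32_zeros.dtype)

# A list mixing integers and floats is stored as floats,
# because all elements of an array must share one dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)          # float64
```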

**Array dimensions:**

* An important piece of information is the size and shape of our arrays. These can be extracted using the following:

    `ndarray.ndim` tells you the number of dimensions of the array.
    
    `ndarray.size` tells you the total number of elements of the array.

    `ndarray.shape` displays a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 4 columns, the shape of your array is (2, 4):

In [None]:
m = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

# Extract the number of dimensions of the array
print(m.ndim)

# Extract the number of elements in the array
print(m.size)

# Extract the shape of the array
print(m.shape)
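
The 2-row, 4-column case described in the text can be checked directly. This is a minimal sketch; the array `grid` and its contents are made up for illustration.

```python
import numpy as np

# A 2-D array with 2 rows and 4 columns
grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

print(grid.ndim)    # 2
print(grid.size)    # 8
print(grid.shape)   # (2, 4)

# size is the product of the entries in shape
rows, cols = grid.shape
print(rows * cols == grid.size)   # True
```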

* A very useful function is `np.reshape` which allows you to change the shape of the array without changing the data. When using the `reshape` method, the array you want to produce **needs** to have the same number of elements as the original array; you cannot lose or gain elements!

In [None]:
# Create a 1D numpy array (vector)
a = np.arange(10)

# Reshape the array into 5 rows and 2 columns; assigning to variable b
b = a.reshape(5,2)

print(a)

print(b)
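
To illustrate the rule that the number of elements must be preserved, here is a hedged sketch. The use of `-1` to let NumPy infer one dimension is an extra that the worksheet does not cover, and the flag `reshape_failed` is only there to make the outcome visible.

```python
import numpy as np

a = np.arange(10)   # 10 elements

# -1 asks NumPy to infer that dimension from the total number of elements
b = a.reshape(-1, 2)
print(b.shape)   # (5, 2)

# 10 elements cannot fill a 3 x 4 grid, so this raises a ValueError
try:
    a.reshape(3, 4)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```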

**Slicing:**

Slicing means taking elements from an iterable between two given indices. In Python this is written with the syntax `[start:end]` or `[start:end:step]`.

* If you don't provide `start` it is considered 0. If you don't provide `end` it is considered to be the length of the iterable in that dimension, and if `step` is not provided it is considered 1. **Note:** The result includes the start index, but excludes the end index:

In [None]:
# create numpy array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# extract second row (index 1)
print(arr[1],"\n")

# slice rows from index 0 up to, but not including, index 2
print(arr[0:2], "\n") 

# slice rows starting from index 1 but do not provide the end index
print(arr[1:], "\n")

# slice rows starting from index = -2 and do not provide the end index
print(arr[-2:], "\n")

# slice the first two rows, and columns starting from index = 1 with no end index
print(arr[:2,1:], "\n")

# slice all rows, and columns from the start up to, but not including, index 2
print(arr[:,:2])
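
The `[start:end:step]` form is easiest to see on a 1D array. The sketch below uses an illustrative array `values`; the negative step is an extra that goes slightly beyond the text above.

```python
import numpy as np

values = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

# start=1, end=8 (excluded), step=2
every_other = values[1:8:2]
print(every_other)   # [1 3 5 7]

# omit start and end, take every third element
every_third = values[::3]
print(every_third)   # [0 3 6 9]

# a negative step walks backwards through the array
backwards = values[::-1]
print(backwards)     # [9 8 7 6 5 4 3 2 1 0]
```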

Indexing and slicing operations can sometimes be difficult to visualise; the following figures should aid understanding:

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/numpy_summary/1_trim.png" width="600" height="auto" /></center>
<br/><br/>
<br/><br/>
<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/numpy_summary/2_trim.png" width="600" height="auto" /></center>

**Conditional Slicing:**

* We can also select values from our array that fulfill certain conditions. Consider the following NumPy array:

In [None]:
m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

* Let's print all the values that are less than 4:

In [None]:
print(m[m < 4])

* We can also apply multiple conditions using the `&` (and) and `|` (or):

In [None]:
print(m[(m < 6) & (m !=3)])
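
It can help to look at the condition on its own. In this sketch (the names `mask`, `selected` and `extremes` are illustrative) the comparison produces a Boolean array, which is then used to pick out values. An example of `|` is included as well.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# The comparison itself produces a Boolean array with the same shape as m
mask = m < 4
print(mask)

# Indexing with the mask returns the selected values as a 1D array
selected = m[mask]
print(selected)   # [1 2 3]

# | (or): values below 2 or above 7; each condition needs its own parentheses
extremes = m[(m < 2) | (m > 7)]
print(extremes)   # [1 8]
```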

**hstack() and vstack():**

* It is often desirable to stack two arrays horizontally or vertically; done using `hstack()` and `vstack()`:

In [None]:
m1 = np.array([[1,2], 
               [10,12]])

print("m1=", m1)

m2 = np.array([[4,1], 
               [0,0]])

print("m2=", m2)

# horizontally stack the two arrays
horizontal_stack = np.hstack((m1, m2))

# vertically stack the two arrays
vertical_stack = np.vstack((m1, m2))

print("horizontal_stack:")
print(horizontal_stack)
print("vertical_stack:")
print(vertical_stack)
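
A quick way to keep the two functions apart is to compare the shapes of the results. This is a minimal sketch reusing the same two arrays; the names `wide` and `tall` are illustrative.

```python
import numpy as np

m1 = np.array([[1, 2],
               [10, 12]])
m2 = np.array([[4, 1],
               [0, 0]])

# hstack joins side by side: row counts must match, columns add up
wide = np.hstack((m1, m2))
print(wide.shape)   # (2, 4)

# vstack joins top to bottom: column counts must match, rows add up
tall = np.vstack((m1, m2))
print(tall.shape)   # (4, 2)
```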

### Arithmetic Operations

We now get to the major scientific application of NumPy: the ability to apply numerous mathematical operations to `ndarray`s. Large parts of physics and chemistry are dominated by matrix algebra, and NumPy is the perfect tool to computationally handle these matrices. An important point must be stressed before continuing, illustrated by the following examples of arithmetic operations on arrays:

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/numpy_summary/array_arithmetic.png" width="600" height="auto" /></center>

* The following code implements the above graphical examples:

In [None]:
# Create 2 numpy arrays
array1 = np.array([[1, 3],
                   [3, 1]])

array2 = np.array([[4, 5],
                   [5, 4]])

print("Addition:")
print(array1 + array2)

print("Subtraction:")
print(array1 - array2)

print("Multiplication:")
print(array1 * array2)

print("Division:")
print(array1 / array2)

* The results from addition and subtraction are as expected, **but multiplication and division do not follow the rules of matrix algebra.**
* Here is a quick reminder of the rules for multiplying two matrices `A` and `B`, which the above example does not follow:

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/matrix-multiplication-0.jpg" width="400" height="400" /></center>

* This leads to the important conclusion: **2D NumPy arrays are not conventional matrices**, and the standard operations `*, +, -, /` work **element-wise** on arrays. 
* Correct matrix multiplication can be achieved using NumPy arrays in the following ways:
    * As of Python 3.5+ the `@` operator will perform the typical matrix multiplication.
    * `np.matmul()` is a separate function in NumPy which calculates the matrix product of two arrays. This is preferred over the older `np.dot()` which also does matrix multiplication.

In [None]:
m1 = np.array([[1, 3],
               [3, 1]])

m2 = np.array([[4, 5],
               [5, 4]])

print("@ operator:")
print(m1 @ m2)

print("np.matmul():")
print(np.matmul(m1, m2))

print("np.dot():")
print(np.dot(m1, m2))
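
To make the difference between the element-wise product and the matrix product concrete, the sketch below writes the matrix product out with explicit loops and compares it with `@`. The loops are for illustration only and are not how NumPy computes the product internally.

```python
import numpy as np

m1 = np.array([[1, 3],
               [3, 1]])
m2 = np.array([[4, 5],
               [5, 4]])

# Element-wise product: each entry is m1[i, j] * m2[i, j]
elementwise = m1 * m2
print(elementwise)
# [[ 4 15]
#  [15  4]]

# Matrix product written out with loops: row i of m1 times column j of m2
by_hand = np.zeros((2, 2), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            by_hand[i, j] += m1[i, k] * m2[k, j]

print(by_hand)
# [[19 17]
#  [17 19]]

# The @ operator gives the same result as the loops
print(np.array_equal(by_hand, m1 @ m2))   # True
```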

### More useful array operations

In [None]:
data = np.array([[1],
                 [3],
                 [6],
                 [11]])

# calculate maximum value in array
print(data.max())

# calculate minimum value in array
print(data.min())

# calculate sum of elements in array
print(data.sum())

# calculate product of elements in array
print(data.prod())

# calculate average of elements in the array
print(data.mean())

# calculate standard deviation of array
print(data.std())

# calculate transpose of array
print(data.transpose())

# reverse order of elements in array
print(np.flip(data))

### Access Docstring for more information (proof they are useful!)

* Every object contains the reference to a string, which is known as the **docstring**, discussed in session 3.
* In most cases, this docstring contains a quick and concise summary of the object and how to use it. 
* Python has a built-in `help()` function that can help you access this information. 
    * This means that nearly any time you need more information, you can use `help()` to quickly look up the syntax and purpose of a function; very useful given the large number of NumPy functions:

In [None]:
help(np.dot)
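
The same mechanism works for functions you write yourself. In the sketch below, `dilution_volume` is a hypothetical function invented for this illustration; its docstring is stored in the `__doc__` attribute, and that is the text `help()` displays.

```python
def dilution_volume(c1, v1, c2):
    """Return the final volume v2 satisfying c1 * v1 = c2 * v2."""
    return c1 * v1 / c2


# The docstring is stored on the function as the __doc__ attribute
print(dilution_volume.__doc__)

# help() displays the same text along with the function signature
help(dilution_volume)
```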

### NumPy Linalg Module vs. SciPy

The NumPy linear algebra module, `linalg` relies on **BLAS** and **LAPACK** to provide efficient low level implementations of standard linear algebra algorithms. 

* Linear algebra is the study of linear sets of equations, built upon matrix algebra; and the `linalg` module contains many relevant and useful functions which we may want to use in physics and chemistry including:

| Function             | Purpose                                                             | 
|:---------------------|:--------------------------------------------------------------------|  
| [np.linalg.det()](https://numpy.org/doc/stable/reference/generated/numpy.linalg.det.html#numpy.linalg.det)      | Calculate matrix determinant                                        |
| [np.linalg.inv()](https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html?highlight=inv#numpy.linalg.inv)     | Calculate matrix inverse                                            |
| [np.trace()](https://numpy.org/doc/stable/reference/generated/numpy.trace.html#numpy.trace)    | Return the sum along diagonals of the array                         |
| [np.linalg.eig()](https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html?highlight=eig#numpy.linalg.eig)      | Compute the eigenvalues and right eigenvectors of a square array    | 
| [np.linalg.solve()](https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html?highlight=solve#numpy.linalg.solve)    | Solve a linear matrix equation, or system of linear scalar equations| 

In [None]:
arr = np.array([[1, 2, 3],
                [2, 5, 6],
                [3, 6, 8]])

# calculate determinant
print("Determinant:")
print(np.linalg.det(arr))
print("\n")

print("Inverse:")
# calculate matrix inverse
print(np.linalg.inv(arr))
print("\n")

print("Trace:")
# calculate matrix trace
print(np.trace(arr))
print("\n")

print("Eigenvalues:")
# calculate eigenvalues, returns both eigenvalues and eigenvectors
print(np.linalg.eig(arr))
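
The results above can be sanity-checked numerically. The sketch below uses `np.allclose` and `np.eye`, which the worksheet has not introduced, and the variable names are illustrative. It confirms that the inverse really inverts the matrix, and shows how to unpack the two results returned by `eig`.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [2, 5, 6],
                [3, 6, 8]])

# A matrix times its inverse should give the identity (up to rounding)
inverse = np.linalg.inv(arr)
print(np.allclose(arr @ inverse, np.eye(3)))   # True

# eig returns two results: the eigenvalues, and a matrix whose
# columns are the corresponding eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(arr)

# Check A v = lambda v for the first eigenpair
v = eigenvectors[:, 0]
print(np.allclose(arr @ v, eigenvalues[0] * v))   # True
```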

#### SciPy

* There is another incredibly useful library called [SciPy](https://www.scipy.org/); which is a Python-based ecosystem of open-source software for mathematics, science, and engineering. It contains modules for:

    * optimization, 
    * linear algebra, 
    * integration, 
    * interpolation, 
    * special functions, 
    * FFT, 
    * signal and image processing, 
    * ODE solvers and many more.

* If you understand NumPy syntax then you **will understand SciPy syntax** as it is built on top of NumPy and extends it for scientific applications.

* If you need access to any linear algebra routine **SciPy should be the first port of call** as it is far more comprehensive than NumPy and is updated more regularly. See [here](https://docs.scipy.org/doc/scipy/reference/linalg.html) for a list of SciPy functions. These are nearly all built on the fast LAPACK and BLAS routines. We will learn more about SciPy in session 6.

# Worked Example: Price Comparison

* Curie, Dirac, Einstein and Franklin are looking to buy some lab equipment with two available suppliers, \\(S_1\\) and \\(S_2\\). Each of them needs a differing amount of each item and due to the shops being far apart they will only visit a single supplier. Which shop will result in the lowest total bill for their lab equipment?

* Here is a table of required lab equipment by each person:

| Person   | Conical Flask  | pH Probe  | Beaker  | Clamp  |
|:---------|:--------------:|:---------:|:-------:|:------:|  
| Curie    | 6              | 5         | 3       | 1      | 
| Dirac    | 3              | 6         | 2       | 2      |
| Einstein | 3              | 4         | 3       | 1      |
| Franklin | 6              | 2         | 1       | 3      |

* Here is the table of prices from each supplier:

| Item           | Price \\(S_1\\) / £  | Price \\(S_2\\) / £  |
|:---------------|:--------------------:|:--------------------:|
| Conical Flask  |   1.50               | 1.00                 |
| pH Probe       |   6.00               | 6.50                 |
| Beaker         |   3.00               | 2.50                 |
| Clamp          |   4.00               | 5.00                 |

* For example, the amount spent by Curie with supplier \\(S_1\\) is:

$$
( 6 \times £1.50) + (5 \times £6.00) + (3 \times £3.00) + (1 \times £4.00) = \text{£} 52
$$
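
The same sum can be reproduced with two 1D arrays. This is a minimal sketch with illustrative variable names: it multiplies the quantities by the prices item by item and adds up the results.

```python
import numpy as np

# Curie's quantities: conical flask, pH probe, beaker, clamp
curie_demand = np.array([6, 5, 3, 1])

# Supplier S1 prices in pounds, in the same item order
s1_prices = np.array([1.50, 6.00, 3.00, 4.00])

# Multiply item by item, then add up the four products
curie_s1_bill = np.sum(curie_demand * s1_prices)
print(curie_s1_bill)   # 52.0
```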

* The solution to this problem can be represented as a matrix multiplication problem. The first matrix represents the **demand** matrix and the second the **cost** matrix:

$$
\textbf{demand} = 
\begin{pmatrix}
    6 & 5 & 3 & 1 \\
    3 & 6 & 2 & 2 \\
    3 & 4 & 3 & 1 \\
    6 & 2 & 1 & 3 
\end{pmatrix} \hspace{0.5cm}
\textbf{cost} = 
\begin{pmatrix}
    1.50 & 1.00 \\
    6.00 & 6.50 \\
    3.00 & 2.50 \\
    4.00 & 5.00 \\
\end{pmatrix}
$$

* The total cost for each individual can be calculated by multiplying these two matrices:

$$
    \textbf{T} = \textbf{demand} \times \textbf{cost}
$$

<font color='red'>Finish off the following program that multiplies these two arrays:</font>

In [None]:
import numpy as np

demand = np.array([[6, 5, 3, 1],
                   [3, 6, 2, 2],
                   [3, 4, 3, 1],
                   [6, 2, 1, 3]])
                   
cost = np.array([[1.50, 1.00],
                 [6.00, 6.50],
                 [3.00, 2.50],
                 [4.00, 5.00]])

Total_cost = demand 

print(Total_cost)

* The result is a matrix of total costs from each supplier (columns) for each person (rows). 
    * Dirac and Franklin will get the best price with supplier \\(S_1\\), 
    * Curie will get the best price from supplier \\(S_2\\),
    * Einstein has the same quote from both suppliers.

# Worked Example: Balancing Chemical Equations

* Consider the following reaction:

$$ 
\text{MgO} + \text{Fe} \rightarrow \text{Fe}_2\text{O}_3 + \text{Mg} 
$$

* This equation is not balanced:
    * We have 2 iron atoms on the right but only one on the left; and 1 oxygen on the left but 3 on the right. 

* This particular example can be done by inspection but for the purposes of this tutorial on NumPy, applying some of our accrued knowledge, we will convert it into a matrix problem and solve as the linear system

$$
Ax = b,
$$

* We can solve this using NumPy! We want to know the coefficient of each term in order to balance the equation which we can think of as the following equation:

$$
x_1\text{MgO} + x_2\text{Fe} \rightarrow x_3\text{Fe}_2\text{O}_3 + x_4\text{Mg}
$$

* where \\(x_1, x_2, x_3, x_4\\) are the coefficients to be found. We can now create a vector of the form

$$
\begin{pmatrix}
\text{number of Mg atoms} \\
\text{number of Fe atoms} \\
\text{number of O atoms}
\end{pmatrix},
$$

* and count the number of each element in each term in the chemical reaction:

$$
x_1\begin{pmatrix}
    1 \\
    0 \\
    1
\end{pmatrix} + 
x_2\begin{pmatrix}
    0 \\
    1 \\
    0
\end{pmatrix} \rightarrow 
x_3\begin{pmatrix}
    0 \\
    2 \\
    3
\end{pmatrix} + 
x_4\begin{pmatrix}
    1 \\
    0 \\
    0
\end{pmatrix}.
$$

* We can subtract the \\(x_3\\) term over to the LHS and set \\(x_4=1\\) on the RHS to form \\(Ax = b\\)

$$
\overbrace{
\begin{pmatrix}
    1 & 0 & 0   \\
    0 & 1 & -2  \\
    1 & 0 & -3  
\end{pmatrix}}^{A}
\overbrace{
\begin{pmatrix}
    x_1 \\
    x_2 \\
    x_3 \\
\end{pmatrix}}^{x} = 
\overbrace{
\begin{pmatrix}
    1\\
    0 \\
    0 \\
\end{pmatrix}}^{b}.
$$

In [None]:
# create the 4 vectors as numpy arrays
vec1 = np.array([[1],
                 [0],
                 [1]]) 

vec2 = np.array([[0],
                 [1],
                 [0]]) 

vec3 = np.array([[0],
                 [-2],
                 [-3]]) 

b = np.array([[1],
              [0],
              [0]]) 

# combine the x_1, x_2 and x_3 vectors into a 2D-array using hstack()
A = np.hstack((vec1, vec2, vec3))

print(A)

* We now have the LHS and RHS of our equation, and can solve for \\(x\\) by calculating the inverse of \\(A\\) and multiplying it by \\(b\\):

$$
    x = A^{-1}b.
$$

In [None]:
# calculate the inverse of matrix A
Ainv = np.linalg.inv(A)

# multiply Ainv by b to calculate the x_i values
xvec = Ainv @ b

# extract the coefficient values from the vector, labelling them x1, x2, x3
for i in range(xvec.size):
    print("x{} = {}".format(i + 1, xvec[i, 0]))

* Our solution is thus:

$$
    1\text{MgO} + \frac{2}{3}\text{Fe} \rightarrow \frac{1}{3}\text{Fe}_2\text{O}_3 + 1\text{Mg} 
$$
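
Chemical equations are usually written with whole-number coefficients. As a hedged sketch, the fractional solution can be scaled up using Python's `fractions` module. This step is not part of the worksheet, and it assumes the true denominators are small (at most 100 here).

```python
from fractions import Fraction
from math import gcd

import numpy as np

A = np.array([[1, 0, 0],
              [0, 1, -2],
              [1, 0, -3]])
b = np.array([1, 0, 0])

# Solve for x1, x2, x3, then append x4 = 1, which was fixed by hand
x = np.linalg.solve(A, b)
coefficients = list(x) + [1.0]

# Recover exact fractions from the floating-point solution.
# Assumption: the true denominators are small (at most 100 here).
exact = [Fraction(float(c)).limit_denominator(100) for c in coefficients]

# Least common multiple of the denominators
lcm = 1
for f in exact:
    lcm = lcm * f.denominator // gcd(lcm, f.denominator)

# Scale every coefficient by the common multiple to get whole numbers
integer_coefficients = [int(f * lcm) for f in exact]
print(integer_coefficients)   # [3, 2, 1, 3]
```

Read in order, these coefficients correspond to 3 MgO + 2 Fe → Fe₂O₃ + 3 Mg, which has 3 Mg, 3 O and 2 Fe on each side.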

* In the above Python code we explicitly calculated the matrix inverse and carried out the matrix multiplication, but we can instead use the inbuilt [np.linalg.solve()](https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html) function which solves linear systems, achieving the same result:

In [None]:
np.linalg.solve(A, b)
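
The two routes can be compared directly. This is a minimal, self-contained sketch; `np.allclose` and the variable names are additions for illustration.

```python
import numpy as np

A = np.array([[1, 0, 0],
              [0, 1, -2],
              [1, 0, -3]])
b = np.array([[1],
              [0],
              [0]])

x_from_inverse = np.linalg.inv(A) @ b
x_from_solve = np.linalg.solve(A, b)

# Both routes give the same coefficients, up to rounding
print(np.allclose(x_from_inverse, x_from_solve))   # True

# Substituting back: A x should reproduce b
print(np.allclose(A @ x_from_solve, b))            # True
```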

## Review

In this session we covered:

* The purpose of NumPy and its key type, the ndarray.
* Useful commands applicable to arrays along with how to index and slice arrays.
* Arithmetic operations on arrays and the difference between how NumPy treats arrays and matrices.
* Brief overview of some of the key functions in the linear algebra module, `linalg`.
* How to run Python code locally using the IDE Visual Studio Code.

## Exercises

1. Create a \\(6 \times 3\\) array of your choosing and print the number of rows and columns.

In [None]:
import numpy as np

2. Write a NumPy program to compute the multiplication of two matrices of your choosing.

3. Write a NumPy program to compute the inverse of a matrix of your choosing.

4. Create a \\(10 \times 10\\) array with random values and find the minimum and maximum values.

## Running Python Locally

* So far we have been using Jupyter notebooks to do our programming but this is not a requirement to run Python code. 
* Python code is conventionally typed in a text editor and run locally on a user's computer without the need for internet access. 
* As you apply your programming knowledge in your own projects and future studies it will be of great benefit to know how this is done, which is now demonstrated using the Integrated Development Environment (IDE) [Visual Studio Code](https://code.visualstudio.com/). 

* **IDEs** are software applications that provide comprehensive facilities to computer programmers. 
    * They contain a lot of useful features which assist you whilst you program, such as:
        * syntax autocomplete, 
        * function lookup, 
        * variable tracker, 
        * data type lists and many more. 
        
* The following are my recommended IDEs:

1. [Visual Studio Code](https://code.visualstudio.com/)
2. [Spyder](https://www.spyder-ide.org/)
3. [PyCharm](https://www.jetbrains.com/pycharm/)

* I personally use Visual Studio Code as it offers support for nearly every known programming language. 
* If you use multiple programming languages it can be cumbersome to have a separate IDE for each language and to switch between them as needed. 
    * Visual Studio Code allows me to do all my programming in a single application. It is also available on all 3 main operating systems: Linux, Mac and Windows. 
* **I am not sponsored by Visual Studio Code. It is just my preferred IDE.**

### Install Visual Studio Code

1. Visit the following link and download the software for your respective operating system:

    [Visual Studio Code](https://code.visualstudio.com/)

2. Check out the [introductory video series](https://code.visualstudio.com/docs/getstarted/introvideos) on the basics of using vscode.

3. Install the Python extension doing the following:

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/PythonInstall.gif" width="1000" height="1000" /></center>

### Running Python in vscode

1. Create a file with the extension `.py`, which marks it as a Python file.
2. Type your Python code and save the file.
3. Press the play button near the top right of the vscode screen.

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/RunningPython.gif" width="900" height="900" /></center>
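
As a minimal sketch of what such a file might contain (the file name `first_script.py` and the function `main` are hypothetical choices for this illustration):

```python
# first_script.py (hypothetical file name)
import numpy as np


def main():
    # Build a small array and report its mean
    data = np.array([1, 3, 6, 11])
    mean = data.mean()
    print("Mean:", mean)
    return mean


# Run main() only when the file is executed directly,
# for example via the play button or `python first_script.py`
if __name__ == "__main__":
    main()
```

Unlike in a notebook, nothing is displayed automatically at the end of a cell, so every result you want to see must be passed to `print()`.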