# Programming For Chemists: Basic NumPy and arrays 

NumPy stands for *Numerical Python* and is the universal standard for working with numerical data in Python. Its multidimensional array, together with the wide variety of mathematical operations it can perform on arrays, is largely responsible for Python's popularity within the sciences. NumPy supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices. These functions are implemented in C and built around very fast numerical libraries, including [BLAS](http://www.netlib.org/blas/) and [LAPACK](http://www.netlib.org/lapack/). **Result:** You get the speed of C with the friendlier syntax of Python.

NumPy is possibly **the** most important Python library we will cover in this course, although we will only scratch the surface of what it can do. We will cover some of its most important and useful functionality, then apply what we learn to some worked examples.

Before we begin we will briefly discuss running Python code locally using a text editor on your own computer.

## Running Python Locally

So far we have been using Jupyter notebooks to do our programming, but they are not required to run Python code. Python code is conventionally typed in a text editor and run locally on a user's computer, without the need for internet access. As you apply your programming knowledge in your own projects and future studies, it will be of great benefit to know how this is done, so we now demonstrate it using the Integrated Development Environment (IDE) [Visual Studio Code](https://code.visualstudio.com/). IDEs are software applications that provide comprehensive facilities to computer programmers. They contain many useful features that assist you while you program, such as syntax autocompletion, function lookup, variable tracking, data type lists and many more. The following IDEs are recommended:

1. [Visual Studio Code](https://code.visualstudio.com/)
2. [Spyder](https://www.spyder-ide.org/)
3. [PyCharm](https://www.jetbrains.com/pycharm/)

I personally use Visual Studio Code as it supports a very wide range of programming languages. If you use multiple programming languages, it can be cumbersome to switch between a separate IDE for each one; Visual Studio Code allows me to do all my programming in a single application. It is also available on all three main operating systems: Linux, macOS and Windows. It can be downloaded and installed from the Visual Studio Code website linked above.

To write and run a Python program locally:

1. Create a file with the extension `.py`, which marks it as a Python file.
2. Type your Python code and save the file.
3. Open your terminal/command prompt and navigate to the directory where your file is located.
4. Type `python name_of_file.py` and press enter.
    
![Python Example in Ubuntu](https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/PythonExampleUbuntu.gif)
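As a minimal sketch, a file saved under a hypothetical name such as `molar_mass.py` might contain the following. Running `python molar_mass.py` in the terminal then prints the result. The rounded atomic masses and the `molar_mass` function are illustrative assumptions, not part of any library.

```python
# molar_mass.py: a hypothetical example script to run from the terminal

# Approximate standard atomic masses in g/mol (rounded, for illustration only)
ATOMIC_MASSES = {"H": 1.008, "C": 12.011, "O": 15.999}


def molar_mass(composition):
    """Return the molar mass (g/mol) of a molecule given as {element: count}."""
    return sum(ATOMIC_MASSES[element] * count for element, count in composition.items())


# This block only runs when the file is executed directly, e.g. `python molar_mass.py`
if __name__ == "__main__":
    water = {"H": 2, "O": 1}
    print(f"Molar mass of water: {molar_mass(water):.3f} g/mol")
```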

## Installing NumPy

If NumPy is not installed on your system, you can install it from the terminal/command prompt with `pip install numpy`.
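A quick way to confirm that the installation worked is to import NumPy and print its version string, which is stored in the standard `np.__version__` attribute. If the import instead raises `ModuleNotFoundError`, NumPy is not installed in the Python environment you are running.

```python
# Check that NumPy can be imported and report which version is installed
import numpy as np

print(np.__version__)
```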

## Importing NumPy

In order to start using NumPy and all of the functions available in NumPy, you’ll need to import it. This is done using the import statement:

In [2]:
import numpy as np

where we have given `numpy` the shorter alias `np` to save typing; this is the standard convention.

## NumPy Arrays

An array is the central data structure of the NumPy library. It represents a grid of values, along with information about the raw data, how to locate an element and how to interpret an element. The elements are all of the same type, referred to as the array `dtype` (data type). One way to initialize NumPy arrays is from Python lists, using nested lists for two- or higher-dimensional data. For example:

In [17]:
# Build a 1D array from a python list
a = np.array([1, 2, 3, 4, 5, 6])

print(a)

[1 2 3 4 5 6]


In [18]:
# Build a 2D array from nested python lists
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(b)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
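Because every element of an array shares a single `dtype`, NumPy converts the list entries to a common type when they differ. The short sketch below illustrates this: mixing integers and a float produces a `float64` array.

```python
import numpy as np

# A list mixing integers and a float: NumPy picks one common dtype for every element
mixed = np.array([1, 2.5, 3])
print(mixed)        # the integers are now stored as floats
print(mixed.dtype)  # float64

# A list containing only floats also gives float64
floats = np.array([0.5, 1.5])
print(floats.dtype)  # float64
```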


We can access the elements in the array using square brackets. Remember that indexing in NumPy starts at 0: if you want to access the first element in your array, you'll be accessing element `0`, not `1`.

In [21]:
# Get the first element from the numpy array, a. 
print(a[0])

1


To extract a specific matrix element, we can use the notation [row,column]:

In [24]:
# Extract element from first row and fourth column
print(b[0,3])
# Extract element from third row and third column
print(b[2,2])

4
11
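Indices can also be negative, in which case they count backwards from the end of each dimension. The sketch below rebuilds the arrays `a` and `b` from above. It also shows that supplying only a row index to a 2D array returns that whole row.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# -1 refers to the last element
print(a[-1])      # 6
# Negative indices work in each dimension: last row, last column
print(b[-1, -1])  # 12
# A single index on a 2D array returns a whole row as a 1D array
print(b[1])       # [5 6 7 8]
```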


You might occasionally hear an array referred to as an `ndarray`, which is shorthand for 'N-dimensional array'. An N-dimensional array is simply an array with any number of dimensions: 1-D, 2-D, 3-D, etc. The NumPy `ndarray` class is used to represent both matrices and vectors. A **vector** is an array with a single dimension (there's no difference between row and column vectors in NumPy), while a **matrix** refers to an array with two dimensions. For 3-D or higher-dimensional arrays, **tensor** is the preferred term. Tensors will not be covered in this introduction, as vectors and matrices are far more commonplace.
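To illustrate this point, the sketch below shows that a vector and a matrix are objects of the same `np.ndarray` class. They differ only in their number of dimensions, which is reported by the `ndim` attribute (discussed further below).

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])
matrix = np.array([[1.0, 0.0], [0.0, 1.0]])

# Both are instances of the same ndarray class
print(type(vector), type(matrix))
# They differ only in their number of dimensions
print(vector.ndim, matrix.ndim)  # 1 2
```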

### Useful Array Commands

**np.zeros():**

Besides creating an array from a sequence of elements, you can easily create an array filled with zeros:

In [26]:
# Create 5 x 5 array of zeros
np.zeros([5,5])

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

**np.ones():**

Or an array filled with 1’s:

In [27]:
np.ones([5,5])

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
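Both `np.zeros` and `np.ones` also accept a single integer for a 1D array, a tuple for the shape and a `dtype` keyword. The related `np.zeros_like` creates zeros with the same shape and dtype as an existing array. A brief sketch follows:

```python
import numpy as np

# A single integer gives a 1D array of that length
print(np.zeros(4))  # [0. 0. 0. 0.]
# The shape can also be a tuple; dtype=int gives integer ones instead of floats
print(np.ones((2, 3), dtype=int))
# zeros_like copies the shape and dtype of an existing array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
print(np.zeros_like(template))
```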

Or even an uninitialised ("empty") array!

**np.empty():**

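The `np.empty` function creates an array without initialising its contents. The values it holds are therefore arbitrary and depend on whatever was previously stored in that block of memory. You may want to use `empty` over `zeros` for speed, because it skips the step of filling the array: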

In [29]:
np.empty([5,5])

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
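Because the initial values from `np.empty` are arbitrary, a typical pattern is to allocate the array and then assign every element before reading any of them. This is a minimal sketch; the fill rule `10 * i + j` is an arbitrary choice for illustration.

```python
import numpy as np

# Allocate an uninitialised 3 x 4 array; its contents are arbitrary until assigned
grid = np.empty((3, 4))

# Assign every element (here: 10 * row index + column index) before using the array
for i in range(grid.shape[0]):
    for j in range(grid.shape[1]):
        grid[i, j] = 10 * i + j

print(grid)
```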

If you use `empty`, make sure to **fill all of the elements** before using the array. You can also create an array of evenly spaced values with `np.arange(start, stop, step)`; note that `stop` itself is excluded:

In [34]:
np.arange(0, 11, 2)

array([ 0,  2,  4,  6,  8, 10])
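A few more `np.arange` calls follow. With a single argument, `start` defaults to 0 and `step` defaults to 1. A negative step counts down, and a float step produces a float array.

```python
import numpy as np

# Single argument: start defaults to 0 and step to 1; stop (5) is excluded
print(np.arange(5))               # 0, 1, 2, 3, 4
# A negative step counts down
print(np.arange(10, 0, -3))       # 10, 7, 4, 1
# A float step gives a float array
print(np.arange(0.0, 1.0, 0.25))  # 0.0, 0.25, 0.5, 0.75
```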

NumPy arrays are not limited to the default `np.float64` data type; you can explicitly specify which data type you want using the `dtype` keyword:

In [38]:
# Construct a 5 x 5 array of the character '1' (string data type, str)
# np.str was only an alias for the built-in str and has been removed from NumPy
m = np.ones([5,5], dtype=str)
print(m)

[['1' '1' '1' '1' '1']
 ['1' '1' '1' '1' '1']
 ['1' '1' '1' '1' '1']
 ['1' '1' '1' '1' '1']
 ['1' '1' '1' '1' '1']]
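As a further sketch of working with data types, the example below requests single-precision floats at creation time. It then converts an existing array with the `astype` method, which returns a converted copy. Converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

# Request single-precision floats instead of the default float64
x = np.ones(3, dtype=np.float32)
print(x.dtype)  # float32

# astype returns a converted copy; float -> int conversion truncates toward zero
y = np.array([1.7, -2.2, 3.9]).astype(int)
print(y)        # 1, -2, 3
```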


**Array dimensions:**

An important piece of information is the size and shape of our arrays. These can be extracted using the following:

`ndarray.ndim` tells you the number of dimensions of the array.

`ndarray.size` tells you the total number of elements of the array.

`ndarray.shape` displays a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 4 columns, the shape of your array is (2, 4):

In [3]:
m = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

# Extract the number of dimensions of the array
print(m.ndim)
# Extract the number of elements in the array
print(m.size)
# Extract the shape of the array
print(m.shape)

2
9
(3, 3)
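For comparison, here is the 2-row, 4-column case described earlier. Because `shape` is a tuple, it can also be unpacked directly into the number of rows and columns.

```python
import numpy as np

# A 2-D array with 2 rows and 4 columns
n = np.zeros((2, 4))
print(n.ndim)   # 2
print(n.size)   # 8
print(n.shape)  # (2, 4)

# Unpack the shape tuple into the number of rows and columns
rows, cols = n.shape
```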


A very useful function is `np.reshape` (also available as the array method `reshape`, used below), which changes the shape of an array without changing its data. When using `reshape`, the array you want to produce **needs** to have the same number of elements as the original array; you cannot lose or gain elements!

In [52]:
# Create a 1D numpy array (vector)
a = np.arange(10)

# Reshape the array into 5 rows and 2 columns, assigning the result to variable b
b = a.reshape(5,2)

print(a)

print(b)

[0 1 2 3 4 5 6 7 8 9]
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
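Two related details of `reshape` are worth knowing. You can pass `-1` for one dimension, and NumPy will infer its length from the total number of elements. A shape with a different number of elements raises a `ValueError`, as shown here:

```python
import numpy as np

a = np.arange(10)

# -1 lets NumPy work out this dimension: 10 elements / 2 rows = 5 columns
c = a.reshape(2, -1)
print(c.shape)  # (2, 5)

# 3 x 3 = 9 elements, which does not match the 10 in a, so this fails
try:
    a.reshape(3, 3)
except ValueError as err:
    print("Cannot reshape:", err)
```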


**Slicing:**

Slicing means taking elements from an iterable between two given indices. In Python the syntax is `[start:end]` or `[start:end:step]`.

If you don't provide `start` it is considered 0. If you don't provide `end` it is considered to be the length of the iterable in that dimension, and if `step` is not provided it is considered 1. **Note:** The result includes the start index, but excludes the end index:

In [12]:
# create numpy array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# extract the second row (index 1)
print(arr[1], "\n")

# slice rows from index 0 up to, but not including, index 2
print(arr[0:2], "\n") 

# slice rows starting from index 1 but do not provide the end index
print(arr[1:], "\n")

# slice rows starting from index = -2 and do not provide the end index
print(arr[-2:], "\n")

# take the first two rows, and the columns from index 1 to the end
print(arr[:2,1:], "\n")

# take the first two rows, and only the column at index 0
print(arr[:2,:1])

[4 5 6] 

[[1 2 3]
 [4 5 6]] 

[[4 5 6]
 [7 8 9]] 

[[4 5 6]
 [7 8 9]] 

[[2 3]
 [5 6]] 

[[1]
 [4]]
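The examples above omit the optional `step` from `[start:end:step]`. The sketch below uses it to take every second element. It also uses a negative step to reverse a 1D array, and a bare `:` to select every row so that a single column can be extracted.

```python
import numpy as np

v = np.arange(10)
# Every second element, starting at index 1
print(v[1::2])   # [1 3 5 7 9]
# A negative step walks backwards, so [::-1] reverses the array
print(v[::-1])   # [9 8 7 6 5 4 3 2 1 0]

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
# ':' selects every row, so this extracts the second column as a 1D array
print(arr[:, 1])  # [2 5 8]
```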


Indexing and slicing operations can sometimes be difficult to visualise; the following figure should help with understanding them:

![NumPy indexing and slicing](https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/numpy_slicing_graphic.png)

**Conditional Slicing:**

We can also select values from our array that fulfill certain conditions. Consider the following NumPy array:

In [8]:
m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

Let's print all the values that are less than 4:

In [9]:
print(m[m < 4])

[1 2 3]


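We can also combine multiple conditions using the `&` (and) and `|` (or) operators. Each condition must be wrapped in parentheses, because `&` and `|` take precedence over comparisons such as `<`: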

In [10]:
print(m[(m < 6) & (m != 3)])

[1 2 4 5]
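Under the hood, a comparison like `m < 4` produces a boolean array (a "mask") with the same shape as `m`, and indexing with that mask keeps the `True` positions. The sketch below also shows the `|` operator and the use of a mask to assign new values in place.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# The comparison itself returns a boolean array with the same shape as m
mask = m < 4
print(mask)

# | (or): values below 2 or above 7
print(m[(m < 2) | (m > 7)])  # [1 8]

# Masks can also be used for assignment: set every value above 6 to 0
m[m > 6] = 0
print(m)
```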


**hstack() and vstack():**

It is often desirable to stack two arrays horizontally or vertically; this is done using `np.hstack()` and `np.vstack()` respectively:

In [33]:
m1 = np.array([[1,2], 
               [10,12]])

print("m1=", m1)

m2 = np.array([[4,1], 
               [0,0]])

print("m2=", m2)

# horizontally stack the two arrays
horizontal_stack = np.hstack((m1, m2))

# vertically stack the two arrays
vertical_stack = np.vstack((m1, m2))

print("horizontal_stack:")
print(horizontal_stack)
print("vertical_stack:")
print(vertical_stack)

m1= [[ 1  2]
 [10 12]]
m2= [[4 1]
 [0 0]]
horizontal_stack:
[[ 1  2  4  1]
 [10 12  0  0]]
vertical_stack:
[[ 1  2]
 [10 12]
 [ 4  1]
 [ 0  0]]
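For 2D arrays, `hstack` and `vstack` join along columns (axis 1) and rows (axis 0) respectively. The general `np.concatenate` function does the same thing when given the matching `axis`, as the sketch below shows. The arrays must agree in the other dimension; `m3` is a hypothetical array chosen to trigger the resulting `ValueError`.

```python
import numpy as np

m1 = np.array([[1, 2], [10, 12]])
m2 = np.array([[4, 1], [0, 0]])

# For 2D arrays: axis=1 matches hstack, axis=0 matches vstack
h = np.concatenate((m1, m2), axis=1)
v = np.concatenate((m1, m2), axis=0)
print(np.array_equal(h, np.hstack((m1, m2))))  # True
print(np.array_equal(v, np.vstack((m1, m2))))  # True

# vstack needs the same number of columns (hstack needs the same number of rows)
m3 = np.array([[7, 7, 7]])  # 1 row, 3 columns
try:
    np.vstack((m1, m3))
except ValueError as err:
    print("Shapes do not match:", err)
```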


### Array Operations

NumPy can perform a wide range of mathematical operations on arrays.

First, a reminder of matrix multiplication:

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/matrix-multiplication-0.jpg" width="400" height="400" /></center>
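As a sketch of how this maps onto NumPy, the `@` operator computes the matrix product of two 2D arrays. The explicit triple loop below writes out the same rule, $C_{ij} = \sum_k A_{ik} B_{kj}$, to confirm the result. The plain `*` operator, by contrast, multiplies element by element. The matrices `A` and `B` are arbitrary examples.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Matrix product with the @ operator
C = A @ B
print(C)  # rows: [19 22] and [43 50]

# The same product written out: C[i, j] = sum over k of A[i, k] * B[k, j]
C_loop = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        for k in range(A.shape[1]):
            C_loop[i, j] += A[i, k] * B[k, j]
print(np.array_equal(C, C_loop))  # True

# Note: * is element-wise multiplication, not the matrix product
print(A * B)  # rows: [5 12] and [21 32]
```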
