# 10. NumPy

## 10.1 A short introduction to NumPy

NumPy stands for "Numerical Python" and, along with some other libraries (SciPy, Pandas, etc.), is one of the core libraries used for scientific computing in Python. It contains a large number of tools and functions that can be used to solve an array of common and not-so-common problems, some examples of which you'll see below. Most importantly, however, NumPy provides the powerful NumPy array!

For an example of how NumPy can be used in scientific computing, see the following code for calculating the Euclidean distance between two points:

In [None]:
import numpy
print(numpy.linalg.norm(numpy.array([1,2,3]) - numpy.array([1,2,3])))
print(numpy.linalg.norm(numpy.array([1,2,3]) - numpy.array([3,2,1])))
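
To make clearer what `numpy.linalg.norm` computes in the example above, the following sketch works through the same distance by hand: the square root of the sum of the squared differences. The variable names (`point_a`, `point_b`, and so on) are chosen here for illustration only.

```python
import numpy

# Illustrative points (the same values as the second call above)
point_a = numpy.array([1, 2, 3])
point_b = numpy.array([3, 2, 1])

# Euclidean distance: square root of the sum of squared differences
difference = point_a - point_b
manual_distance = numpy.sqrt(numpy.sum(difference ** 2))

# The same quantity computed by numpy.linalg.norm
norm_distance = numpy.linalg.norm(difference)

print(manual_distance)
print(numpy.isclose(manual_distance, norm_distance))
```

Both approaches should agree; `numpy.linalg.norm` is simply the more compact way of writing it.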

Many scientific computing libraries, e.g. MDAnalysis (see Section 11), rely on NumPy to hold and operate on the data passed to them.

In this section, we will very briefly introduce the NumPy library, which will make the work in Section 11 easier to understand.

## 10.2 The NumPy array

A NumPy array is a bit like an improved version of a Python list. It "is a high-performance multidimensional array object that is a powerful data structure for efficient computation of arrays and matrices. To work with these arrays, there’s a huge amount of high-level mathematical functions which operate on these matrices and arrays."

In other words, an array is a good structure to store data from 1D, 2D, 3D, or higher-dimensional matrices.

This means that arrays can have rows and columns. In a 2D array, the rows are indexed along "axis 0" while the columns are indexed along "axis 1". The number of axes goes up with the number of dimensions of the array, so a 3D array also has an "axis 2". These axes are useful when it comes to manipulating the data in your arrays.
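
As a small illustration of how axes are used, the sketch below sums a 2D array along each axis in turn. The array `example_array` and its values are made up for this example.

```python
import numpy

# A small 2D array with 2 rows and 3 columns (illustrative values)
example_array = numpy.array([[1, 2, 3],
                             [4, 5, 6]])

# Summing along axis 0 collapses the rows, giving one total per column
column_totals = example_array.sum(axis=0)
print(column_totals)  # column totals: 5, 7, 9

# Summing along axis 1 collapses the columns, giving one total per row
row_totals = example_array.sum(axis=1)
print(row_totals)  # row totals: 6, 15
```

Many other NumPy functions accept an `axis` argument in the same way.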

We will see some hands-on examples of arrays below.

First, let us build a 2D array.

NumPy has a `numpy.empty` function which allows you to build an "empty" NumPy array of any size. However, the elements of the returned array are left uninitialised, so they hold arbitrary values. Instead, we often want to initialise our array to a set value to begin with.

There are two main functions for doing this in NumPy, `ones` and `zeros`. These create NumPy arrays filled with 1 and 0 respectively.
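
As a brief sketch of the difference, the example below builds one array with `numpy.zeros` and one with `numpy.empty`. The shape `(2, 4)` and the fill value `7.0` are arbitrary choices for illustration.

```python
import numpy

# numpy.zeros returns an array of the requested shape filled with 0
zeros_array = numpy.zeros((2, 4))
print(zeros_array)

# numpy.empty only allocates the memory; the contents are arbitrary and
# should be overwritten before they are read
empty_array = numpy.empty((2, 4))
empty_array[:] = 7.0  # fill every element with a known value
print(empty_array)
```

`numpy.empty` can be slightly faster because it skips the initialisation step, but it is only safe if every element is assigned before use.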

Here we will build an array filled with the value 1, using `numpy.ones`. We are passing the argument (5,3) which dictates the shape of the array. The first value details the size of axis 0 (5), and the second value the size of axis 1 (3).

In [None]:
# An example of a 2D array
my_2d_array = numpy.ones((5, 3))

print(my_2d_array)

NumPy arrays have many built-in attributes and methods. For example, the `shape` attribute returns the length of the array along each axis.

In [None]:
# Print out the shape of `my_2d_array`
print(my_2d_array.shape)

NumPy arrays also have special data types (named dtypes) which describe the values held within the array.

Examples include `numpy.int64`, `numpy.float64`, and `numpy.complex64`.

The type can be set when building the array using the `dtype` argument to functions such as `numpy.ones`, `numpy.zeros`, and `numpy.empty`.

Each array will have a `dtype` attribute which will tell you what the array assumes the underlying data to be. In this case, we see that the `dtype` of `my_2d_array` is `numpy.float64` which is the default data type of `numpy.ones`:

In [None]:
# Print out the data type of `my_2d_array`
# Similar to the type() function
print(my_2d_array.dtype)

Similarly, we could use the same idea to build a 3D array of shape (2, 5, 3), but this time filled with integers by passing `dtype=numpy.int64`:

In [None]:
# This is an example of a 3D array
my_3d_array = numpy.ones((2, 5, 3), dtype=numpy.int64)

print(my_3d_array)

# Print out the shape of `my_3d_array`
print(my_3d_array.shape)

# Print out the data type of `my_3d_array`
print(my_3d_array.dtype)

**Note:** The number in the `numpy.int64` `dtype` declaration stands for the size of the data type in bits. In this case, we are asking for a 64-bit integer, which means that it can hold any integer in the range from `-9223372036854775808` to `9223372036854775807`.
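
If you want to check the limits of an integer data type yourself, `numpy.iinfo` reports them. The short sketch below queries the limits of `numpy.int64`; the variable name `int64_limits` is just an illustrative choice.

```python
import numpy

# numpy.iinfo reports the limits of an integer dtype
int64_limits = numpy.iinfo(numpy.int64)

print(int64_limits.min)  # -9223372036854775808
print(int64_limits.max)  # 9223372036854775807

# The limits follow from the 64-bit two's complement representation
print(int64_limits.min == -2**63)
print(int64_limits.max == 2**63 - 1)
```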

## 10.3 Manipulating NumPy arrays

You can also add, subtract, multiply or divide your arrays.

In [None]:
my_new_array = my_2d_array + 1

print(my_new_array)

In [None]:
my_new_array = my_2d_array + my_3d_array

print(my_new_array)

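In order to perform arithmetic, and other, operations on two or more arrays of different shapes, NumPy compares their shapes axis by axis, starting from the last axis (this is known as broadcasting). There are certain criteria that need to be fulfilled.

Firstly, two dimensions are compatible when they are equal.

Secondly, two dimensions are also compatible when one of them is 1.

Thirdly, the arrays need to be compatible along every axis. If one array has fewer dimensions than the other, its missing leading dimensions are treated as having length 1.

This is why `my_2d_array`, with shape (5, 3), could be added to `my_3d_array`, with shape (2, 5, 3), in the cell above.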
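
The compatibility criteria above can be tried out directly. The sketch below is a minimal illustration, with made-up array names, of two shapes that combine with a (5, 3) array and one that does not. NumPy compares shapes starting from the last axis, which is why a shape of (3,) works here while a shape of (5,) fails.

```python
import numpy

grid = numpy.ones((5, 3))

# Shape (3,) matches the last axis of (5, 3), so the operation works
row_values = numpy.array([10, 20, 30])
row_result = grid + row_values
print(row_result.shape)

# Shape (5, 1) is compatible: axis 1 has length 1 and is stretched to 3
column_values = numpy.ones((5, 1))
column_result = grid + column_values
print(column_result.shape)

# Shape (5,) is compared with the last axis (length 3): 5 != 3 and
# neither is 1, so NumPy raises a ValueError
try:
    grid + numpy.ones(5)
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print("Incompatible shapes:", error)
```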

Although the standard `-`, `+`, `/` and `*` operators work, NumPy also provides a series of mathematical functions to achieve several types of array manipulations.

For example, one could calculate the dot product of two arrays using:

In [None]:
# Here we use the numpy.array constructor to build arrays with pre-specified values.
array_1 = numpy.array([[1, 0], [0, 1]])
array_2 = numpy.array([[4, 1], [2, 2]])

numpy.dot(array_1, array_2)
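
It may help to contrast this with the `*` operator, which multiplies arrays element by element. The sketch below reuses the two arrays from the cell above. For two 2D arrays, `numpy.dot` performs matrix multiplication, and since `array_1` is the identity matrix the result is simply `array_2`.

```python
import numpy

array_1 = numpy.array([[1, 0], [0, 1]])
array_2 = numpy.array([[4, 1], [2, 2]])

# `*` multiplies matching elements, so only the diagonal survives here
elementwise_product = array_1 * array_2
print(elementwise_product)

# numpy.dot on two 2D arrays performs matrix multiplication;
# multiplying by the identity matrix returns array_2 unchanged
matrix_product = numpy.dot(array_1, array_2)
print(matrix_product)
```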

NumPy also includes several functions to analyse/manipulate the data in NumPy arrays.

For example, we can histogram a 1D array in the following manner:

In [None]:
array_1D = numpy.array([1,2,1])

# We histogram the above 1 dimensional array with bin edges 0, 1, 2, and 3
# This means that we expect the histogram to return:
# * 0 in the 0->1 bin
# * 2 in the 1->2 bin
# * 1 in the 2->3 bin

histogram, bins = numpy.histogram(array_1D, bins=[0, 1, 2, 3])

print(histogram)
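
One detail worth knowing is how values that sit exactly on a bin edge are counted. In `numpy.histogram`, every bin includes its left edge and excludes its right edge, except for the last bin, which includes both. The sketch below uses made-up data (`values`) to show this.

```python
import numpy

values = numpy.array([1, 2, 3])  # illustrative data

counts, bin_edges = numpy.histogram(values, bins=[0, 1, 2, 3])

# The second return value holds the bin edges that were used
print(bin_edges)

# The bins are [0, 1), [1, 2) and [2, 3]: the value 1 falls in the
# second bin, and both 2 and 3 fall in the last bin
print(counts)
```

Here the counts are 0, 1 and 2 for the three bins respectively.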

We could spend hours talking about arrays and all the functionality of NumPy, so this is where we will leave this part. If you are interested in learning more about NumPy, we recommend this very good tutorial from DataCamp:

https://www.datacamp.com/community/tutorials/python-numpy-tutorial


## Review

In this section we covered the following:
- The basic concept of the NumPy library.
- How to create basic NumPy arrays.
- The basics of NumPy array manipulation.
- How NumPy functions can be used to analyse NumPy arrays.