# Working efficiently with numbers: numpy

In scientific computing we primarily deal with data consisting of numbers (either experimental measurements or the result of solving equations), so having an efficient and effective approach to working with numbers is vital. We saw in the python fundamentals section that python knows about standard numerical types, including complex numbers. Whilst one can do numerical operations in pure python, this is typically not the most effective approach for scientific applications.

Consider, for example, running an experiment where you take six repeat measurements of a length and you want to do some calculations with this. Your first step might be to put the values into a list:

In [None]:
measurements = [10.0, 10.5, 10.4, 10.0, 9.9, 15.0]

Now you might realise that you've recorded the values in mm but need them in metres for the rest of your work. Fine, just multiply them by 0.001, right? Well, if you try `measurements = measurements * 0.001` you'll get an error message:

~~~python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 measurements = measurements * 0.001

TypeError: can't multiply sequence by non-int of type 'float'
~~~

which is telling us that python doesn't know how to multiply a list (`measurements`) by a floating point number (0.001). Recall from the python fundamentals that when we multiplied a list by 2 it gave us two copies of the list instead of multiplying each value by 2. There was a task to use a loop to do this, and your solution may look something like the code below, which scales each of our values by 0.001.

In [None]:
the_values = [10.0, 10.5, 10.4, 10.0, 9.9, 15.0]

for i, value in enumerate(the_values):
    the_values[i] = value * 0.001

OK, so we've now scaled everything into metres. Why do we need a different approach? Well, there are many reasons. Firstly, always looping over lists of values leads to verbose code; it would be much nicer if we could just write `the_values = the_values * 0.001` here. Secondly, for each operation python has to check the data types of the values involved to work out what to do. Recall that lists can store different data types, so `value * 0.001` might have to do different things depending on what type `value` is. This means that looping over lists can be quite slow.

Fortunately, the [`numpy` module](https://numpy.org) provides data types and a wide range of functions for dealing with numbers efficiently. We saw in the python fundamentals section that `numpy` provides routines for creating a collection of zeros, for example. When printed, these look just like a list full of zeros. Importantly, however, they are not lists but rather "arrays", a new type defined by numpy (`ndarray`) and designed specifically for efficient numerical operations. We can create empty arrays, but we can also convert lists to arrays. For example, we can convert our measurements to an array and then scale all values by 0.001:

In [None]:
from numpy import array
measurements_array = array(measurements) * 0.001
print(measurements_array)
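
To make the difference concrete, the following sketch contrasts how a list and an array respond to `*`, and gives a rough timing comparison. The variable names, the array size and the repeat count are arbitrary choices made for illustration, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np

values_list = [10.0, 10.5, 10.4]
values_array = np.array(values_list)

# Multiplying a list by an integer repeats the list...
print(values_list * 2)
# ...whereas multiplying an array scales every element
print(values_array * 2)

# Every element of an array shares a single data type
print(values_array.dtype)  # float64

# Rough timing comparison on a larger data set (sizes chosen arbitrarily)
big_list = [float(i) for i in range(100_000)]
big_array = np.array(big_list)

loop_time = timeit.timeit(lambda: [v * 0.001 for v in big_list], number=10)
array_time = timeit.timeit(lambda: big_array * 0.001, number=10)
print(f"list loop: {loop_time:.4f} s, numpy: {array_time:.4f} s")
```

On most systems the array version should be noticeably quicker, but treat the numbers as indicative only.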

There are many advanced features of arrays which can help you write efficient code and there are some lectures and exercises at [scipy-lectures.org](https://scipy-lectures.org/intro/numpy/index.html) which go into more detail. As a minimum we need to know how to create arrays and access values.

## Creating arrays

We've already seen that we can create an array by converting an existing list. Often, however, we will want to work with arrays from the outset. There are several ways we can create arrays, and some examples are:

In [None]:
import numpy as np
# Create an array of length 10 filled with floating point zeros
ten_zeros = np.zeros(10)
print(ten_zeros)
# Create an array of length 4 filled with complex ones
four_ones = np.ones(4, dtype='complex')
print(four_ones)
# Create an empty (uninitialised) array of length 3
three_but_empty = np.empty(3)
print(three_but_empty) # Shows whatever was already in memory: often zeros or tiny values, but this is not guaranteed
# Create an array from a list
array_from_list = np.array([1,2,3,4,5,6])
print(array_from_list)

There are also a number of utilities for producing arrays containing particular values. A very common one is [`linspace`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html), which gives an array containing evenly spaced points between start and end (both included by default), e.g.

In [None]:
#Create an array containing 11 evenly spaced values between 0.0 and 1.0
values = np.linspace(0.0, 1.0, 11)
print(values)
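
As a small sketch of how `linspace` behaves, the example below uses its `retstep` option to report the spacing, and compares it with `np.arange`, which takes a step size rather than a number of points. Both are standard numpy functions; the particular values are only for illustration.

```python
import numpy as np

# linspace includes both end points; retstep=True also returns the spacing
values, step = np.linspace(0.0, 1.0, 11, retstep=True)
print(values[0], values[-1], step)  # 0.0 1.0 0.1

# arange takes a step instead of a count and excludes the end value
counts = np.arange(0, 10, 2)
print(counts)  # [0 2 4 6 8]
```

`linspace` is usually the safer choice for floating point ranges because the number of points is stated explicitly.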

Finally, it is possible to make arrays with more than one dimension (e.g. a 2D array might look quite similar to a matrix) as shown below. Note that here we also print the `shape` property of the array to show the size of each dimension.

In [None]:
#Creates an array with two dimensions (of sizes 2 and 3 respectively)
a_2D_array_of_ones = np.ones([2,3])
print(a_2D_array_of_ones)
print(a_2D_array_of_ones.shape)
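
Multi-dimensional arrays can also be built from nested lists. The sketch below (with an illustrative array called `matrix`) shows this along with the `ndim` and `size` properties, which complement `shape`.

```python
import numpy as np

# A nested list becomes a 2D array: 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2, the number of dimensions
print(matrix.size)   # 6, the total number of elements
```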

## Manipulating arrays

Once you have an array most standard mathematical operations will work as one might expect (addition, subtraction, multiplication, division, exponentiation, etc.) but there are [many other operations](https://numpy.org/doc/stable/reference/routines.array-manipulation.html) one can apply. Going through all of these options is not in scope here, but I'll mention a few particularly useful ones.
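
As a brief illustration of the standard operations, the sketch below applies a few of them to two small arrays (`a` and `b` are just example names). Note that each operation acts element by element, so `a * b` is not a matrix product.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a + b)   # elementwise sum: 5, 7, 9
print(a * b)   # elementwise product: 4, 10, 18
print(a ** 2)  # each element squared: 1, 4, 9
print(b / a)   # elementwise division: 4, 2.5, 2

# numpy's mathematical functions also act on every element
print(np.sqrt(b))
```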

Much like lists, it is possible to add extra elements to an array using [`append`](https://numpy.org/doc/stable/reference/generated/numpy.append.html#numpy.append):

In [None]:
my_array = np.ones(4)
print(my_array)
my_array = np.append(my_array, [2,3,4])
print(my_array)
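
One difference from `list.append` is worth seeing explicitly: `np.append` returns a new array rather than changing the existing one, which is why the result is assigned back to a name. A minimal sketch, using illustrative names, which also shows `np.concatenate` as a related way of joining arrays:

```python
import numpy as np

original = np.ones(4)
extended = np.append(original, [2, 3, 4])

print(original.size)  # 4, the original array is unchanged
print(extended.size)  # 7, the new array holds the extra values

# Joining two existing arrays can also be done with concatenate
joined = np.concatenate([original, np.zeros(2)])
print(joined.size)  # 6
```

Because a new array is created each time, repeatedly appending inside a loop can be slow for large arrays.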

It's also possible to change the shape of an array using [`reshape`](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape):

In [None]:
my_array = np.ones(6)
print(my_array, my_array.shape)
my_array = np.reshape(my_array, [2,3])
print(my_array, my_array.shape)
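
Two details of `reshape` are sketched below, using illustrative names: a dimension given as `-1` is worked out by numpy from the total size, and a shape which does not preserve the number of elements is rejected with a `ValueError`.

```python
import numpy as np

flat = np.arange(6)  # the integers 0 to 5

# -1 asks numpy to work out that dimension from the total size
two_rows = flat.reshape(2, -1)
print(two_rows.shape)  # (2, 3)

# The total number of elements must be preserved
try:
    flat.reshape(4, 2)
except ValueError as error:
    print("Cannot reshape:", error)
```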

## Indexing arrays

Much like lists, we can access individual elements in an array using `[]`. We can also ask for a range or subset of the values. This is often known as a slice and in general looks like `[<start>:<end>:<step>]`, where the range starts at the index given by `<start>` and goes up to but *doesn't include* the value with index `<end>`, taking steps of `<step>`. Each of these three values is optional: if `<start>` is not given it defaults to the start of the array, if `<end>` isn't given it defaults to the end of the array, and `<step>` defaults to 1.

In [None]:
my_array = np.array([0,1,2,3,4,5,6])
# Print the whole array
print(my_array)
# Print the third element
print(my_array[2])
# Print the second to fourth values
print(my_array[1:4])
# Print every other value starting at the second one (index 1)
print(my_array[1::2])
# Print the whole array
print(my_array[:])
# Print the whole array backwards (taking a step of -1!)
print(my_array[::-1])
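
Two further points about indexing are sketched below, assuming the same small array as above. Negative indices count back from the end, and a slice of an array is a view onto the same data rather than a copy (unlike a slice of a list).

```python
import numpy as np

my_array = np.array([0, 1, 2, 3, 4, 5, 6])

# Negative indices count back from the end
print(my_array[-1])   # 6, the last element
print(my_array[-3:])  # [4 5 6], the last three elements

# A slice is a view onto the same data, so changing it changes the original
first_two = my_array[:2]
first_two[0] = 100
print(my_array[0])  # 100

# Use copy() when an independent array is needed
independent = my_array[:2].copy()
independent[1] = -1
print(my_array[1])  # 1, unchanged
```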


Whilst the example above simply printed subsets, we can operate on these as well. Suppose we've got an array of six numbers and we just want to halve the first two values. We could do:

In [None]:
my_array = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]) # Floats, so that the halved values are not truncated to integers
my_array[:2] = my_array[:2] / 2
print(my_array)
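
One thing to watch for when assigning into part of an array: an array keeps its data type, so fractional values written into an integer array are silently truncated. A minimal sketch of this behaviour (the names `int_array` and `float_array` are illustrative):

```python
import numpy as np

int_array = np.array([0, 1, 2, 3, 4, 5])
print(int_array.dtype)  # an integer type, e.g. int64 on many platforms

# Assigning into an integer array converts the new values to integers
int_array[:2] = int_array[:2] / 2
print(int_array)  # [0 0 2 3 4 5], the 0.5 has been truncated to 0

# Converting to floating point first keeps the fractional part
float_array = np.array([0, 1, 2, 3, 4, 5]).astype(float)
float_array[:2] = float_array[:2] / 2
print(float_array[:2])  # 0.0 and 0.5
```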

We can even use arrays to specify which indices we want to act on:

In [None]:
my_array = np.array([0,1,2,3,4,5])
indices = np.array([0,3])
my_array[indices] = 10.0
print(my_array)

Finally, we can use logic to only set values where certain conditions are met. For example, suppose we have some measurements and we want to zero the values which are larger than 1.5 times the mean. We could do the following:

In [None]:
measurements = np.array([5.0,6.0,4.5,7.0,5.5,10.0,2.0,1.0,4.0,3.0,1.0,2.0,4.0])
the_mean = np.mean(measurements) #Yes there's a function to calculate the mean (and the standard deviation etc)
print(the_mean)
# Where the measurements are larger than 1.5 * the_mean, set them to zero
measurements[ measurements > 1.5 * the_mean ] = 0.0
print(measurements)
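
It can help to look at the intermediate step. In the sketch below (with a shortened, made-up set of measurements) the comparison is stored as a separate boolean array, here called `mask`, and `np.where` is used as an alternative that leaves the original data untouched.

```python
import numpy as np

measurements = np.array([5.0, 6.0, 4.5, 7.0, 5.5, 10.0, 2.0])
threshold = 1.5 * np.mean(measurements)

# The comparison gives an array of True/False values, one per element
mask = measurements > threshold
print(mask)
print(np.sum(mask))  # number of values above the threshold

# np.where builds a new array, leaving the original untouched
cleaned = np.where(mask, 0.0, measurements)
print(cleaned)
```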

When dealing with arrays with more than one dimension you simply index each dimension, with a comma separating the indices. For example, suppose we make a 3D array and we want to print all of the values along the first dimension for some point in the other two dimensions. This might look like:

In [None]:
my_array = np.linspace(0.0,23.0,24).reshape([2,3,4])
print(my_array.shape) #Suppose this is nx, ny and nz
# Print all of the first dimension for a y index of 1 and a z index of 2
print(my_array[:, 1, 2])
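
A few more indexing patterns on the same 3D array are sketched below. Because the array holds the values 0 to 23 in order, it is easy to check which elements each index picks out.

```python
import numpy as np

my_array = np.linspace(0.0, 23.0, 24).reshape([2, 3, 4])

# A single element: first index 1, second index 2, third index 3
print(my_array[1, 2, 3])  # 23.0, the last value

# Fixing only the first index leaves a 2D array of shape (3, 4)
print(my_array[0].shape)  # (3, 4)

# Slices and single indices can be mixed
print(my_array[:, 1, 2])  # values 6 and 18
print(my_array[0, :, 0])  # values 0, 4 and 8
```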

## Other useful numpy functionality

Numpy offers a wide range of methods which we haven't covered here including methods for working with random numbers, linear algebra, polynomials etc. Here we just highlight a few especially useful tools which you are likely to want to use.

### Saving and loading data
So far in our examples we've assumed that you have typed in the measurements or data by hand. Clearly this is error prone and cumbersome. In reality, one will often use digital acquisition as a part of the experiment (i.e. measurements are automatically captured and stored directly on the computer) or record manual measurements in a spreadsheet or some other tabular form. The [pandas module](https://pandas.pydata.org/) provides many methods for reading different data formats and manipulating large amounts of data. For less complicated situations, numpy offers a [`loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html) function which reads tabular data separated by whitespace (or by commas, tabs etc. via its `delimiter` argument) from a file directly into numpy arrays. There's an equivalent [`savetxt`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html) function which is very helpful for saving data to file in a simple format. Here we save some data and then read it into a different array:

In [None]:
my_array = np.array([0,1,2,3,4,5])
np.savetxt('saved.txt', my_array)
new_array = np.loadtxt('saved.txt')
print(new_array)
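
Tabular data with several columns works in much the same way. The sketch below writes two columns of made-up data to a comma separated file and reads them back. The file name, column names and values are hypothetical, and a temporary directory is used so that no existing files are touched.

```python
import os
import tempfile

import numpy as np

# Two columns of made-up data: time and length
time = np.array([0.0, 1.0, 2.0, 3.0])
length = np.array([10.0, 10.5, 10.4, 9.9])
table = np.column_stack([time, length])  # shape (4, 2)

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "measurements.csv")
    # The header is written with a leading '#', which loadtxt skips
    np.savetxt(filename, table, delimiter=",", header="time,length")
    # unpack=True returns one array per column
    loaded_time, loaded_length = np.loadtxt(filename, delimiter=",", unpack=True)

print(loaded_time)
print(loaded_length)
```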

### Fitting polynomials

Whilst we will encounter more powerful approaches to fitting data, numpy provides two ways of fitting data with simple polynomials. These are the [`polyfit`](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html#numpy.polyfit) function and its more recent replacement, the [`Polynomial` type](https://numpy.org/doc/stable/reference/generated/numpy.polynomial.polynomial.Polynomial.fit.html#numpy.polynomial.polynomial.Polynomial.fit). Here's an example of using both approaches to fit a quadratic to some data:

In [None]:
#First let's define a function which gives us quadratic data
def quad(x, a, b, c):
    return a * x * x + b * x + c
#Now let's make some data
x = np.linspace(0.0,2.0,24) #x values
y = quad(x, 1.0, 0.5, 0.25) #y values

#Now we can try to fit a quadratic to this using polyfit
first_fit = np.polyfit(x, y, 2)
print(first_fit) # Expect [1, 0.5, 0.25] to match a, b, c values used

#Now with the polynomial class
second_fit = np.polynomial.Polynomial.fit(x, y, 2, window = [x[0], x[-1]])
print(second_fit.coef) # Expect [0.25, 0.5, 1] to match the c, b, a values used (lowest power first)

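Note how both approaches recover the expected fitting coefficients, although in opposite orders: `polyfit` returns the coefficient of the highest power first, whereas `Polynomial` stores the lowest power first. The second approach looks a lot more complicated but it can offer a lot of advantages in some scenarios, so it's useful to know both forms!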
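
Once a fit has been made, it is usually evaluated at new points. The sketch below shows one way to do this for each approach, using the same quadratic as above written out directly. It also shows `convert()`, which recovers coefficients in terms of the original `x` when the `window` argument is not supplied.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.linspace(0.0, 2.0, 24)
y = 1.0 * x**2 + 0.5 * x + 0.25  # a = 1.0, b = 0.5, c = 0.25

# polyfit coefficients (highest power first) are evaluated with polyval
coefficients = np.polyfit(x, y, 2)
print(np.polyval(coefficients, 1.0))  # close to 1.75

# A fitted Polynomial can be called like a function
fitted = Polynomial.fit(x, y, 2)
print(fitted(1.0))  # close to 1.75

# Without the window argument the stored coefficients refer to a scaled x;
# convert() gives the coefficients in terms of the original x
print(fitted.convert().coef)  # close to [0.25, 0.5, 1.0]
```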