# Python Modules - NumPy and SciPy

*Dr Chas Nelson and Mikolaj Kundegorski*

*Part of https://github.com/ChasNelson1990/python-zero-to-hero-beginners-course*

## Objectives

* Know about numerical functions provided by NumPy (`numpy`)
* Understand the concept of a NumPy array (`numpy.ndarray`)
* Know about scientific functions provided by SciPy (`scipy`)
* Know how to do linear regression with `scipy.stats.linregress`
* Know how to access the NumPy and SciPy documentation

## NumPy

NumPy (`numpy`) is a large and extremely well-developed module focussed on simple and complex mathematical functions and datatypes in Python. It is too large to cover in full, so we will only introduce you to a couple of functions today.

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.1:</strong> Create a new code cell beneath this cell and import the <code>numpy</code> module. It is conventional to give the <code>numpy</code> module the alias <code>np</code>.
<br/>
If you get stuck, see the video <a href='https://youtu.be/KXEYPE4ryAU'>here</a> for a walkthrough, which also covers the next task.</div>

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.2:</strong> Find the NumPy Documentation on-line. Can you easily navigate the documentation to find useful functions?
<br/>
When you've done this and the previous task, or if you get stuck, see the video <a href='https://youtu.be/KXEYPE4ryAU'>here</a> for a walkthrough, which also covers the previous task.</div>

One of the key features of `numpy` is the introduction of a new datatype: the `numpy.ndarray`.
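
To make the new datatype a little more concrete, here is a minimal sketch comparing a plain Python `list` with a `numpy.ndarray`. The variable names and values are made up for illustration and are not part of the exercises.

```python
import numpy as np

# Illustrative values only
temperatures_list = [18.5, 20.0, 21.5]
temperatures_array = np.array(temperatures_list)

# An ndarray records its number of dimensions, its shape and its element type
print(temperatures_array.ndim)   # number of dimensions
print(temperatures_array.shape)  # size along each dimension
print(temperatures_array.dtype)  # datatype shared by every element

# Arithmetic on an ndarray is applied to every element at once...
print(temperatures_array * 2)

# ...whereas multiplying a list by 2 repeats the list
print(temperatures_list * 2)
```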

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.3:</strong> Search the on-line NumPy documentation to find the <code>numpy.ndarray</code> page. Create a new Markdown cell beneath this one - list the four most important features of a <code>numpy.ndarray</code> as discussed.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/O_XCZCXYx-I'>here</a> for a walkthrough.</div>

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.4:</strong> Run the code cell beneath this one to see how to create a simple <code>numpy.ndarray</code>. Note how NumPy has easy methods for calculating things like the mean and standard deviation of an array without having to write loops.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/moIMPOXX4pY'>here</a> for a walkthrough.</div>

In [None]:
# Create a 3x3 array with the numbers 1 to 9
myArray = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(myArray)

# Calculate the mean and standard deviation of myArray
print(f"The mean of myArray is {myArray.mean():.2f} ...")
print(f"...and the standard deviation is {myArray.std():.2f}.")

# Calculate the mean of each column and of each row
print("The mean of each column of myArray is:")
print(myArray.mean(axis=0))
print("The mean of each row of myArray is:")
print(myArray.mean(axis=1))
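
Because `myArray` is square, it is easy to mix up which axis is which. The sketch below repeats the idea on a hypothetical 2x3 array (`scores` is an invented name), where the length of each result shows which axis was collapsed. The last lines compute the row means with a list comprehension, to show the kind of loop that the `.mean()` method saves you from writing.

```python
import numpy as np

# Hypothetical data: 2 rows, 3 columns
scores = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, leaving one value per column (3 values)
column_means = scores.mean(axis=0)
print(column_means)

# axis=1 collapses the columns, leaving one value per row (2 values)
row_means = scores.mean(axis=1)
print(row_means)

# The same row means, computed by looping over the rows ourselves
loop_row_means = [sum(row) / len(row) for row in scores]
print(loop_row_means)
```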

The `numpy.ndarray` is particularly important if you plan to analyse images or 2D+ data, e.g. geological recordings.

However, in the next notebook we will introduce the pandas DataFrame. This is another new data type and is built upon the `numpy.ndarray`. Many `numpy.ndarray` methods are also defined for pandas DataFrames.

### Slicing (...continued)

Just like with a `list`, it is often useful to access particular elements of a `numpy.ndarray`, e.g. access specific pixels in an image.

Also just like a list, this is done with square brackets: `[...]`.
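
As a reminder of how square brackets behave, here is a small sketch on an invented array of letters (deliberately different from the arrays used in the exercises below).

```python
import numpy as np

# Illustrative data, not from the exercises
letters = np.array(["a", "b", "c", "d", "e", "f"])

print(letters[0])    # first element (Python counts from zero)
print(letters[-1])   # last element
print(letters[1:3])  # slice: start at index 1, stop before index 3
print(letters[::2])  # every second element
```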

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.5:</strong> The code cell beneath this one has a 1-dimensional <code>numpy.ndarray</code>. How might you access the fifth element of this array?</div>

In [None]:
my_array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(my_array)

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.6:</strong> How might you access the second, third, fourth and fifth elements of this array?</div>

Similarly, we can navigate a 2D `numpy.ndarray` by using row and column numbers to extract a single element.

Remember: Python starts counting at zero so our axes are the 0th and 1st axes.
  
![Accessing pixels using axes.](../assets/arrays.png)

*Adapted from https://github.com/elegant-scipy/elegant-scipy*
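
The row-then-column ordering can be sketched on a small made-up "image" (the name `image` and its values are assumptions for illustration only). The first index selects along axis 0 (rows) and the second along axis 1 (columns).

```python
import numpy as np

# Hypothetical 3x4 "image": 3 rows (axis 0) and 4 columns (axis 1)
image = np.array([[10, 11, 12, 13],
                  [20, 21, 22, 23],
                  [30, 31, 32, 33]])

print(image.shape)   # (number of rows, number of columns)
print(image[0, 3])   # row 0, column 3
print(image[2])      # the whole of row 2
print(image[:, 1])   # every row, column 1
```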

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.7:</strong> The code cell beneath this one has a 2-dimensional <code>numpy.ndarray</code>. How might you access the second element of the second row of this array?</div>

In [None]:
my_2d_array = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])

print(my_2d_array)

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.8:</strong> The code cell beneath this one has a 3-dimensional <code>numpy.ndarray</code>. How might you access the second element of the second row of the second slice of this array?</div>

In [None]:
my_3d_array = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])

print(my_3d_array)

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.9:</strong> Now run the following cell. Do you recognise the error? Create a new Markdown cell and describe this error in a way that's clear to you.</div>

In [None]:
print(my_3d_array[1, 1, 2])

## SciPy

SciPy (`scipy`) is another large and extremely well-developed module, but it is focussed on mathematical, scientific and engineering functions and datatypes for Python. SciPy is also too large to cover in detail, so we will only introduce you to one key function right now.

### Importing SciPy

Importing `scipy` is a little bit unusual. `scipy` has several large submodules and, if you want to access functions in these submodules, they must be loaded as individual modules. For example, say you wanted to do some linear regression (which is in the `scipy.stats` submodule) and some image processing (using functions from `scipy.ndimage`): you would need to import both submodules, e.g.:

```python
import scipy.stats
import scipy.ndimage
```
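
An equivalent form is `from scipy import <submodule>`, which is also how the interpolation exercise at the end of this notebook imports `interpolate`. A small sketch, assuming SciPy is installed; the input image is made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Made-up 5x5 image with a single bright pixel in the centre
image = np.zeros((5, 5))
image[2, 2] = 1.0

# The submodule is now available under its own name, without the "scipy." prefix
blurred = ndimage.gaussian_filter(image, sigma=1)

print(blurred.shape)        # same shape as the input
print(blurred[2, 2] < 1.0)  # the bright pixel has been spread over its neighbours
```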

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.10:</strong> Run the following cell. The error is a little unusual, but make a note that (for SciPy) this indicates that you've not imported that submodule. Correct the cell so that it runs without errors.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/519-f0lbudc'>here</a> for a walkthrough.</div>

In [None]:
import scipy

# Load data
x = np.arange(0, 9, 1)  # create an array of the numbers 0 to 8 (the stop value, 9, is excluded)
y = np.arange(0, 18, 2)  # create an array of the numbers 0 to 16 in steps of 2
im = np.zeros([10, 10])  # create a 10 by 10 array of zeros, i.e. an empty image

# Linear Regression of x and y
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)

# Apply Gaussian Filter to im
imBlurred = scipy.ndimage.gaussian_filter(im, sigma=5)

### Linear Regression

The SciPy module contains a lot of useful stats functions including t-tests and linear regressions. Due to time constraints we will only explain the linear regression function (which we've already used above).
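
As a quick sanity check of what the function returns, the sketch below fits made-up data that lie exactly on the line y = 2x + 1 (the names `x_values`, `y_values` and `result` are invented for this example). For such data we would expect a slope of about 2, an intercept of about 1 and a correlation coefficient of about 1.

```python
import numpy as np
from scipy import stats

# Made-up data lying exactly on the line y = 2x + 1
x_values = np.array([0, 1, 2, 3, 4])
y_values = 2 * x_values + 1

result = stats.linregress(x_values, y_values)

# The result can be unpacked into five values (as in the cell above)
# or read by attribute name
print(result.slope)
print(result.intercept)
print(result.rvalue)
```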

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.11:</strong> Load the documentation for <code>scipy.stats.linregress</code>. Create a Markdown cell beneath this one and write, in simple English, what each of the two parameters and five outputs mean.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/yw6TEfzuujM'>here</a> for a walkthrough.</div>

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.12:</strong> The data below represents some simple experimental data. You have two arrays: <code>time_seconds</code>, which records the time (in seconds) at which each measurement was taken, and <code>distance_metres</code>, which records the distance travelled at that time (in metres). As you might notice, the times at which the data were taken are unevenly distributed (let's pretend that your colleague came in with a box of doughnuts and distracted you!) and so you want to interpolate your data to give you measurements at evenly distributed points.

Now, you could do this with a for loop and lots of maths... but that isn't the Python way (if you can help it).
    
Working as a team, find an appropriate function in SciPy, go through the documentation together and try and create interpolated data for `new_time_seconds`.
    
We will plot this data next week.

Question: What does `np.arange()` do? How is it different to the `range()` function we've seen before?</div>

In [None]:
from scipy import interpolate

time_seconds = [0, 1, 2, 3, 5, 7, 14, 15, 16, 17, 18, 19]
distance_metres = [0, 9, 22, 30, 48, 74, 130, 148, 160, 170, 181, 189]

f = interpolate.interp1d

new_time_seconds = np.arange(0, 20, 1)

## Key Points

* NumPy and SciPy increase the functionality of Python significantly
* NumPy and SciPy provide mathematical, statistical, scientific and engineering functions
* Whilst NumPy and SciPy documentation can look overwhelming, it can easily be interpreted

## Any Bugs/Issues/Comments?

If you've found a bug or have any comments about this notebook, please fill out this on-line form: https://forms.gle/tp2veeF8e7fbQMvY6.

Any feedback we get we will try to correct/implement as soon as possible.