# Introduction to Numpy

- **Prerequisites**: Users of this notebook should have a basic understanding of:
    - How to run a [Jupyter notebook](01_Jupyter_notebooks.ipynb)


## Background

Numpy is a Python library which adds support for large, multi-dimensional arrays and matrices, along with a large
collection of high-level mathematical functions to operate on these arrays. More information about Numpy arrays can
be found [here](https://en.wikipedia.org/wiki/NumPy).


## Description

This notebook is designed to introduce users to Numpy arrays using Python code in Jupyter Notebooks via JupyterLab.

Topics covered include:

* How to use Numpy functions in a Jupyter Notebook cell
* Using indexing to explore multi-dimensional Numpy array data
* Numpy data types, broadcasting and booleans
* Using Matplotlib to plot Numpy data


## Getting started

To run this notebook, run all the cells in the notebook starting with the "Load packages" cell. For help with running
notebook cells, refer back to the [Jupyter Notebooks notebook](01_Jupyter_notebooks.ipynb).


### Load packages

To use numpy, we first need to import the library using the keyword `import`. To avoid typing `numpy` every time
we want to use one of its functions, we can also provide a shorter alias using the keyword `as`:

In [None]:
import numpy as np

### Introduction to Numpy

Now, we have access to all the functions available in `numpy` by typing `np.name_of_function`. For example, the
equivalent of `1 + 1` in Python can be done in `numpy`:

In [None]:
np.add(1,1)

Although this might not at first seem very useful, even simple operations like this one can be much quicker
in `numpy` than in standard Python when using lots of numbers (large arrays).
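
As a rough illustration of this claim, the sketch below sums the squares of a million numbers with a plain Python loop and with `numpy`, then times both using the standard-library `timeit` module. The array size, the repeat count and the two helper function names are arbitrary choices made for this example, and the exact timings will vary between machines.

```python
import timeit

import numpy as np

n = 1_000_000
values_list = list(range(n))
values_arr = np.arange(n, dtype=np.int64)

def python_sum_of_squares():
    # Pure-Python loop over a list
    return sum(v * v for v in values_list)

def numpy_sum_of_squares():
    # Vectorised: the loop runs inside numpy's compiled code
    return int(np.sum(values_arr * values_arr))

# Both approaches give the same answer
print(python_sum_of_squares() == numpy_sum_of_squares())

# Time three runs of each approach (results are in seconds)
print('Python loop:', timeit.timeit(python_sum_of_squares, number=3))
print('numpy      :', timeit.timeit(numpy_sum_of_squares, number=3))
```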

To access the documentation explaining how a function is used, including its input parameters and output format, we
can press `Shift+Tab` after the function name. Try this in the cell below:

In [None]:
np.add

By default the result of a function or operation is shown underneath the cell containing the code. If we want to
reuse this result for a later operation we can assign it to a variable:

In [None]:
a = np.add(2,3)

The contents of this variable can be displayed at any moment by typing the variable name in a new cell:

In [None]:
a

### Numpy arrays

The core concept in numpy is the `array`, which is similar to a list of numbers but can be multidimensional. To
declare a numpy array we do:

In [None]:
np.array([1,2,3,4,5,6,7,8,9])

Most of the functions and operations defined in numpy can be applied to arrays. For example, with the
previous operation:

In [None]:
arr1 = np.array([1,2,3,4])
arr2 = np.array([3,4,5,6])

np.add(arr1, arr2)

A simpler and more convenient notation can also be used:

In [None]:
arr1 + arr2
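
The other arithmetic operators work element by element in the same way. The short sketch below reuses the values of `arr1` and `arr2` from above and, for contrast, shows that `+` on plain Python lists joins the lists rather than adding their numbers.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

print(arr1 - arr2)   # element-wise subtraction
print(arr1 * arr2)   # element-wise multiplication
print(arr1 ** 2)     # every element squared

# With plain Python lists, + joins the lists instead of adding the numbers
print([1, 2, 3, 4] + [3, 4, 5, 6])
```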

#### Indexing

Arrays can be sliced and diced. We can get subsets of the arrays using the indexing notation which
is `[start:end:stride]`. Let's see what this means:

In [None]:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

print('6th element in the array:', arr[5])
print('6th element to the end of array:', arr[5:])
print('start of array to the 5th element:', arr[:5])
print('every second element:', arr[::2])

Try experimenting with the indices to understand the meaning of `start`, `end` and `stride`. What happens if you
don't specify a start? What value does numpy use instead? Note that numpy indexes start at `0`, the same convention
used in Python lists.

Indexes can also be negative, meaning that you start counting from the end. For example, to select the last 2 elements
in an array we can do:

In [None]:
arr[-2:]
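
The sketch below combines `start`, `end` and `stride` in a single expression, using the same values as `arr` above. It is only one possible set of experiments; the particular indices are arbitrary.

```python
import numpy as np

arr = np.arange(16)   # the numbers 0 to 15

print(arr[2:10])     # start=2, end=10 (the end index itself is excluded)
print(arr[2:10:3])   # same range, taking every third element
print(arr[:-3])      # everything except the last three elements
print(arr[::-1])     # a negative stride walks through the array backwards
```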

### Multi-dimensional arrays

Numpy arrays can have multiple dimensions. For example, we define a 2-dimensional `(1,9)` array using nested
square brackets:

<img src="../Supplementary_data/06_Intro_to_numpy/numpy_array_t.png" alt="drawing" width="600" align="left"/>

In [None]:
np.array([[1,2,3,4,5,6,7,8,9]])

To inspect the shape or dimensions of a numpy array we can add the suffix `.shape`:

In [None]:
print(np.array([1,2,3,4,5,6,7,8,9]).shape)
print(np.array([[1,2,3,4,5,6,7,8,9]]).shape)
print(np.array([[1],[2],[3],[4],[5],[6],[7],[8],[9]]).shape)

Any array can be reshaped into different shapes using the function `reshape`:

In [None]:
np.array([1,2,3,4,5,6,7,8]).reshape((2,4))

If you are concerned about having to type so many square brackets, there are simpler and more convenient ways of doing
the same:

In [None]:
print(np.array([1,2,3,4,5,6,7,8,9]).reshape(1,9).shape)
print(np.array([1,2,3,4,5,6,7,8,9]).reshape(9,1).shape)
print(np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3).shape)
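
Two details of `reshape` are worth knowing. One dimension can be given as `-1`, in which case numpy works out its size from the number of elements, and the new shape must hold exactly the same number of elements as the original. A minimal sketch, using a hypothetical variable named `values`:

```python
import numpy as np

values = np.arange(1, 10)   # nine elements: 1 to 9

print(values.reshape(3, -1).shape)   # numpy infers the second dimension
print(values.reshape(-1, 1).shape)   # a single column

# Reshaping fails if the new shape does not hold exactly nine elements
try:
    values.reshape(2, 4)
except ValueError as err:
    print('Cannot reshape:', err)
```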

There are also shortcuts for declaring common arrays without having to type all their elements:

In [None]:
print(np.arange(9))
print(np.ones((3,3)))
print(np.zeros((2,2,2)))
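
The indexing notation from the previous section extends to multi-dimensional arrays: give one index or slice per dimension, separated by commas. The sketch below uses a small 3 x 4 array, named `grid` here purely for illustration.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
print(grid)

print(grid[1, 2])      # single element: row 1, column 2
print(grid[0])         # the whole first row
print(grid[:, 1])      # the second column, taken from every row
print(grid[:2, -2:])   # first two rows, last two columns
```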

### Arithmetic operations

Numpy has many useful arithmetic functions. Below we demonstrate a few of these, such as mean, standard deviation and
sum of the elements of an array. These operations can be performed either across the entire array, or across a
specified dimension.

In [None]:
arr = np.arange(9).reshape((3,3))
print(arr)

In [None]:
print('Mean of all elements in the array:', np.mean(arr))
print('Std dev of all elements in the array:', np.std(arr))
print('Sum of all elements in the array:', np.sum(arr))
print('Mean of elements in array axis 0:', np.mean(arr, axis=0))
print('Mean of elements in array axis 1:', np.mean(arr, axis=1))
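
The `axis` argument can be confusing at first: it names the dimension that is collapsed by the operation, not the one that is kept. The sketch below repeats the idea with `np.sum` on the same 3 x 3 values (stored in a variable called `table` for this example), where the results are easy to check by hand.

```python
import numpy as np

table = np.arange(9).reshape((3, 3))

# axis=0 collapses the rows, leaving one value per column
print(np.sum(table, axis=0))   # [ 9 12 15]

# axis=1 collapses the columns, leaving one value per row
print(np.sum(table, axis=1))   # [ 3 12 21]

# The same operations are also available as array methods
print(table.max(axis=0), table.min(axis=1))
```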

### Numpy data types

Numpy arrays can contain numerical values of different types. These types can be divided into the following groups:

* Integers
    * Unsigned
        * 8 bits: `uint8`
        * 16 bits: `uint16`
        * 32 bits: `uint32`
        * 64 bits: `uint64`
    * Signed
        * 8 bits: `int8`
        * 16 bits: `int16`
        * 32 bits: `int32`
        * 64 bits: `int64`

* Floats
    * 32 bits: `float32`
    * 64 bits: `float64`
    
We can specify the type of an array when we declare it, or change the data type of an existing one with the following
expressions:

In [None]:
#set datatype when declaring array
arr = np.arange(5, dtype=np.uint8)
print('Integer datatype:', arr)

arr = arr.astype(np.float32)
print('Float datatype:', arr)
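
The number of bits determines the range of values a type can hold, which matters when choosing a data type. The sketch below uses `np.iinfo` to look up the limits of two integer types, then shows two consequences that are easy to overlook: array arithmetic that goes past the limit wraps around, and converting floats to integers discards the fractional part.

```python
import numpy as np

# Each integer type can only store a limited range of values
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)   # 0 255
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)   # -32768 32767

# Array arithmetic that exceeds the range wraps around silently
small = np.array([250, 251], dtype=np.uint8)
print(small + np.array([10, 10], dtype=np.uint8))       # [4 5]

# Converting floats to integers drops the fractional part
print(np.array([1.9, -1.9]).astype(np.int32))           # [ 1 -1]
```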

### Broadcasting

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject
to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. This
can make operations very fast.

In [None]:
a = np.zeros((3,3))
print(a)

a = a + 1

print(a)

a = np.arange(9).reshape((3,3))

b = np.arange(3)

a + b
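
The "certain constraints" can be stated briefly: numpy compares the two shapes starting from the last dimension, and each pair of dimensions must either be equal or have one of them equal to 1. The sketch below shows a row and a column being broadcast against a 3 x 3 array, and what happens when the shapes cannot be matched. The variable names are chosen for this example only.

```python
import numpy as np

grid = np.arange(9).reshape((3, 3))
row = np.arange(3)             # shape (3,)
column = row.reshape((3, 1))   # shape (3, 1)

print(grid + row)             # row is added to every row of grid
print(grid + column)          # column is added to every column of grid
print((row + column).shape)   # (3,) and (3, 1) broadcast to (3, 3)

# Shapes that cannot be matched raise an error
try:
    grid + np.arange(4)
except ValueError as err:
    print('Cannot broadcast:', err)
```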

### Booleans

There is a binary type in numpy called boolean which encodes `True` and `False` values. For example:

In [None]:
arr = (arr > 0)

print(arr)

arr.dtype

Boolean types are quite handy for indexing and selecting parts of images as we will see later. Many numpy functions
also work with Boolean types.

In [None]:
print("Number of 'Trues' in arr:", np.count_nonzero(arr))

#create two boolean arrays
a = np.array([1,1,0,0], dtype=bool)
b = np.array([1,0,0,1], dtype=bool)

#find where both arrays are True
np.logical_and(a, b)
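
To illustrate how Boolean arrays help with indexing, the sketch below uses one as a mask: placing it inside the square brackets selects only the elements where the mask is `True`. The sample values are made up for this example.

```python
import numpy as np

values = np.array([3, -1, 7, 0, -5])
mask = values > 0      # True where the value is positive

print(values[mask])    # keep only the elements where mask is True
print(values[~mask])   # ~ inverts the mask

# A mask can also choose which elements to overwrite
clipped = values.copy()
clipped[clipped < 0] = 0
print(clipped)
```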

### Introduction to Matplotlib

This second part introduces matplotlib, a Python library for plotting numpy arrays as images. For the purposes of
this tutorial we are going to use a part of matplotlib called pyplot. We import it by doing:

In [None]:
%matplotlib inline

from matplotlib import pyplot as plt

An image can be seen as a 2-dimensional array. To visualise the contents of a numpy array:

In [None]:
arr = np.arange(100).reshape(10,10)

print(arr)

plt.imshow(arr)

We can use the Pyplot library to load an image using the function `imread`:

In [None]:
im = np.copy(plt.imread('../Supplementary_data/06_Intro_to_numpy/africa.png'))

Let's display this image using the `imshow` function:

In [None]:
plt.imshow(im)

This is a [free stock photo](https://depositphotos.com/42725091/stock-photo-kilimanjaro.html) of Mount Kilimanjaro,
Tanzania. A colour image is normally composed of three layers containing the values of the red, green and blue pixels.
When we display an image we see all three colours combined.

Let's use the indexing functionality of numpy to select a slice of this image. For example, to select the top
right corner:

In [None]:
plt.imshow(im[:100,-200:,:])

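We can also replace the values in the 'red' layer with the value 255, making the image 'reddish'. Note that
`plt.imread` returns PNG data as floats between 0 and 1, so Matplotlib clips the value 255 to the maximum of 1 when
displaying the image. Give it a try: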

In [None]:
im[:,:,0] = 255
plt.imshow(im)
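
The same slicing and layer assignment can be tried on a small synthetic array, which makes the shapes easy to follow without needing the image file. This is only a sketch: `fake_im` is a made-up stand-in for an image, assumed here to hold floats between 0 and 1.

```python
import numpy as np

# A stand-in for an image: 4 rows, 6 columns, 3 colour layers (red, green, blue)
fake_im = np.zeros((4, 6, 3), dtype=np.float32)
print(fake_im.shape)

# Top-right corner: first 2 rows, last 3 columns, all layers
corner = fake_im[:2, -3:, :]
print(corner.shape)

# Fill the red layer (index 0) with the maximum value for float images
fake_im[:, :, 0] = 1.0
print(fake_im[0, 0])   # one pixel: red at full strength, no green or blue
```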

## Recommended next steps

For more advanced information about working with Jupyter Notebooks or JupyterLab, you can
explore the [JupyterLab documentation page](https://jupyterlab.readthedocs.io/en/stable/user/notebook.html).

To continue with this beginner's guide, work through the notebooks in the following order:

1. [Jupyter Notebooks](01_Jupyter_notebooks.ipynb)
2. [Products and Measurements](02_Products_and_measurements.ipynb)
3. [Loading data](03_Loading_data.ipynb)
4. [Plotting](04_Plotting.ipynb)
5. [Performing a basic analysis](05_Basic_analysis.ipynb)
6. **Introduction to numpy (this notebook)**
7. [Introduction to xarray](07_Intro_to_xarray.ipynb)
8. [Parallel processing with Dask](08_Parallel_processing_with_dask.ipynb)