# Python Tutorial 03: Numerical Python and Conditions

This script provides you with an introduction to numerical Python using the widely known package 'NumPy'. We will also introduce loops and conditions, which make your data wrangling easier. If you spot any mistakes or issues, please report them to christoph.renkl@dal.ca.

## Numerical Python using NumPy

So far, we have learned about Python's built-in number data types and containers. NumPy extends this with multidimensional arrays and comes with a lot of useful functions that apply an operation to every element of an array at once, which is much more efficient than processing the elements one by one in plain Python.

In [None]:
# Import the NumPy package
import numpy as np

Create a 1D array (a.k.a. a vector) by providing a list to the `np.array()` function:

In [None]:
vec = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
vec

Let's do some math

In [None]:
vec + 3   # add 3 to each element in array

In [None]:
vec - 7   # subtract 7 from each element in array

In [None]:
vec * 10  # multiply each element by 10

In [None]:
vec / 10  # divide each element by 10

In [None]:
vec ** 2  # raise each element to the power of 2
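The element-wise behavior is a property of NumPy arrays, not of Python containers in general. A minimal sketch for comparison: with a plain Python list the same `*` operator repeats the list, while the array result matches what an explicit list comprehension computes element by element (the names `small_list`, `small_vec` and `looped` are just for illustration):

```python
import numpy as np

small_list = [1, 2, 3]            # a plain Python list
small_vec = np.array(small_list)  # the same numbers as a NumPy array

print(small_list * 2)             # list: repetition -> [1, 2, 3, 1, 2, 3]
print(small_vec * 2)              # array: element-wise -> [2 4 6]

# The array operation gives the same result as processing each element
# in turn, but without writing that step yourself
looped = np.array([v * 2 for v in small_list])
print(np.array_equal(small_vec * 2, looped))  # True
```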

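Python has no built-in square-root operator (the standard library's `math` module provides `math.sqrt`, but it has to be imported first). To compute the square root of a number in plain Python, you can raise it to the power of 1/2, which is mathematically the same. Note that you need parentheses around the exponent so that `1/2` is evaluated before the exponentiation: `**` binds more tightly than `/`, so `16 ** 1/2` would compute `(16 ** 1) / 2 = 8.0`.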

In [None]:
# compute square root of 16 using native Python
16 ** (1/2)
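To see why the parentheses matter, here is a small check in plain Python. The `math.sqrt` line is an addition for comparison only; the standard-library `math` module is not used elsewhere in this tutorial:

```python
import math

without_parens = 16 ** 1 / 2   # ** binds tighter than /, so this is (16 ** 1) / 2
with_parens = 16 ** (1 / 2)    # the exponent 0.5 is computed first
from_math = math.sqrt(16)      # square root from the standard-library math module

print(without_parens)  # 8.0
print(with_parens)     # 4.0
print(from_math)       # 4.0
```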

NumPy has a function for this computation:

In [None]:
np.sqrt(16) # the `np.` prefix indicates that we use the function `sqrt` from the `numpy` package

This function also works for arrays

In [None]:
np.sqrt(vec)  # calculate square root of each element

We can also create arrays with more dimensions by providing a nested list, i.e. a list of lists in which each inner list becomes one row:

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6]]) # 2D array (matrix), each list becomes a
                                       # row in the array
arr

To get information about arrays we can use the attributes `ndim` and `shape`

In [None]:
arr.ndim # number of dimensions = 2, rows and columns

In [None]:
arr.shape # number of elements in each dimension
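A small sketch of how `ndim` and `shape` relate. It also uses two features the tutorial does not cover otherwise: the attribute `size` (total number of elements) and the method `reshape` (rearranging the same elements into a new shape). The name `matrix` is just for illustration:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.ndim)    # 2
print(matrix.shape)   # (2, 3): 2 rows, 3 columns
print(matrix.size)    # 6: total number of elements (2 * 3)

# shape has one entry per dimension
print(len(matrix.shape) == matrix.ndim)   # True

# The same array can be built from a 1D range reshaped to (rows, columns)
same = np.arange(1, 7).reshape(2, 3)
print(np.array_equal(matrix, same))       # True
```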

There are more ways of creating arrays

In [None]:
arr1 = np.arange(1, 11, 1)   # start, end (exclusive), increment
arr1

In [None]:
arr2 = np.linspace(0, 1, 6)  # start, end (inclusive), number of elements
arr2

In [None]:
arr3 = np.ones((3, 4))       # 2D array with each element equal to 1
arr3

In [None]:
arr4 = np.zeros((3, 3, 2))   # 3D array with each element equal to 0
arr4
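A quick check of the end-point conventions of the creation functions above, and of the data type that `np.ones` and `np.zeros` produce by default (floating-point numbers). This sketch uses only the calls shown in the cells, plus `np.diff` to look at the spacing between neighboring elements:

```python
import numpy as np

a = np.arange(1, 11, 1)
print(a[0], a[-1])        # 1 10 -> the end value 11 is excluded

b = np.linspace(0, 1, 6)
print(b[0], b[-1])        # 0.0 1.0 -> both end points are included
print(np.diff(b))         # spacing (1 - 0) / (6 - 1) = 0.2 between neighbors

c = np.ones((3, 4))
print(c.shape, c.dtype)   # (3, 4) float64

d = np.zeros((3, 3, 2))
print(d.shape, d.ndim)    # (3, 3, 2) 3
```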

Indexing works just like before. Remember that Python is zero-based.

In [None]:
arr1[3] # 4th element of arr1

In [None]:
arr2[-3:] # the last three elements of arr2

In [None]:
arr3[1, :2] # elements in the first two columns (:2) of the second row (1) of arr3

In [None]:
arr4[..., 1] # all rows and columns at index 1 (the second layer) of the 3rd dimension of arr4
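Because `arr3` and `arr4` contain only ones and zeros, it is hard to see which elements the indexing above picks. A minimal sketch with arrays whose values are all different (the names `m`, `cube` and `layer` are hypothetical):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0 to 11
# m = [[ 0  1  2  3]
#      [ 4  5  6  7]
#      [ 8  9 10 11]]

print(m[1, :2])     # [4 5]: second row (1), first two columns (:2)
print(m[-1])        # the last row: 8, 9, 10, 11

cube = np.arange(18).reshape(3, 3, 2)
layer = cube[..., 1]                          # index 1 along the last (3rd) dimension
print(layer.shape)                            # (3, 3)
print(np.array_equal(layer, cube[:, :, 1]))   # True: `...` stands for ":, :" here
```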

## Conditions and if-Statements

With conditions you can let Python answer simple yes/no questions. Generally, you compare two objects with an operator. The basic ones are:

* `==`:   equal to
* `!=`:   not equal to
* `>=`:   greater than or equal to
* `<=`:   less than or equal to
* `>`:    greater than
* `<`:    less than
* `in`:   within/part of

A comparison of two single values returns `True` or `False`; comparing a NumPy array with a value returns an array of `True`/`False` values, one for each element.

In [None]:
7 == 5     # is 7 equal to 5

In [None]:
arr1 != 8  # is each element of arr1 not equal to 8?

In [None]:
arr1 >= 5  # is each element of arr1 greater than or equal to 5?

In [None]:
arr1 > 5   # is each element of arr1 greater than 5?

In [None]:
arr1 <= 5  # is each element of arr1 less than or equal to 5?

In [None]:
arr1 < 5   # is each element of arr1 less than 5?

In [None]:
17 in arr1 # is 17 within/part of arr1
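Comparing an array with a value gives a Boolean array. A small sketch of how to reduce it to a single answer with `sum`, `any` and `all`, and of what happens if you put such an array directly into an if-statement (introduced below), where Python expects a single `True` or `False`:

```python
import numpy as np

arr1 = np.arange(1, 11, 1)
mask = arr1 > 5

print(mask)          # one True/False value per element
print(mask.sum())    # 5: True counts as 1, so this counts the matching elements
print(mask.any())    # True: at least one element is greater than 5
print(mask.all())    # False: not all elements are greater than 5

# An if-statement needs a single True/False, so a Boolean array is ambiguous
try:
    if arr1 > 5:
        print("this line is never reached")
except ValueError as err:
    print("ValueError:", err)
```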

You can use the resulting Boolean array to select parts of an array (so-called Boolean or logical indexing):

In [None]:
arr1[arr1 != 8] # return all elements of arr1 that are not equal to 8

In [None]:
arr1[arr1 > 5] # return all elements of arr1 that are larger than 5
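Conditions can also be combined. This goes slightly beyond the cells above: for arrays, the element-wise operators `&` (and) and `|` (or) are used instead of the keywords `and`/`or`, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `<`. A minimal sketch:

```python
import numpy as np

arr1 = np.arange(1, 11, 1)

# Element-wise "and": keep elements greater than 3 AND less than 8
between = arr1[(arr1 > 3) & (arr1 < 8)]
print(between)    # [4 5 6 7]

# Element-wise "or": keep elements less than 3 OR greater than 8
outside = arr1[(arr1 < 3) | (arr1 > 8)]
print(outside)    # the values 1, 2, 9 and 10

# The mask has the same shape as the array it selects from
print((arr1 > 5).shape == arr1.shape)   # True
```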

Conditions are very useful when you want to execute certain parts of your script only when a certain criterion is met. This is done through if statements.

In [None]:
# Define a short vector
vec = np.array([23, 566, 8, 4646, 78, 51664, np.nan, 763, 6, 0, 123, 42, 9999])

# define a value of interest
val = 42

# Set up the if-statement. You can read it as: "if [val] is an element of
# [vec], then print the message to the terminal"
if val in vec:
    print(f"Hooray, {val} is in the vector!")

Note that the indentation of four spaces after the `if` line is crucial! The condition you want to check is written after the keyword `if` and followed by a colon. Everything that is indented below it will only be executed if the condition is met.
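A minimal sketch of what the indentation controls: the indented line belongs to the if-block, while the unindented line after it runs regardless of the condition (collecting the messages in a list is just a way to make the result easy to check):

```python
import numpy as np

vec = np.array([23, 566, 8, 4646, 78, 51664, np.nan, 763, 6, 0, 123, 42, 9999])
val = 17

messages = []
if val in vec:
    messages.append("inside the if-block")   # indented: runs only if val is in vec
messages.append("after the if-block")        # not indented: always runs

print(messages)   # ['after the if-block']
```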

In [None]:
# change the variable val
val = 17

# Now extend the if-statement for the case that the condition is not met:
if val in vec:
    print(f"Hooray, {val} is in the vector!")
else:
    print(f"Sorry, {val} is NOT in the vector!")

Play around with the value of `val` and see how the message changes.

We can also build a hierarchy of conditions: if the first condition is not met, Python checks the next one (`elif`), and the `else` block is used as a last resort.

First calculate the mean of our vector:

In [None]:
# `vec` contains a NaN value, so we use the NumPy function `nanmean`, which ignores NaNs
vec_mean = np.nanmean(vec)
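Why `nanmean` rather than `np.mean`? The vector contains one `np.nan` ("not a number"), which marks a missing value. A small sketch comparing both functions, showing that `nanmean` gives the same result as removing the NaN with a Boolean mask first, and that `in` cannot find a NaN because NaN is not equal to itself:

```python
import numpy as np

vec = np.array([23, 566, 8, 4646, 78, 51664, np.nan, 763, 6, 0, 123, 42, 9999])

print(np.mean(vec))        # nan: a single NaN makes the ordinary mean NaN
print(np.nanmean(vec))     # mean of the 12 valid values

# nanmean gives the same result as dropping the NaN with a Boolean mask first
valid = vec[~np.isnan(vec)]
print(len(valid))                                   # 12
print(np.isclose(np.nanmean(vec), valid.mean()))    # True

# NaN is not equal to anything, not even itself, so `in` cannot find it
print(np.nan in vec)       # False
```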

In [None]:
# check if `val` is an element of the vector `vec`
if val in vec:
    print(f"Hooray, {val} is in the vector!")

# if the first condition is not met, check if `val` is at least smaller
# than the mean value of the vector
elif val < vec_mean:
    print(f"Too bad, {val} is not in the vector, but it is smaller",
          "than the mean value of the vector!")

# if neither condition is met
else:
    print(f"Sorry, {val} is NOT in the vector!")
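To try several values of `val` without editing the cell each time, the same logic can be wrapped in a function. This is a sketch with a hypothetical helper `describe` and a `for` loop, neither of which is part of the original tutorial. Note that only the first branch whose condition is met is executed: 42 is in the vector *and* smaller than the mean, but only the first message appears:

```python
import numpy as np


def describe(val, vec):
    """Hypothetical helper: the if/elif/else logic above, returning the message."""
    vec_mean = np.nanmean(vec)
    if val in vec:
        return f"Hooray, {val} is in the vector!"
    elif val < vec_mean:
        return (f"Too bad, {val} is not in the vector, but it is smaller "
                "than the mean value of the vector!")
    else:
        return f"Sorry, {val} is NOT in the vector!"


vec = np.array([23, 566, 8, 4646, 78, 51664, np.nan, 763, 6, 0, 123, 42, 9999])

for test_val in (42, 17, 100000):
    print(describe(test_val, vec))
```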

## Arrays and Conditions in Data Analysis

We will now apply some of the concepts we have covered so far to real data.

In [None]:
# Import all packages which we will need for the remainder of this tutorial
from pathlib import Path
import cmocean.cm as cmo
import matplotlib.pyplot as plt
import xarray as xr # This is a great package which brings most of the pandas
                    # functionality to multidimensional arrays. It is also a
                    # great tool for working with NetCDF data files.

In [None]:
# path to DISP directory
dispdir = Path("/home/chrenkl/Projects/DISP/python_tutorial")

# full file name including path
fname = dispdir / "data" / "raw" / "bedford_basin_monitoring_program.nc"

# Read data - these are all CTD casts from the Bedford Basin Monitoring Program
ds = xr.open_dataset(fname)

# The variable 'ds' holds an xarray Dataset. Let's have a look at the content
ds

As you can see, it has four data variables which are 2D arrays sharing the same coordinates time and pressure. Under the hood, the data variables are NumPy arrays and you can use them in the same way. The benefit of the `xarray.Dataset` is that it assigns names and labels to the rows and columns of the arrays which makes accessing certain values much easier and more explicit.

It is good practice to keep your code flexible and avoid repetition. The goal is to plot one of the data variables with appropriate labels showing its units. Wouldn't it be nice if we could write the code in a way that it picks the right units and color scheme just based on the name of the variable?

In [None]:
# Define a variable with the name of the data variable we want to plot (here: temperature)
vname = "temperature"

# Create an if-statement that defines parameters according to the chosen variable

# set parameters if variable is temperature
if vname == "temperature":
    units = r"[$^\circ$C]" # units of temperature are degree Celsius
    name = "Temperature" 
    cmap = cmo.thermal     # we choose the `thermal` colormap of the `cmocean` package
    vmin = -2.             # the minimum value we want to show
    vmax = 20.             # the maximum value we want to show
    nlevels = 23           # number of contour levels

# set parameters if variable is salinity
elif vname == "salinity":
    units = "[-]"          # salinity has no units
    name = "Salinity"
    cmap = cmo.haline      # we choose the `haline` colormap of the `cmocean` package
    vmin = 28.
    vmax = 32.
    nlevels = 33           # number of contour levels

# set parameters if variable is sigmaTheta (potential density anomaly of sea water)
elif vname == "sigmaTheta":
    units = r"[kg m$^{-3}$]"  # the units of density are kg/m^3
    name = r"Potential Density Anomaly $\sigma_{\theta}$"
    cmap = cmo.dense          # we choose the `dense` colormap of the `cmocean` package
    vmin = 21.
    vmax = 25.5
    nlevels = 37           # number of contour levels

# set a default
else:
    units = ''
    name = vname.capitalize()
    cmap = "viridis"
    vmin = float(ds[vname].min())  # convert the 0-d DataArray to a plain number
    vmax = float(ds[vname].max())
    nlevels = 31           # number of contour levels

# Create title string
title = f"Bedford Basin Monitoring Program - {name}"
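The if/elif/else chain works well for a handful of variables. An alternative design, sketched below under the assumption that the settings stay the same as above, is to store them in a dictionary and look them up with `dict.get`, which falls back to a default for unknown names. The names `PLOT_PARAMS` and `get_plot_params` are hypothetical, and the colormaps are left out so the sketch runs without `cmocean`. The loop at the end also shows why the numbers of levels were chosen as they are: they give round contour spacings.

```python
# Hypothetical lookup table with the same settings as the if/elif chain above
# (colormaps omitted so this sketch runs without cmocean)
PLOT_PARAMS = {
    "temperature": {"units": r"[$^\circ$C]", "name": "Temperature",
                    "vmin": -2.0, "vmax": 20.0, "nlevels": 23},
    "salinity": {"units": "[-]", "name": "Salinity",
                 "vmin": 28.0, "vmax": 32.0, "nlevels": 33},
    "sigmaTheta": {"units": r"[kg m$^{-3}$]",
                   "name": r"Potential Density Anomaly $\sigma_{\theta}$",
                   "vmin": 21.0, "vmax": 25.5, "nlevels": 37},
}


def get_plot_params(vname, data_min, data_max):
    """Return the plot settings for `vname`, or a default based on the data range."""
    default = {"units": "", "name": vname.capitalize(),
               "vmin": data_min, "vmax": data_max, "nlevels": 31}
    return PLOT_PARAMS.get(vname, default)


# Spacing between contour levels: (vmax - vmin) / (nlevels - 1)
for key, params in PLOT_PARAMS.items():
    step = (params["vmax"] - params["vmin"]) / (params["nlevels"] - 1)
    print(key, step)
# temperature 1.0
# salinity 0.125
# sigmaTheta 0.125
```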

Now we create the plot.

In [None]:
# Set up a figure of a certain size with one subplot
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(11., 4.5))

# Plot the variable as a function of time (x-axis) and pressure (y-axis)
cs = ax.contourf(
    ds["time"].values,                       # values on x-axis
    ds["pressure"],                          # values on y-axis
    ds[vname].transpose(),                   # variable we want to plot   
    levels=np.linspace(vmin, vmax, nlevels), # specify contour levels
    cmap=cmap                                # colormap
)  

# invert y-axis
ax.invert_yaxis()

# colorbar
cb = fig.colorbar(cs, ax=ax)

# The colorbar has its own axis, put the units in the title
cb.ax.set_title(units)

# set label and title
ax.set_ylabel("Depth [m]")
ax.set_title(title)

# make sure the plot takes up all space on the figure
fig.tight_layout()

We want to save the plot in a dedicated directory. It is important to keep the figures separate from your data. The idea is that we can always delete the folder with the figures and recreate the exact same plots with our code, just from the raw data. That is the beauty of reproducible research!

Note that we have specified the variable `dispdir` above when we read the data. Based on that directory we can specify a subdirectory `figures`.

In [None]:
# create subdirectory for figures
figdir = dispdir / "figures"

# the following command creates the directory `figdir` if it does not yet exist.
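# (`exist_ok=True` means no error is raised if the directory already exists, so the
# if-statement is buried in the `mkdir` function; `parents=True` also creates any
# missing parent directories)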
figdir.mkdir(parents=True, exist_ok=True)
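A minimal sketch of what `parents=True` and `exist_ok=True` do, run inside a temporary directory so nothing is created in your project (the nested path and the name `demo_dir` are hypothetical). Without `exist_ok=True` you would have to write the if-statement yourself, e.g. `if not figdir.exists(): figdir.mkdir(parents=True)`:

```python
import tempfile
from pathlib import Path

created = False
raised = False

with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "project" / "figures"   # hypothetical nested path

    # parents=True also creates the missing "project" directory
    demo_dir.mkdir(parents=True, exist_ok=True)
    # exist_ok=True: a second call on an existing directory is harmless
    demo_dir.mkdir(parents=True, exist_ok=True)
    created = demo_dir.is_dir()

    # Without exist_ok=True, an existing directory raises an error
    try:
        demo_dir.mkdir(parents=True)
    except FileExistsError:
        raised = True

print(created, raised)   # True True
```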

# save figure as a PDF
fig.savefig(figdir / f"bbmp_{vname}.pdf")

Play around by changing `vname` and see how the plot changes.

All variables show a seasonal cycle, but there is clearly some interannual variability. With xarray, it is very easy to compute a monthly climatology:

In [None]:
# collect values of all Januaries, Februaries, Marches, ... and compute the mean along the `time` dimension.
clim = ds.groupby('time.month').mean(dim='time')

We can subtract the climatology from our original data to get anomalies; xarray matches each time step with the climatological mean of its month:

In [None]:
anom = ds.groupby('time.month') - clim
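To see what `groupby('time.month')` does conceptually, here is a NumPy-only sketch of the same two steps on synthetic monthly data (made-up numbers, not BBMP data): average all values that share the same calendar month, then subtract from each value the climatological mean of its month. It illustrates the idea only and is not how xarray implements it:

```python
import numpy as np

# Hypothetical monthly time axis: January 2016 to December 2018 (36 months)
dates = np.arange("2016-01", "2019-01", dtype="datetime64[M]")
month = dates.astype("int64") % 12 + 1   # months since 1970-01 -> 1 (Jan) ... 12 (Dec)

# Synthetic values: the same seasonal cycle every year plus one offset per year
seasonal = 10 + 5 * np.sin(2 * np.pi * (month - 1) / 12)
offset = np.repeat([-1.0, 0.0, 1.0], 12)
values = seasonal + offset

# Climatology: mean of all Januaries, all Februaries, ... (12 values)
monthly_clim = np.array([values[month == m].mean() for m in range(1, 13)])

# Anomalies: subtract the climatological mean of each value's own month
monthly_anom = values - monthly_clim[month - 1]

print(np.allclose(monthly_anom, offset))   # True: only the year-to-year offset is left
```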

Now, let's have a look at one particular year.

In [None]:
# Define a variable with the year of interest as a string
year = "2018"

It is straightforward to create a subset of the anomalies for this year:

In [None]:
# select anomalies for the chosen year
dsyear = anom.sel(time=year)
dsyear
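Selecting with a year string works because xarray treats the `time` coordinate as dates. A NumPy-only sketch of the same idea on a hypothetical 10-day sampling (the names `sample_dates`, `in_year` and `subset` are illustrative): truncate each date to its year and keep the dates that match:

```python
import numpy as np

# Hypothetical sampling every 10 days, starting on 1 January 2016
start = np.datetime64("2016-01-01")
sample_dates = start + np.arange(0, 1096, 10).astype("timedelta64[D]")

year = "2018"

# Truncate each date to its year and compare with the year of interest
in_year = sample_dates.astype("datetime64[Y]") == np.datetime64(year)
subset = sample_dates[in_year]

print(subset[0], subset[-1])   # first and last sample date in 2018
```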

Note that we now have a smaller subset of the original data. Now we want to plot this subset.

We can use almost the same code as above to create a time series of anomalies:

In [None]:
# redefine vmin, vmax, and nlevels so the color scale is symmetric around zero
vmax = float(np.floor(abs(anom[vname]).max()))
vmin = -vmax
nlevels = int(vmax - vmin) * 2 + 1   # contour levels every 0.5 units

# Set up a figure of a certain size with one subplot
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(11., 4.5))

# Plot the variable as a function of time and pressure
cs = ax.contourf(
    dsyear['time'].values,
    dsyear['pressure'],
    dsyear[vname].transpose(),
    levels=np.linspace(vmin, vmax, nlevels),
    cmap=cmo.balance,
    extend="both"    # values beyond +/- vmax get the end colors instead of staying blank
)

# invert y-axis
ax.invert_yaxis()

# colorbar
cb = fig.colorbar(cs, ax=ax)

# The colorbar has its own axis, put the units in the title
cb.ax.set_title(units)

# set label and title
ax.set_ylabel('Depth [m]')
ax.set_title(title + ' Anomalies')

# make sure the plot takes up all space on the figure
fig.tight_layout()

# save figure as a PDF, note that we already defined and created `figdir` above
fig.savefig(figdir / f"bbmp_{vname}_anomalies.pdf")
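A worked example of the color-limit arithmetic in the cell above, on a few hypothetical anomaly values. Rounding the largest absolute anomaly down and using `2 * (vmax - vmin) + 1` levels gives a symmetric scale with contour levels every 0.5 units. Because of the rounding down, the most extreme values lie outside the levels; Matplotlib's `contourf` leaves such values unfilled unless its `extend` parameter is set (e.g. `extend="both"`). The `ex_` names are illustrative so they do not overwrite the tutorial's variables:

```python
import numpy as np

example_anom = np.array([-2.4, -0.3, 0.8, 3.7, 1.1])   # hypothetical anomaly values

ex_vmax = float(np.floor(np.abs(example_anom).max()))  # floor(3.7) = 3.0
ex_vmin = -ex_vmax                                     # -3.0, symmetric around zero
ex_nlevels = int(ex_vmax - ex_vmin) * 2 + 1            # 6 * 2 + 1 = 13
ex_levels = np.linspace(ex_vmin, ex_vmax, ex_nlevels)  # -3.0, -2.5, ..., 3.0

print(ex_nlevels, ex_levels[1] - ex_levels[0])         # 13 0.5

# Values beyond +/- ex_vmax lie outside the contour levels
print(example_anom[np.abs(example_anom) > ex_vmax])    # [3.7]
```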

When you are done analyzing an `xarray.Dataset`, you should close it to release the underlying data file and the resources associated with it:

In [None]:
# close datasets
dsyear.close()
ds.close()