# Worksheet 2: Using the Python Iris library for analysis and visualisation

In this worksheet, sample CORDEX output over Southeast Asia is compared with observations for validation purposes. Validating model results against observed data is an essential step: it is how we assess the quality of the model, and it informs appropriate uses of the data.


Here, we validate CORDEX output produced by the REMO2015 Regional Climate Model driven by two different GCMs (HadGEM2-ES and MPI-ESM-LR). Using data from both experiments will give us two representations of present day climate and two possible climate scenarios. For more details on multimodel approaches, refer to the workshop lecture on climate model ensembles.

The following exercises provide examples of how analysis can be undertaken as part of a model validation. The methods shown are not necessarily the only way to proceed and are intended to demonstrate the use of Iris in model validation, and provide a starting point for your own analyses.

<div class="alert alert-block alert-warning">
<b>By the end of this worksheet you should be able to:</b><br> 
- Apply basic statistical operations to Iris cubes. <br>
- Plot information from Iris cubes.<br>
</div>

## Contents
### [2.1: Inspecting the data](#2.1) 
### [2.2: Converting units](#2.2)
### [2.3: Climatological seasonal mean calculation](#2.3)
### [2.4: Iris quick plotting and visualising data](#2.4)

## Preamble
Run the code preamble below to import the necessary libraries for this worksheet.

To run the code, click in the box below and press <kbd>Ctrl</kbd> + <kbd>Enter</kbd>.

In [None]:
# Code preamble - these libraries will be used in this worksheet.
# This code block needs to be re-run every time you restart this worksheet!
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.dates as mdates
import calendar
import iris
import iris.coord_categorisation
import iris.quickplot as qplt
from iris.experimental.equalise_cubes import equalise_attributes
from iris.time import PartialDateTime
import cartopy.crs as ccrs
from mpl_toolkits.axes_grid1 import AxesGrid
from cartopy.mpl.geoaxes import GeoAxes
from utils import copy_s3_files, flush_data

<a id='2.1'></a>
## 2.1 Inspecting the data

The datasets used here contain daily and monthly data from two REMO2015 runs carried out over Southeast Asia, one driven by HadGEM2-ES and the other driven by MPI-ESM-LR. The observations used for comparison are from the CHIRPS gridded observational data set (https://chc.ucsb.edu/data/chirps).

Remember, in Iris, data are read into an object called a **cube**. A single cube describes only one variable; it is not possible for a cube to contain both temperature and rainfall, for example. A cube always has a name, a unit and an n-dimensional data array to represent the cube’s data. Additionally, the cube contains collections of coordinates.  Coordinate types can include spatial information (latitude, longitude, altitude), a time dimension, or other information, e.g., an ensemble number.

<p><img src="img/multi_array_to_cube.png" alt="Example Iris cube" style="float: center; height: 300px;"/></p>


__a) Load the NetCDF file for the HadGEM2-ES and MPI-ESM-LR model data and the CHIRPS rainfall observation data and print the cube output__

A cube has coordinates (e.g. time, longitude, latitude, model levels) which can be accessed with Python commands. In the following exercise we find the latitude and longitude covered by the CHIRPS data. This can be done either by printing the latitude and longitude coordinates (`.points`), noting the first and last values in the array, or in the case of the CHIRPS data, printing the cube and looking through the attributes. A similar example can be found in the [Iris documentation](https://scitools.org.uk/iris/docs/v2.4.0/userguide/navigating_a_cube.html#accessing-coordinates-on-the-cube). 

Before running the code, take a look at it line by line to understand what steps are being taken. Then click in the box and press <kbd>Ctrl</kbd> + <kbd>Enter</kbd> to run the code.

In [None]:
# we first need to download APHRODITE data
copy_s3_files('s3://ias-pyprecis/data/APHRODITE/*.nc', 'data/APHRODITE/')

In [None]:
# Provide the names of the directories where the netCDF model files are stored
DATADIR = 'data_v2/'

# Load and concatenate the HadGEM2-ES model cube data
infile = os.path.join(DATADIR, 'EAS-22/pr_EAS-22_MOHC-HadGEM2-ES_historical_r1i1p1_GERICS-REMO2015_v1_mon_*.nc')
cubes = iris.load(infile)
equalise_attributes(cubes)
hadgem2 = cubes.concatenate_cube()

# Load and concatenate the MPI-ESM-LR model cube data
infile = os.path.join(DATADIR, 'EAS-22/pr_EAS-22_MPI-M-MPI-ESM-LR_historical_r1i1p1_GERICS-REMO2015_v1_mon_*.nc')
cubes = iris.load(infile)
equalise_attributes(cubes)
mpiesm = cubes.concatenate_cube()

# finally load the CHIRPS data and print the cube 
infile = os.path.join(DATADIR, 'CHIRPS', 'chirps-v2.0.monthly.global.1981_2018.nc')
chirpsData = iris.load_cube(infile)
print(chirpsData)

# print the first and last latitude points
print(chirpsData.coord('latitude').points[0], chirpsData.coord('latitude').points[-1])
# print the first and last longitude points
print(chirpsData.coord('longitude').points[0], chirpsData.coord('longitude').points[-1])

---
<div class="alert alert-block alert-success">
    <b>Question:</b> How many years of data does the CHIRPS dataset contain? What longitudes and latitudes are covered by the CHIRPS data?
</div>

---

<b>Answer:</b>*(Double click here to fill in the answers)*<br>

Number of years:

Latitude and longitude covered: 
<br><br>

__b) Extract a subset of the data within a cube__

Data extraction is an important function in Iris. The extraction of a subset of data is called **slicing**.  For example, it could be necessary to extract data over all latitude and longitude grid points on the first time step. For more information around subsetting cubes please read the [Iris documentation on slicing](https://scitools.org.uk/iris/docs/v2.4.0/userguide/subsetting_a_cube.html#cube-indexing).

__Using the HadGEM2-ES data, the example below shows how to subset a cube for the first and last timesteps. This method will be used later for plotting data from a cube.__ 

Work through the example below line by line then click in the box and press <kbd>Ctrl</kbd> + <kbd>Enter</kbd> to run the code.

First, print the HadGEM2-ES cube, containing all the time steps:

In [None]:
# Use the print() command to display a summary of the HadGEM2-ES cube



---
<div class="alert alert-block alert-success">
<b>Question:</b> What dimensions does this cube have? <br>
    t = number of timesteps <br>
    y = number of latitude steps <br>
    x = number of longitude steps <br>
    Write your answer in the form `[ t, y, x ]`
</div>

<b>Answer:</b> *(Double click here to fill in the answers)*<br> 
HadGEM2-ES dimensions: [ t, y, x]

---

<div class="alert alert-block alert-info">
<b>Note:</b> When indexing a cube dimension, you can either specify a single index, e.g. <code>0</code> is the first (zeroth) item and <code>-1</code> is the last item, or use <b><code>:</code></b> to include <b>all items</b>.<br>
</div>
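
The indexing rules in the note can be tried out on a plain NumPy array first. The sketch below is only an analogy: the array is synthetic and its sizes are made up, but the same `0`, `-1` and `:` notation applies to the dimensions of an Iris cube.

```python
import numpy as np

# Synthetic stand-in for a [t, y, x] data array:
# 4 timesteps, 3 latitude steps, 5 longitude steps (sizes are made up)
data = np.arange(4 * 3 * 5).reshape(4, 3, 5)

first_step = data[0]        # first timestep, all y and x
last_step = data[-1]        # last timestep, all y and x
one_column = data[:, :, 0]  # all timesteps and latitudes at the first longitude

print(data.shape, first_step.shape, last_step.shape, one_column.shape)
```

Indexing with a single integer removes that dimension, which is why the first and last timesteps are two-dimensional.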

Display the cube's **first** timestep and check the associated `time` value:

In [None]:
# Use the print() function with slicing notation



Display the cube's **last** timestep and check the associated `time` value:

In [None]:
# Use the print() function with slicing notation



For the analysis here, we will use data from 1986 to 2005 inclusive, as used in the AR5 WG1 atlas. To do this, we will use time constraints, the same approach as in Worksheet 1.

In [None]:
time_constraint = iris.Constraint(time=lambda cell: PartialDateTime(year=1986) 
                            <= cell.point <= PartialDateTime(year=2005))
hadgem2 = hadgem2.extract(time_constraint)
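
As we understand it, a `PartialDateTime` with only `year` set compares just the year of each time point, so the constraint above keeps every month from January 1986 to December 2005. The plain Python sketch below mimics that logic; the list of mid-month dates is hypothetical and only stands in for a cube's time coordinate.

```python
from datetime import datetime

# Hypothetical monthly time points (mid-month) covering 1984 to 2007
time_points = [
    datetime(year, month, 16)
    for year in range(1984, 2008)
    for month in range(1, 13)
]

# Keep only the points whose year lies in 1986..2005 inclusive
selected = [t for t in time_points if 1986 <= t.year <= 2005]

print(selected[0], selected[-1], len(selected))
```

Twenty years of monthly data should leave 240 time points, which is a useful sanity check on the extracted cube.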

Now apply the same time constraint to the `mpiesm` data to restrict it to the same period:

In [None]:
mpiesm = mpiesm.extract(time_constraint)

<a id='2.2'></a>
## 2.2 Converting units

__c) Convert the precipitation units from kg/m2/s (equivalent to mm/s) to mm/day__

To convert to mm/day, we could just multiply the raw data (in mm/s) by 86400 seconds, but a clearer way is to use the __`.convert_units()`__ method with the name of the units we want to convert the data into.
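
The arithmetic behind the conversion can be checked by hand. The sketch below uses a made-up example rate and a hypothetical helper function; it is not part of Iris and is only meant to show what the conversion does to each value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds in one day


def kg_m2_s_to_mm_day(rate_kg_m2_s):
    """Convert a precipitation flux from kg m-2 s-1 to mm day-1.

    1 kg of water spread over 1 m2 forms a layer 1 mm deep, so
    kg m-2 s-1 is numerically equal to mm s-1. Multiplying by the
    number of seconds in a day gives mm day-1.
    """
    return rate_kg_m2_s * SECONDS_PER_DAY


example_rate = 5.0e-5  # made-up rainfall rate in kg m-2 s-1
print(kg_m2_s_to_mm_day(example_rate))
```

A rate of 5.0e-5 kg m-2 s-1 corresponds to roughly 4.32 mm/day, which is a plausible magnitude for monthly mean rainfall.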

Let's do this for the __HadGEM2-ES__ historical data first and break down the steps as follows:

* Print the units and a summary statistic of the data
* Convert the units and print the information again
* Rename the `.units` value in the cube and save it as a new netCDF file

In [None]:
print(hadgem2)

In [None]:
# print the unit (an f-string is used because cube.units is not a plain string)
print(f'The current unit for data is: {hadgem2.units}')
# print the summary statistic (maximum monthly precipitation)
maxpr = np.max(hadgem2.data)
print(f'This is an example rainfall rate (kg m-2 s-1) prior to conversion: {maxpr}')

In [None]:
# Convert units to kg m-2 day-1 (same as multiplying by 86400 seconds)
hadgem2.convert_units('kg m-2 day-1')
# Print cube.units to view new units for precipitation
print(f'The new rainfall units are: {hadgem2.units}')
# print the summary statistic (maximum monthly precipitation) after the unit conversion
maxpr = np.max(hadgem2.data)
print(f'This is the same rainfall rate but now in (kg m-2 day-1): {maxpr}')

Rename the new cube units for consistency, then save the converted cube:

In [None]:
# Rename cube units
hadgem2.units = 'mm day-1'

# Save the new cube as a new netCDF file
HISTDIR = 'data_v2/EAS-22/historical'

# Check to see if this directory exists, if not create it
if not os.path.isdir(HISTDIR):
    # Make directory
    os.mkdir(HISTDIR)
    # Set directory permissions 
    os.chmod(HISTDIR, 0o776)

outfile = os.path.join(HISTDIR, 'hadgem2-es.mon.1986_2005.GERICS-REMO2015.pr.mmday-1.nc')
iris.save(hadgem2, outfile)
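
The directory check above relies on `os.mkdir`, which fails if the parent directory is missing. One possible alternative is `os.makedirs` with `exist_ok=True`. The sketch below runs in a temporary directory (an assumption made so that it does not touch the worksheet's data folders) and is not a required change to the cell above.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves the worksheet data untouched
base = tempfile.mkdtemp()
target = os.path.join(base, 'EAS-22', 'historical')

# Creates any missing parent directories and does not fail if the
# directory already exists, so no explicit isdir() check is needed
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)  # calling it again is harmless

print(os.path.isdir(target))
```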

Complete the following code block to repeat the same procedure for MPI-ESM-LR:

In [None]:
# Print the current MPI-ESM-LR cube units


# convert units to kg m-2 day-1


# Rename the units to mm day-1. Recall that 1 kg m-2 is equivalent to 1 mm of rain


# Save the new cube as a new netCDF file using the `outfile` filename we've provided below!
outfile = os.path.join(HISTDIR, 'mpi-esm-lr.mon.1986_2005.GERICS-REMO2015.pr.mmday-1.nc')



<a id='2.3'></a>
## 2.3 Climatological seasonal mean calculation

__d) Calculate the 1986-2005 seasonal mean precipitation field for October-December (OND) from both the HadGEM2-ES and MPI-ESM-LR driven runs__

Work through the example below line by line then click in the box and press <kbd>Ctrl</kbd> + <kbd>Enter</kbd> to run the code.

In [None]:
# Set up directory for the climatology
CLIMDIR = 'data_v2/EAS-22/climatology'

# Check to see if this directory exists, if not create it
if not os.path.isdir(CLIMDIR):
    # Make directory
    os.mkdir(CLIMDIR)
    # Set directory permissions 
    os.chmod(CLIMDIR, 0o776)

In [None]:
# Loop through two model runs
for gcmid in ['hadgem2-es', 'mpi-esm-lr']:
    infile = os.path.join(HISTDIR, gcmid + '.mon.1986_2005.GERICS-REMO2015.pr.mmday-1.nc')

    # Load the data
    data = iris.load_cube(infile)

    # In order to calculate OND mean, we use the command below to add a season membership coordinate.
    # The seasons can be any sequence of months, identified by the first letters of the names of the months.
    # Here, we define two seasons, jfmamjjas (the months we are not interested in) and ond (October, November and
    # December); the months we do want.
    iris.coord_categorisation.add_season(data, 'time', name='seasons', seasons=('jfmamjjas','ond'))

    # This command extracts data for the OND season using a constraint
    data_ond = data.extract(iris.Constraint(seasons='ond'))

    # The cube 'data_ond' contains data from October-December for all years. 
    # The command below calculates the mean over all years.
    seasonal_mean = data_ond.collapsed('time', iris.analysis.MEAN)
    
    # Save the OND seasonal mean as a netCDF
    outfile = os.path.join(CLIMDIR, gcmid + '.OND.mean.1986_2005.pr.mmday-1.nc')
    iris.save(seasonal_mean, outfile)

---
<div class="alert alert-block alert-success">
<b>Question:</b> What dimensions does this cube have now? <br>
    t = number of timesteps <br>
    y = number of latitude steps <br>
    x = number of longitude steps <br>
    Write your answer in the form `[ t, y, x ]` <br>
    Compare your answer to the answer you found in <strong> (b)</strong>. Which dimensions have changed?
</div>

<b>Answer:</b> *(Double click here to fill in the answers)*<br>
Seasonal mean dimensions: [ t, y, x]

---
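
The season selection and time mean from step d) can be mimicked with NumPy to see what happens to the array shape. This is a sketch on synthetic random data with made-up grid sizes, not the worksheet data; Iris additionally keeps track of coordinates and metadata, which plain arrays do not.

```python
import numpy as np

n_years, ny, nx = 3, 2, 4  # made-up sizes for the synthetic example

# Month number (1-12) for each monthly timestep
months = np.tile(np.arange(1, 13), n_years)

# Synthetic [t, y, x] rainfall data
data = np.random.default_rng(0).random((n_years * 12, ny, nx))

# Select the October, November and December timesteps from every year
is_ond = np.isin(months, [10, 11, 12])
data_ond = data[is_ond]

# Average over the time axis, leaving one value per grid point
seasonal_mean = data_ond.mean(axis=0)

print(data.shape, data_ond.shape, seasonal_mean.shape)
```

Selecting the season shortens the time axis, and taking the mean removes it altogether.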

__e) Calculate the 1986-2005 seasonal mean for OND from the CHIRPS observation data__

CHIRPS is a daily high resolution (0.05 degree) data set for 1981 to almost the present day. See https://chc.ucsb.edu/data/chirps for more information.

Follow step d) and complete the code yourself.  The file name to load is: `chirps-v2.0.monthly.global.1981_2018.nc`. We've given you the infile and outfile names to make sure you load and save it in the right place for later!

In [None]:
# we first need to download the climatology data
copy_s3_files('s3://ias-pyprecis/data/climatology/*.nc', 'data/climatology/')

In [None]:
# Directory names where data is read from and stored to
infile = os.path.join(DATADIR, 'CHIRPS', 'chirps-v2.0.monthly.global.1981_2018.nc')

# Load the CHIRPS data, only for the period 1986-2005 
chirps = iris.load_cube(infile, constraint=time_constraint)

# convert the units to mm day^-1 
chirps.convert_units('mm day-1')

# In order to calculate OND mean, we need to add a season membership coordinate
iris.coord_categorisation.add_season(chirps, 'time', name='seasons', seasons=('jfmamjjas','ond'))

# Then constrain the cube just for the OND season
chirps_ond = chirps.extract(iris.Constraint(seasons='ond'))

# Now calculate the climatological mean for this season
seasonal_mean = chirps_ond.collapsed('time', iris.analysis.MEAN)

# save the seasonal mean cube as a NetCDF file
outfile = os.path.join(CLIMDIR, 'chirps.OND.mean.1986_2005.pr.mmday-1.nc')
iris.save(seasonal_mean, outfile)

# print the CHIRPS seasonal mean cube
print(seasonal_mean)

---
<div class="alert alert-block alert-success">
<b>Question:</b> How would you calculate the standard deviation of mean rainfall?  How about annual maximum rainfall?
</div>

<b>Answer:</b> Write the line of code required to calculate CHIRPS's (a) standard deviation and (b) annual maximum rainfall in the code block below. <br>
<b>Hint</b>: How could you adapt <code>chirps_ond.collapsed('time', iris.analysis.MEAN)</code> from above? You can refer to the [Iris documentation](https://scitools.org.uk/iris/docs/v2.4.0/iris/iris/analysis.html) if needed.

In [None]:
# From chirps, calculate: 
# (a) chirps_std 


# (b) chirps_max



---

<a id='2.4'></a>
## 2.4 Iris quick plotting and visualising data

Now we will plot the output to take a first look at what climatological winter precipitation (1986-2005 OND seasonal mean) looks like for each dataset. This section provides an initial introduction to visualising data quickly using Iris. For further reading and instructions, please visit: https://scitools.org.uk/iris/docs/v2.4.0/userguide/plotting_a_cube.html


**f) Plot and compare** the climatological winter precipitation over Southeast Asia for the three datasets.

<div class="alert alert-block alert-success">
<b>Question:</b> Work through the code block below line by line. Think about what you expect the plot setup to look like: <br> 

* Which lines of code specify the layout of sub-plots?<br>
* Will the plots have a common colour scale or separate ones?<br>
* What are the maximum and minimum precipitation values that will be displayed? <br>
</div>

Think about your answers, then click in the box and press <kbd>Ctrl</kbd> + <kbd>Enter</kbd> to run the code and create the plots.

In [None]:
# load hadgem2-es model data
infile = os.path.join(CLIMDIR, 'hadgem2-es.OND.mean.1986_2005.pr.mmday-1.nc')
hadgem_cube = iris.load_cube(infile)

# load mpi-esm model data
infile = os.path.join(CLIMDIR, 'mpi-esm-lr.OND.mean.1986_2005.pr.mmday-1.nc')
mpi_cube = iris.load_cube(infile)

# load CHIRPS data
infile = os.path.join(CLIMDIR, 'chirps.OND.mean.1986_2005.pr.mmday-1.nc')
obs_cube   = iris.load_cube(infile)

# Do some plotting!
# Create a figure of the size 12x10 inches
plt.figure(figsize=(12, 10))

plt.subplot(1, 3, 1)           # Create a new subplot for the model data; 1 row x 3 columns, 1st plot
levels = range(0, 22, 2)       # Define the contour levels for all plots

# Note that qplt.contourf can only plot cubes with two dimensions.
# No slicing is needed here because the time dimension was already
# collapsed when the 1986-2005 seasonal mean was calculated
qplt.contourf(hadgem_cube, levels=levels, cmap=cm.RdBu, extend='max')
                               

plt.title('HadGEM2-ES model')  # plots a title for the plot
ax = plt.gca()                 # gca function that returns the current axes
ax.coastlines()                # adds coastlines defined by the axes of the plot

plt.subplot(1, 3, 2)           # Create a new subplot for the model data; 1 row x 3 columns, 2nd plot
qplt.contourf(mpi_cube, levels=levels, cmap=cm.RdBu, extend='max')

plt.title('MPI-ESM-LR model')  # plots a title for the plot
ax = plt.gca()                 # gca function that returns the current axes
ax.coastlines()                # adds coastlines defined by the axes of the plot

plt.subplot(1, 3, 3)           # Create a new subplot for the observed data; 1 row x 3 columns, 3rd plot
                               # This plot will sit to the right of the two model plots
qplt.contourf(obs_cube, levels=levels, cmap=cm.RdBu, extend='max')

plt.title('CHIRPS obs')        # plots a title for the plot
ax = plt.gca()                 # gca function that returns the current axes
ax.coastlines()                # adds coastlines defined by the axes of the plot
ax.set_extent((65.0, 155.0, 0.0, 40.0))

plt.show()

---
<div class="alert alert-block alert-success">
<b>Question:</b> 
    <br>What are the differences between the three plots for HadGEM2-ES, MPI-ESM-LR and CHIRPS? Note the colour bars. 
    <br>Where are the largest daily rainfall rates distributed?
    <br>Why do you think this is happening?
</div>

<b>Answer:</b> *(Double click here to fill in the answers)*<br>

<b>What differences do you see between the three plots?</b>


<b>Location of greatest rainfall</b>
<br> *HadGEM2-ES*: 
<br> *MPI-ESM*: 
<br> *CHIRPS*:


<b>What is happening and why?</b>


<b>How could comparison be made easier?</b>

---

<center>
<div class="alert alert-block alert-warning">
<b>This completes worksheet 2.</b> <br>You have covered converting units, creating seasonal means and visualising your results.<br>
In worksheet 3, you will start to consider more advanced analysis, extract regional means, look at annual cycles, work with ensemble data and produce difference plots.
</div>
</center>

<p><img src="img/MO_MASTER_black_mono_for_light_backg_RBG.png" alt="python + iris logo" style="float: center; height: 100px;"/></p>
<center>© Crown Copyright 2022, Met Office</center>