<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/LPS%20Training%20-%20Exercise%202.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**LPS Hands-on Training Session - CoMet Toolkit: Uncertainties made easy**

# Exercise 2: Multi-Dimension Datasets

## Objectives

In this exercise we will cover:

* How to use [**obsarray**](https://obsarray.readthedocs.io/en/latest/) to store error-correlation information for multi-dimensional measurement datasets - such as from Earth Observation.
* Propagating uncertainties from these datasets through measurement functions using [**punpy**](https://punpy.readthedocs.io/en/latest/).

## *Step 1* - Environment Setup

As in Exercise 1, we start by collecting test data and installing and importing the required CoMet Toolkit packages --- [**obsarray**](https://obsarray.readthedocs.io/en/latest/) and [**punpy**](https://punpy.readthedocs.io/en/latest/).

In [None]:
# Clone the CoMet Training repository to access training data
!git clone https://github.com/comet-toolkit/comet_training.git

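# Install & import CoMet Toolkit packages
# (version specifiers are quoted so the shell does not treat ">" as a redirect)
!pip install "obsarray>=1.0.1"
!pip install "punpy>=1.0.4"

import numpy as np
import obsarray
import punpy
import xarray as xr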

CoMet Toolkit's [**obsarray**](https://obsarray.readthedocs.io/en/latest/) package is an extension to the popular [xarray](https://docs.xarray.dev/en/stable/) package.

[`xarray.Dataset`](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset) objects represent the data model of the [netCDF file format](https://www.unidata.ucar.edu/software/netcdf/) in Python. **obsarray** allows you to assign uncertainties to variables in xarray Datasets, together with their associated error-correlation.

This is achieved by using the CoMet Toolkit's draft [UNC Specification](https://comet-toolkit.github.io/unc_website/) metadata standard for dataset variable attributes. Because the information is held in variable attributes, such objects are portable and can be written to and read from netCDF files on disk.


## *Step 2* - Interfacing with a Measurement Dataset using **obsarray**

In this step of the exercise, we will explore how to define and interact with the *uncertainty variables* (i.e., uncertainty components) associated with *observation variables* in measurement datasets using **obsarray**.

Our example will be a multi-spectral dataset of Level 1 (L1) [Brightness Temperatures](https://en.wikipedia.org/wiki/Brightness_temperature) (BT) from the AVHRR sensor on MetOp-A. This [dataset](https://catalogue.ceda.ac.uk/uuid/14a8098d70714cc1bf38f9dbcb82e5ed/) was created as part of the [FIDUCEO](https://research.reading.ac.uk/fiduceo/) project.

Here we open an extract contained in a [netCDF](https://www.unidata.ucar.edu/software/netcdf/) file, which has an observation variable -- `brightness_temperature` -- with the following dimensions:

* $x$, along track -- 400 pixels at 1 km resolution
* $y$, across track -- 400 pixels at 1 km resolution
* $band$, spectral bands -- 2 bands centred on $\sim$11 μm and 12 μm


In [None]:
# open xarray.Dataset from netCDF file
dataset_path = "avhrr_dataset.nc"
avhrr_ds = xr.open_dataset(dataset_path)

# inspect dataset
print(avhrr_ds)
# plot the first band, selecting it by dimension name
avhrr_ds["brightness_temperature"].isel(band=0).plot()

After import, **obsarray** functionality is accessed through the `unc` "[accessor](https://docs.xarray.dev/en/stable/internals/extending-xarray.html)", which appears as a new attribute on xarray Datasets.

We can use this to [assign an *uncertainty variable*](https://obsarray.readthedocs.io/en/latest/content/user/unc_accessor.html#adding-removing-uncertainty-components) to the `brightness_temperature`, in a very similar way to adding a normal variable to an xarray Dataset:

In [None]:
# define u_noise values - set as 1%
u_noise_values = avhrr_ds["brightness_temperature"].values * 0.01

# add an uncertainty component associated with noise error to the brightness temperature
avhrr_ds.unc["brightness_temperature"]["u_noise"] = (["x", "y", "band"], u_noise_values)

print("Dataset Variables: " + str(avhrr_ds.keys()))

Uncertainty variables have an associated error-correlation structure -- since we didn't define this for `u_noise`, it is assumed to be random (i.e., errors are uncorrelated between pixels).
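To make the meaning of "random" concrete, the sketch below uses plain NumPy (not **obsarray**) to draw random errors, which are independent for every pixel, and, for contrast, fully systematic errors, where a single error is shared by all pixels. The pixel count, uncertainty value and variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_pixels = 20000, 4   # hypothetical sample size and pixel count
u = 0.5                        # hypothetical standard uncertainty per pixel, in K

# random: an independent error is drawn for every pixel
random_errors = rng.normal(0.0, u, size=(n_draws, n_pixels))

# systematic: one error is drawn per realisation and shared by all pixels
shared_error = rng.normal(0.0, u, size=(n_draws, 1))
systematic_errors = np.repeat(shared_error, n_pixels, axis=1)

# empirical error-correlation matrices between pixels
random_corr = np.corrcoef(random_errors, rowvar=False)
systematic_corr = np.corrcoef(systematic_errors, rowvar=False)

print(np.round(random_corr, 2))
print(np.round(systematic_corr, 2))
```

The random case gives a matrix close to the identity, while the systematic case gives a matrix of ones.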

Next let's add a calibration uncertainty component, `u_cal`, with a more complicated error-correlation structure using the `err_corr` attribute. This uses the [error-correlation parameterisations](https://comet-toolkit.github.io/unc_website/specification/draft-v0.1/unc_specification.html#appendix-a-error-correlation-parameterisations) defined by the draft UNC Specification (it is also possible to add custom error-correlation parameterisations).

Let's set the pixel errors associated with `u_cal` to be systematic (i.e., the same/common) in the `x` and `y` dimensions and defined by a custom matrix in the `band` dimension.

In [None]:
# create cross-channel error-correlation matrix
chl_err_corr_matrix = np.array([[1.0, 0.7],[0.7, 1.0]])
avhrr_ds["chl_err_corr_matrix"] = (("band1", "band2"), chl_err_corr_matrix)

# use this to define error-correlation parameterisation attribute
err_corr_def = [
    # fully systematic in the x and y dimension
    {
        "dim": ["x", "y"],
        "form": "systematic",
        "params": [],
        "units": []
    },
    # defined by err-corr matrix var in band dimension
    {
        "dim": ["band"],
        "form": "err_corr_matrix",
        "params": ["chl_err_corr_matrix"],  # defines link to err-corr matrix var
        "units": []
    }
]

# define u_cal values - set as 5%
u_cal_values = avhrr_ds["brightness_temperature"].values * 0.05

# add an uncertainty component associated with calibration error to the brightness temperature
avhrr_ds.unc["brightness_temperature"]["u_cal"] = (["x", "y", "band"], u_cal_values, {"err_corr": err_corr_def})

We can now interface with this information through the `unc` accessor.
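As a rough illustration of what the `u_cal` parameterisation encodes, the sketch below builds the full error-correlation matrix for a tiny grid using plain NumPy. It shows the idea only and is not **obsarray**'s internal implementation; the grid size, the constant brightness temperature and all variable names are assumptions.

```python
import numpy as np

# tiny hypothetical grid so the full matrix stays readable
n_x, n_y, n_band = 2, 2, 2

# fully systematic in x and y: every pair of pixels has correlation 1
xy_err_corr = np.ones((n_x * n_y, n_x * n_y))

# cross-band error-correlation matrix, same values as chl_err_corr_matrix
band_err_corr = np.array([[1.0, 0.7], [0.7, 1.0]])

# with values flattened in (x, y, band) order the band index varies fastest,
# so the combined matrix is the Kronecker product of the two pieces
full_err_corr = np.kron(xy_err_corr, band_err_corr)

# scale by the uncertainties to obtain the error-covariance matrix
bt = np.full((n_x, n_y, n_band), 290.0)  # hypothetical brightness temperatures, in K
u_cal = (0.05 * bt).ravel()              # 5% uncertainty, flattened the same way
full_err_cov = np.outer(u_cal, u_cal) * full_err_corr

print(full_err_corr.shape)
print(full_err_corr[0, :4])
```

Even for this 2 x 2 x 2 example the matrix has 64 elements; for the 400 x 400 x 2 extract the full matrix would be far too large to store, which is why a compact parameterisation is used instead.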

## *Step 3* - Uncertainty Propagation

Thermal infrared multi-spectral data, like our example AVHRR dataset, are used to develop Level 2 (L2) Climate Data Records (CDRs) such as Sea or Land Surface Temperature (SST or LST). SST/LST retrievals account for the atmosphere to evaluate the surface temperature from the top-of-atmosphere L1 brightness temperature.

A widely used approach for this is called the "split window" method. A simplified form of this algorithm can be written as,

$SST = a T_{11} - b T_{12}$

where:

* $T_{11}$ is the brightness temperature in the 11 μm band
* $T_{12}$ is the brightness temperature in the 12 μm band
* $a$ & $b$ are empirically derived retrieval coefficients

For the purpose of the exercise, set $a=2$ and $b=1$.
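Because this simplified retrieval is linear, the law of propagation of uncertainty gives an exact result for a single pixel, $u^2(SST) = a^2 u^2(T_{11}) + b^2 u^2(T_{12}) - 2ab\,r\,u(T_{11})\,u(T_{12})$, where $r$ is the error correlation between the two bands. The NumPy sketch below evaluates this for one pixel and can serve as a cross-check on a propagated result. The brightness temperatures are hypothetical, and the 5% uncertainty and $r = 0.7$ are assumptions that mirror the `u_cal` component defined earlier.

```python
import numpy as np

# retrieval coefficients from the exercise
a, b = 2.0, 1.0

# hypothetical single-pixel brightness temperatures, in K
t_11, t_12 = 290.0, 288.0

# assumed 5% uncertainties and cross-band error correlation
u_11, u_12 = 0.05 * t_11, 0.05 * t_12
r = 0.7

# evaluate the measurement function
sst = a * t_11 - b * t_12

# sensitivity coefficients (partial derivatives of SST w.r.t. T_11 and T_12)
jacobian = np.array([a, -b])

# error-covariance matrix of the two band temperatures
err_cov = np.array([
    [u_11 ** 2, r * u_11 * u_12],
    [r * u_11 * u_12, u_12 ** 2],
])

# law of propagation of uncertainty: u^2 = J C J^T
u_sst = float(np.sqrt(jacobian @ err_cov @ jacobian))

print(f"SST = {sst:.1f} K, u(SST) = {u_sst:.2f} K")
```

Note the minus sign on the correlation term: positively correlated band errors partly cancel in the difference, so the result is smaller than it would be for uncorrelated errors.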

## **Exercise**

Using what we learned in [Exercise 1](https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/LPS_training_exercise1.ipynb), create a measurement function to apply the SST retrieval to our AVHRR dataset and propagate the uncertainties using **punpy**.

In [None]:
# Enter your code here

## *Extension* - Propagating Dataset Uncertainties with the `MeasurementFunction` Class

**punpy**'s [`MeasurementFunction`](https://punpy.readthedocs.io/en/latest/content/punpy_digital_effects_table.html#measurement-function) class enables a much simpler method for propagating the uncertainties of measurement datasets defined using **obsarray**. It is an alternative interface to the **punpy** propagation functions we used in Step 3.

In this approach, instead of defining the measurement function as a Python function, we define a measurement function class, which should be a subclass of the punpy `MeasurementFunction` class. We can then use the method [`propagate_ds`](https://punpy.readthedocs.io/en/latest/content/punpy_digital_effects_table.html#functions-for-propagating-uncertainties) to propagate all dataset uncertainties in one go!

In [None]:
class SplitWindowSST(punpy.MeasurementFunction):
  #
  # define primary method of class - the measurement function
  #
  def meas_function(self, BT: np.ndarray) -> np.ndarray:
    """
    Returns SST from input L1 BTs using split window method

    :param BT: brightness temperature datacube
    :returns: evaluated SST
    """

    # set parameter values
    a = 2
    b = 1

    # evaluate SST (this indexing assumes band is the first axis of BT)
    sst = a * BT[0, :, :] - b * BT[1, :, :]

    return sst

  #
  # define helper methods to configure class
  #
  def get_measurand_name_and_unit(self) -> tuple[str, str]:
    """
    For the dataset evaluated by the measurement function, returns a tuple of
    measurand variable name and units

    :returns: measurand name, measurand unit name
    """
    return "sst", "K"

  def get_argument_names(self) -> list[str]:
    """
    Returns ordered list of input dataset variable names associated with
    meas_function arguments

    :returns: input dataset variable names
    """
    return ["brightness_temperature"]


# create punpy propagation object
prop = punpy.MCPropagation(1000, dtype="float32", verbose=False, parallel_cores=4)

# Instantiate measurement function object with prop
sst_ret = SplitWindowSST(prop=prop)

# run uncertainty propagation
sst_ds = sst_ret.propagate_ds(avhrr_ds)

## **Exercise**

Adapt the `MeasurementFunction` class approach above to include error-covariance for the set of parameters $a$ and $b$.

In [None]:
# Enter your code here