<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/vhroda_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**LPS hands-on training session: CoMet Toolkit: Uncertainties made easy**

The aim of this training is to get you familiar with the CoMet toolkit, and with uncertainties in general. We have already run through some theoretical background. Next, we will use a set of Jupyter notebooks, hosted on Google Colab, to run through three exercises on the use of the CoMet toolkit.
- In the first exercise, we will run together through some of the basic functionality of the punpy tool.
- In the second exercise, you will independently run through the jupyter notebook we have prepared. This will introduce some of the most important features of the obsarray and punpy tools. 
- In the third exercise, you will either apply these tools to propagate uncertainties for your own use case, or alternatively implement uncertainty propagation for a use case we have provided.

We first install the obsarray package (flag handling and accessing uncertainties), the punpy package (uncertainty propagation) and the matheo package (for band integration).

In [None]:
# quote the version specifiers so the shell does not treat ">" as a redirect
!pip install "obsarray>=1.0.1"
!pip install "punpy>=1.0.4"
!pip install matheo

We also import all the relevant python packages we use in this training:

In [None]:
import xarray as xr
import numpy as np
import punpy
from matheo.band_integration import band_integration
from obsarray.templater.dataset_util import DatasetUtil
import matplotlib.pyplot as plt
import os

If this import fails, it is likely because the pip installation has not been properly picked up by the Google Colab session. Please restart the session (in the Runtime tab above) and run the import cell again.
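
To see which versions are visible to the current session, a small check along the following lines may help. This is only a sketch using the standard library; the helper name `installed_version` is made up for this example, and a version reported here does not guarantee that an already-imported module has been refreshed.

```python
from importlib.metadata import version, PackageNotFoundError


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# report what the current environment knows about the three packages
for name in ("obsarray", "punpy", "matheo"):
    print(name, installed_version(name))
```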

Next, we clone the comet_training repository, which contains all the datasets used in this training.

In [None]:
!git clone https://github.com/comet-toolkit/comet_training.git

**Exercise 1: simple sensor calibration example**

In this exercise, the aim is to get familiar with the basic functionality of punpy. Here, punpy will be used as a standalone tool (i.e. without combining it with obsarray functionality). We will use an example of a very basic sensor calibration, where we have some digital numbers for the signal (referred to as L0), digital numbers for the dark measurement, and the gains (typically obtained from a lab calibration) to convert these to a physical quantity (referred to as L1). This could, for example, be a radiance measurement by an in situ instrument.

First, we define our measurement function. For use in punpy, this measurement function needs to be written as a Python function that takes the input quantities (on which we have uncertainties available) as arguments and returns the measurand (to which we want to propagate the uncertainties).

In [None]:
# your measurement function
def calibrate(L0,gains,dark):
   return (L0-dark)*gains

Here, the measurement function is a very simple analytical function. However, in practice, this measurement function can contain as much complexity (including calls to other packages or external software) as needed. The measurement function is to some extent treated as a black box, as long as the input quantities and measurand are structured as expected.

Next, we define some example input data. For your own use case, you would need to have this information available from other sources (i.e. the uncertainties on your inputs need to be understood prior to using punpy).

In [None]:
# your data
wavs = np.array([350,450,550,650,750])
L0 = np.array([0.43,0.8,0.7,0.65,0.9])
dark = np.array([0.05,0.03,0.04,0.05,0.06])
gains = np.array([23,26,28,29,31])

# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
L0_us = np.ones(5)*0.03  # systematic uncertainty of 0.03
                         # (common between bands)
gains_ur = np.array([0.5,0.7,0.6,0.4,0.1])  # random uncertainty
gains_us = np.array([0.1,0.2,0.1,0.4,0.3])  # systematic uncertainty
# (different for each band but fully correlated)
dark_ur = np.array([0.01,0.002,0.006,0.002,0.015])  # random uncertainty

After defining the data, the resulting uncertainty budget can be calculated with punpy using the Monte Carlo (MC) method:

In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,np.zeros(5)])
L1_ut=(L1_ur**2+L1_us**2)**0.5
L1_cov=punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us)
L1_corr=punpy.correlation_from_covariance(L1_cov)
print("L1:    ",L1)
print("L1_ur: ",L1_ur)
print("L1_us: ",L1_us)
print("L1_ut: ",L1_ut)
print("L1_cov:\n",L1_cov)
print("L1_corr:\n",L1_corr)
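
As a rough cross-check of the random component, the same propagation can be sketched with plain NumPy. This is not punpy's implementation: it is a minimal illustration that assumes independent Gaussian errors in every band, repeats the input values from the cells above, and uses names (`rng`, `n_samples`, `L1_ur_mc`, `L1_ur_analytic`) chosen only for this example.

```python
import numpy as np


def calibrate(L0, gains, dark):
    return (L0 - dark) * gains


# same example inputs as above
L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
gains = np.array([23, 26, 28, 29, 31])
L0_ur = L0 * 0.05
gains_ur = np.array([0.5, 0.7, 0.6, 0.4, 0.1])
dark_ur = np.array([0.01, 0.002, 0.006, 0.002, 0.015])

rng = np.random.default_rng(0)  # fixed seed, for reproducibility of the sketch
n_samples = 10000

# random errors: an independent Gaussian draw for every band and every sample
L0_s = L0 + L0_ur * rng.standard_normal((n_samples, 5))
gains_s = gains + gains_ur * rng.standard_normal((n_samples, 5))
dark_s = dark + dark_ur * rng.standard_normal((n_samples, 5))

# run the measurement function on all samples; the spread is the uncertainty
L1_samples = calibrate(L0_s, gains_s, dark_s)
L1_ur_mc = np.std(L1_samples, axis=0)

# first-order law of propagation of uncertainties for L1 = (L0 - dark) * gains
L1_ur_analytic = np.sqrt((gains * L0_ur) ** 2
                         + ((L0 - dark) * gains_ur) ** 2
                         + (gains * dark_ur) ** 2)

print("MC sketch:  ", L1_ur_mc)
print("first order:", L1_ur_analytic)
```

The two estimates should agree to within the MC sampling noise, because the measurement function is close to linear over the range of the input errors.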

We then define some plots to inspect the results:

In [None]:
def make_plots_L1(L1,L1_ur=None,L1_us=None,L1_ut=None,L1_corr=None):
  # add a second panel only if a correlation matrix was passed in
  # (wavs is taken from the data cell above)
  if L1_corr is not None:
    fig,(ax1,ax2) = plt.subplots(1,2,figsize=(10,5))
  else:
    fig,ax1 = plt.subplots(1,figsize=(5,5))

  ax1.plot(wavs,L1,"o")
  if L1_ur is not None:
    ax1.errorbar(wavs,L1,yerr=L1_ur,label="random uncertainty",capsize=5)
  if L1_us is not None:
    ax1.errorbar(wavs,L1,yerr=L1_us,label="systematic uncertainty",capsize=5)
  if L1_ut is not None:
    ax1.errorbar(wavs,L1,yerr=L1_ut,label="total uncertainty",capsize=5)
  ax1.legend()
  ax1.set_xlabel("wavelength (nm)")
  ax1.set_ylabel("radiance")
  ax1.set_title("L1 uncertainties")
  if L1_corr is not None:
    ax2.set_title("L1 correlation")
    corr_plot=ax2.imshow(L1_corr)
    plt.colorbar(corr_plot,ax=ax2)
  plt.show()

and make the plots for the L1 data:

In [None]:
make_plots_L1(L1,L1_ur,L1_us,L1_ut,L1_corr)

Instead of separately propagating the random and systematic uncertainties, we can also achieve the same result by first combining the random and systematic uncertainties on the inputs, and then propagating the total uncertainties. In this case, the error correlation needs to be explicitly passed to the `propagate_standard` function.

In [None]:
L0_ut=(L0_ur**2+L0_us**2)**0.5  # total L0 uncertainty (name matches its use below)
L0_cov=punpy.convert_corr_to_cov(np.eye(len(L0_ur)),L0_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L0_us),len(L0_us))),L0_us)
L0_corr=punpy.correlation_from_covariance(L0_cov)
gains_tot=(gains_ur**2+gains_us**2)**0.5
gains_cov=punpy.convert_corr_to_cov(np.eye(len(gains_ur)),gains_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(gains_us),len(gains_us))),gains_us)
gains_corr=punpy.correlation_from_covariance(gains_cov)
L1_ut, L1_corr=prop.propagate_standard(calibrate,[L0,gains,dark],
      [L0_ut,gains_tot,dark_ur],[L0_corr,gains_corr,"rand"], return_corr=True)

print("L1:    ",L1)
print("L1_ut: ",L1_ut)
print("L1_corr:\n",L1_corr)
make_plots_L1(L1,L1_ut=L1_ut,L1_corr=L1_corr)
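
The conversions between correlation and covariance used above follow the standard definitions, which can be sketched in NumPy as below. The helpers `corr_to_cov` and `cov_to_corr` are written for this illustration only and are assumed, not confirmed, to match what `convert_corr_to_cov` and `correlation_from_covariance` compute.

```python
import numpy as np


def corr_to_cov(corr, u):
    """Covariance from a correlation matrix and uncertainties: cov_ij = corr_ij * u_i * u_j."""
    return corr * np.outer(u, u)


def cov_to_corr(cov):
    """Correlation from a covariance matrix: corr_ij = cov_ij / (u_i * u_j)."""
    u = np.sqrt(np.diag(cov))
    return cov / np.outer(u, u)


L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
L0_ur = L0 * 0.05          # random: no correlation between bands
L0_us = np.ones(5) * 0.03  # systematic: fully correlated between bands

# covariances of independent error sources add
L0_cov_sketch = (corr_to_cov(np.eye(5), L0_ur)
                 + corr_to_cov(np.ones((5, 5)), L0_us))

# the diagonal gives the total variance; normalising gives the correlation
L0_ut_sketch = np.sqrt(np.diag(L0_cov_sketch))
L0_corr_sketch = cov_to_corr(L0_cov_sketch)

print(L0_ut_sketch)
print(L0_corr_sketch)
```

The off-diagonal correlations end up between 0 and 1: they are larger in bands where the systematic component dominates the total uncertainty.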

In addition to propagating random (uncorrelated) and systematic (fully correlated) uncertainties, it is also possible to propagate uncertainties associated with structured errors. If we know the covariance matrix for each of the input quantities, it is straightforward to propagate these. In the example below, we assume the L0 data and dark data to be uncorrelated (their covariance matrices are diagonal) and the gains to have a custom covariance matrix:

In [None]:
# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
dark_ur = np.array([0.01,0.002,0.006,0.002,0.015])  # random uncertainty

L0_cov=punpy.convert_corr_to_cov(np.eye(len(L0_ur)),L0_ur)
dark_cov=punpy.convert_corr_to_cov(np.eye(len(dark_ur)),dark_ur )
gains_cov= np.array([[0.45,0.35,0.30,0.20,0.05],
                    [0.35,0.57,0.32,0.30,0.07],
                    [0.30,0.32,0.56,0.24,0.06],
                    [0.20,0.30,0.24,0.44,0.04],
                    [0.05,0.07,0.06,0.04,0.21]])


In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ut,L1_corr=prop.propagate_cov(calibrate,[L0,gains,dark],
                                 [L0_cov,gains_cov,dark_cov],return_corr=True)

make_plots_L1(L1,L1_ut=L1_ut,L1_corr=L1_corr)
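
For this simple measurement function, the covariance propagation can also be written down analytically with the first-order law of propagation, C_L1 = sum over inputs of J C J^T. The sketch below is an independent cross-check in NumPy under that linear approximation; it is not how punpy's MC method works internally, and the names ending in `_lin` are chosen only for this example.

```python
import numpy as np

L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
gains = np.array([23, 26, 28, 29, 31])
dark_ur = np.array([0.01, 0.002, 0.006, 0.002, 0.015])

# input covariances: diagonal for L0 and dark, custom for gains
L0_cov = np.diag((L0 * 0.05) ** 2)
dark_cov = np.diag(dark_ur ** 2)
gains_cov = np.array([[0.45, 0.35, 0.30, 0.20, 0.05],
                      [0.35, 0.57, 0.32, 0.30, 0.07],
                      [0.30, 0.32, 0.56, 0.24, 0.06],
                      [0.20, 0.30, 0.24, 0.44, 0.04],
                      [0.05, 0.07, 0.06, 0.04, 0.21]])

# Jacobians of L1 = (L0 - dark) * gains; diagonal because each band
# of L1 only depends on the same band of the inputs
J_L0 = np.diag(gains)
J_gains = np.diag(L0 - dark)
J_dark = np.diag(-gains)

# the inputs are assumed mutually independent, so their contributions add
L1_cov_lin = (J_L0 @ L0_cov @ J_L0.T
              + J_gains @ gains_cov @ J_gains.T
              + J_dark @ dark_cov @ J_dark.T)

L1_ut_lin = np.sqrt(np.diag(L1_cov_lin))
L1_corr_lin = L1_cov_lin / np.outer(L1_ut_lin, L1_ut_lin)

print(L1_ut_lin)
print(L1_corr_lin)
```

Only the gains term contributes off-diagonal elements here, so the correlation structure of L1 is inherited from `gains_cov`.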

In addition to having a correlation along one or more dimensions of a given variable, it is also possible that two variables are correlated with each other. This can be specified in punpy by using the `corr_between` keyword. In the example below, the systematic errors in the dark and L0 data are fully correlated.

In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)

corr_var=np.array([[1,0,1],[0,1,0],[1,0,1]])

L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,L0_us],corr_between=corr_var)
L1_ut=(L1_ur**2+L1_us**2)**0.5
L1_cov=punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us)
L1_corr=punpy.correlation_from_covariance(L1_cov)
make_plots_L1(L1,L1_ur,L1_us,L1_ut,L1_corr)
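
To build intuition for what a full correlation between the L0 and dark errors implies, the NumPy sketch below draws the same systematic error for both quantities in each MC iteration and compares the result with independent draws. This is an interpretation of the `corr_var` matrix above written for illustration, not punpy's internal sampling scheme, and all names in it are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for reproducibility of the sketch
n_samples = 20000

L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
gains = np.array([23, 26, 28, 29, 31])
L0_us = np.ones(5) * 0.03
gains_us = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

# systematic errors: one draw per MC iteration, shared by all bands
z_L0 = rng.standard_normal((n_samples, 1))
z_gains = rng.standard_normal((n_samples, 1))
z_dark_indep = rng.standard_normal((n_samples, 1))

gains_s = gains + z_gains * gains_us
L0_s = L0 + z_L0 * L0_us

# fully correlated case: dark reuses the L0 draw, so the errors cancel in L0 - dark
dark_s_corr = dark + z_L0 * L0_us
L1_us_corr = np.std((L0_s - dark_s_corr) * gains_s, axis=0)

# independent case: dark gets its own draw, so the errors do not cancel
dark_s_indep = dark + z_dark_indep * L0_us
L1_us_indep = np.std((L0_s - dark_s_indep) * gains_s, axis=0)

print("correlated: ", L1_us_corr)
print("independent:", L1_us_indep)
```

In the fully correlated case only the gains contribute, so the result should be close to `(L0 - dark) * gains_us`, which is much smaller than in the independent case.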

There are many keywords that can be passed to the punpy functions to control their detailed behaviour. For a full description, we refer to the punpy documentation (see e.g. https://punpy.readthedocs.io/en/latest/content/generated/punpy.mc.mc_propagation.MCPropagation.propagate_standard.html). We would like to highlight two features. The first is the ability to return the MC samples that were used, for manual inspection:

In [None]:
L1_ut, L1_corr, MCsamples_L1, MCsamples_L0=prop.propagate_standard(calibrate,[L0,gains,dark],
      [L0_ut,gains_tot,dark_ur],[L0_corr,gains_corr,"rand"], return_corr=True, return_samples=True)
print(MCsamples_L0,MCsamples_L1)

It is also possible to use different probability density functions (PDF) instead of the default Gaussian PDF. E.g. it is possible to set a lower boundary on the values in the MCsamples of the inputs:

In [None]:
L1_ut=prop.propagate_standard(calibrate,[L0,gains,dark],
      [L0_ut,gains_tot,dark_ur],[L0_corr,gains_corr,"rand"], return_corr=False, pdf_shape="truncated_gaussian", pdf_params={"min":0.})
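
The effect of a lower bound on the input samples can be illustrated with a small NumPy sketch that redraws any sample falling below the bound. This is one simple way to sample a truncated Gaussian and is not necessarily how punpy implements `"truncated_gaussian"`; the helper `sample_truncated_gaussian` and the example value and uncertainty are hypothetical.

```python
import numpy as np


def sample_truncated_gaussian(mean, sigma, minimum, size, rng):
    """Draw Gaussian samples and redraw those below `minimum` until none remain."""
    samples = rng.normal(mean, sigma, size)
    below = samples < minimum
    while below.any():
        samples[below] = rng.normal(mean, sigma, below.sum())
        below = samples < minimum
    return samples


rng = np.random.default_rng(2)  # fixed seed, for reproducibility of the sketch

# hypothetical input: value 0.01 with standard uncertainty 0.015, so a plain
# Gaussian would produce a sizeable fraction of negative samples
plain = rng.normal(0.01, 0.015, 10000)
truncated = sample_truncated_gaussian(0.01, 0.015, 0.0, 10000, rng)

print("fraction of negative samples (plain):", np.mean(plain < 0))
print("minimum of truncated samples:", truncated.min())
print("mean of truncated samples:", truncated.mean())
```

Note that truncation shifts the mean of the samples upwards, so it changes the propagated distribution and not only its tails.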


We have now finished running through the first exercise going over the basic functionalities of punpy. Next, please run through the second exercise yourself (https://github.com/comet-toolkit/comet_training/blob/main/LPS_training_exercise2.ipynb), where some of the most important features of the obsarray and punpy tools will be introduced. 