<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/LPS_training_exercise1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**LPS Hands-on Training Session - CoMet Toolkit: Uncertainties made easy**

# Exercise 1: Explore some of the basic functionality of the punpy tool.

## Objectives

In this exercise we will:

* Get familiar with the [**punpy**](https://punpy.readthedocs.io/en/latest/) tool.
* Propagate uncertainties on manually provided input data through a simple measurement function using [**punpy**](https://punpy.readthedocs.io/en/latest/).
* Explore the various ways in which uncertainties with different error correlations can be propagated.


## *Step 1* - Environment Setup

We first install the obsarray package (flag handling and accessing uncertainties) and the punpy package (uncertainty propagation).

In [None]:
!pip install "obsarray>=1.0.1"
!pip install "punpy>=1.0.4"

We then import the Python packages used in this training:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import punpy

If this import fails, it is likely because the pip installation has not been properly picked up by the Google Colab session. In that case, please restart the session (via the Runtime menu above).

## *Step 2* - Define the measurement function and input data

In this exercise, the aim is to get familiar with the basic functionality of punpy. Here, punpy will be used as a standalone tool (i.e. without combining it with obsarray functionality). We will use an example of a very basic sensor calibration, where we have some digital numbers for the signal (referred to as L0) and the gains (typically obtained from a lab calibration) to convert these to a physical quantity (referred to as L1). This could, for example, be a radiance measurement by an in-situ instrument.

First, we define our measurement function. For use in punpy, this measurement function needs to be written as a Python function that takes the input quantities (for which uncertainties are available) as arguments and returns the measurand (to which we want to propagate the uncertainties).

In [None]:
# your measurement function
def calibrate(L0,gains):
   return L0*gains

Here, the measurement function is a very simple analytical function. However, in practice, the measurement function can contain as much complexity as needed (including calls to other packages or external software). punpy treats the measurement function to some extent as a black box, as long as the input quantities and measurand are structured as expected.

Next, we define some example input data. For your own use case, you would need to have this information available from other sources (i.e. the uncertainties on your inputs need to be understood prior to using punpy).

In [None]:
# your data
wavs = np.array([350,450,550,650,750])
L0 = np.array([0.43,0.8,0.7,0.65,0.9])
gains = np.array([23,26,28,29,31])

# your uncertainties
L0_ur = L0*0.05                             # 5% random uncertainty
L0_us = np.ones(5)*0.03                     # systematic uncertainty of 0.03
                                            # (common between bands)
gains_ur = np.array([0.5,0.7,0.6,0.4,0.1])  # random uncertainty
gains_us = np.array([0.1,0.2,0.1,0.4,0.3])  # systematic uncertainty
                                            # (different for each band but fully correlated)


## *Step 3* - Propagate the random and systematic uncertainties separately 

After defining the data, the resulting uncertainty budget can be calculated with punpy using the Monte Carlo (MC) method. First, we separately propagate the random and systematic uncertainties, and then combine the resulting L1 uncertainties. The error correlations are also combined, using the punpy helper functions `convert_corr_to_cov` and `correlation_from_covariance`.
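To make the idea behind the MC method concrete, here is a minimal NumPy-only sketch of what propagating random and systematic uncertainties amounts to conceptually. It is not punpy's implementation: the helper `mc_samples`, the 100000 samples and the fixed seed are illustrative assumptions, and the example inputs are re-created so the block runs on its own. Random errors get an independent draw for every band, while systematic errors get one draw per MC sample that is shared by all bands. The standard deviation of the resulting L1 samples is then compared with the first-order analytical uncertainty of a product.

```python
import numpy as np

def calibrate(L0, gains):
    # same measurement function as defined above
    return L0 * gains

# example inputs, copied from the data cell above
L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
L0_ur = L0 * 0.05
L0_us = np.ones(5) * 0.03
gains_ur = np.array([0.5, 0.7, 0.6, 0.4, 0.1])
gains_us = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

def mc_samples(x, u_x, n, rng, systematic=False):
    """Hypothetical helper: perturb each input with Gaussian noise of std u.

    Random errors: an independent draw for every element (shape (n, size)).
    Systematic errors: one draw per MC sample shared by all elements (shape (n, 1)).
    """
    samples = []
    for xi, ui in zip(x, u_x):
        z = rng.standard_normal((n, 1) if systematic else (n, xi.size))
        samples.append(xi + ui * z)
    return samples

rng = np.random.default_rng(42)
n_mc = 100_000

y_rand = calibrate(*mc_samples([L0, gains], [L0_ur, gains_ur], n_mc, rng))
y_syst = calibrate(*mc_samples([L0, gains], [L0_us, gains_us], n_mc, rng, systematic=True))

# MC uncertainty = standard deviation over the MC samples (axis 0)
u_rand_mc = y_rand.std(axis=0)
u_syst_mc = y_syst.std(axis=0)

# first-order analytical uncertainty of a product of independent inputs
u_rand_lpu = np.sqrt((gains * L0_ur) ** 2 + (L0 * gains_ur) ** 2)
u_syst_lpu = np.sqrt((gains * L0_us) ** 2 + (L0 * gains_us) ** 2)

# error correlation between bands of the output samples
corr_rand = np.corrcoef(y_rand, rowvar=False)  # close to the identity matrix
corr_syst = np.corrcoef(y_syst, rowvar=False)  # strongly, but not perfectly, correlated

print(np.round(u_rand_mc, 3), np.round(u_rand_lpu, 3))
print(np.round(u_syst_mc, 3), np.round(u_syst_lpu, 3))
```

Note that the per-band uncertainties follow the same formula in both cases; what distinguishes random from systematic errors is the error correlation between bands.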

In [None]:
# initialise a punpy MCPropagation object with 10000 MC samples
prop=punpy.MCPropagation(10000)

# apply the measurement function to calculate the measurand from the input quantities
L1=calibrate(L0,gains)

#propagate random uncertainties
L1_ur=prop.propagate_random(calibrate,[L0,gains],
      [L0_ur,gains_ur])

#propagate systematic uncertainties
L1_us=prop.propagate_systematic(calibrate,[L0,gains],
      [L0_us,gains_us])

#combine random and systematic uncertainties 
L1_ut=(L1_ur**2+L1_us**2)**0.5

#calculate random and systematic error correlation matrices (this is done by first combining covariances)
L1_cov=(punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)                   # random uncertainties have an identity matrix as the error correlation 
        + punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us))  # systematic uncertainties have a matrix full of ones as the error correlation
L1_corr=punpy.correlation_from_covariance(L1_cov)

#print the results
print("L1:    ",L1)
print("L1_ur: ",L1_ur)
print("L1_us: ",L1_us)
print("L1_ut: ",L1_ut)
print("L1_cov:\n",L1_cov)
print("L1_corr:\n",L1_corr)
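The two punpy helpers used above convert between error-correlation and covariance matrices. Assuming they implement the standard relations cov_ij = u_i corr_ij u_j and corr_ij = cov_ij / (u_i u_j), the same bookkeeping can be sketched in plain NumPy; `corr_to_cov` and `cov_to_corr` are hypothetical names, not punpy functions. The sketch combines a random (identity) and a systematic (all-ones) component, here applied to the L0 uncertainties.

```python
import numpy as np

def corr_to_cov(corr, u):
    """Covariance from a correlation matrix and standard uncertainties: cov_ij = u_i * corr_ij * u_j."""
    return u[:, None] * corr * u[None, :]

def cov_to_corr(cov):
    """Correlation from a covariance matrix: corr_ij = cov_ij / (u_i * u_j)."""
    u = np.sqrt(np.diag(cov))
    return cov / np.outer(u, u)

# L0 uncertainties, copied from the data cell above
L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
L0_ur = L0 * 0.05
L0_us = np.ones(5) * 0.03

n = len(L0)
cov_rand = corr_to_cov(np.eye(n), L0_ur)        # random: identity correlation
cov_syst = corr_to_cov(np.ones((n, n)), L0_us)  # systematic: all-ones correlation
cov_tot = cov_rand + cov_syst                   # independent error effects add in covariance
corr_tot = cov_to_corr(cov_tot)

u_tot = np.sqrt(np.diag(cov_tot))
print(u_tot)                  # equals sqrt(L0_ur**2 + L0_us**2)
print(np.round(corr_tot, 3))  # off-diagonal entries equal us_i * us_j / (ut_i * ut_j)
```

Adding covariances (rather than correlation matrices) is what makes this combination valid, because the random and systematic error effects are independent of each other.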

We then define a plotting function to inspect the results:

In [None]:
#define a plotting function to show the results (uses the global wavelength array wavs)
def make_plots_L1(L1,L1_ur=None,L1_us=None,L1_ut=None,L1_corr=None):
  # add a second panel for the error correlation matrix only if one is provided
  if L1_corr is not None:
    fig,(ax1,ax2) = plt.subplots(1,2,figsize=(10,5))
  else:
    fig,ax1 = plt.subplots(1,figsize=(5,5))

  ax1.plot(wavs,L1,"o")
  if L1_ur is not None:
    ax1.errorbar(wavs,L1,yerr=L1_ur,label="random uncertainty",capsize=5)
  if L1_us is not None:
    ax1.errorbar(wavs,L1,yerr=L1_us,label="systematic uncertainty",capsize=5)
  if L1_ut is not None:
    ax1.errorbar(wavs,L1,yerr=L1_ut,label="total uncertainty",capsize=5)
  ax1.legend()
  ax1.set_xlabel("wavelength (nm)")
  ax1.set_ylabel("radiance")
  ax1.set_title("L1 uncertainties")
  if L1_corr is not None:
    ax2.set_title("L1 correlation")
    corr_plot=ax2.imshow(L1_corr)
    plt.colorbar(corr_plot,ax=ax2)
  plt.show()

and make the plots for the L1 data:

In [None]:
make_plots_L1(L1,L1_ur,L1_us,L1_ut,L1_corr) # make and display plot

## *Step 4* - Propagate uncertainties with an error correlation matrix 

Instead of separately propagating the random and systematic uncertainties, we can also achieve the same result by first combining the random and systematic uncertainties on the inputs, and then propagating the total uncertainties together with their error correlation. In this case, the error correlation needs to be passed explicitly to the `propagate_standard` function.

In [None]:
# first combine the random and systematic uncertainties on the inputs
L0_ut=(L0_ur**2+L0_us**2)**0.5
gains_tot=(gains_ur**2+gains_us**2)**0.5

#combine the error correlation matrices on the inputs (by combining the error covariances)
L0_cov=punpy.convert_corr_to_cov(np.eye(len(L0_ur)),L0_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L0_us),len(L0_us))),L0_us)
L0_corr=punpy.correlation_from_covariance(L0_cov)

gains_cov=punpy.convert_corr_to_cov(np.eye(len(gains_ur)),gains_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(gains_us),len(gains_us))),gains_us)
gains_corr=punpy.correlation_from_covariance(gains_cov)

#propagate the combined uncertainties and error correlation
L1_ut, L1_corr=prop.propagate_standard(calibrate,[L0,gains],
      [L0_ut,gains_tot],[L0_corr,gains_corr], return_corr=True)

#print results
print("L1:    ",L1)
print("L1_ut: ",L1_ut)
print("L1_corr:\n",L1_corr)
make_plots_L1(L1,L1_ut=L1_ut,L1_corr=L1_corr)
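As an independent cross-check of this combined-covariance route, the sketch below computes the L1 uncertainty and error correlation in two ways without punpy: with the first-order law of propagation of uncertainty (cov_y = J cov_x J^T, using the diagonal Jacobians of the element-wise product), and with an MC run that draws correlated input samples directly from the full covariance matrices. Like the notebook, it assumes that errors in L0 and gains are independent of each other; the helper `total_cov`, the seed and the sample size are illustrative choices.

```python
import numpy as np

# inputs, copied from the data cell above
L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
L0_ur, L0_us = L0 * 0.05, np.ones(5) * 0.03
gains_ur = np.array([0.5, 0.7, 0.6, 0.4, 0.1])
gains_us = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

def total_cov(u_rand, u_syst):
    """Hypothetical helper: random part (diagonal) plus fully correlated systematic part."""
    return np.diag(u_rand**2) + np.outer(u_syst, u_syst)

cov_L0 = total_cov(L0_ur, L0_us)
cov_gains = total_cov(gains_ur, gains_us)

# first-order law of propagation of uncertainty for y = L0 * gains (element-wise)
J_L0 = np.diag(gains)   # dy_i / dL0_i
J_gains = np.diag(L0)   # dy_i / dgains_i
cov_y = J_L0 @ cov_L0 @ J_L0.T + J_gains @ cov_gains @ J_gains.T
u_y = np.sqrt(np.diag(cov_y))
corr_y = cov_y / np.outer(u_y, u_y)

# MC check: draw correlated input samples from the full covariance matrices
rng = np.random.default_rng(0)
n_mc = 200_000
L0_samples = rng.multivariate_normal(L0, cov_L0, size=n_mc)
gains_samples = rng.multivariate_normal(gains, cov_gains, size=n_mc)
y_samples = L0_samples * gains_samples
u_y_mc = y_samples.std(axis=0)
corr_y_mc = np.corrcoef(y_samples, rowvar=False)

print(np.round(u_y, 3), np.round(u_y_mc, 3))
print(np.round(corr_y, 3))
print(np.round(corr_y_mc, 3))
```

Since a sum of independent Gaussian random and systematic components is itself Gaussian with the summed covariance, sampling from the combined covariance is equivalent in distribution to drawing the two components separately.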

## **Exercise**

As an exercise, let's add an additional variable to the measurement function and propagate the uncertainties through it. Starting from the above example of calibrating an in-situ instrument by applying gains to the digital numbers, let's now add some dark measurements, which are subtracted from the digital numbers.

The updated measurement function and the additional input data are provided below:

In [None]:
# updated measurement function
def calibrate(L0,gains,dark):
   return (L0-dark)*gains

# additional input quantity
dark = np.array([0.05,0.03,0.04,0.05,0.06])
dark_ur = np.array([0.02,0.02,0.02,0.02,0.02])  # random uncertainty

Now try to propagate the uncertainties through this measurement function yourself, by adapting the examples above. (Note that there are no systematic uncertainties on the darks here, so set these to zero if you need them.)

In [None]:
# Enter your code here

## *Step 5* - Error correlation between variables

In addition to having a correlation along one or more dimensions of a given variable, two variables can also be correlated with each other (for example, because they are measured using the same sensor). This can be specified in punpy using the `corr_between` keyword. In the example below, the systematic errors in the darks and the L0 data are fully correlated.

In [None]:
# We here define some systematic uncertainties for the darks, which are the same as for the digital numbers
dark_us = L0_us

# We then define how the errors for the different variables are correlated
corr_var=np.array([[1,0,1],   # here a 1 means the variables are fully correlated, and a 0 means uncorrelated
                   [0,1,0],   # on the diagonal there are 1's because each variable is fully correlated with itself
                   [1,0,1]])  # there are also 1's on the (0,2) and (2,0) locations, indicating the 1st and last variable (i.e. L0 and dark) are correlated 

# We then recalculate the measurand with the updated measurement function, propagate the uncertainties and make a plot
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,dark_us],corr_between=corr_var)

L1_ut=(L1_ur**2+L1_us**2)**0.5

L1_cov=punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us)
L1_corr=punpy.correlation_from_covariance(L1_cov)

make_plots_L1(L1,L1_ur,L1_us,L1_ut,L1_corr)
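To see why `corr_between` matters here, the NumPy-only sketch below mimics the systematic part of this propagation by hand. It is a conceptual illustration, not punpy's sampler; `calibrate_with_dark`, the seed and the sample size are illustrative. When L0 and the dark share the same systematic draw and have the same systematic uncertainty, their errors cancel in `L0 - dark`, so only the gains contribute to the systematic uncertainty on L1. With independent draws, no such cancellation occurs.

```python
import numpy as np

def calibrate_with_dark(L0, gains, dark):
    # same measurement function as in the exercise, under a different name
    return (L0 - dark) * gains

L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
L0_us = np.ones(5) * 0.03
gains_us = np.array([0.1, 0.2, 0.1, 0.4, 0.3])
dark_us = L0_us  # same systematic uncertainty as the digital numbers

rng = np.random.default_rng(1)
n_mc = 100_000
z = rng.standard_normal((n_mc, 3))  # one systematic draw per variable per MC sample

# fully correlated (like corr_var above): L0 and dark share the draw z[:, 0]
y_corr = calibrate_with_dark(L0 + L0_us * z[:, 0:1],
                             gains + gains_us * z[:, 1:2],
                             dark + dark_us * z[:, 0:1])
# uncorrelated: the dark gets its own independent draw z[:, 2]
y_uncorr = calibrate_with_dark(L0 + L0_us * z[:, 0:1],
                               gains + gains_us * z[:, 1:2],
                               dark + dark_us * z[:, 2:3])

u_corr = y_corr.std(axis=0)
u_uncorr = y_uncorr.std(axis=0)
print(np.round(u_corr, 4))    # only the gains term remains: about (L0 - dark) * gains_us
print(np.round(u_uncorr, 4))  # larger, since the L0 and dark errors no longer cancel
```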

## *Step 6* - Punpy keywords

There are many keywords that can be passed to the punpy functions to control their detailed behaviour. For a full description, we refer to the [punpy documentation](https://punpy.readthedocs.io/en/latest/content/generated/punpy.mc.mc_propagation.MCPropagation.propagate_standard.html). We would like to highlight two features here. The first is the ability to return the MC samples that were used, for manual inspection:

In [None]:
L1_ut, L1_corr, MCsamples_L1, MCsamples_L0=prop.propagate_standard(calibrate,[L0,gains,dark],
      [L0_ut,gains_tot,dark_ur],[L0_corr,gains_corr,"rand"], return_corr=True, return_samples=True)  #the return_samples keyword is set to True
print(MCsamples_L0,MCsamples_L1)

In these samples, we can see that there are some negative values in the dark samples. Depending on the use case, this might be considered unphysical. 
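How often negative dark values appear can be estimated beforehand: for a Gaussian with mean mu and standard deviation sigma, the fraction of samples below zero is Phi(-mu/sigma), with Phi the standard normal cumulative distribution function. The sketch below checks this with NumPy by drawing dark samples directly, rather than relying on the exact layout of the sample arrays returned by punpy; the seed and sample size are illustrative.

```python
import math
import numpy as np

def std_normal_cdf(t):
    """Cumulative distribution function of the standard normal, via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
dark_ur = np.array([0.02, 0.02, 0.02, 0.02, 0.02])

rng = np.random.default_rng(2)
dark_samples = dark + dark_ur * rng.standard_normal((100_000, dark.size))

# observed fraction of negative samples per band
frac_negative = (dark_samples < 0).mean(axis=0)
# expected fraction for a Gaussian PDF: P(X < 0) = Phi(-mu / sigma)
frac_expected = np.array([std_normal_cdf(-m / s) for m, s in zip(dark, dark_ur)])

print(np.round(frac_negative, 4))
print(np.round(frac_expected, 4))
```

The band with the smallest dark value relative to its uncertainty (0.03 with an uncertainty of 0.02) has the largest share of negative samples.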

The second is the ability to use probability density functions (PDFs) other than the default Gaussian. For example, it is possible to set a lower boundary on the values in the MC samples of the inputs, and thus avoid these negative values:

In [None]:
L1_ut, MCsamples_L1, MCsamples_L0 = prop.propagate_standard(calibrate,[L0,gains,dark],
      [L0_ut,gains_tot,dark_ur],[L0_corr,gains_corr,"rand"], return_corr=False, return_samples=True, pdf_shape="truncated_gaussian", pdf_params={"min":0.})  #the pdf shape is set to truncated gaussian; pdf_params is a dictionary that allows setting the minimum and maximum values
print(MCsamples_L0,MCsamples_L1)

There are now no negative values. (Note that truncating the PDF does reduce the uncertainties somewhat, so use this with caution!)
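The reduction in uncertainty can be quantified with a small NumPy-only sketch. It draws truncated Gaussian samples of the dark by rejection sampling (redrawing any value below the lower bound until none remain); this is one standard way of sampling a truncated Gaussian and not necessarily how punpy does it. The sample standard deviation is compared with the analytical standard deviation of a lower-truncated Gaussian, sigma * sqrt(1 + alpha*lambda - lambda^2), where alpha = (lower - mu)/sigma and lambda = phi(alpha) / (1 - Phi(alpha)).

```python
import math
import numpy as np

def truncated_gaussian_samples(mu, sigma, n, rng, lower=0.0):
    """Rejection sampling: redraw every value below `lower` until none remain."""
    samples = mu + sigma * rng.standard_normal((n, mu.size))
    bad = samples < lower
    while bad.any():
        redraw = mu + sigma * rng.standard_normal((n, mu.size))
        samples[bad] = redraw[bad]
        bad = samples < lower
    return samples

def truncated_gaussian_std(mu, sigma, lower=0.0):
    """Analytical standard deviation of a Gaussian truncated from below at `lower`."""
    alpha = (lower - mu) / sigma
    pdf = math.exp(-0.5 * alpha**2) / math.sqrt(2.0 * math.pi)
    tail = 1.0 - 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))  # P(X > lower)
    lam = pdf / tail
    return sigma * math.sqrt(1.0 + alpha * lam - lam**2)

dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
dark_ur = np.array([0.02, 0.02, 0.02, 0.02, 0.02])

rng = np.random.default_rng(3)
samples = truncated_gaussian_samples(dark, dark_ur, 100_000, rng)

u_trunc_mc = samples.std(axis=0)
u_trunc_expected = np.array([truncated_gaussian_std(m, s) for m, s in zip(dark, dark_ur)])

print(samples.min())                  # no negative values remain
print(np.round(u_trunc_mc, 4))        # compare with the untruncated standard deviation of 0.02
print(np.round(u_trunc_expected, 4))
```

The reduction is largest for the band whose dark value lies closest to the truncation point (in units of its uncertainty); truncation also shifts the mean of the samples upwards.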

# **Link to Next Exercise**
We have now finished running through the first exercise, covering the basic functionality of punpy.

In [Exercise 2](https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/LPS_training_exercise2.ipynb), the [**obsarray**](https://obsarray.readthedocs.io/en/latest/) and [**punpy**](https://punpy.readthedocs.io/en/latest/) functionality for dealing with multi-dimensional datasets will be showcased.