<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/vhroda_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**LPS hands-on training session: CoMet Toolkit: Uncertainties made easy**

Within this training session, we will run through 

intro

We first install the obsarray package (flag handling and accessing uncertainties), the punpy package (uncertainty propagation) and the matheo package (for band integration).

In [None]:
!pip install numpy==2.2.4
!pip install "obsarray>=1.0.1"
!pip install "punpy>=1.0.4"
!pip install matheo

We also import all the relevant Python packages used in this training:

In [None]:
import xarray as xr
import numpy as np
import punpy  # needed for the punpy.MCPropagation and punpy.convert_corr_to_cov calls below
from punpy import MeasurementFunction, MCPropagation
from matheo.band_integration import band_integration
from obsarray.templater.dataset_util import DatasetUtil
import matplotlib.pyplot as plt
import os

Next, we clone the comet_training repository, which contains all the datasets used in this training. Some instructions on downloading LANDHYPERNET and RadCalNet data will be provided at the end of this notebook.   

In [None]:
!git clone https://github.com/comet-toolkit/comet_training.git

**Exercise 1: simple sensor calibration example**

Explain purpose of exercise

In [None]:
# your measurement function
def calibrate(L0,gains,dark):
   return (L0-dark)*gains

# your data
wavs = np.array([350,450,550,650,750])
L0 = np.array([0.43,0.8,0.7,0.65,0.9])
dark = np.array([0.05,0.03,0.04,0.05,0.06])
gains = np.array([23,26,28,29,31])

# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
L0_us = np.ones(5)*0.03  # systematic uncertainty of 0.03
                         # (common between bands)
gains_ur = np.array([0.5,0.7,0.6,0.4,0.1])  # random uncertainty
gains_us = np.array([0.1,0.2,0.1,0.4,0.3])  # systematic uncertainty
# (different for each band but fully correlated)
dark_ur = np.array([0.01,0.002,0.006,0.002,0.015])  # random uncertainty

Here the measurement function is a very simple analytical expression, but it could be replaced by a complex processing chain without any change to the way punpy is used.
With the data defined, the resulting uncertainty budget can be calculated with punpy's Monte Carlo (MC) propagation:

In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,np.zeros(5)])
L1_ut=(L1_ur**2+L1_us**2)**0.5
L1_cov=punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us)
L1_corr=punpy.correlation_from_covariance(L1_cov)
print("L1:    ",L1)
print("L1_ur: ",L1_ur)
print("L1_us: ",L1_us)
print("L1_ut: ",L1_ut)
print("L1_cov:\n",L1_cov)
print("L1_corr:\n",L1_corr)
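As a sanity check on the random component, the MC result can be compared with a first-order analytical estimate. The sketch below uses plain NumPy only; it is not how punpy is implemented, and the names `L1_ur_analytic`, `L1_ur_mc`, `rng` and `n_samples` are introduced here purely for illustration. It assumes the errors on L0, gains and dark are independent between bands and between variables.

```python
import numpy as np

# Same illustrative values as in the exercise
L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
gains = np.array([23, 26, 28, 29, 31])
L0_ur = L0 * 0.05
gains_ur = np.array([0.5, 0.7, 0.6, 0.4, 0.1])
dark_ur = np.array([0.01, 0.002, 0.006, 0.002, 0.015])


def calibrate(L0, gains, dark):
    return (L0 - dark) * gains


# First-order estimate: sensitivity coefficients of L1 = (L0 - dark) * gains
c_L0 = gains         # dL1/dL0
c_gains = L0 - dark  # dL1/dgains
c_dark = -gains      # dL1/ddark
L1_ur_analytic = np.sqrt(
    (c_L0 * L0_ur) ** 2 + (c_gains * gains_ur) ** 2 + (c_dark * dark_ur) ** 2
)

# Plain-NumPy Monte Carlo estimate: perturb each input independently,
# run the measurement function on every sample, take the standard deviation
rng = np.random.default_rng(0)
n_samples = 100_000
L0_s = L0 + rng.standard_normal((n_samples, 5)) * L0_ur
gains_s = gains + rng.standard_normal((n_samples, 5)) * gains_ur
dark_s = dark + rng.standard_normal((n_samples, 5)) * dark_ur
L1_ur_mc = calibrate(L0_s, gains_s, dark_s).std(axis=0)

print("analytic:", L1_ur_analytic)
print("MC:      ", L1_ur_mc)
```

The two estimates should agree to within the MC sampling noise, because the measurement function is close to linear over the range of the perturbations.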

We then define some plots to inspect the results:

In [None]:
def make_plots_L1(L1,L1_ur=None,L1_us=None,L1_ut=None,L1_corr=None):
  # add a second panel only when a correlation matrix is provided
  if L1_corr is not None:
    fig,(ax1,ax2) = plt.subplots(1,2,figsize=(10,5))
  else:
    fig,ax1 = plt.subplots(1,figsize=(5,5))

  ax1.plot(wavs,L1,"o")
  if L1_ur is not None:
    ax1.errorbar(wavs,L1,yerr=L1_ur,label="random uncertainty",capsize=5)
  if L1_us is not None:
    ax1.errorbar(wavs,L1,yerr=L1_us,label="systematic uncertainty",capsize=5)
  if L1_ut is not None:
    ax1.errorbar(wavs,L1,yerr=L1_ut,label="total uncertainty",capsize=5)
  ax1.legend()
  ax1.set_xlabel("wavelength (nm)")
  ax1.set_ylabel("radiance")
  ax1.set_title("L1 uncertainties")
  if L1_corr is not None:
    ax2.set_title("L1 correlation")
    corr_plot=ax2.imshow(L1_corr)
    plt.colorbar(corr_plot,ax=ax2)
  plt.show()

and make the plots for the L1 data:

In [None]:
make_plots_L1(L1,L1_ur,L1_us,L1_ut,L1_corr)

*Correlated errors*

In addition to propagating random (uncorrelated) and systematic (fully correlated) uncertainties, it is also possible to propagate uncertainties associated with structured errors. If we know the covariance matrix for each of the input quantities, it is straightforward to propagate these. In the example below we assume the L0 data and dark data to be uncorrelated (their covariance matrices are diagonal) and the gains to have a custom covariance matrix:

In [None]:
# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
dark_ur = np.array([0.01,0.002,0.006,0.002,0.015])  # random uncertainty

L0_cov=punpy.convert_corr_to_cov(np.eye(len(L0_ur)),L0_ur)
dark_cov=punpy.convert_corr_to_cov(np.eye(len(dark_ur)),dark_ur )
gains_cov= np.array([[0.45,0.35,0.30,0.20,0.05],
                    [0.35,0.57,0.32,0.30,0.07],
                    [0.30,0.32,0.56,0.24,0.06],
                    [0.20,0.30,0.24,0.44,0.04],
                    [0.05,0.07,0.06,0.04,0.21]])
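A covariance matrix combines two pieces of information: the standard uncertainties (square roots of the diagonal) and the error correlation matrix. The plain-NumPy sketch below shows the standard relation between them for the gains covariance above. The names `gains_u`, `gains_corr_from_cov` and `gains_cov_rebuilt` are illustrative and not part of punpy.

```python
import numpy as np

gains_cov = np.array([[0.45, 0.35, 0.30, 0.20, 0.05],
                      [0.35, 0.57, 0.32, 0.30, 0.07],
                      [0.30, 0.32, 0.56, 0.24, 0.06],
                      [0.20, 0.30, 0.24, 0.44, 0.04],
                      [0.05, 0.07, 0.06, 0.04, 0.21]])

# Standard uncertainty per band: square root of the diagonal
gains_u = np.sqrt(np.diag(gains_cov))

# Correlation matrix: corr_ij = cov_ij / (u_i * u_j)
gains_corr_from_cov = gains_cov / np.outer(gains_u, gains_u)

# And back again: cov_ij = corr_ij * u_i * u_j
gains_cov_rebuilt = gains_corr_from_cov * np.outer(gains_u, gains_u)

print("uncertainties:", gains_u)
print("correlation:\n", gains_corr_from_cov)
print("symmetric:", np.allclose(gains_cov, gains_cov.T))
```

For the diagonal L0 and dark covariance matrices the correlation matrix is simply the identity, which is why `np.eye` is passed in the cell above.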


In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ut,L1_corr=prop.propagate_cov(calibrate,[L0,gains,dark],
                                 [L0_cov,gains_cov,dark_cov],return_corr=True)

make_plots_L1(L1,L1_ut=L1_ut,L1_corr=L1_corr)

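In addition to having a correlation along one or more dimensions of a given variable, it is also possible for two different variables to be correlated with each other. This can be specified in punpy by using the corr_between keyword, which takes a correlation matrix whose rows and columns follow the order of the input quantities (here L0, gains, dark). In the example below, the systematic errors in the dark and L0 data are fully correlated.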

In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)

corr_var=np.array([[1,0,1],[0,1,0],[1,0,1]])

L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,L0_us],corr_between=corr_var)
L1_ut=(L1_ur**2+L1_us**2)**0.5
L1_cov=punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)+\
       punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us)
L1_corr=punpy.correlation_from_covariance(L1_cov)
make_plots_L1(L1,L1_ur,L1_us,L1_ut,L1_corr)
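To see why the correlation between L0 and dark matters, consider the first-order uncertainty of the difference L0 - dark: its variance is u(L0)^2 + u(dark)^2 - 2 r u(L0) u(dark), where r is the correlation coefficient between the two errors. The sketch below evaluates this analytically with NumPy for the values used above. The helper `first_order_u` is hypothetical and written only for this illustration; it is a first-order approximation and not punpy's MC method.

```python
import numpy as np

L0 = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
dark = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
gains = np.array([23, 26, 28, 29, 31])

L0_us = np.full(5, 0.03)    # systematic uncertainty on L0
dark_us = np.full(5, 0.03)  # same value used for dark in the cell above
gains_us = np.array([0.1, 0.2, 0.1, 0.4, 0.3])


def first_order_u(r):
    """First-order uncertainty of (L0 - dark) * gains.

    r is the correlation coefficient between the L0 and dark errors;
    the gains errors are assumed independent of both.
    """
    var_diff = L0_us ** 2 + dark_us ** 2 - 2.0 * r * L0_us * dark_us
    return np.sqrt(gains ** 2 * var_diff + (L0 - dark) ** 2 * gains_us ** 2)


u_independent = first_order_u(0.0)
u_fully_correlated = first_order_u(1.0)

print("r = 0:", u_independent)
print("r = 1:", u_fully_correlated)
```

With r = 1 and equal uncertainties, the common error cancels in the subtraction and only the gains term remains, so the systematic uncertainty on L1 is expected to be much smaller than in the uncorrelated case.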

**punpy for data with more dimensions**

In reality, propagation of uncertainty in EO is applied to larger datasets with higher dimensionality. Instead of the above 5 data points, we might have 5 wavelengths with a 100 by 50 pixel image for each of these wavelengths. These can of course also be handled by punpy.

In [None]:
# your data
wavs = np.array([350,450,550,650,750])

L0 = np.tile([0.43,0.8,0.7,0.65,0.9],(50,100,1)).T
L0 = L0 + np.random.normal(0.0,0.05,L0.shape)

dark = np.tile([0.05,0.03,0.04,0.05,0.06],(50,100,1)).T
gains = np.tile([23,26,28,29,31],(50,100,1)).T

# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
L0_us = np.ones((5,100,50))*0.03  # systematic uncertainty of 0.03
                         # (common between bands)

gains_ur = np.tile(np.array([0.5,0.7,0.6,0.4,0.1]),(50,100,1)).T  # random uncertainty
gains_us = np.tile(np.array([0.1,0.2,0.1,0.4,0.3]),(50,100,1)).T  # systematic uncertainty
# (different for each band but fully correlated)
dark_ur = np.tile(np.array([0.01,0.002,0.006,0.002,0.015]),(50,100,1)).T  # random uncertainty
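The `np.tile(...).T` pattern used above can be opaque at first sight. The short sketch below checks the resulting array shape and shows an equivalent broadcasting form; the variable names `tiled`, `cube` and `cube_alt` are illustrative only.

```python
import numpy as np

band_values = np.array([0.43, 0.8, 0.7, 0.65, 0.9])

# np.tile repeats the 5 band values along two new leading axes ...
tiled = np.tile(band_values, (50, 100, 1))
print(tiled.shape)  # (50, 100, 5)

# ... and .T reverses the axis order, putting wavelength first
cube = tiled.T
print(cube.shape)   # (5, 100, 50)

# Equivalent construction using broadcasting
cube_alt = np.broadcast_to(band_values[:, None, None], (5, 100, 50))

# Every pixel holds the same spectrum
print(cube[:, 0, 0])
```

The first axis therefore indexes wavelength and the remaining two axes index the image pixels, which matches the shape of `np.ones((5,100,50))` used for `L0_us`.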

In [None]:
prop=punpy.MCPropagation(1000)
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur],repeat_dims=[1])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,None],repeat_dims=[1])
L1_ut=(L1_ur**2+L1_us**2)**0.5


We then define a new function to plot images of the relative uncertainties in each band:

In [None]:
def make_plots_L1_image(wavs,L1,L1_u,c_range=(0,0.1)):
  # one panel per band, showing the relative uncertainty L1_u/L1
  fig, axs = plt.subplots(1,len(wavs),figsize=(20,5))

  for i,ax in enumerate(axs):
    ax.set_xlabel("x_pix")
    ax.set_ylabel("y_pix")
    ax.set_title("%s nm rel uncertainties"%(wavs[i]))
    im_plot=ax.imshow(L1_u[i]/L1[i],vmin=c_range[0],vmax=c_range[1])

  # single colorbar shared by all panels (same colour range in each)
  plt.colorbar(im_plot,ax=axs)
  plt.show()

In [None]:
make_plots_L1_image(wavs,L1,L1_ur)
make_plots_L1_image(wavs,L1,L1_us)

For multidimensional input quantities, it is often the case that a certain correlation structure is known along one of the dimensions, and that the other dimensions are either completely independent (random) or fully correlated (systematic). In the example below, we know the correlation structure of the systematic uncertainties on the gains with respect to wavelength, and consider each of the measurements to be fully correlated with respect to the spatial dimensions.

In [None]:
gains_corr=np.array([[1.,0.14123392,0.12198785,0.07234254,0.01968095],
 [0.14123392,1.,0.1350783,0.12524757,0.0095603 ],
 [0.12198785,0.1350783,1.,0.1041107,0.02890266],
 [0.07234254,0.12524757,0.1041107,1.,0.01041678],
 [0.01968095,0.0095603,0.02890266,0.01041678,1.]])

L1_us,L1_us_corr=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [None,gains_us,None],repeat_dims=[1,2],corr_x=[None,gains_corr,None],return_corr=True)

make_plots_L1_image(wavs,L1,L1_us)
make_plots_L1(np.mean(L1,axis=(1,2)),L1_us=np.mean(L1_us,axis=(1,2)),L1_corr=L1_us_corr)
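One common way to draw MC samples with a prescribed correlation structure is to use a Cholesky decomposition of the correlation matrix. The plain-NumPy sketch below illustrates the idea for the gains: errors are correlated between wavelengths according to `gains_corr`, and the same error is applied to every pixel, which makes them fully correlated spatially. This is an illustration of the concept under these assumptions, not punpy's internal implementation; the small image size and the names `chol`, `band_errors` and `gains_samples` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 20000
ny, nx = 4, 3  # small hypothetical image to keep memory use low

gains_band = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
gains_us_band = np.array([0.1, 0.2, 0.1, 0.4, 0.3])
gains_corr = np.array([[1., 0.14123392, 0.12198785, 0.07234254, 0.01968095],
                       [0.14123392, 1., 0.1350783, 0.12524757, 0.0095603],
                       [0.12198785, 0.1350783, 1., 0.1041107, 0.02890266],
                       [0.07234254, 0.12524757, 0.1041107, 1., 0.01041678],
                       [0.01968095, 0.0095603, 0.02890266, 0.01041678, 1.]])

# Lower-triangular factor with chol @ chol.T == gains_corr
chol = np.linalg.cholesky(gains_corr)

# Unit-variance errors whose band-to-band correlation follows gains_corr
z = rng.standard_normal((n_samples, 5)) @ chol.T
band_errors = z * gains_us_band  # scale by the systematic uncertainty per band

# Apply the same per-band error to every pixel (fully correlated spatially)
gains_samples = (gains_band[None, :, None, None]
                 + band_errors[:, :, None, None] * np.ones((1, 1, ny, nx)))

sample_corr = np.corrcoef(band_errors, rowvar=False)
pixel_corr = np.corrcoef(gains_samples[:, 0, 0, 0], gains_samples[:, 0, 3, 2])[0, 1]

print("sample correlation between bands:\n", sample_corr)
print("correlation between two pixels in one band:", pixel_corr)
```

The sample correlation between bands should approach `gains_corr` as the number of samples grows, while any two pixels within a band remain perfectly correlated.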

summary statement