<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/punpy_standalone_example_MCdetail.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Examples of how to use the punpy package with the MC method
===========================================================

1D input quantities and measurand
----------------------------------
Imagine you are trying to calibrate some L0 data to L1 and you have:

-  A measurement function that uses L0 data, gains, and a dark signal in 5 wavelength bands;
-  Random and systematic uncertainties on the L0 data;
-  Random and systematic uncertainties on the gains;
-  Random uncertainties on the dark signal.
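For reference, and not part of the punpy workflow: write the measurement function in each band as $L_1 = (L_0 - D)\,G$, with dark signal $D$ and gain $G$. Assuming the inputs within a band are uncorrelated, the first-order law of propagation of uncertainty gives the standard uncertainty below. It uses $\partial L_1/\partial L_0 = G$, $\partial L_1/\partial D = -G$ and $\partial L_1/\partial G = L_0 - D$. The MC method does not linearise the function, but for this nearly linear case the two approaches should agree closely.

$$
u^2(L_1) = G^2\,u^2(L_0) + G^2\,u^2(D) + (L_0 - D)^2\,u^2(G)
$$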

After defining the data, the resulting uncertainty budget can be calculated with punpy using the MC method as follows:

In [None]:
!pip install "punpy>=0.43.1"

In [None]:
import punpy
import numpy as np

# your measurement function
def calibrate(L0,gains,dark):
    return (L0-dark)*gains

# your data
L0 = np.array([0.43,0.8,0.7,0.65,0.9])
dark = np.array([0.05,0.03,0.04,0.05,0.06])
gains = np.array([23,26,28,29,31])

# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
L0_us = np.ones(5)*0.03  # systematic uncertainty of 0.03 
                           # (common between bands)
gains_ur = np.array([0.5,0.7,0.6,0.4,0.1])  # random uncertainty
gains_us = np.array([0.1,0.2,0.1,0.4,0.3])  # systematic uncertainty 
# (different for each band but fully correlated)
dark_ur = np.array([0.01,0.002,0.006,0.002,0.015])  # random uncertainty

prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,np.zeros(5)])
L1_ut=(L1_ur**2+L1_us**2)**0.5
# random part: uncorrelated between bands; systematic part: fully correlated
L1_cov=punpy.convert_corr_to_cov(np.eye(len(L1_ur)),L1_ur)+\
         punpy.convert_corr_to_cov(np.ones((len(L1_us),len(L1_us))),L1_us)

print(L1)
print(L1_ur)
print(L1_us)
print(L1_ut)
print(L1_cov)

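Both terms of `L1_cov` above rely on the standard relation between a correlation matrix $R$ and standard uncertainties $u$: $\Sigma_{ij} = u_i R_{ij} u_j$. We assume `punpy.convert_corr_to_cov` implements this relation. The plain-NumPy sketch below uses a hypothetical helper `corr_to_cov` and made-up uncertainty values (not the MC output). It shows how the random (`np.eye`) and systematic (`np.ones`) terms combine, and that the diagonal of the summed covariance is the squared total uncertainty.

```python
import numpy as np

def corr_to_cov(corr, u):
    """Hypothetical helper: covariance from correlation, cov[i, j] = u[i] * corr[i, j] * u[j]."""
    u = np.asarray(u, dtype=float)
    return corr * np.outer(u, u)

# hypothetical per-band random and systematic uncertainties (illustration only)
u_r = np.array([0.5, 0.7, 0.6, 0.4, 0.3])
u_s = np.array([0.2, 0.3, 0.2, 0.5, 0.4])
n_bands = len(u_r)

cov_random = corr_to_cov(np.eye(n_bands), u_r)            # uncorrelated between bands
cov_syst = corr_to_cov(np.ones((n_bands, n_bands)), u_s)  # fully correlated between bands
cov_total = cov_random + cov_syst

# the diagonal holds the total variance per band, i.e. u_r**2 + u_s**2
u_total = np.sqrt(np.diag(cov_total))
print(u_total)
```

Because the two components are independent, their covariances simply add, while only the systematic term contributes off-diagonal elements.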
We now have, for each band, the random, systematic and total uncertainties in L1, as well as the covariance matrix between bands.
Here we have manually specified a diagonal correlation matrix (no correlation, `np.eye`) for the random component and a correlation matrix of ones (fully correlated, `np.ones`) for the systematic component.
Alternatively, the keyword `return_corr` can be used to get the correlation matrix estimated from the MC samples. In the next example we use the `return_corr` keyword:


In [None]:
prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ur,L1_corr_r=prop.propagate_random(calibrate,[L0,gains,dark],
                  [L0_ur,gains_ur,dark_ur],return_corr=True)
L1_us,L1_corr_s=prop.propagate_systematic(calibrate,[L0,gains,dark],
                  [L0_us,gains_us,np.zeros(5)],return_corr=True)
L1_ut=(L1_ur**2+L1_us**2)**0.5
L1_cov=punpy.convert_corr_to_cov(L1_corr_r,L1_ur)+\
       punpy.convert_corr_to_cov(L1_corr_s,L1_us)

print(L1)
print(L1_ur)
print(L1_us)
print(L1_ut)
print(L1_cov)                                            

This gives nearly the same results, apart from small differences due to MC noise.
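To make the difference between `propagate_random` and `propagate_systematic` concrete, here is a plain-NumPy sketch of the underlying MC idea. It is not punpy's implementation. We assume Gaussian errors. The random case draws an independent error for every band. The systematic case draws one error per input and MC step and shares it across all bands, so that error is fully correlated between bands. We then take the standard deviation and correlation of the outputs. The helper `mc_propagate` is hypothetical.

```python
import numpy as np

def calibrate(L0, gains, dark):
    return (L0 - dark) * gains

def mc_propagate(func, x, u_x, n_mc=100000, systematic=False, seed=0):
    """Hypothetical helper: Gaussian MC propagation (not punpy's implementation).

    random:     an independent error for every band and every MC step
    systematic: one error per input and MC step, shared by all bands
    """
    rng = np.random.default_rng(seed)
    samples = []
    for xi, ui in zip(x, u_x):
        xi = np.asarray(xi, dtype=float)
        shape = (n_mc, 1) if systematic else (n_mc, xi.size)
        z = rng.standard_normal(shape)  # a (n_mc, 1) draw broadcasts over the bands
        samples.append(xi + z * ui)     # shape (n_mc, n_bands)
    y = func(*samples)                  # one vectorised call over all MC steps
    return y.std(axis=0), np.corrcoef(y, rowvar=False)

L0_1d = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains_1d = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
dark_1d = np.array([0.05, 0.03, 0.04, 0.05, 0.06])

# uncertainties in the same order as the function arguments (L0, gains, dark)
u_rand = [L0_1d * 0.05,
          np.array([0.5, 0.7, 0.6, 0.4, 0.1]),
          np.array([0.01, 0.002, 0.006, 0.002, 0.015])]
u_syst = [np.ones(5) * 0.03,
          np.array([0.1, 0.2, 0.1, 0.4, 0.3]),
          np.zeros(5)]

x_1d = [L0_1d, gains_1d, dark_1d]
ur_mc, corr_r_mc = mc_propagate(calibrate, x_1d, u_rand)
us_mc, corr_s_mc = mc_propagate(calibrate, x_1d, u_syst, systematic=True)
print(ur_mc)
print(us_mc)
```

The random correlation matrix comes out close to the identity, while the systematic one has large positive off-diagonal elements. These are not exactly 1 because the L0 and gain errors scale differently in each band.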

Next, we give an example of a measurement function with multiple outputs.
To process such a measurement function, the keyword `output_vars` must be set to the number of outputs:

In [None]:
# your measurement function
def calibrate_2output(L0,gains,dark):
    return (L0-dark)*gains,(L0*gains-dark)


prop=punpy.MCPropagation(10000)
L1=calibrate_2output(L0,gains,dark)
L1_ur,L1_corr_r,L1_corr_r_between=prop.propagate_random(
                                  calibrate_2output,[L0,gains,dark],
                                  [L0_ur,gains_ur,dark_ur],
                                  return_corr=True,output_vars=2)
L1_us,L1_corr_s,L1_corr_s_between=prop.propagate_systematic(
                                  calibrate_2output,[L0,gains,dark],
                                  [L0_us,gains_us,np.zeros(5)],
                                  return_corr=True,output_vars=2)

print(L1)
print(L1_ur)
print(L1_us)                 

Because there are now multiple output variables, `L1_ur` has shape (2,5): `L1_ur[0]` contains the same uncertainties as
the previous example (up to MC noise), and `L1_corr_r[0]` is the same as `L1_corr_r` before. Analogously, `L1_ur[1]` and `L1_corr_r[1]`
give the random uncertainty and correlation matrix for the second output of the measurand.
There is now also `L1_corr_r_between`, which gives the correlation matrix between the two output variables
of the measurement function (averaged over all wavelengths).
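To illustrate what a correlation between outputs measures, the sketch below draws random MC samples of the inputs and evaluates both outputs of `calibrate_2output`. It then computes the correlation between the two outputs in each band and averages over the bands. This averaging is our own assumption of how such a quantity could be formed; punpy's exact calculation may differ.

```python
import numpy as np

def calibrate_2output(L0, gains, dark):
    return (L0 - dark) * gains, (L0 * gains - dark)

rng = np.random.default_rng(1)
n_mc = 100000

L0_1d = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains_1d = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
dark_1d = np.array([0.05, 0.03, 0.04, 0.05, 0.06])

# random (independent) Gaussian errors for every band and MC step: shape (n_mc, 5)
L0_s = L0_1d + rng.standard_normal((n_mc, 5)) * (L0_1d * 0.05)
gains_s = gains_1d + rng.standard_normal((n_mc, 5)) * np.array([0.5, 0.7, 0.6, 0.4, 0.1])
dark_s = dark_1d + rng.standard_normal((n_mc, 5)) * np.array([0.01, 0.002, 0.006, 0.002, 0.015])

y1, y2 = calibrate_2output(L0_s, gains_s, dark_s)  # each of shape (n_mc, 5)

# correlation between the two outputs in each band, then averaged over bands
corr_per_band = np.array([np.corrcoef(y1[:, i], y2[:, i])[0, 1] for i in range(5)])
corr_between_avg = corr_per_band.mean()
print(corr_per_band)
print(corr_between_avg)
```

Both outputs are dominated by the same L0 and gain errors, so their correlation is strongly positive in every band.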

In addition to propagating random (uncorrelated) and systematic (fully correlated) uncertainties,
it is also possible to propagate uncertainties associated with structured errors.
If we know the covariance matrix of each input quantity, these are straightforward to propagate.
In the example below, we assume the L0 data and dark data to be uncorrelated between bands (their covariance matrices are
diagonal) and the gains to have a custom covariance matrix:

In [None]:
L0_cov=punpy.convert_corr_to_cov(np.eye(len(L0_ur)),L0_ur)
dark_cov=punpy.convert_corr_to_cov(np.eye(len(dark_ur)),dark_ur )
gains_cov= np.array([[0.45,0.35,0.30,0.20,0.05],
                    [0.35,0.57,0.32,0.30,0.07],
                    [0.30,0.32,0.56,0.24,0.06],
                    [0.20,0.30,0.24,0.44,0.04],
                    [0.05,0.07,0.06,0.04,0.21]])


prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ut,L1_corr=prop.propagate_cov(calibrate,[L0,gains,dark],
                                [L0_cov,gains_cov,dark_cov])
L1_cov=punpy.convert_corr_to_cov(L1_corr,L1_ut)

print(L1)
print(L1_ut)
print(L1_cov)

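As a cross-check on `propagate_cov`, the first-order (linearised) law of propagation gives $\Sigma_{L1} = J_{L0}\Sigma_{L0}J_{L0}^T + J_G\Sigma_G J_G^T + J_D\Sigma_D J_D^T$ for independent inputs. Because `calibrate` acts band by band, each Jacobian is diagonal: $J_{L0}=\mathrm{diag}(G)$, $J_G=\mathrm{diag}(L_0-D)$ and $J_D=-\mathrm{diag}(G)$. The sketch below implements this in NumPy; the helper name `linear_cov` is hypothetical. For this nearly linear function, its result should agree closely with the MC covariance.

```python
import numpy as np

def linear_cov(L0, gains, dark, L0_cov, gains_cov, dark_cov):
    """Hypothetical helper: first-order covariance of (L0 - dark) * gains."""
    J_L0 = np.diag(gains)         # dL1/dL0
    J_gains = np.diag(L0 - dark)  # dL1/dgains
    J_dark = -np.diag(gains)      # dL1/ddark
    return (J_L0 @ L0_cov @ J_L0.T
            + J_gains @ gains_cov @ J_gains.T
            + J_dark @ dark_cov @ J_dark.T)

L0_1d = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains_1d = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
dark_1d = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
dark_ur_1d = np.array([0.01, 0.002, 0.006, 0.002, 0.015])

L0_cov_1d = np.diag((L0_1d * 0.05) ** 2)  # uncorrelated between bands
dark_cov_1d = np.diag(dark_ur_1d ** 2)    # uncorrelated between bands
gains_cov_1d = np.array([[0.45, 0.35, 0.30, 0.20, 0.05],
                         [0.35, 0.57, 0.32, 0.30, 0.07],
                         [0.30, 0.32, 0.56, 0.24, 0.06],
                         [0.20, 0.30, 0.24, 0.44, 0.04],
                         [0.05, 0.07, 0.06, 0.04, 0.21]])

L1_cov_lin = linear_cov(L0_1d, gains_1d, dark_1d, L0_cov_1d, gains_cov_1d, dark_cov_1d)
print(np.sqrt(np.diag(L1_cov_lin)))
```

Since the L0 and dark covariances are diagonal, all off-diagonal elements of the result come from the gains covariance.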
It is also possible to include covariance between the input variables. For example, consider a case similar to the first one, but where
the dark signal now also has systematic uncertainties, which are fully correlated with the systematic uncertainties on the L0 data
(quite commonly the same detector is used for dark and L0). After defining the correlation matrix between the systematic uncertainties
on the input variables, the resulting uncertainty budget can be calculated with punpy as:

In [None]:
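# systematic uncertainty on the dark signal (illustrative value assumed here,
# since it is not defined in the earlier cells)
dark_us = np.ones(5)*0.03

# correlation matrix between the input variables:
corr_input_syst=np.array([[1,0,1],[0,1,0],[1,0,1]])  # Here the correlation is
# between the first and the third variable, following the order of 
# the arguments in the measurement function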

prop=punpy.MCPropagation(10000)
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
                            [L0_ur,gains_ur,dark_ur])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
        [L0_us,gains_us,dark_us],corr_between=corr_input_syst)

print(L1)
print(L1_ur)
print(L1_us)

This gives us the random and systematic uncertainties, which can be combined to get the total uncertainty. 
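To see why the correlation between inputs matters here, consider a plain-NumPy sketch. It assumes, hypothetically, that the dark has the same 0.03 systematic uncertainty as L0. With one shared random draw for both inputs (correlation 1), their systematic errors cancel in `L0 - dark`, so only the gains term remains. For comparison, independent draws for L0 and dark give a much larger systematic uncertainty.

```python
import numpy as np

rng = np.random.default_rng(2)
n_mc = 100000

L0_1d = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains_1d = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
dark_1d = np.array([0.05, 0.03, 0.04, 0.05, 0.06])
L0_us_1d = np.ones(5) * 0.03
dark_us_1d = np.ones(5) * 0.03  # assumed: same systematic uncertainty as L0
gains_us_1d = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

z_shared = rng.standard_normal((n_mc, 1))  # one draw shared by L0 and dark (correlation 1)
z_gains = rng.standard_normal((n_mc, 1))   # independent draw for the gains
z_dark = rng.standard_normal((n_mc, 1))    # independent dark draw, for comparison only

L0_s = L0_1d + z_shared * L0_us_1d
gains_s = gains_1d + z_gains * gains_us_1d

dark_corr = dark_1d + z_shared * dark_us_1d  # fully correlated with L0
dark_ind = dark_1d + z_dark * dark_us_1d     # independent of L0

us_correlated = ((L0_s - dark_corr) * gains_s).std(axis=0)
us_independent = ((L0_s - dark_ind) * gains_s).std(axis=0)
print(us_correlated)
print(us_independent)
```

With full correlation, the systematic uncertainty reduces to `(L0 - dark) * gains_us` (up to MC noise).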

Since Python (with NumPy) can perform array operations on arrays of any size (as long as the shapes of the different arrays are compatible),
it is often possible to process all 10000 MC steps in our example at the same time.
For the measurement function defined above, L0, gains and dark can be passed as (5,10000) arrays rather than the (5,) arrays that were defined above.
The returned measurand will then also be a (5,10000) array in our example.
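This broadcasting can be checked directly in NumPy. We stack 10000 illustrative Gaussian perturbations of L0 (not punpy's internal samples) along a second axis and call the measurement function once. The result has one column per MC step, and each column equals a separate call on that column alone.

```python
import numpy as np

def calibrate(L0, gains, dark):
    return (L0 - dark) * gains

rng = np.random.default_rng(3)
n_mc = 10000

L0_1d = np.array([0.43, 0.8, 0.7, 0.65, 0.9])
gains_1d = np.array([23.0, 26.0, 28.0, 29.0, 31.0])
dark_1d = np.array([0.05, 0.03, 0.04, 0.05, 0.06])

# one column per MC step: every input has shape (5, 10000)
L0_mc = L0_1d[:, None] + rng.normal(0.0, 0.02, (5, n_mc))
gains_mc = np.repeat(gains_1d[:, None], n_mc, axis=1)
dark_mc = np.repeat(dark_1d[:, None], n_mc, axis=1)

L1_mc = calibrate(L0_mc, gains_mc, dark_mc)  # a single vectorised call
print(L1_mc.shape)  # (5, 10000)
```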
Processing all MC steps in a single call is as efficient as possible. However, not every measurement function allows this. For example, a radiative
transfer model cannot process 10000 model inputs at the same time. In that case, we can force punpy to process the MC steps one by one by setting `parallel_cores` to 1:

In [None]:
import time

# your measurement function (the repeat/sort steps only make it artificially slow)
def calibrate_slow(L0,gains,dark):
    y2=np.repeat((L0-dark)*gains,30000)
    y2=y2+np.random.random(len(y2))
    y2.sort()  # sorts in place; the result is not used
    return (L0-dark)*gains

prop=punpy.MCPropagation(1000,parallel_cores=1)
L1=calibrate_slow(L0,gains,dark)
t1=time.time()
L1_ur = prop.propagate_random(calibrate_slow,[L0,gains,dark],
                                [L0_ur,gains_ur,dark_ur])
t2=time.time()
L1_us = prop.propagate_systematic(calibrate_slow,[L0,gains,dark],
                                    [L0_us,gains_us,np.zeros(5)])

print(L1)
print(L1_ur)
print(L1_us)
print("propagate_random took: ",t2-t1," s")

To speed up this slow process, parallel processing can also be used. For example, to process the MC steps in parallel on 6 cores:

In [None]:
if __name__ == "__main__":  # guard needed by multiprocessing on platforms that spawn new processes (e.g. Windows)
    prop=punpy.MCPropagation(1000,parallel_cores=6)
    L1=calibrate_slow(L0,gains,dark)
    t1=time.time()
    L1_ur = prop.propagate_random(calibrate_slow,[L0,gains,dark],
                                [L0_ur,gains_ur,dark_ur])
    t2=time.time()
    L1_us = prop.propagate_systematic(calibrate_slow,[L0,gains,dark],
                                    [L0_us,gains_us,np.zeros(5)])
    
    print(L1)
    print(L1_ur)
    print(L1_us)
    print("propagate_random took: ",t2-t1," s")

With 6 cores, `propagate_random` should now be significantly faster than when the MC steps are processed serially (`parallel_cores=1`), provided the machine has at least 6 cores available.


punpy for data with more dimensions
-----------------------------------

We can expand the previous example to showcase the processing of multidimensional input quantities.
Often when taking L0 data, it is good practice to take more than a single set of data.
Now we assume that in each of the 5 wavelength bands we have a 100 × 50 array of L0 data, darks and gains (so each input has shape (5,100,50)), with the same measurement function as before.
We have random uncertainties on the L0, dark and gains, and systematic uncertainties on the L0 and gains, all with the same (5,100,50) shape.
In this case, other than the input arrays, very little changes in the propagation method. The `repeat_dims` keyword indicates which dimension(s) contain repeated measurements, and the uncertainties can be propagated as follows:

In [None]:
# your data
wavs = np.array([350,450,550,650,750])

# np.tile(...).T gives shape (5,100,50): wavelength first, then two image dimensions
L0 = np.tile([0.43,0.8,0.7,0.65,0.9],(50,100,1)).T
L0 = L0 + np.random.normal(0.0,0.05,L0.shape)

dark = np.tile([0.05,0.03,0.04,0.05,0.06],(50,100,1)).T
gains = np.tile([23,26,28,29,31],(50,100,1)).T

# your uncertainties
L0_ur = L0*0.05  # 5% random uncertainty
L0_us = np.ones((5,100,50))*0.03  # systematic uncertainty of 0.03
                         # (common between bands)

gains_ur = np.tile(np.array([0.5,0.7,0.6,0.4,0.1]),(50,100,1)).T  # random uncertainty
gains_us = np.tile(np.array([0.1,0.2,0.1,0.4,0.3]),(50,100,1)).T  # systematic uncertainty
# (different for each band but fully correlated)
dark_ur = np.tile(np.array([0.01,0.002,0.006,0.002,0.015]),(50,100,1)).T  # random uncertainty

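The `np.tile(...).T` pattern used above can be checked in isolation. `np.tile(v, (50,100,1))` stacks the 5-element vector into a (50,100,5) array, and `.T` reverses the axes to give (5,100,50). Every pixel of band `i` then holds the value for that band. The variable names below are illustrative.

```python
import numpy as np

band_values = [0.43, 0.8, 0.7, 0.65, 0.9]

tiled = np.tile(band_values, (50, 100, 1))
print(tiled.shape)  # (50, 100, 5)

cube = tiled.T      # reverse the axes: wavelength first
print(cube.shape)   # (5, 100, 50)

# every pixel of band 2 holds the band-2 value
print(np.all(cube[2] == 0.7))  # True
```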
In [None]:
prop=punpy.MCPropagation(1000)
L1=calibrate(L0,gains,dark)
L1_ur=prop.propagate_random(calibrate,[L0,gains,dark],
      [L0_ur,gains_ur,dark_ur],repeat_dims=[1])
L1_us=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,None],repeat_dims=[1])
L1_ut=(L1_ur**2+L1_us**2)**0.5


We then define a new function to plot images of the relative uncertainties in each band:

In [None]:
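import matplotlib.pyplot as plt

def make_plots_L1_image(wavs,L1,L1_u,c_range=(0,0.1)):
    fig, axs = plt.subplots(1,len(wavs),figsize=(20,5))
    for i,ax in enumerate(axs):
        ax.set_xlabel("x_pix")
        ax.set_ylabel("y_pix")
        ax.set_title("%s nm rel uncertainties"%(wavs[i]))
        im_plot=ax.imshow(L1_u[i]/L1[i],vmin=c_range[0],vmax=c_range[1])
    fig.colorbar(im_plot,ax=axs)
    plt.show()

# minimal stand-in (assumed, not part of punpy) for the make_plots_L1 helper
# used further below: band-averaged L1 with error bars, plus the correlation matrix
def make_plots_L1(L1,L1_us=None,L1_corr=None):
    fig, (ax1,ax2) = plt.subplots(1,2,figsize=(12,5))
    bands = np.arange(len(L1))
    ax1.errorbar(bands,L1,yerr=L1_us,fmt="o")
    ax1.set(xlabel="band index",ylabel="L1")
    if L1_corr is not None:
        im=ax2.imshow(L1_corr,vmin=-1,vmax=1,cmap="bwr")
        fig.colorbar(im,ax=ax2)
    ax2.set_title("correlation between bands")
    plt.show()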

In [None]:
make_plots_L1_image(wavs,L1,L1_ur)
make_plots_L1_image(wavs,L1,L1_us)

For multidimensional input quantities, a certain correlation structure is often known along one of the dimensions, while the errors along the other dimensions are either completely independent (random) or fully correlated (systematic). In the example below, we know the correlation structure of the systematic uncertainties on the gains with respect to wavelength, and consider each of the measurements to be fully correlated with respect to the spatial dimensions.

In [None]:
gains_corr=np.array([[1.,0.14123392,0.12198785,0.07234254,0.01968095],
 [0.14123392,1.,0.1350783,0.12524757,0.0095603 ],
 [0.12198785,0.1350783,1.,0.1041107,0.02890266],
 [0.07234254,0.12524757,0.1041107,1.,0.01041678],
 [0.01968095,0.0095603,0.02890266,0.01041678,1.]])

L1_us,L1_us_corr=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [None,gains_us,None],repeat_dims=[1,2],corr_x=[None,gains_corr,None],return_corr=True)

make_plots_L1_image(wavs,L1,L1_us)
make_plots_L1(np.mean(L1,axis=(1,2)),L1_us=np.mean(L1_us,axis=(1,2)),L1_corr=L1_us_corr)

In this case, the returned correlation matrix is again with respect to wavelength, and the correlation structure of the repeated measurements is the same as it was in the inputs. In the above example, the uncertainties on the L0 and darks are set to None and are thus not included. However, it is possible to include these, even if they have a different correlation structure from the uncertainties on the gains. In the example below, we repeat the same calculation but now include systematic uncertainties on the L0 that are fully correlated. In this case we can simply set the corresponding entry of `corr_x` to None, in which case it defaults to full correlation (because we are using the `propagate_systematic` function). If we were using `propagate_random`, it would default to independent errors.

In [None]:
L1_us,L1_us_corr=prop.propagate_systematic(calibrate,[L0,gains,dark],
      [L0_us,gains_us,None],repeat_dims=[1,2],corr_x=[None,gains_corr,None],return_corr=True)

make_plots_L1_image(wavs,L1,L1_us)
make_plots_L1(np.mean(L1,axis=(1,2)),L1_us=np.mean(L1_us,axis=(1,2)),L1_corr=L1_us_corr)
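
The correlation structure used in the last two examples (a known correlation between wavelengths, full correlation along the two spatial dimensions) can be sketched in plain NumPy. This is our own illustration, not punpy's sampling code. We draw wavelength-correlated errors using the Cholesky factor of the wavelength correlation matrix (the variable `corr_wav` holds the same values as `gains_corr`). Each draw is then broadcast over all pixels, so every pixel in a band shares the same error.

```python
import numpy as np

# correlation between the 5 wavelength bands (same values as gains_corr above)
corr_wav = np.array([[1., 0.14123392, 0.12198785, 0.07234254, 0.01968095],
                     [0.14123392, 1., 0.1350783, 0.12524757, 0.0095603],
                     [0.12198785, 0.1350783, 1., 0.1041107, 0.02890266],
                     [0.07234254, 0.12524757, 0.1041107, 1., 0.01041678],
                     [0.01968095, 0.0095603, 0.02890266, 0.01041678, 1.]])

rng = np.random.default_rng(4)
n_mc = 200000

# correlated standard-normal draws: cov(z) = chol @ chol.T = corr_wav
chol = np.linalg.cholesky(corr_wav)
z = rng.standard_normal((n_mc, 5)) @ chol.T
corr_est = np.corrcoef(z, rowvar=False)
print(np.round(corr_est, 2))

# broadcast a single MC draw over a (100, 50) image: the error is identical for
# every pixel of a band, i.e. fully correlated along the spatial dimensions
err_image = z[0][:, None, None] * np.ones((5, 100, 50))
print(err_image.shape)  # (5, 100, 50)
```

Multiplying such draws by the per-pixel systematic uncertainties (e.g. `gains_us`) would give correlated error samples of the kind described above.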