<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/hypernets_surface_reflectance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Flags and Uncertainties for HYPERNETS**

This is an example of how the CoMet toolkit can be used to handle flags and propagate uncertainties for the HYPERNETS products.
The HYPERNETS products use obsarray to store flags and uncertainties as digital effects tables, which include a wide range of quality flags (De Vis et al. 2024a).
As such, the uncertainties in the HYPERNETS products can easily be propagated. Here we show a use case: band integrating the publicly distributed HYPERNETS L2B surface reflectance products over the Sentinel-2 spectral response functions.

We first install the obsarray package (flag handling and accessing uncertainties), the punpy package (uncertainty propagation) and the matheo package (for band integration).

In [None]:
!pip install "obsarray>=1.0.0"
!pip install "punpy>=0.44.2"
!pip install matheo

Next, we open the HYPERNETS L2B data. An example for Gobabeb is used and available from the comet_training repository (which is first cloned). 

In [None]:
!git clone https://github.com/comet-toolkit/comet_training.git

In [None]:
import xarray as xr
import numpy as np

ds_HYP = xr.open_dataset("comet_training/HYPERNETS_L_GHNA_L2A_REF_20240112T0901_20240315T1804_v2.0.nc")  # read digital effects table

**Flags**

The flags that are present in this dataset can be accessed in a few different ways.
First, there is the basic way of accessing the flags using xarray:

In [None]:
print(ds_HYP["quality_flag"].values)
print(ds_HYP["quality_flag"].attrs["flag_meanings"])
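# Boolean mask that is True for every series with at least one flag set
data_flagged_bool = (ds_HYP["quality_flag"] > 0)
# Keep only the flagged series (indexed along the second axis of reflectance)
flagged_reflectance = ds_HYP["reflectance"].values[:, np.where(data_flagged_bool)[0]]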
print(flagged_reflectance.shape)

This allows the user to access the flags by converting the quality flag integer to a binary number, and assigning each bit to a specific quality flag. In the attributes of the "quality_flag" data, the flag meanings for each bit are listed. 
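The manual decoding described above can be sketched as follows. This is only an illustration: the flag names and quality_flag values are hypothetical, and it assumes that bit i of the integer corresponds to the i-th entry of the flag meanings (the real list is in the "flag_meanings" attribute).

```python
import numpy as np

# Hypothetical flag names, for illustration only; the real names are listed in
# ds_HYP["quality_flag"].attrs["flag_meanings"].
flag_meanings = ["saturation", "outliers", "series_missing"]

# Hypothetical quality_flag values for four series.
quality_flag = np.array([0, 2, 5, 6], dtype=np.uint32)


def decode_flags(value, meanings):
    """Return the names of the flags whose bit is set in `value`.

    Assumes bit i of `value` corresponds to meanings[i].
    """
    return [name for bit, name in enumerate(meanings) if (int(value) >> bit) & 1]


decoded = [decode_flags(v, flag_meanings) for v in quality_flag]
for value, names in zip(quality_flag, decoded):
    # Show the binary representation next to the decoded flag names
    print(f"{int(value):03b} -> {names}")
```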
Rather than manually converting the quality_flag values to binary and then working out which flags were set, obsarray can be used to make this easier.
After importing obsarray, xarray datasets gain a .flag accessor which can be used to access the flag variables.
It can be used to check whether a certain flag (e.g. outliers) is set for each of the series.


In [None]:
import obsarray
from obsarray.templater.dataset_util import DatasetUtil

print(ds_HYP.flag["quality_flag"])
print(ds_HYP.flag["quality_flag"]["outliers"].value.values)


obsarray also provides DatasetUtil, which adds several very useful functions to access the flag information.
The get_set_flags() function converts a quality_flag value into a list with the names of each set flag. This can be done for the flags in each series by looping through them (see example below).
Next, get_flags_mask_or() takes a list of flags, checks whether any of these flags are set for each series, and returns the corresponding booleans.
Finally, get_flags_mask_and() does the same, but requires all of the provided flags to be set in order to return True.
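The difference between the "or" and "and" masks can be illustrated with plain NumPy bit operations. This is a minimal sketch of the idea, not obsarray's implementation; the flag names, the quality_flag values and the helper build_bitmask are hypothetical, and bit i is again assumed to correspond to the i-th flag name.

```python
import numpy as np

# Hypothetical flag names and quality_flag values, for illustration only.
flag_meanings = ["saturation", "outliers", "series_missing"]
quality_flag = np.array([0, 2, 5, 6], dtype=np.uint32)


def build_bitmask(names, meanings):
    """Combine the bits of the requested flag names into a single integer."""
    mask = 0
    for name in names:
        mask |= 1 << meanings.index(name)
    return mask


selected = build_bitmask(["outliers", "series_missing"], flag_meanings)

# "or": True where at least one of the selected flags is set
mask_or = (quality_flag & selected) != 0
# "and": True only where all of the selected flags are set
mask_and = (quality_flag & selected) == selected

print(mask_or)
print(mask_and)
```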

In [None]:
print([DatasetUtil.get_set_flags(flag) for flag in ds_HYP["quality_flag"]])
print(DatasetUtil.get_flags_mask_or(ds_HYP["quality_flag"], ["outliers", "series_missing"]))
print(DatasetUtil.get_flags_mask_and(ds_HYP["quality_flag"], ["outliers", "series_missing"]))

The get_flags_mask_or() function is probably one of the most useful, as it makes it possible to quickly remove data for which certain flags are set.

In [None]:
bad_flags=["pt_ref_invalid", "half_of_scans_masked", "not_enough_dark_scans", "not_enough_rad_scans",
           "not_enough_irr_scans", "no_clear_sky_irradiance", "variable_irradiance",
           "half_of_uncertainties_too_big", "discontinuity_VNIR_SWIR", "single_irradiance_used"]
flagged = DatasetUtil.get_flags_mask_or(ds_HYP["quality_flag"], bad_flags)
id_series_valid = np.where(~flagged)[0]
ds_HYP = ds_HYP.isel(series=id_series_valid)

**Uncertainties**

Next we move on to propagating uncertainties. To do this we start by defining the measurement function class. Here we implement a measurement function that does the band integration over the S2A spectral response function.
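As a rough picture of what band integration does, the sketch below computes an SRF-weighted mean of a spectrum with the trapezoidal rule. It is an assumption-laden illustration rather than matheo's implementation: the Gaussian spectral response function, the wavelength grid and the helper names are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def band_integrate(reflectance, wavelength, srf):
    """SRF-weighted mean of the reflectance spectrum over the band."""
    return trapezoid(reflectance * srf, wavelength) / trapezoid(srf, wavelength)


# Hypothetical 1 nm wavelength grid and spectrally flat reflectance
wavelength = np.linspace(400.0, 1000.0, 601)
reflectance = np.full_like(wavelength, 0.3)

# Hypothetical Gaussian SRF centred at 665 nm (not the real Sentinel-2 response)
srf = np.exp(-0.5 * ((wavelength - 665.0) / 15.0) ** 2)

band_value = band_integrate(reflectance, wavelength, srf)
print(band_value)
```

For a spectrally flat reflectance the weighted mean reduces to the input reflectance, which is a quick sanity check on the normalisation.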

In [None]:
from punpy import MeasurementFunction
from matheo.band_integration import band_integration

import time

class BandIntegrateS2A(MeasurementFunction):
    # your measurement function
    def meas_function(self, reflectance, wavelength):
        """
        Function to perform S2A band integration on reflectance

        :param reflectance: reflectance spectrum
        :param wavelength: wavelengths
        """
        refl_band, band_centres = band_integration.spectral_band_int_sensor(
            d=reflectance,
            wl=wavelength,
            platform_name="Sentinel-2A",
            sensor_name="MSI",
            u_d=None,
        )
        return refl_band
    
    def get_argument_names(self):
        """
        Function that returns the argument names of the meas_func, as they appear in the digital effects table (used to find right variable in input data).  

        :return: List of argument names
        """
        return ["reflectance", "wavelength"]
    
    def get_measurand_name_and_unit(self):
        """
        Function that returns the measurand name and unit of the meas_func. These will be used to store in the output dataset.  

        :return: tuple(measurand name, measurand unit)
        """
        return "band_reflectance", ""                                          

We select a single series from the HYPERNETS data by finding the series for which the angles are nearest to the requested ones:

In [None]:
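vza = 0   # requested viewing zenith angle (degrees)
vaa = 90  # requested viewing azimuth angle (degrees)
# Differences between the viewing angles of each series and the requested ones
vzadiff = ds_HYP["viewing_zenith_angle"].values - vza
vaadiff = np.abs(ds_HYP["viewing_azimuth_angle"].values - vaa % 360)
# Squared angular distance; the series with the smallest value is selected
angledif_series = vzadiff ** 2 + vaadiff ** 2
id_series = np.where(angledif_series == np.min(angledif_series))[0]
ds_HYP = ds_HYP.isel(series=id_series)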

What remains is to create an object of our MeasurementFunction class and propagate the uncertainties in ds_HYP.
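Conceptually, a Monte Carlo propagation draws many perturbed copies of the inputs, runs the measurement function on each, and takes the spread of the outputs as the uncertainty. The sketch below shows this for independent random errors only; the function measurement and the values of x and u_x are hypothetical, and punpy additionally handles error correlations, which are ignored here.

```python
import numpy as np

rng = np.random.default_rng(42)


def measurement(x):
    """Hypothetical measurement function, for illustration only."""
    return 2.0 * x + 1.0


# Hypothetical input values and their standard uncertainties
x = np.array([1.0, 2.0, 3.0])
u_x = np.array([0.1, 0.2, 0.3])

# Draw perturbed inputs, assuming independent Gaussian errors
n_draws = 10000
samples = rng.normal(loc=x, scale=u_x, size=(n_draws, x.size))

# Run the measurement function on every draw
outputs = measurement(samples)

# Measurand from the unperturbed inputs, uncertainty from the spread of the draws
y = measurement(x)
u_y = outputs.std(axis=0, ddof=1)

print(y)
print(u_y)
```

For this linear function the result should approach twice the input uncertainties as the number of draws grows.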

In [None]:
from punpy import MCPropagation

prop = MCPropagation(100,parallel_cores=1)

band_int_S2 = BandIntegrateS2A(prop)
ds_HYP_S2 = band_int_S2.propagate_ds(ds_HYP)
print(ds_HYP_S2)

We note that this process can also be performed on all the series together. 
One issue here is that, when generating the correlated samples of reflectance, punpy needs the error correlation matrix of the full dataset.
By default, this is done by calculating the full error correlation matrix, with dimensions (wavelength.series, wavelength.series), which is a very large matrix.
In order to avoid the large RAM requirements of this, we can tell punpy to use error correlation dictionaries (separated by dimension) which take much less memory.
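To get a feel for the memory saving, the sketch below compares the storage needed for one full error correlation matrix with that of one matrix per dimension. The array sizes are hypothetical, and the small Kronecker-product example assumes that the full matrix factorises into per-dimension matrices; it is not a description of punpy's internals.

```python
import numpy as np

# Hypothetical dataset dimensions, for illustration only
n_wavelength = 1500
n_series = 500
bytes_per_value = np.dtype(np.float64).itemsize

# Full matrix: (wavelength * series) x (wavelength * series)
n_total = n_wavelength * n_series
full_bytes = n_total ** 2 * bytes_per_value

# Separated by dimension: one wavelength matrix plus one series matrix
separated_bytes = (n_wavelength ** 2 + n_series ** 2) * bytes_per_value

print(f"full matrix:      {full_bytes / 1e9:.1f} GB")
print(f"separated by dim: {separated_bytes / 1e9:.3f} GB")

# Small example: two per-dimension matrices stand in for a larger full matrix
corr_wl = np.array([[1.0, 0.5], [0.5, 1.0]])  # 2 wavelengths, partly correlated
corr_series = np.eye(3)                       # 3 series, uncorrelated
corr_full = np.kron(corr_wl, corr_series)     # assumed separable structure
print(corr_full.shape)
```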
To apply the S2 SRF to the full HYPERNETS file we do:

In [None]:
ds_HYP_full = xr.open_dataset("comet_training/HYPERNETS_L_GHNA_L2A_REF_20240112T0901_20240315T1804_v2.0.nc")  # read digital effects table
prop = MCPropagation(100,parallel_cores=1)
band_int_S2 = BandIntegrateS2A(prop, use_err_corr_dict=True)
ds_HYP_full_S2 = band_int_S2.propagate_ds(ds_HYP_full)
print(ds_HYP_full_S2)