# Add sensor array metadata to Sentinel-2 data catalog 

In this notebook we:
* read a [previously created STAC catalog](./01_sentinel-2_CopernicusHub.ipynb) of Sentinel-2 L1C data from the SURF dCache storage system;
* add links to the sensor array metadata, publicly available on [Google Cloud Storage](https://cloud.google.com/storage/docs/public-datasets/sentinel-2) (GCS), as assets to the items of the catalog;
* save the updated catalog (metadata only, with links to the GCS data) back to the SURF dCache storage system.

## Load catalog with Sentinel-2 L1C data

In [1]:
catalog_id = "red-glacier_copernicushub-gcp"
# location of the catalog file on dCache
url = ("https://webdav.grid.surfsara.nl:2880/pnfs/"
       f"grid.sara.nl/data/eratosthenes/disk/{catalog_id}/catalog.json")

In [2]:
import stac2dcache
# configure PySTAC to read from/write to dCache
stac2dcache.configure(
    token_filename="macaroon.dat"
)

In [3]:
import pystac
catalog = pystac.Catalog.from_file(
    url, 
    stac_io=stac2dcache.stac_io
)
catalog

<Catalog id=red-glacier_copernicushub-gcp>

## Add sensor array metadata from Google Cloud Storage (GCS) as assets to the items

For more information on Sentinel-2 data on GCS see this [README file](./README.md).

In [4]:
SENTINEL2_BUCKET = "gcp-public-data-sentinel-2"
BASE_URL = "http://storage.googleapis.com"
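
As a rough illustration of how these two constants combine with an MGRS tile ID into a full object URL, here is a stand-alone sketch that mirrors the path layout used later in this notebook. The helper names `split_mgrs_tile` and `build_gcs_url`, the tile ID, the product directory and the relative path are all hypothetical placeholders, not values taken from the catalog.

```python
# Stand-alone sketch: the constants are repeated so the snippet runs on its own.
SENTINEL2_BUCKET = "gcp-public-data-sentinel-2"
BASE_URL = "http://storage.googleapis.com"


def split_mgrs_tile(tile_id):
    """Split an MGRS tile ID such as '05VMG' into UTM zone, latitude band and grid square."""
    return tile_id[:2], tile_id[2], tile_id[3:]


def build_gcs_url(tile_id, product_dir, path_rel):
    """Assemble a URL following the bucket layout tiles/<zone>/<band>/<square>/<product>/<file>."""
    utm_zone, lat_band, grid_square = split_mgrs_tile(tile_id)
    return (
        f"{BASE_URL}/{SENTINEL2_BUCKET}/tiles/"
        f"{utm_zone}/{lat_band}/{grid_square}/"
        f"{product_dir}/{path_rel}"
    )


# Placeholder inputs, chosen only to show the resulting URL structure
example_url = build_gcs_url(
    "05VMG", "EXAMPLE_PRODUCT.SAFE", "QI_DATA/MSK_DETFOO_B02.gml"
)
print(example_url)
```

Keeping the tile splitting in its own small function makes the slicing (`[:2]`, `[2]`, `[3:]`) easier to check against a known tile ID.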

In [5]:
import pathlib
import xml.etree.ElementTree as ET


EXT_TO_MEDIATYPE = {
    ".jp2": pystac.MediaType.JPEG2000,
    ".gml": pystac.MediaType.XML,
}


def make_url(item, path_rel):
    """Construct the full GCS URL of a file from its path relative to the item's directory."""
    tile_id = item.properties["s2:mgrs_tile"]
    # the bucket is organized by MGRS tile: UTM zone / latitude band / grid square
    path = (
        f"{SENTINEL2_BUCKET}/tiles/"
        f"{tile_id[:2]}/{tile_id[2]}/{tile_id[3:]}/"
        f"{item.id}/"
        f"{path_rel}"
    )
    return f"{BASE_URL}/{path}"

    
def add_sensor_array_metadata_as_assets(item):
    """Use metadata to construct pystac.Asset objects and add them to the input Item."""
    
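    # read the granule metadata file (XML) referenced by the item
    granule_metadata_txt = stac2dcache.fs.cat(
        item.assets["granule-metadata"].get_absolute_href()
    )

    root = ET.fromstring(granule_metadata_txt.decode())
    # root[-1] is n1:Quality_Indicators_Info; its second child is Pixel_Level_QI
    for el in root[-1][1]:
        # keep only the detector footprint masks; skip elements without a "type"
        if el.attrib.get("type") == "MSK_DETFOO":
            # paths in the metadata use forward slashes, whatever the local OS
            path = pathlib.PurePosixPath(el.text)
            href = make_url(item, path)
            # the band name is the last "_"-separated token of the file name
            band_key = path.stem.split("_")[-1]
            media_type = EXT_TO_MEDIATYPE[path.suffix]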
            asset = pystac.Asset(
                href=href,
                media_type=media_type,
                roles=["metadata"]
            )
            item.add_asset(
                f"sensor-metadata-{band_key}", 
                asset
            )
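
The function above picks the mask entries by their position in the XML tree. As a possible alternative, the sketch below looks the elements up by tag name instead, which does not depend on the order of the child elements. It runs on a small, made-up XML fragment: the namespace URI, the file names and the helper `find_detector_footprints` are hypothetical, and it assumes that the mask entries are stored in un-namespaced `MASK_FILENAME` elements, which should be checked against a real granule metadata file.

```python
import pathlib
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a granule metadata file
SAMPLE_XML = """\
<n1:Level-1C_Tile_ID xmlns:n1="https://example.org/placeholder-namespace">
  <n1:General_Info/>
  <n1:Quality_Indicators_Info metadataLevel="Standard">
    <Image_Content_QI/>
    <Pixel_Level_QI>
      <MASK_FILENAME type="MSK_DEFECT">QI_DATA/MSK_DEFECT_B01.gml</MASK_FILENAME>
      <MASK_FILENAME type="MSK_DETFOO">QI_DATA/MSK_DETFOO_B01.gml</MASK_FILENAME>
      <MASK_FILENAME type="MSK_DETFOO">QI_DATA/MSK_DETFOO_B8A.gml</MASK_FILENAME>
    </Pixel_Level_QI>
  </n1:Quality_Indicators_Info>
</n1:Level-1C_Tile_ID>
"""


def find_detector_footprints(xml_text):
    """Map band names to the relative paths of the detector footprint masks."""
    root = ET.fromstring(xml_text)
    footprints = {}
    # iter() walks the whole tree, so the lookup is independent of child order
    for el in root.iter("MASK_FILENAME"):
        if el.get("type") == "MSK_DETFOO":
            path = pathlib.PurePosixPath(el.text.strip())
            # band name is the last "_"-separated token of the file stem
            band_key = path.stem.split("_")[-1]
            footprints[band_key] = str(path)
    return footprints


footprints = find_detector_footprints(SAMPLE_XML)
print(footprints)
```

The returned keys correspond to the `band_key` suffix used in the asset names `sensor-metadata-{band_key}`.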

In [6]:
for item in catalog.get_all_items():
    add_sensor_array_metadata_as_assets(item)

## Save the catalog

In [8]:
# root directory of the catalog on dCache
url = ("https://webdav.grid.surfsara.nl:2880/pnfs/"
       f"grid.sara.nl/data/eratosthenes/disk/{catalog_id}")

In [9]:
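# normalize all hrefs relative to the root URL and write the catalog to dCache,
# keeping the catalog type it was loaded with
catalog.normalize_and_save(url, catalog_type=catalog.catalog_type)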