# Sentinel-2 data from the Copernicus Open Access Hub and Google Cloud Storage - Level 2A

In this notebook we:
* query the [Copernicus Open Access Hub API](https://scihub.copernicus.eu) for the Sentinel-2 assets corresponding to one MGRS tile;
* convert the returned metadata to STAC objects, adding links to the corresponding assets publicly available on [Google Cloud Storage](https://cloud.google.com/storage/docs/public-datasets/sentinel-2) (GCS);
* add the returned items to a new catalog;
* save the catalog (metadata only, with links to the GCS data) to the SURF dCache storage system. 

## Copernicus Open Access Hub

We set the credentials to access Copernicus Open Access Hub (register at [this link](https://scihub.copernicus.eu/userguide/SelfRegistration)):

In [1]:
import getpass
username = getpass.getpass("username")
password = getpass.getpass("password")

username ········
password ···········


The API endpoint is available at the following URL:

In [2]:
api_url = "https://apihub.copernicus.eu/apihub"

## Search for available assets

We look for the Sentinel-2 scenes that are available for the Red Glacier (Alaska). We define the area of interest using its MGRS tile:

In [3]:
utm_zone = 5
latitude_band = "V"
grid_square = "MG"

We use [Sentinelsat](https://github.com/sentinelsat/sentinelsat) to query the Copernicus API:

In [4]:
import sentinelsat
api = sentinelsat.SentinelAPI(user=username, password=password, api_url=api_url)
api

<sentinelsat.sentinel.SentinelAPI at 0x7fead410a130>

We look for data processed at **Level-2A** (bottom-of-atmosphere reflectance):

In [5]:
query = dict(
    platformname="Sentinel-2",
    producttype="S2MSI2A",
)

In [6]:
# NOTE: the `tileid` argument is available for Level-1C data but not for Level-2A, so we query on the filename instead
filename = f"*_T{utm_zone:02d}{latitude_band}{grid_square}_*"
products = api.query(filename=filename, **query)  # OpenSearch API

Querying products:  16%|#5        | 100/638 [00:00<?, ?product/s]

In [7]:
len(products)

638
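
The filename wildcard used in the query can be checked locally before it is sent. The following is a minimal sketch that uses Python's `fnmatch` as a stand-in for the wildcard matching performed by the Hub (an assumption: the two need not behave identically in every case). The helper name `tile_filename_pattern` is hypothetical; the product name is one of the Level-2A products of this tile.

```python
from fnmatch import fnmatchcase


def tile_filename_pattern(utm_zone, latitude_band, grid_square):
    """Build the filename wildcard for one MGRS tile (hypothetical helper)."""
    return f"*_T{utm_zone:02d}{latitude_band}{grid_square}_*"


pattern = tile_filename_pattern(5, "V", "MG")
name = "S2B_MSIL2A_20220328T213529_N0400_R086_T05VMG_20220328T230808.SAFE"

# local check only: fnmatch is an assumed approximation of the server-side matching
matches = fnmatchcase(name, pattern)
print(pattern, matches)
```

The zero-padded `{utm_zone:02d}` matters here: without it, zone 5 would produce `T5VMG`, which does not occur in any product name.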

In [8]:
# Level-1C products of the same tile can be queried directly via `tileid`
query = dict(
    platformname="Sentinel-2",
    producttype="S2MSI1C",
)
tileid = f"{utm_zone:02d}{latitude_band}{grid_square}"
products_l1c = api.query(tileid=tileid, **query)  # OpenSearch API
# keep the first product as a sample of the Level-1C metadata
product_l1c = next(iter(products_l1c.values()))
len(products_l1c)

Querying products:  11%|#         | 100/937 [00:00<?, ?product/s]

937

## Convert metadata to STAC

Starting from the products' metadata, we create STAC items (`pystac.Item` objects). As far as possible, we follow the structure employed in the [stactools sentinel-2 package](https://github.com/stactools-packages/sentinel2) (install via `pip install git+https://github.com/stactools-packages/sentinel2.git`):

In [8]:
import pystac

from pystac.extensions.sat import OrbitState, SatExtension
from pystac.extensions.eo import EOExtension
from pystac.extensions.projection import ProjectionExtension

from shapely import geometry, wkt

from stactools.sentinel2.constants import (
    BANDS_TO_RESOLUTIONS,
    SENTINEL_PROVIDER,
    SENTINEL_LICENSE,
    SENTINEL_BANDS,
    SENTINEL_INSTRUMENTS,
    SENTINEL_CONSTELLATION,
    SAFE_MANIFEST_ASSET_KEY,
    INSPIRE_METADATA_ASSET_KEY,
    PRODUCT_METADATA_ASSET_KEY,
    DATASTRIP_METADATA_ASSET_KEY,
    GRANULE_METADATA_ASSET_KEY,
)

In [9]:
def create_sentinel2_item(product):
    """Create a pystac.Item from a Copernicus Hub OpenSearch API product."""

    # footprint
    footprint = wkt.loads(product["footprint"])
    footprint = footprint.geoms[0] \
        if isinstance(footprint, geometry.MultiPolygon) \
        else footprint

    # Sentinel-2 properties
    properties = {
        "s2:product_uri": product["filename"],
        # "s2:generation_time": ?,
        "s2:processingbaseline": product["processingbaseline"],
        "s2:product_type": product["producttype"],
        "s2:datatake_id": product["s2datatakeid"],
        # "s2:datatake_type": product["sensoroperationalmode"],
        # "s2:datastrip_id": product["datastripidentifier"],
        # "s2:granule_id": product["granuleidentifier"],
        "s2:mgrs_tile": product["identifier"].split("_")[-2][1:],
        # "s2:reflectance_conversion_factor": ?,
    }

    # create Item
    item = pystac.Item(
        id=product["filename"],
        geometry=geometry.mapping(footprint),
        datetime=product["beginposition"],
        bbox=footprint.bounds,
        properties=properties
    )

    # common metadata
    item.common_metadata.constellation = SENTINEL_CONSTELLATION
    item.common_metadata.instruments = SENTINEL_INSTRUMENTS
    item.common_metadata.providers = [SENTINEL_PROVIDER]
    item.common_metadata.platform = product["platformserialidentifier"].lower()

    # EO extension
    item_eo = EOExtension.ext(item, add_if_missing=True)
    item_eo.cloud_cover = product["cloudcoverpercentage"]

    # sat extension
    item_sat = SatExtension.ext(item, add_if_missing=True)
    item_sat.platform_international_designator = product["platformidentifier"]
    item_sat.orbit_state = OrbitState(product["orbitdirection"].lower())
    item_sat.absolute_orbit = product["orbitnumber"]
    item_sat.relative_orbit = product["relativeorbitnumber"]

    # # projection extension 
    # item_projection = ProjectionExtension.ext(item, add_if_missing=True)
    # item_projection.epsg = ?

    # links
    item.links.append(SENTINEL_LICENSE)
    return item

In [10]:
items = (create_sentinel2_item(product) for product in products.values()) 
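
The `s2:mgrs_tile` property above is extracted from the product identifier by splitting on underscores. As a rough illustration of what that split yields, the sketch below breaks a product name into its fields. The helper `parse_product_name` and the dictionary keys are hypothetical, and the code assumes the seven-field naming pattern seen in the product names of this notebook.

```python
def parse_product_name(identifier):
    """Split a Sentinel-2 product name into its underscore-separated fields.

    Hypothetical helper; assumes a seven-field name such as
    S2B_MSIL2A_20220328T213529_N0400_R086_T05VMG_20220328T230808.
    """
    if identifier.endswith(".SAFE"):
        identifier = identifier[: -len(".SAFE")]
    (platform, product_level, sensing_start,
     baseline, relative_orbit, tile, discriminator) = identifier.split("_")
    return {
        "platform": platform,
        "product_level": product_level,
        "sensing_start": sensing_start,
        "processing_baseline": baseline,
        "relative_orbit": int(relative_orbit[1:]),  # drop the leading "R"
        "mgrs_tile": tile[1:],                      # drop the leading "T"
        "discriminator": discriminator,
    }


info = parse_product_name(
    "S2B_MSIL2A_20220328T213529_N0400_R086_T05VMG_20220328T230808.SAFE"
)
print(info["mgrs_tile"], info["relative_orbit"])
```

The tile is the second-to-last field, which is why `create_sentinel2_item` indexes the split with `[-2]` and strips the first character.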

## Create catalog with search results

We create a STAC catalog:

In [11]:
catalog_id = "red-glacier_copernicushub-gcp_l2a"

In [12]:
catalog = pystac.Catalog(
    id=catalog_id,
    description=("This catalog contains Sentinel-2 MGRS tiles "
                 "that include the Red Glacier (Alaska). Metadata "
                 "have been retrieved from the Copernicus Open "
                 "Access Hub and linked to assets from Google Cloud Storage.")
)
catalog

<Catalog id=red-glacier_copernicushub-gcp_l2a>

In [13]:
# add search results to catalog
catalog.add_items(items)

We reorganize the catalog into year/month/day subcatalogs using the following template:

In [14]:
template = "${year}/${month}/${day}"
catalog.generate_subcatalogs(template)

[<Catalog id=2022>,
 <Catalog id=3>,
 <Catalog id=28>,
 <Catalog id=26>,
 <Catalog id=24>,
 <Catalog id=23>,
 <Catalog id=21>,
 <Catalog id=19>,
 <Catalog id=18>,
 <Catalog id=16>,
 <Catalog id=14>,
 <Catalog id=13>,
 <Catalog id=11>,
 <Catalog id=9>,
 <Catalog id=8>,
 <Catalog id=6>,
 <Catalog id=4>,
 <Catalog id=3>,
 <Catalog id=1>,
 <Catalog id=2>,
 <Catalog id=27>,
 <Catalog id=26>,
 <Catalog id=24>,
 <Catalog id=22>,
 <Catalog id=21>,
 <Catalog id=19>,
 <Catalog id=17>,
 <Catalog id=16>,
 <Catalog id=14>,
 <Catalog id=12>,
 <Catalog id=11>,
 <Catalog id=7>,
 <Catalog id=6>,
 <Catalog id=4>,
 <Catalog id=2>,
 <Catalog id=1>,
 <Catalog id=27>,
 <Catalog id=1>,
 <Catalog id=30>,
 <Catalog id=28>,
 <Catalog id=25>,
 <Catalog id=23>,
 <Catalog id=22>,
 <Catalog id=20>,
 <Catalog id=18>,
 <Catalog id=17>,
 <Catalog id=15>,
 <Catalog id=13>,
 <Catalog id=12>,
 <Catalog id=2021>,
 <Catalog id=12>,
 <Catalog id=3>,
 <Catalog id=1>,
 <Catalog id=11>,
 <Catalog id=29>,
 <Catalog id=28>,
 <Ca
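
To see how the template maps an acquisition date to a subcatalog path, the sketch below expands the same template string with Python's `string.Template`. This only illustrates the resulting path; it is not how PySTAC implements `generate_subcatalogs`, and the example timestamp is taken from one of the product names of this tile.

```python
from datetime import datetime, timezone
from string import Template

template = Template("${year}/${month}/${day}")

# sensing start of S2B_MSIL2A_20220328T213529_..._T05VMG_...
acquired = datetime(2022, 3, 28, 21, 35, 29, tzinfo=timezone.utc)

# integer values are not zero-padded, matching catalog ids such as 3 and 28
subpath = template.substitute(
    year=acquired.year, month=acquired.month, day=acquired.day
)
print(subpath)  # 2022/3/28
```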

Display current catalog structure:

In [15]:
catalog.describe()

* <Catalog id=red-glacier_copernicushub-gcp_l2a>
    * <Catalog id=2022>
        * <Catalog id=3>
            * <Catalog id=28>
              * <Item id=S2B_MSIL2A_20220328T213529_N0400_R086_T05VMG_20220328T230808.SAFE>
            * <Catalog id=26>
              * <Item id=S2A_MSIL2A_20220326T214531_N0400_R129_T05VMG_20220327T004619.SAFE>
            * <Catalog id=24>
              * <Item id=S2B_MSIL2A_20220324T215529_N0400_R029_T05VMG_20220324T231451.SAFE>
            * <Catalog id=23>
              * <Item id=S2A_MSIL2A_20220323T213531_N0400_R086_T05VMG_20220324T011128.SAFE>
            * <Catalog id=21>
              * <Item id=S2B_MSIL2A_20220321T214529_N0400_R129_T05VMG_20220321T232203.SAFE>
            * <Catalog id=19>
              * <Item id=S2A_MSIL2A_20220319T215531_N0400_R029_T05VMG_20220319T232624.SAFE>
            * <Catalog id=18>
              * <Item id=S2B_MSIL2A_20220318T213529_N0400_R086_T05VMG_20220318T224829.SAFE>
            * <Catalog id=16>
              * <I

## Add Google Cloud Storage (GCS) assets to the items

### Sentinel-2 data on GCS

Sentinel-2 data processed at level 2A are publicly available in the following [Google Cloud Storage (GCS) bucket](https://cloud.google.com/storage/docs/public-datasets/sentinel-2) (subfolder `L2`):

In [16]:
SENTINEL2_BUCKET = "gcp-public-data-sentinel-2"

Here data are organized according to the following directory structure:
```shell
/L2/tiles/<UTM_ZONE>/<LATITUDE_BAND>/<GRID_SQUARE>/<GRANULE_ID>/...
```

Public URLs can be formed by prepending the following base URL to the bucket path:

In [17]:
BASE_URL = "http://storage.googleapis.com"
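
As a worked instance of the directory structure and base URL above, the sketch below assembles the public URL of a file in one product. The function name `public_url` is hypothetical, and the code only builds the string: it does not check that the object exists in the bucket.

```python
SENTINEL2_BUCKET = "gcp-public-data-sentinel-2"
BASE_URL = "http://storage.googleapis.com"


def public_url(tile_id, product_name, relative_path):
    """Build a public URL for a file of a Level-2A product (hypothetical helper)."""
    # split the MGRS tile id, e.g. "05VMG" -> "05", "V", "MG"
    utm_zone, latitude_band, grid_square = tile_id[:2], tile_id[2], tile_id[3:]
    bucket_path = (
        f"{SENTINEL2_BUCKET}/L2/tiles/"
        f"{utm_zone}/{latitude_band}/{grid_square}/"
        f"{product_name}/{relative_path}"
    )
    return f"{BASE_URL}/{bucket_path}"


url = public_url(
    "05VMG",
    "S2B_MSIL2A_20220328T213529_N0400_R086_T05VMG_20220328T230808.SAFE",
    "manifest.safe",
)
print(url)
```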

### Access GCS using GCSFS

We can access the public Sentinel-2 bucket via the [GCSFS package](https://gcsfs.readthedocs.io). To install and configure this tool you need to:
* have an account on Google Cloud Platform;
* download and uncompress the [*Google Cloud SDK*](https://cloud.google.com/sdk/docs/install) tarball;
* run `./google-cloud-sdk/bin/gcloud auth login` and login via the browser;
* create a project either via the Google Cloud Console or via `./google-cloud-sdk/bin/gcloud projects create <PROJECT_ID>`, where `<PROJECT_ID>` must be a unique identifier;
* run `./google-cloud-sdk/bin/gcloud init --no-launch-browser` and follow the instructions;
* install GCSFS (with `pip`).

We can now set up the GCS file system:

In [18]:
# enter Google account
google_account = getpass.getpass()

 ··················


In [19]:
import gcsfs
# set up the filesystem using the credentials created via `gcloud`, stored in
# ${HOME}/.config/gcloud/legacy_credentials/{google_account}@gmail.com/adc.json
# (the relative path below expects `adc.json` in the working directory)
gcs = gcsfs.GCSFileSystem(token="adc.json")

### Add assets to the catalog

In [37]:
import pathlib


# Band 10 (cirrus) is not present in L2A - it does not have surface information
# (the default value avoids a KeyError if this cell is run more than once)
SENTINEL_BANDS.pop("B10", None)


def make_url(item, path_rel, expand_path=False):
    """Construct URL with full GCS path from relative granule path."""
    tile_id = item.properties["s2:mgrs_tile"]
    path = (
        f"{SENTINEL2_BUCKET}/L2/tiles/"
        f"{tile_id[:2]}/{tile_id[2]}/{tile_id[3:]}/"
        f"{item.id}/"
        f"{path_rel}"
    )
    if expand_path:
        path = gcs.expand_path(path).pop()
    return f"{BASE_URL}/{path}"

    
def add_assets(item):
    """Use metadata to construct pystac.Asset objects and add them to the input Item."""

    # metadata files
    granule_metadata_href = make_url(item, "GRANULE/*/MTD_TL.xml", True)
    granule_path = pathlib.Path(granule_metadata_href).parent.name
    
    metadata_asset_hrefs = {
        SAFE_MANIFEST_ASSET_KEY: make_url(item, "manifest.safe"),
        INSPIRE_METADATA_ASSET_KEY: make_url(item, "INSPIRE.xml"),
        PRODUCT_METADATA_ASSET_KEY: make_url(item, "MTD_MSIL2A.xml"),
        DATASTRIP_METADATA_ASSET_KEY: make_url(item, "DATASTRIP/*/MTD_DS.xml", True),
        GRANULE_METADATA_ASSET_KEY: granule_metadata_href,
    }

    for key, href in metadata_asset_hrefs.items():
        item.add_asset(
            key,
            pystac.Asset(href=href,
                         media_type=pystac.MediaType.XML,
                         roles=["metadata"]))
    
    # bands
    band_stem = "T{}_{}".format(
        item.properties["s2:mgrs_tile"],
        item.datetime.strftime("%Y%m%dT%H%M%S")
    )
    
    for key, band in SENTINEL_BANDS.items():
        gsd = BANDS_TO_RESOLUTIONS[key][0]
        href = make_url(
            item, 
            f"GRANULE/{granule_path}/IMG_DATA/R{gsd}m/{band_stem}_{key}_{gsd}m.jp2"
        )
        asset = pystac.Asset(href=href,
                             media_type=pystac.MediaType.JPEG2000,
                             title=f"{band.description} - {gsd}m",
                             roles=["data"])
        asset.common_metadata.gsd = gsd
        asset_eo = EOExtension.ext(asset)
        asset_eo.bands = [band]
        item.add_asset(key, asset)

    # true color image
    asset = pystac.Asset(
        href=make_url(
            item, f"GRANULE/{granule_path}/IMG_DATA/R10m/{band_stem}_TCI_10m.jp2"
        ),
        media_type=pystac.MediaType.JPEG2000,
        title="True color image",
        roles=["data"]
    )
    asset.common_metadata.gsd = 10
    asset_eo = EOExtension.ext(asset)
    asset_eo.bands = [
        SENTINEL_BANDS['B04'], SENTINEL_BANDS['B03'], SENTINEL_BANDS['B02']
    ]
    item.add_asset("visual", asset)

    # water vapour
    asset = pystac.Asset(
        href=make_url(
            item, f"GRANULE/{granule_path}/IMG_DATA/R10m/{band_stem}_WVP_10m.jp2"
        ),
        media_type=pystac.MediaType.JPEG2000,
        title="Water Vapour",
        roles=["data"]
    )
    asset.common_metadata.gsd = 10
    item.add_asset("WVP", asset)
     
    # aerosol optical thickness
    asset = pystac.Asset(
        href=make_url(
            item, f"GRANULE/{granule_path}/IMG_DATA/R10m/{band_stem}_AOT_10m.jp2"
        ),
        media_type=pystac.MediaType.JPEG2000,
        title="Aerosol Optical Thickness",
        roles=["data"]
    )
    asset.common_metadata.gsd = 10
    item.add_asset("AOT", asset)
    
    # scene classification map
    asset = pystac.Asset(
        href=make_url(
            item, f"GRANULE/{granule_path}/IMG_DATA/R60m/{band_stem}_SCL_60m.jp2"
        ),
        media_type=pystac.MediaType.JPEG2000,
        title="Scene Classification Map",
        roles=["data"]
    )
    asset.common_metadata.gsd = 60
    item.add_asset("SCL", asset)
    
    # thumbnail
    asset = pystac.Asset(
        href=make_url(item, f"GRANULE/{granule_path}/QI_DATA/{band_stem}_PVI.jp2"),
        media_type=pystac.MediaType.JPEG2000,
        title="True color preview",
        roles=["thumbnail"]
    )
    asset_eo = EOExtension.ext(asset)
    asset_eo.bands = [
        SENTINEL_BANDS['B04'], SENTINEL_BANDS['B03'], SENTINEL_BANDS['B02']
    ]
    
    item.add_asset("preview", asset)


In [38]:
for item in catalog.get_all_items():
    add_assets(item)
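
The band hrefs built in `add_assets` follow a fixed naming scheme: a stem made of the tile id and the sensing start time, followed by the band key and the resolution. The sketch below reproduces that scheme for a single band, relative to the `IMG_DATA` folder. The helper `band_relative_path` is hypothetical, and the example values come from one of the product names of this tile.

```python
from datetime import datetime


def band_relative_path(tile_id, sensing_start, band_key, gsd):
    """Path of a band image relative to IMG_DATA (hypothetical helper)."""
    # same stem as in add_assets: T<tile>_<YYYYMMDDTHHMMSS>
    stem = f"T{tile_id}_{sensing_start:%Y%m%dT%H%M%S}"
    return f"R{gsd}m/{stem}_{band_key}_{gsd}m.jp2"


path = band_relative_path("05VMG", datetime(2022, 3, 28, 21, 35, 29), "B04", 10)
print(path)  # R10m/T05VMG_20220328T213529_B04_10m.jp2
```

Because the stem depends on `item.datetime`, the hrefs are only correct if the item datetime matches the sensing start time encoded in the file names.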

## Save the catalog

To save the catalog (only metadata) locally:

In [39]:
catalog.normalize_and_save(
    f"./{catalog_id}",
    catalog_type='SELF_CONTAINED'
)

To save the catalog on the dCache storage system, we use [STAC2dCache](https://github.com/NLeSC-GO-common-infrastructure/stac2dcache). In order to authenticate on dCache, we use a macaroon, which we have saved in a plain-text file.

In [40]:
url = (f"https://webdav.grid.surfsara.nl:2880/pnfs/"
       f"grid.sara.nl/data/eratosthenes/disk/{catalog_id}")

In [41]:
import stac2dcache
# configure PySTAC to read from/write to dCache
stac2dcache.configure(
    token_filename="macaroon.dat"
)
catalog._stac_io = stac2dcache.stac_io

In [42]:
# save catalog to storage
catalog.normalize_and_save(url, catalog_type='SELF_CONTAINED')