# STAC Catalogs on the SURF dCache Storage

We search for some assets in the [Sentinel-2 Open Data collection available on AWS](https://registry.opendata.aws/sentinel-2-l2a-cogs/) by querying the [Earth Search STAC API endpoint](https://earth-search.aws.element84.com/v0):

In [1]:
from pystac_client import Client

In [2]:
STAC_API_URL = "https://earth-search.aws.element84.com/v0"

client = Client.open(STAC_API_URL)

# search assets
search = client.search(    
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2018-03-16/2018-03-25",
    # query Sentinel-2 tile 5VNK
    query=[
        "sentinel:utm_zone=5",
        "sentinel:latitude_band=V",
        "sentinel:grid_square=NK"
    ]
)

In [3]:
# get all items matching the query
items = search.get_all_items()
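
The three query strings above are simply the components of the tile ID `5VNK`. As a minimal sketch, the helper below builds them from a tile ID; `tile_query` is a hypothetical name introduced here for illustration, not part of `pystac_client`, and the validation pattern assumes the usual zone number, latitude band, and two-letter grid square layout:

```python
import re


def tile_query(tile_id):
    """Split a Sentinel-2 tile ID such as '5VNK' into query strings.

    Hypothetical helper: it only reproduces the three property filters
    used in the search above.
    """
    # zone number (1-2 digits), latitude band (C-X without I and O),
    # and a two-letter grid square
    match = re.fullmatch(r"(\d{1,2})([C-HJ-NP-X])([A-Z]{2})", tile_id.upper())
    if match is None:
        raise ValueError(f"not a valid tile ID: {tile_id!r}")
    utm_zone, latitude_band, grid_square = match.groups()
    return [
        f"sentinel:utm_zone={int(utm_zone)}",
        f"sentinel:latitude_band={latitude_band}",
        f"sentinel:grid_square={grid_square}",
    ]


query = tile_query("5VNK")
```

The resulting list could be passed as the `query` argument of `client.search` in place of the hand-written strings.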

We create a STAC catalog to save the items found using [PySTAC](https://pystac.readthedocs.io/en/latest/index.html):

In [4]:
from pystac import Catalog, Item

In [5]:
# create new catalog
catalog = Catalog(
    id='s2-catalog',
    description='Test catalog for Sentinel-2 data'
)
catalog

<Catalog id=s2-catalog>

In [6]:
# add search results to catalog
catalog.add_items(items)
catalog.describe()

* <Catalog id=s2-catalog>
  * <Item id=S2B_5VNK_20180325_1_L2A>
  * <Item id=S2A_5VNK_20180324_0_L2A>
  * <Item id=S2B_5VNK_20180322_0_L2A>
  * <Item id=S2B_5VNK_20180319_0_L2A>


Let's save the catalog on the dCache storage. For authentication we use a macaroon (see the [get-macaroon script](https://github.com/sara-nl/GridScripts/blob/master/get-macaroon) for instructions on how to generate the token), but username/password authentication can also be used. [STAC2dCache](https://github.com/NLeSC-GO-common-infrastructure/stac2dcache) provides the functionality to read/write PySTAC objects from/to the dCache storage system:

In [7]:
import os
import stac2dcache

token_filename = os.path.expanduser("~/dcache/macaroon.dat")
stac2dcache.configure(token_filename=token_filename)
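
For context, dCache accepts a macaroon as a bearer token on WebDAV requests. The sketch below shows what that amounts to at the HTTP level. It assumes the token file is plain text containing only the macaroon (the file written by `get-macaroon` may use a different format), and `bearer_headers` is a hypothetical helper, not a STAC2dCache function:

```python
from pathlib import Path


def bearer_headers(token_filename):
    """Build HTTP headers carrying a macaroon as a bearer token.

    Assumption: the file holds nothing but the macaroon string.
    """
    token = Path(token_filename).expanduser().read_text().strip()
    if not token:
        raise ValueError(f"no token found in {token_filename}")
    return {"Authorization": f"Bearer {token}"}
```

With STAC2dCache configured as above there is no need to build these headers by hand; the sketch only illustrates what the token is used for.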

In [8]:
urlpath = "https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/eratosthenes/disk/tmp-sentinel-2-catalog"

# possibly temporary workaround for https://github.com/stac-utils/pystac/issues/666
catalog._stac_io = stac2dcache.stac_io
catalog.normalize_and_save(
    urlpath,
    catalog_type='SELF_CONTAINED'
)
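
To see where the files end up, the following sketch reproduces the paths that `normalize_and_save` is expected to generate. It assumes PySTAC's default ("best practices") layout, in which the root catalog is written to `catalog.json` and each item to `<item-id>/<item-id>.json`; `expected_hrefs` is a hypothetical helper and does not call PySTAC:

```python
import posixpath


def expected_hrefs(root_href, item_ids):
    """Return the HREFs expected for a root catalog and its items.

    Assumption: PySTAC's default layout strategy is in use.
    """
    root = root_href.rstrip("/")
    hrefs = {"catalog": posixpath.join(root, "catalog.json")}
    for item_id in item_ids:
        hrefs[item_id] = posixpath.join(root, item_id, f"{item_id}.json")
    return hrefs


hrefs = expected_hrefs(
    "https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/eratosthenes/disk/tmp-sentinel-2-catalog",
    ["S2B_5VNK_20180325_1_L2A", "S2A_5VNK_20180324_0_L2A"],
)
```

Because the catalog is saved as `SELF_CONTAINED`, the links stored inside the JSON files are relative, so the whole directory can be moved without breaking them.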

Let's now retrieve a few assets from AWS and save them to dCache. We download the original XML metadata file and one band file (`B01`) from the MSI (MultiSpectral Instrument).

In [9]:
from stac2dcache.utils import copy_asset

In [10]:
# download assets - from web to storage
for asset_key in ('metadata', 'B01'):
    copy_asset(
        catalog, 
        asset_key, 
        update_catalog=True,  # update the catalog's links to the assets  
        filesystem_to=stac2dcache.fs,
        max_workers=2
    )
    
# save catalog with the updated links
catalog.normalize_and_save(urlpath, catalog_type='SELF_CONTAINED')

Note that `copy_asset` uses multiple (local) processes to download the data; the `max_workers` argument sets the number of processes spawned.
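
The general pattern behind a `max_workers` argument can be sketched with the standard library. This is not STAC2dCache's implementation: for simplicity it uses a thread pool rather than processes and copies local files, and `copy_all` is a hypothetical name:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_all(sources, target_dir, max_workers=2):
    """Copy files into target_dir, running up to max_workers copies at once."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    def copy_one(source):
        destination = target_dir / Path(source).name
        shutil.copyfile(source, destination)
        return destination

    # map() preserves the input order and re-raises any copy error
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_one, sources))
```

Raising `max_workers` only helps while the transfers are limited by network or storage latency rather than by the available bandwidth.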

STAC2dCache also includes some utility functions to download assets from dCache to the local filesystem for further processing, as well as to load assets directly into memory (check the [notebook tutorial](https://github.com/NLeSC-GO-common-infrastructure/stac2dcache/blob/main/notebooks/tutorial.ipynb), which this notebook is based on).