# Data Access and Retrieval

The [SpatioTemporal Asset Catalog (STAC) specification](https://stacspec.org/en) is becoming a standard for storing and organizing geospatial (meta)data. In this notebook we explore STAC and show how it can be used to retrieve remote datasets, including in combination with the [SURF dCache storage system](http://doc.grid.surfsara.nl/en/latest/Pages/Advanced/grid_storage.html#dcache).

## 1. STAC: APIs vs Static catalogs

There are two main types of STAC catalogs: "dynamic" catalogs (STAC APIs) and static catalogs. STAC APIs can be accessed (and queried!) via `pystac_client`:

In [None]:
stac_api_url = "https://earth-search.aws.element84.com/v1"

In [None]:
import pystac_client

In [None]:
client = pystac_client.Client.open(stac_api_url)

In [None]:
client
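Beyond opening the client, a STAC API can be searched server-side by collection, bounding box and time range. The following is a minimal sketch that assumes a `pystac_client.Client` like `client` above. `search_items` is a hypothetical helper, and the collection name and bounding box in the usage line below are our own illustrative choices.

```python
from typing import Any, Sequence


def search_items(
    client: Any,
    collection: str,
    bbox: Sequence[float],
    time_range: str,
    max_items: int = 10,
):
    """Query a STAC API for items of one collection within a bbox and time range.

    `client` is expected to behave like `pystac_client.Client`; it is typed as
    `Any` so the helper can also be exercised with a stand-in object.
    """
    search = client.search(
        collections=[collection],
        bbox=list(bbox),          # (min_lon, min_lat, max_lon, max_lat)
        datetime=time_range,      # ISO 8601 interval, e.g. "2023-04-01/2023-04-30"
        max_items=max_items,      # stop after this many matches
    )
    # Collect the matched items into a single ItemCollection
    return search.item_collection()
```

For example, `search_items(client, "sentinel-2-l2a", (4.73, 52.28, 5.07, 52.43), "2023-04-01/2023-04-30")` would look for Sentinel-2 L2A scenes over (roughly) Amsterdam in April 2023.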

Static catalogs are a set of files (on your filesystem, web server, object storage, ...). They can be opened directly using `pystac`: 

In [None]:
stac_root_dir = "/project/stursdat/Data/RS-DAT/sentinel-2-l2a_AMS_2023-04"

In [None]:
!ls $stac_root_dir

In [None]:
!tree $stac_root_dir

In [None]:
import pystac

In [None]:
catalog = pystac.Catalog.from_file(f"{stac_root_dir}/catalog.json")

In [None]:
catalog.describe()

In [None]:
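# Walk the whole catalog tree (sub-catalogs included) and print every item;
# get_items(recursive=True) replaces the deprecated get_all_items()
for item in catalog.get_items(recursive=True):
    print(item)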

In [None]:
# `item` still refers to the last item visited by the loop above
item.assets

In [None]:
item.properties
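Because a static catalog is just a set of JSON files linked to each other, it can also be traversed without `pystac`. The following is a minimal sketch that assumes local files whose `href`s are relative or absolute filesystem paths (not http URLs). `iter_item_paths` is a hypothetical helper, not part of `pystac`.

```python
import json
import os
from typing import Iterator


def iter_item_paths(catalog_path: str) -> Iterator[str]:
    """Yield the file paths of all items reachable from a static STAC catalog.

    Follows "child" links recursively and yields the targets of "item" links,
    resolving relative hrefs against the directory of the JSON file that
    contains them. Other link types ("root", "parent", "self") are ignored.
    """
    with open(catalog_path) as f:
        catalog = json.load(f)
    base_dir = os.path.dirname(os.path.abspath(catalog_path))
    for link in catalog.get("links", []):
        # os.path.join keeps absolute hrefs as they are
        href = os.path.normpath(os.path.join(base_dir, link["href"]))
        if link["rel"] == "child":
            yield from iter_item_paths(href)
        elif link["rel"] == "item":
            yield href
```

Applied to the catalog above, `list(iter_item_paths(f"{stac_root_dir}/catalog.json"))` should list the same items that the `pystac` loop printed, as file paths.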

## 2. Constructing a catalog: Daymet 

### 2.1 The dataset

The Daymet dataset includes daily surface weather data for North America, starting from January 1, 1980 (1950 for Puerto Rico). The dataset consists of a set of netCDF files with gridded estimates of 7 parameters on a 1-km grid. More information on the dataset can be found on the [ORNL DAAC dataset page](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=2129) (dataset version 4.5, https://doi.org/10.3334/ORNLDAAC/2129).

![](https://daac.ornl.gov/DAYMET/guides/Daymet_Daily_V4R1_Fig1.png)

In [None]:
# Spatial and temporal ranges
# (all latitude and longitude given in decimal degrees)

regions = {
    "na": {  # Continental North America
        "bbox": (-178.1333, 14.0749, -53.0567, 82.9143),
        "year_range": range(1980, 2023),
    },
    "hi": {  # Hawaii
        "bbox": (-160.3056, 17.9539, -154.772, 23.5186),
        "year_range": range(1980, 2023),
    },
    "pr": {  # Puerto Rico
        "bbox": (-67.9927, 16.8444, -64.1196, 19.9382),
        "year_range": range(1950, 2023),
    },
}

In [None]:
# Parameters

parameters = [
    "dayl",  # Day length
    "prcp",  # Precipitation
    "srad",  # Shortwave radiation
    "swe",   # Snow water equivalent
    "tmax",  # Maximum air temperature
    "tmin",  # Minimum air temperature
    "vp",    # Water vapor pressure
]

Daymet is made available by [NASA's Distributed Active Archive Center (DAAC) at Oak Ridge National Laboratory (ORNL)](https://daac.ornl.gov). Individual netCDF files can be accessed via URLs formatted as follows:

In [None]:
# Dataset URLs

ORNL_DAAC_ROOT = "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac"

def get_daymet_file_url(region, param, year):
    """Return the ORNL DAAC download URL of one Daymet file (region, parameter, year)."""
    return (
        f"{ORNL_DAAC_ROOT}/2129/daymet_v4_daily_{region}_{param}_{year}.nc"
    )
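This URL scheme also fixes how many files the catalog will reference: one per region, year and parameter. The following is a minimal, self-contained sketch, so it repeats the year ranges and parameters from the cells above under new names (to avoid overwriting `regions` and `parameters`). `iter_daymet_urls` is a hypothetical helper; it only enumerates URLs and does not check that every file actually exists on the server.

```python
from typing import Dict, Iterator, Sequence, Tuple

DAYMET_URL_TEMPLATE = (
    "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/"
    "2129/daymet_v4_daily_{region}_{param}_{year}.nc"
)


def iter_daymet_urls(
    year_ranges: Dict[str, range], params: Sequence[str]
) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (region, param, year, url) for every file the URL scheme implies."""
    for region, years in year_ranges.items():
        for year in years:
            for param in params:
                url = DAYMET_URL_TEMPLATE.format(region=region, param=param, year=year)
                yield region, param, year, url


# Year ranges and parameters copied from the cells above
year_ranges = {"na": range(1980, 2023), "hi": range(1980, 2023), "pr": range(1950, 2023)}
params = ["dayl", "prcp", "srad", "swe", "tmax", "tmin", "vp"]

urls = list(iter_daymet_urls(year_ranges, params))
print(len(urls))   # 1113, i.e. (43 + 43 + 73) years x 7 parameters
print(urls[0][3])  # https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/2129/daymet_v4_daily_na_dayl_1980.nc
```

The 159 region-year combinations become the STAC items constructed below, and the 7 parameters become the assets of each item.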

### 2.2 Constructing the catalog

In [None]:
import datetime

In [None]:
catalog = pystac.Catalog(id="daymet-daily-v4.5", description="Daymet daily v4.5")

In [None]:
import shapely

In [None]:
box = shapely.box(*regions["na"]["bbox"])

In [None]:
shapely.geometry.mapping(box)
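`shapely.geometry.mapping` turns the box into a GeoJSON-like dictionary, which is the form STAC items expect for their `geometry`. The following is a minimal sketch of an equivalent polygon built without `shapely`. `bbox_to_geojson_polygon` is a hypothetical helper, and the vertex it starts from may differ from `shapely`'s output while describing the same rectangle.

```python
from typing import Dict, List, Sequence


def bbox_to_geojson_polygon(bbox: Sequence[float]) -> Dict[str, object]:
    """Return a GeoJSON Polygon covering (min_lon, min_lat, max_lon, max_lat).

    The exterior ring is counter-clockwise (right-hand rule of RFC 7946) and
    closed, i.e. the first position is repeated at the end.
    """
    west, south, east, north = bbox
    ring: List[List[float]] = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}
```

For instance, `bbox_to_geojson_polygon(regions["hi"]["bbox"])` gives the footprint used for all Hawaii items.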

In [None]:
for region_id, region in regions.items():
    years = region["year_range"]
    bbox = region["bbox"]
    geom = shapely.box(*bbox)
    for year in years:
        # Create STAC item
        item = pystac.Item(
            id=f"{region_id}-{year}",
            geometry=shapely.geometry.mapping(geom),
            bbox=bbox,
            datetime=datetime.datetime(year, 1, 1),
            properties={"region": region_id},
        )
        catalog.add_item(item)
        for parameter in parameters:
            # Create STAC asset
            asset = pystac.Asset(
                href=get_daymet_file_url(region_id, parameter, year)
            )
            item.add_asset(parameter, asset)
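Each Daymet file covers a whole calendar year, while the loop above stamps every item with January 1 only. STAC common metadata can express the full interval with `start_datetime` and `end_datetime` (and `datetime` set to `null`), and `pystac.Item` also accepts `start_datetime` and `end_datetime` arguments. The following is a minimal sketch of that refinement, which is not used in the original notebook. `annual_time_properties` is a hypothetical helper that only builds the property values.

```python
import datetime
from typing import Dict, Optional


def annual_time_properties(year: int) -> Dict[str, Optional[str]]:
    """STAC time properties for an item that covers one whole calendar year."""
    start = datetime.datetime(year, 1, 1, tzinfo=datetime.timezone.utc)
    # Last second of the year, so the interval stays within `year`
    end = datetime.datetime(year, 12, 31, 23, 59, 59, tzinfo=datetime.timezone.utc)
    return {
        "datetime": None,  # allowed when start_datetime and end_datetime are set
        "start_datetime": start.isoformat().replace("+00:00", "Z"),
        "end_datetime": end.isoformat().replace("+00:00", "Z"),
    }
```

`annual_time_properties(1980)` returns `start_datetime` `"1980-01-01T00:00:00Z"` and `end_datetime` `"1980-12-31T23:59:59Z"`, so a search for any date in 1980 would match the item.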

In [None]:
catalog.describe()

In [None]:
# Group the items into one sub-catalog per value of their "region" property
catalog.generate_subcatalogs("${region}")
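The `"${region}"` template is filled in with each item's properties, and items that produce the same value end up in the same sub-catalog (here `na`, `hi` and `pr`). The following is a simplified illustration of that grouping on plain dictionaries, using Python's `string.Template`. `group_by_template` is a hypothetical helper and not how `pystac` implements `generate_subcatalogs`.

```python
from collections import defaultdict
from string import Template
from typing import Dict, List, Mapping, Sequence


def group_by_template(
    items: Sequence[Mapping[str, object]], template: str
) -> Dict[str, List[str]]:
    """Group item ids by the value a "${property}" template takes for each item."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        # Substitute ${...} placeholders with the item's property values
        key = Template(template).substitute(item["properties"])
        groups[key].append(item["id"])
    return dict(groups)
```

With items shaped like the ones built above, `{"id": "hi-1980", "properties": {"region": "hi"}}` lands in the `"hi"` group.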

In [None]:
catalog.describe()

In [None]:
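# Set the hrefs of all catalogs and items under "daymet-daily-v4.5" and write
# them to disk; a SELF_CONTAINED catalog links its files with relative paths only
catalog.normalize_and_save(
    "daymet-daily-v4.5",
    catalog_type=pystac.CatalogType.SELF_CONTAINED,
)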
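`normalize_and_save` decides where each JSON file is written. The sketch below assumes `pystac`'s default best-practices layout: the root `catalog.json` at the top, one folder per sub-catalog, and one folder per item containing `<item id>.json`. Under that assumption, it computes the resulting paths without writing anything. `best_practices_paths` is a hypothetical helper. With 3 sub-catalogs and 159 items, this layout would give 1 + 3 + 159 = 163 JSON files.

```python
import posixpath
from typing import Dict, List


def best_practices_paths(root_dir: str, items_by_child: Dict[str, List[str]]) -> List[str]:
    """List the JSON files of a root catalog -> sub-catalogs -> items tree.

    Assumed layout: <root>/catalog.json, <root>/<child id>/catalog.json and
    <root>/<child id>/<item id>/<item id>.json.
    """
    paths = [posixpath.join(root_dir, "catalog.json")]
    for child_id, item_ids in items_by_child.items():
        child_dir = posixpath.join(root_dir, child_id)
        paths.append(posixpath.join(child_dir, "catalog.json"))
        for item_id in item_ids:
            paths.append(posixpath.join(child_dir, item_id, f"{item_id}.json"))
    return paths
```

Because all links between these files are relative in a self-contained catalog, the whole `daymet-daily-v4.5` folder can be moved or copied (for example, to dCache) without breaking it.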

### 2.3 Retrieving the data

In [None]:
from stac2dcache.utils import copy_asset

In [None]:
hawaii = catalog.get_child("hi")

In [None]:
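# Download the "tmin" asset of all items in the Hawaii sub-catalog, using up
# to 2 parallel workers; update_catalog=True points the asset hrefs to the copies
copy_asset(
    hawaii,
    "tmin",
    update_catalog=True,
    max_workers=2,
)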
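The `max_workers` argument bounds how many transfers run at the same time. The following is a generic sketch of such a bounded parallel copy with a thread pool. It is not `stac2dcache`'s implementation: `copy_all` is a hypothetical helper, and `fetch(src, dst)` is an assumed callable that performs one transfer. The returned mapping is the kind of information needed to rewrite asset hrefs afterwards.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def copy_all(
    hrefs: Dict[str, str],
    dest_dir: str,
    fetch: Callable[[str, str], None],
    max_workers: int = 2,
) -> Dict[str, str]:
    """Copy every href into dest_dir/<item id>/ with at most max_workers transfers.

    `hrefs` maps an item id to the remote href of one asset. Returns a mapping
    item id -> local path of the copy.
    """
    targets = {
        item_id: os.path.join(dest_dir, item_id, os.path.basename(href))
        for item_id, href in hrefs.items()
    }

    def copy_one(item_id: str) -> None:
        dst = targets[item_id]
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        fetch(hrefs[item_id], dst)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the results, so an exception raised by fetch propagates
        list(pool.map(copy_one, hrefs))
    return targets
```

`urllib.request.urlretrieve` has a compatible `(url, filename)` signature and could serve as `fetch` for plain HTTPS downloads.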

In [None]:
hawaii.save()

### 2.4 Accessing a STAC catalog on dCache

We authenticate using a macaroon saved in a configuration file (in `~/.config/fsspec/`; see the [STAC2dCache tutorial](https://github.com/NLeSC-GO-common-infrastructure/stac2dcache/blob/main/notebooks/tutorial.ipynb) and the [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration) for more information). We can then use the `fsspec` GUI to select the root `catalog.json` file of the Daymet catalog:

In [None]:
import dcachefs
from fsspec.gui import FileSelector

In [None]:
dcache_root_path = "dcache://pnfs/grid.sara.nl/data/remotesensing/disk/"

In [None]:
sel = FileSelector(dcache_root_path)

In [None]:
sel

In [None]:
catalog = pystac.Catalog.from_file(sel.urlpath)

In [None]:
catalog.describe()

In [None]:
item = catalog.get_item("na-1981", recursive=True)

In [None]:
item
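The `fsspec` configuration file that holds the macaroon can also be written from Python. `fsspec` reads JSON files in its configuration directory and passes the options stored under a protocol name as keyword arguments when it creates a filesystem for that protocol. The following is a minimal sketch, where `write_fsspec_config` is a hypothetical helper. The exact option names that `dcachefs` expects should be taken from the STAC2dCache tutorial; the `token` key in the usage line below is our assumption.

```python
import json
import os
from typing import Any, Dict


def write_fsspec_config(
    protocol: str, options: Dict[str, Any], config_dir: str = "~/.config/fsspec"
) -> str:
    """Write {protocol: options} to <config_dir>/<protocol>.json and return its path.

    Overwrites an existing file with the same name.
    """
    config_dir = os.path.expanduser(config_dir)
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, f"{protocol}.json")
    with open(path, "w") as f:
        json.dump({protocol: options}, f, indent=2)
    # The file stores a credential, so keep it readable by the owner only
    os.chmod(path, 0o600)
    return path
```

For example, `write_fsspec_config("dcache", {"token": "<macaroon>"})` (assumed option name). Since `fsspec` loads its configuration when it is first imported, restarting the kernel after writing the file is likely needed.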