# STAC Client Walkthrough

This notebook mirrors the usage patterns from the [PySTAC Client documentation](https://pystac-client.readthedocs.io/en/stable/) against the EDITO data lake. It shows how to open a catalog, traverse links, and query items programmatically.

## Prerequisites
- Activate the project virtual environment (`source .venv/bin/activate.fish`).
- Install `pystac-client` (already listed in `requirements.txt`).
- Export `EDITO_API_TOKEN` or `EDITO_ACCESS_TOKEN`, or drop a fallback token into `MANUAL_TOKEN` below.

In [8]:
# If you are running this notebook standalone, uncomment the next lines
%pip install -q -r "../requirements.txt"
%pip install -q openpyxl

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
import json
import os
from collections import deque
from typing import Dict, List, Optional

import pandas as pd
from dotenv import load_dotenv
from pystac import STACError
from pystac_client import Client
from pystac_client.exceptions import APIError, ParametersError

In [3]:
load_dotenv()

API_BASE = os.getenv("EDITO_DATA_BASE_URL", "https://api.dive.edito.eu/data").rstrip("/")

TOKEN = (
    os.getenv("EDITO_API_TOKEN")
    or os.getenv("EDITO_ACCESS_TOKEN")
    or MANUAL_TOKEN
)
TOKEN_SOURCE = (
    "EDITO_API_TOKEN"
    if os.getenv("EDITO_API_TOKEN")
    else "EDITO_ACCESS_TOKEN"
    if os.getenv("EDITO_ACCESS_TOKEN")
    else "MANUAL_TOKEN"
)

if not TOKEN:
    raise RuntimeError(
        "No EDITO token detected. Export EDITO_API_TOKEN / EDITO_ACCESS_TOKEN "
        "or set MANUAL_TOKEN in your environment before running this notebook."
    )

if TOKEN_SOURCE == "MANUAL_TOKEN":
    print("⚠️ Using MANUAL_TOKEN baked into the notebook; refresh it if you encounter 401s.")

auth_headers = {"Authorization": f"Bearer {TOKEN}"}
stac_root = API_BASE
client = Client.open(stac_root, headers=auth_headers)
client

## Inspect the catalog hierarchy
The tutorial explores how STAC catalogs relate to one another. The code below lists the immediate `child` links exposed by the EDITO root.

> ℹ️ PySTAC only forwards the headers you provide to the underlying STAC API[^1], so expired or missing bearer tokens propagate as `401 Unauthorized`. Make sure you refresh EDITO tokens before opening the client.
>
> [^1]: See “Working with APIs” in the [PySTAC documentation](https://pystac.readthedocs.io/en/stable/api.html#pystac-client) for details about the `headers` argument.

In [4]:
root_children = [
    {
        "title": link.title,
        "rel": link.rel,
        "href": link.href,
    }
    for link in client.get_links("child")
]
pd.set_option("display.max_colwidth", None)
pd.DataFrame(root_children)

Unnamed: 0,title,rel,href
0,Collections,child,https://api.dive.edito.eu/data/collections
1,Catalogs,child,https://api.dive.edito.eu/data/catalogs


### Explore the full catalog hierarchy
Walk breadth-first through every nested catalog so you can see how providers are organized. Adjust `EDITO_CATALOG_DEPTH` in your environment if you want to crawl deeper than the default.

In [5]:
MAX_CATALOG_DEPTH = int(os.getenv("EDITO_CATALOG_DEPTH", "1"))


def _link_target_id(link) -> Optional[str]:
    target_id = getattr(link, "target_id", None)
    if target_id:
        return target_id
    extra_fields = getattr(link, "extra_fields", None)
    if isinstance(extra_fields, dict):
        return extra_fields.get("title") or extra_fields.get("href")
    return link.title or link.href


def crawl_catalog_hierarchy(root_client: Client, max_depth: int = MAX_CATALOG_DEPTH) -> pd.DataFrame:
    """Breadth-first traversal of catalog child links up to the requested depth."""
    queue = deque([(root_client, 0, None)])
    seen_hrefs = set()
    rows: List[Dict] = []

    while queue:
        current_client, depth, parent_id = queue.popleft()
        self_href = getattr(current_client, "href", None)
        if not self_href and hasattr(current_client, "get_self_href"):
            self_href = current_client.get_self_href()

        if self_href in seen_hrefs:
            continue
        if self_href:
            seen_hrefs.add(self_href)

        child_links = list(current_client.get_links("child"))
        rows.append(
            {
                "depth": depth,
                "id": getattr(current_client, "id", None),
                "title": getattr(current_client, "title", None),
                "parent_id": parent_id,
                "child_count": len(child_links),
                "href": self_href,
            }
        )

        if depth >= max_depth:
            continue

        for link in child_links:
            link_id = _link_target_id(link)
            try:
                child_client = Client.open(link.href, headers=auth_headers)
                queue.append((child_client, depth + 1, getattr(current_client, "id", None)))
            except (APIError, ParametersError, STACError, Exception) as exc:
                rows.append(
                    {
                        "depth": depth + 1,
                        "id": link_id,
                        "title": link.title,
                        "parent_id": getattr(current_client, "id", None),
                        "child_count": None,
                        "href": link.href,
                        "error": str(exc),
                    }
                )

    hierarchy_df = pd.DataFrame(rows)
    if not hierarchy_df.empty:
        hierarchy_df = hierarchy_df.sort_values(["depth", "id"], na_position="last").reset_index(drop=True)
    return hierarchy_df

catalog_hierarchy_df = crawl_catalog_hierarchy(client)
catalog_hierarchy_df

/Users/daniels/Mono/projects/work/sintef/Edito-Playground/.venv/lib/python3.13/site-packages/pystac_client/client.py:191: NoConformsTo: Server does not advertise any conformance classes.


Unnamed: 0,depth,id,title,parent_id,child_count,href
0,0,root,EDITO Data Catalog,,2,https://api.dive.edito.eu/data
1,1,catalogs,Catalogs,root,24,https://api.dive.edito.eu/data/catalogs
2,1,collections,Collections,root,443,https://api.dive.edito.eu/data/collections


## Enumerate collections via `Client.get_collections`
The PySTAC Client docs highlight `client.get_collections()` for retrieving collection metadata lazily. Convert the generator to a DataFrame so it is easy to scan provider details.

In [10]:
collections = list(client.get_collections())
collections_df = pd.DataFrame(
    [
        {
            "id": c.id,
            "title": c.title,
            "license": c.license,
            "providers": ", ".join(p.name for p in c.providers or []),
            "provider_roles": ", ".join("; ".join(p.roles or []) for p in c.providers or []),
            "provider_urls": ", ".join(p.url or "" for p in c.providers or []),
            "keywords": ", ".join(c.keywords or []),
            "description": c.description,
            "source_href": next((link.href for link in c.links if link.rel == "derived_from"), None),
        }
        for c in collections
    ]
)
collections_output_path = os.getenv("EDITO_COLLECTIONS_EXPORT", "collections_export.xlsx")
collections_df.to_excel(collections_output_path, index=False)
print(f"Exported {len(collections_df)} collections to {collections_output_path}")

provider_counts = (
    pd.Series(
        provider.name
        for collection in collections
        for provider in (collection.providers or [])
        if provider and provider.name
    )
    .value_counts()
    .rename_axis("provider")
    .reset_index(name="collection_count")
)

with pd.option_context("display.max_rows", None, "display.max_colwidth", None):
    display(collections_df)
    display(provider_counts)

Exported 443 collections to collections_export.xlsx


Unnamed: 0,id,title,license,providers,provider_roles,provider_urls,keywords,description,source_href
0,emodnet-3d_habitat_suitability_maps_of_the_30_main_commercial_fish_species_from_the_atlantic_ocean,3d habitat suitability maps of the 30 main commercial fish species from the atlantic ocean,CC-BY-4.0+,"AZTI, Marine Research",p; r; o; v; i; d; e; r,,,A collection of 3D habitat suitability maps of the 30 main commercial fish species from the Atlantic Ocean data,
1,emodnet-additional_information_coastal_vulnerability_index,Additional information coastal vulnerability index,CC-BY-4.0+,"Geological Survey of the Netherlands (TNO), EMODnet Geology","p; r; o; v; i; d; e; r, p; r; o; v; i; d; e; r",",",,A collection of additional_information_coastal_vulnerability_index data,
2,emodnet-additional_information_coastal_vulnerability_index_of_closest_coastline,Additional information coastal vulnerability index of closest coastline,CC-BY-4.0+,"Geological Survey of the Netherlands (TNO), EMODnet Geology","p; r; o; v; i; d; e; r, p; r; o; v; i; d; e; r",",",,A collection of additional_information_coastal_vulnerability_index_of_closest_coastline data,
3,climate_forecast-age_of_sea_ice,Age of sea ice (Climate Forecast convention),proprietary,EDITO,licensor,https://edito.eu/,,"""Age of sea ice"" means the length of time elapsed since the ice formed. ""Sea ice"" means all ice floating in the sea which has formed from freezing sea water, rather than by other processes such as calving of land ice to form icebergs.",
4,emodnet-aggregate_extraction,Aggregate extraction,CC-BY-4.0+,EMODnet Human Activities,r; e; s; o; u; r; c; e; P; r; o; v; i; d; e; r,,,A collection of Aggregate Extraction data,
5,climate_forecast-aggregate_quality_flag,Aggregate quality flag (Climate Forecast convention),proprietary,EDITO,licensor,https://edito.eu/,,This flag is an algorithmic combination of the results of all relevant quality tests run for the related ancillary parent data variable. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute. The aggregate quality flag provides a summary of all quality tests performed on the data variable (both automated and manual) whether present in the dataset as independent ancillary variables to the parent data variable or not.,
6,climate_forecast-air_density,Air density (Climate Forecast convention),proprietary,EDITO,licensor,https://edito.eu/,,No help available.,
7,climate_forecast-air_pressure,Air pressure (Climate Forecast convention),proprietary,EDITO,licensor,https://edito.eu/,,Air pressure is the force per unit area which would be exerted when the moving gas molecules of which the air is composed strike a theoretical surface of any orientation.,
8,climate_forecast-air_pressure_at_mean_sea_level,Air pressure at mean sea level (Climate Forecast convention),proprietary,EDITO,licensor,https://edito.eu/,,"Air pressure at sea level is the quantity often abbreviated as MSLP or PMSL. Air pressure is the force per unit area which would be exerted when the moving gas molecules of which the air is composed strike a theoretical surface of any orientation. ""Mean sea level"" means the time mean of sea surface elevation at a given location over an arbitrary period sufficient to eliminate the tidal signals.",
9,climate_forecast-air_temperature,Air temperature (Climate Forecast convention),proprietary,EDITO,licensor,https://edito.eu/,,"Air temperature is the bulk temperature of the air, not the surface (skin) temperature.",


Unnamed: 0,provider,collection_count
0,EDITO,209
1,EMODnet Seabed Habitats,77
2,EMODnet Geology,59
3,EMODnet Chemistry,17
4,Bundesanstalt für Geowissenschaften und Rohstoffe (BGR),15
5,EMODnet Bathymetry,14
6,ISPRA,11
7,Cogea srl,11
8,Geological Survey of the Netherlands (TNO),11
9,EMODnet Human Activities,10


## Inspect a specific collection
Inspired by the CDSE example notebook, parameterize the collection ID you want to study and materialize its metadata so you can understand the spatial/temporal extents before querying items.

In [13]:
SPECIFIC_COLLECTION_ID = os.getenv("EDITO_SPECIFIC_COLLECTION", "climate_forecast-air_temperature")
specific_collection = next(
    (collection for collection in collections if collection.id == SPECIFIC_COLLECTION_ID),
    None,
)
if specific_collection is None:
    available_ids = sorted(collection.id for collection in collections[:20])
    raise ValueError(
        f"Collection {SPECIFIC_COLLECTION_ID} not found in the cached catalog list. "
        f"Set EDITO_SPECIFIC_COLLECTION to one of the available IDs (sample: {available_ids})."
    )

collection_summary = specific_collection.to_dict().get("summaries", {})
collection_row = {
    "id": specific_collection.id,
    "title": specific_collection.title,
    "license": specific_collection.license,
    "providers": ", ".join(provider.name for provider in specific_collection.providers or []),
    "spatial_extent": specific_collection.extent.spatial.bboxes if specific_collection.extent else None,
    "temporal_extent": specific_collection.extent.temporal.intervals if specific_collection.extent else None,
    "platform": collection_summary.get("platform"),
}
pd.DataFrame([collection_row])

Unnamed: 0,id,title,license,providers,spatial_extent,temporal_extent,platform
0,climate_forecast-air_temperature,Air temperature (Climate Forecast convention),proprietary,EDITO,"[[-180, -82.351699829102, 179.99998474121, 90]]","[[1841-03-21 00:00:00+00:00, 2025-06-10 00:00:00+00:00]]",


### Search the collection with CDSE-style filters
Mirror the CDSE example by running a spatial/temporal search against the chosen collection. Adjust the bounding box or time range if your area of interest differs.

In [19]:
COLLECTION_SEARCH_BBOX = json.loads(
    os.getenv("EDITO_COLLECTION_SEARCH_BBOX", "[-180.0, -90.0, 180.0, 90.0]")
)
COLLECTION_SEARCH_DATETIME = os.getenv(
    "EDITO_COLLECTION_SEARCH_DATETIME", "2024-01-01/2024-12-31"
)
COLLECTION_SEARCH_LIMIT = int(os.getenv("EDITO_COLLECTION_SEARCH_LIMIT", "5"))

specific_search = client.search(
    collections=[SPECIFIC_COLLECTION_ID],
    bbox=COLLECTION_SEARCH_BBOX,
    datetime=COLLECTION_SEARCH_DATETIME,
    max_items=COLLECTION_SEARCH_LIMIT,
)
specific_items = list(specific_search.get_items())
if not specific_items:
    print(
        "⚠️ No items match the configured bbox/datetime. "
        "Falling back to an unbounded search for sample rows."
    )
    fallback_search = client.search(
        collections=[SPECIFIC_COLLECTION_ID], max_items=COLLECTION_SEARCH_LIMIT
    )
    specific_items = list(fallback_search.get_items())
    if not specific_items:
        raise ValueError(
            "Collection appears empty; double-check your authorization or try again later."
        )

specific_item_rows: List[Dict] = []
for item in specific_items:
    props = item.to_dict().get("properties", {})
    first_asset_href = None
    if item.assets:
        first_asset_href = next(iter(item.assets.values())).href
    specific_item_rows.append(
        {
            "id": item.id,
            "datetime": props.get("datetime"),
            "bbox": item.bbox,
            "asset_count": len(item.assets or {}),
            "asset_preview_href": first_asset_href,
        }
    )
pd.DataFrame(specific_item_rows)



⚠️ No items match the configured bbox/datetime. Falling back to an unbounded search for sample rows.


Unnamed: 0,id,datetime,bbox,asset_count,asset_preview_href
0,25aaed46-b135-576e-8f7a-676a7371a7da,,"[-52.900001525878906, -70.4000015258789, 159.46066284179688, 76.10540008544922]",2,https://s3.waw3-1.cloudferro.com/mdl-arco-time-060/arco/INSITU_NWS_PHYBGCWAV_DISCRETE_MYNRT_013_036/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_202311--ext--latest/platformChunked
1,8ebf3d6d-1a6f-58f8-beb3-e6ff6eae3c90,,"[-52.900001525878906, -70.4000015258789, 159.46066284179688, 76.10540008544922]",2,https://s3.waw3-1.cloudferro.com/mdl-arco-geo-060/arco/INSITU_NWS_PHYBGCWAV_DISCRETE_MYNRT_013_036/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_202311--ext--latest/geoChunked
2,33857744-4f8c-5eb2-acc7-3cf15e6026b7,,"[-52.900001525878906, -70.4000015258789, 159.46066284179688, 76.10540008544922]",2,https://s3.waw3-1.cloudferro.com/mdl-arco-time-060/arco/INSITU_NWS_PHYBGCWAV_DISCRETE_MYNRT_013_036/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_202311--ext--latest/timeChunked
3,51552bd9-81cf-5683-97b5-0e6310fbf055,,"[-171, -39.15924835205078, 159.46066284179688, 77.61932373046875]",2,https://s3.waw3-1.cloudferro.com/mdl-arco-time-058/arco/INSITU_IBI_PHYBGCWAV_DISCRETE_MYNRT_013_033/cmems_obs-ins_ibi_phybgcwav_mynrt_na_irr_202311--ext--latest/platformChunked
4,99b5c8e6-696f-5512-9f59-2f6e53caf374,,"[-171, -39.15924835205078, 159.46066284179688, 77.61932373046875]",2,https://s3.waw3-1.cloudferro.com/mdl-arco-geo-058/arco/INSITU_IBI_PHYBGCWAV_DISCRETE_MYNRT_013_033/cmems_obs-ins_ibi_phybgcwav_mynrt_na_irr_202311--ext--latest/geoChunked


## Open a provider catalog and inspect collections
Pick any catalog from above (for example `copernicus-marine-products`) and fetch its `collection` links.

In [20]:
TARGET_CATALOG_HREF = f"{API_BASE}/catalogs/copernicus-marine-products"
provider_client = Client.open(TARGET_CATALOG_HREF, headers=auth_headers)
collection_links = [
    {
        "id": link.target_id,
        "title": link.title,
        "href": link.href,
    }
    for link in provider_client.get_links("collection")
]
pd.DataFrame(collection_links).head(10)

/Users/daniels/Mono/projects/work/sintef/Edito-Playground/.venv/lib/python3.13/site-packages/pystac_client/client.py:191: NoConformsTo: Server does not advertise any conformance classes.


## Search for CMEMS items
The tutorial demonstrates `Client.search` for filtering items. Here we look up the requested CMEMS collection IDs.

In [23]:
TARGET_COLLECTION = os.getenv("EDITO_TARGET_COLLECTION", SPECIFIC_COLLECTION_ID)
collection_ids = {collection.id for collection in collections}
if TARGET_COLLECTION not in collection_ids:
    sample_ids = sorted(collection_ids)[:10]
    raise ValueError(
        f"Collection {TARGET_COLLECTION!r} is not present in the loaded catalog. "
        f"Pick one of the IDs displayed above (sample: {sample_ids})."
    )

try:
    search = client.search(collections=[TARGET_COLLECTION], max_items=10)
    items = list(search.get_items())
except APIError as exc:
    raise RuntimeError(f"Search for collection {TARGET_COLLECTION} failed: {exc}") from exc

if not items:
    raise ValueError(
        f"Search returned no items for {TARGET_COLLECTION}; try relaxing filters or use another collection."
    )

item_rows: List[Dict] = []
for item in items:
    props = item.to_dict().get("properties", {})
    item_rows.append(
        {
            "id": item.id,
            "collection": item.collection_id,
            "datetime": props.get("datetime"),
            "asset_keys": list(item.assets.keys()),
        }
    )

print(f"Showing {len(item_rows)} items from {TARGET_COLLECTION}")
pd.DataFrame(item_rows)



Showing 10 items from climate_forecast-air_temperature


Unnamed: 0,id,collection,datetime,asset_keys
0,25aaed46-b135-576e-8f7a-676a7371a7da,climate_forecast-air_temperature,,"[arco-platform-series, wmts]"
1,8ebf3d6d-1a6f-58f8-beb3-e6ff6eae3c90,climate_forecast-air_temperature,,"[arco-time-series, wmts]"
2,33857744-4f8c-5eb2-acc7-3cf15e6026b7,climate_forecast-air_temperature,,"[arco-geo-series, wmts]"
3,51552bd9-81cf-5683-97b5-0e6310fbf055,climate_forecast-air_temperature,,"[arco-platform-series, wmts]"
4,99b5c8e6-696f-5512-9f59-2f6e53caf374,climate_forecast-air_temperature,,"[arco-time-series, wmts]"
5,2539d45e-9cf0-539a-b356-9bc7a4636012,climate_forecast-air_temperature,,"[arco-geo-series, wmts]"
6,89f3408c-6ca4-56f0-9233-22d08465df7d,climate_forecast-air_temperature,,"[arco-platform-series, wmts]"
7,925a16c3-544c-5f27-93d6-1af514c89b0e,climate_forecast-air_temperature,,"[arco-time-series, wmts]"
8,c8345a7f-28e2-56ce-b138-58a3e4d3a24f,climate_forecast-air_temperature,,"[arco-geo-series, wmts]"
9,12d8e0e0-dba6-5ab9-a363-017ff8926ee9,climate_forecast-air_temperature,,"[arco-platform-series, wmts]"


### ItemCollection summary
Per the PySTAC Client guide, materialize the full STAC `ItemCollection` to inspect counts and aggregate metadata for the current search result.

In [25]:
item_collection = search.item_collection()
item_collection_dict = item_collection.to_dict()
summary = {
    "returned_features": len(item_collection_dict.get("features", [])),
    "matched": item_collection_dict.get("numberMatched"),
    "returned": item_collection_dict.get("numberReturned"),
    "bbox": item_collection_dict.get("bbox"),
}
pd.DataFrame([summary])

Unnamed: 0,returned_features,matched,returned,bbox
0,10,,,


## Inspect assets from the first item
Mimicking the tutorial, expose the available assets, media types, and roles so you can decide what to download next.

In [28]:
first_item = items[0]
assets_df = pd.DataFrame(
    [
        {
            "asset_key": key,
            "roles": asset.roles,
            "media_type": asset.media_type,
            "href": asset.href,
        }
        for key, asset in first_item.assets.items()
    ]
)
assets_df

Unnamed: 0,asset_key,roles,media_type,href
0,arco-platform-series,[data],,https://s3.waw3-1.cloudferro.com/mdl-arco-time-060/arco/INSITU_NWS_PHYBGCWAV_DISCRETE_MYNRT_013_036/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_202311--ext--latest/platformChunked
1,wmts,[data],OGC:WMTS,https://wmts.marine.copernicus.eu/teroWmts/INSITU_NWS_PHYBGCWAV_DISCRETE_MYNRT_013_036/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_202311--ext--latest?layer=INSITU_NWS_PHYBGCWAV_DISCRETE_MYNRT_013_036/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_202311/DRYT


## Preview the full STAC item JSON
Use this when you need to inspect geometry, providers, or asset metadata in detail.

In [29]:
print(json.dumps(first_item.to_dict(), indent=2)[:2000])

{
  "type": "Feature",
  "stac_version": "1.1.0",
  "stac_extensions": [
    "https://stac-extensions.github.io/datacube/v2.2.0/schema.json",
    "https://stac-extensions.github.io/cf/v0.2.0/schema.json",
    "https://stac-extensions.github.io/scientific/v1.0.0/schema.json",
    "https://stac-extensions.github.io/ssys/v1.1.0/schema.json",
    "https://stac-extensions.github.io/example-links/v0.0.1/schema.json"
  ],
  "id": "25aaed46-b135-576e-8f7a-676a7371a7da",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -52.900002,
          -70.400002
        ],
        [
          159.460663,
          -70.400002
        ],
        [
          159.460663,
          76.1054
        ],
        [
          -52.900002,
          76.1054
        ],
        [
          -52.900002,
          -70.400002
        ]
      ]
    ]
  },
  "bbox": [
    -52.900001525878906,
    -70.4000015258789,
    159.46066284179688,
    76.10540008544922
  ],
  "properties": {
    