# Ingesting STAC metadata in APEx Product Catalogue

This notebook demonstrates how to add STAC metadata to an [APEx Product Catalogue](../instantiation/catalog.qmd).
Since STAC is a widely adopted specification, existing STAC documentation may also be useful for understanding its broader applications.

## Creating STAC metadata from scratch

In this first part, we provide a basic example of how to manually create STAC metadata from scratch.
However, in most real-world cases, this approach is neither needed nor recommended. Instead, various tools and platforms generate STAC metadata for you.

The STAC metadata in this example is entirely fictional and not intended for practical use.
It does not show how to create high-quality, FAIR-compliant metadata; it serves purely to demonstrate the STAC API and its ease of use.
Most of the more complex examples and tools avoid this kind of direct interaction.

In [2]:
import requests
from owslib.ogcapi.records import Records
from owslib.util import Authentication

import json

For ease of use, we define the URL of the APEx catalogue here. This allows us to reuse it as a reference throughout the notebook.

In [2]:
CATALOGUE = "https://catalogue.demo.apex.esa.int" 

### Authentication for APEx Product Catalogue

To interact with an APEx Product Catalogue, you need a valid authentication token. This token ensures that you can write metadata to an APEx-initialized STAC catalogue, provided you have been granted sufficient access rights.

To obtain an authentication token, follow the dedicated guide on [generating your OIDC token](token.md). This step is essential for securely accessing and modifying catalogue records.
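OIDC access tokens are usually short-lived, so it can help to check a token's expiry before starting. The sketch below assumes the token is a JSON Web Token (JWT), which is common for OIDC providers but not confirmed here. The hypothetical helpers `token_expiry` and `token_is_expired` decode the token payload and read the standard `exp` claim. They do **not** verify the signature, so treat them only as a convenience check.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token: str) -> datetime:
    """Return the `exp` claim of a JWT as a UTC datetime.

    Hypothetical helper: assumes a JWT (header.payload.signature) and does
    NOT verify the signature.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)


def token_is_expired(token: str) -> bool:
    """True if the token's `exp` claim lies in the past."""
    return token_expiry(token) <= datetime.now(timezone.utc)
```

Once `token` has been entered in the next cell, `token_is_expired(token)` tells you whether to request a new one.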

In [None]:
from getpass import getpass


# Helper class to support authentication with the owslib and requests libraries
class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Attach the OIDC token as a bearer token to every outgoing request
        r.headers["Authorization"] = "Bearer " + self.token
        return r


# getpass hides the token instead of echoing it in the notebook
token = getpass("Please provide your OIDC token: ")
auth = BearerAuth(token)
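
Before calling the catalogue, you can check offline that the helper attaches the header as expected. The sketch below repeats a `BearerAuth` class so it runs on its own, then prepares (but never sends) a request to a placeholder URL with a dummy token and inspects the resulting headers.

```python
import requests


class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers["Authorization"] = "Bearer " + self.token
        return r


# Preparing a request runs the auth handler without any network traffic
prepared = requests.Request(
    "GET", "https://example.org/collections", auth=BearerAuth("dummy-token")
).prepare()
print(prepared.headers["Authorization"])  # Bearer dummy-token
```

Header lookups on a prepared request are case-insensitive, so `prepared.headers["authorization"]` returns the same value.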

In [29]:
# owslib client for the catalogue; its requests are authenticated through the BearerAuth delegate
r = Records(CATALOGUE, auth=Authentication(auth_delegate=auth))
print(json.dumps(r.collections(), indent=2))

{
  "collections": [],
  "links": [
    {
      "rel": "root",
      "type": "application/json",
      "href": "https://catalogue.demo.apex.esa.int/"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "https://catalogue.demo.apex.esa.int/"
    },
    {
      "rel": "self",
      "type": "application/json",
      "href": "https://catalogue.demo.apex.esa.int/collections"
    }
  ]
}


### Exploring the dataset
Before creating the STAC metadata, it's useful to explore the dataset. This helps in understanding key properties such as spatial extent, resolution, and format.

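A quick way to retrieve metadata from a representative asset is the `gdalinfo` command. It reports useful details, including:

- Projection and coordinate system
- Bounding box and spatial resolution
- Number of bands and data type

In the `gdalinfo` call, the `/vsicurl/` prefix lets GDAL read the remote file over HTTP without downloading it first, and setting `GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR` prevents GDAL from listing the remote directory in search of sidecar files. This information will guide the creation of accurate STAC metadata for the dataset.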
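The same information can also be read programmatically. The hypothetical helpers below run `gdalinfo -json` through `subprocess` (GDAL must be installed) and pull out fields that typically feed into STAC metadata: raster shape, pixel size from the `geoTransform`, and each band's GDAL data type and nodata value. The keys used (`size`, `geoTransform`, `bands`, `type`, `noDataValue`) follow GDAL's JSON output; `noDataValue` only appears when the file defines one, and the pixel-size calculation assumes a north-up raster without rotation.

```python
import json
import os
import subprocess


def run_gdalinfo(path: str) -> dict:
    """Run `gdalinfo -json` on `path` and return the parsed output (hypothetical helper)."""
    env = {**os.environ, "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR"}
    result = subprocess.run(
        ["gdalinfo", "-json", path], capture_output=True, text=True, check=True, env=env
    )
    return json.loads(result.stdout)


def summarize_gdalinfo(info: dict) -> dict:
    """Extract STAC-relevant fields from parsed `gdalinfo -json` output (hypothetical helper)."""
    width, height = info["size"]
    # geoTransform = [origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height]
    gt = info.get("geoTransform")
    pixel_size = (gt[1], abs(gt[5])) if gt else None
    bands = [
        {"gdal_type": band.get("type"), "nodata": band.get("noDataValue")}
        for band in info.get("bands", [])
    ]
    # STAC's proj:shape convention is [rows, columns], i.e. [height, width]
    return {"shape": [height, width], "pixel_size": pixel_size, "bands": bands}
```

For example, `summarize_gdalinfo(run_gdalinfo(path))` with `path` set to the `/vsicurl/` URL used in the next cell. Note that GDAL type names (such as `UInt16`) differ from STAC `data_type` values (such as `uint16`), so they may need mapping.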

In [30]:
!GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR gdalinfo -json "/vsicurl/https://eoresults.esa.int/d/APEX_TEST/2020/03/01/europe_aggr-orgc_00-020_mean_100_202003-202210/europe_aggr-orgc_00-020_mean_100_202003-202210.tif"

{
  "description":"/vsicurl/https://eoresults.esa.int/d/APEX_TEST/2020/03/01/europe_aggr-orgc_00-020_mean_100_202003-202210/europe_aggr-orgc_00-020_mean_100_202003-202210.tif",
  "driverShortName":"GTiff",
  "driverLongName":"GeoTIFF",
  "files":[
    "/vsicurl/https://eoresults.esa.int/d/APEX_TEST/2020/03/01/europe_aggr-orgc_00-020_mean_100_202003-202210/europe_aggr-orgc_00-020_mean_100_202003-202210.tif"
  ],
  "size":[
    40888,
    41712
  ],
  "coordinateSystem":{
    "wkt":"PROJCRS[\"ETRS89-extended / LAEA Europe\",\n    BASEGEOGCRS[\"ETRS89\",\n        ENSEMBLE[\"European Terrestrial Reference System 1989 ensemble\",\n            MEMBER[\"European Terrestrial Reference Frame 1989\"],\n            MEMBER[\"European Terrestrial Reference Frame 1990\"],\n            MEMBER[\"European Terrestrial Reference Frame 1991\"],\n            MEMBER[\"European Terrestrial Reference Frame 1992\"],\n            MEMBER[\"European Terrestrial Reference Frame 1993\"],\n            MEMBER[\"Europ

### Creating a STAC collection

Now, we will construct and register the STAC collection from scratch using the `pystac` library. The collection groups all items of our dataset, and each item in turn references the actual data files as assets.
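A collection's extent should enclose all of its items. When the item footprints and timestamps are already known, you can derive the extent instead of typing it by hand. The sketch below does this in plain Python with hypothetical helper names; recent pystac releases also offer `pystac.Extent.from_items` for the same purpose. It ignores bounding boxes that cross the antimeridian.

```python
def union_bbox(bboxes):
    """Smallest 2D bbox [west, south, east, north] enclosing all given bboxes."""
    wests, souths, easts, norths = zip(*bboxes)
    return [min(wests), min(souths), max(easts), max(norths)]


def time_interval(datetimes):
    """[start, end] interval covering all item datetimes (any comparable values)."""
    return [min(datetimes), max(datetimes)]
```

The results plug directly into pystac, for example `pystac.SpatialExtent(bboxes=[union_bbox(item_bboxes)])` and `pystac.TemporalExtent(intervals=[time_interval(item_datetimes)])`, where `item_bboxes` and `item_datetimes` are hypothetical lists collected from your items.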

In [31]:
from datetime import datetime
import pystac

spatial_extent = pystac.SpatialExtent(bboxes=[[4,51,5,52]])
collection_interval = sorted([datetime(2020,1,1), datetime(2022,1,1)])
temporal_extent = pystac.TemporalExtent(intervals=[collection_interval])

description_markdown = """
SOC content in the 0-20 cm top soil expressed in g kg-1 for 100 m resolution pixels

Applications:

- First globally consistent and contiguous complete gridded soil property map of Europe
- Indicator of soil health, as per Mission Area Soil Health and Food
- By 2030, at least 75% of soils in each EU Member State should be in healthy condition or show a
significant improvement towards meeting accepted thresholds of indicators, to support ecosystem
services

*Reliability*: More than 83 % of the cross-validation points fall within the 70% prediction interval for the bare soil model.
For the vegetated area model 94 % of the points fall within the 90 % prediction interval.
"""

item_assets = {
    "SOC": {
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "title": "SOC",
        "description": "SOC content in the 0-20 cm top soil expressed in g kg-1",
        "data_type": "uint16",
        "nodata": 65535,
        "unit": "g kg-1",
        "roles": [
            "data"
        ]
    }
}

collection_extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)
collection = pystac.Collection(id='esa-worldsoils',
                               description=description_markdown,
                               extent=collection_extent,
                               extra_fields=dict(item_assets=item_assets),
                               license='CC-BY-SA-4.0')

collection.summaries.add("proj:code",["EPSG:3035"])
collection.summaries.add("gsd",[100])
collection.summaries.add("bands",[{
    "title": "SOC",
    "gsd": 100
}])

print(json.dumps(collection.to_dict(), indent=2))

{
  "type": "Collection",
  "id": "esa-worldsoils",
  "stac_version": "1.1.0",
  "description": "\nSOC content in the 0-20 cm top soil expressed in g kg-1 for 100 m resolution pixels\n\nApplications:\n\n- First globally consistent and contiguous complete gridded soil property map of Europe\n- Indicator of soil health, as per Mission Area Soil Health and Food\n- By 2030, at least 75% of soils in each EU Member State should be in healthy condition or show a\nsignificant improvement towards meeting accepted thresholds of indicators, to support ecosystem\nservices\n\n*Reliability*: More than 83 % of the cross-validation points fall within the 70% prediction interval for the bare soil model.\nFor the vegetated area model 94 % of the points fall within the 90 % prediction interval.\n",
  "links": [],
  "item_assets": {
    "SOC": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "title": "SOC",
      "description": "SOC content in the 0-20 cm top soil expresse

To apply the proper permissions to our collection, we restrict write access to the STAC administrators defined within our project. These administrators hold the `stac-admin-<environment>` role, which grants them exclusive rights to add content to the collection.

Read access is public (`anonymous`), so anyone can view the collection.
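If you create several collections, a small helper keeps the access rules in one place. The hypothetical `with_access_rules` below returns a deep copy of a STAC dictionary carrying an `_auth` block with the same layout as the one in the next cell, leaving the original dictionary untouched. The role names are parameters, so substitute the role that matches your own environment.

```python
import copy


def with_access_rules(stac_dict, read=("anonymous",), write=("stac-admin-prod",)):
    """Return a deep copy of `stac_dict` with an APEx-style `_auth` block.

    Hypothetical helper; the `_auth` layout mirrors this notebook's example.
    """
    result = copy.deepcopy(stac_dict)
    result["_auth"] = {"read": list(read), "write": list(write)}
    return result
```

Returning a copy avoids accidentally modifying the output of `collection.to_dict()` if you reuse it elsewhere.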


In [32]:
coll_dict = collection.to_dict()

# APEx-specific access rules: anyone may read, only STAC administrators may write
default_auth = {
    "_auth": {
        "read": ["anonymous"],
        "write": ["stac-admin-prod"]
    }
}

coll_dict.update(default_auth)

Finally, we send a POST request to the `/collections` endpoint to create the new collection.

In [33]:
response = requests.post(f"{CATALOGUE}/collections", auth=auth, json=coll_dict)
response.raise_for_status()  # stop here if the catalogue rejected the collection
response

<Response [201]>

We can verify the creation of the collection by querying the collections from the catalogue.
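Indexing the list with `[0]` assumes the new collection is the first one returned, which only holds while the catalogue contains a single collection. A more robust approach, sketched here with a hypothetical helper, looks the collection up by its `id`:

```python
def find_collection(collections_response: dict, collection_id: str):
    """Return the collection with `collection_id` from a /collections response, or None."""
    return next(
        (c for c in collections_response.get("collections", []) if c.get("id") == collection_id),
        None,
    )
```

In this notebook it would be called as `find_collection(r.collections(), collection.id)`.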

In [34]:
print(json.dumps(r.collections()['collections'][0], indent=2))

{
  "id": "esa-worldsoils",
  "description": "\nSOC content in the 0-20 cm top soil expressed in g kg-1 for 100 m resolution pixels\n\nApplications:\n\n- First globally consistent and contiguous complete gridded soil property map of Europe\n- Indicator of soil health, as per Mission Area Soil Health and Food\n- By 2030, at least 75% of soils in each EU Member State should be in healthy condition or show a\nsignificant improvement towards meeting accepted thresholds of indicators, to support ecosystem\nservices\n\n*Reliability*: More than 83 % of the cross-validation points fall within the 70% prediction interval for the bare soil model.\nFor the vegetated area model 94 % of the points fall within the 90 % prediction interval.\n",
  "stac_version": "1.1.0",
  "links": [
    {
      "rel": "items",
      "type": "application/geo+json",
      "href": "https://catalogue.demo.apex.esa.int/collections/esa-worldsoils/items"
    },
    {
      "rel": "parent",
      "type": "application/json",

### Creating the STAC item

In this step, we will create and register a dedicated STAC item to represent an element from our dataset. As mentioned earlier, this approach is generally not recommended, as most tools automatically generate STAC metadata for you.
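The item's `bbox` must match its geometry. The next cell computes it with shapely; for a simple polygon the same values follow from the coordinate minima and maxima, as in this hypothetical plain-Python helper. It handles only `Polygon` geometries and does not account for the antimeridian.

```python
def polygon_bbox(geometry: dict) -> list:
    """[west, south, east, north] of a GeoJSON Polygon, considering all rings."""
    if geometry["type"] != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    positions = [pos for ring in geometry["coordinates"] for pos in ring]
    # GeoJSON positions are [lon, lat] or [lon, lat, alt]; only the first two matter here
    lons = [pos[0] for pos in positions]
    lats = [pos[1] for pos in positions]
    return [min(lons), min(lats), max(lons), max(lats)]
```

Applied to the item geometry in the next cell, it yields the same four values as the `bbox` printed there.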

In [35]:
from shapely.geometry import shape

# Footprint of the dataset as a GeoJSON polygon in longitude/latitude (WGS84)
geometry = {
    "type": "Polygon",
    "coordinates": [
        [
            [-35.0421748, 67.1549879],
            [-9.3033749, 33.067326],
            [34.1876844, 31.8603398],
            [62.8622654, 64.3475852],
            [-35.0421748, 67.1549879],
        ]
    ],
}

# shapely returns (minx, miny, maxx, maxy), which matches the STAC bbox order
item_bbox = shape(geometry).bounds
collection_item = pystac.Item(id='europe_aggr-orgc_00-020_mean_100_202003-202210',
                              geometry=geometry,
                              bbox=item_bbox,
                              datetime=datetime(2022, 3, 1),
                              collection=collection.id,
                              properties={})

collection_item.common_metadata.gsd = 100

asset = pystac.Asset(href="https://eoresults.esa.int/d/APEX_TEST/2020/03/01/europe_aggr-orgc_00-020_mean_100_202003-202210/europe_aggr-orgc_00-020_mean_100_202003-202210.tif", 
                      media_type=pystac.MediaType.GEOTIFF, roles=["data"])
collection_item.add_asset("SOC", asset)

print(json.dumps(collection_item.to_dict(), indent=2))

{
  "type": "Feature",
  "stac_version": "1.1.0",
  "stac_extensions": [],
  "id": "europe_aggr-orgc_00-020_mean_100_202003-202210",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -35.0421748,
          67.1549879
        ],
        [
          -9.3033749,
          33.067326
        ],
        [
          34.1876844,
          31.8603398
        ],
        [
          62.8622654,
          64.3475852
        ],
        [
          -35.0421748,
          67.1549879
        ]
      ]
    ]
  },
  "bbox": [
    -35.0421748,
    31.8603398,
    62.8622654,
    67.1549879
  ],
  "properties": {
    "gsd": 100,
    "datetime": "2022-03-01T00:00:00Z"
  },
  "links": [],
  "assets": {
    "SOC": {
      "href": "https://eoresults.esa.int/d/APEX_TEST/2020/03/01/europe_aggr-orgc_00-020_mean_100_202003-202210/europe_aggr-orgc_00-020_mean_100_202003-202210.tif",
      "type": "image/tiff; application=geotiff",
      "roles": [
        "data"
      ]
    }


After creating the STAC metadata, we can now add it to our collection.
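Before uploading, it can be worth checking that the item lies inside the collection's extent, since the catalogue may not enforce this. The hypothetical `extent_problems` helper below compares an item dictionary with a collection dictionary, both as produced by pystac's `to_dict()`. It assumes a 2D bbox that does not cross the antimeridian, an item with a non-null `datetime` property, and it checks only the first spatial and temporal extent entries.

```python
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    # STAC timestamps end in "Z"; fromisoformat only accepts it from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def extent_problems(item: dict, collection: dict) -> list:
    """List the ways in which `item` falls outside the collection's extent."""
    problems = []
    west, south, east, north = item["bbox"]
    cw, cs, ce, cn = collection["extent"]["spatial"]["bbox"][0]
    if west < cw or south < cs or east > ce or north > cn:
        problems.append("bbox outside the collection's spatial extent")
    start, end = collection["extent"]["temporal"]["interval"][0]
    when = _parse_ts(item["properties"]["datetime"])
    # A null start or end means the interval is open on that side
    if (start is not None and when < _parse_ts(start)) or (
        end is not None and when > _parse_ts(end)
    ):
        problems.append("datetime outside the collection's temporal extent")
    return problems
```

In this notebook it would be called as `extent_problems(collection_item.to_dict(), collection.to_dict())`; an empty list means the item fits the collection.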

In [36]:
r.collection_item_create(collection.id, collection_item.to_dict())

True

We can verify that the item was created by retrieving all items in the collection.
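`collection_items` returns the response of a single request. STAC API and OGC API - Features servers paginate larger result sets and point to the next page with a link whose `rel` is `next`. The generator below is a sketch that follows those links. `fetch_json` is a hypothetical parameter you supply, for example `lambda url: requests.get(url, auth=auth).json()`, and the sketch assumes the `next` links are plain GET links.

```python
def iter_items(first_page: dict, fetch_json):
    """Yield every feature, following rel="next" links until none remain."""
    page = first_page
    while True:
        yield from page.get("features", [])
        next_link = next(
            (link for link in page.get("links", []) if link.get("rel") == "next"), None
        )
        if next_link is None:
            return
        page = fetch_json(next_link["href"])
```

For example, `[f["id"] for f in iter_items(r.collection_items(collection.id), fetch_json)]` collects the ids of all items, however many pages they span.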

In [37]:
print(json.dumps(r.collection_items(collection.id), indent=2))

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "stac_version": "1.1.0",
      "stac_extensions": [],
      "id": "europe_aggr-orgc_00-020_mean_100_202003-202210",
      "collection": "esa-worldsoils",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              -35.0421748,
              67.1549879
            ],
            [
              -9.3033749,
              33.067326
            ],
            [
              34.1876844,
              31.8603398
            ],
            [
              62.8622654,
              64.3475852
            ],
            [
              -35.0421748,
              67.1549879
            ]
          ]
        ]
      },
      "bbox": [
        -35.0421748,
        31.8603398,
        62.8622654,
        67.1549879
      ],
      "properties": {
        "datetime": "2022-03-01T00:00:00Z",
        "gsd": 100.0,
        "created": "2025-03-07T10:14:49.124241Z",
  

## Visualizing in the APEx STAC Browser

The APEx product catalogue includes a [STAC browser](../instantiation/catalog.qmd#stac-browser) that enables you to visually explore collections and items. You can access the browser at the following URL: `browser.<project>.apex.esa.int`.

## Cleaning Up

Once the demo is complete, we can easily clean up by deleting the test collection, which is no longer needed.

In [38]:
requests.delete(f"{CATALOGUE}/collections/{collection.id}", auth=auth)

<Response [204]>