# EOEPCA+ Use Case: NO2 Tropospheric Content Cloud Filtering - Register Input Data: NO2

This source is from here: https://gist.github.com/soxofaan/67dfad89da88b9036ee301e410cd0ae7

Written in response to questions about credentials when downloading Terrascope STAC assets in this notebook: https://github.com/EOEPCA/demo/blob/main/demoroot/notebooks/08%20Use%20Case%20External%20Data%20Registration.ipynb

```
from odc.stac import stac_load
import pystac
from pystac import Item, Collection, Catalog, Extent, SpatialExtent, TemporalExtent, Asset
from datetime import datetime
from pystac_client import Client
import requests
```

In [1]:
from pystac_client import Client
from odc.stac import stac_load
from pystac import Item, Collection, Catalog, Extent, SpatialExtent, TemporalExtent, Asset
from datetime import datetime
import numpy as np
import pandas as pd
import openeo
import certifi

# for workspace management
import requests
import jwt
import time
import boto3
import os

## NO2 Tropospheric Content

Get Tropospheric NO2 Content data from the VITO Terrascope STAC API.

- S5P NO2 Tropospheric Content: terrascope-s5p-l3-no2-td-v2
- https://stac.terrascope.be/collections/terrascope-s5p-l3-no2-td-v2

Use case: regional data cube. We pull STAC metadata from a collection whose items all have global coverage, extract the spatial bbox we are interested in, and provide it to the STAC catalog and its users. When users interact with the collection, the data is still fetched from the original source; only the extents are adapted.
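
The regional subsetting described above amounts to intersecting the global item footprint with the region of interest. A minimal sketch, assuming plain `[west, south, east, north]` bboxes that do not cross the antimeridian; `intersect_bbox` is a hypothetical helper and not part of pystac:

```python
def intersect_bbox(a, b):
    """Return the overlap of two [west, south, east, north] bboxes, or None."""
    west, south = max(a[0], b[0]), max(a[1], b[1])
    east, north = min(a[2], b[2]), min(a[3], b[3])
    if west >= east or south >= north:
        return None  # the boxes do not overlap
    return [west, south, east, north]


global_bbox = [-180.0, -90.0, 180.0, 90.0]
europe_bbox = [-10.0, 35.0, 30.0, 70.0]  # the Europe bbox used in this notebook
print(intersect_bbox(global_bbox, europe_bbox))
```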

### Request

In [2]:
#url = "https://services.terrascope.be/stac/"
url = "https://stac.terrascope.be/"
catalog = Client.open(url)


In [3]:
# TODO: THIS ONE BETTER? https://stac.terrascope.be/collections/terrascope-s5p-l3-no2-td-v2 --> STAC 1.1
#collection_id = "urn:eop:VITO:TERRASCOPE_S5P_L3_NO2_TD_V2"

collection_id = "terrascope-s5p-l3-no2-td-v2"
bbox = [3.0, 51.0, 4.0, 52.0]  # overridden by the next line
bbox = [-10.0, 35.0, 30.0, 70.0]  # Europe
date_time = "2023-08-01T00:00:00Z/2023-12-31T23:59:59Z"

search = catalog.search(
    collections=[collection_id],
    bbox=bbox,
    datetime=date_time,
    limit=400,
)

In [4]:
items = list(search.items())
print(f"{len(items)=}")

len(items)=153
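
The count of 153 can be cross-checked against the requested date range. A small sketch, assuming the collection holds exactly one item per calendar day (the collection is titled a daily product, but gaps are possible); `expected_daily_items` is a hypothetical helper:

```python
from datetime import date


def expected_daily_items(start: date, end: date) -> int:
    """Number of calendar days in the closed interval [start, end]."""
    return (end - start).days + 1


print(expected_daily_items(date(2023, 8, 1), date(2023, 12, 31)))  # 153
```

If the search returned fewer items than this, days would be missing; more items would hint at duplicates or several products per day.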


### Check Data

Get asset URL of an item

In [5]:
items[0]

In [6]:
asset_url = items[0].assets["NO2"].href
asset_url

'https://services.terrascope.be/download/Sentinel5P/L3_NO2_TD_V2/2023/12/S5P_OFFL_L3_NO2_TD_20231231_V200/S5P_NO2_TD_20231231_NO2_V200.tif'

Try to load as from GeoServer

In [7]:
ds = stac_load(
    items,
    #bands=["CF"], 
    crs="EPSG:4326",
    resolution=0.1,
    bbox=bbox,
    chunks={"time": 1} 
)

In [8]:
ds

[xarray Dataset repr: each data variable is a Dask-backed float32 array of shape (153, 350, 400), i.e. (time, latitude, longitude), chunked as (1, 350, 400); 81.71 MiB per array, 546.88 kiB per chunk, 153 chunks in 3 graph layers.]


In [9]:
# Note: despite the variable name, this computes the monthly median
monthly_mean_no2 = ds['NO2'].groupby('time.month').median(dim='time')

Plotting triggers the actual loading of the data into memory, which **fails with 401 Unauthorized**.

In [None]:
%%time
monthly_mean_no2.plot(col="month",
    col_wrap=3,
    cmap="viridis",
    vmin=0,
    vmax=1,
    figsize=(12, 6),
    cbar_kwargs={"label": "NO2"}) # 401 unauthorized

### Register to EOEPCA via Registration BB

For registration in EOEPCA create a catalogue and collection from these items.

In [10]:
print(len(items))
print(items[0])
print(items[-1])

153
<Item id=S5P_OFFL_L3_NO2_TD_20231231_V200>
<Item id=S5P_OFFL_L3_NO2_TD_20230801_V200>


In [11]:
items[0].stac_extensions # there are already most extensions for the best practices

['https://stac-extensions.github.io/product/v0.1.0/schema.json',
 'https://stac-extensions.github.io/processing/v1.2.0/schema.json',
 'https://stac-extensions.github.io/file/v2.1.0/schema.json',
 'https://stac-extensions.github.io/raster/v2.0.0/schema.json',
 'https://stac-extensions.github.io/projection/v2.0.0/schema.json',
 'https://stac-extensions.github.io/alternate-assets/v1.2.0/schema.json',
 'https://stac-extensions.github.io/authentication/v1.1.0/schema.json']

**DataCubeAccess BB:** Adapt collection and items to stac data cube access best practices, add them to the collection and save. Best Practices [here](https://github.com/EOEPCA/datacube-access/blob/main/best_practices/stac_best_practices.md).

Set a local folder for saving and name of the catalog.

In [12]:
local_folder = 's5p-no2-stac-catalog'

In [13]:
# ToDo: Apply best practices @Juraj --> Analog to other demo script
original_items = items

catalog = Catalog(
    id="s5p-no2-root",
    description="Root catalog for S5P NO2 STAC data."
)

Get the collection info from the original Terrascope URL.

In [14]:
url = "https://stac.terrascope.be/collections/terrascope-s5p-l3-no2-td-v2"
collection = Collection.from_file(url)
print(collection.id)
print(collection.title)

# Adapt some things, extent

terrascope-s5p-l3-no2-td-v2
Sentinel-5P Level-3 NO2 Daily Product - V2


In [15]:
# check what else to adapt
# id, ..., ..., 

bbox_actual = [
    float(ds.longitude.min()), float(ds.latitude.min()),
    float(ds.longitude.max()), float(ds.latitude.max())
]

new_extent = Extent(
    SpatialExtent([bbox_actual]),
    # Cover the full search interval (2023-08-01 to 2023-12-31), matching the 153 items
    TemporalExtent([[datetime(2023, 8, 1), datetime(2023, 12, 31, 23, 59, 59)]])
)
collection.extent = new_extent
collection.extent.spatial.bboxes


[[-9.95, 35.050000000000004, 29.950000000000006, 69.95]]
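
The bbox above is slightly smaller than the requested `[-10, 35, 30, 70]`, most likely because the `longitude`/`latitude` coordinates of `ds` refer to pixel centres. A minimal sketch of expanding such a bbox to the outer pixel edges, assuming a regular grid with the 0.1° resolution passed to `stac_load` and centre-based coordinates (worth verifying on `ds`); `centers_to_edges` is a hypothetical helper:

```python
def centers_to_edges(bbox_centers, resolution, ndigits=6):
    """Expand a bbox of pixel centres by half a pixel on each side."""
    half = resolution / 2
    west, south, east, north = bbox_centers
    edges = [west - half, south - half, east + half, north + half]
    # Round to suppress floating-point noise such as 35.050000000000004
    return [round(v, ndigits) for v in edges]


print(centers_to_edges([-9.95, 35.05, 29.95, 69.95], resolution=0.1))
```

Whether the collection extent should describe pixel centres or pixel edges is a design choice; edges match the footprint that was originally requested.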

In [16]:
# add items to collection
# and adapt what is to be adapted

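# NOTE: this pairs items with ds['time'] by position; the search results above are newest-first,
# so check that both sequences are in the same order before relying on `t`.
for i, src_item in enumerate(original_items):
    t = ds['time'].values[i]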

    # ...
    
    # Update properties with extension fields
    properties = dict(src_item.properties)

    # ...
    
    # Add/Update extension fields
    properties.update({
        "proj:bbox": bbox_actual,
        "proj:epsg": 4326,
        "proj:shape": [ds.latitude.size, ds.longitude.size],
        "proj:transform": [
            float(ds.longitude[1] - ds.longitude[0]), 0.0, float(ds.longitude.min()),
            0.0, float(ds.latitude[1] - ds.latitude[0]), float(ds.latitude.min())
        ],
        "cube:dimensions": {
            "x": {"type": "spatial", "axis": "x", "extent": [float(ds.longitude.min()), float(ds.longitude.max())]},
            "y": {"type": "spatial", "axis": "y", "extent": [float(ds.latitude.min()), float(ds.latitude.max())]},
            "t": {"type": "temporal", "extent": [str(t), str(t)]}
        }
    })

    # ...

    src_item.geometry = {
        "type": "Polygon",
        "coordinates": [[
            [bbox_actual[0], bbox_actual[1]],
            [bbox_actual[2], bbox_actual[1]],
            [bbox_actual[2], bbox_actual[3]],
            [bbox_actual[0], bbox_actual[3]],
            [bbox_actual[0], bbox_actual[1]]
        ]]
    }
    src_item.bbox = bbox_actual
    src_item.datetime = pd.to_datetime(t).to_pydatetime()
    src_item.properties = properties

    collection.add_item(src_item)
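
For reference, `proj:transform` in the STAC projection extension holds the affine coefficients in row-major order, `[a, b, c, d, e, f]`, mapping pixel `(col, row)` to `x = a*col + b*row + c` and `y = d*col + e*row + f`. A minimal sketch for a north-up grid, assuming the origin is the upper-left pixel corner and the grid edges are at -10°/70° with 0.1° pixels; `north_up_transform` and `pixel_to_xy` are hypothetical helpers, not part of pystac:

```python
def north_up_transform(west_edge, north_edge, res_x, res_y):
    """Affine coefficients [a, b, c, d, e, f] for a north-up grid.

    Rows increase southwards, hence the negative row step.
    """
    return [res_x, 0.0, west_edge, 0.0, -res_y, north_edge]


def pixel_to_xy(transform, col, row):
    """Apply the affine transform to a pixel corner (col, row)."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


t = north_up_transform(-10.0, 70.0, 0.1, 0.1)
print(t)
print(pixel_to_xy(t, 0, 0))      # upper-left corner of the grid
print(pixel_to_xy(t, 400, 350))  # lower-right corner of a 400 x 350 grid
```

Comparing the two printed corners with the intended bbox is a quick way to check a transform before writing it into the items.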


In [17]:
catalog.add_child(collection)
catalog.normalize_and_save(local_folder, catalog_type="SELF_CONTAINED")
print(f"Catalog saved to: {local_folder}")
print(f"Number of items in collection: {len(list(collection.get_all_items()))}")

Catalog saved to: s5p-no2-stac-catalog
Number of items in collection: 153


In [18]:
catalog = Catalog.from_file(f"{local_folder}/catalog.json")
catalog.validate()

['https://schemas.stacspec.org/v1.1.0/catalog-spec/json-schema/catalog.json']

In [19]:
collection = catalog.get_child("terrascope-s5p-l3-no2-td-v2") # has to be adapted to match naming scheme of other script
collection.validate()

['https://schemas.stacspec.org/v1.1.0/collection-spec/json-schema/collection.json',
 'https://stac-extensions.github.io/render/v1.0.0/schema.json',
 'https://stac-extensions.github.io/version/v1.2.0/schema.json',
 'https://stac-extensions.github.io/authentication/v1.1.0/schema.json']

**Workspace BB**: Save json to Workspace BB or devcluster object storage.

Definitions for workspace

In [20]:
# get this info from here: https://workspace-api.develop.eoepca.org/workspaces/ws-eric, can also be retrieved programmatically by following example "05 Workspace Management"
owner = "eric"
password = "changeme"
ws_name = "ws-eric"

# is this info really needed?
realm = "eoepca"
base_domain = "develop.eoepca.org" #"apx.develop.eoepca.org"
keycloak_endpoint = f"https://iam-auth.{base_domain}"
workspace_api_endpoint = f'https://workspace-api.{base_domain}/workspaces'
token_endpoint = f"{keycloak_endpoint}/realms/{realm}/protocol/openid-connect/token"
minio_endpoint = "https://minio.develop.eoepca.org"

Define functions to interact with workspace.

In [21]:
# do these functions need to be defined for only using the workspace?
def iam_token(username, password):
    headers = {
        "Cache-Control": "no-cache",
        "Content-Type": "application/x-www-form-urlencoded"
    }
    data = {
        "scope": "roles",
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": "demo",
        "client_secret": "demo"
    }    
    response = requests.post(token_endpoint, headers=headers, data=data)
    if response.ok:
        return response.json()["access_token"]
    else:
        print(response)
        return None

def access_ws(ws_name, token):
    headers = {
        'Authorization': 'bearer ' + token
    }
    url = f"{workspace_api_endpoint}/{ws_name}"
    print(f"HTTP GET {url}")
    response = requests.get(url, headers=headers)
    print(response)
    #print(response.text)
    return response

Prepare workspace

In [22]:
while True:
    response = access_ws(ws_name, iam_token(owner, password))
    if response.status_code == 200:
        try:
            workspace_data = response.json()
            print(workspace_data.get("status"))
            if workspace_data.get("status") == "ready":
                break
        except ValueError:
            print("not ready yet")

    print("...")
    time.sleep(20)    

HTTP GET https://workspace-api.develop.eoepca.org/workspaces/ws-eric
<Response [200]>
ready
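
The `while True` loop above keeps polling indefinitely if the workspace never becomes ready. A minimal sketch of a bounded variant, assuming a generic `check` callable that returns a truthy value once the condition holds; `wait_until` is a hypothetical helper, and the `sleep` and `clock` parameters exist only to make it easy to test:

```python
import time


def wait_until(check, timeout=300.0, interval=20.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        sleep(interval)


# Demo with a stand-in check that succeeds on the third call
attempts = iter([None, None, {"status": "ready"}])
print(wait_until(lambda: next(attempts), timeout=5, interval=0))
```

In this notebook, `check` could wrap `access_ws` and return the parsed JSON once its `status` equals `"ready"`.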


Retrieve relevant information for storage.

In [23]:
response1 = access_ws(ws_name, iam_token(owner, password))
assert response1.status_code == 200 # this sometimes gives 401, upon next execution 200 again.
print(f"✅workspace ownership checked and can retrieve workspace details")
#workspace_data

HTTP GET https://workspace-api.develop.eoepca.org/workspaces/ws-eric
<Response [200]>
✅workspace ownership checked and can retrieve workspace details


In [24]:
bucket_name = workspace_data["storage"]["credentials"]["bucketname"]
s3_access = workspace_data["storage"]["credentials"]["access"]
s3_secret = workspace_data["storage"]["credentials"]["secret"]
s3_endpoint = workspace_data["storage"]["credentials"]["endpoint"]

Connect to s3.

In [25]:
session = boto3.session.Session()
s3resource = session.resource('s3', aws_access_key_id=s3_access, aws_secret_access_key=s3_secret, endpoint_url=minio_endpoint)

In [26]:
s3_folder_prefix = f"end2end/{local_folder}/"

Recursively upload all files (the catalog, collection and items)

In [27]:
for root, dirs, files in os.walk(local_folder):
    for file in files:
        local_path = os.path.join(root, file)
        relative_path = os.path.relpath(local_path, local_folder)
        s3_key = os.path.join(s3_folder_prefix, relative_path).replace("\\", "/")

        try:
            # Use a name that does not shadow the built-in `object`
            s3_object = s3resource.Object(bucket_name, s3_key)
            with open(local_path, 'rb') as data:
                result = s3_object.put(Body=data)

            res = result.get('ResponseMetadata')
            if res.get('HTTPStatusCode') == 200:
                print(f"✅ Uploaded: {s3_key}")
            else:
                print(f"❌ Failed: {s3_key}")
        except Exception as e:
            print(f"🚨 Error uploading {s3_key}: {e}")

✅ Uploaded: end2end/s5p-no2-stac-catalog/catalog.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/collection.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/S5P_OFFL_L3_NO2_TD_20230902_V200/S5P_OFFL_L3_NO2_TD_20230902_V200.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/S5P_OFFL_L3_NO2_TD_20231222_V200/S5P_OFFL_L3_NO2_TD_20231222_V200.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/S5P_OFFL_L3_NO2_TD_20230908_V200/S5P_OFFL_L3_NO2_TD_20230908_V200.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/S5P_OFFL_L3_NO2_TD_20231214_V200/S5P_OFFL_L3_NO2_TD_20231214_V200.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/S5P_OFFL_L3_NO2_TD_20231028_V200/S5P_OFFL_L3_NO2_TD_20231028_V200.json
✅ Uploaded: end2end/s5p-no2-stac-catalog/terrascope-s5p-l3-no2-td-v2/S5P_OFFL_L3_NO2_TD_20230812_V200/S5P_OFFL_L3_NO2_TD_20230812_V200.json
✅ Uploaded: end2end/s

Check that files are available here: https://ws-eric.develop.eoepca.org/files/ws-eric/end2end/
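
The upload loop builds S3 keys with `os.path.join` and then replaces backslashes. A minimal alternative sketch using `pathlib`, which produces forward-slash keys regardless of the operating system; `s3_key_for` is a hypothetical helper:

```python
from pathlib import Path, PurePosixPath


def s3_key_for(local_path, local_root, prefix):
    """Map a local file path to an S3 key under `prefix`, always using '/'."""
    relative = Path(local_path).relative_to(local_root)
    return str(PurePosixPath(prefix, *relative.parts))


print(s3_key_for(
    "s5p-no2-stac-catalog/catalog.json",
    "s5p-no2-stac-catalog",
    "end2end/s5p-no2-stac-catalog/",
))
```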

**Registration BB**: Have Registration BB - Harvester add the catalogue to EOEPCA STAC API

In [28]:
#  https://github.com/EOEPCA/demo/blob/main/demoroot/notebooks/06%20Resource%20Registration%20Harvester.ipynb
from requests import Session
from requests.auth import HTTPBasicAuth
import json

# Setup connection to Flowable API
flowable_base_url = "https://registration-harvester-api.develop.eoepca.org/flowable-rest"
flowable_rest_user = "eoepca"
flowable_rest_pw = "eoepca"
flowable_session = Session()
flowable_session.auth = HTTPBasicAuth(flowable_rest_user, flowable_rest_pw)

In [29]:
url = f"{flowable_base_url}/service/repository/process-definitions"
print(f"GET {url}")
response = flowable_session.get(url)
processes = response.json()["data"]
if len(processes) == 0:
    print("No workflow definitions")
else:
    for idx, process in enumerate(processes, 1):
        print("%-2s %-28s version: %-5s id: %s" % (idx, process['name'], process['version'], process['id']))
        if process["name"] == "STAC Publish":
            stac_processId = process["id"]

GET https://registration-harvester-api.develop.eoepca.org/flowable-rest/service/repository/process-definitions
1  STAC Publish                 version: 1     id: stacPublish:1:201803f1-7840-11f0-b011-7ed8fc866c09


In [30]:
# Workflow input variable
variables = [
    {
        "name": "stac_catalog_source",
        "value": f"s3://{bucket_name}/{s3_folder_prefix}catalog.json"  # e.g. "s3://ws-eric/end2end/s5p-no2-stac-catalog/catalog.json"
    },
    {
        "name": "s3_endpoint_url",
        "value": s3_endpoint
    },
    {
        "name": "s3_access_key",
        "value": s3_access
    },
    {
        "name": "s3_secret_key",
        "value": s3_secret 
    }
]
print(json.dumps(variables, indent=4))

# Create HTTP request to start the workflow
body = {}
body["processDefinitionId"] = stac_processId # "stacPublish:1:201803f1-7840-11f0-b011-7ed8fc866c09"
body["variables"] = variables
response = flowable_session.post(url=f"{flowable_base_url}/service/runtime/process-instances", json=body)
print(response.status_code)
print(f'Created process instance at {response.json()["url"]}')


[
    {
        "name": "stac_catalog_source",
        "value": "s3://ws-eric/end2end/s5p-no2-stac-catalog/catalog.json"
    },
    {
        "name": "s3_endpoint_url",
        "value": "https://minio.develop.eoepca.org"
    },
    {
        "name": "s3_access_key",
        "value": "ws-eric"
    },
    {
        "name": "s3_secret_key",
        "value": "<redacted>"
    }
]
201
Created process instance at https://registration-harvester-api.develop.eoepca.org/flowable-rest/service/runtime/process-instances/4074e5a7-9f78-11f0-9120-d2eb55003a74
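
Printing `variables` as done in the cell above writes the S3 secret key into the notebook output. A minimal sketch of masking secrets before printing, assuming secrets are identified by their variable name; `mask_secrets` is a hypothetical helper and the demo values are placeholders:

```python
import json


def mask_secrets(variables, secret_names=("s3_secret_key",)):
    """Return a copy of the variable list with secret values replaced by '***'."""
    return [
        {**v, "value": "***"} if v["name"] in secret_names else dict(v)
        for v in variables
    ]


demo = [
    {"name": "s3_access_key", "value": "example-access"},
    {"name": "s3_secret_key", "value": "example-secret"},
]
print(json.dumps(mask_secrets(demo), indent=4))
```

The original list is left untouched, so it can still be sent in the request body.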


Check the STAC Catalogue here: **Data Access**
https://radiantearth.github.io/stac-browser/#/external/eoapi.develop.eoepca.org/stac/collections/terrascope-s5p-l3-no2-td-v2


and here: **Data Cube Access**
https://datacube-access.develop.eoepca.org/collections

Try to access

## Naive download - From Stephaan

In [7]:
resp = requests.get(
    asset_url
)

print(resp, resp.headers, resp.content[:100], sep="\n")

<Response [200]>
{'Date': 'Mon, 08 Sep 2025 09:29:00 GMT', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Content-Length': '756', 'Keep-Alive': 'timeout=15, max=99', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=UTF-8', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'}
b'\n\t<html>\n\t\t<head>\n\t\t\t<style>\n\t\t\t\tbody {\n          margin: 0;\n        }\n\n        .container {\n       '


So that seemed to work (`200 OK` status), but the response is actually an HTML login page (note the `text/html` content type).
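
Since the status code alone is misleading here, it helps to inspect the first bytes of the response. A minimal sketch, assuming the asset is a classic TIFF or BigTIFF file; `looks_like_tiff` is a hypothetical helper:

```python
# Magic numbers: classic TIFF and BigTIFF, little- and big-endian
TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(first_bytes: bytes) -> bool:
    """Check whether the payload starts with a TIFF magic number."""
    return first_bytes[:4] in TIFF_MAGIC


html_start = b"\n\t<html>\n\t\t<head>"
print(looks_like_tiff(html_start))          # False
print(looks_like_tiff(b"II*\x00\xc0\x00"))  # True
```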

## Download with appropriate auth headers

Loosely based on Terrascope docs on download with authentication.

- https://docs.terrascope.be/Developers/WebServices/TerraCatalogue/ProductDownload.html#authentication
- https://github.com/VITObelgium/notebook-samples/blob/master/terrascope-samples/Terrascope/Beginner/Terrascope_data_download.ipynb


Instead of using the discouraged OIDC Password Grant, we'll use the OIDC Device Code Grant, 
leveraging some utilities from the openeo package to do the heavy lifting.
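
For orientation, the device code grant (RFC 8628) can be sketched with the standard library alone. This is an illustration, not the implementation used by the openeo package. It assumes the two endpoint URLs have been read from the provider's `/.well-known/openid-configuration` document (`device_authorization_endpoint` and `token_endpoint`) and that the `openid` scope is sufficient; `post_form` and `device_flow` are hypothetical helpers:

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def post_form(url, fields):
    """POST form-encoded fields and return (status, parsed JSON body)."""
    data = urllib.parse.urlencode(fields).encode()
    try:
        with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Error responses of the token endpoint are JSON as well
        return err.code, json.load(err)


def device_flow(device_endpoint, token_endpoint, client_id,
                post=post_form, sleep=time.sleep):
    # Step 1: request a device code and a user-facing verification URL
    _, device = post(device_endpoint, {"client_id": client_id, "scope": "openid"})
    print("Visit", device.get("verification_uri_complete", device["verification_uri"]))
    interval = device.get("interval", 5)
    # Step 2: poll the token endpoint until the user has approved the request
    while True:
        status, body = post(token_endpoint, {
            "grant_type": DEVICE_GRANT,
            "device_code": device["device_code"],
            "client_id": client_id,
        })
        if status == 200:
            return body  # contains "access_token"
        if body.get("error") == "slow_down":
            interval += 5
        elif body.get("error") != "authorization_pending":
            raise RuntimeError(f"device flow failed: {body}")
        sleep(interval)
```

The `post` and `sleep` parameters allow the flow to be exercised with stand-ins instead of a real identity provider.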

### Get access token

In [8]:
import openeo.rest.auth.oidc

In [9]:
issuer = "https://sso.terrascope.be/auth/realms/terrascope"
client_id = "public"

provider = openeo.rest.auth.oidc.OidcProviderInfo(issuer=issuer)
client_info = openeo.rest.auth.oidc.OidcClientInfo(client_id=client_id, provider=provider)  
authenticator = openeo.rest.auth.oidc.OidcDeviceAuthenticator(client_info=client_info)

In [13]:
tokens = authenticator.get_tokens()

### Do download

With access token, try again to do the download.

Additionally, since this is only a test, we request just the first bytes with an HTTP range request:

In [14]:
resp = requests.get(
    asset_url,
    headers={
        "Authorization": f"Bearer {tokens.access_token}",
        "Range": "bytes=0-100",
    }
)
print(resp, resp.headers, resp.content[:100], sep="\n")

<Response [206]>
{'Date': 'Mon, 08 Sep 2025 09:30:38 GMT', 'Vary': 'Authorization', 'Last-Modified': 'Sat, 12 Aug 2023 07:24:13 GMT', 'ETag': '"8046a89-602b4b754a1cb"', 'Accept-Ranges': 'bytes', 'Content-Length': '101', 'Content-Range': 'bytes 0-100/134507145', 'Keep-Alive': 'timeout=15, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'image/tiff', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'}
b'II*\x00\xc0\x00\x00\x00GDAL_STRUCTURAL_METADATA_SIZE=000140 bytes\nLAYOUT=IFDS_BEFORE_DATA\nBLOCK_ORDER=ROW_MAJOR\nBLO'


The `II*\x00` magic bytes and the `image/tiff` content type show that we now get actual TIFF data.
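
The `Content-Range` header of the partial response also reveals the full size of the asset. A minimal sketch of parsing it, assuming the `bytes first-last/total` form shown in the output above; `parse_content_range` is a hypothetical helper:

```python
import re


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total); total may be None."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


first, last, total = parse_content_range("bytes 0-100/134507145")
print(last - first + 1, "bytes received of", total)
```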