# EOEPCA+ Use Case: NO2 Tropospheric Content Cloud Filtering - Register Input Data

![end2end_workflow](img/end2end_workflow.png)

`micromamba create -n eoepca_end2end -c conda-forge pystac pystac-client odc odc-stac openeo xarray rioxarray rasterio geopandas pyproj numpy folium shapely pyjwt pip jupyterlab`

In [1]:
from pystac_client import Client
from odc.stac import stac_load
from pystac import Item, Collection, Catalog, Extent, SpatialExtent, TemporalExtent, Asset
from datetime import datetime
import numpy as np
import pandas as pd
import openeo
import certifi

# for workspace management
import requests
#import jwt
import time
#import boto3
import os

## Cloud Fraction

Get cloud fraction data from the DLR GeoService STAC API.
- S5P Cloud Fraction Inpuls L3: EOC Geoservice Sentinel-5P TROPOMI L3 Daily Composites - Cloud Fraction (CF)
- https://geoservice.dlr.de/eoc/ogc/stac/v1/collections/S5P_TROPOMI_L3_P1D_CF

Use Case: Regional DataCube: We are pulling STAC metadata from a collection containing all items with global coverage. We are extracting the spatial bbox we are intereseted in and provide this to the stac catalog/users. When users interact with the collection it is still fetched from the original source. But the extents are adapted.

### Request

In [2]:
# Handle Geoservice Certificates - et SSL-Cert-Path via Env Var
os.environ['REQUESTS_CA_BUNDLE'] = certifi.where()
os.environ['CURL_CA_BUNDLE'] = certifi.where()

In [3]:
url = "https://geoservice.dlr.de/eoc/ogc/stac/v1/"
catalog = Client.open('https://geoservice.dlr.de/eoc/ogc/stac/v1/')

In [4]:
collection_id = "S5P_TROPOMI_L3_P1D_CF" # "S5P_TROPOMI_L3_P1D_CF_v2" --> full history, new collection
bbox = [-10.0, 35.0, 30.0, 70.0]  # Europe
date_time = "2023-08-01T00:00:00Z/2023-12-31T23:59:59Z"

In [5]:
search = catalog.search(
    collections=[collection_id],
    bbox=bbox,
    datetime=date_time,
    limit=400  # adjust as needed
)

In [6]:
items = list(search.items())

### Check Data
Load the data and check that it's valid.

In [7]:
ds = stac_load(
    items,
    bands=["cf"], 
    crs="EPSG:4326",
    resolution=0.1,
    bbox=bbox,
    chunks={"time": 1} 
)


In [8]:
ds

Unnamed: 0,Array,Chunk
Bytes,81.71 MiB,546.88 kiB
Shape,"(153, 350, 400)","(1, 350, 400)"
Dask graph,153 chunks in 3 graph layers,153 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 81.71 MiB 546.88 kiB Shape (153, 350, 400) (1, 350, 400) Dask graph 153 chunks in 3 graph layers Data type float32 numpy.ndarray",400  350  153,

Unnamed: 0,Array,Chunk
Bytes,81.71 MiB,546.88 kiB
Shape,"(153, 350, 400)","(1, 350, 400)"
Dask graph,153 chunks in 3 graph layers,153 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


In [9]:
monthly_mean_cf = ds['cf'].groupby('time.month').median(dim='time')

In [None]:
%%time
monthly_mean_cf.plot(col="month",
    col_wrap=3,
    cmap="viridis",
    vmin=0,
    vmax=1,
    figsize=(12, 6),
    cbar_kwargs={"label": "Cloud Fraction"})

### Register to EOEPCA via Registration BB

For registration in EOEPCA create a catalogue and collection from these items.

In [10]:
print(len(items))
print(items[0])
print(items[-1])

153
<Item id=S5P_DLR_NRTI_01_040201_L3_CF_20231231>
<Item id=S5P_DLR_NRTI_01_040100_L3_CF_20230801>


In [11]:
items[0].stac_extensions # no data cube extension for example

['https://stac-extensions.github.io/eo/v1.1.0/schema.json',
 'https://stac-extensions.github.io/view/v1.0.0/schema.json',
 'https://stac-extensions.github.io/projection/v1.0.0/schema.json',
 'https://stac-extensions.github.io/processing/v1.0.0/schema.json',
 'https://stac-extensions.github.io/scientific/v1.0.0/schema.json']

**DataCubeAccess BB:** Adapt collection and items to stac data cube access best practices, add them to the collection and save. Best Practices [here](https://github.com/EOEPCA/datacube-access/blob/main/best_practices/stac_best_practices.md).

Set a local folder for saving and name of the catalog.

In [12]:
local_folder = 's5p-bp-stac-catalog-v3'

In [13]:
import pystac
from pystac.extensions.eo import EOExtension, Band
from pystac.extensions.projection import ProjectionExtension
from pystac.extensions.datacube import SpatialDimension, TemporalDimension, Dimension, DatacubeExtension
from pystac.extensions.raster import RasterExtension
from pystac.extensions.storage import StorageExtension
from pystac.extensions.scientific import ScientificExtension
from pystac import Item, Asset, Collection, Catalog, Extent, SpatialExtent, TemporalExtent, Summaries
from pystac.extensions.raster import RasterExtension, RasterBand
import xarray as xr
import rioxarray
import pandas as pd
from datetime import timezone
import rasterio
from shapely.geometry import Polygon, mapping

# Comment out auth extension to fix OpenEO compatibility issues
stac_extensions_catalog = [
        # "https://stac-extensions.github.io/authentication/v1.1.0/schema.json"  # Commented out for OpenEO
]

stac_extensions_collection = [
    "https://stac-extensions.github.io/item-assets/v1.0.0/schema.json",
    "https://stac-extensions.github.io/eo/v1.1.0/schema.json",
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json",
    # "https://stac-extensions.github.io/storage/v2.0.0/schema.json"  # Commented out for OpenEO
]

stac_extensions_item = [
    "https://stac-extensions.github.io/eo/v1.1.0/schema.json",
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json",
    "https://stac-extensions.github.io/storage/v2.0.0/schema.json",
    "https://stac-extensions.github.io/sentinel-5p/v0.2.0/schema.json"
]

items_list = list(items)

catalog = pystac.Catalog(id="s5p-bp-root-v3",
                         description="Root catalog for S5P Cloud Fraction STAC data.",
                         title="S5P Cloud Fraction STAC Catalog v3",
                         stac_extensions=stac_extensions_catalog
                         )

# Comment out auth schemes to fix OpenEO compatibility
# catalog.extra_fields["auth:schemes"] = {
#     "oauth": {
#         "type": "oauth2",
#         "description": "requires a login and user token",
#         "flows": {
#             "authorizationCode": {
#                 "authorizationUrl": "https://example.com/oauth/authorize",
#                 "tokenUrl": "https://example.com/oauth/token",
#                 "scopes": {
#                     "read:example": "Read the example data",
#                     "write:example": "Write the example data",
#                     "admin:example": "Read/write/delete the example data"
#                 }
#             }
#         }
#     }
# }

def get_bbox_and_footprint(ds: xr.Dataset) -> tuple:
    minx, miny, maxx, maxy = ds.rio.bounds()
    bbox = [minx, miny, maxx, maxy]
    footprint = Polygon([
        [minx, miny],
        [minx, maxy],
        [maxx, maxy],
        [maxx, miny],
    ])
    return bbox, mapping(footprint)

bbox, footprint = get_bbox_and_footprint(ds)

collection = Collection(
    id="s5p-bp-cf-v3",
    title="Sentinel-5P Cloud Fraction Collection v3",
    description="Sentinel-5P Cloud Fraction L3 data (Aug-Dec 2023)",
    stac_extensions=stac_extensions_collection,
    license="proprietary",
    extent=Extent(
        spatial=SpatialExtent(bbox),
        temporal=TemporalExtent(intervals=[(                                                        #type: ignore
            pd.to_datetime(ds.time.min().values).to_pydatetime().replace(tzinfo=timezone.utc),
            pd.to_datetime(ds.time.max().values).to_pydatetime().replace(tzinfo=timezone.utc)
        )])
    )
)

collection.add_asset(
    "thumbnail",
    Asset(
        href="https://geoservice.dlr.de/catalogue/srv/api/records/587a076a-0cb6-494b-8f87-3fc8cfea8b22/attachments/atmosphere-S5P_TROPOMI_L3_P1D_CF_ql_s.jpg",
        media_type="image/png",
        title="Cloud Fraction Thumbnail",
        roles=["thumbnail"]
    )
)

# Comment out auth schemes to fix OpenEO compatibility
# auth = {
#         "auth:schemes": {
#             "oauth": {
#                 "type": "oauth2",
#                 "description": "requires a login and user token",
#                 "flows": {
#                     "authorizationCode": {
#             "authorizationUrl": "https://example.com/oauth/authorize",
#             "tokenUrl": "https://example.com/oauth/token",
#             "scopes": {
#                 "read:example": "Read the example data",
#                 "write:example": "Write the example data",
#                 "admin:example": "Read/write/delete the example data"
#                     }
#                 }
#             }
#         }
#     }
# }

# collection.extra_fields.update(auth)  # Commented out for OpenEO compatibility

collection.summaries = Summaries(
    summaries={
        "datetime": {
            "min": pd.to_datetime(ds.time.min().values).isoformat(),
            "max": pd.to_datetime(ds.time.max().values).isoformat()
        },
        "platform": [items_list[0].properties.get("platform")],
        "constellation": [items_list[0].properties.get("constellation")],
        "instruments": [items_list[0].properties.get("instruments")],
        "eo:cloud_cover": [items_list[0].properties.get("eo:cloud_cover")],
        "eo:bands": [Band.create(
            name="cf",
            description="Radiometric cloud fraction (Sentinel-5P)",
            common_name="cloud_fraction"
        ).to_dict()],
        "proj:epsg": [int(ds.cf.rio.crs.to_epsg())],
        "proj:shape": list(ds.cf.rio.shape),
        "proj:transform": list(ds.cf.rio.transform())  
    }
)

# Comment out storage schemes to fix OpenEO compatibility  
# collection.extra_fields["storage:schemes"] = {
#     "aws": {
#       "type": "aws-s3",
#       "platform": "https://{bucket}.s3.{region}.amazonaws.com",
#       "bucket": "mybucket",
#       "region": "us-west-2",
#       "requester_pays": True,
#       "tier": "Standard"
#     }
# }

collection.extra_fields["item_assets"] = {
    "cf": {
        "type": "image/tiff; application=geotiff",
        "title": "Cloud Fraction",
        "description": "Radiometric cloud fraction (Sentinel-5P)",
        "roles": ["data"]
    }
}

start_time = pd.to_datetime(ds.time.min().values).to_pydatetime().replace(tzinfo=timezone.utc)
end_time = pd.to_datetime(ds.time.max().values).to_pydatetime().replace(tzinfo=timezone.utc)
temporal_extent = [
    (start_time.isoformat(), end_time.isoformat())
]

# Datacube extension on collection
datacube_ext = {"cube:dimensions": {
    "x": {
      "type": "spatial",
      "axis": "x",
      "extent": [bbox[0], bbox[2]],
      "reference_system": 4326
    },
    "y": {
      "type": "spatial",
      "axis": "y",
      "extent": [bbox[1], bbox[3]],
      "reference_system": 4326
    },
    "time": {
      "type": "temporal",
      "extent": temporal_extent,
      "step": "P5D"
    },
    "band": {
        "type": "bands",
        "values": ["cf"]
    }
  },
}
collection.extra_fields.update(datacube_ext)

new_items = []
for old_item in items:
    new_item = Item(
        id=old_item.id,
        geometry=old_item.geometry,
        bbox=old_item.bbox,
        datetime=old_item.datetime,
        properties={
            k: v
            for k, v in old_item.properties.items()
            if k not in ["eo:bands", "proj:code"]
        },
    )

    for key, asset in old_item.assets.items():
        if key == "cf":
            new_asset = Asset(
                href=asset.href,
                media_type=asset.media_type,
                roles=asset.roles,
                title=asset.title,
                description=asset.description,
            )
            new_item.add_asset("cf", new_asset)

            # Attach projection extension correctly
            proj = ProjectionExtension.ext(new_asset, add_if_missing=True)            
            # Fix the projection extension to use correct field names for ODC STAC compatibility
            # Convert 'EPSG:4326' format to numeric EPSG code
            proj_code = asset.extra_fields.get('proj:code', '')
            if proj_code.startswith('EPSG:'):
                proj.epsg = int(proj_code.split(':')[1])
            
            proj.shape = asset.extra_fields.get('proj:shape')
            proj.transform = asset.extra_fields.get('proj:transform')

    # Normalize other properties
    if new_item.properties.get("license") == "CC-BY 4.0":
        new_item.properties["license"] = "CC-BY-4.0"
    if isinstance(new_item.properties.get("instruments"), str):
        new_item.properties["instruments"] = [new_item.properties["instruments"]]

    new_item.properties["updated"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

    collection.add_item(new_item)
    new_items.append(new_item)

catalog.add_child(collection)

catalog.normalize_hrefs("./s5p-bp-stac-catalog-v3")

catalog.normalize_and_save(local_folder, catalog_type="SELF_CONTAINED") # type: ignore
print(f"Catalog saved to: {local_folder}")
print(f"Number of items in collection: {len(list(collection.get_all_items()))}")


Catalog saved to: s5p-bp-stac-catalog-v3
Number of items in collection: 153


## Validate the saved catalog
Before uploading our files we validate the collection and each individual item.

In [16]:
# Look at stac node validator js package
catalog = Catalog.from_file(f"{local_folder}/catalog.json")
catalog.validate()

collection = catalog.get_child("s5p-bp-cf-v3")

print(collection.validate()) # type: ignore

items = list(collection.get_all_items()) # type: ignore

for item in items:
    item.validate()

['https://schemas.stacspec.org/v1.0.0/collection-spec/json-schema/collection.json', 'https://stac-extensions.github.io/item-assets/v1.0.0/schema.json', 'https://stac-extensions.github.io/eo/v1.1.0/schema.json', 'https://stac-extensions.github.io/projection/v2.0.0/schema.json', 'https://stac-extensions.github.io/raster/v1.1.0/schema.json']


## Delete old collection

To be on the safe side, we delete the old collection before uploading the new one. While this step may not always be necessary, it helps to avoid potential issues with updating existing collections.

This part has been adapted from `04 Data Access` notebook.

In [17]:
import requests

realm = "eoepca"
base_domain = "develop.eoepca.org"
keycloak_endpoint = f"https://iam-auth.{base_domain}"
stac_endpoint = f"https://eoapi.{base_domain}/stac"
token_endpoint = f"{keycloak_endpoint}/realms/{realm}/protocol/openid-connect/token"
print(token_endpoint)

for collection in requests.get(f"{stac_endpoint}/collections").json()['collections']:
    print(collection['id'])

def iam_token(username, password):
    headers = {
        "Cache-Control": "no-cache",
        "Content-Type": "application/x-www-form-urlencoded"
    }
    data = {
        "scope": "roles",
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": "demo",
        "client_secret": "demo"
    }
    response = requests.post(token_endpoint, headers=headers, data=data)
    if response.ok:
        return response.json()["access_token"]
    else:
        print(response)
        return None

token_eric = iam_token("eric", "changeme")
response = requests.get(
    stac_endpoint + "/collections/ws-eric.naip",
    headers={"Authorization": f"Bearer {token_eric}"},
    timeout=5,
    allow_redirects=False
)

token_admin = iam_token("example-admin", "changeme")

collections_to_delete = ["s5p-bp-cf-v3"]
for collection in collections_to_delete:
    response = requests.delete(f"{stac_endpoint}/collections/{collection}", headers={"Authorization": f"Bearer {token_admin}"})
    if response.status_code > 204:
        print(f"Error deleting collection {collection}: {response.status_code} - {response.text}")
        continue
    else:
        print(f"({response.status_code}) Collection {collection} deleted successfully.")

https://iam-auth.develop.eoepca.org/realms/eoepca/protocol/openid-connect/token
landsat-8-l1
noaa-emergency-response
s5p-bp-cf-v3
s5p-bp-cloud-fraction-2023-aug-dec
terrascope-s5p-l3-no2-td-v2
(200) Collection s5p-bp-cf-v3 deleted successfully.
(200) Collection s5p-bp-cf-v3 deleted successfully.


## Workspace Building Block

The Workspace BB is used to upload the json files to our workspace `eric`

Definitions for workspace

In [None]:
# get this info from here: https://workspace-api.develop.eoepca.org/workspaces/ws-eric, can also be retrieved programatically by following example "05 Workspace Management"
owner = "eric"
password = "changeme"
ws_name = "ws-eric"

realm = "eoepca"
base_domain = "develop.eoepca.org" #"apx.develop.eoepca.org"
keycloak_endpoint = f"https://iam-auth.{base_domain}"
workspace_api_endpoint = f'https://workspace-api.{base_domain}/workspaces'
token_endpoint = f"{keycloak_endpoint}/realms/{realm}/protocol/openid-connect/token"
minio_endpoint = "https://minio.develop.eoepca.org"

Define functions to interact with workspace.

In [26]:
def iam_token(username, password):
    headers = {
        "Cache-Control": "no-cache",
        "Content-Type": "application/x-www-form-urlencoded"
    }
    data = {
        "scope": "roles",
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": "demo",
        "client_secret": "demo"
    }    
    response = requests.post(token_endpoint, headers=headers, data=data)
    if response.ok:
        return response.json()["access_token"]
    else:
        print(response)
        return None

def access_ws(ws_name, token):
    headers = {
        'Authorization': 'bearer ' + token
    }
    url = f"{workspace_api_endpoint}/{ws_name}"
    print(f"HTTP GET {url}")
    response = requests.get(url, headers=headers)
    print(response)
    return response

Prepare workspace

In [27]:
while True:
    response = access_ws(ws_name, iam_token(owner, password))
    if response.status_code == 200:
        try:
            workspace_data = response.json()
            print(workspace_data.get("status"))
            if workspace_data.get("status") == "ready":
                break
        except ValueError:
            print("not ready yet")

    print("...")
    time.sleep(20)    

<Response [401]>


TypeError: can only concatenate str (not "NoneType") to str

Retrieve relevant information sor storage.

In [None]:
#jwt.decode(iam_token(owner, password), options={"verify_signature": False})

In [28]:
response1 = access_ws(ws_name, iam_token(owner, password))
assert response1.status_code == 200 # this sometimes gives 401, upon next execution 200 again.
print(f"‚úÖworkspace ownership checked and can retrieve workspace details")

<Response [401]>


TypeError: can only concatenate str (not "NoneType") to str

In [None]:
bucket_name = workspace_data["storage"]["credentials"]["bucketname"]
s3_access = workspace_data["storage"]["credentials"]["access"]
s3_secret = workspace_data["storage"]["credentials"]["secret"]
s3_endpoint = workspace_data["storage"]["credentials"]["endpoint"]

Connect to s3.

In [None]:
import boto3
session = boto3.session.Session()
s3resource = session.resource('s3', aws_access_key_id=s3_access, aws_secret_access_key=s3_secret, endpoint_url=minio_endpoint)

In [None]:
s3_folder_prefix = f"end2end/{local_folder}/"

Recursively upload all files (the catalog, collection and items)

In [None]:
for root, dirs, files in os.walk(local_folder):
    for file in files:
        local_path = os.path.join(root, file)
        relative_path = os.path.relpath(local_path, local_folder)
        s3_key = os.path.join(s3_folder_prefix, relative_path).replace("\\", "/")

        try:
            object = s3resource.Object(bucket_name, s3_key)
            with open(local_path, 'rb') as data:
                result = object.put(Body=data)

            res = result.get('ResponseMetadata')
            if res.get('HTTPStatusCode') == 200:
                print(f"‚úÖ Uploaded: {s3_key}")
            else:
                print(f"‚ùå Failed: {s3_key}")
        except Exception as e:
            print(f"üö® Error uploading {s3_key}: {e}")

Check that files are available here: https://ws-eric.develop.eoepca.org/files/ws-eric/end2end/

## Registration Building Block 

The Registration BB - Harvester adds the catalogue to EOEPCA STAC API

In [None]:
#  https://github.com/EOEPCA/demo/blob/main/demoroot/notebooks/06%20Resource%20Registration%20Harvester.ipynb
from requests import Session
from requests.auth import HTTPBasicAuth
import json

# Setup connection to Flowable API
flowable_base_url = "https://registration-harvester-api.develop.eoepca.org/flowable-rest"
flowable_rest_user = "eoepca"
flowable_rest_pw = "eoepca"
flowable_session = Session()
flowable_session.auth = HTTPBasicAuth(flowable_rest_user, flowable_rest_pw)


In [None]:
url = f"{flowable_base_url}/service/repository/process-definitions"
print(f"GET {url}")
response = flowable_session.get(url)
processes = response.json()["data"]
if len(processes) == 0:
    print("No workflow definitions")
else:
    for idx, process in enumerate(processes, 1):
        print("%-2s %-28s version: %-5s id: %s" % (idx, process['name'], process['version'], process['id']))
        if process["name"] == "STAC Publish":
            stac_processId = process["id"]

In [None]:
# Workflow input variable
s3_folder_prefix = f"end2end/{local_folder}/"

s3_access = "eric"
s3_secret_key = "<SECRET_KEY>"
variables = [
    {
        "name": "stac_catalog_source",
        "value": f"s3://ws-eric/{s3_folder_prefix}catalog.json" # "s3://ws-eric/end2end/s5p-bp-stac-catalog/catalog.json"
    },
    {
        "name": "s3_endpoint_url",
        "value": s3_endpoint
    },
    {
        "name": "s3_access_key",
        "value": s3_access
    },
    {
        "name": "s3_secret_key",
        "value": s3_secret 
    }
]
print(json.dumps(variables, indent=4))

# Create HTTP request to start the workflow
body = {}
body["processDefinitionId"] = stac_processId # "stacPublish:1:201803f1-7840-11f0-b011-7ed8fc866c09"
body["variables"] = variables
response = flowable_session.post(url=f"{flowable_base_url}/service/runtime/process-instances", json=body)
print(response.status_code)
print(f'Created process instance at {response.json()["url"]}')


Check the STAC Catalogue here: **Data Access**
https://radiantearth.github.io/stac-browser/#/external/eoapi.develop.eoepca.org/stac/collections/s5p-bp-cloud-fraction-2023-aug-dec


and here: **Data Cube Access**
https://datacube-access.develop.eoepca.org/collections

**Potential To Do Registration BB:** Replicate workflow with [eodm](https://github.com/geopython/eodm).

- As long as the corrected STAC Items and Collection are in memory, they can be registered using eodm [`load_stac_api_collections()`](stactools-sentinel2/examples/s2_dateline at s2_dateline ¬∑ DLR-terrabyte/stactools-sentinel2) and [`load_stac_api_items()`](https://github.com/geopython/eodm/blob/main/src/eodm/load.py#L9)
- The target should be the URL of the EOEPCA STAC API **--> Which one would that be currently?**
- This would be a shortcut by not storing the jsons and not using the Registration BB Harvester.

In [None]:
# https://github.com/geopython/eodm

## Loading the collection with BP applied with ODC STAC

Using ODC STAC to load the data again seems to be working without errors

In [1]:
url = "https://eoapi.develop.eoepca.org/stac"
catalog = Client.open(url)
#https://eoapi.develop.eoepca.org/stac/collections/s5p-bp-cf-v3
collection_id = "s5p-bp-cf-v3"
bbox = [-10.0, 35.0, 30.0, 70.0]  # Europe
date_time = "2023-08-01T00:00:00Z/2023-12-31T23:59:59Z"

search = catalog.search(
    collections=[collection_id],
    bbox=bbox,
    datetime=date_time,
    limit=400  
)

items = list(search.items())

ds = stac_load(
    items,
    #bands=["CF"], 
    crs="EPSG:4326",
    resolution=0.1,
    bbox=bbox,
    chunks={"time": 1} 
)

monthly_mean_cf = ds['cf'].groupby('time.month').median(dim='time')


monthly_mean_cf

NameError: name 'Client' is not defined

## Processing Building Block

 First tests with load_stac() on CDSE and EODC backends.

Connect to openEO backend. ToDo: Connect to dev version on EOEPCA+

In [None]:
connection = openeo.connect('https://openeo.dataspace.copernicus.eu/').authenticate_oidc()

In [None]:
url = "https://eoapi.develop.eoepca.org/stac/collections/s5p-bp-cf-v3"
# Larger N√ºrnberg area for the month of september
cube = connection.load_stac(
    url=url,
    spatial_extent = {
        "west": 10.940,
        "south": 49.340,
        "east": 11.300,
        "north": 49.550,
    },
    temporal_extent=["2023-09-01", "2023-10-01"],
    
)

cube_mnth = cube.reduce_dimension(dimension="t", reducer="mean")
res = cube_mnth.download("s5p_cf_mean.tiff", format="GTiff")

In [None]:
connection = openeo.connect('https://openeo.eodc.eu/openeo/1.2.0').authenticate_oidc()

#original = "https://geoservice.dlr.de/eoc/ogc/stac/v1/collections/S5P_TROPOMI_L3_P1D_CF"
url = "https://eoapi.develop.eoepca.org/stac/collections/s5p-bp-cf-v3"

# Note: auth extensions have been removed from the collection to fix OpenEO compatibility
cube = connection.load_stac(
    url=url,
    spatial_extent = {
        "west": 10.940,
        "south": 49.340,
        "east": 11.300,
        "north": 49.550,
    },
    temporal_extent=["2023-09-01", "2023-10-01"],
    bands=["cf"]  # Enable bands parameter
)

cube.execute()
cube

In [None]:
cube_mnth = cube.reduce_dimension(dimension="t", reducer="mean")

In [None]:
cube_mnth

In [None]:
# Add proj shape, transform, and epsg to the assets as well
save = cube_mnth.save_result(format = "GTIFF")
job = save.create_job()
job.start_and_wait()
# res = cube_mnth.execute()

In [None]:
res = cube_mnth.download("s5p_cf_mean.tiff", format="GTiff")

## Tropospheric NO2 - Terrascope STAC API

**To Do:** Evaluate whether it makes sense to follow Terrascope STAC API approach or if openEO makes more sense.

Get Tropospheric NO2 Data from a publicly available STAC API: S5P NO2 Troposphere L2: Sentinel-5P Nitrogen Dioxide tropospheric column

CDSE: not well filled for NO2
- Offline: https://browser.stac.dataspace.copernicus.eu/collections/sentinel-5p-l2-no2-offl
- Near Real Time: https://browser.stac.dataspace.copernicus.eu/collections/sentinel-5p-l2-no2-nrti?.language=de

Terrascope: need special credentials
- https://services.terrascope.be/stac/collections/urn:eop:VITO:TERRASCOPE_S5P_L3_NO2_TD_V1/items
- https://docs.terrascope.be/Developers/WebServices/TerraCatalogue/STACAPI.html
- https://docs.terrascope.be/Developers/WebServices/TerraCatalogue/ProductDownload.html#authentication

Request

In [None]:
#url = "https://stac.dataspace.copernicus.eu/v1"
url = "https://services.terrascope.be/stac/"
catalog = Client.open(url)

In [None]:
#collection_id = "sentinel-5p-l2-no2-offl"
collection_id = "urn:eop:VITO:TERRASCOPE_S5P_L3_NO2_TD_V2"

In [None]:
search = catalog.search(
    collections=[collection_id],
    bbox=bbox,
    datetime=date_time,
    #limit=1000 # adjust as needed
)

In [None]:
items_no2 = list(search.items())

In [None]:
print(len(items_no2))
print(items_no2[0])
print(items_no2[-1])

In [None]:
items_no2[0]

Check data

In [None]:
ds_no2 = stac_load(
    items_no2,
    #bands=["NO2"], 
    crs="EPSG:4326",
    resolution=0.1,
    bbox=bbox,
    chunks={"time": 1}  # Enable Dask chunking
)

In [None]:
ds_no2 # lazy

In [None]:
monthly_mean_no2 = ds_no2['NO2'].groupby('time.month').median(dim='time') #lazy

In [None]:
monthly_mean_no2 # lazy

To actually access data authentication is needed. **This is probably not the right way to get data from terrascope (ideally it would be analog to the example above).**

In [None]:
import requests
import xarray as xr
import rioxarray
from rasterio.io import MemoryFile

def get_terrascope_token(username: str, password: str) -> str:
    url = "https://sso.terrascope.be/auth/realms/terrascope/protocol/openid-connect/token"
    data = {
        "grant_type": "password",
        "client_id": "public",
        "username": username,
        "password": password
    }
    response = requests.post(url, data=data)
    response.raise_for_status()
    return response.json()["access_token"]

def load_no2_from_items(items, token, asset_key="NO2"):
    """Takes a list of STAC items and loads the NO2 band from each into a time-stacked xarray DataArray."""
    datasets = []
    for item in items:
        try:
            url = item.assets[asset_key].href
            headers = {"Authorization": f"Bearer {token}"}
            r = requests.get(url, headers=headers)
            r.raise_for_status()

            with MemoryFile(r.content) as memfile:
                with memfile.open() as dataset:
                    da = rioxarray.open_rasterio(dataset).squeeze("band", drop=True)
                    da = da.rio.write_crs("EPSG:4326")
                    da = da.expand_dims(time=[item.datetime])
                    datasets.append(da)
        except Exception as e:
            print(f"Failed to load {item.id}: {e}")

    if datasets:
        return xr.concat(datasets, dim="time").sortby("time")
    else:
        print("No valid datasets loaded.")
        return None

In [None]:
import getpass

username = "peter.zellner"
password = getpass.getpass("Terrascope password: ")

token = get_terrascope_token(username, password)

Trying to simulate how the data access would look like after registering the STAC Metadata via the Registration BB...

In [None]:
no2_data = load_no2_from_items(items_no2, token)

if no2_data is not None:
    print(no2_data)
    no2_data.mean(dim="time").plot(cmap="viridis", robust=True)

## Tropospheric NO2 - CDSE aggregator openEO

- CDSE openEO aggregator with terrascope
- https://openeofed.dataspace.copernicus.eu/

In [None]:
# Option A: Save files to eopca workspace, adapt asset path in STAC
# Option B: Register files with original href -> Authentication at access?? -> Don't get the original terrascope STAC Items from openEO

In [None]:
import openeo
connection = openeo.connect("openeofed.dataspace.copernicus.eu").authenticate_oidc()

Using openEO the data has to be retrieved/downloaded directly. STAC items are created for the results.

In [None]:
bbox

In [None]:
%%time
load = connection.load_collection(collection_id = "TERRASCOPE_S5P_L3_NO2_TD", 
                                  spatial_extent = {"west": bbox[0], "east": bbox[1], "south": bbox[2], "north": bbox[3]}, 
                                  temporal_extent = ["2023-08-01T00:00:00Z", "2023-12-31T00:00:00Z"], 
                                  bands = ["NO2"])
save = load.save_result(format = "GTIFF")

job = save.create_job()
job.start_and_wait()

# The process can be executed synchronously (see below), as batch job or as web service now
#result = connection.execute(save2)

These files could be downloaded and stored alongside with the created STAC metadata for the registration BB. Probably there's a more elegant solution...

In [None]:
job.get_results()

In [None]:
%%time
job.get_results().download_files("output")