(c) European Space Agency (ESA) Licensed under ESA Software Community Licence Permissive (Type 3) – v2.4

#### **An introduction to Biomass data**
Biomass data can be combined with other datasets from the MAAP catalogue using [maap-py](https://github.com/MAAP-Project/maap-py) (see the documentation and tutorials: [maap-science](https://github.com/MAAP-Project/maap-documentation/tree/develop/docs/source/science) and [maap-technical](https://github.com/MAAP-Project/maap-documentation/tree/develop/docs/source/technical_tutorials)).

Users can generate and retrieve a valid token here:
👉 https://portal.maap.eo.esa.int/ini/services/auth/token/
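
Because tokens expire (the streaming section further down notes a lifetime of about 10 hours as of 01.09.2025), it can help to check how old a saved token file is before reusing it. A minimal sketch, assuming the token is kept in a local `token.txt` as in the cells further down; `token_is_fresh` and its 10-hour default are illustrative and not part of any MAAP API:

```python
import os
import time


def token_is_fresh(path="token.txt", max_age_hours=10.0):
    """Return True if `path` exists and was modified less than `max_age_hours` ago.

    Hypothetical helper: the file age is only a proxy for token validity.
    """
    if not os.path.exists(path):
        return False
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds < max_age_hours * 3600
```

If the check returns `False`, generate a new token from the portal link above and overwrite the file.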

**Prerequisites**

In [None]:
import os
import pathlib
from tqdm import tqdm
from io import BytesIO
import xml.etree.ElementTree as ET
import requests
import fsspec
from PIL import Image
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import rasterio as rio
from rasterio.windows import Window
import geopandas as gpd
from shapely.geometry import Polygon
import fiona

fiona.supported_drivers["KML"] = "rw"  # Enable kml driver
import folium
from pystac_client import Client

**ESA MAAP (STAC) Catalog API**

In [None]:
catalog_url = "https://catalog.maap.eo.esa.int/catalogue/"
catalog = Client.open(catalog_url)

In [None]:
# find all available collections in MAAP
collections = catalog.get_collections()
# sorted([c.id for c in collections])

**Filter** - allows you to search based on different metadata parameters. To understand which queryables exist you can visit: [Queryables](https://catalog.maap.eo.esa.int/catalogue/collections/BiomassLevel1aIOC/queryables). \
Examples include: 
* productType
* frame 
* processingLevel
* instrument
* orbitNumber 
...
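
The filter strings used in the following cells are plain text expressions of the form `name = 'value'` joined with `and`. As a sketch, such a string could be assembled from a dictionary. `build_filter` is a hypothetical helper that only covers string equality, and the queryable names must exist in the collection's queryables list:

```python
def build_filter(conditions):
    """Join {queryable: value} pairs into "a = 'x' and b = 'y'".

    Hypothetical helper; values containing a single quote are rejected
    rather than escaped, to keep the sketch simple.
    """
    parts = []
    for name, value in conditions.items():
        if "'" in str(value):
            raise ValueError(f"unsupported quote in value for {name!r}")
        parts.append(f"{name} = '{value}'")
    return " and ".join(parts)


query = build_filter({"productType": "S1_SCS__1S", "orbitDirection": "ASCENDING"})
print(query)  # productType = 'S1_SCS__1S' and orbitDirection = 'ASCENDING'
```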

**datetime** two dots (`..`) can be used in place of the start or the end to indicate an open-ended (unbounded) interval \
**bbox** is defined by the bottom-left corner (lon_min, lat_min) followed by the top-right corner (lon_max, lat_max) \
**max_items** limits the number of items returned
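
To make these conventions concrete, the sketch below builds a `bbox` list in (lon_min, lat_min, lon_max, lat_max) order and an interval string that uses `..` for an open end. `make_bbox` and `make_interval` are hypothetical helpers, not part of `pystac_client`:

```python
def make_bbox(lon_min, lat_min, lon_max, lat_max):
    """Return [lon_min, lat_min, lon_max, lat_max] after a basic latitude check."""
    if not (-90 <= lat_min <= lat_max <= 90):
        raise ValueError("latitudes must satisfy -90 <= lat_min <= lat_max <= 90")
    # Longitudes are not ordered here: lon_min > lon_max can describe a box
    # that crosses the antimeridian.
    return [lon_min, lat_min, lon_max, lat_max]


def make_interval(start=None, end=None):
    """Return 'start/end', replacing a missing bound with '..'."""
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    return f"{start or '..'}/{end or '..'}"


print(make_bbox(0, -20, 10, -10))  # [0, -20, 10, -10]
print(make_interval("2025-05-01T00:00:00.000Z"))  # 2025-05-01T00:00:00.000Z/..
```

The resulting values have the same shape as the `bbox` and `datetime` arguments used in the search cells that follow.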

In [None]:
# find all Biomass collections
results = catalog.collection_search(
    filter="platform='Biomass'",
    datetime="2025-05-01T00:00:00.000Z/..",
)
print(f"{results.matched()} collections found.")

data = results.collection_list_as_dict()
df = pd.json_normalize(data, record_path=["collections"]).sort_values(
    by="title", ignore_index=True
)
df[["id", "title", "extent.temporal.interval", "extent.spatial.bbox"]]

In [None]:
# open specific collection
search = catalog.search(
    collections=["BiomassLevel1aIOC"],
    filter="productType = 'S1_SCS__1S' and orbitDirection = 'ASCENDING'",
    # bbox = [0, -20, 10, -10],
    # datetime=["2025-08-01T00:00:00Z", "2025-08-01T04:00:00Z"],
    datetime=["2025-09-14T08:00:00Z", "2025-09-14T10:00:00Z"],
    # sortby=[{"field": "datetime", "direction": "asc"}], # sort by asc datetime, not supported (01.09.2025)!
    method="GET",
    max_items=10,
)

items = list(search.items())
# items = sorted(items, key=lambda item: item.datetime)  # sort by asc datetime
print(f"Accessing {len(items)} items (limited by max_items).")
print(f"{search.matched()} items found that matched the query.")

### Results

Understanding Assets in the ESA MAAP STAC Catalog: Each granule (one Biomass acquisition per product) includes multiple **assets**, which are different files that serve distinct purposes. These assets can include preview images, scientific data, metadata, and more.

**Types of assets in a granule**:

| **Asset Name**           | **Description**                           | **File type** | **Purpose / Use**                                    |
| ------------------------ | ----------------------------------------- | ------------- | ---------------------------------------------------- |
| assets.thumbnail.href    | Preview image                             | .png          | Quick visual inspection                              |
| assets.quicklook_1.href  | Preview georeferenced image               | .kml          | Quick visual inspection                              |
| assets.enclosure_13.href | Absolute values products (hh, hv, vh, vv) | .tiff         | Stream into memory for analysis                      |
| assets.enclosure_14.href | Phase values products (hh, hv, vh, vv)    | .tiff         | Stream into memory for analysis                      |
| assets.product.href      | Complete zipped product bundle            | .zip          | For full download (not recommended unless necessary) |

**Tips**: 

- **Want a quick look?** Use the quicklook or thumbnail to preview the data
- **Need to analyze?** Work with enclosure_13 and enclosure_14 (.tiff files), see the next cells
- **Don't need everything?** Avoid the .zip unless you really need to download all files
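
As an illustration of these tips, the sketch below picks an asset href from an item dictionary by order of preference. The item is a hand-made stand-in with placeholder URLs, not real catalogue output, and `pick_asset` is a hypothetical helper; the asset keys are the ones listed in the table:

```python
def pick_asset(item, preferred_keys):
    """Return (key, href) for the first preferred asset present in the item."""
    assets = item.get("assets", {})
    for key in preferred_keys:
        if key in assets and "href" in assets[key]:
            return key, assets[key]["href"]
    raise KeyError(f"none of {preferred_keys} found in item assets")


# Stand-in item: only the structure mirrors a STAC item, the URLs are placeholders.
example_item = {
    "id": "example",
    "assets": {
        "thumbnail": {"href": "https://example.invalid/preview.png"},
        "enclosure_13": {"href": "https://example.invalid/abs.tiff"},
        "product": {"href": "https://example.invalid/product.zip"},
    },
}

key, href = pick_asset(example_item, ["quicklook_1", "thumbnail"])
print(key, href)  # thumbnail https://example.invalid/preview.png
```

Listing the lightweight assets first keeps the zipped bundle as a last resort.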

In [None]:
# show product properties/accessible data and metadata
search.item_collection()[-1]

In [None]:
# convert to df for easy visualisation
data = search.item_collection_as_dict()

df = pd.json_normalize(data, record_path=["features"])[
    [
        "id",  # unique id
        "properties.eopf:datatake_id",  # not unique id
        "properties.product:type",
        "properties.updated",
        "properties.sat:absolute_orbit",
        "properties.sat:orbit_state",
        "assets.thumbnail.href",
        "assets.quicklook_1.href",
        "assets.enclosure_13.href",
        "assets.enclosure_14.href",
        "assets.product.href",
    ]
]

# Rename the columns to shorter, more readable names
df.rename(
    columns={
        "properties.eopf:datatake_id": "dt_id",
        "properties.product:type": "product_type",
        "properties.updated": "last_modified",
        "properties.sat:absolute_orbit": "abs_orbit",
        "properties.sat:orbit_state": "orbit_state",
        "assets.thumbnail.href": "quicklook",
        "assets.quicklook_1.href": "quicklook_kml",
        "assets.enclosure_13.href": "abs_product",
        "assets.enclosure_14.href": "phase_product",
        "assets.product.href": "zipped_product",
    },
    inplace=True,
)

df.sort_values(by="id", ascending=True, ignore_index=True, inplace=True)
df

**Quicklook of the data**

In [None]:
# squeeze=False keeps `axes` a 2-D array, so the loop also works for a single item
fig, axes = plt.subplots(
    1, len(df), figsize=(2 * len(df), 10), sharey=True, squeeze=False
)

for ax, (_, row) in zip(axes.ravel(), df.iterrows()):
    response = requests.get(row.quicklook)  # thumbnail (.png)
    img = Image.open(BytesIO(response.content))
    ax.imshow(img)
    ax.set_title(row.dt_id, fontsize=10)
    # ax.axis("off")

plt.tight_layout()
plt.show()

**Stream and plot data**
> *Generate your token from [MAAP](https://portal.maap.eo.esa.int/ini/services/auth/token/index.php) (as of 01.09.2025, a token is valid for only 10 h)*
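
The next cell reads a granule footprint from the `gx:LatLonQuad` element of the quicklook KML. The sketch below shows the namespace handling on a small hand-written KML string with made-up coordinates. It assumes the `coordinates` child may or may not carry the default KML namespace, so it checks both; `quad_corners` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

NS = {
    "kml": "http://www.opengis.net/kml/2.2",
    "gx": "http://www.google.com/kml/ext/2.2",
}

# Hand-written sample, not a real Biomass quicklook.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:gx="http://www.google.com/kml/ext/2.2">
  <GroundOverlay>
    <gx:LatLonQuad>
      <coordinates>10.0,-1.0 11.0,-1.0 11.0,0.5 10.0,0.5</coordinates>
    </gx:LatLonQuad>
  </GroundOverlay>
</kml>"""


def quad_corners(kml_text):
    """Return one list of (lon, lat) tuples per gx:LatLonQuad element."""
    root = ET.fromstring(kml_text)
    quads = []
    for quad in root.findall(".//gx:LatLonQuad", NS):
        elem = quad.find("kml:coordinates", NS)
        if elem is None:  # fall back to a tag without namespace
            elem = quad.find("coordinates")
        if elem is None or not elem.text:
            continue
        pairs = elem.text.split()
        # keep lon and lat, drop an optional altitude value
        quads.append([tuple(float(v) for v in p.split(",")[:2]) for p in pairs])
    return quads


print(quad_corners(SAMPLE_KML))
# [[(10.0, -1.0), (11.0, -1.0), (11.0, 0.5), (10.0, 0.5)]]
```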

In [None]:
def read_gpd_streaming(url, token):
    fs = fsspec.filesystem("https", headers={"Authorization": f"Bearer {token}"})
    with fs.open(url, "rb") as f:
        # gdf = gpd.read_file(f)  # not that simple: geopandas does not recognise the data structure
        kml_data = f.read()  # get file from HTTPS using token
        root = ET.fromstring(kml_data)  # parse kml
        ns = {
            "kml": "http://www.opengis.net/kml/2.2",
            "gx": "http://www.google.com/kml/ext/2.2",
        }  # define namespaces
        latlonquads = root.findall(".//gx:LatLonQuad", ns)  # get gx:LatLonQuad elements

        polygons = []
        for quad in latlonquads:
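            # <coordinates> may inherit the default KML namespace; fall back to the bare tag
            coords_elem = quad.find("kml:coordinates", ns)
            if coords_elem is None:
                coords_elem = quad.find("coordinates")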
            if coords_elem is None:
                continue
            coords_text = coords_elem.text.strip()
            coord_pairs = coords_text.split()
            coords = [tuple(map(float, pair.split(","))) for pair in coord_pairs]
            if len(coords) >= 4:  # should be a closed polygon
                polygons.append(Polygon(coords))

        gdf = gpd.GeoDataFrame(
            {"name": "unnamed", "geometry": polygons}, crs="EPSG:4326"
        )

        return gdf


def read_rio_streaming(url, token, subset=None):
    """Stream a GeoTIFF over HTTPS and return (dataset, list of band arrays).

    subset: optional (col_off, row_off, width, height) window in pixels;
    a falsy value reads the full image.
    """
    fs = fsspec.filesystem("https", headers={"Authorization": f"Bearer {token}"})

    # window=None makes rasterio read each band in full
    window = Window(*subset) if subset else None

    with fs.open(url, "rb") as f:
        with rio.open(f) as src:
            # the image is georeferenced through GCPs (see src.get_gcps())
            bands = [src.read(i + 1, window=window) for i in range(src.count)]

    # note: src is already closed when it is returned
    return src, bands

In [None]:
# get your token
# Users can generate and retrieve the token here:
#  https://portal.maap.eo.esa.int/ini/services/auth/token/

_TOKEN = ""  # optional
if pathlib.Path("token.txt").exists():
    print("Re-using token")
    with open("token.txt", "rt") as f:
        token = f.read().strip().replace("\n", "")
else:
    token = _TOKEN

In [None]:
# sample
img_n = 5
abs_img_url = df.loc[img_n, "abs_product"]
kml_url = df.loc[img_n, "quicklook_kml"]
zip_url = df.loc[img_n, "zipped_product"]
print(abs_img_url)
print(kml_url)

In [None]:
# open kml
gdf = read_gpd_streaming(kml_url, token)

center = gdf.geometry.unary_union.centroid
m = folium.Map(location=[center.y, center.x], zoom_start=8)
folium.GeoJson(gdf).add_to(m)
m

In [None]:
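# --- load the scene (subset=None reads the full image)
# pass subset=subset to read only a window given as (col_off, row_off, width, height)
subset = (0, 200, 1000, 1500)
src, bands = read_rio_streaming(abs_img_url, token, subset=None)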

In [None]:
# --- plot the images
fig, axs = plt.subplots(4,1, figsize=(30,10))
polarisations = ['HH', 'HV', 'VH', 'VV']
for i, (ax, p) in enumerate(zip(axs.flatten(), polarisations)):
    ax.imshow(
        np.rot90(bands[i]),
        vmin=0,
        vmax=2 * np.nanmean(bands[i]),
        cmap="gray",
        aspect="equal",
        interpolation="auto",
    )
    ax.set_title(p)
    ax.set_xticks([])
    ax.set_yticks([])

plt.tight_layout(pad=1.0, w_pad=0.5, h_pad=1.0)
plt.show()

**Download data**
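
The download cell below derives the local file name from the last URL segment and appends `.zip` when that segment has no extension. A standalone sketch of that rule follows. `local_name` is a hypothetical helper, the URLs are placeholders, and unlike the cell below it also drops any query string:

```python
import posixpath
from urllib.parse import urlsplit


def local_name(url, default_ext=".zip"):
    """Return the file name to use locally for `url`."""
    # urlsplit separates the path from any ?query or #fragment part
    name = posixpath.basename(urlsplit(url).path.rstrip("/"))
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return name if "." in name else name + default_ext


print(local_name("https://example.invalid/data/scene_abs.tiff"))  # scene_abs.tiff
print(local_name("https://example.invalid/data/PRODUCT_ID"))  # PRODUCT_ID.zip
```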

In [None]:
def download_file_with_bearer_token(url, token, folder_path, disable_bar=False):
    try:
        headers = {"Authorization": f"Bearer {token}"}
        response = requests.get(url, headers=headers, stream=True)
        response.raise_for_status()  # Raise an exception for bad status codes
        file_size = int(response.headers.get("content-length", 0))

        chunk_size = 8 * 1024 * 1024  # bytes (8 MiB)
        file_path = (
            url.rsplit("/", 1)[-1]
            if "." in url.rsplit("/", 1)[-1]
            else url.rsplit("/", 1)[-1] + ".zip"
        )
        os.makedirs(folder_path, exist_ok=True)
        file_path = os.path.join(folder_path, file_path)  # also works without a trailing slash
        with open(file_path, "wb") as f, tqdm(
            desc=file_path,
            total=file_size,
            unit="iB",
            unit_scale=True,
            unit_divisor=1024,
            disable=disable_bar,
        ) as bar:
            for chunk in response.iter_content(chunk_size=chunk_size):
                read_size = f.write(chunk)
                bar.update(read_size)

        if disable_bar:
            print(f"File downloaded successfully to {file_path}")

    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")

In [None]:
folder_path = "maap-data/"
download_file_with_bearer_token(abs_img_url, token, folder_path)
download_file_with_bearer_token(kml_url, token, folder_path)
download_file_with_bearer_token(zip_url, token, folder_path)