# Archive Copernicus STAC Assets into DASI Database

This example demonstrates how to use the STAC (SpatioTemporal Asset Catalog) API to search for Sentinel-2 Level 2A data by date range, cloud cover, and geographic area (bounding box), and how to archive the results into a DASI database.

We define a bounding box around Bonn as the area of interest (AOI), 1 to 31 January 2025 as the date range, and less than 30% as the cloud cover threshold. We then search for Sentinel-2 Level 2A data that meets these criteria.

We use the `pystac_client` Python library to interact with the STAC API endpoint of the Copernicus Data Space Ecosystem.

## Workflow

1. Define the area of interest (AOI) as a bounding box.
2. Define the date range.
3. Define the maximum cloud cover.
4. Search for Sentinel-2 Level 2A data based on the defined parameters.
5. Archive the search results to the DASI database.


## Requirements

- `pydasi`: DASI Python API.
- `boto3`: used to interact with S3 endpoints.
- `pystac_client`: used to interact with STAC services, read catalog information, and search for products.

In [1]:
!pip install --quiet pystac_client boto3 pydasi
import os

## Copernicus STAC Catalog

We initialize a STAC client for the catalog. The `add_conforms_to` method adds the `ITEM_SEARCH` conformance class to the catalog object, which tells the client that the catalog supports item search.

The Copernicus STAC catalog is available at [https://stac.dataspace.copernicus.eu/v1](https://stac.dataspace.copernicus.eu/v1).


In [2]:
from pystac_client import Client

catalog = Client.open("https://stac.dataspace.copernicus.eu/v1")
catalog.add_conforms_to("ITEM_SEARCH")

## Area of Interest (AOI)

We define the area of interest (around Bonn, Germany) as a polygon in GeoJSON format. Coordinates are given as [longitude, latitude] pairs, and the ring is closed by repeating the first vertex.


In [3]:
aoi = {
    "type": "Polygon",
    "coordinates": [
        [
            [6.95, 50.85],
            [6.95, 50.65],
            [7.25, 50.65],
            [7.25, 50.85],
            [6.95, 50.85],
        ]
    ],
}
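
The polygon above is simply the rectangle spanned by a west/south/east/north bounding box. As a small sketch, a hypothetical helper `bbox_to_polygon` (not part of `pystac_client`) could build the same GeoJSON ring from the four bounds, which makes it easier to move the AOI elsewhere:

```python
def bbox_to_polygon(west, south, east, north):
    """Return a GeoJSON Polygon for a lon/lat bounding box.

    Hypothetical helper for illustration only. The ring runs
    counterclockwise and is closed by repeating the first vertex.
    """
    ring = [
        [west, north],
        [west, south],
        [east, south],
        [east, north],
        [west, north],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Same rectangle around Bonn as the hand-written AOI.
aoi_from_bbox = bbox_to_polygon(6.95, 50.65, 7.25, 50.85)
print(aoi_from_bbox["coordinates"][0])
```

The vertex order matches the hand-written `aoi`, so the two dictionaries compare equal.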

## Query STAC Catalog

We query the STAC catalog for Sentinel-2 Level 2A data using the parameters described below.

### Parameters

- `collections`: The STAC collection (e.g., Sentinel-2 Level 2A data).
- `datetime`: Date range.
- `intersects`: The geographical area.
- `filter`: Filters items by their properties. It is a JSON object with an operator (`op`) and arguments (`args`) that together specify the condition (e.g., `eo:cloud_cover` < 30).
- `max_items`: The maximum number of results to retrieve.
<!-- * `sortby`: Sorts results (e.g., by the eo:cloud_cover property in ascending order). -->
- `fields`: Selects which fields are returned; here it is used to exclude `geometry`.
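
The `filter` value is an ordinary Python dictionary in CQL2 JSON form, so conditions can be composed with small functions. The sketch below uses two hypothetical helpers, `less_than` and `all_of`, which are not part of `pystac_client`. The snow cover threshold of 50 is purely illustrative, and whether the endpoint accepts a given combined filter is an assumption to verify against the service.

```python
def less_than(prop, value):
    """CQL2 JSON comparison: property < value (hypothetical helper)."""
    return {"op": "<", "args": [{"property": prop}, value]}


def all_of(*conditions):
    """CQL2 JSON logical AND over the given conditions (hypothetical helper)."""
    return {"op": "and", "args": list(conditions)}


# The single condition used in this notebook.
cloud_filter = less_than("eo:cloud_cover", 30)

# An illustrative combined condition; the threshold 50 is an assumption.
combined_filter = all_of(cloud_filter, less_than("eo:snow_cover", 50))

print(cloud_filter)
print(combined_filter)
```

Either dictionary could be passed as the `filter` entry of the search parameters.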


In [4]:
params = {
    "max_items": 10,
    "collections": "sentinel-2-l2a",
    "datetime": "2025-01-01/2025-01-31",
    "intersects": aoi,
    "filter": {"op": "<", "args": [{"property": "eo:cloud_cover"}, 30]},
    "fields": {"exclude": ["geometry"]},
}

results = catalog.search(**params)

print(f"Found {len(results.item_collection())} items.")

Found 4 items.


## Archive STAC Items (Metadata JSON)

In this step, we iterate over the search results and archive each STAC item along with its metadata into the DASI database.


In [None]:
from pydasi import Dasi
import urllib.request

dasi = Dasi("metadata.yaml")

for item in results.items_as_dicts():

    # reduce string size (which is limited to 32 chars)
    bbox = [round(x, 2) for x in item["bbox"]]

    # create a metadata key for the archive
    meta_key = {
        "collection": item["collection"],
        "bbox": str(bbox),
        "grid": item["properties"]["grid:code"],
        "date": item["properties"]["datetime"],
        "cloud_cover": item["properties"]["eo:cloud_cover"],
        "snow_cover": item["properties"]["eo:snow_cover"],
        "water": item["properties"]["statistics"]["water"],
        "darkarea": item["properties"]["statistics"]["dark_area"],
        "vegetation": item["properties"]["statistics"]["vegetation"],
        "cloudshadow": item["properties"]["statistics"]["cloud_shadow"],
    }

    # get the STAC item's own URL (the link with rel == "self")
    href = next(link["href"] for link in item["links"] if link["rel"] == "self")

    # download the item's JSON document and archive it under the metadata key
    with urllib.request.urlopen(href) as response:
        print("Processing: %s" % href)
        feature = response.read()
        dasi.archive(meta_key, feature)
        print("Archived: %s" % meta_key)

dasi.flush()

=   DASI: Data Access and Storage Interface   =
Searching library 'dasi' ...
found: '/workspace/dasi/pydasi/src/backend/libs/Linux/libdasi.so.0.2.6'
Processing: https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20250118T103351_N0511_R108_T32ULB_20250118T143252
Archived: {'collection': 'sentinel-2-l2a', 'bbox': '[6.59, 50.44, 7.73, 51.44]', 'grid': 'MGRS-32ULB', 'date': '2025-01-18T10:33:51.024Z', 'cloud_cover': 20.1, 'snow_cover': 15.35, 'water': 1.402335, 'darkarea': 8.766965, 'vegetation': 12.458074, 'cloudshadow': 0.0}
Processing: https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20250118T103351_N0511_R108_T31UGS_20250118T143252
Archived: {'collection': 'sentinel-2-l2a', 'bbox': '[6.57, 50.38, 7.45, 51.38]', 'grid': 'MGRS-31UGS', 'date': '2025-01-18T10:33:51.024Z', 'cloud_cover': 25.96, 'snow_cover': 10.13, 'water': 1.646185, 'darkarea': 7.669812, 'vegetation': 12.455364, 'cloudshadow': 0.000839}
Processing: https:/
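
The key construction in the loop can also be factored into a function, which makes it possible to check the key without network access or a DASI database. This is only a sketch: `build_meta_key` is a hypothetical name, and `sample_item` is a hand-written dictionary that contains just the fields read here, with values copied from the first archived item in the output above.

```python
def build_meta_key(item):
    """Build the archive key from a STAC item dictionary (hypothetical helper)."""
    props = item["properties"]
    stats = props["statistics"]
    # Round the bounding box so that its string form stays short.
    bbox = [round(x, 2) for x in item["bbox"]]
    return {
        "collection": item["collection"],
        "bbox": str(bbox),
        "grid": props["grid:code"],
        "date": props["datetime"],
        "cloud_cover": props["eo:cloud_cover"],
        "snow_cover": props["eo:snow_cover"],
        "water": stats["water"],
        "darkarea": stats["dark_area"],
        "vegetation": stats["vegetation"],
        "cloudshadow": stats["cloud_shadow"],
    }


# Hand-written sample; values taken from the first "Archived:" line above.
sample_item = {
    "collection": "sentinel-2-l2a",
    "bbox": [6.59, 50.44, 7.73, 51.44],
    "properties": {
        "grid:code": "MGRS-32ULB",
        "datetime": "2025-01-18T10:33:51.024Z",
        "eo:cloud_cover": 20.1,
        "eo:snow_cover": 15.35,
        "statistics": {
            "water": 1.402335,
            "dark_area": 8.766965,
            "vegetation": 12.458074,
            "cloud_shadow": 0.0,
        },
    },
}

sample_key = build_meta_key(sample_item)
print(sample_key)
```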

## Archive STAC Item Assets

We iterate over the search results and archive each STAC item's assets (e.g., bands) into the DASI database.

The RGB bands of interest are listed below. Note that the archiving cell in this example selects only `B02_10m`.

- `B04_10m`: Red band
- `B03_10m`: Green band
- `B02_10m`: Blue band


### S3 API Access

We use the S3 API for high-performance, scalable access to the Earth observation (EO) data. The S3 API is a RESTful API that provides access to the Copernicus EO data stored in the cloud.

To access the EO data, one must register with the [Copernicus Data Space Ecosystem](https://documentation.dataspace.copernicus.eu/APIs/S3.html#registration) and obtain access credentials.

We set the following environment variables:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`


In [None]:
# fill in the S3 credentials obtained from the Copernicus Data Space Ecosystem
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
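
A quick check before creating the S3 resource can catch unset or empty variables early. The following is a minimal sketch, assuming a hypothetical helper `missing_credentials` that is not part of `boto3`:

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to try out in isolation.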

### Fetch S3 URLs

We use the `boto3` Python library to fetch the assets over S3. The function below takes an asset's `s3://` URL and returns the content of the corresponding object, read from the EO data endpoint, as bytes.


In [7]:
import boto3

s3 = boto3.resource("s3", endpoint_url="https://eodata.dataspace.copernicus.eu")


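def fetch_s3_asset(asset_href):
    """Download the object behind an s3:// URL and return its bytes."""
    # removeprefix (Python 3.9+) drops the scheme; lstrip would strip a set of characters, not a prefix
    bucket, key = asset_href.removeprefix("s3://").split("/", 1)
    response = s3.Object(bucket, key).get()
    return response["Body"].read()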

Found credentials in environment variables.
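
Splitting an `s3://` URL into bucket and key needs no network access, so it can be sketched and checked with the standard library alone. Here `split_s3_href` is a hypothetical helper name and the example path is a placeholder, not a real product. The last line illustrates why `str.lstrip` is not a safe way to drop the scheme: it removes a set of characters rather than a prefix.

```python
from urllib.parse import urlparse


def split_s3_href(href):
    """Split an s3:// URL into (bucket, key); hypothetical helper."""
    parsed = urlparse(href)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {href!r}")
    # The bucket is the network location; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder path for illustration only.
bucket, key = split_s3_href("s3://eodata/path/to/B02_10m.jp2")
print(bucket, key)

# Pitfall: lstrip("s3://") strips any leading 's', '3', ':' or '/' characters,
# so a bucket name that starts with one of them is damaged.
damaged = "s3://s3-bucket/key".lstrip("s3://")
print(damaged)
```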


### Archive Assets into DASI Database

We fetch only the selected assets and archive them into the DASI database.


In [None]:
# asset keys to archive; extend the list to include more bands (e.g., "B03_10m", "B04_10m")
bands = ["B02_10m"]

dasi = Dasi("assets.yaml")

for item in results.items():

    for asset_key, asset in item.assets.items():

        if asset_key not in bands:
            continue

        print(f"Fetching asset '{asset_key}' from item '{item.id}'")

        asset_dic = asset.to_dict()

        # create a metadata key for the archive
        # (named meta_key so it does not shadow the loop variable asset_key)
        meta_key = {
            "collection": item.collection_id,
            "date": item.properties["datetime"],
            "grid": item.properties["grid:code"],
            "cloud_cover": item.properties["eo:cloud_cover"],
            "snow_cover": item.properties["eo:snow_cover"],
            "name": asset_dic["bands"][0]["name"],
            "gsd": asset_dic["gsd"],
            "type": asset_dic["type"].split("/")[1],
        }

        if item.bbox is not None:
            # round the coordinates to keep the value short (strings are limited to 32 chars)
            meta_key["bbox"] = [round(x, 2) for x in item.bbox]

        # fetch the asset to memory
        data = fetch_s3_asset(asset.href)

        dasi.archive(meta_key, data)
        print("Archived: %s" % meta_key)

# flush the archived data, as done in the metadata cell
dasi.flush()

Fetching asset 'B02_10m' from item 'S2A_MSIL2A_20250118T103351_N0511_R108_T32ULB_20250118T143252'
Archived: {'collection': 'sentinel-2-l2a', 'date': '2025-01-18T10:33:51.024Z', 'grid': 'MGRS-32ULB', 'cloud_cover': 20.1, 'snow_cover': 15.35, 'name': 'B02', 'gsd': 10, 'type': 'jp2', 'bbox': [6.59, 50.44, 7.73, 51.44]}
Fetching asset 'B02_10m' from item 'S2A_MSIL2A_20250118T103351_N0511_R108_T31UGS_20250118T143252'
Archived: {'collection': 'sentinel-2-l2a', 'date': '2025-01-18T10:33:51.024Z', 'grid': 'MGRS-31UGS', 'cloud_cover': 25.96, 'snow_cover': 10.13, 'name': 'B02', 'gsd': 10, 'type': 'jp2', 'bbox': [6.57, 50.38, 7.45, 51.38]}
Fetching asset 'B02_10m' from item 'S2B_MSIL2A_20250113T103309_N0511_R108_T32ULB_20250113T124201'
Archived: {'collection': 'sentinel-2-l2a', 'date': '2025-01-13T10:33:09.024Z', 'grid': 'MGRS-32ULB', 'cloud_cover': 5.06, 'snow_cover': 60.29, 'name': 'B02', 'gsd': 10, 'type': 'jp2', 'bbox': [6.59, 50.44, 7.73, 51.44]}
Fetching asset 'B02_10m' from item 'S2B_MSIL2