<img src="https://raw.githubusercontent.com/Harmonize-Brazil/code-gallery/main/img/harmonize_logo.png" align="right" width="64"/>

# <span style="color:#336699">HARMONIZE drone image collections access using the SpatioTemporal Asset Catalog (STAC)</span>
<hr style="border:2px solid #0077b9;">


<br/>

<div style="text-align: center;font-size: 90%;">
    Marcos L. Rodrigues<sup><img src="https://orcid.filecamp.com/static/thumbs/folders/qLJ1tuei4m6ugC3g.png" width="16"/><a href="https://orcid.org/0000-0002-9199-6928"> https://orcid.org/0000-0002-9199-6928</a></sup>
    <br/><br/>
    Earth Observation and Geoinformatics Division, National Institute for Space Research (INPE)
    <br/>
    Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil
    <br/><br/>
    Contact: <a href="mailto:marcos.rodrigues@inpe.br">marcos.rodrigues@inpe.br</a>
    <br/><br/>
    Last Update: June 12, 2025
</div>

<br/>

<div style="text-align: justify;  margin-left: 25%; margin-right: 25%;">
<b>Abstract.</b> This Jupyter Notebook gives an overview of how to use the STAC service to discover and access the drone data products from the Earth Observation Data Cubes tuned for Health Response Systems (EODCtHRS), a component of the <a href="https://harmonize-tools.org" target="_blank">HARMONIZE project</a>.
</div>
</div>

# Introduction

Adapted from <a href="https://github.com/brazil-data-cube/code-gallery/blob/master/jupyter/Python/stac/stac-introduction.ipynb" target="_blank">Introduction to the SpatioTemporal Asset Catalog (STAC)</a>, available in the GitHub code gallery of the <a href="https://data.inpe.br/bdc/web" target="_blank">Brazil Data Cube (BDC)</a> project.

<hr style="border:1px solid #0077b9;">

The [**S**patio**T**emporal **A**sset **C**atalog (STAC)](https://stacspec.org/) is a specification created through the collaboration of several organizations to increase the interoperability of satellite image search.

Figure 1 shows the most important concepts behind the STAC data model:

<center>
<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-concept.png" width="480" />
<br/>
<b>Figure 1</b> - STAC model.
</center>

The descriptions of the concepts below are adapted from the [STAC Specification](https://github.com/radiantearth/stac-spec):

- **Item**: a `STAC Item` is the atomic unit of metadata in STAC, providing links to the actual `assets` (including thumbnails) that it represents. It is a `GeoJSON Feature` with additional fields for things like time, links to related entities and, above all, links to the assets. According to the specification, it is the unit that describes the data to be discovered in a `STAC Catalog` or `Collection`.

- **Asset**: a `spatiotemporal asset` is any file that represents information about the Earth captured in a certain space and time.


- **Catalog**: provides a structure to link various `STAC Items` together or even to other `STAC Catalogs` or `Collections`.


- **Collection**: a specialization of the `Catalog` that adds further information about a spatiotemporal collection of data.
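
To make the relationship between these concepts more concrete, the sketch below writes a minimal Item by hand as a plain Python dictionary shaped like a GeoJSON Feature. Every identifier, coordinate, and URL in it is a hypothetical placeholder, not data from the HARMONIZE service, and a real Item usually carries many more fields.

```python
# A hand-written, minimal STAC Item expressed as a GeoJSON Feature.
# All identifiers, coordinates, and URLs are hypothetical placeholders.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",  # the Collection this Item belongs to
    "bbox": [-49.52, -2.60, -49.49, -2.56],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-49.52, -2.60], [-49.49, -2.60], [-49.49, -2.56],
            [-49.52, -2.56], [-49.52, -2.60],
        ]],
    },
    "properties": {"datetime": "2023-11-07T20:27:57Z"},  # the time dimension
    "assets": {  # links to the actual files
        "thumbnail": {"href": "https://example.org/example-item.png", "type": "image/png"},
        "file": {"href": "https://example.org/example-item.tif",
                 "type": "image/tiff; application=geotiff"},
    },
    "links": [  # links to related entities
        {"rel": "collection", "href": "https://example.org/collections/example-collection"},
    ],
}

# Each asset key maps to a file that can be downloaded or read remotely.
for key, asset in item["assets"].items():
    print(key, "->", asset["href"])
```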

# STAC Client API
<hr style="border:1px solid #0077b9;">

To run the examples in this Jupyter Notebook, you need to install [pystac-client](https://pystac-client.readthedocs.io/en/latest/). The following commands install it from PyPI using `pip`, together with the other libraries used in this notebook:

In [None]:
!pip3 install scikit-learn
!pip3 install pystac-client

In [None]:
!pip install rasterio shapely matplotlib tqdm folium

To access the functionalities of the client API, import the `pystac_client` package as follows:

In [None]:
import pystac_client

After that, you can check the installed `pystac_client` package version:

In [None]:
pystac_client.__version__

Then, create a `Client` object attached to the HARMONIZE instance of the BDC STAC service:

In [None]:
service = pystac_client.Client.open('https://brazildatacube.dpi.inpe.br/harmonize/dev/stac/v1/')

# Listing the Available Data Products
<hr style="border:1px solid #0077b9;">

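The `get_collections` method lists the image and data cube collections available from the service. The loop below prints only the drone data collections, which are identified by the keyword `FlightHeight` in their identifiers: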

In [None]:
for collection in service.get_collections():
    #print(collection)
    if "FlightHeight" in collection.id: #keyword for collections from drone data
        print(collection)

# Retrieving the Metadata of a Collection
<hr style="border:1px solid #0077b9;">

The `get_collection` method returns information about a given image or data cube collection identified by its name. In this example we retrieve information about the Mavic 3M collection `Mavic3M_FlightHeight120m-1`:

In [None]:
collection = service.get_collection('Mavic3M_FlightHeight120m-1')
collection

In [None]:
collection.description

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-item.png?raw=true" align="right" width="300"/>

# Retrieving Items
<hr style="border:1px solid #0077b9;">

The `get_items` method returns the items of a given collection:

In [None]:
collection_items = collection.get_items()
collection_items #generator object can be used to loop over items from collection

The map below shows the Region of Interest (ROI) used to filter the items:

In [None]:
import folium
from shapely.geometry import Polygon

bbox = [-49.5171,-2.5970,-49.4907,-2.5669] #define ROI bounding box

roi_area = Polygon([(bbox[0],bbox[1]),
                    (bbox[0],bbox[3]),
                    (bbox[2],bbox[3]),
                    (bbox[2],bbox[1])])


# Create a folium map centered around the geographic area of interest
folium_map = folium.Map(location=[roi_area.centroid.y, roi_area.centroid.x], zoom_start=14,
                        control_scale=True, zoom_control=False)

folium.Rectangle(
    bounds=[[bbox[1],bbox[0]],[bbox[3],bbox[2]]],
    color="blue",
    weight=2,
    fill=True,
    fill_color="blue",
    fill_opacity=0.2
).add_to(folium_map)

folium_map

To filter items by a rectangle (`bbox`) or by a date and time (`datetime`) criterion, use `Client.search(**kwargs)`. Other options are available, for example a spatial intersection with a GeoJSON geometry. Please see the documentation available at https://api.stacspec.org/v1.0.0/item-search.

In [None]:
item_search = service.search(bbox=bbox,
                             datetime='2023-11-07T20:00:00Z/2023-11-07T20:50:00Z',
                             collections=['Mavic3M_FlightHeight120m-1'])
item_search
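
As a complement to the `bbox` search above, the sketch below prepares the two other kinds of search input mentioned in the text: a GeoJSON geometry for a spatial intersection and a `start/end` time interval built from Python `datetime` objects. The helper names `bbox_to_polygon` and `utc_interval` are hypothetical and introduced only for illustration.

```python
from datetime import datetime, timezone


def bbox_to_polygon(bbox):
    """Convert [min_lon, min_lat, max_lon, max_lat] into a closed GeoJSON Polygon."""
    min_lon, min_lat, max_lon, max_lat = bbox
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first vertex to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def utc_interval(start, end):
    """Format two timezone-aware datetimes as an RFC 3339 'start/end' interval."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return "/".join(d.astimezone(timezone.utc).strftime(fmt) for d in (start, end))


roi_geometry = bbox_to_polygon([-49.5171, -2.5970, -49.4907, -2.5669])
interval = utc_interval(
    datetime(2023, 11, 7, 20, 0, tzinfo=timezone.utc),
    datetime(2023, 11, 7, 20, 50, tzinfo=timezone.utc),
)
print(interval)
```

These values could then be passed as `service.search(intersects=roi_geometry, datetime=interval, collections=[...])`. Note that the STAC API item search accepts either `bbox` or `intersects` in a single request, not both.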

The method `.search(**kwargs)` returns an `ItemSearch` object, which has handy methods to inspect the matched results. For example, to check the number of items matched, use `.matched()`:

In [None]:
item_search.matched()
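
Not every STAC service reports a total count, and depending on the `pystac-client` version `.matched()` may then return `None`. The sketch below shows one possible fallback that counts the items by iterating over them. The function `count_matches` and the `FakeSearch` stand-in are hypothetical names used only so the example runs without a network connection.

```python
def count_matches(search):
    """Return the number of matched items, counting manually if none is reported."""
    reported = search.matched()
    if reported is not None:
        return reported
    # Fallback: walk through the results and count them (may be slow for large results).
    return sum(1 for _ in search.items())


class FakeSearch:
    """Stand-in for an ItemSearch object (hypothetical, for illustration only)."""

    def matched(self):
        return None  # simulates a service that does not report a count

    def items(self):
        return iter(["item-a", "item-b", "item-c"])


print(count_matches(FakeSearch()))
```

With a real search, the call would be `count_matches(item_search)`.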

To iterate over the matched result, use `.items()` to traverse the list of items:

In [None]:
for item in item_search.items():
    print(item)
    break #remove break to view all items

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-asset.png?raw=true" align="right" width="300"/>

# Assets
<hr style="border:1px solid #0077b9;">

The assets, with links to the images, thumbnails, or specific metadata files, can be accessed through the `assets` property of a given item:

In [None]:
assets = item.assets

Then, from the assets it is possible to traverse or access individual elements:

In [None]:
for k in assets.keys():
    print(k)

The metadata related to the Mavic 3M RGB image is available under the dictionary key `file`:

In [None]:
rgb_asset = assets['file']
rgb_asset

To iterate over the item's assets, use the following pattern:

In [None]:
for asset in assets.values():
    print(asset)

# Using RasterIO and NumPy
<hr style="border:1px solid #0077b9;">

The `rasterio` library can be used to read image files from the STAC service on the fly and load them as `NumPy` arrays. The `read` method of an open `rasterio` dataset performs the reading and the array creation:

In [None]:
import rasterio
from rasterio.plot import show
import numpy

In [None]:
with rasterio.open(assets['file'].href) as rgb_ds:
    rgb = rgb_ds.read()
    rgb_transform = rgb_ds.transform 

<div style="text-align: justify;  margin-left: 15%; margin-right: 15%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
    <b>Note:</b> If there are errors because of your pyproj version, you can run the code below as specified in <a  href="https://rasterio.readthedocs.io/en/latest/faq.html#why-can-t-rasterio-find-proj-db-rasterio-from-pypi-versions-1-2-0" target="_blank">rasterio documentation</a> and try again:

       import os
       del os.environ['PROJ_LIB']
</div>

`rasterio` also provides `rasterio.plot.show()` to perform common tasks such as displaying multi-band images as RGB and labeling the axes with proper georeferenced extents. Note that when passing arrays, you can also pass a transform in order to get extent labels.

In [None]:
rgb[rgb==0] = 255
rasterio.plot.show(rgb, transform=rgb_transform)

The next code cell imports the `Window` class from the `rasterio` library, which is used to retrieve a subset of an image as an array:

In [None]:
from rasterio.windows import Window

We have prepared a basic function, `read()`, to read raster windows as [`numpy.ma.masked_array`](https://numpy.org/doc/stable/reference/maskedarray.generic.html).

In [None]:
def read(uri: str, window: Window, masked: bool = True, show_bounds: bool = False ):
    """Read raster window as numpy.ma.masked_array."""
    with rasterio.open(uri) as ds:
        if show_bounds:
            print('Window bounds:',ds.window_bounds(window)) # Output: (left, bottom, right, top)
        return ds.read(window=window, masked=masked)

We can specify a subset of the image file (a window or chunk) to be read. Let's read a 500 x 500 pixel window of the RGB image that starts at pixel (0, 0), that is, rows 0 to 500 and columns 0 to 500:

In [None]:
rgb = read(assets['file'].href, window=Window(0, 0, 500, 500), show_bounds=True) # Window(col_off, row_off, width, height)

In [None]:
rasterio.plot.show(rgb);

You can also define the window with coordinates expressed in the coordinate reference system of the image:

In [None]:
from rasterio.windows import from_bounds

In [None]:
with rasterio.open(assets['file'].href) as src:
    rst = src.read(window=from_bounds(-5510164.043876435, -284030.70913576556, -5510147.955408352, -284014.6206676836, src.transform))
print(rst.shape)

In [None]:
rasterio.plot.show(rst);
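
The relation between a pixel window and map coordinates can be worked out by hand for the common case of a north-up image with square pixels. The sketch below is plain Python arithmetic, not the `rasterio` implementation. The pixel size and origin are made-up values, and the function names are hypothetical.

```python
def window_to_bounds(col_off, row_off, width, height, pixel_size, origin_x, origin_y):
    """Return (left, bottom, right, top) of a pixel window in map coordinates.

    Assumes a north-up raster whose top-left corner is (origin_x, origin_y).
    """
    left = origin_x + col_off * pixel_size
    top = origin_y - row_off * pixel_size  # rows grow downwards, y decreases
    right = left + width * pixel_size
    bottom = top - height * pixel_size
    return left, bottom, right, top


def bounds_to_window(left, bottom, right, top, pixel_size, origin_x, origin_y):
    """Return (col_off, row_off, width, height) for the given map bounds."""
    col_off = (left - origin_x) / pixel_size
    row_off = (origin_y - top) / pixel_size
    width = (right - left) / pixel_size
    height = (top - bottom) / pixel_size
    return col_off, row_off, width, height


# Made-up georeferencing: 0.5 map units per pixel, top-left corner at (1000, 2000).
bounds = window_to_bounds(0, 0, 500, 500, pixel_size=0.5, origin_x=1000.0, origin_y=2000.0)
print(bounds)

# Going back from bounds to a window recovers the original offsets and size.
print(bounds_to_window(*bounds, pixel_size=0.5, origin_x=1000.0, origin_y=2000.0))
```

Bounds that do not align with the pixel grid produce fractional offsets, which is why a window derived from coordinates may not have integer dimensions.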

# Using Matplotlib to Visualize Images Composition and NDVI
<hr style="border:1px solid #0077b9;">


The Mavic 3 Multispectral has two forms of sight. It combines an RGB camera with a multispectral camera to scan and analyze crop growth with total clarity. Agricultural production management requires precision and data, and Mavic 3M delivers both.

Source: [DJI Mavic 3m](https://ag.dji.com/mavic-3-m?backup_page=index)


Besides RGB images, we have also produced multispectral images (NIR, Red Edge, Red, and Green bands) and NDVI from Mavic 3M data. All these products are available from the STAC service:

In [None]:
# Get Mavic 3M multispectral data collection:
collection = service.get_collection('Mavic3M_FlightHeight120m_MS-1')
collection.description

In [None]:
search = service.search(collections=["Mavic3M_FlightHeight120m_MS-1"])
collection_items = search.item_collection()

for item in collection_items:
    print(item)

In [None]:
collection_item = collection.get_item('Mavic3M_120m_MS_Mocajuba_BairroNovo_20231107202757')

In [None]:
collection_item.assets 

### Overview of image based on thumbnail
<img src="https://brazildatacube.dpi.inpe.br/harmonize/dev/data/Mavic3M_FlightHeight120m_MS/2023/11/Mocajuba_BairroNovo_20231107T202757_MS.png">

We can specify a subset of the image file (a window or chunk) to be read. Let's read a 500 x 500 pixel window that starts at pixel (500, 500), that is, rows 500 to 1000 and columns 500 to 1000, for the red, NIR, and green spectral bands:



In [None]:
# Use numpy.squeeze to remove the band axis, which has length one:
red = numpy.squeeze(read(collection_item.assets['RED'].href, window=Window(500, 500, 500, 500)), axis=0) # Window(col_off, row_off, width, height)
nir = numpy.squeeze(read(collection_item.assets['NIR'].href, window=Window(500, 500, 500, 500)), axis=0)
green = numpy.squeeze(read(collection_item.assets['GREEN'].href, window=Window(500, 500, 500, 500)), axis=0)

`Matplotlib` can be used to plot the arrays read in the previous cell:

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize=(12, 4))
ax1.imshow(red, cmap='gray');
ax1.set_title('Red band');
ax2.imshow(nir, cmap='gray');
ax2.set_title('Near Infrared band');
ax3.imshow(green, cmap='gray');
ax3.set_title('Green band');

Using `NumPy` we can stack the previous arrays and use `Matplotlib` to plot a false-color composite (red, NIR, green), but first we need to normalize their values:

In [None]:
def normalize(array):
    """Normalizes numpy arrays into scale 0.0 - 1.0"""
    array_min, array_max = array.min(), array.max()
    return ((array - array_min)/(array_max - array_min))
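
Min-max normalization is sensitive to a few very bright or very dark pixels. As an alternative that is not part of the original notebook, the sketch below stretches the values between two percentiles and clips the rest. The function name `normalize_percentile` and the default 2nd and 98th percentiles are assumptions chosen for illustration.

```python
import numpy


def normalize_percentile(array, low=2, high=98):
    """Stretch values between two percentiles into 0.0 - 1.0, clipping outliers."""
    data = numpy.ma.asarray(array).astype("float64")
    # Percentiles are computed on the valid (unmasked) pixels only.
    lo, hi = numpy.percentile(data.compressed(), [low, high])
    if hi == lo:  # constant image: avoid division by zero
        return numpy.ma.zeros(data.shape)
    return ((data - lo) / (hi - lo)).clip(0.0, 1.0)


# Small synthetic array with one outlier (1000.0).
sample = numpy.array([[0.0, 10.0], [20.0, 1000.0]])
stretched = normalize_percentile(sample)
print(stretched.min(), stretched.max())
```

The result can be used in place of `normalize` when stacking bands for display. The original mask, if any, is preserved.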

In [None]:
rgb = numpy.dstack((normalize(red), normalize(nir), normalize(green)))
plt.imshow(rgb);

### Plotting NDVI

In [None]:
asset = collection_item.assets['NDVI']

Using the `extra_fields` of the asset, we can see the information needed to plot the vegetation index raster properly, for example `scale` and `nodata`:

In [None]:
asset.extra_fields

In [None]:
with rasterio.open(asset.href) as raster_ds:
    ndvi = raster_ds.read(1)
    left, bottom, right, top = raster_ds.bounds

In [None]:
# Mask nodata and apply scale to get original values of NDVI:
nodata = asset.extra_fields['eo:bands'][0]['nodata']
scale = asset.extra_fields['eo:bands'][0]['scale']
ndvi = numpy.ma.masked_values(ndvi, nodata) * scale

In [None]:
%matplotlib inline

im1 = plt.imshow(ndvi, extent=(left, right, bottom, top),cmap='summer_r')
plt.colorbar(im1, label='NDVI');
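
The NDVI product above is read ready-made from the service. For reference, the index is conventionally defined as (NIR - Red) / (NIR + Red), and the sketch below computes it with `NumPy` on small synthetic arrays. The function name `compute_ndvi` is hypothetical, and whether the service derives its NDVI product in exactly this way is an assumption.

```python
import numpy


def compute_ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), masking pixels where the sum is zero."""
    nir = numpy.ma.asarray(nir, dtype="float64")
    red = numpy.ma.asarray(red, dtype="float64")
    total = numpy.ma.masked_equal(nir + red, 0.0)  # avoid division by zero
    return (nir - red) / total


# Synthetic reflectance values, for illustration only.
nir_example = numpy.array([[0.8, 0.5], [0.0, 0.3]])
red_example = numpy.array([[0.2, 0.5], [0.0, 0.1]])

ndvi_example = compute_ndvi(nir_example, red_example)
print(ndvi_example)
```

The same function could be applied to the `nir` and `red` windows read earlier, since both are masked arrays of the same shape.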


# Retrieving Image Files
<hr style="border:1px solid #0077b9;">

The file related to an asset can be retrieved with the `download` function defined below. The following code cells show how to download the image file associated with an asset into a folder named `images`:

In [None]:
import os
from urllib.parse import urlparse

import requests
from pystac import Asset
from tqdm import tqdm

def download(asset: Asset, directory: str = None, chunk_size: int = 1024 * 16, **request_options) -> str:
    """Download a STAC Item asset and return the path of the saved file.

    A progress bar is shown to monitor the download status.
    """
    if directory is None:
        directory = '.'  # default to the current working directory

    response = requests.get(asset.href, stream=True, **request_options)
    response.raise_for_status()  # fail early on HTTP errors
    output_file = os.path.join(directory, urlparse(asset.href).path.split('/')[-1])
    os.makedirs(directory, exist_ok=True)
    total_bytes = int(response.headers.get('content-length', 0))
    with tqdm.wrapattr(open(output_file, 'wb'), 'write', miniters=1, total=total_bytes, desc=os.path.basename(output_file)) as fout:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fout.write(chunk)

    return output_file

In [None]:
download(collection_item.assets['NDVI'], 'images')

To download all the files related to an item, iterate over its assets and download each one as follows:

In [None]:
for asset in collection_item.assets.values():
    download(asset, 'images')
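
If an expected digest for a file is known, the integrity of a downloaded file can be checked after the transfer. The sketch below computes a SHA-256 digest in chunks using only the standard library. Whether the HARMONIZE service publishes checksums for its assets, and in which format, is an assumption not confirmed here, and `sha256_of_file` is a hypothetical helper name.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 16):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fin:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file instead of a real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"harmonize")
    tmp_path = tmp.name

expected = hashlib.sha256(b"harmonize").hexdigest()  # the known digest
actual = sha256_of_file(tmp_path)
os.remove(tmp_path)

print("checksum ok" if actual == expected else "checksum mismatch")
```

For a real file, `tmp_path` would be replaced by the path of the downloaded image and `expected` by the digest published for it.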

# References
<hr style="border:1px solid #0077b9;">

- [Spatio Temporal Asset Catalog Specification](https://stacspec.org/)


- [Python Client Library for STAC Service](https://pystac-client.readthedocs.io/en/latest/)

# See also the following Jupyter Notebooks
<hr style="border:1px solid #0077b9;">

* [Introduction to Earth Observation Data Cubes tuned for Health Response (EDPU) STAC functions in Python](https://github.com/Harmonize-Brazil/code-gallery/blob/main/jupyter/Python/edpu/publish_collection.ipynb)
* [Earth Observation Data Cubes tuned for Health Response Health Indicator PRocessing (EHIPR) user manual](https://github.com/Harmonize-Brazil/code-gallery/blob/main/jupyter/Python/ehipr/spatializing_lis_indicator.ipynb)