

# <span style="color:#336699">Introduction to the SpatioTemporal Asset Catalog (STAC)</span>
<hr style="border:2px solid #0077b9;">

<br/>

<b>Abstract.</b> This Jupyter Notebook gives an overview of how to use the data.geo.admin.ch STAC service to discover and access the data products of <em>SwissEO</em>.


<br/>


<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac.png?raw=true" align="right" width="66"/>

# Introduction
<hr style="border:1px solid #0077b9;">

The [**S**patio**T**emporal **A**sset **C**atalog (STAC)](https://stacspec.org/) is a specification created through the collaboration of several organizations to increase the interoperability of satellite image search.

The diagram in Figure 1 shows the most important concepts of the STAC data model:

<center>
<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-concept.png" width="480" />
<br/>
<b>Figure 1</b> - STAC model.
</center>

The descriptions of the concepts below are adapted from the [STAC Specification](https://github.com/radiantearth/stac-spec):

- **Item**: a `STAC Item` is the atomic unit of metadata in STAC, providing links to the actual `assets` (including thumbnails) that it represents. It is a `GeoJSON Feature` with additional fields for things like time, links to related entities and, most importantly, the assets. According to the specification, it is the unit that describes the data to be discovered in a `STAC Catalog` or `Collection`.

- **Asset**: a `spatiotemporal asset` is any file that represents information about the earth captured in a certain space and time.


- **Catalog**: provides a structure to link various `STAC Items` together or even to other `STAC Catalogs` or `Collections`.


- **Collection**: a specialization of the `Catalog` that adds further information describing a spatio-temporal collection of data.
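
To make the Item/Asset relationship concrete, the sketch below writes out a minimal STAC Item as a plain Python dictionary. The id, geometry, date and asset URLs are hypothetical placeholders, not records from any real catalog; only the field names follow the STAC Item specification.

```python
import json

# A minimal, hypothetical STAC Item: a GeoJSON Feature plus STAC-specific fields.
# All values (id, geometry, dates, URLs) are illustrative placeholders.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    "bbox": [-61.7960, -9.0374, -61.7033, -8.9390],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-61.7960, -9.0374], [-61.7033, -9.0374],
            [-61.7033, -8.9390], [-61.7960, -8.9390],
            [-61.7960, -9.0374],  # GeoJSON rings are closed: last point repeats the first
        ]],
    },
    "properties": {"datetime": "2018-08-13T00:00:00Z"},
    "links": [
        {"rel": "collection", "href": "https://example.com/collections/example-collection"},
    ],
    # Assets are keyed by name; each one points to an actual file.
    "assets": {
        "B02": {"href": "https://example.com/data/example-item_B02.tif",
                "type": "image/tiff; application=geotiff", "roles": ["data"]},
        "thumbnail": {"href": "https://example.com/data/example-item.png",
                      "type": "image/png", "roles": ["thumbnail"]},
    },
}

print(json.dumps(item["properties"]))
print(sorted(item["assets"]))
```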

# STAC Client API
<hr style="border:1px solid #0077b9;">

To run the examples in this Jupyter Notebook, you will need to install [pystac-client](https://pystac-client.readthedocs.io/en/latest/). To install it from PyPI using `pip`, run the following command:

In [None]:
!pip install pystac-client

The examples below also use a few additional packages, which can be installed the same way:

In [None]:
!pip install rasterio shapely matplotlib tqdm

To access the functionalities of the client API, import the `pystac_client` package:

In [None]:
import pystac_client

After that, you can check the installed `pystac_client` version:

In [None]:
pystac_client.__version__

Then, create a `Client` object connected to the Brazil Data Cube's STAC service:

In [None]:
service = pystac_client.Client.open('https://data.inpe.br/bdc/stac/v1/')

# Listing the Available Data Products
<hr style="border:1px solid #0077b9;">

The `get_collections` method of the `Client` object lists the image and data cube collections available from the service:

In [None]:
for collection in service.get_collections():
    print(collection)

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-catalog.png?raw=true" align="right" width="300"/>

# Retrieving the Metadata of a Collection
<hr style="border:1px solid #0077b9;">

The `get_collection` method returns information about a given image or data cube collection, identified by its name. In this example, we retrieve information about the data cube collection `S2-16D-2`:

In [None]:
collection = service.get_collection('S2-16D-2')
collection
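
A collection's metadata includes its overall spatial and temporal extent, which can be used to check whether a search could return anything at all before sending it. The sketch below works on a hypothetical, trimmed-down collection dictionary; the extent values are placeholders, not the real extent of `S2-16D-2`, and the helper names are illustrative.

```python
from datetime import datetime

# Hypothetical collection JSON; the extent values are placeholders only.
collection_json = {
    "id": "S2-16D-2",
    "extent": {
        "spatial": {"bbox": [[-74.0, -34.0, -34.0, 6.0]]},
        "temporal": {"interval": [["2017-01-01T00:00:00Z", None]]},
    },
}

def parse_ts(value):
    """Parse an RFC 3339 timestamp ending in 'Z'; None stands for an open end."""
    return None if value is None else datetime.fromisoformat(value.replace("Z", "+00:00"))

def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def may_contain(collection, bbox, start, end):
    """Check a search box and a closed time range against the collection's overall extent."""
    coll_bbox = collection["extent"]["spatial"]["bbox"][0]
    coll_start, coll_end = (parse_ts(v) for v in collection["extent"]["temporal"]["interval"][0])
    start, end = parse_ts(start), parse_ts(end)
    in_time = (coll_end is None or start <= coll_end) and (coll_start is None or end >= coll_start)
    return bbox_intersects(coll_bbox, bbox) and in_time
```

With `pystac` objects, the same values should be reachable as `collection.extent.spatial.bboxes` and `collection.extent.temporal.intervals` (the latter holding `datetime` objects), or in dictionary form through `collection.to_dict()`.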

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-item.png?raw=true" align="right" width="300"/>

# Retrieving Items
<hr style="border:1px solid #0077b9;">

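The `get_items` method returns an iterator over the items of a given collection. Since a collection may contain many items, the cell below only collects the first five:

In [None]:
from itertools import islice

list(islice(collection.get_items(), 5))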

To filter items by a rectangle (`bbox`) or a date and time range (`datetime`), use `Client.search(**kwargs)`:

In [None]:
item_search = service.search(bbox=(-61.7960,-9.0374,-61.7033,-8.9390),
                             datetime='2018-08-01/2019-07-31',
                             collections=['S2-16D-2'])
item_search
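
Under the hood, a STAC API search is an HTTP request to the `/search` endpoint. The sketch below uses a hypothetical helper, `build_search_params`, to show how the `bbox`, `datetime` and `collections` arguments could be encoded as GET query parameters following the STAC API Item Search conventions. It assumes that a date-only range should cover whole days in UTC; pystac-client performs a comparable normalization internally, but its exact output is not reproduced here.

```python
from datetime import date
from urllib.parse import urlencode

def build_search_params(bbox, date_range, collections):
    """Hypothetical helper: encode search criteria as STAC API GET /search parameters.

    A date-only range 'YYYY-MM-DD/YYYY-MM-DD' is expanded to full-day RFC 3339 timestamps (UTC).
    """
    start_str, end_str = date_range.split("/")
    start, end = date.fromisoformat(start_str), date.fromisoformat(end_str)
    return {
        "bbox": ",".join(str(v) for v in bbox),          # minx,miny,maxx,maxy
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "collections": ",".join(collections),
    }

params = build_search_params((-61.7960, -9.0374, -61.7033, -8.9390),
                             "2018-08-01/2019-07-31", ["S2-16D-2"])
print(urlencode(params))  # query string that could be appended to .../search?
```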

The `.search(**kwargs)` method returns an `ItemSearch` object, which has handy methods to inspect the matched results. For example, to check the number of matched items, use `.matched()`:

In [None]:
item_search.matched()
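
The match count depends on the server reporting a total, so `.matched()` may return `None` when the service does not provide one. The sketch below, with the hypothetical helper `matched_count`, shows where such a total can appear in a raw search response page: the `numberMatched` field used by OGC API - Features, or the `context.matched` field of the older STAC Context extension.

```python
def matched_count(page):
    """Return the total number of matches reported in a search response page, or None."""
    if "numberMatched" in page:                  # OGC API - Features style
        return page["numberMatched"]
    return (page.get("context") or {}).get("matched")  # older STAC Context extension
```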

To iterate over the matched results, use `.items()`:

In [None]:
for item in item_search.items():
    print(item)
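
Search results are usually split into pages, and `.items()` hides the paging. As a simplified sketch of the idea (not pystac-client's actual implementation), the hypothetical generator below yields features page by page by following `rel="next"` links. It assumes the next page can be fetched with a plain GET on the link's `href`; some STAC APIs instead require a POST with a request body.

```python
from typing import Callable, Iterator

def iter_features(first_page: dict, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield features from a paged search response, following rel="next" links."""
    page = first_page
    while True:
        yield from page.get("features", [])
        next_link = next((link for link in page.get("links", []) if link.get("rel") == "next"), None)
        if next_link is None:
            return  # last page reached
        page = fetch(next_link["href"])  # e.g. lambda url: requests.get(url).json()
```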

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-asset.png?raw=true" align="right" width="300"/>

# Assets
<hr style="border:1px solid #0077b9;">

The assets of an item, with links to the images, thumbnails or specific metadata files, can be accessed through its `assets` property. Here, `item` is the last item visited in the previous loop:

In [None]:
assets = item.assets

`assets` is a dictionary, so you can traverse its keys or access individual elements:

In [None]:
for k in assets.keys():
    print(k)

The metadata related to the Sentinel-2/MSI blue band is available under the dictionary key `B02`:

In [None]:
blue_asset = assets['B02']
blue_asset

To iterate over the item's assets, use the following pattern:

In [None]:
for asset in assets.values():
    print(asset)
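
When a service populates the optional `roles` and `type` (media type) fields of its assets, they can be used to select only the files you need, for example the data files and not the thumbnails. The sketch below uses a hypothetical helper, `select_assets`, on plain dictionaries such as those returned by `item.to_dict()["assets"]`; with `pystac` `Asset` objects, the equivalent attributes are `asset.roles` and `asset.media_type`. The sample assets are illustrative.

```python
def select_assets(assets, role=None, media_type_prefix=None):
    """Return {key: asset} for assets matching a role and/or a media type prefix."""
    selected = {}
    for key, asset in assets.items():
        roles = asset.get("roles") or []
        media_type = asset.get("type") or ""
        if role is not None and role not in roles:
            continue
        if media_type_prefix is not None and not media_type.startswith(media_type_prefix):
            continue
        selected[key] = asset
    return selected

# Illustrative asset dictionaries, not taken from a real item
sample_assets = {
    "B02": {"href": "B02.tif", "type": "image/tiff; application=geotiff", "roles": ["data"]},
    "thumbnail": {"href": "thumb.png", "type": "image/png", "roles": ["thumbnail"]},
}
print(sorted(select_assets(sample_assets, role="data")))
```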

# Using RasterIO and NumPy
<hr style="border:1px solid #0077b9;">

The `rasterio` library can read image files from the Brazil Data Cube's service on the fly and load them as `NumPy` arrays. Open an asset's `href` with `rasterio.open` and use the dataset's `read` method to read a band into an array:

In [None]:
import rasterio

In [None]:
with rasterio.open(assets['B08'].href) as nir_ds:
    nir = nir_ds.read(1)

<div style="text-align: justify;  margin-left: 15%; margin-right: 15%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
    <b>Note:</b> If you get errors caused by your pyproj version, you can run the code below, as described in the <a  href="https://rasterio.readthedocs.io/en/latest/faq.html#why-can-t-rasterio-find-proj-db-rasterio-from-pypi-versions-1-2-0" target="_blank">rasterio documentation</a>, and then try again:

       import os
       os.environ.pop('PROJ_LIB', None)  # unset PROJ_LIB only if it is defined
</div>

In [None]:
nir

The next cell imports the `Window` class from `rasterio.windows`, which is used to retrieve a subset of an image as an array:

In [None]:
from rasterio.windows import Window

We have prepared a basic function, `read()`, that reads a raster window as a [`numpy.ma.masked_array`](https://numpy.org/doc/stable/reference/maskedarray.generic.html).

In [None]:
def read(uri: str, window: Window, masked: bool = True):
    """Read raster window as numpy.ma.masked_array."""
    with rasterio.open(uri) as ds:
        return ds.read(1, window=window, masked=masked)

You can specify a subset of the image file (a window or chunk) to be read. Let's read a 500 x 500 pixel window starting at pixel (0, 0), covering rows 0 to 499 and columns 0 to 499, for the red (`B04`), green (`B03`) and blue (`B02`) spectral bands:

In [None]:
red = read(assets['B04'].href, window=Window(0, 0, 500, 500)) # Window(col_off, row_off, width, height)

In [None]:
green = read(assets['B03'].href, window=Window(0, 0, 500, 500))

In [None]:
blue = read(assets['B02'].href, window=Window(0, 0, 500, 500))

In [None]:
blue
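
A `Window(col_off, row_off, width, height)` selects the same pixels as slicing the full band array by rows and columns, but without reading the rest of the file. The sketch below illustrates this correspondence on an in-memory stand-in array using a hypothetical helper, `window_to_slices`; rasterio's `Window` also provides a `toslices()` method for this conversion.

```python
import numpy as np

def window_to_slices(col_off, row_off, width, height):
    """Convert Window-style offsets into (row_slice, col_slice) for NumPy indexing."""
    return slice(row_off, row_off + height), slice(col_off, col_off + width)

# Stand-in for a full band: 1000 rows x 1200 columns
full = np.arange(1000 * 1200).reshape(1000, 1200)

# Note the order: columns come first in Window, rows come first in NumPy indexing
rows, cols = window_to_slices(col_off=100, row_off=50, width=500, height=300)
subset = full[rows, cols]
print(subset.shape)  # (300, 500)
```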

You can also define the window from coordinates in the raster's CRS with `from_bounds`, given as left, bottom, right and top:

In [None]:
from rasterio.windows import from_bounds

In [None]:
with rasterio.open(assets['B02'].href) as src:
    rst = src.read(1, window=from_bounds(4150000.0, 10300000, 4160000.0, 10310000, src.transform))
print(rst.shape)
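
For a north-up raster with square pixels, turning bounds into a window is simple arithmetic on the geotransform. The sketch below, using a hypothetical helper `window_from_bounds`, assumes the upper-left corner of the grid is `(x_origin, y_origin)` and that `x = x_origin + col * pixel_size` and `y = y_origin - row * pixel_size`. The origin used in the usage line is a made-up value, not the real BDC grid origin; the resulting offsets can be fractional, and how they are rounded is left to the library.

```python
def window_from_bounds(left, bottom, right, top, x_origin, y_origin, pixel_size):
    """Return (col_off, row_off, width, height) for a north-up grid with square pixels."""
    col_off = (left - x_origin) / pixel_size   # columns grow eastwards
    row_off = (y_origin - top) / pixel_size    # rows grow southwards
    width = (right - left) / pixel_size
    height = (top - bottom) / pixel_size
    return col_off, row_off, width, height

# Hypothetical 10 m grid whose upper-left corner is (4000000, 10400000)
print(window_from_bounds(4150000.0, 10300000, 4160000.0, 10310000,
                         x_origin=4000000.0, y_origin=10400000.0, pixel_size=10.0))
```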

If you prefer, you can start from longitude/latitude coordinates and reproject them into the Albers Equal Area projection used by the BDC products:

In [None]:
from pyproj import Transformer
from pyproj.crs import CRS

inProj = CRS.from_epsg(4326)
outProj = CRS.from_user_input(src.crs)
transformer = Transformer.from_crs(inProj, outProj, always_xy=True)
x1, y1 = -61.7960, -9.0374
x2, y2 = -61.7033, -8.9390
x1_reproj, y1_reproj = transformer.transform(x1, y1)
x2_reproj, y2_reproj = transformer.transform(x2, y2)
print(x1, y1, x2, y2)
print(x1_reproj, y1_reproj, x2_reproj, y2_reproj)

with rasterio.open(assets['B02'].href) as src:
    rst = src.read(1, window=from_bounds(x1_reproj, y1_reproj, x2_reproj, y2_reproj, src.transform))
print(rst.shape)
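
Reprojecting only two corners assumes that the box edges stay straight and axis-aligned in the target projection, which is only approximately true for a conic projection such as Albers. A more robust approach samples points along every edge and takes the minimum and maximum of the transformed coordinates. The sketch below (a hypothetical helper, `densified_bounds`) accepts any `(x, y) -> (x', y')` function, such as `transformer.transform` from the cell above; recent versions of pyproj (`Transformer.transform_bounds`) and rasterio (`rasterio.warp.transform_bounds`) offer built-in densified transforms as well.

```python
def densified_bounds(transform, left, bottom, right, top, n=21):
    """Transform a bounding box by sampling n (>= 2) points per edge and taking min/max."""
    xs, ys = [], []
    for i in range(n):
        t = i / (n - 1)
        x = left + t * (right - left)
        y = bottom + t * (top - bottom)
        # One sample on each of the four edges: bottom, top, left, right
        for px, py in ((x, bottom), (x, top), (left, y), (right, y)):
            tx, ty = transform(px, py)
            xs.append(tx)
            ys.append(ty)
    return min(xs), min(ys), max(xs), max(ys)

# Usage with the notebook's objects:
# left, bottom, right, top = densified_bounds(transformer.transform, x1, y1, x2, y2)
```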

# Using Matplotlib to Visualize Images
<hr style="border:1px solid #0077b9;">

`Matplotlib` can be used to plot the arrays read in the previous section:

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize=(12, 4))
ax1.imshow(red, cmap='gray')
ax2.imshow(green, cmap='gray')
ax3.imshow(blue, cmap='gray')

Using `NumPy`, we can stack the previous arrays and use `Matplotlib` to plot a color image, but first we need to normalize their values to the range 0.0 to 1.0:

In [None]:
import numpy

In [None]:
def normalize(array):
    """Normalizes numpy arrays into scale 0.0 - 1.0"""
    array_min, array_max = array.min(), array.max()
    return ((array - array_min)/(array_max - array_min))

In [None]:
rgb = numpy.dstack((normalize(red), normalize(green), normalize(blue)))
plt.imshow(rgb)
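
Min-max normalization is sensitive to a few very bright or very dark pixels, which can leave the composite looking washed out. A common alternative is a percentile stretch: scale between, say, the 2nd and 98th percentiles of the valid pixels and clip the rest. The sketch below (hypothetical helper `stretch`; the 2/98 defaults are a conventional choice, not a value from this notebook) ignores masked pixels when computing the percentiles and sets them to 0.0 in the output.

```python
import numpy as np

def stretch(array, lower=2.0, upper=98.0):
    """Scale to 0.0-1.0 between two percentiles of the valid (unmasked) values, clipping outliers."""
    data = np.ma.asarray(array, dtype=float)
    valid = data.compressed()  # drop masked pixels before computing percentiles
    lo, hi = np.percentile(valid, [lower, upper])
    if hi == lo:
        return np.zeros(data.shape)  # constant image: nothing to stretch
    scaled = (data - lo) / (hi - lo)
    return np.clip(scaled.filled(0.0), 0.0, 1.0)

# Usage with the notebook's arrays:
# rgb = np.dstack((stretch(red), stretch(green), stretch(blue)))
```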

# Retrieving Image Files
<hr style="border:1px solid #0077b9;">

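The file associated with an asset can be retrieved with the `download` function defined below, which streams the file and shows a progress bar. The cell below shows how to download the image file associated with an asset into a folder named `img` (the code is commented out; uncomment it to run it):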

In [None]:
# import os
# from urllib.parse import urlparse

# import requests
# from pystac import Asset
# from tqdm import tqdm

# def download(asset: Asset, directory: str = None, chunk_size: int = 1024 * 16, **request_options) -> str:
#     """Download a STAC Item asset with a progress bar and return the path of the local file.
#
#     The file is streamed in chunks of `chunk_size` bytes; no checksum validation is performed.
#     """
#     if directory is None:
#         directory = '.'  # os.makedirs('') would fail, so default to the current directory

#     response = requests.get(asset.href, stream=True, **request_options)
#     response.raise_for_status()  # stop on HTTP errors instead of saving an error page
#     output_file = os.path.join(directory, os.path.basename(urlparse(asset.href).path))
#     os.makedirs(directory, exist_ok=True)
#     total_bytes = int(response.headers.get('content-length', 0))
#     with tqdm.wrapattr(open(output_file, 'wb'), 'write', miniters=1, total=total_bytes, desc=os.path.basename(output_file)) as fout:
#         for chunk in response.iter_content(chunk_size=chunk_size):
#             fout.write(chunk)
#     return output_file

In [None]:
# download(assets['B02'], 'img')

To download all files of an item, iterate over its assets and download each one as follows:

In [None]:
# for asset in assets.values():
#     download(asset, 'images')
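
Some catalogs publish a `file:checksum` field for each asset (STAC File Info extension), encoded as a hexadecimal multihash. Assuming the service provides it, which not every catalog does, a downloaded file can be checked against it. The sketch below, with the hypothetical helper `verify_multihash`, handles only SHA2-256 multihashes, whose hex form starts with `1220` (hash code `0x12` followed by digest length `0x20`).

```python
import hashlib

def verify_multihash(path, checksum, chunk_size=1024 * 1024):
    """Return True if the file's SHA2-256 digest matches a hex multihash string."""
    prefix = "1220"  # 0x12 = sha2-256, 0x20 = 32-byte digest
    checksum = checksum.lower()
    if not checksum.startswith(prefix):
        raise ValueError("this sketch only supports sha2-256 multihashes")
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):  # hash large files in chunks
            digest.update(chunk)
    return digest.hexdigest() == checksum[len(prefix):]

# Possible usage, if the asset carries the field:
# checksum = asset.extra_fields.get('file:checksum')
# if checksum: assert verify_multihash(download(asset, 'img'), checksum)
```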

# References
<hr style="border:1px solid #0077b9;">

- [Spatio Temporal Asset Catalog Specification](https://stacspec.org/)


- [Python Client Library for STAC Service](https://pystac-client.readthedocs.io/en/latest/)

# See also the following Jupyter Notebooks
<hr style="border:1px solid #0077b9;">

* [Image processing on images obtained through STAC](./stac-image-processing.ipynb)