<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/logo-bdc.png" align="right" width="64"/>

# <span style="color:#336699">Introduction to the SpatioTemporal Asset Catalog (STAC) in R language</span>
<hr style="border:2px solid #0077b9;">

<div style="text-align: left;">
    <a href="https://nbviewer.jupyter.org/github/brazil-data-cube/code-gallery/blob/master/jupyter/Python/stac/stac-introduction.ipynb"><img src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg" align="center"/></a>
</div>

<br/>

<div style="text-align: center;font-size: 90%;">
    Felipe Carvalho de Souza<sup><a href="https://orcid.org/0000-0002-5826-1700"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>, Felipe Menino Carlos<sup><a href="https://orcid.org/0000-0002-3334-4315"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>, Rennan Marujo<sup><a href="https://orcid.org/0000-0002-0082-9498"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>, Gilberto R. Queiroz<sup><a href="https://orcid.org/0000-0001-7534-0219"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>
    <br/><br/>
    Earth Observation and Geoinformatics Division, National Institute for Space Research (INPE)
    <br/>
    Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil
    <br/><br/>
    Contact: <a href="mailto:brazildatacube@inpe.br">brazildatacube@inpe.br</a>
    <br/><br/>
    Last Update: February 13, 2023
</div>

<br/>

<div style="text-align: justify;  margin-left: 25%; margin-right: 25%;">
<b>Abstract.</b> This Jupyter Notebook gives an overview of how to use the STAC service to discover and access the data products from the <em>Brazil Data Cube</em> using the <em>rstac</em> package.
</div>

<br/>
<div style="text-align: justify;  margin-left: 25%; margin-right: 25%;font-size: 75%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
    <b>This Jupyter Notebook is a supplement to the following paper:</b>
    <div style="margin-left: 10px; margin-right: 10px">
    Zaglia, M.; Vinhas, L.; Queiroz, G. R.; Simões, R. <a href="http://urlib.net/rep/8JMKD3MGPDW34R/3UFEFD8" target="_blank">Catalogação de Metadados do Cubo de Dados do Brasil com o SpatioTemporal Asset Catalog</a>. In: Proceedings XX GEOINFO, November 11-13, 2019, São José dos Campos, SP, Brazil. p 280-285.
    </div>
</div>

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac.png?raw=true" align="right" width="66"/>


## <span style="color:#336699">Introduction</span>
<hr style="border:1px solid #0077b9;">

The [**S**patio**T**emporal **A**sset **C**atalog (STAC)](https://stacspec.org/) is a specification created through the collaboration of several organizations to increase the interoperability of satellite image search.

Figure 1 shows the most important concepts behind the STAC data model:

<center>
<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-concept.png" width="480" />
<br/>
<b>Figure 1</b> - STAC model.
</center>

The descriptions of the concepts below are adapted from the [STAC Specification](https://github.com/radiantearth/stac-spec):

- **Item**: a `STAC Item` is the atomic unit of metadata in STAC, providing links to the actual `assets` (including thumbnails) it represents. It is a `GeoJSON Feature` with additional fields for things like time, links to related entities, and, mainly, links to the assets. According to the specification, it is the unit that describes the data to be discovered in a `STAC Catalog` or `Collection`.

- **Asset**: a `spatiotemporal asset` is any file representing information about the Earth captured in a certain space and time.


- **Catalog**: provides a structure to link various `STAC Items` together or even to other `STAC Catalogs` or `Collections`.


- **Collection**: a specialization of the `Catalog` that carries additional information about a spatio-temporal collection of data.
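
To make the relationship between an `Item` and its `assets` more concrete, the sketch below writes a STAC Item by hand as a nested R list. All identifiers and URLs in it are hypothetical placeholders (not real catalog data), and the `geometry` field of the GeoJSON Feature is omitted for brevity.

```r
# Hypothetical, hand-written STAC Item (placeholder values only):
# a GeoJSON Feature with additional STAC fields such as `assets`.
item <- list(
    type       = "Feature",
    id         = "example-item",
    collection = "example-collection",
    bbox       = c(-45.9, -12.9, -45.4, -12.6),
    properties = list(datetime = "2019-07-28T00:00:00Z"),
    assets     = list(
        BAND15    = list(href = "https://example.com/item/BAND15.tif"),
        thumbnail = list(href = "https://example.com/item/thumbnail.png")
    )
)

# Each asset is a named entry whose `href` points to the actual file
asset_names <- names(item$assets)
asset_hrefs <- vapply(item$assets, function(asset) asset$href, character(1))

print(asset_names)
print(asset_hrefs)
```

The functions shown later in this notebook operate on structures of this kind, returned by the service instead of being written by hand.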

### <span style="color:#336699">Clients</span>
<hr style="border:1px solid #0077b9;">


The facilities provided by the STAC service can be used in any programming language that supports network communication via HTTP requests. This means that all modern languages can be used to access the operations of the service.
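
As a rough illustration of the point about plain HTTP requests, the sketch below builds the URL of an items request by hand in base R. The helper name `stac_items_url` is hypothetical; the endpoint layout (`/collections/{id}/items`) and the `bbox`, `datetime`, and `limit` query parameters follow the STAC API specification; the service address and collection name are the ones used later in this notebook. No request is sent.

```r
# Hypothetical helper (not part of rstac): builds an items request URL.
stac_items_url <- function(root, collection, bbox, datetime, limit = 10) {
    stopifnot(is.numeric(bbox), length(bbox) == 4)
    root <- sub("/+$", "", root)  # drop trailing slashes from the root URL
    query <- paste0(
        "bbox=", paste(bbox, collapse = ","),  # xmin,ymin,xmax,ymax
        "&datetime=", datetime,                # closed interval start/end
        "&limit=", limit                       # maximum items per response
    )
    paste0(root, "/collections/", collection, "/items?", query)
}

url <- stac_items_url(
    root       = "https://data.inpe.br/bdc/stac/v1/",
    collection = "CBERS4-WFI-16D-2",
    bbox       = c(-45.9, -12.9, -45.4, -12.6),
    datetime   = "2018-08-01/2019-07-31",
    limit      = 20
)
print(url)
```

A real client would also URL-encode the query string and parse the JSON response, which is the kind of work the client libraries take care of.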

In addition to the specification and reference implementation of the STAC service, the BDC also provides clients for different programming languages. These clients make the STAC service easy to use from the languages in which they are implemented:

- [stac.py - Python client](https://github.com/brazil-data-cube/stac.py);
- [rstac - R Client](https://github.com/brazil-data-cube/rstac).

This Jupyter Notebook will present how STAC can be used in R through the `rstac` client.

### <span style="color:#336699">First step</span>
<hr style="border:1px solid #0077b9;">

To run the examples in this Jupyter Notebook, you need to install the [rstac](https://github.com/brazil-data-cube/rstac) package.

In [None]:
install.packages("rstac")

If you are running this notebook on your local machine, consider installing the packages listed below:

In [None]:
# Remove '#' to install packages
# install.packages(c("magrittr", "tibble", "dplyr", "sf", "terra", "tmap"), dependencies = FALSE)

In [None]:
#system("apt update && apt install ca-certificates")

Let's load the `magrittr`, `rstac`, and `terra` packages:

In [None]:
library(magrittr) # Package to use pipe operator %>%
library(rstac)    # package rstac
library(terra)    # package to manipulate rasters

Then we will create a query object called `stac_obj` pointing to the service address, allowing us to communicate with the `STAC` service.

In [None]:
stac_obj <- stac("https://data.inpe.br/bdc/stac/v1/")

## <span style="color:#336699">Listing the available Data Products</span>
<hr style="border:1px solid #0077b9;">

To list all the image collections and data cube collections, we will make a request using the `get_request()` function.

In [None]:
#
# query to the data catalog
#
catalog <- stac_obj %>% get_request()

print(catalog)

In [None]:
#
# query the available product collections
#
collections <- stac_obj %>%
    collections() %>%
    get_request()

print(collections, n = 31)

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-catalog.png?raw=true" align="right" width="300"/>

## <span style="color:#336699">Retrieving the Metadata of a Collection</span>
<hr style="border:1px solid #0077b9;">

When given a collection identifier, the `collections()` function builds a request for the metadata of that image or data cube collection. In this example, we retrieve the metadata of the data cube collection `CBERS4-WFI-16D-2`:


In [None]:
collection_info <- stac_obj %>%
    collections("CBERS4-WFI-16D-2") %>%
    get_request()

print(collection_info)

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-item.png?raw=true" align="right" width="300"/>

## <span style="color:#336699">Retrieving Items</span>
<hr style="border:1px solid #0077b9;">

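The `items()` function builds a query for the items of a collection, restricted by a bounding box (`bbox`) and a date range (`datetime`); `limit` sets the maximum number of items returned per request: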

In [None]:
items <- stac_obj %>%
    collections("CBERS4-WFI-16D-2") %>%
    items(datetime = "2018-08-01/2019-07-31",
          bbox  = c(-45.9, -12.9, -45.4, -12.6),
          limit = 20) %>%
    get_request()

print(items)

Let's view the available bands that we can query using the `items_assets()` function:

In [None]:
items_assets(items)

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-asset.png?raw=true" align="right" width="300"/>

## <span style="color:#336699">Assets</span>
<hr style="border:1px solid #0077b9;">

Assets are links to images, thumbnails, or specific metadata files, and can be accessed through the `assets` property of an item.

We can view the asset URLs using `assets_url()` as follows:

In [None]:
assets_url(items, asset_names = c("BAND14", "BAND13"))[1:3]

## <span style="color:#336699">Reading and viewing the images</span>
<hr style="border:1px solid #0077b9;">

We will read and view the images with the `terra` package. First, we'll select the item corresponding to the date `2019-07-28`.

In [None]:
#
# listing the datetime of all items 
#
items_datetime(items)

In [None]:
#
# filtering the assets by a datetime
#
item_filtered <- items_filter(items, filter_fn = function(item) item$properties[["datetime"]] == "2019-07-28T00:00:00.000000Z")

In [None]:
print(item_filtered)
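
The filter function above is a predicate that receives one item and returns `TRUE` or `FALSE`. The sketch below reproduces that idea in base R on a small hand-written list; the item ids and dates are hypothetical, and `Filter()` merely stands in for what `items_filter()` does with `filter_fn`.

```r
# Hypothetical items with the same nesting as a STAC item's properties
items_list <- list(
    list(id = "item-a", properties = list(datetime = "2019-07-12T00:00:00.000000Z")),
    list(id = "item-b", properties = list(datetime = "2019-07-28T00:00:00.000000Z"))
)

# Predicate comparing only the date part (first 10 characters), which is
# less sensitive to how the fractional seconds are formatted
is_target_date <- function(item) {
    substr(item$properties[["datetime"]], 1, 10) == "2019-07-28"
}

# Keep only the items for which the predicate returns TRUE
selected <- Filter(is_target_date, items_list)
selected_ids <- vapply(selected, function(item) item$id, character(1))
print(selected_ids)
```

Comparing the date prefix instead of the full timestamp string is a design choice for this sketch; the exact string comparison used above works as long as the service keeps the same timestamp format.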

Let's read the images of the filtered item:

In [None]:
blue_url  <- assets_url(item_filtered, asset_names = "BAND13", append_gdalvsi = TRUE)
green_url <- assets_url(item_filtered, asset_names = "BAND14", append_gdalvsi = TRUE)
red_url   <- assets_url(item_filtered, asset_names = "BAND15", append_gdalvsi = TRUE)

In [None]:
#
# reading the first images of each band
#
blue_rast  <- terra::rast(blue_url)
green_rast <- terra::rast(green_url)
red_rast   <- terra::rast(red_url)

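We are going to crop the images to a smaller region. Cropping requires an extent expressed in the coordinate reference system of the rasters, so we first reproject the corner points of our region of interest from WGS84 longitude/latitude to the BDC Albers equal-area projection: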

In [None]:
proj_orig <- sf::st_crs("+proj=longlat +datum=WGS84")
#BDC proj4 string
proj_dest <- sf::st_crs("+proj=aea +lat_0=-12 +lon_0=-54 +lat_1=-2 +lat_2=-22 +x_0=5000000 +y_0=10000000 +ellps=GRS80 +units=m +no_defs")

pts <- tibble::tibble(
    lon = c(-45.89957, -45.40046),
    lat = c(-12.9142, -12.58579)
)
pts_sf <- sf::st_as_sf(pts, coords = c("lon", "lat"), crs = proj_orig)
pts_transf <- sf::st_transform(pts_sf, crs = proj_dest)

lat_dest <- sf::st_coordinates(pts_transf)[, 2]
lon_dest <- sf::st_coordinates(pts_transf)[, 1]

cat("Reprojected longitude:", lon_dest, "\nReprojected latitude:", lat_dest)

In [None]:
#
# defining the clipping extent (xmin, xmax, ymin, ymax) in the BDC projection
#
transformed_bbox <- terra::ext(5865751, 5920212, 9884783, 9920060)
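
`terra::ext()` takes its arguments in the order xmin, xmax, ymin, ymax, which differs from the xmin, ymin, xmax, ymax order of the `bbox` used in the STAC query. The sketch below derives the four values from two corner points in base R; the corner coordinates simply reuse the extent numbers from the cell above and the variable names are illustrative.

```r
# Two opposite corners of the region, in projected coordinates (meters)
corner_x <- c(5865751, 5920212)
corner_y <- c(9884783, 9920060)

# Order expected by terra::ext(): xmin, xmax, ymin, ymax
extent_values <- c(
    xmin = min(corner_x), xmax = max(corner_x),
    ymin = min(corner_y), ymax = max(corner_y)
)
print(extent_values)
```

Using `min()` and `max()` makes the result independent of the order in which the corner points are listed.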

In [None]:
#
# cropping images from an extent
#
blue_rast_cropped  <- terra::crop(blue_rast, transformed_bbox)
green_rast_cropped <- terra::crop(green_rast, transformed_bbox)
red_rast_cropped   <- terra::crop(red_rast, transformed_bbox)

Visualizing each band separately:

In [None]:
# setting plot display options
options(repr.plot.width = 16, repr.plot.height = 5)
par(mfrow = c(1, 3))

plot(blue_rast_cropped,  main = "Blue Band")
plot(green_rast_cropped, main = "Green Band")
plot(red_rast_cropped,   main = "Red Band")

## <span style="color:#336699">Composite Image Viewing</span>
<hr style="border:1px solid #0077b9;">

Let's create a stack of bands for our composite plot.

In [None]:
#
# creating a band composition
#
rgb <- c(red_rast_cropped, green_rast_cropped, blue_rast_cropped)

#
# rgb view of the created composition
#
plotRGB(rgb, r = 1, g = 2, b = 3, stretch = "lin")
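
The `stretch = "lin"` argument asks for a linear contrast stretch before display. The sketch below shows the general idea in base R on a few made-up pixel values, assuming the stretch clips at the 2nd and 98th percentiles and rescales to 0-255; the percentiles actually used by `plotRGB()` may differ, and `linear_stretch` is a hypothetical helper.

```r
# Hypothetical helper: clip to two percentiles, then rescale to 0-255.
# Assumes the two percentiles are different (otherwise it divides by zero).
linear_stretch <- function(values, lower = 0.02, upper = 0.98) {
    limits  <- stats::quantile(values, probs = c(lower, upper), na.rm = TRUE)
    clipped <- pmin(pmax(values, limits[[1]]), limits[[2]])
    round(255 * (clipped - limits[[1]]) / (limits[[2]] - limits[[1]]))
}

# Made-up reflectance-like values with one bright outlier
band_example <- c(120, 340, 560, 780, 1000, 5000)
stretched_example <- linear_stretch(band_example)
print(stretched_example)
```

Clipping before rescaling keeps a few very bright or very dark pixels from compressing the rest of the image into a narrow range of display values.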

## <span style="color:#336699">Calculating the Normalized Difference Vegetation Index (NDVI)</span>
<hr style="border:1px solid #0077b9;">

The **N**ormalized **D**ifference **V**egetation **I**ndex (NDVI) is calculated using the **Red** and **Near Infrared** (NIR) spectral bands. This index is used to assess whether or not the observed target contains live green vegetation. It can be calculated using the following equation:

$$
NDVI = \frac{(NIR - RED)}{(NIR + RED)}
$$

<center><b>Equation 1</b> - NDVI</center>
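
As a small worked example of Equation 1, the sketch below applies the formula to plain numeric vectors in base R. The reflectance values are made up, and the helper name `ndvi_from_bands` is hypothetical; pixels where `NIR + RED` is zero are set to `NA` instead of dividing by zero.

```r
# Hypothetical helper implementing Equation 1 element by element
ndvi_from_bands <- function(nir, red) {
    denominator <- nir + red
    result <- (nir - red) / denominator
    result[denominator == 0] <- NA_real_  # undefined where NIR + RED = 0
    result
}

# Made-up reflectance values for three pixels
nir_example <- c(0.50, 0.40, 0.00)
red_example <- c(0.10, 0.40, 0.00)

ndvi_values <- ndvi_from_bands(nir_example, red_example)
print(ndvi_values)
```

The first pixel (strong NIR, weak red) yields a high NDVI of about 0.67, while the second (equal NIR and red) yields 0.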


<div style="text-align: justify;  margin-left: 15%; margin-right: 15%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
<b>Note:</b> For this data cube, the Brazil Data Cube already provides the <em>NDVI</em> and <em>EVI</em> indices along with the spectral bands. It also provides quality indicators (<em>CLEAROB</em>, <em>PROVENANCE</em>, <em>CMASK</em>, <em>TOTALOB</em>).
</div>

Thus, the bands `BAND15` and `BAND16` will be loaded from the filtered item.

> As can be seen in the metadata of the `items`, `BAND15` corresponds to the **red** wavelength and `BAND16` to the **near-infrared**.

Get the URL of the **Red** band:

In [None]:
red <- assets_url(item_filtered, asset_names = "BAND15", append_gdalvsi = TRUE)

Get the URL of the **Near Infrared** band:

In [None]:
nir <- assets_url(item_filtered, asset_names = "BAND16", append_gdalvsi = TRUE)

Reading the cropped scene from the **Red** band (`BAND15`):

In [None]:
red_rast <- terra::crop(terra::rast(red), transformed_bbox)

In [None]:
red_rast

Reading the cropped scene from the **Near Infrared** band (`BAND16`):

In [None]:
nir_rast <- terra::crop(terra::rast(nir), transformed_bbox)

In [None]:
nir_rast

Let's view the data that was loaded

In [None]:
plot(red_rast)

In [None]:
plot(nir_rast)

Now, let's calculate the **NDVI**.

In [None]:
ndvi <- (nir_rast - red_rast) / (nir_rast + red_rast)
ndvi

In [None]:
plot(ndvi)

## <span style="color:#336699">Image Thresholding</span>
<hr style="border:1px solid #0077b9;">

One of the simplest approaches to separate different values in images is thresholding. This process consists of labeling the data based on fixed values.

Let's try to separate our data into groups according to their NDVI values. First, let's see what the histogram of the image looks like, using the `hist` function from the **terra** package:

In [None]:
#
# creating the histogram
#
terra::hist(
    ndvi, 
    xlim   = c(0, 1), 
    breaks = 10,  
    main   = "NDVI Distribution",
    xlab   = "NDVI", 
    ylab   = "Frequency", 
    col    = "wheat", 
    xaxt   = "n"
)

#
# defining the interval on the x-axis
#
axis(side = 1, at = seq(0, 1, 0.1), labels = seq(0, 1, 0.1))

Assuming that the `ndvi` image can be separated by thresholding, we adopt the following classes for this specific case:

* Pixels with values below 0.2 are dark pixels;
* Pixels with values from 0.2 to 0.45 are sparsely vegetated areas;
* Pixels with values above 0.45 are heavily vegetated areas.

We can perform this thresholding by selecting from the `ndvi` raster all values belonging to a given interval and assigning an integer value to them. We use the following integer values:

* `1`: dark pixels;
* `2`: little vegetation;
* `3`: abundant vegetation.

To get started, we will first create a copy of the `ndvi` image:

In [None]:
labelled_img <- ndvi

In [None]:
#
# defining a vector of labels from the thresholds
#
vector_labels <- c(
    0, 0.2, 1,    # value 1
    0.2, 0.45, 2, # value 2
    0.45, 1, 3    # value 3
)   

#
# transforming the label vector into a matrix
#
matrix_labels <- matrix(vector_labels, ncol = 3, byrow = TRUE)

#
# image with added thresholds
#
image_labelled <- terra::classify(labelled_img, matrix_labels, include.lowest = TRUE)
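
To see how the three intervals map values to labels, the sketch below applies the same breaks to a plain numeric vector with base R's `cut()`. It assumes right-closed intervals with the lowest break included, so a value exactly on a threshold falls into the lower class; the sample values are made up, and whether this matches the raster classification at the boundaries depends on the options used there.

```r
# Made-up NDVI values, including values exactly on the thresholds
ndvi_example <- c(0.05, 0.20, 0.30, 0.45, 0.80)

# Same thresholds as in the classification matrix
breaks_example <- c(0, 0.2, 0.45, 1)

# labels = FALSE returns the interval index (1, 2, or 3)
labels_example <- cut(
    ndvi_example,
    breaks         = breaks_example,
    labels         = FALSE,
    include.lowest = TRUE,
    right          = TRUE
)
print(labels_example)
```

With `cut()`, values outside the range 0 to 1 (for example, negative NDVI) become `NA`, so they would need their own interval if they should receive a label.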

Let's now view the labels:

In [None]:
options(repr.plot.width = 18, repr.plot.height = 10)

#
# Creating Figure
#
par(mfrow = c(1, 2))

#
# Plot of NDVI data
#
plot(ndvi, main = "NDVI")

#
# Plot of the thresholding result
#
plot(image_labelled, col = c("#E4E538", "#EFB17B", "#00AF22"), main = "Labelled image")

## <span style="color:#336699">Calculating the difference between images</span>
<hr style="border:1px solid #0077b9;">

Now let's compare the NDVI for images from two dates and the same location. This can be used, for example, to check areas where crops have grown and areas that have lost vegetation.

For this calculation, we will use the NDVI indices provided in the data cube from two items with the same location and different dates using STAC.

The first image comprises pixels from September 30, 2018 to October 15, 2018 (`2018-09-30_2018-10-15`):

In [None]:
#
# filtering item by datetime
#
items_first <- items_filter(items, filter_fn = function(item) item$properties[["datetime"]] == "2018-09-30T00:00:00.000000Z")

#
# get the URL of the NDVI asset
#
ndvi_first <- assets_url(items_first, asset_names = "NDVI", append_gdalvsi = TRUE)

The second selected image comprises pixels from January 1, 2019 to January 16, 2019 (`2019-01-01_2019-01-16`), that is, three months after the first selected image:

In [None]:
#
# filtering item by datetime
#
items_second <- items_filter(items, filter_fn = function(item) item$properties[["datetime"]] == "2019-01-01T00:00:00.000000Z")

#
# get the URL of the NDVI asset
#
ndvi_second <- assets_url(items_second, asset_names = "NDVI", append_gdalvsi = TRUE)

<div style="text-align: justify;  margin-left: 15%; margin-right: 15%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
    <b>Note:</b> The NDVI index pre-computed by BDC ranges from <em>-10000</em> to <em>10000</em>, instead of <em>-1</em> to <em>1</em>, as can be seen in the <code>item</code> metadata. Storing the values as 16-bit integers rather than 32-bit floating-point numbers reduces the file size.
</div>
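
If values in the conventional -1 to 1 range are needed, the stored integers can be rescaled. The sketch below does this in base R on made-up values, assuming a scale factor of 1/10000 inferred from the range stated in the note; the authoritative factor is the one given in the item metadata.

```r
# Assumed scale factor, inferred from the -10000..10000 range
scale_factor <- 1 / 10000

# Made-up stored NDVI values (16-bit integers)
ndvi_scaled_example <- c(-10000L, -2500L, 0L, 4500L, 10000L)

# Rescale to the conventional -1..1 range
ndvi_unscaled_example <- ndvi_scaled_example * scale_factor
print(ndvi_unscaled_example)
```

The same multiplication can be applied to a whole raster. When two scaled rasters are subtracted without rescaling, the difference is also expressed in scaled units.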

These images cover an agricultural area where crops are usually planted around August, near the first observation. Crops are therefore expected to be present within roughly six months before or after that observation, which implies higher NDVI values (more vigorous vegetation) in these areas. In a grayscale color map, high NDVI pixels appear closer to white, while low NDVI pixels appear closer to black.

Based on this, let's visually compare both NDVI images:

Reading the cropped scene for the first NDVI date:
> The region will be the **same** already used in the previous examples.

In [None]:
ndvi_first_rast <- terra::crop(terra::rast(ndvi_first), transformed_bbox)

Reading the cropped scene for the second NDVI date:

In [None]:
ndvi_second_rast <- terra::crop(terra::rast(ndvi_second), transformed_bbox)

In [None]:
#
# plot setup
#
par(mfrow = c(1, 2))

#
# NDVI data plot (First scene)
#
plot(ndvi_first_rast, main = "First scene - 2018-09-30")

#
# NDVI data plot (Second scene)
#
plot(ndvi_second_rast, main = "Second scene - 2019-01-01")

Since we want to see what has grown and what has been lost, we will subtract the oldest image from the newest and plot the result:

In [None]:
ndvi_diff <- ndvi_second_rast - ndvi_first_rast

Visualizing the difference between the two scenes with the `tmap` package

In [None]:
# 
# loading package
#
library(tmap)

#
# plot the difference between the two images
#
tm_shape(ndvi_diff) +
    tm_raster(style = "pretty", palette = c("-RdYlBu"), legend.hist = TRUE, midpoint = NA) +
    tm_layout(legend.outside = TRUE)

As can be seen in the NDVI difference map, the main changes in pixel values are found in agricultural areas, which is expected due to crop changes.

With this palette, blue indicates negative values and red indicates positive values. In the blue areas the NDVI value decreased, that is, vegetation was lost, which suggests that crops were harvested. In the red areas the NDVI value increased, indicating more vigorous vegetation on the most recent date.
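
The sign-based reading of the difference can also be expressed as labels. The sketch below classifies made-up difference values (newest minus oldest, in scaled NDVI units) as loss, stable, or gain in base R; the tolerance of 500 is a hypothetical choice for illustration, not a value taken from this notebook.

```r
# Made-up NDVI differences (second date minus first date, scaled units)
diff_example <- c(-3200, -150, 0, 90, 4100)

# Hypothetical tolerance below which a change is treated as noise
tolerance <- 500

# Negative differences mean NDVI decreased; positive mean it increased
change_example <- ifelse(
    diff_example > tolerance, "gain",
    ifelse(diff_example < -tolerance, "loss", "stable")
)
print(table(change_example))
```

A tolerance band around zero keeps small fluctuations from being reported as vegetation change; its width would need to be chosen for the data at hand.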

## <span style="color:#336699">References</span>
<hr style="border:1px solid #0077b9;">

- [Spatio Temporal Asset Catalog Specification](https://stacspec.org/)


- [Brazil Data Cube R Client Library for STAC Service - GitHub Repository](https://github.com/brazil-data-cube/rstac)