In [2]:
%load_ext autoreload
%autoreload 2

%load_ext dotenv
%dotenv

# STAC metadata generation

In this notebook we are going to generate the STAC metadata of our dataset, converting it from a Q0 dataset to a [Q1 dataset](../00_eotdl.ipynb). 

The [STAC](https://stacspec.org/en) specification is a common language for describing geospatial information so that it can be more easily worked with, indexed, and discovered. It is a standardized way to expose, host, ingest and access geospatial collections, and it has been adopted as the EOTDL standard metadata format. For further information, check the [STAC website](https://stacspec.org/en).
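To give a feel for what STAC metadata looks like, here is a minimal sketch of a STAC Item written as a plain Python dictionary. The top-level field names follow the STAC Item specification; every value (the `id`, the geometry, the date, the asset path) is a placeholder chosen for illustration, and the helper `missing_item_fields` is hypothetical and only checks that the fields are present, which is far from a full validation.

```python
import json

# Top-level fields that a STAC Item is expected to carry.
REQUIRED_ITEM_FIELDS = {
    "type", "stac_version", "id", "geometry", "bbox",
    "properties", "links", "assets",
}

def missing_item_fields(item):
    """Return the required top-level fields that are absent from `item`."""
    return REQUIRED_ITEM_FIELDS - item.keys()

# Placeholder item: none of these values come from the workshop dataset.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    },
    "bbox": [0.0, 0.0, 1.0, 1.0],
    "properties": {"datetime": "2019-06-07T00:00:00Z"},
    "links": [],
    "assets": {
        "B04": {"href": "./B04.tif", "type": "image/tiff; application=geotiff"},
    },
}

print(json.dumps(item, indent=2))
print("Missing fields:", missing_item_fields(item))
```

Catalogs and collections are JSON documents of the same kind, linked to their items through the `links` field.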

STAC generation can be painful and time-consuming, so the EOTDL environment provides several open source tools that make the process much more straightforward. Let's dive into them!

The first thing to understand is that the process starts with a `STACDataFrame`. This `STACDataFrame` is an interface between the images and the STAC catalogs and collections. It has some columns that we can define and customise to ensure that the STAC metadata contains the information we want, such as `extensions`, which defines the [STAC extensions](https://stac-extensions.github.io/) that the image's STAC item must have, or `bands`, which lists the bands we want to get from the image, if any.

Let's see the example below.

In [9]:
import pandas as pd

sample_df = pd.read_csv('data/sample_stacdataframe.csv')
sample_df

Unnamed: 0,image,label,ix,collection,extensions,bands
0,data/sentinel_2/Boadella_2019-06-07/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B0..."
1,data/sentinel_2/Boadella_2019-06-02/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B0..."
2,data/sentinel_2/Boadella_2019-07-02/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B0..."
3,data/sentinel_2/Boadella_2019-06-17/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B0..."
4,data/sentinel_2/Boadella_2019-06-27/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B0..."


This is a sample `STACDataFrame` already generated for our workshop. It contains the following columns:
- `image`: the path to every image.
- `label`: the label assigned to every image.
- `ix`: the index of the label.
- `collection`: the collection to which the image belongs.
- `extensions`: a list of the STAC extensions we want the image STAC item to have.
- `bands`: a list of the bands we want the image STAC item to have.
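Note that in the sample above, which was read from a CSV file, the `extensions` and `bands` values appear as quoted strings such as `"('proj', 'raster')"` rather than as tuples. The following is a minimal sketch of how such columns could be turned back into tuples with `ast.literal_eval`. It uses only the standard library, the two rows and their image paths are shortened placeholders, and `load_rows` is a hypothetical helper, not part of EOTDL.

```python
import ast
import csv
import io

# Two rows shaped like the sample above; image paths are placeholders.
SAMPLE_CSV = '''image,label,ix,collection,extensions,bands
img_a.tif,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02')"
img_b.tif,sentinel-2-l2a,0,data/sentinel_2/source,"('proj', 'raster')","('B01', 'B02')"
'''

def load_rows(text):
    """Parse CSV text and convert the tuple-like columns into real tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["ix"] = int(row["ix"])
        for column in ("extensions", "bands"):
            # literal_eval only evaluates Python literals, so it is safe for this use.
            row[column] = ast.literal_eval(row[column])
        rows.append(row)
    return rows

rows = load_rows(SAMPLE_CSV)
print(type(rows[0]["extensions"]), rows[0]["bands"])
```

With pandas, the same conversion can be requested at load time through the `converters` argument of `pd.read_csv`, for example `converters={"extensions": ast.literal_eval, "bands": ast.literal_eval}`.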

Now that we have seen this, let's generate the STAC metadata for our dataset. Don't worry, we are going to explain it step by step!

First of all, we need to import the `STACGenerator` class.

In [10]:
from eotdl.curation.stac.stac import STACGenerator

The `STACGenerator` class is the entry point for STAC generation, where the magic happens. Before we instantiate it, we need to understand the parameters it accepts:
- `image_format`: the extension of the images, for example `png` or `jpg`. Defaults to `tiff`.
- `catalog_type`: the STAC Catalog type, as defined [here](https://pystac.readthedocs.io/en/0.4/concepts.html#catalog-types). Defaults to `SELF_CONTAINED`.
- `item_parser`
- `assets_generator`

A key feature is the `label` column. Using the label of every image, we assign parameters such as the STAC extensions that the image is going to have or the bands we want to extract.
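The idea of driving per-image parameters from the label can be sketched with plain dictionaries keyed by label. This is only an illustration of the lookup, not EOTDL's implementation: `assign_by_label` is a hypothetical helper, the image names are placeholders, and returning `None` for a label that has no entry is an assumption made for this sketch.

```python
def assign_by_label(rows, extensions, bands):
    """Return copies of `rows` with extensions and bands looked up by label."""
    enriched = []
    for row in rows:
        label = row["label"]
        enriched.append({
            **row,
            # Labels without an entry get None (an assumption of this sketch).
            "extensions": extensions.get(label),
            "bands": bands.get(label),
        })
    return enriched

rows = [
    {"image": "image_a.tif", "label": "sentinel-2-l2a"},
    {"image": "image_b.tif", "label": "other-label"},
]
extensions = {"sentinel-2-l2a": ("proj", "raster")}
bands = {"sentinel-2-l2a": ("B02", "B03", "B04")}

for row in assign_by_label(rows, extensions, bands):
    print(row["image"], row["extensions"], row["bands"])
```

Keying the dictionaries by label means that every image sharing a label receives the same extensions and bands, so a dataset mixing several sensors only needs one entry per sensor.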

In [3]:
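from eotdl.curation.stac.stac import STACGenerator
from eotdl.curation.stac.assets import BandsAssetGenerator

# Create the generator, passing BandsAssetGenerator as the assets generator
stac_generator = STACGenerator(assets_generator=BandsAssetGenerator)

# Per-label parameters: the keys must match the values in the `label` column
extensions = {'sentinel-2-l2a': ('proj', 'raster')}
bands = {'sentinel-2-l2a': ('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B11', 'B12')}

# Build the STACDataFrame from the images found under data/sentinel_2
df = stac_generator.get_stac_dataframe('data/sentinel_2', bands=bands, extensions=extensions)
df.head()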

Unnamed: 0,image,label,ix,collection,extensions,bands
0,data/sentinel_2/Boadella_2019-06-07/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"(proj, raster)","(B01, B02, B03, B04, B05, B06, B07, B08, B09, ..."
1,data/sentinel_2/Boadella_2019-06-02/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"(proj, raster)","(B01, B02, B03, B04, B05, B06, B07, B08, B09, ..."
2,data/sentinel_2/Boadella_2019-07-02/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"(proj, raster)","(B01, B02, B03, B04, B05, B06, B07, B08, B09, ..."
3,data/sentinel_2/Boadella_2019-06-17/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"(proj, raster)","(B01, B02, B03, B04, B05, B06, B07, B08, B09, ..."
4,data/sentinel_2/Boadella_2019-06-27/sentinel-2...,sentinel-2-l2a,0,data/sentinel_2/source,"(proj, raster)","(B01, B02, B03, B04, B05, B06, B07, B08, B09, ..."


In [4]:
stac_generator.generate_stac_metadata(id='boadella-dataset',
                                      description='Boadella dataset',
                                      output_folder='data/sentinel_2_stac')

Generating source collection...


100%|██████████| 5/5 [00:00<00:00, 22.21it/s]

Validating and saving catalog...
Success!
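Once the catalog has been saved, a quick way to check what was written is to walk the output folder and count the JSON files by their `type` field. The sketch below relies only on the standard library. It assumes the generated files are STAC 1.0 JSON documents, where catalogs, collections and items carry the `type` values `Catalog`, `Collection` and `Feature`; the layout of the output folder is not assumed, and `count_stac_objects` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path

def count_stac_objects(folder):
    """Count the JSON files under `folder`, grouped by their top-level `type`."""
    counts = Counter()
    for path in Path(folder).rglob("*.json"):
        with path.open() as f:
            data = json.load(f)
        if isinstance(data, dict):
            # Files without a `type` field are grouped under "unknown".
            counts[data.get("type", "unknown")] += 1
    return counts

# Same output folder as in the cell above; skipped if it does not exist.
output_folder = Path("data/sentinel_2_stac")
if output_folder.exists():
    print(count_stac_objects(output_folder))
```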



