# STAC generation with extensions

In the [previous notebook](20_stac.ipynb) we generated STAC metadata from Sentinel-2 imagery. However, the resulting metadata is incomplete, because it does not use a powerful STAC feature: [extensions](https://stac-extensions.github.io/).

Uncomment the following line to install eotdl if needed.

In [None]:
# !pip install eotdl

Let's get a `STACDataFrame`.

In [1]:
from eotdl.curation.stac.stac import STACGenerator
from eotdl.curation.stac.assets import STACAssetGenerator
from eotdl.curation.stac.parsers import UnestructuredParser
from eotdl.curation.stac.dataframe_labeling import LabeledStrategy

stac_generator = STACGenerator(item_parser=UnestructuredParser, 
                               assets_generator=STACAssetGenerator, 
                               labeling_strategy=LabeledStrategy,
                               image_format='tif'
                               )

In [2]:
df = stac_generator.get_stac_dataframe('example_data/jaca_dataset/')
df.head()

| | image | label | ix | collection | extensions | bands |
|---|---|---|---|---|---|---|
| 0 | example_data/jaca_dataset/Jaca_1.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
| 1 | example_data/jaca_dataset/Jaca_2.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
| 2 | example_data/jaca_dataset/Jaca_3.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
| 3 | example_data/jaca_dataset/Jaca_4.tif | Jaca | 0 | example_data/jaca_dataset/source | | |


A key feature is the `label` column. We use the label of each image to assign parameters such as the STAC extensions that the image's item will have, or the bands we want to extract using the `BandsAssetGenerator`. Before adding new information, we can list the labels already present in the `STACDataFrame`.

In [3]:
labels = df.label.unique().tolist()
labels

['Jaca']

Starting from the label we found, we are going to define the STAC extensions. We will implement the [proj](https://github.com/stac-extensions/projection), [raster](https://github.com/stac-extensions/raster) and [eo](https://github.com/stac-extensions/eo) extensions.

> Note: the supported extensions are `('eo', 'sar', 'proj', 'raster')`.

Although we don't want to extract the image bands, we can still declare them so that their metadata is described through the `eo` STAC extension. To keep things simple, let's define only the bands `B04`, `B03` and `B02`, which are the RGB bands.

To define these parameters for each label, we simply declare a dictionary that maps the label to its values.

In [4]:
extensions = {'Jaca': ('proj', 'raster', 'eo')}
bands = {'Jaca': ('B02', 'B03', 'B04')}
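
A typo in a label or an extension name is easy to miss at this point, so a quick sanity check can help. The following is a minimal sketch and not part of eotdl: `check_label_mapping` is a hypothetical helper, and `SUPPORTED_EXTENSIONS` simply copies the tuple from the note above instead of importing it from the library.

```python
# Supported extension names, copied from the note above (an assumption, not imported from eotdl).
SUPPORTED_EXTENSIONS = ('eo', 'sar', 'proj', 'raster')


def check_label_mapping(labels, extensions):
    """Hypothetical helper: list the problems found in a label -> extensions mapping."""
    problems = []
    for label, names in extensions.items():
        # Every key should be a label that exists in the STACDataFrame.
        if label not in labels:
            problems.append(f"unknown label: {label}")
        # Every value should be one of the supported extension names.
        for name in names:
            if name not in SUPPORTED_EXTENSIONS:
                problems.append(f"unsupported extension for {label}: {name}")
    return problems


labels = ['Jaca']
extensions = {'Jaca': ('proj', 'raster', 'eo')}
print(check_label_mapping(labels, extensions))  # prints []
```

An empty list means that every key is a known label and every extension name is supported.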

Now we are ready to generate a `STACDataFrame` with relevant information. Some extra parameters to take into account:
- `path`: the root path where the images are located. In our case it is `example_data/jaca_dataset/`.
- `collections`: defines the STAC collection that the items with a given label go to. There are several options:
    - The default option puts all the STAC items in a single collection called `source`.
    - The `*` option will consider folders located directly under the root folder as collections, so it will create a collection for each of them.

    <p align="center">
        <img src="assets/collection.png" alt="* collection" style="height:170px; width:200px;"/>
    </p>

    - You can also choose the collection for an image through its label, as we did for extensions and bands. As an example, we define it like this.

In [5]:
collection = {'Jaca': 'sentinel-2-l2a'}
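
To make the `*` option more concrete, here is a minimal sketch of the rule described above: the folder located directly under the root becomes the collection name. `collection_from_path` is a hypothetical helper written for illustration, the paths are made up, and the fallback to `source` for images sitting directly in the root is an assumption. eotdl's actual implementation may differ.

```python
from pathlib import PurePosixPath


def collection_from_path(image_path, root):
    """Hypothetical helper: name of the first folder under `root` that contains the image."""
    relative = PurePosixPath(image_path).relative_to(root)
    # An image directly under the root has no subfolder; falling back to
    # 'source' here is an assumption made for this sketch.
    return relative.parts[0] if len(relative.parts) > 1 else 'source'


# Illustrative paths, not files from this notebook.
print(collection_from_path('data/root/sentinel-2-l2a/img_1.tif', 'data/root'))  # prints sentinel-2-l2a
print(collection_from_path('data/root/img_2.tif', 'data/root'))  # prints source
```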

In [6]:
df = stac_generator.get_stac_dataframe('example_data/jaca_dataset/', collections=collection, extensions=extensions, bands=bands)
df.head()

| | image | label | ix | collection | extensions | bands |
|---|---|---|---|---|---|---|
| 0 | example_data/jaca_dataset/Jaca_1.tif | Jaca | 0 | example_data/jaca_dataset/sentinel-2-l2a | (proj, raster, eo) | (B02, B03, B04) |
| 1 | example_data/jaca_dataset/Jaca_2.tif | Jaca | 0 | example_data/jaca_dataset/sentinel-2-l2a | (proj, raster, eo) | (B02, B03, B04) |
| 2 | example_data/jaca_dataset/Jaca_3.tif | Jaca | 0 | example_data/jaca_dataset/sentinel-2-l2a | (proj, raster, eo) | (B02, B03, B04) |
| 3 | example_data/jaca_dataset/Jaca_4.tif | Jaca | 0 | example_data/jaca_dataset/sentinel-2-l2a | (proj, raster, eo) | (B02, B03, B04) |


In [7]:
stac_generator.generate_stac_metadata(stac_id='jaca-dataset-extensions',
                                      description='Jaca dataset with STAC extensions',
                                      output_folder='data/jaca_dataset_stac_extensions')

Generating sentinel-2-l2a collection...


100%|██████████| 4/4 [00:00<00:00, 243.99it/s]

Validating and saving catalog...
Success!





Let's check our new STAC catalog!

In [8]:
from pystac import Catalog

Catalog.from_file('data/jaca_dataset_stac_extensions/catalog.json')
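
To confirm that the extensions were recorded, one can look at the `stac_extensions` field that the STAC specification defines on each item, which lists the schema URLs of the extensions in use. The sketch below works on a hand-written item dictionary, so it is an illustrative sample and not output from the catalog generated above. `extension_names` is a hypothetical helper, and the version numbers in the URLs are only examples.

```python
from urllib.parse import urlparse


def extension_names(item):
    """Hypothetical helper: short extension names taken from the schema URLs of an item."""
    names = []
    for url in item.get('stac_extensions', []):
        # Schema URLs look like https://stac-extensions.github.io/<name>/v<version>/schema.json
        path_parts = urlparse(url).path.strip('/').split('/')
        names.append(path_parts[0])
    return names


# Hand-written sample item (an assumption for illustration, not real output).
sample_item = {
    'stac_extensions': [
        'https://stac-extensions.github.io/projection/v1.1.0/schema.json',
        'https://stac-extensions.github.io/raster/v1.1.0/schema.json',
        'https://stac-extensions.github.io/eo/v1.1.0/schema.json',
    ]
}
print(extension_names(sample_item))  # prints ['projection', 'raster', 'eo']
```

The same function could be applied to the dictionary form of any item loaded from the generated catalog.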