# STAC labels from the filename

In a [previous notebook](20_stac.ipynb) we generated STAC metadata from Sentinel-2 imagery, and in [another notebook](22_stac_labels_scaneo.ipynb) we generated labels assuming they had been created with SCANEO. In this notebook we are going to generate the labels collection assuming the images carry their labels in their filenames.

Uncomment the following line to install eotdl if needed.

In [None]:
# !pip install eotdl

To generate the labels collection for a source collection (the 'source' collection being the one that holds the STAC items of the images), we have implemented a customizable class named `LabelExtensionObject`. With this class you can decide how to create the labels of your dataset, whether you want to develop your own implementation or use one of the implementations we have already developed. Let's go through them!

- `ScaneoLabeler`: this implementation should be used when the labels have been generated using SCANEO, so we have a folder with the `geoJSON` label files and their corresponding images.
- `ImageNameLabeler`: this implementation should be used when the images of the dataset are already named with the corresponding labels, such as `River_1`, `Forest_1`, and so on. 
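
To make the naming convention concrete, here is a minimal sketch of how a label could be recovered from such a filename. It is not the `ImageNameLabeler` code: the helper `label_from_filename` and the example paths are hypothetical, and the sketch assumes names of the form `<Label>_<index>.<ext>`.

```python
from pathlib import Path


def label_from_filename(path: str) -> str:
    """Return the label encoded in a file name such as 'River_1.jpg'.

    Hypothetical helper: assumes the label is everything before the last
    underscore of the file stem. If there is no underscore, the whole stem
    is returned.
    """
    stem = Path(path).stem               # 'River/River_1.jpg' -> 'River_1'
    label, _, _ = stem.rpartition('_')   # 'River_1' -> ('River', '_', '1')
    return label or stem


print(label_from_filename('River/River_1.jpg'))   # River
print(label_from_filename('AnnualCrop_12.jpg'))   # AnnualCrop
```

Splitting on the last underscore keeps labels that themselves contain underscores intact, at the cost of requiring a numeric (or otherwise disposable) suffix.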

<p align="center">
        <img src="assets/unestructured_parser.png" alt="Structured parser typical folder structure" style="height:200px; width:200px;"/>
</p>

As seen, `ImageNameLabeler` is the implementation we are going to use. Let's check the parameters we should pass:
- `catalog`: the path to the STAC catalog, or the pystac `Catalog` itself, to which we want to add the labels collection. In our case, `data/eurosat_rgb_stac/catalog.json`.
- `stac_dataframe`: the STACDataFrame generated during the STAC metadata generation.
- `collection`: the STAC collection we want to add the labels to. Defaults to `source`.
- Extra properties can be added through `kwargs`, such as `label:methods` or `label:overviews`. You can check them [here](https://github.com/stac-extensions/label#item-properties). In this example we set `label_properties`, `label_methods` (to `manual`) and `label_tasks`.
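
As a rough illustration of what these extra properties describe, the sketch below maps underscore-style keyword arguments to the colon-separated field names used by the STAC label extension. The helper `to_label_fields` is hypothetical, and the key mapping is an assumption about the naming convention, not the eotdl implementation.

```python
def to_label_fields(**kwargs):
    """Map keys such as 'label_methods' to 'label:methods'.

    Hypothetical helper illustrating an assumed convention; keys that do
    not start with 'label_' are passed through unchanged.
    """
    fields = {}
    for key, value in kwargs.items():
        prefix, sep, name = key.partition('_')
        if prefix == 'label' and sep:
            fields[f'label:{name}'] = value
        else:
            fields[key] = value
    return fields


extra = to_label_fields(
    label_properties=['label'],
    label_methods=['manual'],
    label_tasks=['classification'],
)
print(extra)
# {'label:properties': ['label'], 'label:methods': ['manual'], 'label:tasks': ['classification']}
```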

Knowing this, we can generate our labels collection. The easiest way is to generate the entire STAC catalog first, since we need the `STACDataFrame`.

In [10]:
from eotdl.curation.stac.dataframe_labeling import LabeledStrategy
from eotdl.curation.stac.parsers import UnestructuredParser
from eotdl.curation.stac.stac import STACGenerator

stac_generator = STACGenerator(item_parser=UnestructuredParser,
                               labeling_strategy=LabeledStrategy,
                               image_format='jpg'   # the images are jpg
                               )
df = stac_generator.get_stac_dataframe('example_data/eurosat_rgb_dataset')
df.head()

| | image | label | ix | collection | extensions | bands |
|---|---|---|---|---|---|---|
| 0 | example_data/eurosat_rgb_dataset/Forest/Forest... | Forest | 0 | example_data/eurosat_rgb_dataset/source | | |
| 1 | example_data/eurosat_rgb_dataset/River/River_1... | River | 1 | example_data/eurosat_rgb_dataset/source | | |
| 2 | example_data/eurosat_rgb_dataset/Highway/Highw... | Highway | 2 | example_data/eurosat_rgb_dataset/source | | |
| 3 | example_data/eurosat_rgb_dataset/AnnualCrop/An... | AnnualCrop | 3 | example_data/eurosat_rgb_dataset/source | | |
| 4 | example_data/eurosat_rgb_dataset/SeaLake/SeaLa... | SeaLake | 4 | example_data/eurosat_rgb_dataset/source | | |


In [11]:
stac_generator.generate_stac_metadata(id='eurosat-rgb-dataset',
                                      description='EuroSAT RGB dataset',
                                      output_folder='data/eurosat_rgb_stac')


Generating source collection...


100%|██████████| 10/10 [00:00<00:00, 405.74it/s]

Validating and saving catalog...
Success!





In [12]:
from eotdl.curation.stac.extensions import ImageNameLabeler

labeler = ImageNameLabeler()

catalog = 'data/eurosat_rgb_stac/catalog.json'
labels_extra_properties = {'label_properties': ["label"],
                          'label_methods': ["manual"],
                          'label_tasks': ["classification"]}
labeler.generate_stac_labels(
    catalog=catalog,
    stac_dataframe=df,
    **labels_extra_properties
)

Generating labels collection...


10it [00:00, 2121.87it/s]

Success on labels generation!





In [13]:
from pystac import Catalog

Catalog.from_file('data/eurosat_rgb_stac/catalog.json')

Finally, let's load the new labels collection!

In [14]:
from pystac import Collection

Collection.from_file('data/eurosat_rgb_stac/labels/collection.json')
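
If you only want a quick look at what was written to disk, the collection file is plain JSON. The sketch below uses only the standard library to list the `href` of every link whose `rel` is `item`, which is how the STAC specification references the items of a collection. The helper `item_hrefs` and the sample dictionary are hypothetical; the actual content of `collection.json` depends on your dataset.

```python
import json
from pathlib import Path


def item_hrefs(collection: dict) -> list:
    """Return the hrefs of all links with rel == 'item' in a STAC collection dict."""
    return [
        link['href']
        for link in collection.get('links', [])
        if link.get('rel') == 'item'
    ]


# Hypothetical, hand-written stand-in for a labels collection.
sample = {
    'type': 'Collection',
    'id': 'labels',
    'links': [
        {'rel': 'root', 'href': '../catalog.json'},
        {'rel': 'item', 'href': './River_1/River_1.json'},
        {'rel': 'item', 'href': './Forest_1/Forest_1.json'},
    ],
}
print(item_hrefs(sample))

# With the real file (path taken from the cell above), if it exists:
path = Path('data/eurosat_rgb_stac/labels/collection.json')
if path.exists():
    print(item_hrefs(json.loads(path.read_text())))
```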