# STACDataFrame labeling

When we want to generate STAC metadata from an imagery dataset through EOTDL, we must first generate a `STACDataFrame`, as seen in this [notebook](20_stac.ipynb). Generating the `STACDataFrame` requires a parameter called `labeling_strategy`. In this notebook we take a closer look at it.

Uncomment the following line to install eotdl if needed.

In [None]:
# !pip install eotdl

The `labeling_strategy` parameter defines how a label is extracted from an image's filename and assigned to that image when the `STACDataFrame` is created. Two strategies are implemented by default; if you want to implement your own `labeling_strategy`, see this [notebook](32_create_your_own_df_labeler.ipynb) for details on how to contribute.

- `UnlabeledStrategy`: use it when the image filenames carry no label that identifies the images or that was placed there on purpose.

<p align="center">
        <img src="assets/structured_parser.png" alt="Structured parser typical folder structure" style="height:170px; width:200px;"/>
</p>

- `LabeledStrategy`: use it when the images carry their label in the filename, for example a folder whose images are named `River_1.png`, `River_2.png`, `River_3.png`, and so on. The filename must follow the pattern `<label>_<number>`.

<p align="center">
        <img src="assets/unestructured_parser.png" alt="Unstructured parser typical folder structure" style="height:200px; width:200px;"/>
</p>
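
To make the `<label>_<number>` pattern concrete, here is a minimal sketch of how a label could be pulled out of such a filename. It is not the EOTDL implementation: the helper `extract_label` and the regular expression are hypothetical and only illustrate the naming convention described above.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical pattern for `<label>_<number>`: everything before the
# last underscore is the label, the trailing digits are the number.
LABEL_PATTERN = re.compile(r"^(?P<label>.+)_(?P<number>\d+)$")


def extract_label(path: str) -> Optional[str]:
    """Return the label of a `<label>_<number>` filename, or None if it does not match."""
    stem = Path(path).stem  # filename without directory and extension
    match = LABEL_PATTERN.match(stem)
    return match.group("label") if match else None


print(extract_label("River_1.png"))
print(extract_label("image.png"))
```

Under these assumptions, `River_1.png` yields `River`, while a filename that does not follow the pattern, such as `image.png`, yields `None`.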

Let's see some examples. In all of them we use the `UnestructuredParser` as `item_parser`. For further information about this feature, see the [next notebook](23_stac_item_parsers.ipynb).

In [1]:
from eotdl.curation.stac.dataframe_labeling import UnlabeledStrategy, LabeledStrategy
from eotdl.curation.stac.parsers import UnestructuredParser
from eotdl.curation.stac.stac import STACGenerator

In the first example we generate a `STACDataFrame` from the dataset in `jaca_dataset`, whose images are simply named `Jaca_1`, `Jaca_2`, and so on. This is a good fit for the `UnlabeledStrategy`, as `Jaca` is not a label itself.

In [2]:
stac_generator = STACGenerator(item_parser=UnestructuredParser,
                               labeling_strategy=UnlabeledStrategy,
                               image_format='tif'
                               )
df = stac_generator.get_stac_dataframe('example_data/jaca_dataset')
df.head()

| | image | label | ix | collection | extensions | bands |
|---|---|---|---|---|---|---|
| 0 | example_data/jaca_dataset/Jaca_1.tif | Jaca_1 | 0 | example_data/jaca_dataset/source | | |
| 1 | example_data/jaca_dataset/Jaca_2.tif | Jaca_2 | 1 | example_data/jaca_dataset/source | | |
| 2 | example_data/jaca_dataset/Jaca_3.tif | Jaca_3 | 2 | example_data/jaca_dataset/source | | |
| 3 | example_data/jaca_dataset/Jaca_4.tif | Jaca_4 | 3 | example_data/jaca_dataset/source | | |


> Note: for this particular dataset we could also have used the `LabeledStrategy`, as seen in this [notebook](20_stac.ipynb), but it is also a clear example for the `UnlabeledStrategy`.
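
The difference between the two strategies on the same file can be sketched in plain Python. The functions `unlabeled_label` and `labeled_label` are hypothetical stand-ins, not EOTDL code: the first mirrors the output above, where the label equals the filename without its extension, and the second assumes the `<label>_<number>` pattern.

```python
from pathlib import Path


def unlabeled_label(path: str) -> str:
    """Hypothetical: use the whole filename (without extension) as the label."""
    return Path(path).stem


def labeled_label(path: str) -> str:
    """Hypothetical: keep only the part before the last underscore."""
    return Path(path).stem.rsplit("_", 1)[0]


path = "example_data/jaca_dataset/Jaca_1.tif"
print(unlabeled_label(path))  # Jaca_1
print(labeled_label(path))    # Jaca
```

Under these assumptions, every image in `jaca_dataset` would share the single label `Jaca` with the labeled approach, while the unlabeled approach keeps one distinct value per image.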

In the second example we generate a `STACDataFrame` from the dataset in `eurosat_rgb_dataset`, whose images are named `River_1`, `Forest_1`, and so on. This is a good fit for the `LabeledStrategy`, as `River`, `Forest`, or `AnnualCrop` are labels.

In [3]:
stac_generator = STACGenerator(item_parser=UnestructuredParser,
                               labeling_strategy=LabeledStrategy,
                               image_format='jpg'   # the images are jpg
                               )
df = stac_generator.get_stac_dataframe('example_data/eurosat_rgb_dataset')
df.head()

| | image | label | ix | collection | extensions | bands |
|---|---|---|---|---|---|---|
| 0 | example_data/eurosat_rgb_dataset/Forest/Forest... | Forest | 0 | example_data/eurosat_rgb_dataset/source | | |
| 1 | example_data/eurosat_rgb_dataset/River/River_1... | River | 1 | example_data/eurosat_rgb_dataset/source | | |
| 2 | example_data/eurosat_rgb_dataset/Highway/Highw... | Highway | 2 | example_data/eurosat_rgb_dataset/source | | |
| 3 | example_data/eurosat_rgb_dataset/AnnualCrop/An... | AnnualCrop | 3 | example_data/eurosat_rgb_dataset/source | | |
| 4 | example_data/eurosat_rgb_dataset/SeaLake/SeaLa... | SeaLake | 4 | example_data/eurosat_rgb_dataset/source | | |
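
A quick way to check that the chosen strategy fits the dataset is to count the extracted labels and look for values that are not expected classes. The sketch below uses only the standard library. The `labels` list is a hypothetical sample standing in for the `label` column, not real output, and the `expected` set is an assumption based on the five classes shown above.

```python
from collections import Counter

# Hypothetical sample of `label` values (not real output). The stray
# "River_1" imitates what a whole-filename label would look like.
labels = ["Forest", "River", "Highway", "AnnualCrop", "SeaLake", "River", "River_1"]

# Assumed set of classes, taken from the labels shown in the table above.
expected = {"Forest", "River", "Highway", "AnnualCrop", "SeaLake"}

counts = Counter(labels)                     # images per label
unexpected = sorted(set(counts) - expected)  # labels outside the expected classes

print(counts.most_common())
print(unexpected)
```

A non-empty `unexpected` list, such as `['River_1']` here, suggests that some filenames do not follow the `<label>_<number>` pattern or that a different strategy was applied.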
