# STAC items parsers

When we want to generate STAC metadata from an imagery dataset through EOTDL, we must generate a STACDataFrame, as seen in this [notebook](20_stac.ipynb). The `STACGenerator` that builds the STACDataFrame requires a parameter called `item_parser`, and in this notebook we are going to dive into it.

Uncomment the following line to install eotdl if needed.

In [None]:
# !pip install eotdl

The `item_parser` defines the strategy used to search for satellite images within the folder and to **create an ID for every STAC item**. Two strategies are implemented right now, and new ones can be added as needed:
- `StructuredParser`: used when each image is contained in its own folder, so the ID of the item is the name of that folder.

<p align="center">
    <img src="assets/structured_parser.png" alt="Structured parser typical folder structure" style="height:170px; width:200px;"/>
</p>

- `UnestructuredParser`: used when multiple images are contained in the same folder. This is the strategy to use when the dataset images were downloaded with EOTDL, since EOTDL always lays out the downloaded files the same way. That is how we obtained the data for this workshop, with all the images in a single folder, so this is the strategy we will use.

<p align="center">
    <img src="assets/unestructured_parser.png" alt="Unstructured parser typical folder structure" style="height:200px; width:200px;"/>
</p>
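To make the difference concrete, here is a minimal sketch of the two strategies as plain functions. It is not EOTDL's implementation: `find_images`, `structured_id` and `unstructured_id` are hypothetical helpers, the image search is assumed to be a simple recursive glob on the extension, and the unstructured ID is assumed to be the file name without its extension.

```python
from pathlib import Path, PurePath


def find_images(root: str, image_format: str = "tif") -> list[Path]:
    """Hypothetical helper: recursively collect every file under `root` with the given extension."""
    return sorted(Path(root).rglob(f"*.{image_format}"))


def structured_id(image_path: PurePath) -> str:
    """StructuredParser-style ID (illustration only): the name of the containing folder."""
    return image_path.parent.name


def unstructured_id(image_path: PurePath) -> str:
    """UnestructuredParser-style ID (illustration only): the file name without its extension."""
    return image_path.stem


# "any_name.tif" is a made-up file name; only the folder matters for the structured ID.
print(structured_id(PurePath("jaca_dataset_structured/Jaca_1/any_name.tif")))  # Jaca_1
print(unstructured_id(PurePath("jaca_dataset/Jaca_1.tif")))  # Jaca_1
```

Both layouts can yield the same ID; what changes is which part of the path it is read from.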

Knowing this, let's look at each of them, starting with the `UnestructuredParser`.

In [4]:
from eotdl.curation.stac.stac import STACGenerator
from eotdl.curation.stac.assets import STACAssetGenerator
from eotdl.curation.stac.parsers import UnestructuredParser, StructuredParser
from eotdl.curation.stac.dataframe_labeling import LabeledStrategy

# Item IDs will be taken from the image file names (UnestructuredParser)
stac_generator = STACGenerator(item_parser=UnestructuredParser,
                               assets_generator=STACAssetGenerator,
                               labeling_strategy=LabeledStrategy,
                               image_format='tif'  # search for .tif images
                               )

In [2]:
df = stac_generator.get_stac_dataframe('example_data/jaca_dataset')
df.head()

|   | image | label | ix | collection | extensions | bands |
|---|-------|-------|----|------------|------------|-------|
| 0 | example_data/jaca_dataset/Jaca_1.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
| 1 | example_data/jaca_dataset/Jaca_2.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
| 2 | example_data/jaca_dataset/Jaca_3.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
| 3 | example_data/jaca_dataset/Jaca_4.tif | Jaca | 0 | example_data/jaca_dataset/source | | |
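Since `UnestructuredParser` takes the ID from the file name, the IDs for this dataset can be previewed straight from the `image` column. A minimal sketch, assuming the ID is the file name with the `.tif` extension dropped; the `image` and `label` columns shown above are rebuilt by hand so the snippet runs on its own, and `item_id` is a hypothetical column name.

```python
from pathlib import Path

import pandas as pd

# Rebuild the `image` and `label` columns printed above so the snippet runs on its own;
# in the notebook you would simply reuse `df`.
df = pd.DataFrame(
    {
        "image": [f"example_data/jaca_dataset/Jaca_{i}.tif" for i in range(1, 5)],
        "label": ["Jaca"] * 4,
    }
)

# Assumption: the unstructured ID is the file name with its extension dropped.
df["item_id"] = df["image"].map(lambda p: Path(p).stem)

# STAC expects item IDs to be unique within a collection, so check for clashes.
assert df["item_id"].is_unique, "two images would produce the same item ID"
print(df[["image", "item_id"]])
```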


Now, let's take a look at the `StructuredParser`.

In [9]:
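# Item IDs will be taken from the names of the folders containing the images (StructuredParser)
stac_generator = STACGenerator(item_parser=StructuredParser,
                               assets_generator=STACAssetGenerator,
                               labeling_strategy=LabeledStrategy,
                               image_format='tif'  # search for .tif images
                               )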

In [10]:
df = stac_generator.get_stac_dataframe('example_data/jaca_dataset_structured')
df.head()

|   | image | label | ix | collection | extensions | bands |
|---|-------|-------|----|------------|------------|-------|
| 0 | example_data/jaca_dataset_structured/Jaca_3/Ja... | Jaca | 0 | example_data/jaca_dataset_structured/source | | |
| 1 | example_data/jaca_dataset_structured/Jaca_4/Ja... | Jaca | 0 | example_data/jaca_dataset_structured/source | | |
| 2 | example_data/jaca_dataset_structured/Jaca_2/Ja... | Jaca | 0 | example_data/jaca_dataset_structured/source | | |
| 3 | example_data/jaca_dataset_structured/Jaca_1/Ja... | Jaca | 0 | example_data/jaca_dataset_structured/source | | |


As described above, the main purpose of `item_parser` is to extract the ID of each future STAC item: `UnestructuredParser` extracts it from the file name, while `StructuredParser` extracts it from the name of the containing folder. Which one to use depends on how the data is organized. For example, the `SEN12-FLOODS` dataset is structured, so we could use `StructuredParser`, while `EuroSAT-RGB` is unstructured, so we would use `UnestructuredParser`. In most cases, however, we will use `UnestructuredParser`, as it is the simplest one.
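If you are unsure which layout a dataset has, one simple check is whether every image sits directly in the dataset folder or inside a subfolder. The sketch below is a hypothetical heuristic (`guess_parser` is not part of EOTDL) that only returns the parser's name as a string, assuming images are identified by their extension as in the examples above.

```python
from pathlib import Path


def guess_parser(root: str, image_format: str = "tif") -> str:
    """Hypothetical heuristic: guess the item parser from the folder layout under `root`."""
    root_path = Path(root)
    images = sorted(root_path.rglob(f"*.{image_format}"))
    if not images:
        raise ValueError(f"no .{image_format} files under {root!r}")
    at_root = [p.parent == root_path for p in images]
    if all(at_root):
        # Every image sits directly in the dataset folder: IDs come from file names.
        return "UnestructuredParser"
    if not any(at_root):
        # Every image lives in a subfolder: IDs come from folder names.
        return "StructuredParser"
    raise ValueError("mixed layout: some images are in subfolders and some are not")
```

For instance, `guess_parser('example_data/jaca_dataset')` should return `"UnestructuredParser"` for the flat layout used earlier, and the returned name can then be mapped to the class imported at the top of the notebook. A mixed layout raises an error instead of guessing, since neither strategy would produce consistent IDs for it.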