# STAC generation with extensions

Uncomment the following line to install eotdl if needed.

In [None]:
# !pip install eotdl

The main extension used by EOTDL for Q2 datasets is the ML-Dataset extension. It enriches the STAC metadata of a dataset with information such as data splits (train, validation, test), quality metrics, etc.

Let's see how to generate a Q2 dataset for EuroSAT using the EOTDL library. Q2 datasets are generated from Q1 datasets, that is, datasets with STAC metadata. We already showed how to generate a Q1 dataset in the previous section.

The addition of the `ml-dataset` STAC extension to a STAC catalog is pretty straightforward, so it can be done with a simple function called `add_ml_extension`. Sounds easy, right? Let's see what we need.
- `catalog`: the path to the STAC catalog, or the pystac Catalog itself, to which we want to add the extension. In our case, `data/sentinel_2_stac/catalog.json`.
- `destination`: an optional output folder in which to save the catalog. By default, the function generates it in the same folder as the given catalog. In our case, `data/sentinel_2_q2`.
- `splits`: set it to `True` if we want to split the labels. The default is `False`, and the default splits are `Train`, `Test` and `Validation` in an `80, 10, 10` proportion. In our case it is `True`, and we are fine with the default proportions.
- `splits_collection_id`: the id of the collection we want to split. In our case, `labels`, which is the default option.
- `name`: the name of the dataset. In our case, `Q2 Dataset`, but feel free to customize it.
- `tasks`: the tasks of the dataset. In our case, `['segmentation']`.
- `inputs_type`: the type of the dataset inputs. In our case, `['satellite imagery']`.
- `annotations_type`: the type of the annotations. In our case, `raster`.
- `version`: the version of our dataset. In our case, `0.1.0`.
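
To make the default `80, 10, 10` proportions concrete, here is a minimal sketch of how a list of label items could be partitioned into `Train`, `Test` and `Validation`. It is only an illustration and not the code EOTDL runs internally: the `split_items` helper, the fixed seed and the `label_N` item ids are all hypothetical.

```python
import random


def split_items(item_ids, proportions=(80, 10, 10), seed=42):
    """Partition item ids into Train/Test/Validation by percentage.

    Hypothetical helper for illustration only, not part of EOTDL.
    """
    if sum(proportions) != 100:
        raise ValueError("proportions must add up to 100")

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(item_ids)
    random.Random(seed).shuffle(shuffled)

    n_train = len(shuffled) * proportions[0] // 100
    n_test = len(shuffled) * proportions[1] // 100

    # Validation takes whatever is left, so no item is dropped by rounding.
    return {
        "Train": shuffled[:n_train],
        "Test": shuffled[n_train:n_train + n_test],
        "Validation": shuffled[n_train + n_test:],
    }


# Made-up item ids standing in for the items of the `labels` collection.
item_ids = [f"label_{i}" for i in range(100)]
splits = split_items(item_ids)
print({name: len(ids) for name, ids in splits.items()})
# → {'Train': 80, 'Test': 10, 'Validation': 10}
```

Every item ends up in exactly one split, which is the property that matters when the splits are later used for training and evaluation.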

Let's add the extension!

In [None]:
from eotdl.curation.stac.extensions import add_ml_extension

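# Q1 catalog generated in the previous section
catalog = 'data/sentinel_2_stac/catalog.json'

add_ml_extension(
    catalog,
    destination='data/sentinel_2_q2',  # output folder for the Q2 catalog
    splits=True,                       # use the default 80/10/10 proportions
    splits_collection_id='labels',     # collection to split
    name='Q2 Dataset',
    tasks=['segmentation'],
    inputs_type=['satellite imagery'],
    annotations_type='raster',
    version='0.1.0'
)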
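
Once the call has finished, one quick sanity check is to look at the `stac_extensions` field, which STAC objects use to declare the extensions they implement. The sketch below reads it with the standard library only. The `list_stac_extensions` helper is hypothetical, and the output path assumes that `catalog.json` is written directly inside the `destination` folder. Depending on how the extension is recorded, the entry may appear on the collections or items rather than on the root catalog.

```python
import json
from pathlib import Path


def list_stac_extensions(stac_json_path):
    """Return the `stac_extensions` list of a STAC JSON file (empty if absent).

    Hypothetical helper for illustration only, not part of EOTDL.
    """
    with open(stac_json_path, encoding="utf-8") as f:
        stac_object = json.load(f)
    return stac_object.get("stac_extensions", [])


# Assumed location of the generated catalog; adjust it if your output differs.
q2_catalog = Path("data/sentinel_2_q2/catalog.json")

if q2_catalog.exists():
    print(list_stac_extensions(q2_catalog))
else:
    print(f"{q2_catalog} not found; run add_ml_extension first")
```

The same helper works on any collection or item JSON file in the output folder.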