# ML Models

Like training datasets, ML models in EOTDL are categorized into different [quality levels](https://eotdl.com/docs/datasets/quality), which determine the functionality available for each model.

In this tutorial you will learn about Q2 models: models with STAC metadata that use the ML-Model extension. Models with STAC metadata but without the ML-Model extension are classified as Q1.
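The rule can be summarized as a tiny decision function. This is a hypothetical helper, not part of the `eotdl` package, and it assumes that a model without any STAC metadata (like the EuroSAT-RGB model used below, which ships only a README and the weights) is Q0.

```python
# Hypothetical helper: the rules paraphrase this tutorial, not an EOTDL API.
def model_quality_level(has_stac: bool, uses_mlm_extension: bool) -> str:
    """Return the EOTDL quality level implied by a model's metadata."""
    if not has_stac:
        return "Q0"  # assumed: no STAC metadata at all
    if uses_mlm_extension:
        return "Q2"  # STAC metadata with the ML-Model extension
    return "Q1"  # STAC metadata without the ML-Model extension


print(model_quality_level(has_stac=True, uses_mlm_extension=False))  # Q1
```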

## STAC Spec

For Q2 ML models we rely on the [ML-Model](https://github.com/crim-ca/mlm-extension) STAC extension. Here we create the required metadata for the [EuroSAT-RGB](https://www.eotdl.com/models/EuroSAT-RGB) model, currently a Q0 model on EOTDL.

```python
import eotdl

eotdl.__version__
```

```
'2024.06.13'
```

```python
from eotdl.models import download_model

path = download_model('EuroSAT-RGB', path="data", version=1, force=True)

path
```

```
100%|██████████| 1/1 [00:04<00:00,  4.20s/file]

'data/EuroSAT-RGB/v1'
```

```python
import os

os.listdir(path)
```

```
['README.md', 'model.onnx']
```

Our goal is to provide STAC metadata that lets any inference processor implementing the ML-Model STAC extension run `model.onnx`. From the official repo:

> The STAC Machine Learning Model (MLM) Extension provides a standard set of fields to describe machine learning models trained on overhead imagery and enable running model inference.
>
> The main objectives of the extension are:
>
> 1. to enable building model collections that can be searched alongside associated STAC datasets
> 2. record all necessary bands, parameters, modeling artifact locations, and high-level processing steps to deploy an inference service.
>
> Specifically, this extension records the following information to make ML models searchable and reusable:
>
> 1. Sensor band specifications
> 2. Model input transforms including resize and normalization
> 3. Model output shape, data type, and its semantic interpretation
> 4. An optional, flexible description of the runtime environment to be able to run the model
> 5. Scientific references
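These objectives map onto a handful of item properties. The checker below is a hypothetical helper, not part of pystac or EOTDL: it assumes `mlm:name`, `mlm:architecture`, `mlm:tasks`, `mlm:input` and `mlm:output` are the required top-level fields and that inputs and outputs are lists of objects (our reading of the MLM README), and it only checks that each `shape` has as many entries as its `dim_order`. The official JSON schema remains the reference for real validation.

```python
# Hypothetical checker; the required-field list is an assumption based on the MLM README.
REQUIRED_MLM_FIELDS = ["mlm:name", "mlm:architecture", "mlm:tasks", "mlm:input", "mlm:output"]


def check_mlm_properties(properties: dict) -> list:
    """Return human-readable problems found in an item's MLM properties."""
    problems = [f"missing {field}" for field in REQUIRED_MLM_FIELDS if field not in properties]
    # Inputs describe their tensor under "input", outputs under "result".
    for key, struct_key in (("mlm:input", "input"), ("mlm:output", "result")):
        entries = properties.get(key, [])
        if not isinstance(entries, list):
            problems.append(f"{key} should be a list of objects")
            continue
        for entry in entries:
            struct = entry.get(struct_key, {})
            shape = struct.get("shape", [])
            dims = struct.get("dim_order", [])
            if len(shape) != len(dims):
                problems.append(
                    f"{entry.get('name')}: shape {shape} has {len(shape)} dims "
                    f"but dim_order lists {len(dims)}"
                )
    return problems


props = {
    "mlm:name": "example",
    "mlm:architecture": "resnet",
    "mlm:tasks": ["classification"],
    "mlm:input": [{"name": "RGB", "input": {"shape": [-1, 3, -1, -1],
                                            "dim_order": ["batch", "channel", "height", "width"]}}],
    "mlm:output": [{"name": "logits", "result": {"shape": [-1, 10],
                                                 "dim_order": ["batch", "height", "width"]}}],
}
print(check_mlm_properties(props))
# ['logits: shape [-1, 10] has 2 dims but dim_order lists 3']
```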

Let's start with a generic `catalog` for our model.

```python
import os

import pystac

# current directory + 'data/EuroSAT-RGB/STAC'
root_href = os.path.join(os.getcwd(), 'data/EuroSAT-RGB/STAC')

catalog = pystac.Catalog(id='EuroSAT-RGB-Q2', description='Catalog for the EuroSAT RGB Q2 ML Model')
```

Now let's create a `collection` for our model.

```python
from datetime import datetime

import pystac

# Create a new Collection
collection = pystac.Collection(
    id='model',
    description='Collection for the EuroSAT RGB Q2 ML Model',
    extent=pystac.Extent(
        spatial=pystac.SpatialExtent([[-180, -90, 180, 90]]),  # dummy extent
        temporal=pystac.TemporalExtent([[datetime(2020, 1, 1), None]]),  # dummy extent
    ),
)

# Add the Collection to the Catalog
catalog.add_child(collection)
```

Then we create an `item` that describes the model itself using the extension.

```python
from datetime import datetime, timezone

# EuroSAT class names (order assumed to match the model's output columns)
class_names = [
    'AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial',
    'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake',
]

# Create a new Item
item = pystac.Item(
    id='model',
    geometry={  # dummy geometry
        "type": "Point",
        "coordinates": [125.6, 10.1]
    },
    bbox=[125.6, 10.1, 125.6, 10.1],  # dummy bbox
    datetime=datetime.now(timezone.utc),  # dummy datetime (utcnow() is deprecated)
    properties={
        "mlm:name": "model.onnx",  # name of the asset? Otherwise, how do we know which asset to use?
        "mlm:framework": "ONNX",  # the only framework supported for now
        "mlm:architecture": "resnet",
        "mlm:tasks": ["classification"],  # https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#task-enum
        # mlm:input and mlm:output are lists of objects in the MLM schema
        "mlm:input": [{  # https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#model-input-object
            "name": "RGB satellite image",
            "bands": ["red", "green", "blue"],  # how can we know which bands these are for each satellite?
            "input": {  # https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#input-structure-object
                "shape": [-1, 3, -1, -1],
                "dim_order": ["batch", "channel", "height", "width"],
                "data_type": "float32",
                # "pre_processing_function": {  # https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#processing-expression
                #     "format": ...,
                #     "expression": ...
                # }
            }
        }],
        "mlm:output": [{
            "name": "logits",
            "tasks": ["classification"],  # may repeat mlm:tasks
            # Classification extension class objects: output column index + class name
            "classification:classes": [
                {"value": i, "name": name} for i, name in enumerate(class_names)
            ],
            "result": {  # https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#result-structure-object
                "shape": [-1, 10],
                "dim_order": ["batch", "class"],  # one score per class, matching the 2-D shape
                "data_type": "float32",
                # "post_processing_function": {  # https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#processing-expression
                # }
            },
        }],
    },
    stac_extensions=['https://crim-ca.github.io/mlm-extension/v1.2.0/schema.json']
)

# Add the Item to the Collection
collection.add_item(item)
```

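The item declares a single `logits` output with one column per class and leaves the post-processing function unspecified. A minimal sketch of how a consumer might turn those logits into labels, assuming the columns follow the EuroSAT class order listed in the item and that softmax is the intended normalization (neither is stated in the metadata). It is pure Python and the logits are made-up values.

```python
import math

# Assumed column order, copied from the EuroSAT class list in the item above.
CLASSES = ['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial',
           'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_batch(batch_logits, classes=CLASSES):
    """Map a [batch, class] logits array to (label, probability) pairs."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        results.append((classes[best], probs[best]))
    return results


fake_logits = [[0.1, 2.5, 0.3, 0.0, -1.0, 0.2, 0.4, -0.5, 0.0, 1.1]]  # made-up values
print([label for label, _ in decode_batch(fake_logits)])  # ['Forest']
```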

Next, we add the model weights to the item as an asset.

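```python
# Create an Asset for the downloaded weights; the "mlm:model" role marks it
# as the model artifact for MLM-aware processors
model_asset = pystac.Asset(
    href=os.path.abspath(os.path.join(path, 'model.onnx')),
    roles=['mlm:model'],
)

# Add the Asset to the Item
item.add_asset('model', model_asset)
```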

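An inference processor has to find which asset holds the weights. The MLM extension designates the model artifact through the `mlm:model` asset role, so a processor can select it by role rather than by name. A minimal sketch over a hypothetical, trimmed item JSON (the `readme` asset is invented for illustration):

```python
import json

# Hypothetical saved item, trimmed to the parts a processor needs.
item_json = """
{
  "type": "Feature",
  "id": "model",
  "properties": {"mlm:framework": "ONNX"},
  "assets": {
    "model": {"href": "./model.onnx", "roles": ["mlm:model"]},
    "readme": {"href": "./README.md", "roles": ["metadata"]}
  }
}
"""


def find_model_assets(item: dict) -> dict:
    """Return the assets whose roles include 'mlm:model'."""
    return {
        key: asset
        for key, asset in item.get("assets", {}).items()
        if "mlm:model" in asset.get("roles", [])
    }


item_dict = json.loads(item_json)
print(find_model_assets(item_dict))
# {'model': {'href': './model.onnx', 'roles': ['mlm:model']}}
```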
Finally, we save the metadata (the validation step is commented out in this example).

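```python
# Validate the Catalog (requires pystac's optional validation dependencies)
# catalog.validate_all()

# Save with relative links so the folder can be moved or uploaded as a whole
catalog.normalize_and_save(root_href=root_href, catalog_type=pystac.CatalogType.SELF_CONTAINED)
```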

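Before ingesting, it can help to check what was written to disk. A small stdlib sketch (`list_stac_files` is a hypothetical helper): with pystac's default layout we would expect `catalog.json` at the root, the collection at `model/collection.json` and the item at `model/model/model.json`, but the exact paths depend on the layout strategy in use.

```python
import os


def list_stac_files(root_href: str) -> list:
    """Return the JSON files under a saved catalog, relative to its root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_href):
        for name in filenames:
            if name.endswith(".json"):
                found.append(os.path.relpath(os.path.join(dirpath, name), root_href))
    return sorted(found)


# Usage in the notebook: print(list_stac_files(root_href))
```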

Now we can ingest the model into EOTDL.

```python
from eotdl.models import ingest_model

ingest_model(root_href)
```

```
Loading STAC catalog...
New version created, version: 1

100%|██████████| 1/1 [00:02<00:00,  2.16s/it]

Ingesting STAC catalog...
Done
```
