Define and Store metadata in STAC #51
Mandatory information to store with the model, for reusability:
Optional information to store:
A nice way of validating whether inputs are applicable to a given model is to implement the check as a decorator: see "input validation" in *A comprehensive guide to putting a machine learning model in production using Flask, Docker, and Kubernetes*.
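For reference, a minimal sketch of that decorator idea, assuming the expected band count and dtype come from the model's stored metadata (the `EXPECTED` spec and `predict` function below are hypothetical, not part of any existing codebase):

```python
from functools import wraps

import numpy as np

# Hypothetical input spec; in practice this would be read from the metadata
# stored alongside the model (band count, dtype, etc.).
EXPECTED = {"num_bands": 4, "dtype": "uint8"}

def validate_inputs(expected):
    """Reject inference calls whose inputs don't match the model's metadata."""
    def decorator(func):
        @wraps(func)
        def wrapper(image, *args, **kwargs):
            if image.ndim != 3 or image.shape[0] != expected["num_bands"]:
                raise ValueError(
                    f"expected {expected['num_bands']} bands, got array of shape {image.shape}"
                )
            if str(image.dtype) != expected["dtype"]:
                raise ValueError(f"expected dtype {expected['dtype']}, got {image.dtype}")
            return func(image, *args, **kwargs)
        return wrapper
    return decorator

@validate_inputs(EXPECTED)
def predict(image):
    """Run the model on a (bands, height, width) array."""
    ...

predict(np.zeros((4, 256, 256), dtype="uint8"))  # passes validation
```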
If we wanted to devise some kind of standard for model interoperability around HDF5, we would likely come up with an HDF5 product definition; see the interesting excerpts in [HDF Product Designer](https://wiki.earthdata.nasa.gov/display/HPD/HDF+Product+Designer).
How would the use of HDF5 help us form fully self-contained DL containers holding all the information needed for interoperability (see the sketch below)? Could we implement something along the lines of the "standardised environments" from OGC Testbed 14?
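One way to picture such a container, as a rough sketch with h5py (the group layout and attribute names are assumptions, not a settled product definition):

```python
import h5py
import numpy as np

# Sketch of a self-contained HDF5 "model product": weights stored as datasets,
# interoperability metadata stored as root attributes.
with h5py.File("model_product.h5", "w") as f:
    f.attrs["model_name"] = "unet_rgbn"  # hypothetical model name
    f.attrs["num_input_bands"] = 4
    f.attrs["input_dtype"] = "uint8"
    f.attrs["class_names"] = "background,building,road"
    weights = f.create_group("weights")
    # In practice each array would come from the trained model's state dict.
    weights.create_dataset("conv1.weight", data=np.zeros((64, 4, 3, 3), dtype="float32"))

# A consumer can read the metadata without knowing anything about the
# framework that produced the weights.
with h5py.File("model_product.h5", "r") as f:
    print(dict(f.attrs))
```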
How well does HDF5 play with Big Data infrastructures and OGC services like WCS? Could the H5Server be useful?
Could we integrate STAC fields?
deepdish? torch-hdf5?
The EO profile of STAC includes fields such as sun azimuth and elevation: https://github.com/radiantearth/stac-spec/blob/master/extensions/eo/schema.json. All we need is there... I suggest we investigate creating STAC Items of the label extension type (see the sketch below). Note: models per se are not STAC Items for now; I think there is an opportunity for us to think about how we could make that happen.
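For illustration, a minimal sketch of what a label-extension Item might look like; all ids, paths, coordinates, and class names below are placeholder assumptions:

```python
import json

# Rough sketch of a STAC Item using the label extension to describe
# training labels stored in a GeoPackage.
label_item = {
    "stac_version": "1.0.0",
    "stac_extensions": [
        "https://stac-extensions.github.io/label/v1.0.1/schema.json"
    ],
    "type": "Feature",
    "id": "training-labels-001",  # hypothetical id
    "bbox": [-73.6, 45.4, -73.5, 45.5],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-73.6, 45.4], [-73.5, 45.4], [-73.5, 45.5],
            [-73.6, 45.5], [-73.6, 45.4]
        ]],
    },
    "properties": {
        "datetime": "2020-01-01T00:00:00Z",
        "label:description": "Building footprints used for training",
        "label:type": "vector",
        "label:properties": ["class"],
        "label:classes": [{"name": "class", "classes": ["background", "building"]}],
        "label:tasks": ["segmentation"],
    },
    "links": [
        # Points back to the STAC Item describing the source imagery.
        {"rel": "source", "href": "path/to/source-imagery-item.json"}
    ],
    "assets": {
        "labels": {
            "href": "labels.gpkg",
            "type": "application/geopackage+sqlite3",
        }
    },
}

print(json.dumps(label_item, indent=2))
```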
@mpelchat04 is this something we still want to do?
Work is ongoing to develop a STAC extension for models. The GDL team will follow it as the extension is developed. We will close the issue for now.
Currently, training is performed on a list of GeoTIFF input images using reference data in GeoPackage files. That list of inputs is stored in CSV files. For the results, we store only the weights of our model (a .pth file).
To make our models interoperable, we need to write out the model together with its weights; those items are our final shareable outputs. Also, if we want to check whether a particular dataset is amenable to inference with a given model, we need to store all the inputs somewhere; one way to bundle them is sketched below.
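As an interim step, the weights file itself can carry that metadata; a minimal sketch with torch.save (the metadata keys and stand-in model are assumptions for illustration):

```python
import torch

# Stand-in model; in practice this is the trained network.
model = torch.nn.Conv2d(4, 2, kernel_size=3)

# Bundle the weights with the metadata needed to decide whether a dataset
# is suitable for inference with this model.
checkpoint = {
    "state_dict": model.state_dict(),
    "metadata": {
        "num_input_bands": 4,
        "pixel_size_m": 0.2,
        "classes": ["background", "building"],
    },
}
torch.save(checkpoint, "model_with_metadata.pth")

# At inference time, inspect the metadata before loading the weights.
loaded = torch.load("model_with_metadata.pth", map_location="cpu")
print(loaded["metadata"])
```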
Initially we thought of using HDF to store both the inputs to and the outputs of our models. It now appears that one of the STAC extensions is a more logical approach, since STAC is much more web-friendly than HDF.