Define and Store metadata in STAC #51
Mandatory information to store with the model, for reusability:
Optional information to store:
A nice way of validating whether inputs are applicable to a given model is to implement the check as a decorator: see "input validation" in *A comprehensive guide to putting a machine learning model in production using Flask, Docker, and Kubernetes*.
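For reference, a minimal sketch of that decorator idea, assuming the expected band count and dtype come from the model's stored metadata (the `EXPECTED` spec and `predict` function below are hypothetical, not part of any existing codebase):

```python
from functools import wraps

import numpy as np

# Hypothetical input spec; in practice this would be read from the metadata
# stored alongside the model (band count, dtype, etc.).
EXPECTED = {"num_bands": 4, "dtype": "uint8"}

def validate_inputs(expected):
    """Reject inference calls whose inputs don't match the model's metadata."""
    def decorator(func):
        @wraps(func)
        def wrapper(image, *args, **kwargs):
            if image.ndim != 3 or image.shape[0] != expected["num_bands"]:
                raise ValueError(
                    f"expected {expected['num_bands']} bands, got array of shape {image.shape}"
                )
            if str(image.dtype) != expected["dtype"]:
                raise ValueError(f"expected dtype {expected['dtype']}, got {image.dtype}")
            return func(image, *args, **kwargs)
        return wrapper
    return decorator

@validate_inputs(EXPECTED)
def predict(image):
    """Run the model on a (bands, height, width) array."""
    ...

predict(np.zeros((4, 256, 256), dtype="uint8"))  # passes validation
```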
If we wanted to devise some kind of standard for model interoperability around HDF5, we would likely come up with an HDF5 product definition; see the interesting excerpts in [HDF Product Designer](https://wiki.earthdata.nasa.gov/display/HPD/HDF+Product+Designer).
How would the use of HDF5 help us form fully self-contained DL containers holding all the information needed for interoperability (see the sketch below)? Could we implement something along the lines of the "standardised environments" from OGC Testbed 14?
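One way to picture such a container, as a rough sketch with h5py (the group layout and attribute names are assumptions, not a settled product definition):

```python
import h5py
import numpy as np

# Sketch of a self-contained HDF5 "model product": weights stored as datasets,
# interoperability metadata stored as root attributes.
with h5py.File("model_product.h5", "w") as f:
    f.attrs["model_name"] = "unet_rgbn"  # hypothetical model name
    f.attrs["num_input_bands"] = 4
    f.attrs["input_dtype"] = "uint8"
    f.attrs["class_names"] = "background,building,road"
    weights = f.create_group("weights")
    # In practice each array would come from the trained model's state dict.
    weights.create_dataset("conv1.weight", data=np.zeros((64, 4, 3, 3), dtype="float32"))

# A consumer can read the metadata without knowing anything about the
# framework that produced the weights.
with h5py.File("model_product.h5", "r") as f:
    print(dict(f.attrs))
```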
How well does HDF5 play with Big Data infrastructures and OGC services like WCS? Could the H5Server be useful?
Could we integrate STAC fields?
deepdish? torch-hdf5?
The EO profile of STAC includes fields such as sun azimuth and elevation: https://github.com/radiantearth/stac-spec/blob/master/extensions/eo/schema.json. All we need is there... I suggest we investigate creating STAC Items of the label extension type (see the sketch below). Note: models per se are not STAC Items for now; I think there is an opportunity for us to think about how we could make that happen.
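For illustration, a minimal sketch of what a label-extension Item might look like; all ids, paths, coordinates, and class names below are placeholder assumptions:

```python
import json

# Rough sketch of a STAC Item using the label extension to describe
# training labels stored in a GeoPackage.
label_item = {
    "stac_version": "1.0.0",
    "stac_extensions": [
        "https://stac-extensions.github.io/label/v1.0.1/schema.json"
    ],
    "type": "Feature",
    "id": "training-labels-001",  # hypothetical id
    "bbox": [-73.6, 45.4, -73.5, 45.5],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-73.6, 45.4], [-73.5, 45.4], [-73.5, 45.5],
            [-73.6, 45.5], [-73.6, 45.4]
        ]],
    },
    "properties": {
        "datetime": "2020-01-01T00:00:00Z",
        "label:description": "Building footprints used for training",
        "label:type": "vector",
        "label:properties": ["class"],
        "label:classes": [{"name": "class", "classes": ["background", "building"]}],
        "label:tasks": ["segmentation"],
    },
    "links": [
        # Points back to the STAC Item describing the source imagery.
        {"rel": "source", "href": "path/to/source-imagery-item.json"}
    ],
    "assets": {
        "labels": {
            "href": "labels.gpkg",
            "type": "application/geopackage+sqlite3",
        }
    },
}

print(json.dumps(label_item, indent=2))
```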
@mpelchat04 is this something we still want to do?
Work is ongoing to develop a STAC extension for models. The GDL team will follow it as the extension is developed. We will close the issue for now.
Currently, training is performed on a list of GeoTIFF input images using reference data in GeoPackage files. That list of inputs is stored in CSV files. For the results, we store only the weights of our model (a .pth file).
To make our models interoperable, we need to write out the model together with its weights; those items are our final shareable outputs. Also, if we want to check whether a particular dataset is amenable to inference with a given model, we need to store all the inputs somewhere; one way to bundle them is sketched below.
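As an interim step, the weights file itself can carry that metadata; a minimal sketch with torch.save (the metadata keys and stand-in model are assumptions for illustration):

```python
import torch

# Stand-in model; in practice this is the trained network.
model = torch.nn.Conv2d(4, 2, kernel_size=3)

# Bundle the weights with the metadata needed to decide whether a dataset
# is suitable for inference with this model.
checkpoint = {
    "state_dict": model.state_dict(),
    "metadata": {
        "num_input_bands": 4,
        "pixel_size_m": 0.2,
        "classes": ["background", "building"],
    },
}
torch.save(checkpoint, "model_with_metadata.pth")

# At inference time, inspect the metadata before loading the weights.
loaded = torch.load("model_with_metadata.pth", map_location="cpu")
print(loaded["metadata"])
```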
Initially we thought of using HDF to store both the inputs to and the outputs of our models. It now appears that one of the STAC extensions is a more logical approach, since STAC is much more web-friendly than HDF.