Implement CRUD GUI for STAC Catalogs, Collections, and Items, Including an Asset Uploader #127

Open
zacharyDez opened this issue Aug 23, 2023 · 8 comments

Comments

@zacharyDez
Contributor

Background

STAC (SpatioTemporal Asset Catalog) provides a standardized way to describe geospatial information, facilitating the discovery and usage of spatial assets. In the eoAPI project, we need to manage STAC catalogs, collections, and items effectively. A few end-users have specifically requested this functionality, emphasizing the need for user-friendly interactions with STAC components. That said, in most cases we actually want to avoid manual operations.

Issue

We require a user-friendly interface to perform CRUD operations (Create, Read, Update, Delete) on STAC catalogs, collections, and items, and to load the underlying referenced assets of a STAC item based on the minimum use case described in issue #5.

Requirements

The initial scope discussed with @batpad only addressed the Update capabilities, but it would be great to have a more complete tool to manage STAC catalogs visually.

  1. Create:

    • Implement forms to create new STAC catalogs, collections, and items.
    • Validate the user input according to the STAC specification.
    • Provide user feedback on successful creation or errors.
  2. Read:

    • Display existing STAC catalogs, collections, and items.
    • Allow filtering and sorting based on various criteria.
    • Include pagination for large datasets.
  3. Update:

    • Enable editing of existing STAC catalogs, collections, and items.
    • Validate changes according to the STAC specification.
    • Provide user feedback on successful updates or errors.
  4. Delete:

    • Implement the functionality to delete STAC catalogs, collections, and items.
    • Include confirmation dialogs to prevent accidental deletions.
    • Provide user feedback on successful deletions or errors.
  5. Asset Uploader:

    • Implement a GUI data uploader to load the underlying referenced assets of a STAC item.
    • Follow the minimum use case based on issue #5.
    • Ensure compatibility and smooth integration with the existing CRUD operations.
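As a rough illustration of the "validate according to the STAC specification" bullets, the sketch below checks a handful of top-level rules from the STAC Item spec (v1.0.0) that a create or edit form could report back to the user. It is a stand-in, not a full validator: a real form would validate against the official STAC JSON Schemas, and `check_item_fields` is a hypothetical name.

```python
from typing import Any

# Top-level fields the STAC Item spec (v1.0.0) lists as required.
REQUIRED_ITEM_FIELDS = ("type", "stac_version", "id", "geometry", "properties", "links", "assets")


def check_item_fields(item: dict[str, Any]) -> list[str]:
    """Return human-readable problems for a form submission; an empty list means none were found."""
    errors = [f"missing required field '{name}'" for name in REQUIRED_ITEM_FIELDS if name not in item]
    if "type" in item and item["type"] != "Feature":
        errors.append("'type' must be 'Feature'")
    # bbox is required whenever geometry is not null.
    if item.get("geometry") is not None and "bbox" not in item:
        errors.append("'bbox' is required when 'geometry' is not null")
    # datetime may only be null if both start_datetime and end_datetime are present.
    props = item.get("properties") or {}
    if props.get("datetime") is None and not ("start_datetime" in props and "end_datetime" in props):
        errors.append("'properties.datetime' is required unless start_datetime and end_datetime are set")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a form show every issue at once.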
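For the "Read" pagination bullet, STAC API responses point to the next page through a link whose `rel` is `"next"`. The sketch below follows those links; `iter_items` is a hypothetical helper, and `fetch` is injected so that any HTTP client (or a test double) can be plugged in. It assumes plain GET-style `next` links and ignores the POST-with-body variant that a `/search` endpoint may return.

```python
from typing import Any, Callable, Iterator, Optional


def iter_items(first_url: str, fetch: Callable[[str], dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield Items page by page, following each page's rel="next" link until none is left."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)  # one FeatureCollection page
        yield from page.get("features", [])
        # Take the first link whose rel is "next"; None ends the loop.
        url = next((link["href"] for link in page.get("links", []) if link.get("rel") == "next"), None)


# Usage with two fake pages standing in for HTTP responses.
pages = {
    "page-1": {"features": [{"id": "a"}, {"id": "b"}], "links": [{"rel": "next", "href": "page-2"}]},
    "page-2": {"features": [{"id": "c"}], "links": []},
}
ids = [item["id"] for item in iter_items("page-1", pages.__getitem__)]  # ["a", "b", "c"]
```

Because it is a generator, a GUI table can pull only as many pages as it needs to display.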

Additional Considerations

@vincentsarago
Member

vincentsarago commented Aug 23, 2023

🙏

Just FYI, there is a draft OGC API - Features extension covering most of this (https://docs.ogc.org/DRAFTS/20-002.html), and there is a STAC API Transaction extension (https://github.com/radiantearth/stac-api-spec/tree/master/ogcapi-features#transaction-extension).
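To make the transaction extension concrete, here is a minimal sketch that only builds the Item requests with Python's standard library and never sends them. The paths follow the STAC API Transaction extension (`POST /collections/{collectionId}/items`, and `PUT`/`DELETE` on `/collections/{collectionId}/items/{itemId}`), while `BASE_URL` and `build_item_request` are hypothetical; authentication, which a real eoAPI deployment would likely require, is left out.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://stac.example.com"  # hypothetical eoAPI STAC endpoint


def build_item_request(method: str, collection_id: str, item_id: Optional[str] = None,
                       body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a Transaction-extension request for a single Item."""
    url = f"{BASE_URL}/collections/{collection_id}/items"
    if item_id is not None:
        url += f"/{item_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Abbreviated Item, for illustration only.
item = {"type": "Feature", "stac_version": "1.0.0", "id": "scene-001", "collection": "demo"}
create = build_item_request("POST", "demo", body=item)            # Create
replace = build_item_request("PUT", "demo", "scene-001", item)    # Update (full replacement)
delete = build_item_request("DELETE", "demo", "scene-001")        # Delete
# urllib.request.urlopen(create) would send it; that needs a live server (and auth).
```

A GUI built on top of these endpoints would not need any eoAPI-specific write path, which is the appeal of targeting the extension first.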

>   5. Asset Uploader:
>     Implement a GUI data uploader to load the underlying referenced assets of a STAC item.
>     Follow the minimum use case based on COG storage/translation and/or STAC API as a service #5.
>     Ensure compatibility and smooth integration with the existing CRUD operations.

IMHO, this is (and will be) really complex to implement. Doing 1-4 would already be great. Adding 5 means you have to take care of the storage type and go down some 🕳️

@sharkinsspatial
Member

@zacharyDez A few high-level thoughts here, and then some lower-level ones 😄

The creation of STAC metadata for most datasets/formats is a relatively complex process that normally requires specialized knowledge about the dataset and its intended use cases. As you mentioned, to avoid manual operations we've been trying to capture this specialized, domain-specific knowledge in easier-to-use modules via stactools-packages. Let's look in detail at each of the proposed CRUD GUI requirements:

  1. Create - if we're generating a STAC item for a granule, most of the information required by the core STAC Item Specification (geometry, bbox, and assets) can only be generated with programmatic access to the underlying granule assets. The same is true for the majority of the most useful extensions. In most cases, the bulk of the STAC item's information needs to be generated by code that reads the granule assets, extracts information, and populates the STAC item. Only a small amount of the item's information is not derived from the granule assets and requires some human intervention (id, description, etc.). On the validation point, the validation portion of the STAC Ingestor API included with eoapi-cdk already checks the semantic correctness of STAC items and catches some common data consistency problems (most notably an asset referenced by the item being inaccessible).

  2. Read - this is the current goal of the stac-browser project (with the exception of limited support for filtering criteria). If there are features we want for general-purpose static catalog and API browsing that stac-browser does not currently provide, we may want to focus our effort on contributing those features to that existing project.

  3. Update - due to the underlying storage architecture used by pgSTAC, updates in eoAPI are always pseudo-upserts (how's that for an obscure term 😸): we actually overwrite an existing item with whatever new JSON we insert under the same 'id'. See the pgstac documentation for more details. As mentioned in requirement 1, the bulk of a STAC item's information needs to be generated by code rather than manually, so I'm curious to hear more details about the end-user requirements for manual property updates. I'm betting this is a very valid use case, and we might have to consider other workflow options for solving it.

  4. Delete - this is a very interesting case and tightly related to requirement 5. In the majority of systems we work with, the STAC items are the source of truth about what data is in the system. We often want to maintain absolute synchronization between STAC items and their underlying assets, so that when an asset is removed from storage, that is reflected in the STAC item, and vice versa (we've previously seen the phenomenon of orphaned items and assets in several systems we manage). As @vincentsarago mentioned, managing this synchronization is a difficult technical problem. The big question is whether we want this advanced asset storage logic included in the eoAPI project or think it should live somewhere else.

  5. Asset Uploader - see above for more details.
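For requirement 1 (Create), the sketch below shows the split described above: the footprint fields are derived from the granule, while the id and description come from a person. It assumes the granule's bounds have already been read in WGS84 longitude/latitude by some other code (a raster reader, say); `build_item` and `bounds_to_geometry` are hypothetical names, and no extension fields are populated.

```python
from typing import Any


def bounds_to_geometry(west: float, south: float, east: float, north: float) -> dict[str, Any]:
    """GeoJSON Polygon for an axis-aligned box: closed, counter-clockwise exterior ring."""
    return {
        "type": "Polygon",
        "coordinates": [[[west, south], [east, south], [east, north], [west, north], [west, south]]],
    }


def build_item(item_id: str, bounds: tuple[float, float, float, float], datetime_utc: str,
               asset_href: str, description: str = "") -> dict[str, Any]:
    """Assemble a minimal Item; bounds are assumed to come from code that read the granule."""
    west, south, east, north = bounds
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,  # human-provided
        "geometry": bounds_to_geometry(west, south, east, north),  # derived from the granule
        "bbox": [west, south, east, north],  # derived from the granule
        "properties": {"datetime": datetime_utc, "description": description},  # partly human-provided
        "links": [],
        "assets": {"data": {"href": asset_href, "roles": ["data"]}},
    }
```

Only the human-provided arguments would need form fields; everything else could be filled in by the reading code.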
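For requirement 3 (Update), the following in-memory stand-in mimics the pseudo-upsert semantics described above: writing an Item with an existing id replaces the whole document, so any field left out of the new JSON is lost. It is not how pgSTAC is implemented; the `(collection, id)` key and the helper names are assumptions for the sketch. A GUI editor would therefore need a read-modify-write step like `edit_properties`.

```python
from typing import Any

# In-memory stand-in for the item table, keyed by (collection, id) for this sketch.
Store = dict[tuple[str, str], dict[str, Any]]


def upsert(store: Store, item: dict[str, Any]) -> None:
    """Insert or wholesale-replace an Item: no field-level merge happens."""
    store[(item["collection"], item["id"])] = item


def edit_properties(store: Store, collection: str, item_id: str, changes: dict[str, Any]) -> None:
    """Read-modify-write: merge edited properties into the stored Item, then write the full Item back."""
    current = store[(collection, item_id)]
    updated = {**current, "properties": {**current["properties"], **changes}}
    upsert(store, updated)


store: Store = {}
upsert(store, {"collection": "demo", "id": "a", "properties": {"title": "v1", "gsd": 10}})
upsert(store, {"collection": "demo", "id": "a", "properties": {"title": "v2"}})  # "gsd" is dropped
edit_properties(store, "demo", "a", {"gsd": 12})  # merges, so "title" survives
```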
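For requirement 4 (Delete), detecting drift is the easy part of the synchronization problem, and the sketch below covers only that: it compares Item asset hrefs with a listing of stored objects. How the listing is obtained (e.g. from object storage) is out of scope, asset hrefs are assumed to be directly comparable to storage URIs, and `find_orphans` is a hypothetical helper.

```python
from typing import Any, Iterable


def find_orphans(items: Iterable[dict[str, Any]], stored_hrefs: set[str]) -> tuple[set[str], set[str]]:
    """Compare Item asset hrefs against a storage listing.

    Returns (dangling_items, orphaned_objects):
      dangling_items   - ids of Items with at least one asset missing from storage
      orphaned_objects - stored hrefs that no Item references
    """
    referenced: set[str] = set()
    dangling: set[str] = set()
    for item in items:
        hrefs = {asset["href"] for asset in item.get("assets", {}).values()}
        referenced |= hrefs
        if not hrefs <= stored_hrefs:  # some asset is not in storage
            dangling.add(item["id"])
    return dangling, stored_hrefs - referenced
```

Deciding what to do with each side (delete, re-link, or flag for review) is where the harder design questions raised above come in.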

Additional considerations -

  • We have implemented an authentication/authorization layer for the eoapi-cdk STAC ingestor.
  • We have several workflow examples already making use of the stac ingestor API endpoints in fully automated pipelines.

I like a lot of the concepts here, but I wonder if we're at too early a stage to build a GUI around STAC item manipulation. Given our limited dev resources, perhaps we can focus first on implementing the STAC transaction extension as outlined in the eoAPI transition roadmap, so that we can have parity between our cdk and k8s implementations when it comes to ingestion APIs. Then we could consider building a common GUI on top of the API provided by the transaction extension if there is sufficient demand.

One area of great interest, where manual intervention is necessary and a GUI would be immediately valuable, is the STAC Catalog specification, which would allow a user to group any arbitrary set of items together into a cohesive unit. We have explored this concept on several projects where an item (which may only have one parent collection) can be a member of multiple catalogs. A GUI where a user could select and group items without predefined criteria into a catalog would be awesome. Unfortunately, this would only be supported in static STAC implementations at the moment, as the catalog concept is not modeled in pgSTAC as I understand it.
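A minimal sketch of the grouping idea, assuming a static catalog: each curated Catalog simply lists `item` links, so one Item can appear in several catalogs without changing its parent collection. The URLs and `make_catalog` are hypothetical, and only the core Catalog fields are filled in (no `root` or `self` links).

```python
from typing import Any


def make_catalog(catalog_id: str, description: str, item_hrefs: list[str]) -> dict[str, Any]:
    """Static STAC Catalog whose 'item' links point at an arbitrary, hand-picked set of Items."""
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": description,
        "links": [{"rel": "item", "href": href, "type": "application/geo+json"} for href in item_hrefs],
    }


# The same Item (with one parent collection) linked from two curated catalogs.
shared = "https://stac.example.com/collections/demo/items/scene-001"
flood = make_catalog("flood-2023", "Scenes picked for a flood study", [shared])
training = make_catalog("training-set", "Hand-picked training scenes",
                        [shared, "https://stac.example.com/collections/demo/items/scene-002"])
```

A selection GUI would mostly manage these link lists; the Items themselves stay untouched.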

@zacharyDez
Contributor Author

@sharkinsspatial, thanks for the detailed message about the CRUD GUI for eoAPI; it was super helpful for my understanding. You are correct that we are still at an early stage of implementation. Your comment raised many questions for me that we can keep tabs on while focusing on other higher-priority issues.

I think that for specific EO data, you are correct that we need to inspect the dataset programmatically to populate the STAC items. Organizations cataloging smaller datasets, potentially vector ones, are the end-users most interested in an interface to manage their catalogs. I still feel we could make that programmatic access happen via the GUI for creation: accessing the underlying granules to generate the bbox and geometry should not be an issue. A lot of organizations have custom metadata schemas that they need to convert to STAC at ingestion time, and I don't think it is unreasonable that we could map these custom schemas to the STAC specification. I see the main challenge in managing the translation of metadata between the collection and item levels.
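As a sketch of mapping a custom metadata schema to STAC at ingestion time, assuming a flat record of organization-specific fields: a lookup table renames and converts the known fields and keeps the rest under a namespace. All source field names, the `custom:` prefix, and `to_stac_properties` are hypothetical, and the sketch only handles Item-level properties, not the collection-vs-item split mentioned above. Emitting `eo:cloud_cover` would also require declaring the EO extension in the Item's `stac_extensions`.

```python
from typing import Any, Callable

# Hypothetical organization-specific field names mapped to (STAC property, converter).
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "acq_date": ("datetime", lambda v: f"{v}T00:00:00Z"),  # date-only to RFC 3339 (assumes UTC midnight)
    "sensor_name": ("instruments", lambda v: [v.lower()]),  # STAC expects a list of strings
    "cloud_pct": ("eo:cloud_cover", float),                 # EO extension field
    "abstract": ("description", str),
}


def to_stac_properties(record: dict[str, Any], prefix: str = "custom:") -> dict[str, Any]:
    """Translate one flat metadata record into STAC Item properties."""
    props: dict[str, Any] = {}
    for key, value in record.items():
        if key in FIELD_MAP:
            stac_key, convert = FIELD_MAP[key]
            props[stac_key] = convert(value)
        else:
            props[prefix + key] = value  # keep unmapped fields under a hypothetical namespace
    return props
```

Keeping the mapping as data rather than code is one way a GUI could let users edit it per organization.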

The main update task I can think of is updating the description, which is likely to happen at the collection level rather than the item level. Perhaps some manual QA and updates could be justified in some cases? It's hard to see which changes would not also require modifying the underlying data pipeline, so that the same problem is not reproduced on the next run.

About the upsert logic: do you see it being a blocker to implementing any versioning, given that we overwrite items with identical IDs? I recognize we can create snapshots, but a changeset seems like a more lightweight approach.
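One lightweight reading of the changeset idea, sketched under the assumption that only top-level `properties` are versioned: record a shallow diff on each upsert and replay it to rebuild the newer version. The helper names are hypothetical, and where such changesets would be stored is left open.

```python
from typing import Any


def diff_properties(old: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Shallow changeset between two versions of an Item's properties."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": sorted(old.keys() - new.keys()),
        "changed": {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]},
    }


def apply_changeset(old: dict[str, Any], change: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the newer version from the older one plus a changeset."""
    result = {k: v for k, v in old.items() if k not in change["removed"]}
    result.update(change["added"])
    result.update({k: after for k, (_, after) in change["changed"].items()})
    return result
```

Since `changed` keeps both the old and new values, the same record could also be replayed backwards to undo an edit.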

I'm interested to hear more about grouping items into cohesive catalogs; let's set up some time to chat about previous implementations we've worked on.

@j08lue
Member

j08lue commented Oct 11, 2023

An example of metadata catalog software with an editing UI is GeoNetwork. I don't have experience with it and it is probably a different ball game (INSPIRE etc. 😱), but I just stumbled over it and maybe these references are somehow helpful...

@j08lue
Member

j08lue commented Dec 8, 2023

We have this on our roadmap for VEDA: https://github.com/NASA-IMPACT/veda-architecture/issues/336

Looks like @oliverroick actually implemented it in https://github.com/developmentseed/stac-admin?

@oliverroick
Member

> Looks like @oliverroick actually implemented it in https://github.com/developmentseed/stac-admin?

Some of it. The functionality of stac-admin is still quite limited. So far you can:

  • Read collections and items (this work is ongoing)
  • Update collections and items (but that is limited to common metadata, no support for fields from extensions)

All other requirements outlined above would be great additions to what we have so far; happy to work with you on that.

@emmanuelmathot
Copy link

FYI, a collection transaction extension is a work in progress, following the same CRUD approach as for items.

@m-mohr

m-mohr commented Apr 25, 2024

FYI: there are also plans to add the missing Create, Update, and Delete operations to STAC Browser via the well-known Item and Collection Transaction extensions.

7 participants