
Contextual catalogs #3

Merged (7 commits) on Apr 3, 2024

Conversation

@rogerkuou (Contributor) commented on Mar 25, 2024

Example workflow of creating a STAC catalog for contextual data.

Example data is available in the Public view on Spider: https://public.spider.surfsara.nl/project/caroline/demo_mobyle/stac_catalog_contextual/

Location: /project/caroline/Public/demo_mobyle/stac_catalog_contextual/

Two example datasets are used:

  • BAG cadastral dataset: this is a big gpkg file with many polygons and related attributes
  • KNMI weather data: this includes 1) a csv file of station info and 2) a .txt file with temporal info of one station

The catalog is created in three steps, separated into three notebooks:

  1. Data conversion: convert the data into a format that supports chunked access (Parquet or Zarr); a minimal conversion sketch follows below
  2. Create a catalog of the converted datasets
  3. Query which datasets intersect an example STM
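
For reference, a minimal sketch of what step 1 could look like for the BAG GeoPackage. The paths and partition count are illustrative assumptions, not the actual notebook code:

```python
import dask_geopandas
import geopandas as gpd

# Illustrative paths; the actual files live under the Spider project directory.
bag_gpkg = "data/bag.gpkg"
out_dir = "data/bag_parquet"

# Read the GeoPackage eagerly (dask-geopandas has no lazy gpkg reader),
# then repartition so that each partition is written as one parquet file.
gdf = gpd.read_file(bag_gpkg)
ddf = dask_geopandas.from_geopandas(gdf, npartitions=16)

# Write the partitions as a directory of (geo)parquet files, chunk-friendly for later loading.
ddf.to_parquet(out_dir)
```

Loading the converted dataset back would then be a single `dask_geopandas.read_parquet(out_dir)` call.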

Remaining issues:

  • Dask-geopandas cannot digest the "geometry" column for now; this seems to be a work in progress.
  • There is currently very limited support for spatial operations in dask-geopandas.
  • How can we create a catalog searchable by pystac-client?

@rogerkuou (Contributor, Author):

Hi @fnattino, this is an example workflow for creating a STAC catalog for MobyLe contextual data. I uploaded the example to the Public view on Spider. Do you mind reviewing this when you have time?

@rogerkuou requested a review from fnattino on March 25, 2024, 13:45
@fnattino left a comment:

Hi @rogerkuou, nice work! I have left some comments on the notebooks below with links to things that could be interesting to check out on the topic.

Nice! One comment on the conversion notebook (we have already discussed it this morning): for handling Parquet/GeoParquet files, you could check out some of the blog posts from Chris Holmes (see here), who has converted a large Google building dataset to GeoParquet (see the repository, especially the "processing" section of the README). He is advocating a lot for DuckDB; maybe it could be worth trying it out.
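
For reference, a rough sketch of what such a DuckDB-based conversion could look like (assuming the spatial extension is available; the input and output paths are illustrative):

```python
import duckdb

con = duckdb.connect()

# The spatial extension provides ST_Read, which can read GeoPackage files.
con.execute("INSTALL spatial; LOAD spatial;")

# Read the (illustrative) BAG GeoPackage and write it out as a Parquet file.
con.execute("""
    COPY (SELECT * FROM ST_Read('data/bag.gpkg'))
    TO 'data/bag.parquet' (FORMAT PARQUET);
""")
```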

Just a note: this README reads nicely in "raw" mode, but not in the formatted version (the lines are wrapped).


  • I am thinking whether it makes more sense to link the directory containing the full BAG dataset within a single item (as it is done here) or to link all the partitions individually (i.e. one in each item). The latter approach could make sense if one manages to split the partitions on the basis of some spatial index, so that contiguous polygons are grouped together in a file. I guess this also depends on the tool(s) that will be used to load/generate the (geo)parquet files: if one manages to load the full collection using something like dask-geopandas, then maybe linking the directory in a single item is a good approach.
  • To specify the asset projection, there is a dedicated STAC extension: https://github.com/stac-extensions/projection . Also, I think it is a STAC norm to use WGS84 for the geometry/bbox of the items, whatever the projection of the linked assets; this also allows one to search a catalog that has items in different projections. Note that the projection metadata is missing for the KNMI dataset.
  • If all the data is going to be placed in the same folder ("data"), close to the catalog, you might consider using relative paths for the catalog, so that you can simply move the full directory of catalog + data without having to renormalise the files. (A sketch of these two points follows after this list.)
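
As a rough illustration of the last two points, a pystac-based sketch (the catalog path and EPSG code are assumptions for the example, not taken from the notebooks):

```python
import pystac
from pystac.extensions.projection import ProjectionExtension

# Illustrative path to the catalog root.
catalog = pystac.Catalog.from_file("stac_catalog_contextual/catalog.json")

for item in catalog.get_all_items():
    # Record the native CRS of the assets via the projection extension,
    # while keeping the item geometry/bbox themselves in WGS84.
    proj = ProjectionExtension.ext(item, add_if_missing=True)
    proj.epsg = 28992  # e.g. Dutch RD New for the BAG assets (illustrative)

# Rewrite all hrefs relative to a common root and save a self-contained catalog,
# so that catalog + data can be moved as one directory without renormalising.
catalog.normalize_hrefs("stac_catalog_contextual")
catalog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)
```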

@rogerkuou (Contributor, Author) replied:

Thanks Francesco for the nice ideas! They are definitely worth thinking about; it's just that for some practical reasons I implemented it this way.
For now there is no spatial index implemented for BAG, so from what I see it still makes sense to treat the BAG parquet files as a whole.
Regarding the path, the TU Delft side wants to keep the option to decouple the catalog and the data. Hence I implemented the absolute paths.


  • See the comment above on projection: if the item's bbox is not in WGS84, you don't have a straightforward way to check whether a given item intersects your region of interest.
  • Unfortunately, I think you cannot use the search functionality of pystac_client on a static catalog (it only works for APIs).
    So one needs to set up a server; example implementations are stac-server and stac-fastapi. Also, this discussion on the topic of catalogs that need frequent updates might be interesting. Just for visualization of a catalog via the web browser, one can use the STAC browser, which can be pointed to a static catalog. However, this does not seem to work with the public view of Spider (it works for instance with a static catalog on GitHub - try pasting this link into the STAC browser). For filtering a static catalog without an API, see the sketch after this list.
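
As a stopgap for a static catalog (no API), one can still filter items locally with pystac and shapely. A minimal sketch, assuming the items carry WGS84 geometries/bboxes and using an illustrative catalog path and region of interest:

```python
import pystac
from shapely.geometry import box, shape

# Illustrative path to the static catalog; adjust to the actual location on Spider.
catalog = pystac.Catalog.from_file("stac_catalog_contextual/catalog.json")

# Example region of interest in WGS84 (lon/lat), matching the assumed item geometries.
roi = box(4.3, 52.0, 4.5, 52.1)

# Walk all items in the static catalog and keep the ones intersecting the ROI.
matching = [
    item
    for item in catalog.get_all_items()
    if item.geometry is not None and shape(item.geometry).intersects(roi)
]

for item in matching:
    print(item.id, item.bbox)
```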

1. In the file-conversion notebook: changed the RD units from km to meters
2. In the file-conversion notebook: updated the time coords with hours info
3. In the query notebook: select the STM by filtering None values instead of spatial subsetting

@rogerkuou (Contributor, Author):

Thanks @fnattino. I reflected on some of your comments; the others I documented in the discussion TUDelftGeodesy/stmtools#70 for further actions.
