How to run a UDF on vector cube to access geometry? #449

Open
soxofaan opened this issue Aug 2, 2023 · 7 comments

@soxofaan
Member

soxofaan commented Aug 2, 2023

We're trying to support/implement a use case that depends heavily on vector cube manipulation, in particular running some custom (Python) UDF on the geometry data.

For example some (GeoJSON) feature collection manipulation use cases that are straightforward to implement the classic way, outside the openEO framework:

  • filter out features, based on geometry (area, location) and/or properties
  • add properties (in the GeoJSON sense) to the features, based on geometry and/or existing properties, using a third-party library, tool or service (e.g. lookup of population for a geometry or country code property)
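
As a point of reference, the two manipulations above could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the `process` function, the `country` property, and the `POPULATION_BY_COUNTRY` table (placeholder values standing in for a third-party service) are hypothetical, and the area is a simple planar shoelace area in coordinate units rather than a geodesic one.

```python
def ring_area(ring):
    """Planar shoelace area of a closed linear ring (list of [x, y] pairs)."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2.0


def polygon_area(geometry):
    """Area of a GeoJSON Polygon: exterior ring minus holes (coordinate units squared)."""
    exterior, *holes = geometry["coordinates"]
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


# Hypothetical lookup table with placeholder values (stands in for an external service).
POPULATION_BY_COUNTRY = {"BE": 100, "NL": 200}


def process(feature_collection, min_area):
    """Drop polygons smaller than `min_area` and attach a `population` property."""
    kept = []
    for feature in feature_collection["features"]:
        if polygon_area(feature["geometry"]) < min_area:
            continue  # filter based on geometry
        properties = dict(feature["properties"])
        # Enrich based on an existing property (None if the code is unknown).
        properties["population"] = POPULATION_BY_COUNTRY.get(properties.get("country"))
        kept.append({**feature, "properties": properties})
    return {"type": "FeatureCollection", "features": kept}


fc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"country": "BE"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}},
        {"type": "Feature", "properties": {"country": "NL"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[5, 5], [5.5, 5], [5.5, 5.5], [5, 5.5], [5, 5]]]}},
    ],
}

result = process(fc, min_area=1.0)
```

Both steps need the geometry itself, not just the property values, which is what makes the openEO mapping non-obvious.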

How should these be implemented with openEO processes?
Note that with vector cubes, a lot of the relevant data to work with is in the labels (geometry data), which is quite different from how we typically work with raster data cubes.

@m-mohr m-mohr added this to the 2.0.0 milestone Aug 2, 2023
@m-mohr
Member

m-mohr commented Aug 2, 2023

It would be good to discuss this based on specific use cases.

For label filtering there's filter_labels, and maybe apply_dimension works, but other processes might be needed.
A fallback could be run_udf at the top level, but it doesn't imply a good chunking strategy.

@jdries
Contributor

jdries commented Aug 3, 2023

To modify the 'bands' (I'm assuming properties are stored in the 'bands' dimension) we typically use apply_dimension(cube,dimension='bands',process=my_callback)

my_callback then gets a labeled array, where the labels are band/property names.
What we lack, however, are the 'coordinates' of the labeled array in the other dimensions. Perhaps we can specify how these can be passed into the context object?

@m-mohr
Member

m-mohr commented Sep 30, 2023

I don't quite get the question @jdries:

  • For x,y the labels are the x and y parts of the coordinate, respectively
  • For t the labels are (usually) the RFC3339 timestamps
  • For the bands the labels are the band names
  • For the geometries we agreed on WKT of the geometry
  • For "other" there always needs to be a number/string label (see also e.g. https://openeo.org/documentation/1.0/processes.html#rename_labels)

See https://openeo.org/documentation/1.0/datacubes.html#dimensions for details/examples.

So you don't need the context for this, you always have a numerical or string based label for these dimensions.

@jdries
Contributor

jdries commented Oct 3, 2023

Yes, but the question is how to get to those labels from within a callback passed to apply_dimension:

apply_dimension(cube,dimension='bands',process=my_callback)

This passes on a labeled array to my process, where the labels of the array will be band names, but what if I in addition need to know the single geometry label? (So the WKT string, for that labeled array.)

@m-mohr
Member

m-mohr commented Oct 3, 2023

You could get the WKT strings of all the array elements using array_labels: https://processes.openeo.org/#array_labels

@jdries
Contributor

jdries commented Oct 3, 2023

I think that would give me an array of band names in the above example, whereas I'm rather looking for the single label of the 'geometry' dimension.
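
To make the point concrete, here is a toy model in plain Python. It is not the openEO API or any back-end implementation: the cube is just a dict keyed by WKT label, and `apply_dimension_bands` and this `array_labels` are hypothetical stand-ins that mimic how a callback along 'bands' only receives the band-labeled array.

```python
# Toy vector cube with dimensions (geometry, bands); values are placeholders.
cube = {
    "POINT (1 2)": {"B02": 0.1, "B03": 0.2},
    "POINT (3 4)": {"B02": 0.3, "B03": 0.4},
}


def array_labels(labeled_array):
    """Toy stand-in for `array_labels`: labels of the array that was passed in."""
    return list(labeled_array)


def apply_dimension_bands(cube, process):
    """Call `process` once per geometry, passing only the band-labeled array."""
    return {wkt: process(bands) for wkt, bands in cube.items()}


seen_labels = []


def my_callback(data):
    # Only band names are visible here; the WKT label is not part of `data`.
    seen_labels.append(array_labels(data))
    return {name: value * 2 for name, value in data.items()}


result = apply_dimension_bands(cube, my_callback)
print(seen_labels)
```

In this model every call sees the same band names, and nothing in `data` identifies which geometry the values belong to.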

@m-mohr
Member

m-mohr commented Oct 3, 2023

What does your data cube look like? It feels like this issue is lacking context.
Do you want the labels of the remaining dimensions? So if you run the process above on a cube with geometry and bands dimensions you want the label for the geometry, and if you run it over x,y,t you want the labels of x,y,t? Could you explain the use case a bit more? I assume the answer is "you can't do it" for apply_dimension right now. We could add a parameter to the callback that provides the information as an array.
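
A rough sketch of what such an extra callback parameter could look like, again as a toy model in plain Python rather than an actual openEO process definition. The parameter name `dimension_labels`, its dict shape (the comment above suggests an array), and the `LOOKUP` table with placeholder values are all assumptions made for illustration.

```python
# Toy vector cube with dimensions (geometry, bands); values are placeholders.
cube = {
    "POINT (1 2)": {"B02": 0.1},
    "POINT (3 4)": {"B02": 0.3},
}

# Hypothetical per-geometry lookup, standing in for a third-party service.
LOOKUP = {"POINT (1 2)": 100, "POINT (3 4)": 200}


def apply_dimension_bands(cube, process):
    """Call `process` per geometry, also passing the labels of the remaining dimensions."""
    return {
        wkt: process(data=bands, dimension_labels={"geometry": wkt})
        for wkt, bands in cube.items()
    }


def add_population(data, dimension_labels):
    """Add a 'population' band/property derived from the geometry label."""
    wkt = dimension_labels["geometry"]
    return {**data, "population": LOOKUP[wkt]}


result = apply_dimension_bands(cube, add_population)
```

With the remaining-dimension labels available, the callback can derive new properties from the geometry without relying on the context object.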

@m-mohr m-mohr added the help wanted Extra attention is needed label Jan 3, 2024
3 participants