Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector cube design: property preservation when using aggregate spatial #466

Open
soxofaan opened this issue Aug 2, 2023 · 7 comments
Open

Comments

@soxofaan
Copy link
Member

soxofaan commented Aug 2, 2023

(Related to use case experiments discussed in #448 #449)

Set up:

  • vector cube vc1, loaded from a GeoJSON feature collection, where each feature is some polygon with some properties, e.g. crop type, population, an ML target value or class, ...
  • raster cube cube with e.g. NDVI data
  • vc2 = aggregate_spatial(data=cube, geometries=vc1, reducer="mean")

Question: are the original GeoJSON-style properties of vc1 still available in vc2?

  • Yes -> E.g. so that vc2 can directly be used to train a ML model?
  • No -> aggregate_spatial only considers vc1's geometry and ignores any existing cube data? The user then has to take some tedious steps to "join"/merge vc1 and vc2 again in order to use it for ML applications.

I kind of remember vector cube discussions where we wanted preservation of properties (the "Yes" approach), e.g. using aggregate_spatial to "enrich" a vector cube with additional "columns" of aggregation data.
However, I think the current design of vector cubes enforces the "No" approach because there are just cube values and you can not generically/automatically combine pre-existing cube data with new (aggregation) cube data.

soxofaan referenced this issue in Open-EO/openeo-python-driver Aug 2, 2023
- flattening: move options to export phase iso vector cube constructor
- introduce `VectorCube.from_geodataframe` wiith support for promoting selected columns to cube values
- regardless of promotion: all properties are still associated with `VectorCube.geometries` for now (otherwise properties can not be preserved when using `aggregate_spatial`, see Open-EO/openeo-api#504)
- only promote numerical values by default for now
soxofaan referenced this issue in Open-EO/openeo-python-driver Aug 3, 2023
- flattening: move options to export phase iso vector cube constructor
- introduce `VectorCube.from_geodataframe` wiith support for promoting selected columns to cube values
- regardless of promotion: all properties are still associated with `VectorCube.geometries` for now (otherwise properties can not be preserved when using `aggregate_spatial`, see Open-EO/openeo-api#504)
- only promote numerical values by default for now
soxofaan referenced this issue in Open-EO/openeo-python-driver Aug 3, 2023
- flattening: move options to export phase iso vector cube constructor
- introduce `VectorCube.from_geodataframe` wiith support for promoting selected columns to cube values
- regardless of promotion: all properties are still associated with `VectorCube.geometries` for now (otherwise properties can not be preserved when using `aggregate_spatial`, see Open-EO/openeo-api#504)
- only promote numerical values by default for now
@jdries
Copy link
Contributor

jdries commented Aug 3, 2023

Don't have a good answer to this one. For now, we kind of implement the 'no' approach, but we do try to preserve things like the feature identifier, as this is relevant to keep track of which timeseries belongs to which geometry.

@m-mohr
Copy link
Member

m-mohr commented Sep 30, 2023

The feature identifier is not part of the (GeoJSON) properties and belongs to the "core metadata" (as it resides at the top-level). That was at least always my "mental" model, based on GeoJSON. I'd think it's probably a good idea to keep track of it anyway. Maybe we need to clarify this?

My aim in 2.0.0 was to clearly communicate whether properties are preserved or not. Maybe it's not clear enough in all processes, but at least aggregate_spatial mentioned in the geometries parameter:

Feature properties are preserved for vector data cubes and all GeoJSON Features.

load_geojson, vector_to_random_points, vector_to_regular_points and vector_buffer similarly say:

Feature properties are preserved.

@m-mohr m-mohr transferred this issue from Open-EO/openeo-api Sep 30, 2023
@soxofaan
Copy link
Member Author

The feature identifier is not part of the (GeoJSON) properties and belongs to the "core metadata" (as it resides at the top-level).

I assume you are talking here about a "id" member of a Feature object, e.g.

{
  "type": "Feature", 
  "id": "abc123", 
  "geometry": {...}, 
  "properties": {...}

While this seems to be part of the GeoJSON RFC (If a Feature has a commonly used identifier, that identifier SHOULD be included as a member of the Feature object with the name "id"), I think I've seen quite some cases in the wild where the "id" is under the "properties" instead of more at the "Feature" top level. Maybe this can be fixed by adding an option to GeoJSON loading processes to promote a given property to more standard "id".

@soxofaan
Copy link
Member Author

My aim in 2.0.0 was to clearly communicate whether properties are preserved or not. Maybe it's not clear enough in all processes, but at least aggregate_spatial mentioned in the geometries parameter:

Feature properties are preserved for vector data cubes and all GeoJSON Features.

Well, part of the problem I'm trying to raise here is that there is a conflict here regarding vector cube design:

  • the consensus has risen to "load" GeoJSON properties as vector cube values (like pixel values in a raster cube)
  • when using a vector cube as geometry in a process like aggregate_spatial, you can not preserve its original cube values because you are generating (aggregating) new cube values, which can not combined, generically, with the original cube values.

For example:

  • initial vector cube with dimensions ["geometry", "property"], where the "property" dimension has labels "population" and "cropclass"
  • you use that vector cube with aggregate_spatial on a raster cube with dimensions ["time", "bands", "x, "y"] (e.g. bands "R", "G", "B")
  • the output of aggregate_spatial is a vector cube with dimensions ["time", "bands", "geometry"]

You can not combine the original cube data ["geometry", "property"] with the aggregated cube data ["time", "bands", "geometry"] in a single cube, e.g. because the number of dimensions is different. The dimension type of "property" (type "other"?) and "bands" (type "bands") is probably also not compatible strictly speaking, but that could be adapted to relatively easy I guess.

So what I'm trying to say is this current statement in aggregate_spatial

Feature properties are preserved for vector data cubes and all GeoJSON Features.

is incompatible with the current consensus for vector cube design (store properties as cube values).

@pankajdpatil

This comment was marked as off-topic.

@m-mohr

This comment was marked as off-topic.

@soxofaan

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants