aggregate_spatial: crs of data and geometry mismatch undefined #499

ValentinaHutter · 2024-03-25T09:45:30Z

aggregate_spatial: crs of data and geometry mismatch undefined

Undefined crs handling:
In aggregate_spatial, the input parameters data and geometries specify a raster datacube and a vector datacube, the output is a vector datacube.
A user might define the geometries in the wgs84 CRS, while the data coming from openeo must not necessarily be in wgs84.
There are three options to handle this:

The geometries get reprojected to the data CRS and the resulting vector datacube has the CRS of the input data.
The data gets reprojected to the geometries CRS and the resulting vector datacube has the CRS of the input geometries.
An error is thrown, as the user should reproject data or geometries before.

Wouldn't it make sense to define or at least note it on the specification level?

Proposed solution:
We currently use Nr. 1, as it means using the CRS from the first input parameter.

Additional context:
The process is currently being tested on various input datasets, which differ in their CRS.

Backends normally handle the data CRS themselves, but for using this process, it can easily happen that there are two CRS options, which are both equally valid.

m-mohr · 2024-03-31T10:02:58Z

Option 1 was the intended behavior. We should clarify this indeed. PRs are welcome.

soxofaan · 2024-04-05T10:55:15Z

The geometries get reprojected to the data CRS and the resulting vector datacube has the CRS of the input data.

The data gets reprojected to the geometries CRS and the resulting vector datacube has the CRS of the input geometries.

I think it should be a mix of 1 and 2.

While one could argue that in most use cases it's practically not very relevant which one you reproject to the other, I guess the most obvious choice for backend implementations is to reproject the geometries to the raster data CRS (e.g. geometries are typically just EPSG4326, and relatively cheap to reproject to the native raster data CRS, which you want to stay in for processing efficiency).

On the other hand, I think the user expectation is to get the same output geometries (in same CRS) as in the input vector cube. This might be even vital for ML application where you want to "join" the aggregation output data with target variables that are associated with the original input geometries.

So:

1+2: The geometries are reprojected to the data CRS and resulting vector datacube has the CRS of the input geometries

ValentinaHutter · 2024-04-05T13:03:30Z

Thanks for the input - I agree, it makes sense to have both options! I think it is fine, to leave this decision to the individual backend implementations - but this should really be documented somewhere for new developers. Does it make sense to include this in the specification? Maybe, we could add a section to the specification, such as "note for developers" or simply "note"?

soxofaan · 2024-04-05T13:58:19Z

Maybe my explanation was a bit messy, I didn't mean to have/keep both options. I meant:

for the aggregation itself (mainly relevant for backend developer): project geometries to the raster data (first part of option 1)
for the resulting output (relevant to end user too): use the input geometry CRS (seconds part of option 2)

That second bullet point should certainly be documented in the process spec (as it is both relevant to backend dev and end user).
The first bullet point could be a recommendation, but I'm not sure it is a vital part to be documented

clausmichele · 2024-07-10T12:10:56Z

If we agree on what @soxofaan proposed, we have to make sure to reuse the input geometries without reprojecting them twice, since it could lead to differences due to floating point rounding (it just append to me!).

m-mohr · 2024-07-10T12:53:35Z

Yeah, thinking more about it the important part is probably that you get the geometries as provided, i.e. in the CRS of the source geometry without changes to the coordinates etc.

We should probably describe that and then also name how this is done in the background. Maybe it's actually simpler to reproject only the source data to the geometry CRS?

soxofaan · 2024-07-16T07:57:42Z

The key point is indeed that the output geometries are exactly the same as the input geometries, without any (back and forth) projections or CRS changes in between. This probably needs some clarification in the docs.

How the aggregations are practically calculated is more an implementation detail for the backend implementer I think and I'm not sure the typical user will care about the difference between:

reproject the geometries to the raster data CRS and aggregate.
- This is probably the most straightforward and cheapest approach in most, if not all, practical use cases.
- This is the approach we currently take in the geopyspark backend FYI.
reproject the raster data to the geometry CRS and aggregate.
- maybe there are reasons this is scientifically/mathematically more correct?
- but what if there is no reasonable (implied) spatial resolution associated with the geometry CRS? If your geometry is in a default EPSG 4326, you don't want to reproject your 10 meter raster data pixels to 1 degree pixels, right?

Regardless, I'm not sure we should over-specify this aspect. However putting a recommendation (for option 1 I think) for the sake of reproducible results could be an option

clausmichele · 2024-07-16T12:45:54Z

The process definition should also be fixed since now it's mentioned to use dimension of type geometries, but from the parameter description below and from the official API docs it should be geometry instead:
https://processes.openeo.org/#aggregate_spatial
https://api.openeo.org/#tag/EO-Data-Discovery/operation/describe-collection

clausmichele · 2024-07-23T10:15:02Z

1. reproject the geometries to the raster data CRS and aggregate.
   
   * This is probably the most straightforward and cheapest approach in most, if not all, practical use cases.
   * This is the approach we currently take in the geopyspark backend FYI.

@soxofaan which CRS should we use in the output vector cube in your opinion? The raster cube projection or the vector cube projection?

soxofaan · 2024-07-25T18:00:37Z

In my opinion the output vector cube should have the exactly geometry dimension as a whole (so same geometries, same order, same CRS) as the input vector cube.

This is what I intended to say with:

The key point is indeed that the output geometries are exactly the same as the input geometries, without any (back and forth) projections or CRS changes in between. This probably needs some clarification in the docs.

ValentinaHutter added bug patch labels Mar 25, 2024

soxofaan mentioned this issue Jul 10, 2024

aggregate_spatial: vector cube with different CRS than raster cube #516

Closed

clausmichele mentioned this issue Jul 10, 2024

aggregate_spatial improvements EOEPCA/openeo-processes-dask#1

Open

jzvolensky mentioned this issue Sep 3, 2024

Update aggregate_spatial process implementation Open-EO/openeo-processes-dask#268

Draft

m-mohr added this to the 2.0.0 milestone Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aggregate_spatial: crs of data and geometry mismatch undefined #499

aggregate_spatial: crs of data and geometry mismatch undefined #499

ValentinaHutter commented Mar 25, 2024

m-mohr commented Mar 31, 2024

soxofaan commented Apr 5, 2024

ValentinaHutter commented Apr 5, 2024

soxofaan commented Apr 5, 2024 •

edited

Loading

clausmichele commented Jul 10, 2024

m-mohr commented Jul 10, 2024

soxofaan commented Jul 16, 2024 •

edited

Loading

clausmichele commented Jul 16, 2024

clausmichele commented Jul 23, 2024

soxofaan commented Jul 25, 2024

aggregate_spatial: crs of data and geometry mismatch undefined #499

aggregate_spatial: crs of data and geometry mismatch undefined #499

Comments

ValentinaHutter commented Mar 25, 2024

m-mohr commented Mar 31, 2024

soxofaan commented Apr 5, 2024

ValentinaHutter commented Apr 5, 2024

soxofaan commented Apr 5, 2024 • edited Loading

clausmichele commented Jul 10, 2024

m-mohr commented Jul 10, 2024

soxofaan commented Jul 16, 2024 • edited Loading

clausmichele commented Jul 16, 2024

clausmichele commented Jul 23, 2024

soxofaan commented Jul 25, 2024

soxofaan commented Apr 5, 2024 •

edited

Loading

soxofaan commented Jul 16, 2024 •

edited

Loading