aggregate_spatial: crs of data and geometry mismatch undefined #499
Comments
Option 1 was the intended behavior. We should indeed clarify this. PRs are welcome.
I think it should be a mix of 1 and 2. While one could argue that in most use cases it makes little practical difference which of the two is reprojected to the other's CRS, I guess the most obvious choice for backend implementations is to reproject the geometries to the raster data CRS (e.g. geometries are typically just in EPSG:4326 and relatively cheap to reproject to the native raster data CRS, which you want to stay in for processing efficiency). On the other hand, I think users expect to get the same output geometries (in the same CRS) as in the input vector cube. This might even be vital for ML applications where you want to "join" the aggregation output data with target variables that are associated with the original input geometries. So, 1+2: the geometries are reprojected to the data CRS, and the resulting vector datacube has the CRS of the input geometries.
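A minimal sketch of this "1+2" strategy, written without any GIS library so it runs standalone. Everything here is hypothetical and is not an openEO backend API: the function and class names are made up, geometries are simplified to axis-aligned lon/lat boxes in EPSG:4326, the raster is assumed to be in spherical Web Mercator (EPSG:3857), and a pixel counts as "inside" when its centre falls in the box. The point it shows is that only a temporary copy of each geometry is reprojected for the computation, while the output reuses the input geometries and CRS unchanged.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch, not an openEO backend API.
EARTH_RADIUS = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Web Mercator projection (EPSG:4326 degrees -> EPSG:3857 metres)."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


@dataclass
class Raster:
    crs: str
    origin_x: float    # x of the upper-left corner, in raster CRS units
    origin_y: float    # y of the upper-left corner
    pixel_size: float  # square pixels
    values: list       # row-major 2D list; row 0 is the top row


@dataclass
class VectorCube:
    crs: str
    geometries: list   # axis-aligned boxes (min_x, min_y, max_x, max_y) in `crs`
    values: list


def aggregate_spatial_mean(raster, vectors):
    """Mean of the pixels whose centres fall inside each box ("option 1+2")."""
    if vectors.crs != "EPSG:4326" or raster.crs != "EPSG:3857":
        raise ValueError("this sketch only handles EPSG:4326 boxes on an EPSG:3857 raster")
    means = []
    for min_lon, min_lat, max_lon, max_lat in vectors.geometries:
        # Reproject a temporary copy of the box into the raster CRS.
        # Web Mercator is monotonic in both axes, so the projected corners still bound the box.
        min_x, min_y = lonlat_to_web_mercator(min_lon, min_lat)
        max_x, max_y = lonlat_to_web_mercator(max_lon, max_lat)
        inside = []
        for row, row_values in enumerate(raster.values):
            centre_y = raster.origin_y - (row + 0.5) * raster.pixel_size
            for col, value in enumerate(row_values):
                centre_x = raster.origin_x + (col + 0.5) * raster.pixel_size
                if min_x <= centre_x <= max_x and min_y <= centre_y <= max_y:
                    inside.append(value)
        means.append(sum(inside) / len(inside) if inside else None)
    # The output reuses the ORIGINAL geometries and CRS: nothing is projected back.
    return VectorCube(crs=vectors.crs, geometries=list(vectors.geometries), values=means)


# Example: a 2x2 raster of 100 km pixels and two lon/lat boxes.
raster = Raster(crs="EPSG:3857", origin_x=0.0, origin_y=200_000.0,
                pixel_size=100_000.0, values=[[1, 2], [3, 4]])
boxes = VectorCube(crs="EPSG:4326",
                   geometries=[(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 0.9, 0.9)], values=[])
result = aggregate_spatial_mean(raster, boxes)
print(result.crs, result.values)
```

Because the result carries the caller's own geometry tuples in the caller's CRS, a downstream "join" with target variables keyed on those geometries matches exactly. A real backend would use a proper projection library and polygon rasterization instead of the box-and-centre shortcut assumed here.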
Thanks for the input - I agree, it makes sense to have both options! I think it is fine to leave this decision to the individual backend implementations, but it should really be documented somewhere for new developers. Does it make sense to include this in the specification? Maybe we could add a section to the specification, such as "Note for developers" or simply "Note"?
Maybe my explanation was a bit messy; I didn't mean to have/keep both options. I meant:
That second bullet point should certainly be documented in the process spec (as it is relevant to both backend developers and end users).
If we agree on what @soxofaan proposed, we have to make sure to reuse the input geometries instead of reprojecting them twice, since that can introduce differences due to floating-point rounding (it just happened to me!).
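The rounding concern can be illustrated without any GIS library. This hypothetical sketch projects a few made-up lon/lat points to spherical Web Mercator (EPSG:3857) and back using the standard forward and inverse formulas; the function names and sample coordinates are illustrative assumptions. The round trip is only approximately the identity in floating point, so coordinates that went back and forth may no longer compare equal to the originals, whereas simply reusing the input geometries always matches exactly.

```python
import math

# Hypothetical illustration using the spherical Web Mercator (EPSG:3857) formulas.
EARTH_RADIUS = 6378137.0


def to_mercator(lon, lat):
    """EPSG:4326 degrees -> EPSG:3857 metres."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def from_mercator(x, y):
    """EPSG:3857 metres -> EPSG:4326 degrees (inverse of to_mercator)."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)
    return lon, lat


def round_trip(coords):
    """Project every coordinate to EPSG:3857 and back again."""
    return [from_mercator(*to_mercator(lon, lat)) for lon, lat in coords]


input_coords = [(5.0504, 51.2194), (4.3517, 50.8503), (7.6261, 51.9607)]
back_and_forth = round_trip(input_coords)

# The round trip is only approximately the identity in floating point.
max_drift = max(
    max(abs(lon - lon2), abs(lat - lat2))
    for (lon, lat), (lon2, lat2) in zip(input_coords, back_and_forth)
)
bit_identical = sum(a == b for a, b in zip(input_coords, back_and_forth))
print(f"max drift: {max_drift:.2e} deg, bit-identical points: {bit_identical}/{len(input_coords)}")

# Reusing the input geometries as-is always gives an exact match.
output_coords = list(input_coords)
assert output_coords == input_coords
```

The drift is tiny, but it is enough to break exact equality checks or joins on coordinates, which is why returning the untouched input geometries is the safer design.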
Yeah, thinking more about it, the important part is probably that you get the geometries back as provided, i.e. in the CRS of the source geometries and without changes to the coordinates etc. We should probably describe that and then also explain how this is done behind the scenes. Maybe it's actually simpler to reproject only the source data to the geometry CRS?
The key point is indeed that the output geometries are exactly the same as the input geometries, without any (back-and-forth) projections or CRS changes in between. This probably needs some clarification in the docs. How the aggregations are actually calculated is, I think, more of an implementation detail for the backend implementer, and I'm not sure the typical user will care about the difference between:
Regardless, I'm not sure we should over-specify this aspect. However, adding a recommendation (for option 1, I think) for the sake of reproducible results could be an option.
The process definition should also be fixed, since it currently says to use
@soxofaan which CRS should we use in the output vector cube, in your opinion? The raster cube projection or the vector cube projection?
In my opinion, the output vector cube should have exactly the same geometry dimension as a whole (so same geometries, same order, same CRS) as the input vector cube. This is what I intended to say with:
Undefined CRS handling:
In aggregate_spatial, the input parameters data and geometries specify a raster datacube and a vector datacube, respectively; the output is a vector datacube.
A user might define the geometries in the WGS84 CRS, while the data coming from openEO is not necessarily in WGS84.
There are three options to handle this:
Wouldn't it make sense to define this, or at least note it, at the specification level?
Proposed solution:
We currently use option 1, as it means using the CRS of the first input parameter.
Additional context:
The process is currently being tested on various input datasets, which differ in their CRS.
Backends normally handle the data CRS themselves, but with this process two different CRSs can easily be involved, and either one is an equally valid choice.