Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add apply_polygon #287 #298

Merged
merged 8 commits into from
Mar 15, 2023
Merged

Add apply_polygon #287 #298

merged 8 commits into from
Mar 15, 2023

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Nov 10, 2021

A proposal to discuss for issue #287.

(Not needed in deliverable)

@m-mohr m-mohr marked this pull request as ready for review December 22, 2021 10:46
proposals/chunk_polygon.json Outdated Show resolved Hide resolved
@soxofaan
Copy link
Member

soxofaan commented Apr 6, 2022

FYI @JeroenVerstraelen added the mask_value argument for chunk_polygon in VITO backend implementation (Open-EO/openeo-python-driver@f7db07b) and python client (Open-EO/openeo-python-client@f439672) which is not included in this PR yet

@m-mohr
Copy link
Member Author

m-mohr commented Apr 6, 2022

Sad to see that there was no communication in the PR to further discuss and add this to this PR during the implementation phase. Now, I can't guarantee that the definition here will be exactly what you expose and we may diverge.

@JeroenVerstraelen
Copy link

The mask_value parameter was requested by the internal VITO member who asked for this process.

The issue was that a UDF receives data for a rectangular area, but geometries can have any shape. So any pixels not in the geometry had to be masked out with a value provided by the user.

@m-mohr
Copy link
Member Author

m-mohr commented Apr 26, 2022

This parameter will likely not become part of the spec though as this usecase is already covered by mask_polygon. I'd suggest using that instead of a proprietary addition. @JeroenVerstraelen

@JeroenVerstraelen
Copy link

@m-mohr There is an issue with mask_polygon when you have e.g. two polygons that lie close to each other. The UDF will run separately for each polygon, but the extent for polygon 1 can also contain some pixels from polygon 2.

The mask_value parameter solves this by masking out polygon 1 when running the UDF on polygon 1 and the same for polygon 2. That way the user is sure that the data in their UDF represents only one polygon.

This value defaults to NoData, but this can be problematic when the user wants to calculate e.g. the cloud percentage of a polygon. For those purposes the user can provide their own mask value.

@m-mohr
Copy link
Member Author

m-mohr commented Aug 4, 2022

I've just added a definition of mask_value to chunk_polygon. I've aligned it with existing specifications, so it's not a 1:1 copy of your implementation, but should be compatible (except for the order or the parameters, I think). Please review it.

@soxofaan
Copy link
Member

soxofaan commented Jan 6, 2023

It's probably a bit late to raise this,
but while working on documentation it appeared to me that the name "chunk_polygon" does not align well with comparable processes like apply, apply_dimension, apply_neighborhood, reduce_dimension. ... which contain a verb like "apply" or "reduce" that make it self-documenting.
I think chunk_polygon belongs to the "apply" family, but that is not obvious: usage-wise the process does not really "chunk" your data (in the sense that it would return an array of chunks or something like that); it "applies" a child process to chunks (internally), but still returns a single data cube.
Is there still opportunity to rename to, e.g. "apply_chunks" , "apply_chunked" or "apply_polygon"?

@m-mohr
Copy link
Member Author

m-mohr commented Jan 6, 2023

@soxofaan I think you need to raise this internally, because I think only VITO is implementing it.

@soxofaan
Copy link
Member

soxofaan commented Jan 9, 2023

indeed, I should have cc'ed @jdries .
Still, as it's a naming/allignment thing, it doesn't hurt to discuss this publicly too.

@m-mohr
Copy link
Member Author

m-mohr commented Jan 9, 2023

Sure, but what I meant is whether it's fine for you too change at all. It VITO says you don't want to change for backward compatibility, it doesn't make a lot of sense to discuss it here, too.

Edit: I just realized we've never merged this....

@jdries
Copy link
Contributor

jdries commented Jan 9, 2023

The name can still change, we know most of the cases and persons that use this, and can always keep an alias for backwards compatibility.
Note that the process is quite popular, it's used by:

  • electrical conductivity service
  • water reservoir mappoing
  • geo informed

@m-mohr
Copy link
Member Author

m-mohr commented Jan 9, 2023

Okay, sounds good.

Any preferences? Some thoughts:

  • This process is either a variant of apply or mask.
  • apply_chunked and similarly generic names don't reflect the spatial nature. How would we call a chunking process over e.g. the temporal dimension?
  • apply_polygon would be a short variant that hopefully implies chunking (my current favourite, but that may change ;-) )
  • apply_chunks_polygon could be a bit more clear, but it a mouthful...

@soxofaan
Copy link
Member

I also think apply_polygon is the best candidate. It aligns very well with apply_neighborhood in terms of the chunking aspect

@m-mohr m-mohr added this to the 2.0.0 milestone Feb 1, 2023
@jdries
Copy link
Contributor

jdries commented Feb 10, 2023

Looks good, is there a need to say something about non-spatial dimensions? Do we need to mention explicitly that chunking happens along the x,y dimensions, and all others are passed on fully into the 'chunk'?

@jdries
Copy link
Contributor

jdries commented Feb 14, 2023

@JeroenVerstraelen can you add your review?

@m-mohr
Copy link
Member Author

m-mohr commented Feb 14, 2023

@jdries That's probably a good clarificaton to add. I'll work on it later.

Another question that is still open is #298 (comment) @JeroenVerstraelen

Copy link
Contributor

@LukeWeidenwalker LukeWeidenwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nitpick and one general comment:
Is there anything this process should say about what happens with overlapping geometries, or is handling of this entirely the business of backends? I guess since there's no focal aspect to this process, the results will be the same regardless of whether geometries overlap, and it's thus only a potential performance optimisation?

proposals/apply_polygon.json Outdated Show resolved Hide resolved
proposals/apply_polygon.json Outdated Show resolved Hide resolved
proposals/apply_polygon.json Outdated Show resolved Hide resolved
proposals/apply_polygon.json Show resolved Hide resolved
@m-mohr
Copy link
Member Author

m-mohr commented Mar 9, 2023

I've updated the descriptions of the process according to the comments and also added vector data cube support. Please let me know what you think, I hope we can approve this soon.

Copy link

@JeroenVerstraelen JeroenVerstraelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@m-mohr
Copy link
Member Author

m-mohr commented Mar 10, 2023

@LukeWeidenwalker @clausmichele @dthiex I'd appreciate your final reviews on this PR.

@m-mohr
Copy link
Member Author

m-mohr commented Mar 10, 2023

I'll merge this EOB on Tuesday if no other comments come in.

@LukeWeidenwalker
Copy link
Contributor

One question still open from me: What's the expected behaviour for pixels that intersect multiple of the provided geometries?

@m-mohr
Copy link
Member Author

m-mohr commented Mar 14, 2023

@LukeWeidenwalker Hmm... good question. Open to suggestions. As it is hard to actually get reproducible results if you run over an area multiple times, should we throw an error if overlapping geometries are provided?

@LukeWeidenwalker
Copy link
Contributor

@LukeWeidenwalker Hmm... good question. Open to suggestions. As it is hard to actually get reproducible results if you run over an area multiple times, should we throw an error if overlapping geometries are provided?

I think throwing an error is the best option here, doing this multiple times over an area just seems ill-defined to me!

@m-mohr
Copy link
Member Author

m-mohr commented Mar 14, 2023

Okay, I've added an exception.

@jdries
Copy link
Contributor

jdries commented Mar 14, 2023

Okay, I've added an exception.

If polygons overlap, I believe the function is still invoked once per polygon and pixels are masked according to the polygon that is being handled. Not sure if this is ill defined and requires an exception? @JeroenVerstraelen

@m-mohr
Copy link
Member Author

m-mohr commented Mar 14, 2023

I assumed @JeroenVerstraelen thumbs up to Lukas' comment meant agreement.

@JeroenVerstraelen
Copy link

I assumed @JeroenVerstraelen thumbs up to Lukas' comment meant agreement.

Yes, at VITO we agree that an exception is best. There are some edge cases where you have parcels that touch each other, small coordinate errors can then cause them to slightly overlap. But in general you do not want the polygons to overlap because their output will be ill-defined.

@m-mohr
Copy link
Member Author

m-mohr commented Mar 15, 2023

Great, thanks. So I'll merge this now. If there's further room for improvement, please open a new issue and/or PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apply_polygon: callback on pixels inside a polygon
5 participants