New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix hole intersection at tangential coordinate #10332
Conversation
OGC SFA 2.1.10 assertion 3 allows interior boundaries to touch exterior boundaries provided they intersect at a single point. Issue elastic#9511 provides an example where a valid shape is incorrectly interpreted as invalid (a false violation of assertion 3). When the intersecting point appears as the first and last coordinate of the interior boundary in a polygon, the ShapeBuilder incorrectly counted this as multiple intersecting vertices. The fix required a little more than just a logic check. Passing the duplicate vertices resulted in a connected component in the edge graph causing an invalid self crossing polygon. This required additional logic to the edge assignment in order to correctly segment the connected components. Finally, an additional hole validation has been added along with proper unit tests for testing valid and invalid conditions (including dateline crossing polys). closes elastic#9511
* Validates only 1 vertex is tangential (shared) between the interior and exterior of a polygon | ||
*/ | ||
protected void validateHole(BaseLineStringBuilder shell, BaseLineStringBuilder hole) { | ||
ArrayList exterior = Lists.newArrayList(Sets.newHashSet(shell.points)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the intermediate hash set? The hash set will have random iteration order right? So then the retainAll below becomes O(N^2)? Why not just copy the array lists, sort, and then call retainAll?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it looks like retainAll always uses contains (I was thinking it could be smarter with a sorted list, but of course it has no way to know it is sorted!). So I would copy, sort, then just iterate both in step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed HashSet inherits retainAll from AbstractCollection. So I eliminated the intermediate ArrayList and used it directly to avoid both unnecessary memory use and having to sort.
Thanks for the feedback @rjernst. You're pretty busy with field mappings so maybe @colings86 has some feedback? |
@colings86 I assigned you here |
The PR review process caught an infinite loop for polygons containing duplicate sequential coordinates (excluding start and end points). This created a self loop in the edge graph that caused an infinite loop during traversal. One proposed solution was to check for duplicate points, ignore, and proceed. This violates the OGC SFA definition of a Simple Curve (the base of a LinearRing). Per 6.1.6.1 in v.1.2.1, "A Curve is simple if it does not pass through the same point twice with the possible exception of the two end points...A curve that is simple and closed is a Ring". Keeping true to "fail early", this commit catches self-loops during creation of the edge and throws an IllegalShapeException. It complys with the OGC spec and avoids any further processing if an invalid ring is provided. One potential downside is the additional rigor it adds to loosely defined GeoJSON.
Many thanks for the catch @clement-tourriere! I suppose this is a good time for us to re-enable our random shape testing. I've added a commit that catches self-loops during edge creation and throws an InvalidShapeException if found. This does add more rigor to the shape validation, but it also complies with OGC SFA 6.1.6.1 while staying true to the "fail early" design. |
This kind of validation could be problematic. JTS library considers these kind of shapes as valid. So we can't easily know, before sending shapes to ES cluster, which ones will be valid. I really think that ignoring duplicate sequential coordinates (excluding first and last) in ring could be a better solution. |
You raise an interesting point re: pre-validation. JTS violates other OGC specs (e.g., orientation) and handles dateline/pole wrapping differently, so I wouldn't necessarily suggest it as ground truth for pre-validation. Many of the JTS issues will be refactored and corrected. In the meantime you can pre-validate using the ES java API prior to sending it to the cluster. Its an extra step but more cost effective than validating with JTS - and you'll get ground truth w.r.t ES expectations. |
HashSet interior = Sets.newHashSet(hole.points); | ||
exterior.retainAll(interior); | ||
if (exterior.size() >= 2) { | ||
throw new InvalidShapeException("Invalid polygon, interior cannot share more than one point"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be clearer as "Invalid polygon, interior cannot share more than one point with the exterior"?
// found a closed loop - we have two connected components so we need to slice into two distinct components | ||
if (visitedEdge.containsKey(current.coordinate)) { | ||
if (connectedComponents > 0 && current.next != edge) { | ||
throw new InvalidShapeException("Shape contains more than one tangential point"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this message a bit more user friendly? I'm not sure may users will know what a tangential point is
@nknize left one more error message comment but otherwise LGTM |
merged |
OGC SFA 2.1.10 assertion 3 (https://portal.opengeospatial.org/files/?artifact_id=829 ) allows interior boundaries to touch exterior boundaries provided they intersect at a single point. Issue #9511 provides an example where a valid shape is incorrectly interpreted as invalid (a false violation of assertion 3). When the intersecting point appears as the first and last coordinate of the interior boundary in a polygon, the ShapeBuilder incorrectly counted this as multiple intersecting vertices.
The fix required a little more than just a logic check. Passing the duplicate vertices resulted in a connected component in the edge graph causing a false invalid self crossing polygon. This required additional logic to the edge assignment in order to correctly segment the connected components. Finally, an additional hole validation has been added along with proper unit tests for testing valid and invalid conditions (including dateline crossing polys).
closes #9511