Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersection gives wrong result #1216

Closed
theroggy opened this issue Nov 3, 2021 · 17 comments
Closed

Intersection gives wrong result #1216

theroggy opened this issue Nov 3, 2021 · 17 comments

Comments

@theroggy
Copy link
Contributor

theroggy commented Nov 3, 2021

Expected behavior and actual behavior.

I ran into a case where an intersection between two (valid) (multi)polygons results in a wrong result. One example results in an "invalid" wrong result, another one results in a "valid" wrong result.

Probably the problem is in geos... but because my test case is using shapely I thought it was more logical to post it here.

Polygon 1 in the example below is the culprit:

image

Steps to reproduce the problem.

import matplotlib.pyplot as plt
import geopandas as gpd
import shapely
import shapely.wkt

# Calculate intersection
print(shapely.geos.geos_version_string)
wkt1_str = "MultiPolygon (((66697.40120137332996819 185279.95469107336248271, 66698.375 185273.625, 66697.375 185280.125, 66697.40120137332996819 185279.95469107336248271)),((66705.875 185229.625, 66704.875 185234.625, 66703.875 185240.625, 66702.875 185245.375, 66706.90145934304746334 185229.45392344283754937, 66710.375 185228.875, 66705.875 185229.625)))"
wkt2_str = "MultiPolygon (((66720 185280, 66720 185200, 66680 185200, 66680 185280, 66720 185280)))"
wkt3_str = "MultiPolygon (((66720 185320, 66720 185280, 66680 185280, 66680 185320, 66720 185320)))"
poly1 = shapely.wkt.loads(wkt1_str)
print(f"poly1.is_valid: {poly1.is_valid}")
poly2 = shapely.wkt.loads(wkt2_str)
print(f"poly2.is_valid: {poly2.is_valid}")
poly3 = shapely.wkt.loads(wkt3_str)
print(f"poly3.is_valid: {poly3.is_valid}")
poly1_intersect_poly2 = poly1.intersection(poly2)
print(f"poly1_intersect_poly2.is_valid: {poly1_intersect_poly2.is_valid}")
poly1_intersect_poly3 = poly1.intersection(poly3)
print(f"poly1_intersect_poly3.is_valid: {poly1_intersect_poly3.is_valid}")

# Plot
print(poly1_intersect_poly2.wkt)
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True)
gpd.GeoSeries([poly1]).boundary.plot(ax=ax[0][0], edgecolor='blue')
ax[0][0].set_title('polygon 1')
ax[0][0].ticklabel_format(useOffset=False)
ax[0][0].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly2]).boundary.plot(ax=ax[0][1], edgecolor='green')
ax[0][1].set_title('polygon 2')
ax[0][1].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly1_intersect_poly2]).boundary.plot(ax=ax[0][2], edgecolor='red')
ax[0][2].set_title('intersection poly1 + poly2')
ax[0][2].tick_params(axis='x', rotation=45)

gpd.GeoSeries([poly1]).boundary.plot(ax=ax[1][0], edgecolor='blue')
ax[1][0].set_title('polygon 1')
ax[1][0].ticklabel_format(useOffset=False)
ax[1][0].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly3]).boundary.plot(ax=ax[1][1], edgecolor='green')
ax[1][1].set_title('polygon 3')
ax[1][1].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly1_intersect_poly3]).boundary.plot(ax=ax[1][2], edgecolor='red')
ax[1][2].set_title('intersection poly1 + poly3')
ax[1][2].tick_params(axis='x', rotation=45)
plt.show()

Operating system

Windows 10

Shapely version and provenance

shapely 1.8.0 from conda-forge
geos 3.10.0 from conda-forge

@sgillies
Copy link
Contributor

sgillies commented Nov 3, 2021

Thanks for the report @theroggy. Nothing to be done about this in Shapely, but it will be good to track it.

@theroggy
Copy link
Contributor Author

theroggy commented Nov 3, 2021

OK. I logged an issue in the GEOS issue tracker as well: https://trac.osgeo.org/geos/ticket/1144

FYI: I was also able to reproduce it in the Java version (JTS) using JTS TestBuilder and it seems to have been beeen introduced in OverlayNG.

@dr-jts
Copy link

dr-jts commented Nov 3, 2021

Yup. this is a nasty little robustness problem alright. I will open an issue in JTS to track this (and hopefully a fix).

@theroggy
Copy link
Contributor Author

theroggy commented Nov 3, 2021

@dr-jts I have some more examples, but I suppose chances are reasonable that it boils down to the same issue? If it is useful I can chase some more down, but it is always some searching to find the culprit triggering the problem.

This is another example:
image

And the code to reproduce:

import matplotlib.pyplot as plt
import geopandas as gpd
import shapely
import shapely.wkt

# Calculate intersection
print(shapely.geos.geos_version_string)
wkt1_str = "MultiPolygon (((61607.62679190188646317 190194.15554780140519142, 61620.53899189829826355 190197.49904780089855194, 61619.64380966498720227 190197.2672482781636063, 61607.62679190188646317 190194.15554780140519142)))"
wkt2_str = "MultiPolygon (((61597.29377387490967521 190196.3718038662627805, 61596.98839818270789692 190204.92232324797078036, 61627.6855975429789396 190206.04539151725475676, 61628.15039204242202686 190200.46785752387950197, 61626.73510000109672546 190200.2800000011920929, 61626.49939999729394913 190200.24870000034570694, 61620.04309999942779541 190199.41420000046491623, 61620.41290000081062317 190197.98570000007748604, 61620.53899999707937241 190197.49890000000596046, 61607.62680000066757202 190194.15549999848008156, 61607.07620000094175339 190196.27259999886155128, 61606.72070000320672989 190197.6396000012755394, 61605.02780000120401382 190197.40810000151395798, 61602.65460000187158585 190197.08339999988675117, 61597.29377387490967521 190196.3718038662627805)))"
wkt3_str = "MultiPolygon (((61628.4118790013162652 190200.50256576138781384, 61628.4118790013162652 190189.34462376977899112, 61597.14578879964392399 190189.34462376977899112, 61597.29377387490967521 190196.3718038662627805, 61602.65460000187158585 190197.08339999988675117, 61605.02780000120401382 190197.40810000151395798, 61606.72070000320672989 190197.6396000012755394, 61607.07620000094175339 190196.27259999886155128, 61607.62680000066757202 190194.15549999848008156, 61620.53899999707937241 190197.49890000000596046, 61620.41290000081062317 190197.98570000007748604, 61620.04309999942779541 190199.41420000046491623, 61626.49939999729394913 190200.24870000034570694, 61626.73510000109672546 190200.2800000011920929, 61628.4118790013162652 190200.50256576138781384)))"
poly1 = shapely.wkt.loads(wkt1_str)
print(f"poly1.is_valid: {poly1.is_valid}")
poly2 = shapely.wkt.loads(wkt2_str)
print(f"poly2.is_valid: {poly2.is_valid}")
poly3 = shapely.wkt.loads(wkt3_str)
print(f"poly3.is_valid: {poly3.is_valid}")
poly1_intersect_poly2 = poly1.intersection(poly2)
print(f"poly1_intersect_poly2.is_valid: {poly1_intersect_poly2.is_valid}")
poly1_intersect_poly3 = poly1.intersection(poly3)
print(f"poly1_intersect_poly3.is_valid: {poly1_intersect_poly3.is_valid}")

# Plot
print(poly1_intersect_poly2.wkt)
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True)
gpd.GeoSeries([poly1]).boundary.plot(ax=ax[0][0], edgecolor='blue')
ax[0][0].set_title('polygon 1')
ax[0][0].ticklabel_format(useOffset=False)
ax[0][0].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly2]).boundary.plot(ax=ax[0][1], edgecolor='green')
ax[0][1].set_title('polygon 2')
ax[0][1].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly1_intersect_poly2]).boundary.plot(ax=ax[0][2], edgecolor='red')
ax[0][2].set_title('intersection poly1 + poly2')
ax[0][2].tick_params(axis='x', rotation=45)

gpd.GeoSeries([poly1]).boundary.plot(ax=ax[1][0], edgecolor='blue')
ax[1][0].set_title('polygon 1')
ax[1][0].ticklabel_format(useOffset=False)
ax[1][0].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly3]).boundary.plot(ax=ax[1][1], edgecolor='green')
ax[1][1].set_title('polygon 3')
ax[1][1].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly1_intersect_poly3]).boundary.plot(ax=ax[1][2], edgecolor='red')
ax[1][2].set_title('intersection poly1 + poly3')
ax[1][2].tick_params(axis='x', rotation=45)
plt.show()

@dr-jts
Copy link

dr-jts commented Nov 3, 2021

@theroggy thanks, always good to have more than one example to verify any potential fix.

Indeed, the cause looks like the same thing - a very narrow triangle causing a robustness problem. It may be that this problem only occurs with narrow triangles (as opposed to say larger polygons with narrow spikes). If so, this is a much simpler case to check for, which is good.

if you are looking for a workaround, you could detect the narrow triangles and take some evasive action (what that might be depends on your use case).

@theroggy
Copy link
Contributor Author

theroggy commented Nov 3, 2021

@dr-jts final example for now:

image

Code:

import matplotlib.pyplot as plt
import geopandas as gpd
import shapely
import shapely.wkt

# Calculate intersection
print(shapely.geos.geos_version_string)
wkt1_str = "MultiPolygon (((181093.43788657014374621 172099.7837627058615908, 181093.375 172099.375, 181093.57688359552412294 172100.68724337089224719, 181093.43788657014374621 172099.7837627058615908)))"
wkt2_str = "MultiPolygon (((181072.91911515005631372 172110.76716212875908241, 181118.78476615424733609 172110.76716212875908241, 181118.78476615424733609 172087.40163114664028399, 181114.16759999841451645 172089.59329999983310699, 181113.89500000327825546 172089.72269999980926514, 181097.49019999802112579 172097.50959999859333038, 181081.35400000214576721 172105.30889999866485596, 181073.53490000218153 172103.68899999931454659, 181073.24379999935626984 172107.0348999984562397, 181072.91911515005631372 172110.76716212875908241)))"
wkt3_str = "MultiPolygon (((181123.48368334578117356 172085.17115684598684311, 181073.50145793828414753 172085.17115684598684311, 181073.52319999784231186 172097.18879999965429306, 181073.53490000218153 172103.68899999931454659, 181081.35400000214576721 172105.30889999866485596, 181097.49019999802112579 172097.50959999859333038, 181113.89500000327825546 172089.72269999980926514, 181114.16759999841451645 172089.59329999983310699, 181123.48368334578117356 172085.17115684598684311)))"
poly1 = shapely.wkt.loads(wkt1_str)
print(f"poly1.is_valid: {poly1.is_valid}")
poly2 = shapely.wkt.loads(wkt2_str)
print(f"poly2.is_valid: {poly2.is_valid}")
poly3 = shapely.wkt.loads(wkt3_str)
print(f"poly3.is_valid: {poly3.is_valid}")
poly1_intersect_poly2 = poly1.intersection(poly2)
print(f"poly1_intersect_poly2.is_valid: {poly1_intersect_poly2.is_valid}")
poly1_intersect_poly3 = poly1.intersection(poly3)
print(f"poly1_intersect_poly3.is_valid: {poly1_intersect_poly3.is_valid}")

# Plot
print(poly1_intersect_poly2.wkt)
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True)
gpd.GeoSeries([poly1]).boundary.plot(ax=ax[0][0], edgecolor='blue')
ax[0][0].set_title('polygon 1')
ax[0][0].ticklabel_format(useOffset=False)
ax[0][0].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly2]).boundary.plot(ax=ax[0][1], edgecolor='green')
ax[0][1].set_title('polygon 2')
ax[0][1].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly1_intersect_poly2]).boundary.plot(ax=ax[0][2], edgecolor='red')
ax[0][2].set_title('intersection poly1 + poly2')
ax[0][2].tick_params(axis='x', rotation=45)

gpd.GeoSeries([poly1]).boundary.plot(ax=ax[1][0], edgecolor='blue')
ax[1][0].set_title('polygon 1')
ax[1][0].ticklabel_format(useOffset=False)
ax[1][0].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly3]).boundary.plot(ax=ax[1][1], edgecolor='green')
ax[1][1].set_title('polygon 3')
ax[1][1].tick_params(axis='x', rotation=45)
gpd.GeoSeries([poly1_intersect_poly3]).boundary.plot(ax=ax[1][2], edgecolor='red')
ax[1][2].set_title('intersection poly1 + poly3')
ax[1][2].tick_params(axis='x', rotation=45)
plt.show()

@theroggy
Copy link
Contributor Author

theroggy commented Nov 3, 2021

@dr-jts for a workaround: yes, the pattern seems to be quite specific, so indeed no issue to make one. But the current case I'm working on is only calculating statistics and the problem isn't frequent enough to give an impact on those.

I only noticed it because I was looking into another issue: my (windows) PC crashes (reboots) when calculating the intersections between some (largish) files. When i noticed the invalid polygons created by this issue I was hopeful they were the trigger for the reboot, but they are not :-(. I found a workaround for it but if you hear something similar from someone else, don't hesitate ;-).

Anyway, hopefully a fix will come though because for many other scripts it would be more of a problem if it would occur and it would be quite a pain to put a workaround everywhere, as it feels like the issue isn't quite a one-off...

@dr-jts
Copy link

dr-jts commented Nov 4, 2021

Thanks for the third test case. It shows the same pattern - a narrow triangle where the noding causes the apex vertex to cross the noded hypotenuse, thus inverting the triangle topology.

Agreed, a fix is important, since this could occur at any time (in data of this pattern), and might cause more of a problem in other situations.

@dr-jts
Copy link

dr-jts commented Dec 4, 2021

This is fixed in JTS-812 by locationtech/jts#812. This will get ported to GEOS soon.

@diabolo-dan
Copy link

I'm also seeing what I believe is incorrect intersection behaviour. Not sure whether it's the same underlying issue (or if I'm misunderstanding the intended behaviour.

Running the code:

reference_line = LineString(
    [
        (1.0, 0.0),
        (-32.0, 5.820766091346741e-11),
        (2.384185791015633e-07, 0.0),
    ]
)
target = LineString([(0.0, 0.0), (1.0, 995187.0)])
buffered_target = target.buffer(1e-6)
intersection_points = reference_line.intersection(buffered_target)

print("inputs:")
print(f"{reference_line}")
print(f"{target}")

print("buffered target:")
print(f"{buffered_target}")
print("intersections:")
for p in intersection_points.geoms:
    print(f"{p}")

Gives the output:


inputs:
LINESTRING (1 0, -32 0.0000000000582077, 0.0000002384185791 0)
LINESTRING (0 0, 1 995187)
buffered target:
POLYGON ((0.999999 995187, 0.9999990048153719 995187.000000098, 0.9999990192149156 995187.0000001951, 0.9999990430599559 995187.0000002903, 0.999999076120852 995187.0000003827, 0.9999991180792094 995187.0000004714, 0.999999168530946 995187.0000005555, 0.9999992269901841 995187.0000006343, 0.9999992928939293 995187.0000007071, 0.9999993656074926 995187.000000773, 0.9999994444306025 995187.0000008314, 0.9999995286041493 995187.000000882, 0.999999617317496 995187.0000009239, 0.9999997097162843 995187.0000009569, 0.9999998049106635 995187.0000009808, 0.9999999019838597 995187.0000009952, 1.0000000000010048 995187.000001, 1.0000000980181403 995187.0000009952, 1.0000001950913076 995187.0000009808, 1.0000002902856389 995187.0000009569, 1.0000003826843606 995187.0000009239, 1.000000471397623 995187.000000882, 1.0000005555710685 995187.0000008314, 1.0000006343940608 995187.000000773, 1.0000007071074917 995187.0000007071, 1.000000773011091 995187.0000006343, 1.0000008314701705 995187.0000005555, 1.000000881921738 995187.0000004714, 1.000000923879917 995187.0000003827, 1.0000009569406274 995187.0000002903, 1.0000009807854764 995187.0000001951, 1.0000009951848252 995187.000000098, 1.000001 995187, 0.000001 -0.0000000000010048, 0.0000009951846282 -0.0000000980181403, 0.0000009807850844 -0.0000001950913075, 0.000000956940044 -0.0000002902856388, 0.000000923879148 -0.0000003826843607, 0.0000008819207907 -0.000000471397623, 0.000000831469054 -0.0000005555710685, 0.0000007730098159 -0.0000006343940609, 0.0000007071060707 -0.0000007071074917, 0.0000006343925074 -0.0000007730110908, 0.0000005555693975 -0.0000008314701706, 0.0000004713958506 -0.000000881921738, 0.000000382682504 -0.000000923879917, 0.0000002902837157 -0.0000009569406274, 0.0000001950893365 -0.0000009807854764, 0.0000000980161403 -0.0000009951848252, -0.0000000000010048 -0.000001, -0.0000000980181403 -0.0000009951846282, -0.0000001950913075 -0.0000009807850844, -0.0000002902856388 -0.000000956940044, -0.0000003826843607 -0.000000923879148, -0.000000471397623 -0.0000008819207907, -0.0000005555710685 -0.000000831469054, -0.0000006343940609 -0.0000007730098159, -0.0000007071074917 -0.0000007071060707, -0.0000007730110908 -0.0000006343925074, -0.0000008314701706 -0.0000005555693975, -0.000000881921738 -0.0000004713958506, -0.000000923879917 -0.000000382682504, -0.0000009569406274 -0.0000002902837157, -0.0000009807854764 -0.0000001950893365, -0.0000009951848252 -0.0000000980161403, -0.000001 0.0000000000010048, 0.999999 995187))
intersections:
LINESTRING (1 0, 0.0000002384185791 0)
LINESTRING (0.0000002384185791 0, 0.0000000980161403 -0.0000009951848252)
LINESTRING (0.0000000980161403 -0.0000009951848252, -0.0000008314701706 -0.0000005555693975)

The first of those intersections (LINESTRING (1 0, 0.0000002384185791 0) starts at (1,0), which is not especially near the target line (LINESTRING (0 0, 1 995187)).

@dr-jts
Copy link

dr-jts commented Jun 30, 2022

That's a different issue. The result is probably due to the snapping heuristics used to produce a robust intersection result. In this the geometry tolerances are so small (as you would expect with a buffer of 1e-6) that the snapping distorts the results on the same scale as the input.

The overlay algorithm in JTS/GEOS is intended to produce reasonable results for well-conditioned input, and to provide a "best-effort" approximation in less tractable situations. It's functioning as designed in this case, so I call that a win. :)

Is this a real-world situation? If so, is there any way to condition the input to be more suitable for processing?

@diabolo-dan
Copy link

I see. Is there some documentation of the snapping behaviour? (Or the intersection behaviour more generally, the only thing I've found is "Returns a representation of the intersection of this object with the other geometric object." which has no reference to snapping or tolerances). I appreciate that tolerances are necessary, but I wouldn't expect them to be at the scale of 1. I also don't understand why a small tolerance would lead to a larger distortion from snapping? I'd naïvely expect the opposite.

This seems like it's getting off-track with the issue though. What would be the best place to pose these questions? Open a new issue?

For reference, the 1e-6 buffer was introduced because lines that were coincident were producing large numbers of intersections, presumably because of lack of tolerance. That was having a large impact on performance for a common usage. The 1e-6 buffer resolves that within acceptable tolerances most of the time, but this example demonstrates not always.

@dr-jts
Copy link

dr-jts commented Jul 1, 2022

The only documentation is the code. It's fairly readable.

I agree that the computed result is anomalous. I suspect it's a result of the snapping tolerance being close to the resolution of the input polygon, and the arrangement of the inputs being such that this creates a distorted topology internally.

Developing the current robust algorithm took months of work. Unfortunately I don't have the time to initiate a potentially similar-length investigation into new algorithms that might produce a better result for this case.

I don't understand the problem with the use case. If you provide more details and an example it might be possible to suggest another work-around. I'll just note that introducing polygonal geometry into operations on lines usually just complicates the situation (since the requirements for topologically correctness are much greater for polygons).

@diabolo-dan
Copy link

Thanks for your help.

In our use case, we want to find the first point a line is intersected by another (first from the perspective of the first line). To do this we use .intersection to get the intersection geometries, and then, if it's a multi-geometry, sort them by their .projection onto the first line. Due to the way the lines are created, it's fairly common that the lines will be on top of each other for some sections, and we found that that would often result in a large number of short LineString geometries in the intersection, rather than a single long LineString, presumably due to floating point imprecision. That caused a noticeable impact on performance, on common cases, so, as a cheap solution, we introduced a buffer to the intersecting line so that we'd expect a single LineString in that common case.

The problem case above isn't necessarily representative, and was picked up by Hypothesis, but it's concerning to us that it could occur. There are other, arguably better, approaches from our side to resolve the performance issue without the use of a buffer, so we will explore those further when time allows.

@theroggy
Copy link
Contributor Author

theroggy commented Sep 1, 2022

Fix was shipped in version 3.11 of geos on 2022-07-01: https://github.com/libgeos/geos/releases/tag/3.11.0

I retested and the test case now produces a correct result, so the problem is solved. Thanks!

@theroggy theroggy closed this as completed Sep 1, 2022
@kveretennicov
Copy link
Contributor

🎉

What is needed to benefit from this fix on a Shapely 1.8.x in an official python docker image? Do I need to install libgeos 3.11 system library? Or will there a be a Shapely patch release to take care of that, @dr-jts?

@jorisvandenbossche
Copy link
Member

Do I need to install libgeos 3.11 system library? Or will there a be a Shapely patch release to take care of that,

Yes, for now you will need to ensure GEOS 3.11 is installed, and then install shapely from source (the binary wheel includes GEOS 3.10.3). I think we will probably keep using that GEOS version for Shapely 1.8.x, but Shapely 2.0alpha1 already uses GEOS 3.11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants