Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geo/geomfn: implement ST_MaxDistance / ST_DFullyWithin #49094

Merged
merged 1 commit into from May 21, 2020

Conversation

otan
Copy link
Contributor

@otan otan commented May 15, 2020

Based on top of #49085.

MaxDistance/DFullyWithin is a specialisation of distance, where it takes
the maximum of distances found instead and disregards intersections and
closest points. The stopAfterLE condition becomes stopAfterGT.

Resolves #48983, resolves #48916

Release note (sql change): Implemented the ST_MaxDistance and
ST_DFullyWithin function for geometries.

@otan otan requested review from sumeerbhola and a team May 15, 2020 01:10
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@otan otan force-pushed the implement_dfullywithin branch 2 times, most recently from 41499eb to b51da62 Compare May 15, 2020 02:24
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

// DFullyWithin determines if any part of geometry A is fully within D units of
// geometry B.

Is this just the maximum of the pair-wise min distance?
The PostGIS doc comment is quite useless "Returns true if the geometries is fully within the specified distance of one another."

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

Previously, sumeerbhola wrote…

Is this just the maximum of the pair-wise min distance?
The PostGIS doc comment is quite useless "Returns true if the geometries is fully within the specified distance of one another."

pair-wise distance. not quite "min distance", because intersection doesn't really count.
the better way to describe is "all points in geometry A is within D units of geometry B", but it can be thought of as MaxDistance(a,b) <= d.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan)


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):
Can you add a code comment with a clear definition.

pair-wise distance. not quite "min distance", because intersection doesn't really count.

what does "intersection doesn't really count" mean?

the better way to describe is "all points in geometry A is within D units of geometry B", but it can be thought of as MaxDistance(a,b) <= d.

There is something about the geometric structure that allows almost the same algorithm to be used in that when doing max distance we still don't need to compare the interiors. Do you have thoughts on this and is it worth writing a code comment that justifies the algorithmic correctness?

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

what does "intersection doesn't really count" mean?

never mind.

There is something about the geometric structure that allows almost the same algorithm to be used in that when doing max distance we still don't need to compare the interiors. Do you have thoughts on this and is it worth writing a code comment that justifies the algorithmic correctness?

I've updated the comment to maxDistanceInternal since its algorithm related and applies to both these commands.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 16 files at r1, 4 of 8 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)


pkg/geo/geodist/geodist.go, line 164 at r3 (raw file):

			return true
		}
		if !c.DistanceUpdater().IsMaxDistance() {

I don't see why this is correct -- I thought we wanted the minimum distance even in this case, and later do the max of the min distance between shapes. If this is being used where the point is not a shape in the original slices then it may be a different calculation.


pkg/geo/geodist/geodist.go, line 192 at r3 (raw file):

	// If the exterior ring does not contain the point, we just need to calculate the distance to
	// the exterior ring.
	// Also, if we are just calculating max distance, we only want the distance to the exterior.

I don't understand this. If the point is contained in the polygon, isn't it incorrect to measure its distance from the exterior. The max distance is 0, yes?
Also, it may not be contained in the polygon, but contained in a hole, in which case the max distance will be the distance to the hole.
In short, the distance calculation where one shape is a point should be the same for max and min


pkg/geo/geodist/geodist.go, line 226 at r3 (raw file):

			// Compare each vertex against the edge of the other.
			for _, toCheck := range []struct {

I don't see why this decomposition is correct when being called from ShapeDistance().
I think we want
max over all i (min over all j (distance(vertex a[i], edge b[j]))


pkg/geo/geodist/geodist.go, line 265 at r3 (raw file):

	// We use the first point of the linestring for this check.
	// If we are looking for max distance, we only need to check against the outer ring anyway.
	if c.DistanceUpdater().IsMaxDistance() || !c.PointInLinearRing(a.Vertex(0), b.LinearRing(0)) {

same question as with the point and polygon comparison.


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

what does "intersection doesn't really count" mean?

never mind.

There is something about the geometric structure that allows almost the same algorithm to be used in that when doing max distance we still don't need to compare the interiors. Do you have thoughts on this and is it worth writing a code comment that justifies the algorithmic correctness?

I've updated the comment to maxDistanceInternal since its algorithm related and applies to both these commands.

I still don't see why this decomposition is correct:
If A is composed of shapes A1, A2 and B is composed of shapes B1, B2
Say every point in B1 is within 1 unit of A1 and every point in B2 is within 1 unit of A2.
But every point in B1 is within 10 units of A2 and every point in B2 is within 10 units of A1.
I am assuming the max distance between A and B is 1.
But if we calculate every pair-wise distance and take the max we will get 10.
Am I missing something?

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/geo/geodist/geodist.go, line 164 at r3 (raw file):

Previously, sumeerbhola wrote…

I don't see why this is correct -- I thought we wanted the minimum distance even in this case, and later do the max of the min distance between shapes. If this is being used where the point is not a shape in the original slices then it may be a different calculation.

Ok, I think that comment is wrong / the understanding of max distance is different between us.

We don't actually want the minimum distance - not sure how to describe it with eloquent math, but basically you want to find the distance between every point of everything in A and everything in B, and take the maximum of that.

Let's take for example max distance of the same polygon in postgis:

otan=# select ST_MaxDistance('POLYGON((0.0 0.0, 1.0 0.0, 1.0 1.0, 0.0 1.0, 0.0 0.0))',  'POLYGON((0.0 0.0, 1.0 0.0, 1.0 1.0, 0.0 1.0, 0.0 0.0))');
   st_maxdistance
--------------------
 1.4142135623730951
(1 row)

the maximum distance is from (0.0 0.0) to (1.0 1.0), which is sqrt(2). even if things are the same polygon, to be "fully within" they must be within sqrt(2) with each other.
even in the case of multiple polygons, this calculation is still correct. note that it is not 0, that's the minimum distance.

perhaps the test cases will help give an idea of the calculations -- note that i generated those cases by plugging the output of PostGIS into the "expected".


pkg/geo/geodist/geodist.go, line 192 at r3 (raw file):

Previously, sumeerbhola wrote…

I don't understand this. If the point is contained in the polygon, isn't it incorrect to measure its distance from the exterior. The max distance is 0, yes?
Also, it may not be contained in the polygon, but contained in a hole, in which case the max distance will be the distance to the hole.
In short, the distance calculation where one shape is a point should be the same for max and min

No, max distance is never 0 unless its just two points.
This is correct because the exterior rings are always guaranteed to be the things furthest away from each other.


pkg/geo/geodist/geodist.go, line 226 at r3 (raw file):

Previously, sumeerbhola wrote…

I don't see why this decomposition is correct when being called from ShapeDistance().
I think we want
max over all i (min over all j (distance(vertex a[i], edge b[j]))

See first answer.


pkg/geo/geodist/geodist.go, line 265 at r3 (raw file):

Previously, sumeerbhola wrote…

same question as with the point and polygon comparison.

See answer above.


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

Previously, sumeerbhola wrote…

I still don't see why this decomposition is correct:
If A is composed of shapes A1, A2 and B is composed of shapes B1, B2
Say every point in B1 is within 1 unit of A1 and every point in B2 is within 1 unit of A2.
But every point in B1 is within 10 units of A2 and every point in B2 is within 10 units of A1.
I am assuming the max distance between A and B is 1.
But if we calculate every pair-wise distance and take the max we will get 10.
Am I missing something?

See above comment.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 16 files at r1, 1 of 7 files at r2, 1 of 8 files at r3, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)


docs/generated/sql/functions.md, line 753 at r4 (raw file):

<p>This function will automatically use any available index.</p>
</span></td></tr>
<tr><td><a name="st_dfullywithin"></a><code>st_dfullywithin(geometry_a: geometry, geometry_b: geometry, distance: <a href="float.html">float</a>) &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>Returns true if all of geometry_a is in distance units of geometry_b. In other words, the max distance between geometry_a and geometry_b is less than distance units.</p>

This is open to interpretation. How about:
Returns true if every pair of points comprising geometry_a and geometry_b respectively, are within distance units.


docs/generated/sql/functions.md, line 870 at r4 (raw file):

<tr><td><a name="st_makepoint"></a><code>st_makepoint(x: <a href="float.html">float</a>, y: <a href="float.html">float</a>) &rarr; geometry</code></td><td><span class="funcdesc"><p>Returns a new Point with the given X and Y coordinates.</p>
</span></td></tr>
<tr><td><a name="st_maxdistance"></a><code>st_maxdistance(geometry_a: geometry, geometry_b: geometry) &rarr; <a href="float.html">float</a></code></td><td><span class="funcdesc"><p>Returns the maximum distance between the given geometries. Note if the geometries are the same, it will return the maximum distance between the geometry’s vertexes.</p>

Can we make this more specific. Something like:
Returns the maximum distance across every pair of points comprising the given geometries. Note if the ...


pkg/geo/geodist/geodist.go, line 192 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

No, max distance is never 0 unless its just two points.
This is correct because the exterior rings are always guaranteed to be the things furthest away from each other.

This is somewhat nuanced and deserves a comment. I managed to convince myself based on breaking down the cases as follows, but if you can find a reference that would be better.
// When computing the maximum distance, the cases are:
// - The point P is not contained in the exterior of the polygon G. Say vertex V is the vertex of the exterior of the polygon that is furthest away from point P (among all the exterior vertices). One can prove that any vertex of the holes will be closer to point P than vertex V. Similarly we can prove that any point in the interior of the polygin is closer to P than vertex V. Therefore we only need to compare with the exterior.
// - The point P is contained in the exterior and inside a hole of polygon G. One can again prove that the furthest point in the polygon from P is one of the vertices of the exterior.
// - The point P is contained in the polygon. One can again prove the same property.
// So we only need to compare with the exterior.


pkg/geo/geodist/geodist.go, line 156 at r4 (raw file):

// onPointToEdgesExceptFirstEdgeStart updates the distance using the edges between a point and a shape.
// It will only check the ends of the edges, and assumes the check against .Edge(0).V0 is not required.

The "It will only check the ends of the edges" doesn't reflect the changed function name from the previous PR.


pkg/geo/geodist/geodist.go, line 164 at r4 (raw file):

			return true
		}
		if !c.DistanceUpdater().IsMaxDistance() {

Can you add a comment before the if-block
// Max distance between a point and the set of points representing an edge is the maximum distance from the point and the pair of end-points of the edge, so we don't need update the distance using the projected point.


pkg/geo/geodist/geodist.go, line 217 at r4 (raw file):

		crosser := c.NewEdgeCrosser(aEdge, b.Edge(0).V0)
		for bEdgeIdx := 0; bEdgeIdx < b.NumEdges(); bEdgeIdx++ {
			bEdge := b.Edge(bEdgeIdx)

// Max distance between 2 edges is the maximum of the distance across pairs of vertices chosen from each edge.
// It does not matter whether the edges cross, so we skip that check.


pkg/geo/geodist/geodist.go, line 264 at r4 (raw file):

	// and the exterior ring.
	// We use the first point of the linestring for this check.
	// If we are looking for max distance, we only need to check against the outer ring anyway.

Let's update this comment as
// Min Distance:
// ....
// Max Distance:
// The max distance is the distance between the line string and the exterior ring of the polygon (using the same reasoning as described in onPointToPolygon()


pkg/geo/geodist/geodist.go, line 318 at r4 (raw file):

	// As such, we only need to compare the exterior rings if we detect this.
	//
	// If we are only looking at the max distance, we only want to compare exteriors.

Like my comment on onPointToPolygon this is non-obvious. Do you happen to have a reference?


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

See above comment.

How about:

// DFullyWithin determines whether the maximum distance across every pair of points comprising geometries A and B is within D units.


pkg/geo/geomfn/distance.go, line 31 at r4 (raw file):

}

// MaxDistance returns the maximum distance between geometries A and B.

How about:
// MaxDistance returns the maximum distance across every pair of points comprising geometries A and B.

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


docs/generated/sql/functions.md, line 753 at r4 (raw file):

Previously, sumeerbhola wrote…

This is open to interpretation. How about:
Returns true if every pair of points comprising geometry_a and geometry_b respectively, are within distance units.

Done.


docs/generated/sql/functions.md, line 870 at r4 (raw file):

Previously, sumeerbhola wrote…

Can we make this more specific. Something like:
Returns the maximum distance across every pair of points comprising the given geometries. Note if the ...

Done.


pkg/geo/geodist/geodist.go, line 192 at r3 (raw file):

Previously, sumeerbhola wrote…

This is somewhat nuanced and deserves a comment. I managed to convince myself based on breaking down the cases as follows, but if you can find a reference that would be better.
// When computing the maximum distance, the cases are:
// - The point P is not contained in the exterior of the polygon G. Say vertex V is the vertex of the exterior of the polygon that is furthest away from point P (among all the exterior vertices). One can prove that any vertex of the holes will be closer to point P than vertex V. Similarly we can prove that any point in the interior of the polygin is closer to P than vertex V. Therefore we only need to compare with the exterior.
// - The point P is contained in the exterior and inside a hole of polygon G. One can again prove that the furthest point in the polygon from P is one of the vertices of the exterior.
// - The point P is contained in the polygon. One can again prove the same property.
// So we only need to compare with the exterior.

I don't think it's a reference and it seems somewhat intuitive to me (I've been starting at it for a long time HAH!) -- I'll add this comment.


pkg/geo/geodist/geodist.go, line 226 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

See first answer.

Done.


pkg/geo/geodist/geodist.go, line 265 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

See answer above.

Done.


pkg/geo/geodist/geodist.go, line 156 at r4 (raw file):

Previously, sumeerbhola wrote…

The "It will only check the ends of the edges" doesn't reflect the changed function name from the previous PR.

How so? I've updated it slightly.


pkg/geo/geodist/geodist.go, line 164 at r4 (raw file):

Previously, sumeerbhola wrote…

Can you add a comment before the if-block
// Max distance between a point and the set of points representing an edge is the maximum distance from the point and the pair of end-points of the edge, so we don't need update the distance using the projected point.

Done.


pkg/geo/geodist/geodist.go, line 217 at r4 (raw file):

Previously, sumeerbhola wrote…

// Max distance between 2 edges is the maximum of the distance across pairs of vertices chosen from each edge.
// It does not matter whether the edges cross, so we skip that check.

Done.


pkg/geo/geodist/geodist.go, line 264 at r4 (raw file):

Previously, sumeerbhola wrote…

Let's update this comment as
// Min Distance:
// ....
// Max Distance:
// The max distance is the distance between the line string and the exterior ring of the polygon (using the same reasoning as described in onPointToPolygon()

Done.


pkg/geo/geodist/geodist.go, line 318 at r4 (raw file):

Previously, sumeerbhola wrote…

Like my comment on onPointToPolygon this is non-obvious. Do you happen to have a reference?

Nope. But your analysis is correct - I've updated the comment


pkg/geo/geomfn/distance.go, line 55 at r2 (raw file):

Previously, sumeerbhola wrote…

How about:

// DFullyWithin determines whether the maximum distance across every pair of points comprising geometries A and B is within D units.

Done.


pkg/geo/geomfn/distance.go, line 31 at r4 (raw file):

Previously, sumeerbhola wrote…

How about:
// MaxDistance returns the maximum distance across every pair of points comprising geometries A and B.

Done.

@otan otan force-pushed the implement_dfullywithin branch 2 times, most recently from 1e74871 to 7eb6238 Compare May 20, 2020 01:36
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)


pkg/geo/geodist/geodist.go, line 192 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

I don't think it's a reference and it seems somewhat intuitive to me (I've been starting at it for a long time HAH!) -- I'll add this comment.

I've found intuition (both mine and others) to be incorrect enough times. Given the complexity of holes and concave polygons, intuition can be tripped up here. But this is fine for now.


pkg/geo/geodist/geodist.go, line 167 at r5 (raw file):

		// The max distance between a point and the set of points representing an edge is the
		// maximum distance from the point and the pair of end-points of the edge, so we don't
		// need update the distance using the projected point.

... need to update ...

MaxDistance/DFullyWithin is a specialisation of distance, where it takes
the maximum of distances found instead and disregards intersections and
closest points. The `stopAfterLE` condition becomes `stopAfterGT`.

Release note (sql change): Implemented the ST_MaxDistance and
ST_DFullyWithin function for geometries.
@otan
Copy link
Contributor Author

otan commented May 21, 2020

tftr!

bors r=sumeerbhola

@craig
Copy link
Contributor

craig bot commented May 21, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants