[GH-2971] Add ST_DWithin(Box2D, Box2D, distance) overload #2969
Merged
Closes the last remaining gap from the spatial-join work in apache#2939: until now `ST_DWithin` only accepted (Geometry, Geometry, d) and (Geography, Geography, d), so distance joins on Box2D columns were rejected at analysis. This PR adds a planar (Box2D, Box2D, double) overload that computes the closed-interval AABB-to-AABB Euclidean distance.
- `Predicates.dWithin(Box2D, Box2D, double)` in `common`: closed-interval distance test. Overlapping or edge/corner-touching boxes have distance 0 and match for any non-negative radius. Inverted bounds throw the same `IllegalArgumentException` raised by `ST_BoxIntersects` / `ST_BoxContains` (inverted bounds are reserved for future antimeridian wraparound).
- `ST_DWithin` in `Predicates.scala`: registered the Box2D overload as a fourth `inferrableFunction3` entry. The pre-existing 3-arg geometry entry needed an explicit lambda to disambiguate now that there are two arity-3 Java overloads named `Predicates.dWithin`.
- The join planner needs no changes: it already routes `ST_DWithin` through `toExpandedEnvelopeRDD`, which uses the Box2D → polygon dispatch landed in apache#2939.
- Tests: scalar coverage in `PredicatesTest` (overlap, edge/corner touching, separation on one axis, Pythagorean separation, negative radius, inverted-bound rejection); join coverage in `Box2DJoinSuite` (`BroadcastIndexJoinExec` for radius=1.0 and radius=6.0; `DistanceJoinExec` for the non-broadcast path; zero-radius edge-touching).
- Docs: new `docs/api/sql/box2d/Box2D-Predicates/ST_DWithin.md` and a row in the `Box2D-Functions.md` predicates table.
Verified locally: `PredicatesTest` 15/15, `Box2DJoinSuite` 12/12, regression across `BroadcastIndexJoinSuite` + `SpatialJoinSuite` + `KnnJoinSuite` 254/254. `mkdocs build --strict` is clean modulo pre-existing unrelated warnings.
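The closed-interval distance test described above can be sketched in a few lines of Java. This is an illustration of the stated semantics, not the code merged in this PR: the `Box` class is a hypothetical stand-in for Sedona's `Box2D`, the class and helper names are invented for the example, and both the `false` result for a negative radius and the order of the checks are assumptions.

```java
// Illustrative sketch only. Box is a hypothetical stand-in for Sedona's Box2D.
final class Box2DDistanceSketch {

    /** Hypothetical axis-aligned box; field names mirror the PR description. */
    static final class Box {
        final double xmin, ymin, xmax, ymax;

        Box(double xmin, double ymin, double xmax, double ymax) {
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }
    }

    // Planar predicates have no defined meaning on inverted bounds, so reject them.
    static void requireOrdered(Box b) {
        if (b.xmin > b.xmax || b.ymin > b.ymax) {
            throw new IllegalArgumentException("inverted Box2D bounds");
        }
    }

    static boolean dWithin(Box a, Box b, double radius) {
        requireOrdered(a);
        requireOrdered(b);
        // Assumption: a negative radius never matches.
        if (radius < 0) {
            return false;
        }
        // Per-axis gap between the boxes; 0 when they overlap or touch on that axis.
        double dx = Math.max(0.0, Math.max(a.xmin - b.xmax, b.xmin - a.xmax));
        double dy = Math.max(0.0, Math.max(a.ymin - b.ymax, b.ymin - a.ymax));
        // Compare squared values so no sqrt is needed; <= keeps the interval closed.
        return dx * dx + dy * dy <= radius * radius;
    }

    public static void main(String[] args) {
        Box a = new Box(0, 0, 1, 1);
        Box touching = new Box(1, 0, 2, 1);  // shares an edge with a
        Box diagonal = new Box(4, 5, 6, 7);  // gaps of 3 and 4, so distance 5
        System.out.println(dWithin(a, touching, 0.0));
        System.out.println(dWithin(a, diagonal, 5.0));
        System.out.println(dWithin(a, diagonal, 4.9));
    }
}
```

Comparing squared values keeps the predicate exact at the boundary for cases such as the 3-4-5 separation, where taking a square root would add nothing but rounding.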
Pull request overview
This PR extends Sedona’s ST_DWithin predicate to support planar distance checks between two Box2D values, enabling ST_DWithin(box_col_a, box_col_b, d) to be recognized and optimized as a distance join (broadcast-index or partition-based) in the existing Spark join planner.
Changes:
- Add `Predicates.dWithin(Box2D, Box2D, double)` with ordered-bound validation and a squared-distance implementation.
- Wire a new `(Box2D, Box2D, Double) => Boolean` overload into Spark SQL `ST_DWithin` expression inference (and disambiguate the existing Geometry 3-arg overload).
- Add unit + Spark-plan tests and Box2D SQL documentation for the new overload.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| common/src/main/java/org/apache/sedona/common/Predicates.java | Adds the Box2D×Box2D dWithin implementation with inverted-bound validation and squared-distance comparison. |
| common/src/test/java/org/apache/sedona/common/PredicatesTest.java | Adds unit tests covering Box2D dWithin semantics (touching/overlap, separations, negative radius, inverted bounds). |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Predicates.scala | Adds Spark SQL expression inference entry for (Box2D, Box2D, Double) and resolves Scala overload ambiguity for Geometry dWithin. |
| spark/common/src/test/scala/org/apache/sedona/sql/Box2DJoinSuite.scala | Adds Spark join-planning tests verifying ST_DWithin on Box2D routes to BroadcastIndexJoinExec / DistanceJoinExec and returns correct counts. |
| docs/api/sql/box2d/Box2D-Predicates/ST_DWithin.md | New Box2D-specific ST_DWithin documentation page, including optimizer behavior and error semantics. |
| docs/api/sql/box2d/Box2D-Functions.md | Adds ST_DWithin to the Box2D predicates table. |
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[GH-XXX] my subject`. Closes #2971 (Add ST_DWithin(Box2D, Box2D, distance) overload).
What changes were proposed in this PR?
Adds a planar `Box2D × Box2D` overload to `ST_DWithin`. Distance joins on Box2D columns are now accepted at analysis and routed through the existing distance-join planning machinery (broadcast-index or partition-based `DistanceJoinExec`), with no new physical operator.
Scalar implementation
`Predicates.dWithin(Box2D, Box2D, double)` in `common`:
- Closed-interval AABB-to-AABB Euclidean distance: `sqrt(dx² + dy²)`, where `dx = max(0, max(a.xmin - b.xmax, b.xmin - a.xmax))` and similarly for `dy`. Overlapping or edge/corner-touching boxes have distance `0` and therefore match any non-negative radius.
- The comparison is made on squared distances (no `sqrt`).
- A negative radius is handled the way `Geometry.isWithinDistance` treats negative distance.
- Inverted bounds (`xmin > xmax` or `ymin > ymax`) raise the same `IllegalArgumentException` raised by `ST_BoxIntersects` / `ST_BoxContains`. Inverted-bound values are reserved for future antimeridian-wraparound semantics; planar predicates have no defined meaning on them.
Catalyst wiring
`ST_DWithin` in `spark/common/.../Predicates.scala`: added a fourth `inferrableFunction3` entry typed `(Box2D, Box2D, Double) => Boolean`. The pre-existing 3-arg geometry entry needed an explicit lambda because there are now two arity-3 Java overloads named `Predicates.dWithin` and Scala can't pick between them through eta-expansion alone.
Join planner
No code change needed.
`JoinQueryDetector`'s `ST_DWithin(Seq(left, right, distance))` case already produces a `JoinQueryDetection` with `SpatialPredicate.INTERSECTS` and the per-row distance, and `OptimizableJoinCondition.isOptimizablePredicate` already accepts `ST_DWithin` regardless of operand types. The expansion-then-index pipeline runs through `TraitJoinQueryBase.toExpandedEnvelopeRDD`, which uses the Box2D → polygon dispatch from #2939. The per-pair refine step then dispatches back to the new Box2D overload added in this PR.
How was this patch tested?
- `PredicatesTest`: new `testDWithinBox2D` (overlap, edge-touching, corner-touching, separation on one axis, Pythagorean separation, negative radius) and `testDWithinBox2DRejectInvertedBounds`. 15/15 pass locally.
- `Box2DJoinSuite`: four new tests covering broadcast index join at radius 1.0 and 6.0, non-broadcast `DistanceJoinExec` at radius 6.0, and zero-radius edge-touching. 12/12 pass locally (8 pre-existing + 4 new).
- Regression: `BroadcastIndexJoinSuite` + `SpatialJoinSuite` + `KnnJoinSuite` 254/254 still pass.
- `mkdocs build --strict` is clean modulo pre-existing unrelated warnings.
Did this PR include necessary documentation updates?
Yes: new `docs/api/sql/box2d/Box2D-Predicates/ST_DWithin.md` page and a row in the `Box2D-Functions.md` predicates table.
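For readers unfamiliar with the expansion-then-index pipeline mentioned under "Join planner", the sketch below shows the general filter-and-refine idea in plain Java. It is not Sedona's planner code: boxes are bare `double[]{xmin, ymin, xmax, ymax}` arrays, every name is hypothetical, expanding the left operand is an arbitrary choice for the example, and the radius is assumed to be non-negative. Expanding a box by `d` and testing intersection keeps every pair whose per-axis gaps are both at most `d`, which is a superset of the pairs within Euclidean distance `d`, so an exact per-pair check is still needed afterwards.

```java
// Illustrative sketch only. Boxes are double[]{xmin, ymin, xmax, ymax},
// a hypothetical stand-in for Sedona's Box2D; none of this is planner code.
final class ExpandThenRefineSketch {

    // Grow a box by d on every side (d is assumed to be non-negative).
    static double[] expand(double[] b, double d) {
        return new double[] {b[0] - d, b[1] - d, b[2] + d, b[3] + d};
    }

    // Closed-interval intersection: boxes that merely touch still intersect.
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }

    // Exact AABB-to-AABB distance test on squared values.
    static boolean dWithin(double[] a, double[] b, double d) {
        if (d < 0) {
            return false; // assumption: a negative radius never matches
        }
        double dx = Math.max(0.0, Math.max(a[0] - b[2], b[0] - a[2]));
        double dy = Math.max(0.0, Math.max(a[1] - b[3], b[1] - a[3]));
        return dx * dx + dy * dy <= d * d;
    }

    // Filter step: cheap and conservative, may admit false positives.
    static boolean isCandidate(double[] a, double[] b, double d) {
        return intersects(expand(a, d), b);
    }

    // Filter followed by the exact refine step.
    static boolean matches(double[] a, double[] b, double d) {
        return isCandidate(a, b, d) && dWithin(a, b, d);
    }

    public static void main(String[] args) {
        double[] a = {0, 0, 1, 1};
        double[] b = {4, 5, 6, 7}; // gaps of 3 and 4, so Euclidean distance 5
        System.out.println(isCandidate(a, b, 4.5)); // filter result for d = 4.5
        System.out.println(matches(a, b, 4.5));     // result after the exact refine
    }
}
```

With these inputs and `d = 4.5`, the pair passes the filter because both per-axis gaps are below 4.5, but the refine step rejects it because the diagonal distance is 5. This is the reason a per-pair refine is required after the index lookup.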