[GH-2971] Add ST_DWithin(Box2D, Box2D, distance) overload #2969

Merged: jiayuasu merged 1 commit into apache:master from jiayuasu:feature/box2d-dwithin on May 19, 2026

@jiayuasu (Member) commented May 18, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Adds a planar Box2D × Box2D overload to ST_DWithin. Distance joins on Box2D columns are now accepted at analysis and routed through the existing distance-join planning machinery (broadcast-index or partition-based DistanceJoinExec), with no new physical operator.

Scalar implementation

Predicates.dWithin(Box2D, Box2D, double) in common:

  • Computes the closed-interval Euclidean distance between two AABBs as sqrt(dx² + dy²) where dx = max(0, max(a.xmin - b.xmax, b.xmin - a.xmax)) and similarly for dy. Overlapping or edge/corner-touching boxes have distance 0 and therefore match any non-negative radius.
  • Returns false early, without computing a square root, if either axis delta already exceeds the supplied radius; otherwise the squared distance is compared against the squared radius.
  • Negative radius never matches (consistent with how JTS Geometry.isWithinDistance treats negative distance).
  • Inverted bounds (xmin > xmax or ymin > ymax) raise the same IllegalArgumentException raised by ST_BoxIntersects / ST_BoxContains. Inverted-bound values are reserved for a future antimeridian-wraparound semantics; planar predicates have no defined meaning on them.
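
For readers who want to see the arithmetic described above, here is a minimal Java sketch of the box-to-box test. It is not the code from this PR: the `Box` class, its field names, the exception message, and the order in which validation and the negative-radius check run are all assumptions made for illustration.

```java
// Illustrative sketch only; not Sedona's Predicates.dWithin.
final class Box2DDistanceSketch {

    /** Hypothetical stand-in for Sedona's Box2D; field names are assumptions. */
    static final class Box {
        final double xmin, ymin, xmax, ymax;

        Box(double xmin, double ymin, double xmax, double ymax) {
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }
    }

    /** Rejects inverted bounds, mirroring the behaviour described in the PR text. */
    static void requireOrdered(Box b) {
        if (b.xmin > b.xmax || b.ymin > b.ymax) {
            throw new IllegalArgumentException("inverted Box2D bounds"); // message is an assumption
        }
    }

    /** True when the closed-interval Euclidean distance between a and b is at most radius. */
    static boolean dWithin(Box a, Box b, double radius) {
        requireOrdered(a);
        requireOrdered(b);
        if (radius < 0) {
            return false; // a negative radius never matches
        }
        // Per-axis gap; 0 when the intervals overlap or touch.
        double dx = Math.max(0.0, Math.max(a.xmin - b.xmax, b.xmin - a.xmax));
        double dy = Math.max(0.0, Math.max(a.ymin - b.ymax, b.ymin - a.ymax));
        if (dx > radius || dy > radius) {
            return false; // one axis alone is already too far
        }
        // Compare squared values so no sqrt is needed.
        return dx * dx + dy * dy <= radius * radius;
    }
}
```

With these definitions, boxes (0, 0, 1, 1) and (4, 5, 6, 7) have axis gaps 3 and 4, so they match at radius 5 but not at radius 4.9.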

Catalyst wiring

ST_DWithin in spark/common/.../Predicates.scala: added a fourth inferrableFunction3 entry typed (Box2D, Box2D, Double) => Boolean. The pre-existing 3-arg geometry entry needed an explicit lambda because there are now two arity-3 Java overloads named Predicates.dWithin and Scala can't pick between them through eta-expansion alone.

Join planner

No code change needed. JoinQueryDetector's ST_DWithin(Seq(left, right, distance)) case already produces a JoinQueryDetection with SpatialPredicate.INTERSECTS and the per-row distance, and OptimizableJoinCondition.isOptimizablePredicate already accepts ST_DWithin regardless of operand types. The expansion-then-index pipeline runs through TraitJoinQueryBase.toExpandedEnvelopeRDD, which uses the Box2D → polygon dispatch from #2939. Per-pair refine then dispatches back to the new Box2D overload in this PR.
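
The expand-then-refine idea can be sketched in a few lines of Java. This is only an illustration of the general technique, not Sedona's `toExpandedEnvelopeRDD` or `DistanceJoinExec`: the class name, the `double[]` box layout `{xmin, ymin, xmax, ymax}`, and the choice to expand the left-hand box are assumptions. It shows why the per-pair refine step is still needed: the expanded-envelope intersection test admits corner pairs whose true distance exceeds the radius.

```java
// Illustrative sketch only; boxes are {xmin, ymin, xmax, ymax} (assumed layout).
final class ExpandThenRefineSketch {

    /** Grows a box by d on every side, as an index-friendly candidate filter. */
    static double[] expand(double[] box, double d) {
        return new double[] {box[0] - d, box[1] - d, box[2] + d, box[3] + d};
    }

    /** Closed-interval intersection test between two boxes. */
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }

    /** Filter step: cheap, may return false positives near corners. */
    static boolean candidate(double[] a, double[] b, double d) {
        return intersects(expand(a, d), b);
    }

    /** Refine step: exact closed-interval Euclidean distance check. */
    static boolean refine(double[] a, double[] b, double d) {
        double dx = Math.max(0.0, Math.max(a[0] - b[2], b[0] - a[2]));
        double dy = Math.max(0.0, Math.max(a[1] - b[3], b[1] - a[3]));
        return d >= 0 && dx * dx + dy * dy <= d * d;
    }

    /** A pair joins only if it survives both the filter and the refine step. */
    static boolean joins(double[] a, double[] b, double d) {
        return candidate(a, b, d) && refine(a, b, d);
    }
}
```

For boxes (0, 0, 1, 1) and (4, 5, 6, 7) at radius 4.5, the filter accepts the pair because both axis gaps (3 and 4) are within 4.5, but the refine step rejects it because the actual distance is 5.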

How was this patch tested?

  • PredicatesTest: new testDWithinBox2D (overlap, edge-touching, corner-touching, separation on one axis, Pythagorean separation, negative radius) and testDWithinBox2DRejectInvertedBounds. 15/15 pass locally.
  • Box2DJoinSuite: four new tests covering broadcast index join at radius 1.0 and 6.0, non-broadcast DistanceJoinExec at radius 6.0, and zero-radius edge-touching. 12/12 pass locally (8 pre-existing + 4 new).
  • Regression run: BroadcastIndexJoinSuite + SpatialJoinSuite + KnnJoinSuite 254/254 still pass.
  • mkdocs build --strict is clean modulo pre-existing unrelated warnings.

Did this PR include necessary documentation updates?

Yes — new docs/api/sql/box2d/Box2D-Predicates/ST_DWithin.md page and a row in the Box2D-Functions.md predicates table.

Closes the last remaining gap from the spatial-join work in apache#2939: until
now `ST_DWithin` only accepted (Geometry, Geometry, d) and (Geography,
Geography, d), so distance joins on Box2D columns were rejected at
analysis. This PR adds a planar (Box2D, Box2D, double) overload that
computes the closed-interval AABB-to-AABB Euclidean distance.

- `Predicates.dWithin(Box2D, Box2D, double)` in `common`: closed-interval
  distance test. Overlapping or edge/corner-touching boxes have distance
  0 and match for any non-negative radius. Inverted bounds throw the
  same IllegalArgumentException raised by ST_BoxIntersects /
  ST_BoxContains (reserved for future antimeridian wraparound).
- `ST_DWithin` in `Predicates.scala`: registered the Box2D overload as a
  fourth `inferrableFunction3` entry. The pre-existing 3-arg geometry
  entry needed an explicit lambda to disambiguate now that there are
  two arity-3 Java overloads named `Predicates.dWithin`.
- The join planner needs no changes: it already routes ST_DWithin
  through `toExpandedEnvelopeRDD`, which uses the Box2D → polygon
  dispatch landed in apache#2939.
- Tests: scalar coverage in `PredicatesTest` (overlap, edge/corner
  touching, separation on one axis, Pythagorean separation, negative
  radius, inverted-bound rejection); join coverage in `Box2DJoinSuite`
  (BroadcastIndexJoinExec for radius=1.0 and radius=6.0; DistanceJoinExec
  for the non-broadcast path; zero-radius edge-touching).
- Docs: new `docs/api/sql/box2d/Box2D-Predicates/ST_DWithin.md` and a
  row in the Box2D-Functions.md predicates table.

Verified locally: PredicatesTest 15/15, Box2DJoinSuite 12/12, regression
across BroadcastIndexJoinSuite + SpatialJoinSuite + KnnJoinSuite 254/254.
mkdocs --strict build is clean modulo pre-existing unrelated warnings.

Copilot AI (Contributor) left a comment

Pull request overview

This PR extends Sedona’s ST_DWithin predicate to support planar distance checks between two Box2D values, enabling ST_DWithin(box_col_a, box_col_b, d) to be recognized and optimized as a distance join (broadcast-index or partition-based) in the existing Spark join planner.

Changes:

  • Add Predicates.dWithin(Box2D, Box2D, double) with ordered-bound validation and a squared-distance implementation.
  • Wire a new (Box2D, Box2D, Double) => Boolean overload into Spark SQL ST_DWithin expression inference (and disambiguate the existing Geometry 3-arg overload).
  • Add unit + Spark-plan tests and Box2D SQL documentation for the new overload.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| common/src/main/java/org/apache/sedona/common/Predicates.java | Adds the Box2D×Box2D dWithin implementation with inverted-bound validation and squared-distance comparison. |
| common/src/test/java/org/apache/sedona/common/PredicatesTest.java | Adds unit tests covering Box2D dWithin semantics (touching/overlap, separations, negative radius, inverted bounds). |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Predicates.scala | Adds Spark SQL expression inference entry for (Box2D, Box2D, Double) and resolves Scala overload ambiguity for Geometry dWithin. |
| spark/common/src/test/scala/org/apache/sedona/sql/Box2DJoinSuite.scala | Adds Spark join-planning tests verifying ST_DWithin on Box2D routes to BroadcastIndexJoinExec / DistanceJoinExec and returns correct counts. |
| docs/api/sql/box2d/Box2D-Predicates/ST_DWithin.md | New Box2D-specific ST_DWithin documentation page, including optimizer behavior and error semantics. |
| docs/api/sql/box2d/Box2D-Functions.md | Adds ST_DWithin to the Box2D predicates table. |

@jiayuasu changed the title from "Add ST_DWithin(Box2D, Box2D, distance) overload" to "[GH-2971] Add ST_DWithin(Box2D, Box2D, distance) overload" on May 19, 2026
@jiayuasu added this to the sedona-1.9.1 milestone on May 19, 2026
@jiayuasu merged commit 2a523e7 into apache:master on May 19, 2026
44 checks passed
@jiayuasu deleted the feature/box2d-dwithin branch on May 19, 2026 05:02