-
Notifications
You must be signed in to change notification settings - Fork 748
[GH-1979] Fix ST_Envelope and ST_Envelope_Aggr empty geometry handling #2622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cf0ccf9 to
a581107
Compare
a581107 to
04f06cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Fixes empty-geometry handling for ST_Envelope and ST_Envelope_Aggr to match PostGIS semantics (skip empty geometries during aggregation; return NULL when no non-empty inputs contribute an extent).
Changes:
- Update Spark/Flink/Snowflake aggregate implementations to ignore empty geometries and return
NULLwhen appropriate. - Add/adjust tests across Spark, Flink, and common modules to validate empty/mixed-input behavior.
- Update docs to clarify that empty geometries and
NULLs are skipped and all-empty/all-null returnsNULL.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala | Adds Spark SQL tests for ST_Envelope_Aggr empty/mixed inputs; updates null+empty expectation to NULL. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/AggregateFunctions.scala | Skips empty geometries during ST_Envelope_Aggr reduction in Spark. |
| snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Aggr.java | Skips empty geometries in Snowflake aggregator and avoids emitting output when buffer is null. |
| snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Agg.java | Same as above for the alternate Snowflake aggregate implementation. |
| flink/src/test/java/org/apache/sedona/flink/AggregatorTest.java | Adds Flink test ensuring ST_Envelope_Aggr over all-empty returns NULL. |
| flink/src/main/java/org/apache/sedona/flink/expressions/Aggregators.java | Skips empty geometries and returns NULL when accumulator has no extent in Flink. |
| docs/api/sql/AggregateFunction.md | Documents skip-empty/skip-null and all-empty/all-null returns NULL behavior. |
| docs/api/snowflake/vector-data/AggregateFunction.md | Same documentation update for Snowflake docs. |
| docs/api/flink/Aggregator.md | Same documentation update for Flink docs. |
| common/src/test/java/org/apache/sedona/common/FunctionsTest.java | Adds tests verifying scalar ST_Envelope returns same-type EMPTY for empty inputs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Geometry geometry = GeometrySerde.deserialize(geom); | ||
| if (geometry.isEmpty()) { |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
process calls geometry.isEmpty() without guarding for geom == null (SQL NULL input) or geometry == null (if deserialization yields null). This can throw NPE and contradict the documented “null values are skipped” behavior. Add a null check (e.g., return Stream.empty() when geom/geometry is null) before calling isEmpty().
| Geometry geometry = GeometrySerde.deserialize(geom); | |
| if (geometry.isEmpty()) { | |
| // Skip null input geometries (SQL NULL) as per "null values are skipped" behavior. | |
| if (geom == null) { | |
| return Stream.empty(); | |
| } | |
| Geometry geometry = GeometrySerde.deserialize(geom); | |
| // Skip if deserialization yields null or an empty geometry. | |
| if (geometry == null || geometry.isEmpty()) { |
|
|
||
| public Stream<OutputRow> endPartition() { | ||
| if (buffer == null || buffer.isNull()) { | ||
| return Stream.empty(); |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning Stream.empty() from endPartition() may produce no output row rather than a single row whose value is NULL, depending on how this UDTF is wrapped/queried. If the intended SQL semantics are “aggregate result is NULL”, consider emitting one OutputRow with a null geometry payload instead of emitting zero rows (or ensure the wrapper layer converts empty-stream to NULL reliably).
| return Stream.empty(); | |
| // No non-empty geometries were seen; emit a single row with a NULL envelope | |
| return Stream.of(new OutputRow(null)); |
| Geometry geometry = GeometrySerde.deserialize(geom); | ||
| if (geometry.isEmpty()) { |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as ST_Envelope_Aggr: geometry.isEmpty() can NPE for SQL NULL inputs (or null deserialization). Guard geom/geometry for null to correctly “skip null values”.
| Geometry geometry = GeometrySerde.deserialize(geom); | |
| if (geometry.isEmpty()) { | |
| if (geom == null) { | |
| // Skip SQL NULL inputs | |
| return Stream.empty(); | |
| } | |
| Geometry geometry = GeometrySerde.deserialize(geom); | |
| if (geometry == null || geometry.isEmpty()) { | |
| // Skip null or empty geometries |
flink/src/main/java/org/apache/sedona/flink/expressions/Aggregators.java
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testEnvelop_Aggr_EmptyGeometries() { |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in the new test name: testEnvelop_Aggr_EmptyGeometries is missing the “e” in “Envelope”. Renaming to testEnvelope_Aggr_EmptyGeometries improves clarity and consistency with the function name.
| public void testEnvelop_Aggr_EmptyGeometries() { | |
| public void testEnvelope_Aggr_EmptyGeometries() { |
spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala
Show resolved
Hide resolved
04f06cc to
6c0843c
Compare
ST_Envelope (scalar): Already correctly returns same-type EMPTY geometry for empty inputs (matching PostGIS). Added test to verify this behavior. ST_Envelope_Aggr (aggregate): Skip empty geometries during accumulation and return null when all inputs are empty, matching PostGIS ST_Extent behavior. Fixed in Spark (AggregateFunctions.scala), Flink (Aggregators.java), and Snowflake (ST_Envelope_Aggr.java, ST_Envelope_Agg.java). Added tests in Spark (aggregateFunctionTestScala: all-empty returns null, mixed empty/non-empty preserves valid envelope), Flink (AggregatorTest: all-empty returns null), and common (FunctionsTest: scalar envelope with empty geometries). Fixes #1979
6c0843c to
3d87f1d
Compare
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes ST_Envelope and ST_Envelope_Aggr have inconsistent edge case handling with PostGIS #1979What changes were proposed in this PR?
ST_Envelope (scalar): Already correctly returns same-type EMPTY geometry for empty inputs (matching PostGIS). Added test to verify this behavior.
ST_Envelope_Aggr (aggregate): Skip empty geometries during accumulation and return null when all inputs are empty, matching PostGIS ST_Extent behavior. Fixed in Spark (AggregateFunctions.scala), Flink (Aggregators.java), and Snowflake (ST_Envelope_Aggr.java, ST_Envelope_Agg.java).
PostGIS behavior: ST_Extent is a standard SQL aggregate. Like all SQL aggregates (SUM, AVG, etc.), it skips NULL values. If all values are NULL, it returns NULL. For empty geometries, PostGIS treats them as valid non-null geometries whose envelope is empty — ST_Extent over only empty geometries returns NULL because empty geometries have no extent to contribute (their internal envelope is "null"/empty in JTS terms).
This is actually the standard behavior:
All NULL inputs → NULL (standard SQL aggregate behavior)
All empty geometry inputs → NULL (empty geometries have no spatial extent)
Mixed NULL + empty inputs → NULL (same reason — no geometry has a spatial extent)
Mixed empty + non-empty → envelope of the non-empty geometries only
How was this patch tested?
Added tests in Spark (aggregateFunctionTestScala: all-empty returns null, mixed empty/non-empty preserves valid envelope), Flink (AggregatorTest: all-empty returns null), and common (FunctionsTest: scalar envelope with empty geometries).
Did this PR include necessary documentation updates?