Skip to content

Conversation

@jiayuasu
Copy link
Member

@jiayuasu jiayuasu commented Feb 7, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

ST_Envelope (scalar): Already correctly returns same-type EMPTY geometry for empty inputs (matching PostGIS). Added test to verify this behavior.

ST_Envelope_Aggr (aggregate): Skip empty geometries during accumulation and return null when all inputs are empty, matching PostGIS ST_Extent behavior. Fixed in Spark (AggregateFunctions.scala), Flink (Aggregators.java), and Snowflake (ST_Envelope_Aggr.java, ST_Envelope_Agg.java).

PostGIS behavior: ST_Extent is a standard SQL aggregate. Like all SQL aggregates (SUM, AVG, etc.), it skips NULL values. If all values are NULL, it returns NULL. For empty geometries, PostGIS treats them as valid non-null geometries whose envelope is empty — ST_Extent over only empty geometries returns NULL because empty geometries have no extent to contribute (their internal envelope is "null"/empty in JTS terms).

This is actually the standard behavior:

All NULL inputs → NULL (standard SQL aggregate behavior)
All empty geometry inputs → NULL (empty geometries have no spatial extent)
Mixed NULL + empty inputs → NULL (same reason — no geometry has a spatial extent)
Mixed empty + non-empty → envelope of the non-empty geometries only

How was this patch tested?

Added tests in Spark (aggregateFunctionTestScala: all-empty returns null, mixed empty/non-empty preserves valid envelope), Flink (AggregatorTest: all-empty returns null), and common (FunctionsTest: scalar envelope with empty geometries).

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

@jiayuasu jiayuasu force-pushed the fix/1979-envelope-empty branch from cf0ccf9 to a581107 Compare February 8, 2026 00:00
@jiayuasu jiayuasu marked this pull request as draft February 8, 2026 00:03
@jiayuasu jiayuasu force-pushed the fix/1979-envelope-empty branch from a581107 to 04f06cc Compare February 8, 2026 00:05
@github-actions github-actions bot added the docs label Feb 8, 2026
@jiayuasu jiayuasu marked this pull request as ready for review February 8, 2026 00:07
@jiayuasu jiayuasu requested a review from Copilot February 8, 2026 00:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes empty-geometry handling for ST_Envelope and ST_Envelope_Aggr to match PostGIS semantics (skip empty geometries during aggregation; return NULL when no non-empty inputs contribute an extent).

Changes:

  • Update Spark/Flink/Snowflake aggregate implementations to ignore empty geometries and return NULL when appropriate.
  • Add/adjust tests across Spark, Flink, and common modules to validate empty/mixed-input behavior.
  • Update docs to clarify that empty geometries and NULLs are skipped and all-empty/all-null returns NULL.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala Adds Spark SQL tests for ST_Envelope_Aggr empty/mixed inputs; updates null+empty expectation to NULL.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/AggregateFunctions.scala Skips empty geometries during ST_Envelope_Aggr reduction in Spark.
snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Aggr.java Skips empty geometries in Snowflake aggregator and avoids emitting output when buffer is null.
snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/udtfs/ST_Envelope_Agg.java Same as above for the alternate Snowflake aggregate implementation.
flink/src/test/java/org/apache/sedona/flink/AggregatorTest.java Adds Flink test ensuring ST_Envelope_Aggr over all-empty returns NULL.
flink/src/main/java/org/apache/sedona/flink/expressions/Aggregators.java Skips empty geometries and returns NULL when accumulator has no extent in Flink.
docs/api/sql/AggregateFunction.md Documents skip-empty/skip-null and all-empty/all-null returns NULL behavior.
docs/api/snowflake/vector-data/AggregateFunction.md Same documentation update for Snowflake docs.
docs/api/flink/Aggregator.md Same documentation update for Flink docs.
common/src/test/java/org/apache/sedona/common/FunctionsTest.java Adds tests verifying scalar ST_Envelope returns same-type EMPTY for empty inputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 52 to 53
Geometry geometry = GeometrySerde.deserialize(geom);
if (geometry.isEmpty()) {
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process calls geometry.isEmpty() without guarding for geom == null (SQL NULL input) or geometry == null (if deserialization yields null). This can throw NPE and contradict the documented “null values are skipped” behavior. Add a null check (e.g., return Stream.empty() when geom/geometry is null) before calling isEmpty().

Suggested change
Geometry geometry = GeometrySerde.deserialize(geom);
if (geometry.isEmpty()) {
// Skip null input geometries (SQL NULL) as per "null values are skipped" behavior.
if (geom == null) {
return Stream.empty();
}
Geometry geometry = GeometrySerde.deserialize(geom);
// Skip if deserialization yields null or an empty geometry.
if (geometry == null || geometry.isEmpty()) {

Copilot uses AI. Check for mistakes.

public Stream<OutputRow> endPartition() {
if (buffer == null || buffer.isNull()) {
return Stream.empty();
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning Stream.empty() from endPartition() may produce no output row rather than a single row whose value is NULL, depending on how this UDTF is wrapped/queried. If the intended SQL semantics are “aggregate result is NULL”, consider emitting one OutputRow with a null geometry payload instead of emitting zero rows (or ensure the wrapper layer converts empty-stream to NULL reliably).

Suggested change
return Stream.empty();
// No non-empty geometries were seen; emit a single row with a NULL envelope
return Stream.of(new OutputRow(null));

Copilot uses AI. Check for mistakes.
Comment on lines 52 to 53
Geometry geometry = GeometrySerde.deserialize(geom);
if (geometry.isEmpty()) {
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as ST_Envelope_Aggr: geometry.isEmpty() can NPE for SQL NULL inputs (or null deserialization). Guard geom/geometry for null to correctly “skip null values”.

Suggested change
Geometry geometry = GeometrySerde.deserialize(geom);
if (geometry.isEmpty()) {
if (geom == null) {
// Skip SQL NULL inputs
return Stream.empty();
}
Geometry geometry = GeometrySerde.deserialize(geom);
if (geometry == null || geometry.isEmpty()) {
// Skip null or empty geometries

Copilot uses AI. Check for mistakes.
}

@Test
public void testEnvelop_Aggr_EmptyGeometries() {
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the new test name: testEnvelop_Aggr_EmptyGeometries is missing the “e” in “Envelope”. Renaming to testEnvelope_Aggr_EmptyGeometries improves clarity and consistency with the function name.

Suggested change
public void testEnvelop_Aggr_EmptyGeometries() {
public void testEnvelope_Aggr_EmptyGeometries() {

Copilot uses AI. Check for mistakes.
@jiayuasu jiayuasu force-pushed the fix/1979-envelope-empty branch from 04f06cc to 6c0843c Compare February 8, 2026 00:45
ST_Envelope (scalar): Already correctly returns same-type EMPTY geometry for
empty inputs (matching PostGIS). Added test to verify this behavior.

ST_Envelope_Aggr (aggregate): Skip empty geometries during accumulation and
return null when all inputs are empty, matching PostGIS ST_Extent behavior.
Fixed in Spark (AggregateFunctions.scala), Flink (Aggregators.java), and
Snowflake (ST_Envelope_Aggr.java, ST_Envelope_Agg.java).

Added tests in Spark (aggregateFunctionTestScala: all-empty returns null,
mixed empty/non-empty preserves valid envelope), Flink (AggregatorTest:
all-empty returns null), and common (FunctionsTest: scalar envelope with
empty geometries).

Fixes #1979
@jiayuasu jiayuasu force-pushed the fix/1979-envelope-empty branch from 6c0843c to 3d87f1d Compare February 8, 2026 05:56
@jiayuasu jiayuasu merged commit 540549b into master Feb 8, 2026
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ST_Envelope and ST_Envelope_Aggr have inconsistent edge case handling with PostGIS

1 participant