chore(ci): Port distance and predicate geography tests from s2geography and run them against BigQuery and PostGIS where possible #816

Merged
paleolimbot merged 30 commits into apache:main from paleolimbot:geog-testing
May 13, 2026

Conversation

@paleolimbot
Member

@paleolimbot paleolimbot commented May 5, 2026

This PR adds tests for predicates and distance functions. These tests were scraped from s2geography, but here they also run against BigQuery and PostGIS (where possible). This PR is a pretty good summary of our geography implementation and the functions we support (as well as where our behaviour differs from PostGIS or BigQuery). I know this is a lot of lines but I promise it's almost all tests 🙂

I opened up some follow-ups for tests that probably should pass but don't yet as well as some functions that need implementing upstream in s2geography to complete parity with BigQuery.

@paleolimbot paleolimbot marked this pull request as draft May 5, 2026 17:09
paleolimbot and others added 7 commits May 5, 2026 21:21
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
paleolimbot added a commit to paleolimbot/s2geography that referenced this pull request May 6, 2026
In apache/sedona-db#816 I ran the functions
based on these against PostGIS and BigQuery and it turned up a few
things.

There is also a predicate issue (and probably more predicate issues that
I haven't considered yet), and I'm collecting those separately:
apache/sedona-db#817
"st_equals",
"st_intersects",
"st_within",
];
Member Author

This was a code change...I added kernels to match things like st_intersects(geography, NULL). This is mostly for parameter binding...params are inserted into a plan as a Null type and the plan is validated before it's replaced with something of the correct type.
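As a rough illustration of why an extra matcher is needed, here is a toy dispatcher in Python. It is not the Rust kernel code from this PR: `Kernel`, `KERNELS`, and `resolve` are hypothetical names, and the type strings are placeholders. The point is only that a plan containing a NULL-typed placeholder fails validation unless some kernel accepts that signature.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel: a name, an argument-type matcher, a return type."""
    name: str
    matches: Callable[[Sequence[str]], bool]
    return_type: str


def _all_geography(arg_types: Sequence[str]) -> bool:
    return list(arg_types) == ["geography", "geography"]


def _geography_with_null(arg_types: Sequence[str]) -> bool:
    # Accept (geography, null), (null, geography) and (null, null).
    return (
        len(arg_types) == 2
        and all(t in ("geography", "null") for t in arg_types)
        and "null" in arg_types
    )


KERNELS = [
    Kernel("st_intersects", _all_geography, "boolean"),
    # Extra matcher so a plan with an unbound (NULL-typed) parameter validates.
    Kernel("st_intersects", _geography_with_null, "boolean"),
]


def resolve(name: str, arg_types: Sequence[str]) -> Optional[Kernel]:
    """Return the first kernel whose matcher accepts the argument types."""
    for kernel in KERNELS:
        if kernel.name == name and kernel.matches(arg_types):
            return kernel
    return None
```

Without the second entry, `resolve("st_intersects", ["geography", "null"])` would return `None`, which is the toy equivalent of the plan failing validation before the parameter is bound.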

Member Author

I fixed a few issues in s2geography and updated the submodule.

@paleolimbot paleolimbot marked this pull request as ready for review May 7, 2026 03:06
@paleolimbot paleolimbot requested a review from Copilot May 7, 2026 04:14
Contributor

Copilot AI left a comment

Pull request overview

This PR ports a large suite of geography predicate, distance/measure, overlay, accessor, transformation, and S2-specific integration tests (originally from s2geography) into sedonadb, running them against SedonaDB and (where supported) BigQuery/PostGIS. It also makes small kernel-level changes needed for those tests, notably enabling ST_NPoints on geography and adding explicit NULL-type kernel matchers for selected s2geography-backed functions.

Changes:

  • Add extensive Python integration tests for geography predicates, distances/measures, overlays, transformations, accessors, constructors/formatters, and S2 helpers across SedonaDB/BigQuery/PostGIS where possible.
  • Extend ST_NPoints type matching to accept geography inputs.
  • Add Rust-side NULL-type helper kernels for selected s2geography scalar UDFs to handle DataFusion NULL-typed arguments.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| rust/sedona-functions/src/st_points.rs | Allow ST_NPoints to accept geography in addition to geometry. |
| c/sedona-s2geography/src/kernels.rs | Add NULL-type helper kernels for select s2geography functions + tests validating NULL dispatch. |
| python/sedonadb/tests/geography/test_geog_transformations.py | Add/expand geography transformation tests (centroid, convex hull, buffering, simplify, reduce precision, etc.). |
| python/sedonadb/tests/geography/test_geog_predicates.py | Add/expand geography predicate behavior tests (intersects/contains/within/equals/disjoint). |
| python/sedonadb/tests/geography/test_geog_measures.py | Add/expand geography measurement tests (area/length/perimeter/line locate point). |
| python/sedonadb/tests/geography/test_geog_distance.py | Add comprehensive geography distance-related tests (distance/dwithin/maxdistance/closestpoint/shortestline/longestline). |
| python/sedonadb/tests/geography/test_geog_overlay.py | Add geography overlay operation tests (intersection/difference/union/symdifference) and empty-handling cases. |
| python/sedonadb/tests/geography/test_geog_accessors.py | Add/expand geography accessor tests (dimension/isempty/npoints/numgeometries/x/y/geometrytype/isclosed/iscollection). |
| python/sedonadb/tests/geography/test_geog_s2.py | Add S2 helper function tests for geography (cell id from point, covering cell ids). |
| python/sedonadb/tests/geography/test_constructors_parsers_formatters.py | Add ST_AsText geography roundtrip tests (SedonaDB/PostGIS) including Z/M/ZM coverage (SedonaDB only). |

Comment thread python/sedonadb/tests/geography/test_geog_transformations.py
Comment thread c/sedona-s2geography/src/kernels.rs
Comment on lines +121 to +124
if eng.name() == "postgis":
    eps = 1e-2
else:
    eps = 1e-15
Contributor

should we encode this into the assert_query_results method somehow? maybe an is_spheroidal flag?

Member Author

I added eng.geography_numeric_epsilon() and I think I managed to remove all of these
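For readers who have not seen the harness, a minimal sketch of what a per-engine epsilon could look like follows. Only the method name `geography_numeric_epsilon()` and the two tolerance values (1e-2 for PostGIS, 1e-15 otherwise) come from this thread; the class names, the `results_match` helper, and the choice to apply the epsilon as both a relative and an absolute tolerance are assumptions made for illustration.

```python
import math


class Engine:
    """Hypothetical base class standing in for the test-harness engines."""

    def name(self) -> str:
        raise NotImplementedError

    def geography_numeric_epsilon(self) -> float:
        # Default: results are expected to match almost exactly.
        return 1e-15


class PostGISLike(Engine):
    def name(self) -> str:
        return "postgis"

    def geography_numeric_epsilon(self) -> float:
        # Looser tolerance, mirroring the eps = 1e-2 branch in the diff above.
        return 1e-2


class SedonaDBLike(Engine):
    def name(self) -> str:
        return "sedonadb"


def results_match(eng: Engine, actual: float, expected: float) -> bool:
    """Compare two numbers using the engine's own tolerance (assumed semantics)."""
    eps = eng.geography_numeric_epsilon()
    return math.isclose(actual, expected, rel_tol=eps, abs_tol=eps)
```

With this shape, individual tests no longer branch on `eng.name()`; the engine decides how strict the comparison is.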

10007559.105973553,
id="point_distance_wraparound_lng",
),
# Point x linestring (point on linestring)
Contributor

should we consider doing some more "difficult" point on linestring, specifically a line that is not axis aligned?

Member Author

That's a good point...I added a polar case. These are coming from C++ where I'm absolutely sure nothing planar is happening but here I suppose that's not a given.
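To make the planar-versus-spherical concern concrete, here is a small standalone sketch using plain unit-vector math (no s2geography or SedonaDB calls; all function names are made up for this example). For an edge from (0, 60) to (90, 60), the planar midpoint (45, 60) is several degrees away from the great circle through the endpoints, while the geodesic midpoint sits at roughly latitude 67.8. A high-latitude case like this would catch an implementation that quietly interpolates in lon/lat space. Note that `cross_track_angle` measures distance to the full great circle, not to the bounded edge, which is enough for this illustration.

```python
import math


def to_unit_vector(lon_deg, lat_deg):
    """Convert lon/lat in degrees to a 3D unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def to_lon_lat(v):
    """Convert a unit vector back to (lon, lat) in degrees."""
    return (math.degrees(math.atan2(v[1], v[0])), math.degrees(math.asin(v[2])))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def geodesic_midpoint(a, b):
    """Midpoint of the (non-antipodal) great-circle edge between a and b."""
    return normalize(tuple(x + y for x, y in zip(a, b)))


def cross_track_angle(p, a, b):
    """Angular distance (radians) from p to the great circle through a and b."""
    n = normalize(cross(a, b))
    return abs(math.asin(max(-1.0, min(1.0, dot(p, n)))))
```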

0.0,
id="polygon_distance_linestring_through",
),
# Linestring x polygon (linestring fully outside)
Contributor

instead of transposing these cases, should we build that into the harness/fixture? then we could also automatically test:

  • both cases individually
  • equality between both cases
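A sketch of what such a helper might look like, kept independent of the real harness: `check_symmetric` is a hypothetical name, and `haversine_m` (with an assumed mean Earth radius) only stands in for whatever distance function an engine exposes. The helper evaluates both argument orders and reports the three checks listed above separately.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean radius; engines may use another value


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) pairs in degrees."""
    lon1, lat1 = math.radians(a[0]), math.radians(a[1])
    lon2, lat2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def check_symmetric(fn, a, b, expected, eps=1e-9):
    """Run fn(a, b) and fn(b, a); report each check so failures are specific."""
    forward = fn(a, b)
    backward = fn(b, a)
    return {
        "forward": math.isclose(forward, expected, rel_tol=eps, abs_tol=eps),
        "backward": math.isclose(backward, expected, rel_tol=eps, abs_tol=eps),
        "symmetric": math.isclose(forward, backward, rel_tol=eps, abs_tol=eps),
    }
```

Returning the three results separately (rather than a single boolean) keeps it obvious which direction failed when a transposed case regresses.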

Member Author

I skipped this one for now to avoid more testing infrastructure. The distance cases in C++ have a bit more structure (e.g., checking that dwithin on the exact size works...)

Contributor

@james-willis james-willis left a comment

some ideas/nitpicks. up to you

],
)
def test_st_max_distance_zm(geom1, geom2, expected):
    eng = SedonaDB.create_or_skip()
Contributor

should we always parameterize for consistency?

pytest.param(
    "MULTIPOINT ((0 0), (1 1))",
    "POINT (2 2)",
    "POINT (nan nan)",
Contributor

is this really the right thing to return? I would think POINT EMPTY is more correct.

Member Author

This is a bug in the testing framework/geoarrow-c. I annotated this as is done elsewhere in the tests.
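Some background that may explain how `POINT (nan nan)` can show up at all: WKB has no dedicated encoding for an empty point, and a widely used convention is to write it as a point whose coordinates are NaN. Whether that is the actual cause of the bug mentioned here is an assumption. The sketch below (hypothetical helper names, 2D little-endian WKB only) shows a reader that special-cases NaN coordinates so the text output is `POINT EMPTY`.

```python
import math
import struct


def encode_wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (byte order 1, geometry type 1)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_point_to_wkt(wkb: bytes) -> str:
    """Format a 2D little-endian WKB point as WKT, mapping NaN/NaN to EMPTY."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled in this sketch")
    if math.isnan(x) and math.isnan(y):
        # Convention: an all-NaN point stands for POINT EMPTY.
        return "POINT EMPTY"
    return f"POINT ({x:g} {y:g})"
```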

),
],
)
def test_st_union(eng, geom1, geom2, expected):
Contributor

overlapping polygon and overlapping at terminus linestring could be good cases.

),
],
)
def test_st_symdifference(eng, geom1, geom2, expected):
Contributor

this could use some partially overlapping cases too

Member Author

I added these for all the overlays (a few had to go in the bug bin of follow-ups)

id="empty_point_equals_empty_linestring",
),
# Fast path for identical values
pytest.param(
Contributor

should there be a case for wrap around?

Member Author

I added one for wraparound for the predicates (the equals one had to go in the bug overflow bin to fix in s2geography)
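For context on why wraparound deserves its own case: on the sphere, longitude 180 and longitude -180 name the same meridian, so two geographies that differ only in that spelling occupy the same positions even though their coordinates compare unequal as numbers. A minimal sketch of the underlying check, with made-up helper names and no claim about how s2geography implements equality:

```python
import math


def to_unit_vector(lon_deg, lat_deg):
    """Convert lon/lat in degrees to a 3D unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def same_position(p, q, tol=1e-12):
    """True if two (lon, lat) pairs map to the same point on the unit sphere."""
    a = to_unit_vector(*p)
    b = to_unit_vector(*q)
    return all(abs(x - y) <= tol for x, y in zip(a, b))
```

The same comparison also treats every longitude at a pole as the same position, which is another spot where coordinate-wise equality and spherical equality disagree.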

# Linestrings
pytest.param("LINESTRING (0 0, 0 1)", "POINT (0 0.5)", id="linestring"),
pytest.param(
    "LINESTRING (0 0, 0 1, 0 5)", "POINT (0 2.5)", id="linestring_two_segments"
Contributor

should there be a case where the midpoint isn't collinear?

Member Author

I added a polar case here!

Comment on lines +472 to +477
pytest.param(
    "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))",
    100000.0,
    88052039626.29015,
    id="polygon_positive_distance",
),
Contributor

should we do a triangle? rectangle? concave geometry?

Member Author

These are mostly just checking that it's wired up to the correct buffer-er...as further cases come up we can add to this list.
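As a sanity check on the expected value in that case, the planar Steiner formula for buffering a convex shape, area + perimeter × d + π d², lands within about 1% of 88052039626.29015 when the one-degree square is treated as a flat square near the equator. This is only a back-of-envelope sketch: the Earth radius is an assumed mean value, curvature is ignored, and the function names are made up.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean radius


def approx_unit_degree_square():
    """Planar approximation of a 1 x 1 degree square at the equator: (area, perimeter)."""
    side = math.radians(1.0) * EARTH_RADIUS_M  # about 111 km
    return side * side, 4.0 * side


def steiner_buffer_area(area, perimeter, distance):
    """Area of a convex planar shape after buffering outward by distance."""
    return area + perimeter * distance + math.pi * distance**2


area, perimeter = approx_unit_degree_square()
estimate = steiner_buffer_area(area, perimeter, 100000.0)
```

An estimate this close suggests the test value is at least the right order of magnitude for a 100 km buffer, which fits the stated goal of checking that the function is wired to the right buffer routine.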

"POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))",
id="polygon_snap",
),
],
Contributor

what about a polygon where two coordinates get rounded to the same value? also the same case, but where it degenerates the polygon

Member Author

I have a case for this but it doesn't pass yet 😬 #822
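To spell out the two situations in the question, here is a toy planar snapper. It rounds lon/lat values to a grid and drops consecutive duplicates, which is not how s2geography snaps and is only meant to show the shape of the test cases; `snap_ring` and `is_degenerate_ring` are hypothetical names.

```python
def snap_ring(ring, grid_size):
    """Round each vertex to the grid and drop consecutive duplicate vertices."""
    snapped = []
    for x, y in ring:
        pt = (round(x / grid_size) * grid_size, round(y / grid_size) * grid_size)
        if not snapped or snapped[-1] != pt:
            snapped.append(pt)
    return snapped


def is_degenerate_ring(ring):
    """A polygon ring needs at least three distinct vertices to enclose area."""
    return len(set(ring)) < 3


# Case 1: two neighbouring vertices collapse, the ring stays valid.
collapsed = snap_ring([(0, 0), (0.4, 0.1), (10, 0), (10, 10), (0, 10), (0, 0)], 1.0)

# Case 2: every vertex collapses to one grid point, the ring degenerates.
degenerate = snap_ring([(0, 0), (0.2, 0.3), (0.4, 0.1), (0, 0)], 1.0)
```

In the first case the result is still a valid square; in the second the ring has a single distinct vertex, so an implementation has to decide what to return for it.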

Member

@zhangfengcdt zhangfengcdt left a comment

Nitpick: The only thing we might consider is to see if documents for these functions have been updated to accept both geometry and geography, for example, STPoints, should say "Native implementation to count all the points of a geometry or a geography".

Otherwise, LGTM!

Contributor

@prantogg prantogg left a comment

LGTM!

@paleolimbot
Member Author

> The only thing we might consider is to see if documents for these functions have been updated to accept both geometry and geography, for example, STPoints, should say "Native implementation to count all the points of a geometry or a geography".

Good point! I added #836 to handle this one. It's a pretty straightforward but also somewhat large change

@paleolimbot paleolimbot merged commit 24d0935 into apache:main May 13, 2026
17 checks passed
@paleolimbot paleolimbot deleted the geog-testing branch May 13, 2026 19:27

5 participants