chore(ci): Port distance and predicate geography tests from s2geography and run them against BigQuery and PostGIS where possible#816
Conversation
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
In apache/sedona-db#816 I ran the functions based on these against PostGIS and BigQuery and it turned up a few things. There is also a predicate issue (and probably more predicate issues that I haven't considered yet), and I'm collecting those separately: apache/sedona-db#817
| "st_equals", | ||
| "st_intersects", | ||
| "st_within", | ||
| ]; |
There was a problem hiding this comment.
This was a code change...I added kernels to match things like st_intersects(geography, NULL). This is mostly for parameter binding...params are inserted into a plan as a Null type and the plan is validated before it's replace with something of the correct type.
There was a problem hiding this comment.
I fixed a few issues in s2geography and updated the submodule.
There was a problem hiding this comment.
Pull request overview
This PR ports a large suite of geography predicate, distance/measure, overlay, accessor, transformation, and S2-specific integration tests (originally from s2geography) into sedonadb, running them against SedonaDB and (where supported) BigQuery/PostGIS. It also makes small kernel-level changes needed for those tests, notably enabling ST_NPoints on geography and adding explicit NULL-type kernel matchers for selected s2geography-backed functions.
Changes:
- Add extensive Python integration tests for geography predicates, distances/measures, overlays, transformations, accessors, constructors/formatters, and S2 helpers across SedonaDB/BigQuery/PostGIS where possible.
- Extend
ST_NPointstype matching to accept geography inputs. - Add Rust-side NULL-type helper kernels for selected s2geography scalar UDFs to handle DataFusion
NULL-typed arguments.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-functions/src/st_points.rs | Allow ST_NPoints to accept geography in addition to geometry. |
| c/sedona-s2geography/src/kernels.rs | Add NULL-type helper kernels for select s2geography functions + tests validating NULL dispatch. |
| python/sedonadb/tests/geography/test_geog_transformations.py | Add/expand geography transformation tests (centroid, convex hull, buffering, simplify, reduce precision, etc.). |
| python/sedonadb/tests/geography/test_geog_predicates.py | Add/expand geography predicate behavior tests (intersects/contains/within/equals/disjoint). |
| python/sedonadb/tests/geography/test_geog_measures.py | Add/expand geography measurement tests (area/length/perimeter/line locate point). |
| python/sedonadb/tests/geography/test_geog_distance.py | Add comprehensive geography distance-related tests (distance/dwithin/maxdistance/closestpoint/shortestline/longestline). |
| python/sedonadb/tests/geography/test_geog_overlay.py | Add geography overlay operation tests (intersection/difference/union/symdifference) and empty-handling cases. |
| python/sedonadb/tests/geography/test_geog_accessors.py | Add/expand geography accessor tests (dimension/isempty/npoints/numgeometries/x/y/geometrytype/isclosed/iscollection). |
| python/sedonadb/tests/geography/test_geog_s2.py | Add S2 helper function tests for geography (cell id from point, covering cell ids). |
| python/sedonadb/tests/geography/test_constructors_parsers_formatters.py | Add ST_AsText geography roundtrip tests (SedonaDB/PostGIS) including Z/M/ZM coverage (SedonaDB only). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if eng.name() == "postgis": | ||
| eps = 1e-2 | ||
| else: | ||
| eps = 1e-15 |
There was a problem hiding this comment.
should we encode this into the assert_query_results method somehow? maybe an is_spheroidal flag?
There was a problem hiding this comment.
I added eng.geography_numeric_epsilon() and I think I managed to remove all of these
| 10007559.105973553, | ||
| id="point_distance_wraparound_lng", | ||
| ), | ||
| # Point x linestring (point on linestring) |
There was a problem hiding this comment.
should we consider doing some more "difficult" point on linestring, specifically a line that is not axis aligned?
There was a problem hiding this comment.
That's a good point...I added a polar case. These are coming from C++ where I'm absolutely sure nothing planar is happening but here I suppose that's not a given.
| 0.0, | ||
| id="polygon_distance_linestring_through", | ||
| ), | ||
| # Linestring x polygon (linestring fully outside) |
There was a problem hiding this comment.
instead of transposing these cases, should we build that into the harness/fixture? then we could also automatically test:
- both cases individually
- equality between both cases
There was a problem hiding this comment.
I skipped this one for now to avoid more testing infrastructure. The distance cases in C++ have a bit more structure (e.g., checking that dwithin on the exact size works...)
james-willis
left a comment
There was a problem hiding this comment.
some ideas/nitpicks. up to you
| ], | ||
| ) | ||
| def test_st_max_distance_zm(geom1, geom2, expected): | ||
| eng = SedonaDB.create_or_skip() |
There was a problem hiding this comment.
should we always parameterize for consistency?
| pytest.param( | ||
| "MULTIPOINT ((0 0), (1 1))", | ||
| "POINT (2 2)", | ||
| "POINT (nan nan)", |
There was a problem hiding this comment.
is this really the right thing to return? I would think POINT EMPTY is more correct.
There was a problem hiding this comment.
This is a bug in the testing framework/geoarrow-c. I annotated this like is done elsewhere in the tests
| ), | ||
| ], | ||
| ) | ||
| def test_st_union(eng, geom1, geom2, expected): |
There was a problem hiding this comment.
overlapping polygon and overlapping at terminus linestring could be good cases.
| ), | ||
| ], | ||
| ) | ||
| def test_st_symdifference(eng, geom1, geom2, expected): |
There was a problem hiding this comment.
this could use some partially overlapping cases too
There was a problem hiding this comment.
I added these for all the overlays (a few had to go in the bug bin of follow-ups)
| id="empty_point_equals_empty_linestring", | ||
| ), | ||
| # Fast path for identical values | ||
| pytest.param( |
There was a problem hiding this comment.
should there be a case for wrap around?
There was a problem hiding this comment.
I added one for wraparound for the predicates (the equals one had to go in the bug overflow bin to fix in s2geography)
| # Linestrings | ||
| pytest.param("LINESTRING (0 0, 0 1)", "POINT (0 0.5)", id="linestring"), | ||
| pytest.param( | ||
| "LINESTRING (0 0, 0 1, 0 5)", "POINT (0 2.5)", id="linestring_two_segments" |
There was a problem hiding this comment.
should there be a case where the midpoint isnt colinear?
There was a problem hiding this comment.
I added a polar case here!
| pytest.param( | ||
| "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))", | ||
| 100000.0, | ||
| 88052039626.29015, | ||
| id="polygon_positive_distance", | ||
| ), |
There was a problem hiding this comment.
should we do a triangle? rectangle? concave geometry?
There was a problem hiding this comment.
These are mostly just checking that it's wired up to the correct buffer-er...as further cases come up we can add to this list.
| "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))", | ||
| id="polygon_snap", | ||
| ), | ||
| ], |
There was a problem hiding this comment.
what about polygon where two coordinates get rounded to the same value? also smae case but it degenerates the polygon
There was a problem hiding this comment.
I have a case for this but it doesn't pass yet 😬 #822
zhangfengcdt
left a comment
There was a problem hiding this comment.
Nitpick: The only thing we might consider is to see if documents for these functions have been updated to accept both geometry and geography, for example, STPoints, should say "Native implementation to count all the points of a geometry or a geography".
Otherwise, LGTM!
Good point! I added #836 to handle this one. It's a pretty straightforward but also somewhat large change |
This PR adds tests for predicates and distance functions. These tests were scraped from s2geography but here they also run against bigquery (where possible) and postgis (where possible). This PR is a pretty good summary of our geography implementation and the functions we support (as well as our behaviour differs from PostGIS or BigQuery). I know this is a lot of lines but I promise it's almost all tests 🙂
I opened up some follow-ups for tests that probably should pass but don't yet as well as some functions that need implementing upstream in s2geography to complete parity with BigQuery.