Add Geospatial geography files with statistics #109

Open
paleolimbot wants to merge 4 commits into apache:master from paleolimbot:geospatial-geography-files

@paleolimbot
Member

This PR adds some geography test files with statistics that may be useful in testing implementations. Notably, geography statistics can have xmin > xmax so that row group statistics can "wrap around" the antimeridian (e.g., so that statistics for ship positions in the Pacific Ocean, or for a catalogue of wildlife in Fiji, do not have longitude bounds that span the globe).
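For readers implementing pruning against these files, here is a minimal sketch of how a reader might interpret a longitude range where xmin > xmax. It assumes longitudes are in degrees within [-180, 180]; the function names are hypothetical and are not taken from SedonaDB, arrow-rs, or the Parquet specification.

```python
def lon_interval_contains(xmin, xmax, lon):
    """Return True if lon falls inside [xmin, xmax], allowing wraparound.

    Assumption: all values are degrees in [-180, 180].
    """
    if xmin <= xmax:
        # Ordinary interval that does not cross the antimeridian.
        return xmin <= lon <= xmax
    # Wraparound interval: covers [xmin, 180] and [-180, xmax].
    return lon >= xmin or lon <= xmax


def lon_intervals_intersect(a_min, a_max, b_min, b_max):
    """Return True if two longitude intervals overlap, allowing wraparound."""
    a_wraps = a_min > a_max
    b_wraps = b_min > b_max
    if a_wraps and b_wraps:
        # Both intervals contain the antimeridian, so they must overlap.
        return True
    if a_wraps:
        return b_max >= a_min or b_min <= a_max
    if b_wraps:
        return a_max >= b_min or a_min <= b_max
    return a_min <= b_max and b_min <= a_max
```

With a hypothetical row group whose statistics are xmin = 177 and xmax = -178 (roughly the Fiji case), a query window of [-10, 10] would not intersect and the row group could be skipped, whereas a reader that treated the range as [-178, 177] would scan it unnecessarily.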

I recently implemented this in SedonaDB ( apache/sedona-db#805 ) based on the pluggable statistics writer in arrow-rs ( apache/arrow-rs#8414 ).

The underlying stats are coming from s2geometry's S2LatLngRectBounder ( https://github.com/google/s2geometry/blob/master/src/s2/s2latlng_rect_bounder.h ) via s2geography ( https://github.com/paleolimbot/s2geography/blob/main/src/s2geography/coverings.h#L13-L19 ). I'd love to simplify that and just have it all in a self-contained implementation but certain components of bounding on the sphere (e.g., if a polygon contains the north pole) are non-trivial.

The files are basically uniformly distributed (on the sphere) points, segments (basically sequential points sorted on a Hilbert curve), and polygons (buffered points, basically rectangles). Both lines and polygons have some geographies that cross the antimeridian, and all the files have at least two row groups with wraparound statistics. All the files have at least one geometry intersecting the north pole and one intersecting the south pole (for polygons, the geometry contains it).
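As an illustration of what "uniformly distributed on the sphere" means for the point data, here is a minimal sketch of one common sampling approach. It is not the generator used for these files; the function name and seed parameter are hypothetical, and the Hilbert-curve sort and polygon buffering are left out.

```python
import math
import random


def random_sphere_points(n, seed=0):
    """Return n (lon, lat) pairs in degrees, uniform by area on the sphere."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        lon = rng.uniform(-180.0, 180.0)
        # Sampling z = sin(lat) uniformly gives equal-area coverage;
        # sampling lat uniformly would over-represent the poles.
        z = rng.uniform(-1.0, 1.0)
        lat = math.degrees(math.asin(z))
        points.append((lon, lat))
    return points
```

Under this scheme about half of the points fall between 30°S and 30°N, because that band covers half of the sphere's surface area.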

These aren't exhaustive cases for geographical testing but the addition of the wraparound statistics will hopefully help ensure pruning is correct.

Member

@jiayuasu jiayuasu left a comment

Exciting!

@pitrou
Member

pitrou commented May 13, 2026

Hmm, can the files be smaller?

@paleolimbot
Member Author

That's a great point...the biggest one is now 60 KB and gets the same point (or line or polygon, as it may be) across.

@pitrou
Member

pitrou commented May 13, 2026

> That's a great point...the biggest one is now 60 KB and gets the same point (or line or polygon, as it may be) across.

Across the antimeridian, right?

@pitrou
Member

pitrou commented May 13, 2026

> Both lines and polygons have some geographies that cross the antimeridian, and all the files have at least two row groups with wraparound statistics. All the files have at least one geometry intersecting the north pole and one intersecting the south pole (for polygons, the geometry contains it).

Can you add this to the geospatial README?

3 participants