Description
While most physical manipulations for the geography type (e.g., to/from WKB/WKT/Point) are implemented outside of s2geography, most operations that require doing math currently go through the s2geography C++ library ( https://github.com/paleolimbot/s2geography ), which also powers Spherely in Python and s2 in R. That library takes a rather naive approach similar to what we do for GEOS: make me a "Geography", do something, then write the output.
Just like creating GEOS geometries from WKB in a loop incurs substantial overhead, so does creating geographies. There are a few things we can do to speed up s2-based operations:
- For index-assisted operations like the predicates, implement S2Shape on top of WKB, similar to how our wkb::Wkb Rust object is a zero-copy wrapper around the WKB buffer. I did a blog post several years ago demonstrating this concept ( https://dewey.dunnington.ca/post/2021/prototyping-an-apache-arrow-representation-of-geometry/#zero-copy-s2arrow ) but never did anything with it. Now is the time!
- For simpler operations like perimeter and length, just iterate over the coordinates and use S2's fantastic set of primitives to do the work.
- For repeated operations that benefit from preparedness, use the existing ShapeIndexGeography as a prepared geometry. It can also be serialized, so we could implement a prepared geography Arrow type. This will probably be necessary to support a reasonable spatial join implementation for Geography.
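
To make the coordinate-iteration idea above concrete, here is a minimal Python sketch, not the planned s2geography implementation. It reads coordinates directly out of a WKB LineString buffer through a `memoryview` (no intermediate geometry object) and sums great-circle arc lengths. The function names, the mean Earth radius constant, and the restriction to 2D LineStrings are all assumptions made for illustration; the real work would use S2's primitives rather than the hand-written vector math shown here.

```python
import math
import struct

# Assumption: mean Earth radius in meters, used only for this illustration.
EARTH_RADIUS_M = 6371008.8


def iter_linestring_coords(wkb):
    """Yield (lon, lat) pairs from a 2D WKB LineString without copying the buffer."""
    view = memoryview(wkb)
    byte_order = "<" if view[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack_from(byte_order + "I", view, 1)
    if geom_type != 2:
        raise ValueError("this sketch only handles 2D LineString (WKB type 2)")
    (n_coords,) = struct.unpack_from(byte_order + "I", view, 5)
    for i in range(n_coords):
        # Each coordinate is two 8-byte doubles starting after the 9-byte header.
        yield struct.unpack_from(byte_order + "dd", view, 9 + 16 * i)


def to_unit_vector(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to a point on the unit sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def angle_between(a, b):
    """Angle in radians between two unit vectors, via atan2(|a x b|, a . b)."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot)


def linestring_length_m(wkb):
    """Spherical length of a WKB LineString in meters, one pass over the coordinates."""
    total = 0.0
    prev = None
    for lon, lat in iter_linestring_coords(wkb):
        cur = to_unit_vector(lon, lat)
        if prev is not None:
            total += angle_between(prev, cur)
        prev = cur
    return total * EARTH_RADIUS_M


# Little-endian LineString from the equator (0, 0) to the north pole (0, 90).
example_wkb = struct.pack("<BII4d", 1, 2, 2, 0.0, 0.0, 0.0, 90.0)
example_length = linestring_length_m(example_wkb)
```

The example line spans a quarter of a great circle, so `example_length` should be close to `math.pi / 2 * EARTH_RADIUS_M`. The point of the sketch is that no per-row geometry object is allocated: the only state carried between vertices is the previous unit vector.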
I'm planning on making updates to the geoarrow-c stack (to support the zero-copy shim on top of WKB that we'll need in s2geography) and s2geography (to implement the S2Shape wrapper + optimized predicates) in the next few weeks, targeting SedonaDB 0.2.