-
Notifications
You must be signed in to change notification settings - Fork 43
feat(rust/sedona-geo): Add ST_AsGeoJSON #469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this!
From CI I see:
error: use of a fallible conversion when an infallible one could be used
--> rust/sedona-functions/src/st_asgeojson.rs:179:71
|
179 | let geojson_geom: geojson::Geometry = (&geom).try_into().map_err(|e| {
| ^^^^^^^^ help: use: `into`
|
= note: converting `&Geometry` to `Geometry` cannot fail
= help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.92.0/index.html#unnecessary_fallible_conversions
= note: `-D clippy::unnecessary-fallible-conversions` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::unnecessary_fallible_conversions)]`
You should be able to run cargo clippy --fix and I think it will fix it for you.
This needs Python integration tests in python/sedonadb/tests (I think there is a file for one of the IO functions). The Python integration tests check against PostGIS, and you can follow the examples from the other functions with respect to which things to test (usually: all the empties, one non-empty for each geometry type).
I might suggest for the initial implementation to skip the output type parameter (i.e., get used to the testing helpers and repo structure and implement the output type parameter in a follow-up PR). I left a note about how we might make this faster using geoarrow-rs but no need to do that here.
| args: &[ColumnarValue], | ||
| geojson_type: GeoJsonType, | ||
| ) -> Result<ColumnarValue> { | ||
| let executor = WkbExecutor::new(arg_types, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good reason to put this in the sedona-geo crate is that there we have the GeoTypesExecutor (which will let you remove the wkb -> geo_types conversion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to moving to this from rust/sedona-functions/ to rust/sedona-geo. Note you need to do this in order to access these helpful methods/structs.
GeoTypesExecutor uses a item_to_geometry() function that implements exactly what you did in wkb_to_geometry() below (hence we can remove it). If you find GeoTypesExecutor isn't flexible enough for this function (like I did for a different function), you could consider using item_to_geometry() directly instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! GeoTypesExecutor is exactly what i needed!
| if let ScalarValue::Utf8(Some(json_str)) = result { | ||
| // Verify it's valid JSON and contains expected structure | ||
| let parsed: serde_json::Value = serde_json::from_str(json_str.as_str()).unwrap(); | ||
| assert_eq!(parsed["type"], "Point"); | ||
| assert!(parsed["coordinates"].is_array()); | ||
| } else { | ||
| panic!("Expected Utf8 scalar value"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our tester helper is probably the best way to test these results:
| if let ScalarValue::Utf8(Some(json_str)) = result { | |
| // Verify it's valid JSON and contains expected structure | |
| let parsed: serde_json::Value = serde_json::from_str(json_str.as_str()).unwrap(); | |
| assert_eq!(parsed["type"], "Point"); | |
| assert!(parsed["coordinates"].is_array()); | |
| } else { | |
| panic!("Expected Utf8 scalar value"); | |
| } | |
| tester.assert_scalar_result_equal(result, #"{"key": "value"}"#); |
| // Verify the array has the expected number of elements | ||
| assert_eq!((*result_array).len(), 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably do want to create an explicit array here and check for equality. I believe we do this for other functions that return a string (like ST_GeometryType()).
| impl SedonaScalarKernel for STAsGeoJSON { | ||
| fn return_type(&self, args: &[SedonaType]) -> Result<Option<SedonaType>> { | ||
| let matcher = ArgMatcher::new( | ||
| vec![ArgMatcher::is_geometry_or_geography()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| vec![ArgMatcher::is_geometry_or_geography()], | |
| vec![ArgMatcher::is_geometry()], |
(GeoJSON has explicit planar edges, unlike geographies)
| fn return_type(&self, args: &[SedonaType]) -> Result<Option<SedonaType>> { | ||
| let matcher = ArgMatcher::new( | ||
| vec![ | ||
| ArgMatcher::is_geometry_or_geography(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ArgMatcher::is_geometry_or_geography(), | |
| ArgMatcher::is_geometry(), |
| return Err(DataFusionError::Execution( | ||
| "Second argument to ST_AsGeoJSON must be a string literal".to_string(), | ||
| )); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return Err(DataFusionError::Execution( | |
| "Second argument to ST_AsGeoJSON must be a string literal".to_string(), | |
| )); | |
| return exec_err!("Second argument to ST_AsGeoJSON must be a string literal"); |
| // Convert WKB geometry to geo_types::Geometry using geo-traits | ||
| let geo_geometry = wkb_to_geometry(item)?; | ||
|
|
||
| match geo_geometry { | ||
| Some(geom) => { | ||
| // Convert geo_types::Geometry to geojson::Geometry | ||
| let geojson_geom: geojson::Geometry = (&geom).try_into().map_err(|e| { | ||
| DataFusionError::Execution(format!( | ||
| "Failed to convert to GeoJSON: {:?}", | ||
| e | ||
| )) | ||
| })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this in a follow-up PR and not this one; however, this chain of WKB bytes -> Wkb object -> GeoTypes geometry -> GeoJSON geometry is probably very slow. If we can use:
(which should operate directly on a Wkb object), then I think we might be able to make this quite a bit faster (and support Z values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The current implementation still does the full chain (via GeoTypesExecutor -> geo_types::Geometry -> geojson::Geometry -> JSON string).
I'm happy to address the optimization using geoarrow-geojson's encoder in a follow-up PR if that's fine by you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filed a ticket so we don't forget: #472
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to rust tests, we need integration tests in test_functions.py. Instructions for testing it locally and iterating are here.
| .with_argument( | ||
| "type", | ||
| "string (optional): Output type - 'Simple' (default), 'Feature', or 'FeatureCollection'", | ||
| ) | ||
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0))") | ||
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0), 'Feature')") | ||
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0), 'FeatureCollection')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if you already discussed this or not, but I'll point out that this follows the Sedona API. The PostGIS one is completely different.
text ST_AsGeoJSON(record feature, text geom_column="", integer maxdecimaldigits=9, boolean pretty_bool=false, text id_column='');
text ST_AsGeoJSON(geometry geom, integer maxdecimaldigits=9, integer options=8);
text ST_AsGeoJSON(geography geog, integer maxdecimaldigits=9, integer options=0);
We have generally followed the PostGIS behavior for all previous functions that I know of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, there's also a note on the Sedona page that says
"This method is not recommended. Please use Sedona GeoJSON data source to write GeoJSON files." (bold is mine)
... haha. Maybe it's because the api is different 🤷
| args: &[ColumnarValue], | ||
| geojson_type: GeoJsonType, | ||
| ) -> Result<ColumnarValue> { | ||
| let executor = WkbExecutor::new(arg_types, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to moving to this from rust/sedona-functions/ to rust/sedona-geo. Note you need to do this in order to access these helpful methods/structs.
GeoTypesExecutor uses a item_to_geometry() function that implements exactly what you did in wkb_to_geometry() below (hence we can remove it). If you find GeoTypesExecutor isn't flexible enough for this function (like I did for a different function), you could consider using item_to_geometry() directly instead.
|
Thanks for the feedback! I've moved the code to However, I'm running into some CI python test failures when comparing GeoJSON output between SedonaDB and PostGIS.
What's your preference for handling these format differences? Should we:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The geojson crate's conversion from geo_types::Polygon to geojson::Geometry represents POLYGON EMPTY as {"type":"Polygon","coordinates":[[]]} (one empty ring), while PostGIS returns {"type":"Polygon","coordinates":[]} (no rings).
I do think this is technically a bug upstream (see the toggle for why), and there does seem to be a maintainer responding to new isssues/PRs in the geojson rust repo. I'd say if you're eager to fix it, then go for it, but I think it's fine to kick this down the road since this is such a specific edge case. At least it's valid traditional JSON. We can just file an issue and comment the test out.
Proof from the [GeoJSON spec](https://datatracker.ietf.org/doc/html/rfc7946) (I asked an LLM and verified it 🤠)
A linear ring is a closed LineString with four or more positions.
For type "Polygon", the "coordinates" member MUST be an array of linear ring coordinate arrays.
Coordinate formatting (integers vs floats): The geojson crate always serializes coordinates as floats, even for whole numbers. PostGIS optimizes by omitting .0 for whole numbers.
I think fixing this would be convenient to do when we decide to add support for the integer maxdecimaldigits argument for ST_AsGeoJSON, which would likely involve modifying similar code upstream. Speaking of, it seems like there's an old PR that might be fixing this for us: georust/geojson#234, though I didn't look too deeply into it.
For both cases, I'm in favor of tweaking the tests a bit to pass and keeping existing behavior for convenience. We'll see what Dewey thinks.
| .with_argument("geom", "geometry: Input geometry") | ||
| .with_argument( | ||
| "type", | ||
| "string (optional): Output type - 'Simple' (default), 'Feature', or 'FeatureCollection'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "string (optional): Output type - 'Simple' (default), 'Feature', or 'FeatureCollection'", |
(based on you removing it from the actual kernel implementation)
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0), 'Feature')") | ||
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0), 'FeatureCollection')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0), 'Feature')") | |
| .with_sql_example("SELECT ST_AsGeoJSON(ST_Point(1.0, 2.0), 'FeatureCollection')") |
Co-authored-by: Peter Nguyen <petern0408@gmail.com>
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the edits, and thank you Peter for the review + follow-up!
I'm personally OK with the 1.0 vs 1 difference, although we should solve the empty polygon and empty point here.
| ("POINT (1 2)", '{"type":"Point","coordinates":[1.0,2.0]}'), | ||
| ( | ||
| "LINESTRING (0 0, 1 1)", | ||
| '{"type":"LineString","coordinates":[[0.0,0.0],[1.0,1.0]]}', | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the failing tests, how about something like:
if eng is PostGIS:
expected = expected.replace(".0", "")Slightly less hacky but might not work (not sure if Python dictionary equality works exactly in the way we need it to for this):
result = eng.execute_and_collect()
df = eng.result_to_pandas(result)
assert df.shape == (1, 1)
assert json.loads(df.iloc[0, 0]) == json.loads(expected)Either works for me! I don't particularly mind the difference (in the process of making it faster we might be able to fix the incompatibility as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, just add 0.5 to all of your coordinates (maybe add a comment with a link tp #472 or a link to this PR and a short explanation of why you aren't using any integer coordinates in your test data given that all the other tests do).
| (None, None), | ||
| ("LINESTRING EMPTY", '{"type":"LineString","coordinates":[]}'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| (None, None), | |
| ("LINESTRING EMPTY", '{"type":"LineString","coordinates":[]}'), | |
| (None, None), | |
| ("POINT EMPTY", '{"type":"Point","coordinates":[]}'), | |
| ("LINESTRING EMPTY", '{"type":"LineString","coordinates":[]}'), |
rust/sedona-geo/src/st_asgeojson.rs
Outdated
| Some(geom) => { | ||
| // Convert geo_types::Geometry to geojson::Geometry | ||
| let geojson_geom: geojson::Geometry = (&geom).into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can solve the lack of empty point support and empty polygon incompatiblity here by special casing them. Maybe:
match geom.as_type() {
geo_traits::GeometryType::Point(pt) => {
if pt.coord().is_none() { /* special case the empty point output and return */ }
}
geo_traits::GeometryType::Polygon(pl) => {
if pl.exterior().is_none() { /* special case the empty polygon output and return */ }
}
let geo_geom = item_to_geometry(geom)?;
/* current implementation */
}This means you'd have to go back to the WkbExecutor because geo_types can't represent an empty point. Most of the functions implemented in this crate have to do that (the ones that don't aren't used by default at the moment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a quick fix for POLYGON EMPTY in geojson: georust/geojson#262. Though we should just move forward with the workaround.
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Implements the
ST_AsGeoJSONfunction for converting geometry objects to GeoJSON format.