-
Notifications
You must be signed in to change notification settings - Fork 35
feat(rust/sedona-functions): Add SRID argument to ST_Point() #275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
In Python you should be able to do:
result = eng.execute_and_collect("<sql>")
df = eng.result_to_pandas(result)
geopandas.testing.assert_geodataframe_equal(df, expected)
(In theory result_to_pandas is CRS-aware, even for PostGIS)
| // TODO: This branch is not really the "invalid CRS value" case. | ||
| // If it can be cast to Utf-8, it falls into the first branch. | ||
| return sedona_internal_err!("Invalid CRS value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Can't cast Crs {crs:?} to Utf8?
| ], | ||
| ); | ||
|
|
||
| tester.assert_return_type(WKB_GEOMETRY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a great point that the UDF tester doesn't propagate CRSes because it doesn't consider scalar arguments. We could add return_type_with_with_scalars() to the ScalarUdfTester?
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Thanks for the hint! I tried it, but currently I'm seeing this error. I hope if this is just my implementation is not good, but
edit: answer to self. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edit: answer to self. return_field_from_args() calls return_type(), not return_type_from_args_and_scalars().
I think you're looking at the default trait implementation...I'm pretty the SedonaScalarFunction/SedonaKernel handles that correctly. I think that the return_type_from_args_and_scalars() here is returning None (I can't spot exactly why...popping through that in the debugger might help).
| fn return_type_from_args_and_scalars( | ||
| &self, | ||
| args: &[SedonaType], | ||
| scalar_args: &[Option<&ScalarValue>], | ||
| ) -> Result<Option<SedonaType>> { | ||
| let orig_args_len = args.len() - 1; | ||
| let orig_args = &args[..orig_args_len]; | ||
| let orig_scalar_args = &scalar_args[..orig_args_len]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may need to check the number of arguments here (return None if it's not the expected number) to avoid a panic
Yes. It seems the problem was that |
The |
|
Ah, thanks. Got it at last... |
This reverts commit 64e7d13.
|
Okay, I think I figured out. I see two problems. (Sorry I was a bit confused. You are right in that functions around First, sedona-db/rust/sedona-testing/src/testers.rs Line 406 in 964c4dd
Second, sedona-db/rust/sedona-testing/src/testers.rs Line 187 in 964c4dd
|
|
I think this is ready for review now, but with one caveat about the differences from PostGIS. First, it seems PostGIS treats Also, another difference I found is that PostGIS doesn't accept CRS with authority e.g. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, another difference I found is that PostGIS doesn't accept CRS with authority e.g. EPSG:4326 while SedonaDB accepts. I don't think this difference is a problem, but I'm not sure if I should include this in the test cases.
We added ST_SetCrs() for this case to keep ST_SRID() more similar, although I think that the convenience of ST_Point(1, 2, '<string>') will be worth the slight digression from PostGIS...we also allow this in ST_Transform().
| # TODO: This is a bit tricky, but in PostGIS, NULL and unknown CRS are distinguished. | ||
| # | ||
| # - ST_SRID(ST_POINT(x, y, NULL)) returns NULL | ||
| # - ST_SRID(ST_POINT(x, y, 0)) returns 0 | ||
| # - ST_SRID(ST_POINT(x, y)) returns 0 | ||
| # | ||
| # (1, 1, None, None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ST_SetSRID() handles this in the same way as PostGIS and it would be helpful to handle that here (I'll leave a suggestion below about how we might do that)
| fn invoke_batch( | ||
| &self, | ||
| arg_types: &[SedonaType], | ||
| args: &[ColumnarValue], | ||
| ) -> Result<ColumnarValue> { | ||
| let orig_args_len = arg_types.len() - 1; | ||
| self.inner | ||
| .invoke_batch(&arg_types[..orig_args_len], &args[..orig_args_len]) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the place where we'd have to check if let ColumnarValue::Scalar(sc) = args[orig_args_len] { if sc.is_null(), and perhaps modify the validity buffer of the inner.invoke_batch().to_array(). Feel free to punt on that and file a follow-on ticket 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, thanks! I didn't notice ST_POINT(x, y, NULL) should return NULL... I'll fix.
I tried this approach, but I couldn't find any API that exposes the actual buffer as mutable. So, I chose a different way that skips |
|
(Just a side note) |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I think you're right about propagating the error and there's one small potential improvement; however, this is great and those are unlikely corner cases I'm happy to punt into a future follow-on when we have time.
| // If the specified SRID is NULL, the result is also NULL. So, return | ||
| // NULL early and doesn't run `invoke_batch()`. | ||
| if let ColumnarValue::Scalar(sc) = &args[orig_args_len] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your last point is a good one...we should probably invoke_batch first no matter what (to propagate any errors). I don't think that anybody is relying the performance of returning a column full of nulls because they probably made a mistake 🙂
| // args should consist of the original args and one extra arg for | ||
| // specifying CRS. So, first, validate the length and separate these. | ||
| // | ||
| // [arg0, arg1, ..., crs_arg]; | ||
| // ^^^^^^^^^^^^^^^ | ||
| // orig_args | ||
| let orig_args_len = match (args.len(), scalar_args.len()) { | ||
| (0, 0) => return Ok(None), | ||
| (l1, l2) if l1 == l2 => l1 - 1, | ||
| _ => return sedona_internal_err!("Arg types and arg values have different lengths"), | ||
| }; | ||
|
|
||
| let orig_args = &args[..orig_args_len]; | ||
| let orig_scalar_args = &scalar_args[..orig_args_len]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand this better now...this works, although probably if args.len() == 0 { return Ok(None) } is sufficient (there are a lot of places in our code where we rely on DataFusion passing us the right number of things). I was worried before that if somebody passed (e.g.,) a single argument to ST_Point() something funny would happen here, but I see now that the call to the inner.return_type_from_args_and_scalars() will return correctly return Ok(None) for that case.
Totally optional, but the error message for something like ST_Point('gazornenplat') would probably be better if you moved the call to inner.return_type_from_args_and_scalars() before the CRS parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sounds good to me.
|
Done! I hope I got what you meant. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
A quick note that hopefully in the next week or so we'll have item-level CRSes (i.e., for the "srid is an array" case we'll have a separate return type). I don't think that's much code change on top of this but thought I'd put it out there in case it affects anything you're working on!
|
Thanks, I saw the issue about item-level CRSes and was wondering how it relates to here. Looking forward to seeing how it is implemented! |
|
Just curious. I think this doesn't happen yet. Was it that you were simply too busy (I guess releasing is a tough job!), or you found some technical difficulty?
|
|
😬 I just didn't get there (focused on file IO for 0.2.0). The issue for this is #136 (I'll add some background to that on vaguely how I think it will work) |
|
Thanks, good to know! |
Part of #126
This pull request implements the
SRIDifiedKernelwrapper, which is suggested in #126, and applies it toST_Point()to see if it works.Currently, it seems to work. I need to figure out how to test this because
ScalarUdfTesterdoesn't returnSedonaType.