feat(rust/sedona-functions): Add SRID argument to ST_Point() #275

yutannihilation · 2025-11-04T15:35:57Z

Part of #126

This pull request implements the SRIDifiedKernel wrapper suggested in #126 and applies it to ST_Point() to see if it works.

Currently, it seems to work. I need to figure out how to test this because ScalarUdfTester doesn't return SedonaType.

> sd_sql("select st_srid(st_point(1, 1, 4326))")
┌──────────────────────────────────────────────────┐
│ st_srid(st_point(Int64(1),Int64(1),Int64(4326))) │
│                      uint32                      │
╞══════════════════════════════════════════════════╡
│                                             4326 │
└──────────────────────────────────────────────────┘
Preview of up to 6 row(s)

paleolimbot

Awesome!

In Python you should be able to do:

result = eng.execute_and_collect("<sql>")
df = eng.result_to_pandas(result)
geopandas.testing.assert_geodataframe_equal(df, expected)

(In theory result_to_pandas is CRS-aware, even for PostGIS)

paleolimbot · 2025-11-04T16:12:40Z

rust/sedona-functions/src/st_setsrid.rs

+                // TODO: This branch is not really the "invalid CRS value" case.
+                //       If it can be cast to Utf-8, it falls into the first branch.
+                return sedona_internal_err!("Invalid CRS value");


Maybe "Can't cast Crs {crs:?} to Utf8"?

rust/sedona-functions/src/st_setsrid.rs

paleolimbot · 2025-11-04T16:57:08Z

rust/sedona-functions/src/st_point.rs

+            ],
+        );
+
+        tester.assert_return_type(WKB_GEOMETRY);


It's a great point that the UDF tester doesn't propagate CRSes because it doesn't consider scalar arguments. We could add return_type_with_scalars() to the ScalarUdfTester?

Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>

yutannihilation · 2025-11-07T14:17:20Z

We could add return_type_with_scalars() to the ScalarUdfTester?

Thanks for the hint! I tried it, but currently I'm seeing this error. I hope this is just a problem with my implementation, but does ScalarUdfTester::return_type() go through a different code path than actually invoking the function...?

called `Result::unwrap()` on an `Err` value: NotImplemented("st_point([Arrow(Float64), Arrow(Float64), Arrow(UInt16)]): No kernel matching arguments")

edit: answer to self. return_field_from_args() calls return_type(), not return_type_from_args_and_scalars().

https://github.com/apache/datafusion/blob/f32984b2dbf9e5a193c20643ce624167295fbd61/datafusion/expr/src/udf.rs#L628-L637

rust/sedona-testing/src/testers.rs

paleolimbot

edit: answer to self. return_field_from_args() calls return_type(), not return_type_from_args_and_scalars().

I think you're looking at the default trait implementation...I'm pretty sure the SedonaScalarFunction/SedonaKernel handles that correctly. I think that the return_type_from_args_and_scalars() here is returning None (I can't spot exactly why...stepping through that in the debugger might help).

paleolimbot · 2025-11-07T16:12:43Z

rust/sedona-functions/src/st_setsrid.rs

+    fn return_type_from_args_and_scalars(
+        &self,
+        args: &[SedonaType],
+        scalar_args: &[Option<&ScalarValue>],
+    ) -> Result<Option<SedonaType>> {
+        let orig_args_len = args.len() - 1;
+        let orig_args = &args[..orig_args_len];
+        let orig_scalar_args = &scalar_args[..orig_args_len];


I think we may need to check the number of arguments here (return None if it's not the expected number) to avoid a panic
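A minimal sketch of such a length check, using a plain generic slice instead of the real SedonaType/ScalarValue arguments (the helper name `split_crs_arg` is hypothetical and not part of this PR). `split_last()` returns `None` for an empty slice, so there is no `len() - 1` underflow to panic on:

```rust
/// Split `[arg0, arg1, ..., crs_arg]` into the original arguments and the
/// trailing CRS argument. Returns `None` when there are no arguments at all,
/// instead of panicking on `args.len() - 1`.
/// NOTE: hypothetical helper for illustration only.
fn split_crs_arg<T>(args: &[T]) -> Option<(&[T], &T)> {
    // `split_last` yields (last, rest); reorder to (rest, last).
    args.split_last().map(|(crs, orig)| (orig, crs))
}
```

With this shape, a caller can write `let Some((orig_args, crs)) = split_crs_arg(args) else { return Ok(None) };` and never index past the end.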

yutannihilation · 2025-11-07T16:24:13Z

I'm pretty sure the SedonaScalarFunction/SedonaKernel handles that correctly.

Yes. It seems the problem was that ScalarUdfTester holds ScalarUDF, not SedonaScalarUDF.

paleolimbot · 2025-11-07T16:35:37Z

ScalarUdfTester holds ScalarUDF, not SedonaScalarUDF.

The ScalarUDF is what DataFusion will interact with and calling the trait functions should still give correct results 😬

yutannihilation · 2025-11-07T17:04:37Z

Ah, thanks. Got it at last...

This reverts commit 64e7d13.

yutannihilation · 2025-11-08T01:37:35Z

Okay, I think I figured it out. I see two problems. (Sorry, I was a bit confused. You are right that the functions around return_type() themselves work correctly.)

First, ScalarUdfTester::invoke() calls ScalarUdfTester::return_type(), which doesn't consider scalar arguments. So, we need some escape hatch to specify the return type calculated from the scalar arguments. Fortunately, we have the actual scalar arguments in these invoke_*() variants, so this is easy to fix, but the code looks a bit complicated.

sedona-db/rust/sedona-testing/src/testers.rs

Line 406 in 964c4dd

return_field: self.return_type()?.to_storage_field("", true)?.into(),

Second, ScalarUdfTester::assert_scalar_result_equals() also calls return_type(). In this case, we have nothing from which to infer the result type, so it probably needs to be provided from outside.

sedona-db/rust/sedona-testing/src/testers.rs

Line 187 in 964c4dd

let return_type = self.return_type().unwrap();
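To illustrate why the tester needs the scalar values, here is a toy sketch (not the sedona-testing code; `GeomType`, both function names, and the `EPSG:<srid>` mapping are assumptions made for this example). A return type computed from argument types alone cannot carry the CRS, while one computed from the literal SRID can:

```rust
/// Toy stand-in for a geometry type that carries an optional CRS.
#[derive(Debug, Clone, PartialEq)]
struct GeomType {
    crs: Option<String>,
}

/// Return type computed from the number of arguments only: the CRS is unknown.
fn return_type(n_args: usize) -> Option<GeomType> {
    match n_args {
        2 | 3 => Some(GeomType { crs: None }),
        _ => None, // no kernel matches
    }
}

/// Return type computed from scalar values as well: a literal SRID in the
/// third position is propagated into the output type. `None` entries stand
/// for arguments that are not scalar literals.
fn return_type_with_scalars(scalars: &[Option<i64>]) -> Option<GeomType> {
    let mut out = return_type(scalars.len())?;
    if let Some(Some(srid)) = scalars.get(2) {
        // Assumption for this sketch: SRID 0 means "unknown" (no CRS).
        if *srid != 0 {
            out.crs = Some(format!("EPSG:{}", srid));
        }
    }
    Some(out)
}
```

A tester that always calls the first function will report a CRS-less type for `st_point(1, 1, 4326)`, which is the mismatch described above.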

yutannihilation · 2025-11-08T03:05:43Z

I think this is ready for review now, but with one caveat about the differences from PostGIS.

First, it seems PostGIS treats NULL CRS and unknown (0) CRS differently. I'm not sure what we should tweak in SedonaDB to match this behavior, but I decided not to address it in this pull request. So, I commented out the failing test and left a note explaining why.

postgres=# SELECT ST_SRID(ST_POINT(1, 1));
 st_srid
---------
       0
(1 row)

postgres=# SELECT ST_SRID(ST_POINT(1, 1, null));
 st_srid
---------

(1 row)

Also, another difference I found is that PostGIS doesn't accept a CRS with an authority (e.g., EPSG:4326), while SedonaDB does. I don't think this difference is a problem, but I'm not sure if I should include this in the test cases.

paleolimbot

Also, another difference I found is that PostGIS doesn't accept a CRS with an authority (e.g., EPSG:4326), while SedonaDB does. I don't think this difference is a problem, but I'm not sure if I should include this in the test cases.

We added ST_SetCrs() for this case to keep ST_SRID() more similar, although I think that the convenience of ST_Point(1, 2, '<string>') will be worth the slight digression from PostGIS...we also allow this in ST_Transform().

paleolimbot · 2025-11-08T05:12:35Z

python/sedonadb/tests/functions/test_functions.py

+        # TODO: This is a bit tricky, but in PostGIS, NULL and unknown CRS are distinguished.
+        #
+        # - ST_SRID(ST_POINT(x, y, NULL)) returns NULL
+        # - ST_SRID(ST_POINT(x, y, 0)) returns 0
+        # - ST_SRID(ST_POINT(x, y)) returns 0
+        #
+        # (1, 1, None, None),


I think that ST_SetSRID() handles this in the same way as PostGIS and it would be helpful to handle that here (I'll leave a suggestion below about how we might do that)
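The three PostGIS cases listed in the TODO above can be written down as a small table in code. This is only a model of the observed behavior (the `SridArg` enum and the function name are hypothetical), not how either PostGIS or SedonaDB implements it:

```rust
/// The SRID argument as seen by ST_Point(): omitted, SQL NULL, or a value.
/// NOTE: hypothetical type for illustration only.
enum SridArg {
    Omitted,
    Null,
    Value(u32),
}

/// Model of ST_SRID(ST_Point(x, y [, srid])) as observed in PostGIS:
/// `None` stands for SQL NULL.
fn st_srid_of_point(arg: SridArg) -> Option<u32> {
    match arg {
        // ST_SRID(ST_Point(x, y)) returns 0 (unknown CRS).
        SridArg::Omitted => Some(0),
        // ST_SRID(ST_Point(x, y, NULL)) returns NULL.
        SridArg::Null => None,
        // ST_SRID(ST_Point(x, y, 0)) returns 0; other values pass through.
        SridArg::Value(v) => Some(v),
    }
}
```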

paleolimbot · 2025-11-08T05:18:17Z

rust/sedona-functions/src/st_setsrid.rs

+    fn invoke_batch(
+        &self,
+        arg_types: &[SedonaType],
+        args: &[ColumnarValue],
+    ) -> Result<ColumnarValue> {
+        let orig_args_len = arg_types.len() - 1;
+        self.inner
+            .invoke_batch(&arg_types[..orig_args_len], &args[..orig_args_len])
+    }


I think this is the place where we'd have to check if let ColumnarValue::Scalar(sc) = args[orig_args_len] { if sc.is_null(), and perhaps modify the validity buffer of the inner.invoke_batch().to_array(). Feel free to punt on that and file a follow-on ticket 🙂

Ahh, thanks! I didn't notice ST_POINT(x, y, NULL) should return NULL... I'll fix.

yutannihilation · 2025-11-08T16:18:15Z

perhaps modify the validity buffer of the inner.invoke_batch().to_array()

I tried this approach, but I couldn't find any API that exposes the actual buffer as mutable. So, I chose a different way that skips invoke_batch().

yutannihilation · 2025-11-08T16:47:46Z

(Just a side note)
After I commented above, I started to wonder if it's really fine to skip invoking the actual logic. For example, should ST_GeomFromText() reject invalid WKT inputs when the SRID is NULL? But, it seems PostGIS also skips any checks, so it should be fine.

postgres=# SELECT ST_GeomFromText('point (1 1', null);
 st_geomfromtext 
-----------------

(1 row)

postgres=# SELECT ST_GeomFromText('point (1 1');
ERROR:  parse error - invalid geometry
HINT:  "point (1 1" <-- parse error at position 12 within geometry

paleolimbot

Thank you!

I think you're right about propagating the error and there's one small potential improvement; however, this is great and those are unlikely corner cases I'm happy to punt into a future follow-on when we have time.

paleolimbot · 2025-11-09T01:43:39Z

rust/sedona-functions/src/st_setsrid.rs

+        // If the specified SRID is NULL, the result is also NULL. So, return
+        // NULL early and doesn't run `invoke_batch()`.
+        if let ColumnarValue::Scalar(sc) = &args[orig_args_len] {


I think your last point is a good one...we should probably invoke_batch first no matter what (to propagate any errors). I don't think that anybody is relying on the performance of returning a column full of nulls because they probably made a mistake 🙂
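A rough sketch of that ordering, with a `Vec<Option<T>>` standing in for an Arrow array and a closure standing in for the inner kernel (the name `invoke_with_srid` and the `String` error type are assumptions for this example). The inner kernel always runs, so its errors surface even when the SRID is NULL, and the all-NULL result takes its length from the inner result:

```rust
/// Toy wrapper: run the inner kernel first so its errors propagate, then
/// null out every row if the SRID argument is NULL (`None`).
/// NOTE: hypothetical signature for illustration only.
fn invoke_with_srid<T>(
    inner: impl Fn() -> Result<Vec<Option<T>>, String>,
    srid: Option<u32>,
) -> Result<Vec<Option<T>>, String> {
    // Always invoke the inner kernel; `?` surfaces, e.g., a WKT parse error
    // even when the SRID is NULL.
    let result = inner()?;
    match srid {
        // NULL SRID: an all-NULL result with the same length as the inner result.
        None => Ok(result.iter().map(|_| None).collect()),
        // Non-NULL SRID: the values pass through unchanged in this sketch.
        Some(_) => Ok(result),
    }
}
```

Note that this deliberately differs from the PostGIS behavior shown earlier, where an invalid WKT with a NULL SRID returns NULL rather than an error.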

paleolimbot · 2025-11-09T01:50:53Z

rust/sedona-functions/src/st_setsrid.rs

+        // args should consist of the original args and one extra arg for
+        // specifying CRS. So, first, validate the length and separate these.
+        //
+        // [arg0, arg1, ..., crs_arg];
+        //  ^^^^^^^^^^^^^^^
+        //     orig_args
+        let orig_args_len = match (args.len(), scalar_args.len()) {
+            (0, 0) => return Ok(None),
+            (l1, l2) if l1 == l2 => l1 - 1,
+            _ => return sedona_internal_err!("Arg types and arg values have different lengths"),
+        };
+
+        let orig_args = &args[..orig_args_len];
+        let orig_scalar_args = &scalar_args[..orig_args_len];


I think I understand this better now...this works, although probably if args.len() == 0 { return Ok(None) } is sufficient (there are a lot of places in our code where we rely on DataFusion passing us the right number of things). I was worried before that if somebody passed (e.g.,) a single argument to ST_Point() something funny would happen here, but I see now that the call to the inner.return_type_from_args_and_scalars() will correctly return Ok(None) for that case.

Totally optional, but the error message for something like ST_Point('gazornenplat') would probably be better if you moved the call to inner.return_type_from_args_and_scalars() before the CRS parsing.
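As a sketch of that reordering (a toy model only: `resolve_return_type`, the boolean `inner_matches`, and the string-typed results are all assumptions for illustration), the inner kernel is consulted first and the CRS is parsed only if the inner kernel matched:

```rust
/// Hypothetical return-type resolution for a wrapped kernel: ask the inner
/// kernel first, and only parse the CRS argument if the inner kernel matched.
fn resolve_return_type(
    inner_matches: bool,
    crs_arg: &str,
) -> Result<Option<String>, String> {
    // If the inner kernel does not match the original arguments, report
    // "no match" so the caller can produce its usual "no kernel" message.
    if !inner_matches {
        return Ok(None);
    }
    // Only now validate the CRS; in this toy version it must be an integer.
    let srid: u32 = crs_arg
        .parse()
        .map_err(|_| format!("Invalid CRS value: {}", crs_arg))?;
    Ok(Some(format!("geometry(srid={})", srid)))
}
```

With this order, a call whose last argument is not a CRS at all falls through to the "no kernel matching arguments" path instead of failing with a CRS parsing error.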

Thanks, sounds good to me.

yutannihilation · 2025-11-09T03:00:08Z

Done! I hope I got what you meant.

paleolimbot

Thank you!

A quick note that hopefully in the next week or so we'll have item-level CRSes (i.e., for the "srid is an array" case we'll have a separate return type). I don't think that's much code change on top of this but thought I'd put it out there in case it affects anything you're working on!

yutannihilation · 2025-11-09T04:02:15Z

Thanks, I saw the issue about item-level CRSes and was wondering how it relates to here. Looking forward to seeing how it is implemented!

yutannihilation · 2025-12-01T22:00:46Z

Just curious. I think this hasn't happened yet. Was it that you were simply too busy (I guess releasing is a tough job!), or did you run into some technical difficulty?

A quick note that hopefully in the next week or so we'll have item-level CRSes

paleolimbot · 2025-12-02T01:52:45Z

😬

I just didn't get there (focused on file IO for 0.2.0). The issue for this is #136 (I'll add some background to that on vaguely how I think it will work)

yutannihilation · 2025-12-02T02:09:43Z

Thanks, good to know!

feat(sql): Add SRID argument to ST_Point()

1360eba

paleolimbot reviewed Nov 4, 2025

yutannihilation and others added 4 commits November 6, 2025 09:06

Update rust/sedona-functions/src/st_setsrid.rs

ab65370

Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>

Merge remote-tracking branch 'upstream/main' into feat/srid-kernel

7c32e6a

Tweak

ead1ea0

Add return_type_with_scalar()...

272cc62

WIP

64e7d13

yutannihilation commented Nov 7, 2025

rust/sedona-testing/src/testers.rs (outdated)

paleolimbot reviewed Nov 7, 2025

yutannihilation added 4 commits November 8, 2025 02:04

Revert "WIP"

9b6f208

This reverts commit 64e7d13.

Check length

23d9095

Specify return type when invoke

39fd6a7

Tweak

7ce3160

yutannihilation added 5 commits November 8, 2025 10:48

Inline variables

5feefc0

Add some comments

00cc68d

Add more test cases

7e90da3

Add a test case of invalid SRID

9ebca35

Add Python test

1bc5611

paleolimbot marked this pull request as ready for review November 8, 2025 02:32

paleolimbot marked this pull request as draft November 8, 2025 02:32

yutannihilation added 3 commits November 8, 2025 11:45

Tweak test cases (not working yet)

21e0a21

Add TODO comment and skip failing test

91fcf64

Add some comment

617f7b8

yutannihilation marked this pull request as ready for review November 8, 2025 02:58

paleolimbot reviewed Nov 8, 2025

yutannihilation and others added 4 commits November 8, 2025 23:25

Address comments

6f3fab9

Tweak

3b3ecb1

Fix

8eab637

Improve

5ffdb4f

paleolimbot approved these changes Nov 9, 2025

yutannihilation added 3 commits November 9, 2025 11:46

Do not validate CRS when the inner UDF's return type is None

a6d6be0

Propagate errors even when CRS is NULL

efaf059

Refer to the result's length

d25849a

Remove unused imports

e0fc862

paleolimbot approved these changes Nov 9, 2025

paleolimbot merged commit 943d149 into apache:main Nov 9, 2025
12 checks passed

yutannihilation deleted the feat/srid-kernel branch November 9, 2025 04:02

paleolimbot added this to the 0.2.0 milestone Nov 27, 2025

yutannihilation mentioned this pull request Nov 30, 2025

docs(examples): Add note about the SRID arg of ST_Point() #392

Merged
