[GH-2095] Implement scalable workaround for GeoSeries.__init__() by petern48 · Pull Request #2096 · apache/sedona

petern48 · 2025-07-15T00:06:03Z

Did you read the Contributor Guide?

Yes, I have read the Contributor Rules and Contributor Development Guide

Is this PR related to a ticket?

Yes, and the PR name follows the format [GH-XXX] my subject. Closes Geopandas.GeoSeries: Implement scalable workaround for GeoSeries.__init__() #2095

What changes were proposed in this PR?

In a previous PR, I wrote a workaround for the behavior of creating a Series from a Series. That workaround required the use of to_pandas(), which is not scalable. I previously submitted a fix to Spark but was told this would be a new feature that could not be backported. I then realized I could just manually implement the logic in our codebase in the meantime, since it's all in the constructor.

How was this patch tested?

Ensured existing tests pass

Did this PR include necessary documentation updates?

No, this PR does not affect any public API so no need to change the documentation.

petern48 · 2025-07-15T22:24:17Z

This PR makes the code lose the CRS info again. It's because of the way Sedona serializes the shapely Geometry objects. Unfortunately, it doesn't include the CRS info, so I have to look into the best way to address this.
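For readers following along, here is a minimal sketch of how an SRID can drop out during a WKB round trip. It uses only shapely 2.x calls (`set_srid`, `to_wkb`, `from_wkb`, `get_srid`) and says nothing about Sedona's own serializer; the point and the SRID value 4326 are arbitrary choices for illustration.

```python
import shapely
from shapely.geometry import Point

# Attach an SRID to a geometry (shapely stores it on the underlying GEOS object).
pt = shapely.set_srid(Point(1.0, 2.0), 4326)

# Plain WKB has no slot for the SRID, so it is dropped on output.
plain = shapely.to_wkb(pt)
# EWKB (include_srid=True) embeds the SRID next to the geometry type.
ewkb = shapely.to_wkb(pt, include_srid=True)

srid_plain = shapely.get_srid(shapely.from_wkb(plain))
srid_ewkb = shapely.get_srid(shapely.from_wkb(ewkb))

print(srid_plain)  # 0, i.e. no SRID survived the plain WKB round trip
print(srid_ewkb)   # 4326
```

Whether a given serializer behaves like the first or the second case depends on its wire format, which is what the rest of this thread investigates.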

petern48 · 2025-07-18T19:21:42Z

I think what's happening:
When I call .apply() to convert geometries into EWKB, Spark applies my specified function, but then calls GeometryType()'s serialize and deserialize functions on the result, thinking the values are still Geometry objects, which leads to messed-up bytes. Looks like I need to find a different workaround.
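As background on what EWKB bytes look like, the sketch below hand-encodes a 2D point with the standard library only. It follows the PostGIS EWKB convention (a flag bit `0x20000000` in the type word, followed by a 4-byte SRID) and is not Sedona's internal serialization format. The helper names `encode_point` and `read_srid` are hypothetical, chosen for this illustration.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB flag bit: a 4-byte SRID follows the type word
WKB_POINT = 1           # geometry type code for Point


def encode_point(x, y, srid=None):
    """Encode a 2D point as little-endian WKB, or as EWKB when srid is given."""
    type_word = WKB_POINT | (SRID_FLAG if srid is not None else 0)
    out = struct.pack("<BI", 1, type_word)  # byte-order marker (1 = little-endian) + type
    if srid is not None:
        out += struct.pack("<I", srid)
    return out + struct.pack("<dd", x, y)


def read_srid(buf):
    """Return the SRID embedded in a WKB/EWKB buffer, or None if there is none."""
    endian = "<" if buf[0] == 1 else ">"
    (type_word,) = struct.unpack_from(endian + "I", buf, 1)
    if type_word & SRID_FLAG:
        (srid,) = struct.unpack_from(endian + "I", buf, 5)
        return srid
    return None


plain = encode_point(1.0, 2.0)
ewkb = encode_point(1.0, 2.0, srid=4326)
print(len(plain), read_srid(plain))  # 21 None
print(len(ewkb), read_srid(ewkb))    # 25 4326
```

Because the SRID lives inside the byte stream, any layer that re-serializes these bytes under the assumption that they are still geometry objects can corrupt or drop it.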

petern48 · 2025-07-18T20:41:23Z

@jiayuasu Do you happen to know off the top of your head where Sedona stores its SRID/CRS info (e.g., in a separate column, in metadata, or in the serialization format)? It's got to be somewhere, since it can be accessed by worker nodes.

EDIT: I've dug into the C code and see that it's serialized in the format. Still not sure why it's not retained once it becomes a shapely object for me.

jiayuasu · 2025-07-19T02:43:33Z

@Kontinuation do you have an answer to Peter's question?

petern48 · 2025-07-19T02:49:33Z

I figured it all out after quite a bit of digging. SRID info is included in the serialization process per geometry object. Just a bunch of various bugs in Sedona and Spark covering things up.

Closing. New working PR is here: #2121

Kontinuation · 2025-07-19T03:41:10Z

I'm submitting PR #2123 to resolve #2122. Hopefully it will resolve the SRID preservation issue.

petern48 added 3 commits July 11, 2025 10:34

Save progress

c898c36

Fix tests intersection, from_*, and set_geometry after the merge

66a11e2

Implement scalable workaround

47d5463

github-actions bot added the sedona-python label Jul 15, 2025

petern48 added 3 commits July 14, 2025 17:45

Merge branch 'master' into scalable_workaround

5aaa220

Fix __init__ after merge

f66e2b4

Use wkb.loads instead of from_wkb for version compatibility

c4919a8

petern48 requested a review from zhangfengcdt July 15, 2025 16:46

petern48 marked this pull request as ready for review July 15, 2025 16:46

petern48 requested a review from jiayuasu as a code owner July 15, 2025 16:46

Merge branch 'master' into scalable_workaround

b9ee1a3

petern48 marked this pull request as draft July 15, 2025 22:22

petern48 closed this Jul 19, 2025

petern48 wants to merge 7 commits into apache:master from petern48:scalable_workaround
