@petern48 petern48 commented Jul 21, 2025

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Refactor to use self.name instead of first_geom column

How was this patch tested?

Added new test test_complex_df to ensure that we are performing the operation on the correct column.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

Comment on lines +4075 to +4088
def _get_series_col_name(ps_series: pspd.Series) -> str:
    series_name = ps_series.name if ps_series.name else SPARK_DEFAULT_SERIES_NAME
    spark_col_names = set(ps_series._internal.spark_frame.columns)

    if series_name in spark_col_names:
        return series_name
    # Combining different frames (e.g. in the GeoDataFrame.setitem method) adds these prefixes.
    # It's easier to check for them at read time than rename them at write time.
    # For GeoDataFrame.setitem, the left ("this") side is not overridden, so we always prefer the right ("that") side,
    # which is why it needs to come first in the if/elif/else sequence.
    elif f"__that_{series_name}" in spark_col_names:
        return f"__that_{series_name}"
    elif f"__this_{series_name}" in spark_col_names:
        return f"__this_{series_name}"
Contributor Author

I've decided to take a different approach in #2131 that will allow us to avoid ugly code like this.
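For readers skimming the thread, the lookup order in the snippet under review can be sketched without Spark. This is only an illustration: `resolve_col_name` and `DEFAULT_SERIES_NAME` are hypothetical names, the default value is a placeholder, and the final `ValueError` is an assumption about the part of the function that the diff excerpt does not show.

```python
from typing import Iterable, Optional

# Placeholder for the SPARK_DEFAULT_SERIES_NAME constant used in the diff;
# the real value is not shown in this PR, so "0" is an assumption.
DEFAULT_SERIES_NAME = "0"


def resolve_col_name(series_name: Optional[str], spark_col_names: Iterable[str]) -> str:
    """Return the first matching column, trying the plain name, then the
    "__that_" prefixed name, then the "__this_" prefixed name."""
    cols = set(spark_col_names)
    name = series_name if series_name else DEFAULT_SERIES_NAME

    # Same precedence as the if/elif chain in the reviewed helper.
    for candidate in (name, f"__that_{name}", f"__this_{name}"):
        if candidate in cols:
            return candidate

    # Assumed fallback: the excerpt ends before any else branch.
    raise ValueError(f"Series name {name} not found in spark_col_names {cols}")


# The right ("that") side wins when both prefixed columns are present.
chosen = resolve_col_name("geometry", {"__this_geometry", "__that_geometry"})
print(chosen)
```

The point of the sketch is the precedence: a prefixed column is only consulted when the plain name is absent, and `__that_` is checked before `__this_`.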

@petern48
Contributor Author

Another reason the Spark SQL approach just isn't working well is that, even with my workarounds, the CI tests fail on different versions:

FAILED tests/geopandas/test_match_geopandas_dataframe.py::TestMatchGeopandasDataFrame::test_rename_geometry - ValueError: Series name random not found in spark_col_names {'__index_level_0__', 'multilinestrings', 'polygons', 'linestrings', 'geomcollection', '__natural_order__', 'multipolygons', 'points', 'multipoints'}
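
The mismatch reported in the log above can be checked by hand. The sketch below uses only the series name and column set printed in the error message. The three candidate spellings follow the name-based helper from this PR. It does not attempt to explain why the name and the columns diverge on some versions.

```python
# Column names copied from the CI failure message.
spark_col_names = {
    "__index_level_0__", "multilinestrings", "polygons", "linestrings",
    "geomcollection", "__natural_order__", "multipolygons", "points", "multipoints",
}

# The series name reported in the ValueError.
series_name = "random"

# The spellings a name-based lookup would try: plain, "__that_", "__this_".
candidates = [series_name, f"__that_{series_name}", f"__this_{series_name}"]

# None of the candidates is present, so a name-based lookup has nothing to return.
matches = [c for c in candidates if c in spark_col_names]
print(matches)
```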

@petern48 petern48 closed this Jul 22, 2025
Development

Successfully merging this pull request may close these issues.

Geopandas: Refactor to use proper column name instead of first geomcolumn
