[SPARK-54317][PYTHON][CONNECT] Unify Arrow conversion logic for Classic and Connect toPandas #53045
Conversation
Extract shared conversion logic into a `_convert_arrow_table_to_pandas` helper function in conversion.py to avoid code duplication between Classic and Connect.

Key changes:
- Add `_convert_arrow_table_to_pandas` helper function in conversion.py
- Update Classic toPandas to handle empty tables explicitly (SPARK-51112)
- Only apply self_destruct options when the table has rows
- Connect imports the shared helper from conversion.py

This unifies the optimizations from SPARK-53967 and SPARK-54183:
- Avoid an intermediate pandas DataFrame during conversion
- Convert Arrow columns directly to Series with type converters
- Better memory efficiency with self_destruct on non-empty tables

Co-authored-by: cursor
```python
        for arrow_col, field in zip(table.columns, schema.fields)
    ],
    axis="columns",
)
```
Could we not also unify the `pdf.columns = schema.names` logic (restoring original column names, including duplicates) and the `else: pdf = table.to_pandas(**pandas_options)` empty-columns fallback?
Yes, after reading more I moved more of the common logic into the helper function to unify them. Thanks!
merged to master
What changes were proposed in this pull request?
This PR merges the Arrow conversion code paths between Spark Connect and Classic Spark by extracting shared logic into a reusable helper function, `_convert_arrow_table_to_pandas`.

Why are the changes needed?
This unifies optimizations from two separate PRs, SPARK-53967 and SPARK-54183 (avoiding an intermediate pandas DataFrame during conversion, and using self_destruct on non-empty tables for better memory efficiency).
Does this PR introduce any user-facing change?
No. This is a pure refactoring with no API or behavior changes.
How was this patch tested?
Ran existing Arrow test suite:
python/pyspark/sql/tests/arrow/test_arrow.py

Was this patch authored or co-authored using generative AI tooling?
Co-Generated-by Cursor with Claude 4.5 Sonnet