
[SPARK-45118][PYTHON] Refactor converters for complex types to short cut when the element types don't need converters #42874

Closed
wants to merge 5 commits

Conversation

@ueshin ueshin commented Sep 11, 2023

What changes were proposed in this pull request?

Refactors converters for complex types to short-circuit when the element types don't need converters.

The following refactors are done in this PR:

  • Provide a shortcut when the element types in complex types don't need converters
  • Check Nones before calling the converter
  • Remove extra type checks just for assertions
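The shortcut pattern described above can be sketched as follows. This is a toy model, not the actual Spark code: `make_array_converter`, `make_element_converter`, and the rule that only `"timestamp"` elements need conversion are all made up here for illustration; the real logic lives in pyspark's internal `_converter`.

```python
from typing import Any, Callable, List, Optional


def make_element_converter(dt: str) -> Optional[Callable[[Any], Any]]:
    # Toy rule (hypothetical): only "timestamp" elements need conversion.
    if dt == "timestamp":
        return lambda v: ("converted", v)
    # Returning None signals "no converter needed" instead of an
    # identity lambda, so callers can skip per-element calls entirely.
    return None


def make_array_converter(element_dt: str) -> Optional[Callable[[Any], Any]]:
    element_conv = make_element_converter(element_dt)
    if element_conv is None:
        # Shortcut: if the element type needs no converter, the array
        # needs none either -- no wrapper function is built at all.
        return None

    def convert_array(value: Optional[List[Any]]) -> Optional[List[Any]]:
        if value is None:
            return None
        # Check None before calling the converter on each element.
        return [element_conv(v) if v is not None else None for v in value]

    return convert_array
```

For example, `make_array_converter("int")` returns `None` (no conversion work per row), while `make_array_converter("timestamp")` returns a function that converts non-None elements and passes `None` through.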

Why are the changes needed?

When the element types in complex types don't need converters, we can provide a shortcut to avoid extra function calls.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added related tests, and verified with existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

python/pyspark/sql/tests/pandas/test_types.py (review thread resolved)
@@ -793,51 +851,94 @@ def correct_timestamp(pser: pd.Series) -> pd.Series:
def _converter(dt: DataType) -> Optional[Callable[[Any], Any]]:

    if isinstance(dt, ArrayType):
        _element_conv = _converter(dt.elementType) or (lambda x: x)
Contributor

Does the performance overhead come from this lambda function? Or does it come from this if statement:

def convert_xxx(value):
    if value is None:
        return None

Member Author

@ueshin ueshin Sep 12, 2023

Yes, it's from the unnecessary lambda function call.
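A rough, self-contained way to see the cost being discussed (not from the PR; `identity` and the data here are made up for illustration): wrapping every element in a fallback identity lambda adds one Python function call per element, which the shortcut avoids.

```python
import timeit

identity = lambda x: x  # the kind of fallback lambda the shortcut removes
data = list(range(1_000))

# With the identity lambda: one extra function call per element.
with_lambda = timeit.timeit(lambda: [identity(v) for v in data], number=1_000)

# Without it: the elements are used as-is.
without_lambda = timeit.timeit(lambda: [v for v in data], number=1_000)

# On CPython the lambda version is typically noticeably slower; exact
# numbers depend on the machine and interpreter version.
```

Both loops produce identical results, so skipping the wrapper is a pure win when no conversion is needed.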

Member Author

ueshin commented Sep 13, 2023

The test failure is from org.apache.spark.sql.connect.execution.ReattachableExecuteSuite. I don't think it's related to this PR.

Member Author

ueshin commented Sep 14, 2023

Thanks! merging to master.

@ueshin ueshin closed this in 090fd18 Sep 14, 2023