
[SPARK-45118][PYTHON] Refactor converters for complex types to short cut when the element types don't need converters #42874

Closed
wants to merge 5 commits

Conversation

@ueshin ueshin commented Sep 11, 2023

What changes were proposed in this pull request?

Refactors converters for complex types to short-circuit when the element types don't need converters.

The following refactors are done in this PR:

  • Provide a shortcut when the element types in complex types don't need converters
  • Check Nones before calling the converter
  • Remove extra type checks just for assertions
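The shortcut pattern described above can be sketched as follows. This is a toy model, not the actual Spark code: `make_array_converter`, `make_element_converter`, and the rule that only `"timestamp"` elements need conversion are all made up here for illustration; the real logic lives in pyspark's internal `_converter`.

```python
from typing import Any, Callable, List, Optional


def make_element_converter(dt: str) -> Optional[Callable[[Any], Any]]:
    # Toy rule (hypothetical): only "timestamp" elements need conversion.
    if dt == "timestamp":
        return lambda v: ("converted", v)
    # Returning None signals "no converter needed" instead of an
    # identity lambda, so callers can skip per-element calls entirely.
    return None


def make_array_converter(element_dt: str) -> Optional[Callable[[Any], Any]]:
    element_conv = make_element_converter(element_dt)
    if element_conv is None:
        # Shortcut: if the element type needs no converter, the array
        # needs none either -- no wrapper function is built at all.
        return None

    def convert_array(value: Optional[List[Any]]) -> Optional[List[Any]]:
        if value is None:
            return None
        # Check None before calling the converter on each element.
        return [element_conv(v) if v is not None else None for v in value]

    return convert_array
```

For example, `make_array_converter("int")` returns `None` (no conversion work per row), while `make_array_converter("timestamp")` returns a function that converts non-None elements and passes `None` through.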

Why are the changes needed?

When the element types in complex types don't need converters, we can provide a shortcut to avoid extra function calls.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added related tests, and verified with existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

python/pyspark/sql/tests/pandas/test_types.py (review thread resolved)
@@ -793,51 +851,94 @@ def correct_timestamp(pser: pd.Series) -> pd.Series:
def _converter(dt: DataType) -> Optional[Callable[[Any], Any]]:

    if isinstance(dt, ArrayType):
        _element_conv = _converter(dt.elementType) or (lambda x: x)
Contributor

Does the performance overhead come from this lambda function? Or does it come from this if statement:

def convert_xxx(value):
    if value is None:
        return None

Member Author

@ueshin ueshin Sep 12, 2023

Yes, it's from the unnecessary lambda function call.
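A rough, self-contained way to see the cost being discussed (not from the PR; `identity` and the data here are made up for illustration): wrapping every element in a fallback identity lambda adds one Python function call per element, which the shortcut avoids.

```python
import timeit

identity = lambda x: x  # the kind of fallback lambda the shortcut removes
data = list(range(1_000))

# With the identity lambda: one extra function call per element.
with_lambda = timeit.timeit(lambda: [identity(v) for v in data], number=1_000)

# Without it: the elements are used as-is.
without_lambda = timeit.timeit(lambda: [v for v in data], number=1_000)

# On CPython the lambda version is typically noticeably slower; exact
# numbers depend on the machine and interpreter version.
```

Both loops produce identical results, so skipping the wrapper is a pure win when no conversion is needed.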

Member Author

ueshin commented Sep 13, 2023

The test failure is from org.apache.spark.sql.connect.execution.ReattachableExecuteSuite. I don't think it's related to this PR.

Member Author

ueshin commented Sep 14, 2023

Thanks! merging to master.

@ueshin ueshin closed this in 090fd18 Sep 14, 2023