[SPARK-36710][PYTHON] Support new typing syntax in function apply APIs in pandas API on Spark #34007
I used slice because mypy fails. FYI, the new syntax doesn't fail w/ mypy.
4271ffc to 8f312b3
8f312b3 to 0b436b8
Test build #143304 has finished for PR 34007 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143323 has finished for PR 34007 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Will merge in 3 days if there are no more comments.
Merged to master.
What changes were proposed in this pull request?
This PR proposes supporting the new syntax introduced in #33954 in the function apply APIs. Namely, users can now specify the index type and name in the return type annotation.
Again, this syntax remains experimental and is a non-standard way apart from the Python standard. We should migrate to proper typing once pandas supports it, like numpy.typing.
Why are the changes needed?
The rationale is described in #33954: specifying the index type avoids unnecessary computation for the default index or for schema inference.
Does this PR introduce any user-facing change?
Yes, this PR affects the following APIs:
DataFrame.apply(..., axis=1)
DataFrame.groupby.apply(...)
DataFrame.pandas_on_spark.transform_batch(...)
DataFrame.pandas_on_spark.apply_batch(...)
Now they can specify the index type with the new syntax below:
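The original code sample did not survive on this page. As a rough illustration only: the documented pandas API on Spark forms are ps.DataFrame[int, [int, int]] (index type, then column types) and ps.DataFrame[("index", int), [("a", int), ("b", int)]] (index name and type, then named columns), written as the return annotation of the applied function. The sketch below uses only the standard library and a hypothetical stand-in class, FrameHint, to show how that annotation shape carries the index type and name. FrameHint and _field are assumptions made for this example and are not part of pyspark.pandas.

```python
from typing import Any, List, Optional, Tuple


class FrameHint:
    """Hypothetical stand-in for ps.DataFrame; it only records what was subscripted."""

    def __init__(
        self,
        index: Tuple[Optional[str], type],
        columns: List[Tuple[Optional[str], type]],
    ) -> None:
        self.index = index
        self.columns = columns

    def __class_getitem__(cls, params: Any) -> "FrameHint":
        # Expected shape: [index_spec, [column_spec, column_spec, ...]]
        index_spec, column_specs = params
        return cls(_field(index_spec), [_field(c) for c in column_specs])


def _field(spec: Any) -> Tuple[Optional[str], type]:
    # A bare type means "unnamed"; a (name, type) pair carries the name as well.
    if isinstance(spec, tuple):
        name, tpe = spec
        return name, tpe
    return None, spec


# The return annotation names the index ("index": int) and both columns,
# mirroring ps.DataFrame[("index", int), [("id", int), ("A", int)]].
def transform(pdf) -> FrameHint[("index", int), [("id", int), ("A", int)]]:
    pdf["A"] = pdf["id"] + 1
    return pdf


hint = transform.__annotations__["return"]
print(hint.index)    # index name and type read from the annotation
print(hint.columns)  # column names and types read from the annotation
```

With the real library, the same annotation would be written with ps.DataFrame in place of FrameHint, and the function would be passed to one of the APIs listed above, for example DataFrame.pandas_on_spark.apply_batch(transform).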
How was this patch tested?
Manually tested, and unit tests were added.