-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36403][PYTHON] Implement Index.putmask
#33744
Conversation
Jenkins, ok to test |
Test build #142472 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
@beobest2 Could you enable the GIthub Action in your forked repository to run tests? |
Test build #142522 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Test build #142524 has finished for PR 33744 at commit
|
Kubernetes integration test status success |
Test build #142525 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test status failure |
Test build #142531 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #142538 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #142544 has finished for PR 33744 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
Index.putmask
Test build #142550 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #142553 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Test build #142555 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Test build #142556 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test status failure |
Kubernetes integration test status success |
Test build #142636 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Could you check beobest2#1 when you find some time ?? The point that we should not use This is a rough fix, so I think an additional reviews should be needed. |
@@ -323,7 +323,7 @@ def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType: | |||
if dtype == np.dtype("object"): | |||
if len(pser) == 0 or pser.isnull().all(): | |||
return types.NullType() | |||
elif hasattr(pser.iloc[0], "__UDT__"): | |||
elif hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what is this change for ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point : https://github.com/apache/spark/pull/33744/files#diff-c19199b1eb4ba73f00acb31a2c2c055be95b697fd08049ee6ba54655392adfa5R1984
If the type of input parameter value
of this putmask
is Index
,
the function infer_pd_series_spark_type
raises the exception, because Index
type doesn't have iloc
attribute.
This is why I fix this part. I thought that it had no effect on the operation of the existing Series
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the infer_pd_index_spark_type
function would make it cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not pass the Index
to the infer_pd_series_spark_type
.
At least we should change the function name (such as infer_pd_indexops_spark_type
) and input & output type of the function, or add a new function and use it as you mentioned.
BTW, actually I think we don't really need to use pandas_udf
here, though.
@itholic Thank you for the suggestion, I will review and test this more. |
Test build #144017 has finished for PR 33744 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145165 has finished for PR 33744 at commit
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Implement
Index.putmask
This pull request is based on databricks/koalas#1560
Why are the changes needed?
putmask
returns a new Index of the values set with the mask.putmask
is supported in pandas. PySpark should support that as well.Does this PR introduce any user-facing change?
Yes.
Index.putmask
can be used.How was this patch tested?
Unit tests.