[SPARK-37495][PYTHON] Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled #34750
… when config 'compute.eager_check' is disabled
Add user_guide docs
Add note to docstring
CC @HyukjinKwon @itholic @Yikun FYI
LGTM, thanks @dchvn
@@ -5781,6 +5781,25 @@ def compare(
"""
Compare to another Series and show the differences.

.. note:: This API is slightly different from pandas when indexes from both Series
Good notes and doctest
Ping @HyukjinKwon :-) thanks
Merged to master.
What changes were proposed in this pull request?
Skip the identical index check in Series.compare when config 'compute.eager_check' is disabled.
Why are the changes needed?
Checking that both Series have identical indexes is expensive, so config 'compute.eager_check' should allow users to skip it.
Does this PR introduce any user-facing change?
Yes
Before this PR
After this PR, when config 'compute.eager_check' is False, pandas-on-Spark skips the identical index check and proceeds with the comparison.
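The idea of gating a costly validation behind a flag can be sketched in plain Python. This is only an illustration of the pattern, not the pandas-on-Spark implementation: `compare_labeled`, its `eager_check` parameter, the error message, and the dict-based "series" are all hypothetical stand-ins, and the outer-join handling of mismatched labels when the check is skipped is an assumption made for the sketch.

```python
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical toy model: a "series" is a dict mapping index label -> value.
Diff = Dict[Hashable, Tuple[Optional[int], Optional[int]]]


def compare_labeled(
    left: Dict[Hashable, int],
    right: Dict[Hashable, int],
    eager_check: bool = True,
) -> Diff:
    """Return {label: (left_value, right_value)} for labels whose values differ.

    With eager_check=True, the label sequences must match exactly, which costs
    an extra pass over both inputs before any comparison happens.
    With eager_check=False, that pass is skipped and labels are simply
    outer-joined (an assumption for this sketch); a missing side is None.
    """
    if eager_check and list(left) != list(right):
        raise ValueError("indexes are not identical")

    diffs: Diff = {}
    for label in sorted(set(left) | set(right)):
        l_val, r_val = left.get(label), right.get(label)
        if l_val != r_val:
            diffs[label] = (l_val, r_val)
    return diffs


left = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
right = {1: 1, 2: 2, 4: 3, 3: 4, 6: 5}

try:
    compare_labeled(left, right)  # check enabled: mismatched labels are rejected
except ValueError as exc:
    print("eager check:", exc)

# Check disabled: the comparison proceeds over the union of labels.
print(compare_labeled(left, right, eager_check=False))
```

In a distributed setting the validation pass is typically far more expensive than in this toy, since it has to run over the full data before the actual comparison starts, which is the motivation given above for making it optional.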
How was this patch tested?
Unit tests