[SPARK-37495][PYTHON] Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled #34750

dchvn · 2021-11-30T05:44:45Z

What changes were proposed in this pull request?

Skip the identical-index check in Series.compare when the config 'compute.eager_check' is disabled.

Why are the changes needed?

The identical-index check is expensive, so it should be skipped when the config 'compute.eager_check' is disabled.

Does this PR introduce any user-facing change?

Yes

Before this PR

>>> psser1 = ps.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 3, 4, 5]))
>>> psser2 = ps.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 4, 3, 6]))
>>> psser1.compare(psser2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u02/spark/python/pyspark/pandas/series.py", line 5851, in compare
    raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects

After this PR, when the config 'compute.eager_check' is False, pandas-on-Spark skips the identical-index check and proceeds with the comparison.

>>> with ps.option_context("compute.eager_check", False):
...     psser1.compare(psser2)
... 
   self  other
3   3.0    4.0
4   4.0    3.0
5   5.0    NaN
6   NaN    5.0
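To make the two behaviors concrete, here is a minimal pure-Python sketch of the semantics described above. It is not the pandas-on-Spark implementation: `compare_labeled` is a hypothetical helper, each Series is modeled as a plain dict from label to value, and the eager check is assumed to be a simple equality test on the label sequences. Missing values are shown as `None` rather than `NaN`.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# One output row: (label, value in "self", value in "other").
Row = Tuple[Hashable, Optional[int], Optional[int]]


def compare_labeled(
    left: Dict[Hashable, int],
    right: Dict[Hashable, int],
    eager_check: bool = True,
) -> List[Row]:
    """Hypothetical model of Series.compare with an optional label check."""
    # Assumption: the eager check compares the label sequences exactly.
    if eager_check and list(left) != list(right):
        raise ValueError("Can only compare identically-labeled Series objects")

    rows: List[Row] = []
    # Without the check, align on the union of labels (like an outer join)
    # and keep only the rows whose values differ.
    for label in sorted(set(left) | set(right)):
        a = left.get(label)   # None when the label is missing on the left
        b = right.get(label)  # None when the label is missing on the right
        if a != b:
            rows.append((label, a, b))
    return rows


# Same data as the example above: values 1..5 under two different indexes.
s1 = dict(zip([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
s2 = dict(zip([1, 2, 4, 3, 6], [1, 2, 3, 4, 5]))

print(compare_labeled(s1, s2, eager_check=False))
# [(3, 3, 4), (4, 4, 3), (5, 5, None), (6, None, 5)]
```

With `eager_check=True` (the default in this sketch), the same call raises the ValueError shown in the "Before this PR" traceback.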

How was this patch tested?

Unit tests

SparkQA · 2021-11-30T06:53:42Z

Test build #145750 has finished for PR 34750 at commit 2bbb84d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-11-30T07:12:42Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50221/

SparkQA · 2021-11-30T07:52:28Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50221/

SparkQA · 2021-11-30T07:55:55Z

Test build #145756 has finished for PR 34750 at commit dabc3c5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-11-30T08:16:11Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50228/

SparkQA · 2021-11-30T09:01:11Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50228/

dchvn · 2021-11-30T09:52:06Z

CC @HyukjinKwon @itholic @Yikun FYI

python/pyspark/pandas/series.py

HyukjinKwon

Looks good but @itholic and @Yikun FYI to double check.

SparkQA · 2021-12-01T04:04:35Z

Test build #145785 has finished for PR 34750 at commit 37f22b9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-12-01T04:26:29Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50258/

SparkQA · 2021-12-01T05:25:17Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50258/

Yikun

LGTM, thanks @dchvn

Yikun · 2021-12-01T03:48:12Z

python/pyspark/pandas/series.py

@@ -5781,6 +5781,25 @@ def compare(
        """
        Compare to another Series and show the differences.

+        .. note:: This API is slightly different from pandas when indexes from both Series


Good notes and doctest

dchvn · 2021-12-04T00:38:05Z

Ping @HyukjinKwon :-) thanks

HyukjinKwon · 2021-12-04T01:31:20Z

Merged to master.

github-actions bot added CORE PYTHON labels Nov 30, 2021

dchvn marked this pull request as draft November 30, 2021 05:46

[SPARK-37495][PYTHON] Skip identical index checking of Series.compare…

dabc3c5

… when config 'compute.eager_check' is disabled
Add user_guide docs
Add note to docstring

dchvn force-pushed the SPARK-37495 branch from 2a712db to dabc3c5 November 30, 2021 07:14

dchvn marked this pull request as ready for review November 30, 2021 07:15

HyukjinKwon reviewed Dec 1, 2021

python/pyspark/pandas/series.py (outdated, resolved)

discard indices length checking

37f22b9

HyukjinKwon approved these changes Dec 1, 2021

Yikun approved these changes Dec 1, 2021

HyukjinKwon closed this in b2a4e8f Dec 4, 2021

dchvn deleted the SPARK-37495 branch December 4, 2021 02:42
