@xinrong-meng
Member

@xinrong-meng xinrong-meng commented Jun 4, 2021

What changes were proposed in this pull request?

Completing arithmetic operators involving bool literals, Series, and Index consists of two main tasks:

  • Support arithmetic operations against bool literals
  • Support operators (+, *) between bool Series/Indexes.

Why are the changes needed?

Arithmetic operators involving bool literals, Series, and Index are currently incomplete.
They should match pandas' behavior.

Does this PR introduce any user-facing change?

Yes.

Newly supported operations example:

>>> ps.Series([1, 2, 3]) + True
0    2
1    3
2    4
dtype: int64
>>> ps.Series([1, 2, 3]) + False
0    1
1    2
2    3
dtype: int64
>>> ps.Series([True, False, True]) + True
0    True
1    True
2    True
dtype: bool
>>> ps.Series([True, False, True]) + False
0     True
1    False
2     True
dtype: bool
>>> ps.Series([True, False, True]) * True
0     True
1    False
2     True
dtype: bool
>>> ps.Series([True, False, True]) * False
0    False
1    False
2    False
dtype: bool
>>> ps.set_option('compute.ops_on_diff_frames', True)
>>> ps.Series([True, True, False]) + ps.Series([True, False, True])
0    True
1    True
2    True
dtype: bool
>>> ps.Series([True, True, False]) * ps.Series([True, False, True])
0     True
1    False
2    False
dtype: bool
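
The outputs above are consistent with + acting as a logical OR and * as a logical AND when both operands are bool, and with ordinary integer arithmetic when an int is involved. Below is a minimal plain-Python sketch of that reading. It models the results on lists only; it is not the pandas-on-Spark implementation, and the names combine and elementwise are hypothetical.

```python
from typing import List, Union

Scalar = Union[bool, int]


def combine(op: str, left: Scalar, right: Scalar) -> Scalar:
    """Model one element of `left op right` for op in {"+", "*"}."""
    both_bool = isinstance(left, bool) and isinstance(right, bool)
    if op == "+":
        # bool + bool stays bool (logical OR); otherwise integer addition.
        return (left or right) if both_bool else int(left) + int(right)
    if op == "*":
        # bool * bool stays bool (logical AND); otherwise integer multiplication.
        return (left and right) if both_bool else int(left) * int(right)
    raise ValueError(f"unsupported operator: {op}")


def elementwise(
    op: str, values: List[Scalar], other: Union[Scalar, List[Scalar]]
) -> List[Scalar]:
    """Apply `op` between a list and either a scalar or an equal-length list."""
    others = other if isinstance(other, list) else [other] * len(values)
    if len(others) != len(values):
        raise ValueError("operands must have the same length")
    return [combine(op, v, o) for v, o in zip(values, others)]


print(elementwise("+", [1, 2, 3], True))
print(elementwise("+", [True, False, True], True))
print(elementwise("*", [True, True, False], [True, False, True]))
```

The model deliberately ignores index alignment, so it says nothing about why compute.ops_on_diff_frames is needed for the Series-to-Series examples.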

Before the change, the operations above were not supported and raised a TypeError, for example:

>>> ps.Series([True, False, True]) + True
Traceback (most recent call last):
...
TypeError: Addition can not be applied to booleans and the given type.
>>> ps.Series([True, False, True]) + False
Traceback (most recent call last):
...
TypeError: Addition can not be applied to booleans and the given type.

How was this patch tested?

Unit tests.

Keyword: SPARK-35337

@SparkQA

SparkQA commented Jun 4, 2021

Test build #139348 has finished for PR 32785 at commit 3756dc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43870/

@SparkQA

SparkQA commented Jun 4, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43870/

@xinrong-meng xinrong-meng changed the title from "[WIP][SPARK-35601][PYTHON] Support arithmetic operations against bool literals" to "[SPARK-35601][PYTHON] Support arithmetic operations against bool literals" Jun 4, 2021
@xinrong-meng xinrong-meng marked this pull request as ready for review June 4, 2021 21:30
@xinrong-meng
Member Author

CC @ueshin @HyukjinKwon @itholic

@SparkQA

SparkQA commented Jun 4, 2021

Test build #139353 has finished for PR 32785 at commit f90000b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43875/

@SparkQA

SparkQA commented Jun 4, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43875/

@xinrong-meng xinrong-meng force-pushed the datatypeops_arith_bool branch from f90000b to ee39236 Compare June 7, 2021 17:01
@SparkQA

SparkQA commented Jun 7, 2021

Test build #139428 has finished for PR 32785 at commit ee39236.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43950/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43950/

@xinrong-meng xinrong-meng force-pushed the datatypeops_arith_bool branch from d7da448 to 2ac8593 Compare June 8, 2021 00:03
@SparkQA

SparkQA commented Jun 8, 2021

Test build #139438 has finished for PR 32785 at commit 2ac8593.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43961/

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43961/

@xinrong-meng
Member Author

ERROR [2.315s]: test_termination_sigterm (pyspark.tests.test_daemon.DaemonTests)
Ensure that daemon and workers terminate on SIGTERM.
----------------------------------------------------------------------
Traceback (most recent call last):
...
AssertionError: Expected EnvironmentError to be raised

This might not be related to the PR. Tests have been retriggered.

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139441 has finished for PR 32785 at commit f2ff9c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43963/

Member

Out of curiosity, why do we need allow_bool_index_ops? Is there a case where only bool index should be allowed alone?

Member Author

Good question. There is no case where only a bool index should be allowed, but there is a case where allow_bool_index_ops is False while allow_bool is True, for example:

>>> ps.Series([True, False, True]) + ps.Series([True, False, True])
Traceback (most recent call last):
...
TypeError: Addition can not be applied to booleans and the given type.

>>> ps.Series([True, False, True]) + True
0    True
1    True
2    True
dtype: bool

In case you are interested, there is a case where allow_bool_index_ops and allow_bool are both True, for example:

>>> ps.Series([1, 2, 3]) + ps.Series([True, False, True])
0    2
1    2
2    4
dtype: int64
>>> ps.Series([1, 2, 3]) + True
0    2
1    3
2    4
dtype: int64

And there is a case where allow_bool_index_ops and allow_bool are both False, for example:

>>> ps.Series([True, False, True]) - ps.Series([True, False, True])
Traceback (most recent call last):
...
TypeError: Subtraction can not be applied to booleans and the given type.
>>> ps.Series([True, False, True]) - True
Traceback (most recent call last):
...
TypeError: Subtraction can not be applied to booleans and the given type.

That's why allow_bool_index_ops might be needed. Does that make sense?
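
One way to read the three cases above is as two independent switches: allow_bool gating bool literals and allow_bool_index_ops gating bool Series/Index operands. The sketch below tabulates that reading. Only the two flag names come from this thread; the helper is_permitted, the operand_kind strings, and the CASES table are hypothetical and are not taken from the pandas-on-Spark code.

```python
def is_permitted(operand_kind: str, allow_bool: bool, allow_bool_index_ops: bool) -> bool:
    """Return whether a bool operand of the given kind is accepted.

    operand_kind is "literal" for a Python bool and "index_ops" for a bool
    Series/Index. Both strings are assumptions made for this sketch.
    """
    if operand_kind == "literal":
        return allow_bool
    if operand_kind == "index_ops":
        return allow_bool_index_ops
    raise ValueError(f"unknown operand kind: {operand_kind!r}")


# (allow_bool, allow_bool_index_ops) for the three cases described above.
CASES = {
    "bool Series + <bool operand>": (True, False),
    "int Series + <bool operand>": (True, True),
    "bool Series - <bool operand>": (False, False),
}

for name, (allow_bool, allow_bool_index_ops) in CASES.items():
    literal_ok = is_permitted("literal", allow_bool, allow_bool_index_ops)
    series_ok = is_permitted("index_ops", allow_bool, allow_bool_index_ops)
    print(f"{name}: literal={literal_ok}, Series/Index={series_ok}")
```

Under this reading, the flags differ only in the first case, where the literal is accepted and the Series operand is rejected.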

Member

For the first example, pandas works for both cases:

>>> pd.__version__
'1.2.4'

>>> pd.Series([True, False, True]) + pd.Series([True, False, True])
0     True
1    False
2     True
dtype: bool

>>> pd.Series([True, False, True]) + True
0    True
1    True
2    True
dtype: bool

so allow_bool_index_ops and allow_bool should be True for this case as well. Then we can consolidate the two?

Member Author

Makes sense! https://issues.apache.org/jira/browse/SPARK-35681 has been created for supporting arithmetic operators (+, *) among bool Series/Index; the parameters will be consolidated then.

Member Author

@xinrong-meng xinrong-meng Jun 9, 2021

The arithmetic operators (+, *) among bool Series/Index have been implemented in this PR; tickets have been adjusted.
The allow_bool_index_ops parameter discussed above has been removed.

Member Author

Thanks @HyukjinKwon, that's helpful!

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43963/

Member

This should allow bools and behave like &?

>>> pd.Series([True, False, True]) * pd.Series([True, False, False])
0     True
1    False
2    False
dtype: bool
>>> pd.Series([True, False, True]) * True
0     True
1    False
2     True
dtype: bool
>>> pd.Series([True, False, True]) * False
0    False
1    False
2    False
dtype: bool
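
The "behaves like &" observation can be checked against Python's own bool arithmetic. In plain Python, bools are ints, so True + True is 2 rather than True; casting the result back with bool() recovers logical OR for + and logical AND for *. The sketch below is only a truth-table check in plain Python; the helper names bool_add and bool_mul are hypothetical, and it does not exercise pandas or pandas-on-Spark.

```python
from itertools import product


def bool_add(a: bool, b: bool) -> bool:
    """Integer addition cast back to bool."""
    return bool(a + b)


def bool_mul(a: bool, b: bool) -> bool:
    """Integer multiplication cast back to bool."""
    return bool(a * b)


# One row per input pair: (a, b, a + b as bool, a * b as bool).
table = [(a, b, bool_add(a, b), bool_mul(a, b))
         for a, b in product([False, True], repeat=2)]

for a, b, added, multiplied in table:
    # Each row should agree with the logical operators.
    assert added == (a or b)
    assert multiplied == (a and b)
    print(a, b, added, multiplied)
```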

Member Author

Good catch! pd.Series([True, False, True]) * True and pd.Series([True, False, True]) * False are supported.

Member Author

The + and * operators between bool Series/Index are supported now.

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139507 has finished for PR 32785 at commit afe4044.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44032/

@xinrong-meng xinrong-meng force-pushed the datatypeops_arith_bool branch from afe4044 to c578614 Compare June 9, 2021 01:49
@xinrong-meng xinrong-meng changed the title from "[SPARK-35601][PYTHON] Support arithmetic operations against bool literals" to "[SPARK-35601][PYTHON] Support arithmetic operators against bool literals, Series, and Index" Jun 9, 2021
@xinrong-meng xinrong-meng changed the title from "[SPARK-35601][PYTHON] Support arithmetic operators against bool literals, Series, and Index" to "[SPARK-35601][PYTHON] Complete arithmetic operators involving bool literals, Series, and Index" Jun 9, 2021
@SparkQA

SparkQA commented Jun 9, 2021

Test build #139521 has finished for PR 32785 at commit 5c0f559.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44046/

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44046/

@xinrong-meng xinrong-meng force-pushed the datatypeops_arith_bool branch from 5c0f559 to 1f9f0e4 Compare June 9, 2021 16:46
@SparkQA

SparkQA commented Jun 9, 2021

Test build #139586 has finished for PR 32785 at commit 1f9f0e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44113/

Member

@ueshin ueshin left a comment

LGTM, pending tests.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44113/

@SparkQA

SparkQA commented Jun 9, 2021

Test build #139593 has finished for PR 32785 at commit 0489747.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44120/

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44120/

@ueshin
Member

ueshin commented Jun 9, 2021

Thanks! Merging to master.

@ueshin ueshin closed this in 3c66c11 Jun 9, 2021