-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35601][PYTHON] Complete arithmetic operators involving bool literals, Series, and Index #32785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #139348 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #139353 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
f90000b to
ee39236
Compare
|
Test build #139428 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
d7da448 to
2ac8593
Compare
|
Test build #139438 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
Might not be related to the PR. Tests are retriggered. |
|
Test build #139441 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why do we need allow_bool_index_ops? Is there a case where only bool index should be allowed alone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. There is NOT a case where only bool index should be allowed alone, but there is a case allow_bool_index_ops is False, whereas allow_bool is True, for example
>>> ps.Series([True, False, True]) + ps.Series([True, False, True])
Traceback (most recent call last):
...
TypeError: Addition can not be applied to booleans and the given type.
>>> ps.Series([True, False, True]) + True
0 True
1 True
2 True
dtype: boolIn case you are interested, there is a case allow_bool_index_ops and allow_bool are both True, for example
>>> ps.Series([1, 2, 3]) + ps.Series([True, False, True])
0 2
1 2
2 4
dtype: int64
>>> ps.Series([1, 2, 3]) + True
0 2
1 3
2 4
dtype: int64And there is a case allow_bool_index_ops and allow_bool are both False, for example
>> ps.Series([True, False, True]) - ps.Series([True, False, True])
Traceback (most recent call last):
...
TypeError: Subtraction can not be applied to booleans and the given type.
>>> ps.Series([True, False, True]) - True
Traceback (most recent call last):
...
TypeError: Subtraction can not be applied to booleans and the given type.That's why allow_bool_index_ops might be needed. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first example, pandas works for both cases:
>>> pd.__version__
'1.2.4'
>>> pd.Series([True, False, True]) + pd.Series([True, False, True])
0 True
1 False
2 True
dtype: bool
>>> pd.Series([True, False, True]) + True
0 True
1 True
2 True
dtype: boolso allow_bool_index_ops and allow_bool should be True for this case as well. Then we can consolidate the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! https://issues.apache.org/jira/browse/SPARK-35681 is created for supporting arithmetic operators (+, *) among bool Series/Index, the parameters will be consolidated at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I investigated Nan None True evaluation before at https://github.com/apache/spark/pull/24234/files#diff-2086df5f0cf7291ee71bbde83662c0e34600c53c04ee0e377ceadf61e6c496deR365-R382. might be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arithmetic operators (+, *) among bool Series/Index has been implemented in this PR; tickets have been adjusted.
allow_bool_index_ops parameter discussed above has been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @HyukjinKwon, that's helpful!
|
Kubernetes integration test status failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should allow bools and behave like &?
>>> pd.Series([True, False, True]) * pd.Series([True, False, False])
0 True
1 False
2 False
dtype: bool
>>> pd.Series([True, False, True]) * True
0 True
1 False
2 True
dtype: bool
>>> pd.Series([True, False, True]) * False
0 False
1 False
2 False
dtype: boolThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! pd.Series([True, False, True]) * True and pd.Series([True, False, True]) * False are supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+/* between bool Series/Index are supported now.
|
Test build #139507 has finished for PR 32785 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
afe4044 to
c578614
Compare
|
Test build #139521 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
Bool index_ops and bool spark_type is optional for transform_boolean_operand_to_numeric Black lint
5c0f559 to
1f9f0e4
Compare
|
Test build #139586 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
ueshin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending tests.
|
Kubernetes integration test status failure |
|
Test build #139593 has finished for PR 32785 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Thanks! merging to master. |
What changes were proposed in this pull request?
Completing arithmetic operators involving bool literals, Series, and Index consists of two main tasks:
Why are the changes needed?
Arithmetic operators involving bool literals, Series, and Index are incomplete now.
We ought to match pandas' behaviors.
Does this PR introduce any user-facing change?
Yes.
Newly supported operations example:
Before the change, operations above are not supported, raising a TypeError such as
How was this patch tested?
Unit tests.
Keyword: SPARK-35337