[SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper for eqNullSafe #17605
Conversation
Test build #75700 has finished for PR 17605 at commit
LGTM, thanks for adding this.

@holdenk Do you think it could be merged?
Test build #76159 has finished for PR 17605 at commit

Test build #76161 has finished for PR 17605 at commit

Test build #76164 has finished for PR 17605 at commit
LGTM too.
@@ -171,6 +171,40 @@ def __init__(self, jc):
    __ge__ = _bin_op("geq")
    __gt__ = _bin_op("gt")

    _eqNullSafe_doc = """
    Equality test that is safe for null values.
We might need to document that, unlike in Pandas, NaN is not treated as NULL.
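To make the distinction concrete, here is a plain-Python sketch (the Spark-side behavior is described in this thread and in the SQL guide's NaN-semantics section, not executed here):

```python
nan = float("nan")

# In Python (and in Pandas/NumPy), NaN never compares equal to itself:
assert nan != nan

# NaN is an ordinary float value, distinct from a missing value (None/NULL):
assert nan is not None

# In Spark SQL, by contrast, NaN is NOT NULL, so
#   lit(nan) <=> lit(None)  is false, while
#   lit(nan) <=> lit(nan)   is true (Spark treats NaN as equal to NaN).
```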
Do you think a note is enough, or should we add an example?
Yeah, an example is needed.
@gatorsmile Done.
Force-pushed from e5e4081 to 043880b
Test build #76308 has finished for PR 17605 at commit

Test build #76313 has finished for PR 17605 at commit
+----------------+---------------+----------------+
|(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
+----------------+---------------+----------------+
|           false|           true|           false|
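The row above (for value = NaN) follows from Spark's null-safe equality rules. A hypothetical pure-Python helper sketching the <=> semantics — eq_null_safe here is our own illustrative name, not a PySpark API:

```python
import math

def eq_null_safe(a, b):
    """Sketch of Spark SQL's <=> operator semantics:
    - NULL <=> NULL is true
    - NULL <=> x (x non-null) is false
    - NaN <=> NaN is true (per Spark's NaN semantics, unlike IEEE floats)
    """
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, float) and isinstance(b, float) \
            and math.isnan(a) and math.isnan(b):
        return True
    return a == b

nan = float("nan")
print(eq_null_safe(nan, None))   # False, matching (value <=> NULL)
print(eq_null_safe(nan, nan))    # True,  matching (value <=> NaN)
print(eq_null_safe(nan, 42.0))   # False, matching (value <=> 42.0)
```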
In Pandas/NumPy, NaNs don't compare equal, i.e., np.nan != np.nan, but in Spark we treat them as equal. Shall we document it too?
I think this is already covered by SQL guide (https://spark.apache.org/docs/latest/sql-programming-guide.html#nan-semantics). Maybe a link would be better?
Sounds good to me.
Force-pushed from 965396e to 673bf70
Test build #76337 has finished for PR 17605 at commit

Test build #76338 has finished for PR 17605 at commit

Test build #76339 has finished for PR 17605 at commit
LGTM

LGTM
Thanks! Merging to master.

Thanks.
What changes were proposed in this pull request?

Adds Python bindings for Column.eqNullSafe.

How was this patch tested?

Manual tests, existing unit tests, doc build.