[SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper for eqNullSafe #17605
Conversation
Test build #75700 has finished for PR 17605 at commit
LGTM, thanks for adding this.

@holdenk Do you think it could be merged?
Test build #76159 has finished for PR 17605 at commit

Test build #76161 has finished for PR 17605 at commit

Test build #76164 has finished for PR 17605 at commit
LGTM too.
@@ -171,6 +171,40 @@ def __init__(self, jc):
    __ge__ = _bin_op("geq")
    __gt__ = _bin_op("gt")

    _eqNullSafe_doc = """
    Equality test that is safe for null values.
We might need to document that, unlike in Pandas, NaN is not treated as NULL.
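To make the distinction concrete, here is a plain-Python sketch (the Spark-side behavior is described in this thread and in the SQL guide's NaN-semantics section, not executed here):

```python
nan = float("nan")

# In Python (and in Pandas/NumPy), NaN never compares equal to itself:
assert nan != nan

# NaN is an ordinary float value, distinct from a missing value (None/NULL):
assert nan is not None

# In Spark SQL, by contrast, NaN is NOT NULL, so
#   lit(nan) <=> lit(None)  is false, while
#   lit(nan) <=> lit(nan)   is true (Spark treats NaN as equal to NaN).
```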
Do you think a note is enough, or should we add an example?
Yeah, an example is needed.
@gatorsmile Done.
Force-pushed from e5e4081 to 043880b
Test build #76308 has finished for PR 17605 at commit

Test build #76313 has finished for PR 17605 at commit
+----------------+---------------+----------------+
|(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
+----------------+---------------+----------------+
|           false|           true|           false|
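The row above (for value = NaN) follows from Spark's null-safe equality rules. A hypothetical pure-Python helper sketching the <=> semantics — eq_null_safe here is our own illustrative name, not a PySpark API:

```python
import math

def eq_null_safe(a, b):
    """Sketch of Spark SQL's <=> operator semantics:
    - NULL <=> NULL is true
    - NULL <=> x (x non-null) is false
    - NaN <=> NaN is true (per Spark's NaN semantics, unlike IEEE floats)
    """
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, float) and isinstance(b, float) \
            and math.isnan(a) and math.isnan(b):
        return True
    return a == b

nan = float("nan")
print(eq_null_safe(nan, None))   # False, matching (value <=> NULL)
print(eq_null_safe(nan, nan))    # True,  matching (value <=> NaN)
print(eq_null_safe(nan, 42.0))   # False, matching (value <=> 42.0)
```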
In Pandas/NumPy, NaNs don't compare equal, i.e., np.nan != np.nan, but in Spark we treat them as equal. Shall we document it too?
I think this is already covered by SQL guide (https://spark.apache.org/docs/latest/sql-programming-guide.html#nan-semantics). Maybe a link would be better?
Sounds good to me.
Force-pushed from 965396e to 673bf70
Test build #76337 has finished for PR 17605 at commit

Test build #76338 has finished for PR 17605 at commit

Test build #76339 has finished for PR 17605 at commit
LGTM

LGTM
Thanks! Merging to master.

Thanks.
What changes were proposed in this pull request?

Adds Python bindings for Column.eqNullSafe.

How was this patch tested?

Manual tests, existing unit tests, doc build.