Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper for eqNullSafe #17605

Closed
wants to merge 5 commits into from

Conversation

zero323
Copy link
Member

@zero323 zero323 commented Apr 11, 2017

What changes were proposed in this pull request?

Adds Python bindings for Column.eqNullSafe

How was this patch tested?

Manual tests, existing unit tests, doc build.

@SparkQA
Copy link

SparkQA commented Apr 11, 2017

Test build #75700 has finished for PR 17605 at commit 4fe8413.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Apr 11, 2017

LGTM thanks for adding this.

@zero323
Copy link
Member Author

zero323 commented Apr 19, 2017

@holdenk Do you think it could be merged?

@SparkQA
Copy link

SparkQA commented Apr 26, 2017

Test build #76159 has finished for PR 17605 at commit 4b8a352.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 26, 2017

Test build #76161 has finished for PR 17605 at commit 3254f8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 26, 2017

Test build #76164 has finished for PR 17605 at commit 9d75094.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

LGTM too.

@@ -171,6 +171,40 @@ def __init__(self, jc):
__ge__ = _bin_op("geq")
__gt__ = _bin_op("gt")

_eqNullSafe_doc = """
Equality test that is safe for null values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to document, unlike Pandas, NaN is not treated as NULL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think a note is enough, or should we add an example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, an example is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Done.

@zero323 zero323 force-pushed the SPARK-20290 branch 2 times, most recently from e5e4081 to 043880b Compare April 29, 2017 21:02
@SparkQA
Copy link

SparkQA commented Apr 29, 2017

Test build #76308 has finished for PR 17605 at commit 043880b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 30, 2017

Test build #76313 has finished for PR 17605 at commit a658969.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

+----------------+---------------+----------------+
|(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
+----------------+---------------+----------------+
| false| true| false|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Pandas/numpy, the nan's don’t compare equal, i.e., np.nan != np.nan, but in Spark we treat them as equal. Shall we document it too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already covered by SQL guide (https://spark.apache.org/docs/latest/sql-programming-guide.html#nan-semantics). Maybe a link would be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@zero323 zero323 force-pushed the SPARK-20290 branch 2 times, most recently from 965396e to 673bf70 Compare May 1, 2017 04:17
@SparkQA
Copy link

SparkQA commented May 1, 2017

Test build #76337 has finished for PR 17605 at commit 965396e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 1, 2017

Test build #76338 has finished for PR 17605 at commit 673bf70.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 1, 2017

Test build #76339 has finished for PR 17605 at commit 99f836e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM

1 similar comment
@viirya
Copy link
Member

viirya commented May 1, 2017

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in f0169a1 May 1, 2017
@zero323
Copy link
Member Author

zero323 commented May 1, 2017

Thanks.

@zero323 zero323 deleted the SPARK-20290 branch May 8, 2017 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants