@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented May 2, 2017

What changes were proposed in this pull request?

This PR proposes to add both isNotDistinctFrom and isDistinctFrom to both Scala and Python column APIs.

The IS [NOT] DISTINCT FROM syntax is now supported as of #17764.

Adding a Python API was initially suggested in that PR, but it was narrowed to a SQL syntax change only. Per #17764 (comment), I assume we want this.
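
For readers less familiar with the semantics, here is a minimal pure-Python sketch of how IS [NOT] DISTINCT FROM differs from plain `=`. It models SQL NULL as `None`. The function names are illustrative only and are not Spark APIs, and the sketch deliberately ignores Spark's NaN handling.

```python
from typing import Any, Optional


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """Plain SQL `=`: a NULL (None here) on either side yields NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Any, b: Any) -> bool:
    """Null-safe equality: two NULLs compare equal, and the result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """Negation of the null-safe equality above."""
    return not is_not_distinct_from(a, b)


# Compare the three predicates on a few (left, right) pairs.
for left, right in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(left, right,
          sql_equals(left, right),
          is_not_distinct_from(left, right),
          is_distinct_from(left, right))
```

The only rows where the predicates disagree are those with a NULL on at least one side, which is the case the proposed column methods are meant to cover.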

How was this patch tested?

Doctests for Python and unit tests in ColumnExpressionSuite.

https://spark.apache.org/docs/latest/sql-programming-guide.html#nan-semantics
.. versionadded:: 2.3.0
"""
_isNotDistinctFrom_doc = _eqNullSafe_doc.replace("eqNullSafe", "isNotDistinctFrom")
Member Author



This is the same as the eqNullSafe doc; only the word eqNullSafe was replaced with isNotDistinctFrom.

.. _NaN Semantics:
https://spark.apache.org/docs/latest/sql-programming-guide.html#nan-semantics
.. versionadded:: 2.3.0
"""
Member Author



testData2.collect().toSeq.filter(r => r.getInt(0) == 1))

checkAnswer(
testData2.filter($"a" === $"b"),
Member Author


The test below:

    checkAnswer(
      testData2.filter($"a" === 1),
      testData2.collect().toSeq.filter(r => r.getInt(0) == 1))

    checkAnswer(
      testData2.filter($"a" === $"b"),
      testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))

looked identical to me to the test for === above and does not test <=>, so I removed it as a duplicate.
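
To illustrate why a test on null-free data cannot tell === and <=> apart, here is a small pure-Python sketch. It assumes SQL NULL is modelled as `None` and that a filter keeps only rows whose predicate evaluates to true. The helper names and sample rows are hypothetical and are not taken from the test suite.

```python
def eq(a, b):
    """Models `===`: NULL on either side yields NULL (None)."""
    return None if a is None or b is None else a == b


def eq_null_safe(a, b):
    """Models `<=>`: NULLs compare equal and the result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def where(rows, pred):
    """Keeps only rows whose predicate is exactly True, dropping False and NULL."""
    return [row for row in rows if pred(*row) is True]


no_nulls = [(1, 1), (1, 2), (2, 2)]
with_nulls = no_nulls + [(None, None), (1, None)]

# On null-free rows both operators select the same rows...
same_without_nulls = where(no_nulls, eq) == where(no_nulls, eq_null_safe)
# ...and only rows containing NULLs expose the difference.
differ_with_nulls = where(with_nulls, eq) != where(with_nulls, eq_null_safe)

print(same_without_nulls, differ_with_nulls)
```

Under these assumptions, a test for <=> only adds coverage beyond the === test if its input contains at least one null.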

testData2.filter($"a" === $"b"),
testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))
}

Member Author


The =!= test actually appeared to be testing <=>. I switched it to <=> and added a separate test for =!= below.

@HyukjinKwon
Member Author

cc @gatorsmile and @ptkool, could you please take a look and see whether this makes sense?

@SparkQA

SparkQA commented May 2, 2017

Test build #76370 has finished for PR 17827 at commit 008cec4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 2, 2017

Test build #76374 has started for PR 17827 at commit 6d658d4.

@HyukjinKwon
Member Author

retest this please


/**
* Equality test that is safe for null values.
*
Member


Like eqNullSafe, these are normally used from the Java API.

@gatorsmile
Member

IS [NOT] DISTINCT FROM is part of ANSI SQL, and thus we decided to support it. I am not sure whether we need to add them to the Java and Python column APIs, given that we already have eqNullSafe.

@HyukjinKwon
Member Author

Yea, that is what I initially thought. I am closing this.

@HyukjinKwon HyukjinKwon closed this May 2, 2017
@SparkQA

SparkQA commented May 2, 2017

Test build #76376 has finished for PR 17827 at commit 6d658d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon deleted the SPARK-20552 branch January 2, 2018 03:43