[SPARK-8747][SQL] fix EqualNullSafe for binary type #7143

cloud-fan · 2015-07-01T05:05:56Z

Also improves the tests for binary comparison.

AmplabJenkins · 2015-07-01T05:08:11Z

Merged build triggered.

AmplabJenkins · 2015-07-01T05:08:20Z

Merged build started.

SparkQA · 2015-07-01T05:11:36Z

Test build #36227 has started for PR 7143 at commit d19e9c0.

SparkQA · 2015-07-01T06:58:09Z

Test build #36227 has finished for PR 7143 at commit d19e9c0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-07-01T06:58:41Z

Merged build finished. Test PASSed.

cloud-fan · 2015-07-01T07:29:35Z

There is another place that needs to handle binary type; I will fix it in another PR.
But I'm wondering whether we should watch out for every place that needs column equality, or create a wrapper class for binary type?
cc @marmbrus @davies

davies · 2015-07-01T15:13:24Z

The hard part is that BinaryType can also be used inside ArrayType and MapType, so we need to fix those too.

As @marmbrus suggested, it's better to create an internal wrapper for BinaryType and let it handle hashCode and equality checks. We can call it Binary, but it's a large change.
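
To make the wrapper idea concrete, here is a minimal sketch, assuming nothing about how such a type would actually be implemented in Spark. The class name `Binary` comes from the comment above; everything else (the field name, the demo object) is hypothetical. It relies only on the fact that JVM arrays compare by reference, while `java.util.Arrays.equals` and `java.util.Arrays.hashCode` compare by content.

```scala
import java.util.Arrays

// Hypothetical wrapper, for illustration only: gives a byte array
// content-based equality and hashing.
final class Binary(private val bytes: Array[Byte]) {
  override def equals(other: Any): Boolean = other match {
    case that: Binary => Arrays.equals(bytes, that.bytes)
    case _            => false
  }

  // Must agree with equals so the wrapper works as a map key.
  override def hashCode(): Int = Arrays.hashCode(bytes)
}

object BinaryDemo {
  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 2, 3)
    val b = Array[Byte](1, 2, 3)

    println(a == b)                                           // false: reference comparison
    println(new Binary(a) == new Binary(b))                   // true: content comparison
    println(Seq(new Binary(a)) == Seq(new Binary(b)))         // true: works inside collections
    println(Map(new Binary(a) -> 1).contains(new Binary(b)))  // true: works as a map key
  }
}
```

Because equality is delegated to the wrapper, containers holding it compare correctly without special-casing. The sketch does not copy the array, so mutating the underlying bytes after wrapping would break the hashCode contract.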

marmbrus · 2015-07-01T19:04:31Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala

+  binaryComparisonTest(">", GreaterThan, Seq(false, false, true))
+  binaryComparisonTest(">=", GreaterThanOrEqual, Seq(false, true, true))
+  binaryComparisonTest("===", EqualTo, Seq(false, true, false))
+  binaryComparisonTest("<=>", EqualNullSafe, Seq(false, true, false))


This test is not very easy to read (the original was also pretty confusing IMHO). When writing tests, it is great if I can tell that they are correct by looking at as few lines of code as possible. This means two things:

- (Which you are already fixing.) It's better to avoid indirection unless we are actually testing that part of the code. For example, don't create a row, then a bound reference, and then an expression that uses the bound reference. Instead, just create an expression that compares literals.

- Avoid having the expression and the answer far away from each other (even if it means slightly more typing).

This is very clearly correct, and I don't have to look all over the file to validate it:
checkEvaluation(Literal(1) > Literal(2), false)

In contrast, in order to understand whether Seq(false, true, false) is correct, I have to trace up to the function and manually line up and understand all of the code in lines 139-146.
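
As an illustration of that style applied to binary values, here is a self-contained sketch. It does not use Spark's `checkEvaluation` or expression classes; `nullSafeEqual` is a hypothetical stand-in for `<=>` on byte arrays, written under the assumption that null-safe equality treats two nulls as equal and a null versus a non-null value as unequal. Each check keeps the inputs and the expected answer on one line.

```scala
import java.util.Arrays

object NullSafeBinaryCheck {
  // Hypothetical stand-in for <=> on binary values (not Spark code).
  // Compares contents, and never returns null itself.
  def nullSafeEqual(l: Array[Byte], r: Array[Byte]): Boolean =
    if (l == null || r == null) l == null && r == null
    else Arrays.equals(l, r)

  def main(args: Array[String]): Unit = {
    val small = Array[Byte](1, 2)
    val large = Array[Byte](1, 3)

    // Expression and expected answer sit side by side.
    assert(!nullSafeEqual(small, large))
    assert(nullSafeEqual(small, small.clone())) // equal contents, different references
    assert(!nullSafeEqual(large, small))
    assert(nullSafeEqual(null, null))
    assert(!nullSafeEqual(small, null))
    println("all checks passed")
  }
}
```

The `small.clone()` case is the interesting one: a plain `==` on the two arrays would be false because they are different objects.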

marmbrus · 2015-07-01T19:06:07Z

Good catch on this bug. I do agree that we probably need to create an internal Binary type at some point.

AmplabJenkins · 2015-07-02T07:33:15Z

Merged build triggered.

AmplabJenkins · 2015-07-02T07:33:20Z

Merged build started.

SparkQA · 2015-07-02T07:34:39Z

Test build #36364 has started for PR 7143 at commit 28a5b76.

SparkQA · 2015-07-02T09:09:51Z

Test build #36364 has finished for PR 7143 at commit 28a5b76.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait ExpectsInputTypes
- trait AutoCastInputTypes
- abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
- abstract class BinaryOperator extends BinaryExpression
- abstract class BinaryArithmetic extends BinaryOperator
- case class UnHex(child: Expression) extends UnaryExpression with Serializable
- abstract class BinaryComparison extends BinaryOperator with Predicate

AmplabJenkins · 2015-07-02T09:10:27Z

Merged build finished. Test PASSed.

davies · 2015-07-02T17:06:12Z

LGTM

marmbrus · 2015-07-02T17:08:15Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala

+      checkEvaluation(smallValues(i) <=> largeValues(i), false)
+      checkEvaluation(equalValues1(i) <=> equalValues2(i), true)
+      checkEvaluation(largeValues(i) <=> smallValues(i), false)
+    }


This is much clearer :)

cloud-fan added 2 commits July 2, 2015 15:31

fix equalNullSafe

04ef4b0

improve test

28a5b76

cloud-fan force-pushed the binary branch from d19e9c0 to 28a5b76 Compare July 2, 2015 07:31

asfgit closed this in afa021e Jul 2, 2015