[SPARK-8747][SQL] fix EqualNullSafe for binary type #7143

Closed
wants to merge 2 commits into from


cloud-fan
Contributor

Also improves the tests for binary comparison.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 1, 2015

Test build #36227 has started for PR 7143 at commit d19e9c0.

@SparkQA

SparkQA commented Jul 1, 2015

Test build #36227 has finished for PR 7143 at commit d19e9c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@cloud-fan
Contributor Author

There is another place that needs to handle binary type; I will fix it in another PR.
But I'm wondering whether we should track down every place that needs column equality, or create a wrapper class for binary type?
cc @marmbrus @davies
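
For context, the underlying pitfall can be sketched outside Spark: on the JVM, `==` on two `Array[Byte]` values compares references, not contents, so every equality site has to special-case byte arrays. The helper below is a hypothetical illustration (the name `nullSafeEquals` and the object `BinaryEquality` are assumptions, not Spark code) of null-safe equality that compares byte arrays by content.

```scala
import java.util.Arrays

object BinaryEquality {
  // Hypothetical helper, not Spark's implementation.
  // Null-safe equality in the spirit of <=>: two nulls are equal,
  // a null and a non-null are not, and byte arrays are compared by content.
  def nullSafeEquals(l: Any, r: Any): Boolean = (l, r) match {
    case (null, null)                     => true
    case (null, _) | (_, null)            => false
    case (a: Array[Byte], b: Array[Byte]) => Arrays.equals(a, b)
    case _                                => l == r
  }
}
```

With two distinct arrays holding the same bytes, plain `==` reports them as different, while the helper reports them as equal.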

@davies
Contributor

davies commented Jul 1, 2015

The hard part is that BinaryType can also be used inside ArrayType and MapType, so we need to fix those as well.

As @marmbrus suggested, it's better to create an internal wrapper for BinaryType and let it handle hashCode and equality checks. We could call it Binary, though it's a large change.
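
A minimal sketch of what such a wrapper might look like, assuming it simply delegates to `java.util.Arrays`. The class name `Binary` follows the suggestion above; the field name `bytes` and everything else here are illustrative assumptions, not the design Spark adopted.

```scala
import java.util.Arrays

// Hypothetical wrapper around a byte array; not Spark's actual class.
final class Binary(val bytes: Array[Byte]) {
  // Compare by content rather than by reference.
  override def equals(other: Any): Boolean = other match {
    case that: Binary => Arrays.equals(bytes, that.bytes)
    case _            => false
  }

  // Keep hashCode consistent with equals so the wrapper works as a map key.
  override def hashCode(): Int = Arrays.hashCode(bytes)
}
```

Because equality and hashing live in the wrapper, collections that contain it (sequences, sets, map keys) would compare correctly without each call site special-casing byte arrays, which is the concern raised for ArrayType and MapType.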

binaryComparisonTest(">", GreaterThan, Seq(false, false, true))
binaryComparisonTest(">=", GreaterThanOrEqual, Seq(false, true, true))
binaryComparisonTest("===", EqualTo, Seq(false, true, false))
binaryComparisonTest("<=>", EqualNullSafe, Seq(false, true, false))
Contributor


This test is not very easy to read (the original was also pretty confusing IMHO). When writing tests, it is great if I can tell that they are correct by looking at as few lines of code as possible. This means two things:

  1. (which you are already fixing) It's better to avoid indirection unless we are actually testing that part of the code. For example, don't create a row and a bound reference and then an expression that uses the bound reference. Instead, just create an expression that compares literals.
  2. Avoid having the expression and the answer far away from each other (even if it means slightly more typing):

This is very clearly correct, and I don't have to look all over the file to validate it:
checkEvaluation(Literal(1) > Literal(2), false)

In contrast, in order to understand if Seq(false, true, false) is correct I have to trace up to the function and manually line up and understand all of the code in lines 139-146.

@marmbrus
Contributor

marmbrus commented Jul 1, 2015

Good catch on this bug. I do agree that we probably need to create an internal Binary type at some point.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36364 has started for PR 7143 at commit 28a5b76.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36364 has finished for PR 7143 at commit 28a5b76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpectsInputTypes
    • trait AutoCastInputTypes
    • abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
    • abstract class BinaryOperator extends BinaryExpression
    • abstract class BinaryArithmetic extends BinaryOperator
    • case class UnHex(child: Expression) extends UnaryExpression with Serializable
    • abstract class BinaryComparison extends BinaryOperator with Predicate

@AmplabJenkins

Merged build finished. Test PASSed.

@davies
Contributor

davies commented Jul 2, 2015

LGTM

@asfgit asfgit closed this in afa021e Jul 2, 2015
checkEvaluation(smallValues(i) <=> largeValues(i), false)
checkEvaluation(equalValues1(i) <=> equalValues2(i), true)
checkEvaluation(largeValues(i) <=> smallValues(i), false)
}
Contributor


This is much clearer :)

5 participants