Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7952][SPARK-7984][SQL] equality check between boolean type and numeric type is broken. #6505

Closed
wants to merge 7 commits into from

Conversation

cloud-fan
Copy link
Contributor

The origin code has several problems:

  • true <=> 1 will return false as we didn't set a rule to handle it.
  • true = a where a is not Literal and its value is 1, will return false as we only handle literal values.

// Otherwise turn them to Byte types so that there exists and ordering.
case p: BinaryComparison if p.left.dataType == BooleanType &&
p.right.dataType == BooleanType =>
p.makeCopy(Array(Cast(p.left, ByteType), Cast(p.right, ByteType)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean is already comparable, we don't need to cast it to byte.

@SparkQA
Copy link

SparkQA commented May 29, 2015

Test build #33766 has finished for PR 6505 at commit a3f8e9c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 30, 2015

Test build #33794 has finished for PR 6505 at commit 18211e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented May 30, 2015

cc @yhuai

@rxin
Copy link
Contributor

rxin commented May 30, 2015

Note that this no longer merges cleanly.

Seq(true, true, false, false, null, null, null, null, null).map(Row(_)))
checkAnswer(sql("select i <=> b from t"),
Seq(true, true, false, false, false, false, false, false, true).map(Row(_)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case can be simplified a bit, also please drop the temp table at last:

  val data = Seq(
    (1, true),
    (0, false),
    (2, true),
    (2, false),
    (null, true),
    (null, false),
    (0, null),
    (1, null),
    (null, null)
  ).map { case (i, b) =>
    (i.asInstanceOf[Integer], b.asInstanceOf[java.lang.Boolean])
  }

  data.toDF("i", "b").registerTempTable("t")

  try {
    // checkAnswer calls
  } finally {
    sqlCtx.dropTempTable("t")
  }

@SparkQA
Copy link

SparkQA commented May 30, 2015

Test build #33806 has finished for PR 6505 at commit fc0d741.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1331,4 +1331,26 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {

checkAnswer(sql("SELECT a.`c.b`, `b.$q`[0].`a@!.q`, `q.w`.`w.i&`[0] FROM t"), Row(1, 1, 1))
}

test("SPARK-7952: fix the equality check between boolean type and numeric typegd") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"typegd"?

@liancheng
Copy link
Contributor

LGTM except for a few minor issues. Thanks for fixing this!

@cloud-fan
Copy link
Contributor Author

Hi @liancheng , do we need to drop temp tables at last? If so, we need to change a lot of test cases in SQLQuerySuite.

@liancheng
Copy link
Contributor

@cloud-fan Unlike persisted tables, existing temp tables can be overriden without triggering any error. But in general it's a better practice to clean up the scene. That's why we made SQLTestUtils and provide a bunch of withXxx methods to help automate these boilerplate code. For example, you may make SQLQuerySuite extend SQLTestUtils and use withTempTable("t") { ... } to wrap your test code in the newly added test case.

CaseWhen(Seq(
Or(IsNull(booleanExpr), IsNull(numericExpr)), Literal.create(null, BooleanType),
buildCaseKeyWhen(booleanExpr, numericExpr)
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we can replace this CaseKeyWhen with an If?

@SparkQA
Copy link

SparkQA commented May 30, 2015

Test build #33809 has finished for PR 6505 at commit 9ba2130.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 30, 2015

Test build #33810 has finished for PR 6505 at commit 625973c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 30, 2015

Test build #33811 has finished for PR 6505 at commit ebc8c61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("SPARK-7952: fix the equality check between boolean and numeric types") {
withTempTable("t") {
Seq(
(1, true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test case is too confusing as it is.

Maybe you can have tuple4, where the 3rd element is the comparison value for =, and 4th element is the value for <=>. You can also just define the type in the Seq itself, i.e.

Seq[(java.lang.Integer, java.lang.Boolean, java.lang.Boolean)](
  (0, false, true, true)
  ...
)

@yhuai
Copy link
Contributor

yhuai commented May 30, 2015

Can you give some examples to explain what is the problem?

@cloud-fan cloud-fan changed the title [SPARK-7952][SQL] equality check between boolean type and numeric type is broken. [SPARK-7952][SPARK-7984][SQL] equality check between boolean type and numeric type is broken. May 31, 2015
@SparkQA
Copy link

SparkQA commented May 31, 2015

Test build #33857 has finished for PR 6505 at commit b6401ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 31, 2015

Test build #33863 has finished for PR 6505 at commit 77f0f39.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

Is it ready to go? cc @rxin @liancheng @yhuai

@yhuai
Copy link
Contributor

yhuai commented Jun 1, 2015

Can you update the description to explain what is the problem with more information (e.g. a case that we do not handle correctly)?

@cloud-fan
Copy link
Contributor Author

@yhuai , updated.

@liancheng
Copy link
Contributor

LGTM, thanks for working on this!

@rxin
Copy link
Contributor

rxin commented Jun 1, 2015

Merged in master.

@rxin
Copy link
Contributor

rxin commented Jun 1, 2015

@marmbrus if you have time would be great to take a look at this too.

@asfgit asfgit closed this in a0e46a0 Jun 1, 2015
val trueValues = Seq(1, 1L, 1.toByte, 1.toShort, new java.math.BigDecimal(1)).map(Literal(_))
val falseValues = Seq(0, 0L, 0.toByte, 0.toShort, new java.math.BigDecimal(0)).map(Literal(_))
object BooleanEqualization extends Rule[LogicalPlan] {
private val trueValues = Seq(1.toByte, 1.toShort, 1, 1L, new java.math.BigDecimal(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a SQL internal Decimal type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also add a test for this case since it seem broken ATM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This breaks for Decimal. We should probably compare Literal itself, instead of comparing the wrapped value for literals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan can you fix this and add a test case? thanks.

cloud-fan added a commit that referenced this pull request Jun 7, 2015
This PR fixes a bug introduced in #6505.
Decimal literal's value is not `java.math.BigDecimal`, but Spark SQL internal type: `Decimal`.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #6574 from cloud-fan/fix and squashes the following commits:

b0e3549 [Wenchen Fan] rename to BooleanEquality
1987b37 [Wenchen Fan] use Decimal instead of java.math.BigDecimal
f93c420 [Wenchen Fan] compare literal
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
… numeric type is broken.

The origin code has several problems:
* `true <=> 1` will return false as we didn't set a rule to handle it.
* `true = a` where `a` is not `Literal` and its value is 1, will return false as we only handle literal values.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6505 from cloud-fan/tmp1 and squashes the following commits:

77f0f39 [Wenchen Fan] minor fix
b6401ba [Wenchen Fan] add type coercion for CaseKeyWhen and address comments
ebc8c61 [Wenchen Fan] use SQLTestUtils and If
625973c [Wenchen Fan] improve
9ba2130 [Wenchen Fan] address comments
fc0d741 [Wenchen Fan] fix style
2846a04 [Wenchen Fan] fix 7952
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
… numeric type is broken.

The origin code has several problems:
* `true <=> 1` will return false as we didn't set a rule to handle it.
* `true = a` where `a` is not `Literal` and its value is 1, will return false as we only handle literal values.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6505 from cloud-fan/tmp1 and squashes the following commits:

77f0f39 [Wenchen Fan] minor fix
b6401ba [Wenchen Fan] add type coercion for CaseKeyWhen and address comments
ebc8c61 [Wenchen Fan] use SQLTestUtils and If
625973c [Wenchen Fan] improve
9ba2130 [Wenchen Fan] address comments
fc0d741 [Wenchen Fan] fix style
2846a04 [Wenchen Fan] fix 7952
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This PR fixes a bug introduced in apache#6505.
Decimal literal's value is not `java.math.BigDecimal`, but Spark SQL internal type: `Decimal`.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6574 from cloud-fan/fix and squashes the following commits:

b0e3549 [Wenchen Fan] rename to BooleanEquality
1987b37 [Wenchen Fan] use Decimal instead of java.math.BigDecimal
f93c420 [Wenchen Fan] compare literal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants