New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7952][SPARK-7984][SQL] equality check between boolean type and numeric type is broken. #6505
Conversation
// Otherwise turn them to Byte types so that there exists and ordering. | ||
case p: BinaryComparison if p.left.dataType == BooleanType && | ||
p.right.dataType == BooleanType => | ||
p.makeCopy(Array(Cast(p.left, ByteType), Cast(p.right, ByteType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean is already comparable, we don't need to cast it to byte.
Test build #33766 has finished for PR 6505 at commit
|
Test build #33794 has finished for PR 6505 at commit
|
cc @yhuai |
Note that this no longer merges cleanly. |
Seq(true, true, false, false, null, null, null, null, null).map(Row(_))) | ||
checkAnswer(sql("select i <=> b from t"), | ||
Seq(true, true, false, false, false, false, false, false, true).map(Row(_))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case can be simplified a bit, also please drop the temp table at last:
val data = Seq(
(1, true),
(0, false),
(2, true),
(2, false),
(null, true),
(null, false),
(0, null),
(1, null),
(null, null)
).map { case (i, b) =>
(i.asInstanceOf[Integer], b.asInstanceOf[java.lang.Boolean])
}
data.toDF("i", "b").registerTempTable("t")
try {
// checkAnswer calls
} finally {
sqlCtx.dropTempTable("t")
}
Test build #33806 has finished for PR 6505 at commit
|
@@ -1331,4 +1331,26 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll { | |||
|
|||
checkAnswer(sql("SELECT a.`c.b`, `b.$q`[0].`a@!.q`, `q.w`.`w.i&`[0] FROM t"), Row(1, 1, 1)) | |||
} | |||
|
|||
test("SPARK-7952: fix the equality check between boolean type and numeric typegd") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"typegd"?
LGTM except for a few minor issues. Thanks for fixing this! |
Hi @liancheng , do we need to drop temp tables at last? If so, we need to change a lot of test cases in |
@cloud-fan Unlike persisted tables, existing temp tables can be overriden without triggering any error. But in general it's a better practice to clean up the scene. That's why we made |
CaseWhen(Seq( | ||
Or(IsNull(booleanExpr), IsNull(numericExpr)), Literal.create(null, BooleanType), | ||
buildCaseKeyWhen(booleanExpr, numericExpr) | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that we can replace this CaseKeyWhen
with an If
?
Test build #33809 has finished for PR 6505 at commit
|
Test build #33810 has finished for PR 6505 at commit
|
Test build #33811 has finished for PR 6505 at commit
|
test("SPARK-7952: fix the equality check between boolean and numeric types") { | ||
withTempTable("t") { | ||
Seq( | ||
(1, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test case is too confusing as it is.
Maybe you can have tuple4, where the 3rd element is the comparison value for =, and 4th element is the value for <=>. You can also just define the type in the Seq itself, i.e.
Seq[(java.lang.Integer, java.lang.Boolean, java.lang.Boolean)](
(0, false, true, true)
...
)
Can you give some examples to explain what is the problem? |
Test build #33857 has finished for PR 6505 at commit
|
Test build #33863 has finished for PR 6505 at commit
|
Is it ready to go? cc @rxin @liancheng @yhuai |
Can you update the description to explain what is the problem with more information (e.g. a case that we do not handle correctly)? |
@yhuai , updated. |
LGTM, thanks for working on this! |
Merged in master. |
@marmbrus if you have time would be great to take a look at this too. |
val trueValues = Seq(1, 1L, 1.toByte, 1.toShort, new java.math.BigDecimal(1)).map(Literal(_)) | ||
val falseValues = Seq(0, 0L, 0.toByte, 0.toShort, new java.math.BigDecimal(0)).map(Literal(_)) | ||
object BooleanEqualization extends Rule[LogicalPlan] { | ||
private val trueValues = Seq(1.toByte, 1.toShort, 1, 1L, new java.math.BigDecimal(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a SQL internal Decimal type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also add a test for this case since it seem broken ATM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This breaks for Decimal. We should probably compare Literal itself, instead of comparing the wrapped value for literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan can you fix this and add a test case? thanks.
This PR fixes a bug introduced in #6505. Decimal literal's value is not `java.math.BigDecimal`, but Spark SQL internal type: `Decimal`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #6574 from cloud-fan/fix and squashes the following commits: b0e3549 [Wenchen Fan] rename to BooleanEquality 1987b37 [Wenchen Fan] use Decimal instead of java.math.BigDecimal f93c420 [Wenchen Fan] compare literal
… numeric type is broken. The origin code has several problems: * `true <=> 1` will return false as we didn't set a rule to handle it. * `true = a` where `a` is not `Literal` and its value is 1, will return false as we only handle literal values. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6505 from cloud-fan/tmp1 and squashes the following commits: 77f0f39 [Wenchen Fan] minor fix b6401ba [Wenchen Fan] add type coercion for CaseKeyWhen and address comments ebc8c61 [Wenchen Fan] use SQLTestUtils and If 625973c [Wenchen Fan] improve 9ba2130 [Wenchen Fan] address comments fc0d741 [Wenchen Fan] fix style 2846a04 [Wenchen Fan] fix 7952
… numeric type is broken. The origin code has several problems: * `true <=> 1` will return false as we didn't set a rule to handle it. * `true = a` where `a` is not `Literal` and its value is 1, will return false as we only handle literal values. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6505 from cloud-fan/tmp1 and squashes the following commits: 77f0f39 [Wenchen Fan] minor fix b6401ba [Wenchen Fan] add type coercion for CaseKeyWhen and address comments ebc8c61 [Wenchen Fan] use SQLTestUtils and If 625973c [Wenchen Fan] improve 9ba2130 [Wenchen Fan] address comments fc0d741 [Wenchen Fan] fix style 2846a04 [Wenchen Fan] fix 7952
This PR fixes a bug introduced in apache#6505. Decimal literal's value is not `java.math.BigDecimal`, but Spark SQL internal type: `Decimal`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#6574 from cloud-fan/fix and squashes the following commits: b0e3549 [Wenchen Fan] rename to BooleanEquality 1987b37 [Wenchen Fan] use Decimal instead of java.math.BigDecimal f93c420 [Wenchen Fan] compare literal
The origin code has several problems:
true <=> 1
will return false as we didn't set a rule to handle it.true = a
wherea
is notLiteral
and its value is 1, will return false as we only handle literal values.