[SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType. #41162
@cloud-fan @wzhfy, please help review this PR, thanks.
Gentle ping @cloud-fan
I also think that the different results between 0 in ('00') and 0 = '00' are confusing, and it seems Hive has already fixed this problem.
|}
""".stripMargin
val codeElseIf =
if (!java.lang.Boolean.parseBoolean(x.isNull.toString)) {
better to add a comment here
I think this is indeed an issue, but it seems a bit weird to special-case the 1-element-in-list case. Thoughts? @gengliangwang @srielau
BTW can we also check the behavior in other databases like MySQL, Postgres, Oracle, etc.?
quickly check behavior in mysql, and
cc @cloud-fan @srielau, could you please share any further thoughts on this PR?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
See SPARK-43491.
The query results of in ('00') and = '00' are inconsistent. This change ensures that when the elements of an In expression all have the same DataType, In behaves the same as a BinaryComparison expression such as EqualTo, provided the switch is on (spark.sql.legacy.inExpressionCompatibleWithEqualTo.enabled=true).
Before change (see Filter node in Analyzed Logical Plan)
After change (see Filter node in Analyzed Logical Plan)
Why are the changes needed?
The query results of Spark SQL and Hive SQL are inconsistent for the same SQL. Spark SQL 3.1.1 evaluates 0 in ('00') as false, which differs from the = keyword, whereas Hive evaluates it as true in 3.1.0 and false in 2.3.3. Hive fixed the in keyword in 3.1.0, but Spark SQL has not.
For example, these two queries should return the same result; however, the results differ:
hive 2.3.3
hive 3.1.0
Does this PR introduce any user-facing change?
We add a switch that makes the In expression compatible with the EqualTo expression. It defaults to false, so the default behavior of Spark SQL does not change.
How was this patch tested?
By setting spark.sql.legacy.inExpressionCompatibleWithEqualTo.enabled to true and to false, and checking whether the analyzed logical plan casts the expressions as expected. With true, it generates the same Cast logical plan as EqualTo; with false, it keeps the old Cast logical plan.
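For readers who want the difference spelled out, here is a toy model in plain Scala with no Spark dependency. It rests on an assumption about the two analyzed plans that is not quoted in this description: that the legacy In path compares both sides as strings, while the EqualTo path casts the string literal to the integer type. The object, the function names and the compatibleWithEqualTo parameter are all hypothetical stand-ins for illustration; none of this is the code in the patch.

```scala
import scala.util.Try

// Toy model only: plain Scala, no Spark dependency. All names are hypothetical.
object InVsEqualToSketch {

  // Assumed legacy In coercion: the integer is rendered as a string and
  // compared with the list elements as strings.
  def inAsStrings(value: Int, list: Seq[String]): Boolean =
    list.contains(value.toString)

  // Assumed EqualTo coercion: the string side is cast to Int. A string that
  // does not parse yields None (standing in for SQL NULL) and is treated as
  // "no match" in this toy model.
  def equalAsInts(value: Int, literal: String): Boolean =
    Try(literal.trim.toInt).toOption.contains(value)

  // Hypothetical stand-in for the proposed switch.
  def evalIn(value: Int, list: Seq[String], compatibleWithEqualTo: Boolean): Boolean =
    if (compatibleWithEqualTo) list.exists(equalAsInts(value, _))
    else inAsStrings(value, list)

  def main(args: Array[String]): Unit = {
    println(evalIn(0, Seq("00"), compatibleWithEqualTo = false)) // false
    println(evalIn(0, Seq("00"), compatibleWithEqualTo = true))  // true
  }
}
```

Under those assumptions, the switch-off case reproduces the false result reported for 0 in ('00'), and the switch-on case matches what 0 = '00' returns.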