HIVE-28000: Fix scenarios where 'not in' gives incorrect results due to type coercion #5007
I'm not familiar with this part of the code. @kasakrisz could you please check? I feel this type of change could cause more issues.
Meanwhile, could you please add the JIRA and the description to the PR?
Hello @aturoczy and @kasakrisz,
Quality Gate passed. The SonarCloud Quality Gate passed, but some issues were introduced. 2 New issues
The change looks good to me.
LGTM
In certain scenarios, a "not in" clause gives incorrect results when type coercion cannot take place.
This happens when the "in" clause contains at least one operand that cannot be type-coerced to the column the clause is applied to.
JIRA - HIVE-28000
A similar fix was made in HIVE-24817, but it does not cover the cases described above.
The proposed solution in this PR is to completely ignore the operands inside the "in" clause that cannot be type-coerced. We then check whether none of the operands in the "in" clause can be type-coerced, and if so the "in" clause evaluates to false.
For example:
select * from my_tbl where id not in ('ABC', 'DEF', '100');
All operands that cannot be type-coerced are ignored, so at a high level the query is translated to:
select * from my_tbl where id not in ('100');
As a second example:
select * from my_tbl where id not in ('ABC', 'DEF');
Here none of the operands can be type-coerced, so all of them are ignored, and at a high level the query is framed as:
select * from my_tbl where id not <expression:false>
which further implies:
select * from my_tbl where <expression:true>
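The two rewrites above can be modeled with a short sketch. This is not Hive's implementation: it assumes `id` is an integer column, uses Python's `int()` as a stand-in for Hive's coercion rules, ignores NULL semantics, and all function names (`coerce_to_int`, `rewrite_in_operands`, `not_in`) are hypothetical.

```python
from typing import List, Optional


def coerce_to_int(literal: str) -> Optional[int]:
    """Stand-in coercion rule (assumption): a literal coerces only if int() accepts it."""
    try:
        return int(literal)
    except ValueError:
        return None


def rewrite_in_operands(literals: List[str]) -> List[int]:
    """Keep only the operands that can be coerced to the column type."""
    coerced = (coerce_to_int(lit) for lit in literals)
    return [value for value in coerced if value is not None]


def not_in(column_value: int, literals: List[str]) -> bool:
    """Evaluate `column_value not in (literals)` after dropping non-coercible operands."""
    operands = rewrite_in_operands(literals)
    if not operands:
        # No operand is coercible: the "in" part is false, so "not in" is true.
        return True
    return column_value not in operands


# Hypothetical table contents for the `id` column.
ids = [1, 100, 200]
first_query = [i for i in ids if not_in(i, ["ABC", "DEF", "100"])]
second_query = [i for i in ids if not_in(i, ["ABC", "DEF"])]
```

With these sample ids, the first query excludes only the row matching `'100'`, while the second keeps every row because the filter reduces to a constant true.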
In the case of boolean operands, only boolean values are supported inside the "in" clause; otherwise the query fails. As a result, this change does not impact boolean operations.
I have added more q tests and the existing tests are passing in CI/CD.