Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28000: Fix scenarios where 'not in' gives incorrect results due to type coercion #5007

Merged
merged 6 commits into from Jan 31, 2024

Conversation

AnmolSun
Copy link
Contributor

@AnmolSun AnmolSun commented Jan 15, 2024

There are certain scenarios where "not in" clause gives incorrect results when type coercion cannot take place.

These occur when the in clause contains at least one operand which cannot be type-coerced to the column on which the in clause is being applied to.

JIIRA - HIVE-28000

A similar fix was done in HIVE-24817 but that does not work for the cases mentioned above.

The proposed solution in the PR is to ignore those operands completely inside the "in" clause which cannot be type coerced. We then check if it is indeed the case where none of the operands in the "in" clause can be type coerced and we return false if that is the case.

For example:

CREATE TABLE my_tbl (id int);
insert into my_tbl values (100),(200),(300);

select * from my_tbl where id not in ('ABC', 'DEF', '100');
we will ignore all those operands which cannot be type coerced, so on a high level, it will be translated to :
select * from my_tbl where id not in ('100');

select * from my_tbl where id not in ('ABC', 'DEF');
we will ignore all those operands which cannot be type coerced, since none of the operands can be type coerced, on a high level, it is framed as :
select * from my_tbl where id not <expression:false>
which would further imply that
select * from my_tbl where <expression:true>

In case of boolean operands, we only support those operands inside the "in" clause, i.e. only boolean values, otherwise the query fails. As a result, this change does not impact boolean operations.

I have added more q tests and the existing tests are passing in CI/CD.

Copy link

@aturoczy aturoczy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this code place. @kasakrisz could you pls check? I feel this type of change could cause more issue.

Meanwhile could you please add the jira and the description to the PR

@AnmolSun
Copy link
Contributor Author

AnmolSun commented Jan 17, 2024

Hello @aturoczy and @kasakrisz ,
This patch is still WIP. There are some tests that are failing for now - I haven't got the chance to fix them yet. I'll add more details on the patch once we get a green run in tests.
I'm sorry I missed converting this to a draft earlier.

@AnmolSun AnmolSun marked this pull request as draft January 17, 2024 01:31
@AnmolSun AnmolSun changed the title HIVE-28000: Fix scenarios where 'not in' gives incorrect results due. [WIP] HIVE-28000: Fix scenarios where 'not in' gives incorrect results due. Jan 23, 2024
@AnmolSun AnmolSun marked this pull request as ready for review January 23, 2024 19:31
@AnmolSun AnmolSun marked this pull request as draft January 23, 2024 19:32
Copy link

sonarcloud bot commented Jan 25, 2024

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@AnmolSun AnmolSun changed the title [WIP] HIVE-28000: Fix scenarios where 'not in' gives incorrect results due. HIVE-28000: Fix scenarios where 'not in' gives incorrect results due to type coercion Jan 25, 2024
@AnmolSun AnmolSun marked this pull request as ready for review January 25, 2024 06:30
@AnmolSun
Copy link
Contributor Author

Hello @aturoczy and @kasakrisz ,
Could you please help take a look. I have updated the description and we have a green CI/CD run.

@kasakrisz
Copy link
Contributor

The change looks good to me.
@scarlin-cloudera Could you please take a look?

@scarlin-cloudera
Copy link
Contributor

LGTM

@kasakrisz kasakrisz merged commit e08a600 into apache:master Jan 31, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants