HIVE-27480: OFFSET without ORDER BY generates wrong results #4511

Merged
merged 4 commits into from
Sep 8, 2023

okumin (Contributor) commented Jul 21, 2023

What changes were proposed in this pull request?

Add an additional shuffle with a single reducer when OFFSET is used without ORDER BY.
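
The effect of the extra single-reducer shuffle can be illustrated with a toy model. This is only a sketch under assumptions: it supposes that, without the extra shuffle, each parallel task would apply OFFSET to its own slice of rows. The function names and the sample partitions are hypothetical and are not taken from Hive's code.

```python
def limit_offset(rows, limit, offset):
    """Skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]


def per_task_offset(partitions, limit, offset):
    """Assumed faulty plan: every task applies OFFSET/LIMIT to its own slice."""
    out = []
    for part in partitions:
        out.extend(limit_offset(part, limit, offset))
    return out


def single_reducer_offset(partitions, limit, offset):
    """Plan with an extra shuffle: all rows reach one reducer,
    which applies OFFSET/LIMIT exactly once."""
    merged = [row for part in partitions for row in part]
    return limit_offset(merged, limit, offset)


# Hypothetical input: nine rows spread over three tasks.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
total = sum(len(p) for p in partitions)
offset = 2
limit = total  # large enough to keep every remaining row

faulty = per_task_offset(partitions, limit, offset)
fixed = single_reducer_offset(partitions, limit, offset)

print(len(faulty))  # 3: each task skipped two rows, so six rows were lost
print(len(fixed))   # 7: total - offset, as the query semantics require
```

In this model the single reducer is what guarantees that exactly `offset` rows are skipped in total. Which rows are skipped remains unspecified without ORDER BY.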

Why are the changes needed?

To prevent data integrity issues: without this change, a query that uses OFFSET without ORDER BY can return wrong results. See
https://issues.apache.org/jira/browse/HIVE-27480

Does this PR introduce any user-facing change?

Execution plans can change, but that should be acceptable because the original plans risked producing wrong results.
This PR also adds a new configuration parameter, but its default value does not change the behavior.

Is the change a dependency upgrade?

No.

How was this patch tested?

I added new integration tests (itests) and updated existing ones.

@okumin changed the title from "[WIP] HIVE-27480: Add test cases" to "[WIP] HIVE-27480: WIP" on Jul 21, 2023

sonarcloud bot commented Jul 22, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 8 (rating A)

No coverage information
No duplication information

@okumin changed the title from "[WIP] HIVE-27480: WIP" to "HIVE-27480: OFFSET without ORDER BY generates wrong results" on Jul 24, 2023
@@ -1099,7 +1116,7 @@ limit 1 offset 1
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
86 val_86 86 val_86
238 val_238 238 val_238
Contributor Author

This change in the expected output is OK because the test query has no ORDER BY, so it may return any single row:

select *
from src src1 left outer join src src2
on src1.key = src2.key
limit 1 offset 1
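
The reasoning can be checked with a small simulation. It is a sketch only: the three-row table is a hypothetical stand-in for `src` (the third row is invented for illustration), and shuffling stands in for whatever arrival order an execution plan happens to produce.

```python
import random


def left_outer_self_join(rows):
    """Left outer join of `rows` with itself on the key column."""
    out = []
    for k1, v1 in rows:
        matches = [(k2, v2) for k2, v2 in rows if k2 == k1]
        if matches:
            out.extend((k1, v1, k2, v2) for k2, v2 in matches)
        else:
            out.append((k1, v1, None, None))
    return out


# Hypothetical stand-in for the `src` table.
src = [(86, "val_86"), (238, "val_238"), (311, "val_311")]
joined = left_outer_self_join(src)

# Without ORDER BY the row order is unspecified; model that by shuffling.
results = set()
for seed in range(20):
    order = joined[:]
    random.Random(seed).shuffle(order)
    picked = order[1:1 + 1]  # LIMIT 1 OFFSET 1
    results.add(picked[0])

# Every picked row is a valid answer as long as it is a row of the join.
assert results <= set(joined)
print(sorted(results))
```

A test for such a query can therefore only require that the returned row belongs to the join result, not that it equals one particular row.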

okumin (Contributor, Author) commented Jul 24, 2023

@kasakrisz Could you please take a look when you have a chance? This PR is related to #4471.

sonarcloud bot commented Sep 8, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 8 (rating A)

No coverage information
No duplication information

Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

@kasakrisz merged commit d969572 into apache:master on Sep 8, 2023
4 checks passed
@okumin deleted the HIVE-27480-offset-no-order-by-2 branch on September 8, 2023 13:41

okumin (Contributor, Author) commented Sep 8, 2023

Thanks for your review!
