Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27637: Compare highest write ID of compaction records when tryin… #4740

Conversation

InvisibleProgrammer
Copy link
Contributor

Compare highest write ID of compaction records when trying to get the potential table/partitions for abort cleanup.

Idea: If there exists a highest write ID of a record in COMPACTION_QUEUE for a table/partition which is greater than the max(aborted write ID) for that table/partition, then we can potentially ignore abort cleanup for such tables/partitions. This is because compaction will perform cleanup of obsolete deltas and aborted deltas hence doing abort cleanup is redundant here.

This is more of an optimisation since it can potentially save some filesystem operations (mainly file-listing during construction of Acid state).

What changes were proposed in this pull request?

Skip abort cleanup for tables/partitions if there is a newer write id on them.

Why are the changes needed?

Reduce redundancy

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

New test added.

Copy link

@aturoczy aturoczy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think it is fine. Also the test is also kinda ok. Do you know need anything that remain?

@InvisibleProgrammer
Copy link
Contributor Author

+1 I think it is fine. Also the test is also kinda ok. Do you know need anything that remain?

It seems I have one test to fix.

@InvisibleProgrammer
Copy link
Contributor Author

@SourabhBadhya , can I ask you to review it?

Copy link

sonarcloud bot commented Jan 31, 2024

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link
Contributor

@SourabhBadhya SourabhBadhya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@SourabhBadhya SourabhBadhya marked this pull request as ready for review January 31, 2024 12:13
@SourabhBadhya SourabhBadhya merged commit f343969 into apache:master Feb 2, 2024
6 checks passed
@deniskuzZ
Copy link
Member

deniskuzZ commented Feb 2, 2024

@InvisibleProgrammer , @SourabhBadhya, any measurements on a query performance change after the "inner select"? what is the effect of the optimization? Are we sure it won't cause this query to run forever (massive TXN_COMPONENTS, COMPACTION_QUEUE with diplicates)
That is the critical area and we haven't done any perf tests? I'll certainly exclude it from 4.0 release and don't recommend backporting it
see example #4744

@InvisibleProgrammer
Copy link
Contributor Author

Hi @deniskuzZ , thank you for the review.

Can I ask you to roll back the change? I'm highly against keeping difference between upstream and downstream. Unfortunately, I have no permission to do the rollback.

Also, I'm pretty sure I'm not the first person asked to do performance testing on metastore database queries. Could you please share our performance testing related documentation/rules with me?

Thank you,
Zsolt

@SourabhBadhya
Copy link
Contributor

The commit is reverted via #5058 .
cc @InvisibleProgrammer @deniskuzZ

@InvisibleProgrammer
Copy link
Contributor Author

Thank you, @deniskuzZ , checking HMSBenchmarks and related stuff.

tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Feb 9, 2024
…g to perform abort cleanup (apache#4740) (Zsolt Miskolczi reviewed by Attila Turoczy, Sourabh Badhya)
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Mar 7, 2024
…g to perform abort cleanup (apache#4740) (Zsolt Miskolczi reviewed by Attila Turoczy, Sourabh Badhya)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants