HIVE-27637: Compare highest write ID of compaction records when tryin… #4740
+1 I think it is fine. The test is also kind of OK. Is there anything that remains?
It seems I have one test to fix.
@SourabhBadhya, can I ask you to review it?
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/handler/TestAbortedTxnCleaner.java (outdated, resolved)
...server/src/main/java/org/apache/hadoop/hive/metastore/txn/impl/ReadyToCleanAbortHandler.java (outdated, resolved)
ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java (resolved)
LGTM +1
@InvisibleProgrammer, @SourabhBadhya, are there any measurements of the query performance change after adding the "inner select"? What is the effect of the optimization? Are we sure it won't make this query run forever (massive TXN_COMPONENTS, COMPACTION_QUEUE with duplicates)?
Hi @deniskuzZ, thank you for the review. Can I ask you to roll back the change? I'm strongly against keeping a difference between upstream and downstream. Unfortunately, I don't have permission to do the rollback. Also, I'm pretty sure I'm not the first person asked to do performance testing on metastore database queries. Could you please share our performance-testing documentation/rules with me? Thank you.
The commit was reverted via #5058.
Thank you, @deniskuzZ, checking.
…g to perform abort cleanup (apache#4740) (Zsolt Miskolczi reviewed by Attila Turoczy, Sourabh Badhya)
Compare the highest write ID of compaction records when trying to get the potential tables/partitions for abort cleanup.
Idea: if a record in COMPACTION_QUEUE for a table/partition has a highest write ID greater than max(aborted write ID) for that table/partition, we can potentially skip abort cleanup for it. Compaction will clean up the obsolete deltas and the aborted deltas, so a separate abort cleanup is redundant here.
This is mainly an optimisation: it can save some filesystem operations (mostly file listing during construction of the Acid state).
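A minimal Java sketch of the filtering rule from the Idea paragraph, assuming the maximum aborted write ID and the highest COMPACTION_QUEUE write ID have already been fetched per table/partition. `AbortCleanupFilter`, `TablePartition`, `Candidate` and `needsAbortCleanup` are hypothetical names, not Hive classes; per the review discussion, the PR itself applies the comparison inside a metastore query rather than in Java code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Hive code) of skipping abort cleanup that compaction makes redundant. */
public class AbortCleanupFilter {

    /** Identifies a table or partition; partition is null for an unpartitioned table. */
    record TablePartition(String db, String table, String partition) {}

    /** A table/partition with aborted writes and the largest aborted write ID seen for it. */
    record Candidate(TablePartition target, long maxAbortedWriteId) {}

    /**
     * Keeps only the candidates that still need abort cleanup.
     *
     * @param candidates                table/partitions that have aborted write IDs
     * @param compactionHighestWriteIds highest write ID of a COMPACTION_QUEUE record per
     *                                  table/partition (no entry if there is no such record)
     */
    static List<Candidate> needsAbortCleanup(List<Candidate> candidates,
                                             Map<TablePartition, Long> compactionHighestWriteIds) {
        List<Candidate> result = new ArrayList<>();
        for (Candidate c : candidates) {
            Long highest = compactionHighestWriteIds.get(c.target());
            // Compaction past every aborted write ID also removes the aborted deltas,
            // so abort cleanup would be redundant for this table/partition.
            boolean coveredByCompaction = highest != null && highest > c.maxAbortedWriteId();
            if (!coveredByCompaction) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TablePartition p1 = new TablePartition("default", "t", "p=1");
        TablePartition p2 = new TablePartition("default", "t", "p=2");
        List<Candidate> candidates = List.of(new Candidate(p1, 5L), new Candidate(p2, 9L));
        // p1: compaction highest write ID 7 > 5, so it is skipped.
        // p2: compaction highest write ID 9 is not greater than 9, so it is kept.
        Map<TablePartition, Long> compactions = Map.of(p1, 7L, p2, 9L);
        System.out.println(needsAbortCleanup(candidates, compactions));
    }
}
```

The comparison is strict ("greater than"), matching the description: a compaction record whose highest write ID only equals the largest aborted write ID does not cause the table/partition to be skipped in this sketch.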
What changes were proposed in this pull request?
Skip abort cleanup for a table/partition when COMPACTION_QUEUE already holds a record for it whose highest write ID is greater than its highest aborted write ID.
Why are the changes needed?
To avoid redundant abort cleanup: compaction will remove the aborted deltas anyway.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
New test added.