Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only requeue compaction when there was activity #759

Merged
merged 1 commit into from Nov 9, 2018

Conversation

@keith-turner
Copy link
Contributor

commented Nov 8, 2018

While working on an experimental compaction strategy for Fluo for apache/fluo#1054 I ran into a situation where Accumulo was calling my compaction strategy hundreds of thousands of times per second. I tracked down the problem and fixed it in this PR.

@@ -41,12 +41,18 @@ public void run() {
return;
}

tablet.majorCompact(reason, queued);
CompactionStats stats = tablet.majorCompact(reason, queued);

This comment has been minimized.

Copy link
@milleruntime

milleruntime Nov 8, 2018

Contributor

I would have thought Spotbugs would have dinged this method returning stats that aren't being used...

// make blocking calls to gather information. Without the following check these strategies would
// endlessly requeue. So only check if a subsequent compaction is needed if the previous
// compaction actually did something.
if (stats != null && stats.getEntriesRead() > 0) {

This comment has been minimized.

Copy link
@milleruntime

milleruntime Nov 8, 2018

Contributor

Won't this prevent compactions from happening when stats.getEntriesRead() == 0?

This comment has been minimized.

Copy link
@keith-turner

keith-turner Nov 8, 2018

Author Contributor

No, tablets are checked periodically to see if they need a compaction by the tablet server. This code is just an optimization to re-queue the tablet for compaction if there is still work to do (without waiting for the periodic check). If we run a compaction and it does nothing, its likely there is no follow on compaction work to be done.

For background, tablets are limited in the number of files they will compact at once. So if a tablet had 100 files to compact and the limit is 20, then when it compacts 20 files it would be nice if immediately re-queued instead of waiting for the tablet server to check all tablets again. With these changes, it will still re-queued immediately in that case.

This comment has been minimized.

Copy link
@milleruntime

milleruntime Nov 8, 2018

Contributor

OK thanks for making that clear.

@keith-turner keith-turner merged commit b6409ef into apache:1.9 Nov 9, 2018

2 checks passed

Jenkins This pull request looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@keith-turner keith-turner deleted the keith-turner:majc-requeue-check branch Nov 9, 2018

@ctubbsii ctubbsii added v1.9.3 labels Nov 9, 2018

@ctubbsii ctubbsii added this to Done in 1.9.3 Jun 14, 2019

@ctubbsii ctubbsii added this to Done in 2.0.0 Jun 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
4 participants
You can’t perform that action at this time.