fixes #449 fix two bugs with WAL recovery #458
import org.apache.accumulo.tserver.logger.LogFileKey;
import org.apache.accumulo.tserver.logger.LogFileValue;
import org.apache.hadoop.fs.Path;
import org.mortbay.log.Log;
mortbay log is probably not the Log you want
private static class MultiReaderIterator implements Iterator<Entry<LogFileKey,LogFileValue>> {

private MultiReader reader;
The MultiReader should be renamed. Then the MultiReaderIterator as well.
Perhaps this Iterator could be moved into "MultiReader"
lastFinish = newFinish;
}
private int findMaxTabletId(KeyExtent extent, List<Path> recoveryLogs) throws IOException {
int tid = -1;
Should use a less abstract name than "tid". We use this name for different things throughout the code.
@@ -16,132 +16,79 @@
*/
package org.apache.accumulo.tserver.log;

import static com.google.common.base.Preconditions.checkState;
Should just import the class. I found myself looking around for your definition of checkState. Perhaps this isn't as big of a deal in an IDE.
}

private void playbackMutations(MultiReader reader, int tid, LastStartToFinish lastStartToFinish,
public void recover(KeyExtent extent, List<Path> recoveryLogs, Set<String> tabletFiles,
Personal preference to have public methods and entry points at the top of classes. Unlike binary data, humans read from the top down :)
try (RecoveryLogsIterator rli = new RecoveryLogsIterator(fs, recoveryLogs, COMPACTION_START,
tid)) {

DeduplicatingIterator ddi = new DeduplicatingIterator(rli);
Could DeduplicatingIterator be AutoCloseable as well?
If not, looks like you need to close it.
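One way to act on this suggestion is to have the wrapping iterator implement AutoCloseable and forward close() to the iterator it wraps, so a single try-with-resources covers both. The sketch below is only an illustration under assumptions: SourceIterator and DedupIterator are hypothetical stand-ins over String values, not the RecoveryLogsIterator and DeduplicatingIterator classes from this PR.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical illustration; these are not the classes from the PR. */
class CloseableIteratorSketch {

  /** Stand-in for an iterator that holds open resources (e.g. log readers). */
  static class SourceIterator implements Iterator<String>, AutoCloseable {
    private final Iterator<String> delegate;
    boolean closed = false;

    SourceIterator(Iterator<String> delegate) {
      this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
      return delegate.hasNext();
    }

    @Override
    public String next() {
      return delegate.next();
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException("remove");
    }

    @Override
    public void close() {
      closed = true; // a real implementation would close its readers here
    }
  }

  /** Drops consecutive duplicates and closes the wrapped source on close(). */
  static class DedupIterator implements Iterator<String>, AutoCloseable {
    private final SourceIterator source;
    private String nextValue; // null means exhausted; null elements are not supported
    private String last;

    DedupIterator(SourceIterator source) {
      this.source = source;
      advance();
    }

    /** Moves to the next value that differs from the last one returned. */
    private void advance() {
      nextValue = null;
      while (source.hasNext()) {
        String candidate = source.next();
        if (!candidate.equals(last)) {
          nextValue = candidate;
          return;
        }
      }
    }

    @Override
    public boolean hasNext() {
      return nextValue != null;
    }

    @Override
    public String next() {
      if (nextValue == null) {
        throw new NoSuchElementException();
      }
      last = nextValue;
      advance();
      return last;
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException("remove");
    }

    @Override
    public void close() {
      source.close(); // closing the wrapper releases the wrapped source
    }
  }
}
```

With this shape the caller can write `try (DedupIterator ddi = new DedupIterator(source)) { ... }` and does not need to track the inner iterator separately. If the inner iterator is already managed by its own try-with-resources, as in the snippet above, the extra close is redundant, so close() on the source should be safe to call twice.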
return recoverySeq;
}

private void playbackMutations(List<Path> recoveryLogs, MutationReceiver mr, int tid,
Nice improvement... this method now becomes a lot more clear.
private List<MultiReader> readers;
private UnmodifiableIterator<Entry<LogFileKey,LogFileValue>> iter;

private static class MultiReaderIterator implements Iterator<Entry<LogFileKey,LogFileValue>> {
Default method 'remove' should be overridden
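For context on this comment: since Java 8, Iterator.remove() is a default method that throws UnsupportedOperationException, while on Java 7 every Iterator implementation must define it. Overriding it explicitly keeps the class compiling on both and states the read-only intent. A minimal sketch, assuming a hypothetical ReadOnlyIterator wrapper rather than any class from this PR:

```java
import java.util.Iterator;

/** Hypothetical illustration; not a class from the PR. */
class ReadOnlyIteratorSketch {

  /** Wraps another iterator and explicitly rejects remove(). */
  static class ReadOnlyIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;

    ReadOnlyIterator(Iterator<T> delegate) {
      this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
      return delegate.hasNext();
    }

    @Override
    public T next() {
      return delegate.next();
    }

    @Override
    public void remove() {
      // Explicit override: required on Java 7, and documents intent on Java 8+.
      throw new UnsupportedOperationException("remove");
    }
  }
}
```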
}
}

static class SortCheckIterator implements Iterator<Entry<LogFileKey,LogFileValue>> {
Default method 'remove' should be overridden
import static org.apache.accumulo.tserver.logger.LogEvents.OPEN;

import java.io.IOException;
import java.io.UncheckedIOException;
This is a Java 8 exception
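If the code has to build on Java 7, one possible workaround is a small unchecked wrapper that carries the IOException out of Iterator.next() and is unwrapped by the caller. This is only a sketch of that idea; WrappedIOException, readEntry, nextUnchecked, and recover are hypothetical names, not part of Accumulo or this PR.

```java
import java.io.IOException;

/** Hypothetical illustration; none of these names come from the PR. */
class UncheckedWrapSketch {

  /** Minimal Java 7-compatible stand-in for java.io.UncheckedIOException. */
  static class WrappedIOException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    WrappedIOException(IOException cause) {
      super(cause);
    }

    @Override
    public synchronized IOException getCause() {
      return (IOException) super.getCause();
    }
  }

  /** Simulates a read that may fail with a checked exception. */
  static String readEntry(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("simulated read failure");
    }
    return "entry";
  }

  /** Shape of an Iterator.next() body, which cannot declare IOException. */
  static String nextUnchecked(boolean fail) {
    try {
      return readEntry(fail);
    } catch (IOException e) {
      throw new WrappedIOException(e);
    }
  }

  /** Caller side: unwrap so the method keeps its 'throws IOException' contract. */
  static String recover(boolean fail) throws IOException {
    try {
      return nextUnchecked(fail);
    } catch (WrappedIOException e) {
      throw e.getCause();
    }
  }
}
```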
int findLastStartToFinish(MultiReader reader, int fileno, KeyExtent extent,
Set<String> tabletFiles, LastStartToFinish lastStartToFinish)
throws IOException, EmptyMapFileException, UnusedException {
static class DeduplicatingIterator implements Iterator<Entry<LogFileKey,LogFileValue>> {
Default method 'remove' should be overridden
@@ -106,12 +106,12 @@ private int findMaxTabletId(KeyExtent extent, List<Path> recoveryLogs) throws IO
checkState(key.event == DEFINE_TABLET); // should only fail if bug elsewhere

if (key.tablet.equals(extent) || key.tablet.equals(alternative)) {
checkState(key.tid >= 0, "Tid %s for %s is negative", key.tid, extent);
checkState(tabletId == -1 || key.tid >= tabletId); // should only fail if bug in
checkState(key.tabletId >= 0, "Tid %s for %s is negative", key.tabletId, extent);
Could update the log messages to refer to "tabletId" instead of "Tid", too.
It looks like a test is failing...
Using these changes, a ~24hr test of CI w/ agitation completed with no data loss. There were 8 nodes running tservers, and tservers were killed 279 times during the test.
* Fix bug where tablet is unloaded, reloaded on tserver, and then tserver dies
* Fix bug with out of order logs. Recovery code assumed logs were passed in time order. However, since 1.8.0 they have been passed in random order. Rewrote recovery code to handle out of order logs. The fix was to read all logs in a sorted merged way.
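
The idea of reading all logs in a sorted, merged way can be sketched as a k-way merge over a priority queue. This is a generic illustration, not the PR's implementation: it assumes each log is already sorted on its own, uses plain Comparable values in place of LogFileKey/LogFileValue entries, and SortedMergeSketch, Head, merge, and drain are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical illustration; not the class from this PR. */
class SortedMergeSketch {

  /** Pairs the current head of one source with the rest of that source. */
  private static final class Head<T extends Comparable<T>> implements Comparable<Head<T>> {
    final T value;
    final Iterator<T> rest;

    Head(T value, Iterator<T> rest) {
      this.value = value;
      this.rest = rest;
    }

    @Override
    public int compareTo(Head<T> other) {
      return value.compareTo(other.value);
    }
  }

  /** Merges individually sorted iterators; the order of 'sources' does not matter. */
  static <T extends Comparable<T>> Iterator<T> merge(List<Iterator<T>> sources) {
    final PriorityQueue<Head<T>> heap = new PriorityQueue<Head<T>>();
    for (Iterator<T> source : sources) {
      if (source.hasNext()) {
        heap.add(new Head<T>(source.next(), source));
      }
    }
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return !heap.isEmpty();
      }

      @Override
      public T next() {
        Head<T> head = heap.poll(); // smallest current head across all sources
        if (head == null) {
          throw new NoSuchElementException();
        }
        if (head.rest.hasNext()) {
          heap.add(new Head<T>(head.rest.next(), head.rest));
        }
        return head.value;
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException("remove");
      }
    };
  }

  /** Collects an iterator into a list, for inspection. */
  static <T> List<T> drain(Iterator<T> it) {
    List<T> out = new ArrayList<T>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

Because the merge compares the heads of all sources on every step, the output order is independent of the order in which the logs are handed in, which is the property the original time-ordered assumption lacked.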