-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-25065 WAL archival to be done by a separate thread #2501
Conversation
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not batching to the NN? It is just doing archiving on new background thread.
If we fail to complete an archiving because RS dies, will new RS pick up the archiving?
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
Show resolved
Hide resolved
if (retry > archiveRetries) { | ||
LOG.error("Failed log archiving for the log {},", log.getFirst(), e); | ||
if (this.server != null) { | ||
this.server.abort("Failed log archiving", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This the only reason for passing server? There is an Abortable Interface. Pass in an Abortable Interface instead (the server is an implementation of an Abortable so you could pass server).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this every where.
@@ -86,6 +87,7 @@ | |||
public static final String WAL_ENABLED = "hbase.regionserver.hlog.enabled"; | |||
|
|||
final String factoryId; | |||
final Server server; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, too much if it is only being used to abort. Just pass an Abortable.
} | ||
|
||
@Override | ||
protected void archiveRetriable(Pair<Path, Long> localLogsToArchive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Retriable/Retryable/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits but LGTM otherwise
@@ -482,6 +505,8 @@ protected SyncFuture initialValue() { | |||
this.walTooOldNs = TimeUnit.SECONDS.toNanos(conf.getInt( | |||
SURVIVED_TOO_LONG_SEC_KEY, SURVIVED_TOO_LONG_SEC_DEFAULT)); | |||
this.useHsync = conf.getBoolean(HRegion.WAL_HSYNC_CONF_KEY, HRegion.DEFAULT_WAL_HSYNC); | |||
archiveRetries = this.conf.getInt("hbase.regionserver.walroll.archive.retries", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default is no retries?
Will the retry ever work? Will it be regular occurence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally if the WAL FS is having an issue then this archive may also fail. By default we will try it (mandatory) once (so retries are 0) - then if this new config is set to a non-zero value we will try to repeat it for the configured times. Generally we are not going to configure it to a non-zero value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
} | ||
} | ||
|
||
protected void archiveRetryable(final Pair<Path, Long> log) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just name it archive ? That it retries is an internal affair.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Looks to be fine. Will commit this unless objections. |
🎊 +1 overall
This message was automatically generated. |
a0b0b2c
to
2a12ea8
Compare
💔 -1 overall
This message was automatically generated. |
1 similar comment
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting; | ||
import org.apache.hbase.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why all the extra whitespace?
@@ -482,6 +505,8 @@ protected SyncFuture initialValue() { | |||
this.walTooOldNs = TimeUnit.SECONDS.toNanos(conf.getInt( | |||
SURVIVED_TOO_LONG_SEC_KEY, SURVIVED_TOO_LONG_SEC_DEFAULT)); | |||
this.useHsync = conf.getBoolean(HRegion.WAL_HSYNC_CONF_KEY, HRegion.DEFAULT_WAL_HSYNC); | |||
archiveRetries = this.conf.getInt("hbase.regionserver.walroll.archive.retries", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing a new configuration is a hint that this is too "big/complex" to add on a patch release.
|
||
protected void archive(final Pair<Path, Long> log) { | ||
int retry = 1; | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No backoff of any kind in the retry mechanism?
if (AbstractFSWALProvider.getNumRolledLogFiles(log) > 0) { | ||
assertTrue(AbstractFSWALProvider.getLogFileSize(log) > 0); | ||
} else { | ||
for (int i = 0; i < 10; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a flaky test. Please use the waitFor
pattern already provided on the testing utility class.
(Throwable) Mockito.anyObject()); | ||
break; | ||
} catch (WantedButNotInvoked t) { | ||
Threads.sleep(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a flaky test. Please use the waitFor
pattern already provided on the testing utility class.
…separate thread