New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-23340 hmaster /hbase/replication/rs session expired (hbase repl… #2689
Conversation
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
d901eb9
to
bb09955
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one comment, will get back with more details.
Please remove binary file changes as part of hbase-server/src/test/resources/...
from this PR.
Thanks
@@ -60,10 +62,29 @@ public void preClean() { | |||
wals = queueStorage.getAllWALs(); | |||
} catch (ReplicationException e) { | |||
LOG.warn("Failed to read zookeeper, skipping checking deletable files"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's log this warn only if root cause of ReplicationException
is not covered by below mentioned if
condition.
Something like this:
catch (ReplicationException e) {
Throwable cause = e.getCause();
if (cause instanceof KeeperException.ConnectionLossException
|| cause instanceof KeeperException.SessionExpiredException
|| cause instanceof KeeperException.UnknownSessionException) {
try {
reconnectAfterKeeperException();
wals = queueStorage.getAllWALs();
return;
} catch (IOException | ReplicationException ex) {
LOG.warn("Failed to reconnect zookeeper or load all wal in replication queues.", ex);
}
} else {
LOG.warn("Failed to read zookeeper, skipping checking deletable files", e);
}
wals = null;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@virajjasani update...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cuibo01 I think you missed putting warning in else
block. Please check the sample provided above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no..i think it does not need else...and we can see the real exception
CleanerChore when runs each time, will try reconnect() if ZK session has expired right? What is the sleepPeriod configured in your environment where you tested this reconnect ZKWatcher changes? |
bb09955
to
32048a4
Compare
…ication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldWALs too large (more than 2TB)
32048a4
to
91d54c0
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in ReplicationQueueStorage we just use the same ZKWatcher with HMaster? So if it hits session expired, the HMaster will abort itself? Mind explaining a bit here why HMaster is still alive?
Thanks.
if this, we need to give HMaster/ZK to every Cleaner |
OK, so in ReplicationLogCleaner, we create our own ZKWatcher and ReplicationQueueStorage. Let's see if we could make it use the ones from HMaster? It is a bit confusing a add a updateZooKeeper method... |
I think we could make use of this method.
When creating HFileCleaner, we will set HMaster into the map and the actual cleaners could get HMaster from the map. I think we could do the same for LogCleaners, to get ZKWatcher from HMaster, then there will be no problem if we hit session expired as the HMaster will abort itself so we do not need to recover. Thanks. |
…ication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldWALs too large (more than 2TB)