New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-24616 Remove BoundedRecoveredHFilesOutputSink dependency on a T… #1955
Conversation
🎊 +1 overall
This message was automatically generated. |
@@ -2037,8 +2037,7 @@ private void startServices() throws IOException { | |||
this.splitLogWorker = new SplitLogWorker(sinkConf, this, | |||
this, walFactory); | |||
splitLogWorker.start(); | |||
} else { | |||
LOG.warn("SplitLogWorker Service NOT started; CoordinatedStateManager is null"); | |||
LOG.debug("SplitLogWorker started"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the log out of the if {} code block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous, we used to log when we did NOT start the splitlogworker service. This goes against our usual pattern of only logging started services (logging all the things we did not start would be an endless list -- smile)
I changed the log here so that it conforms with our usual pattern.
@@ -48,7 +48,7 @@ public void process() { | |||
try { | |||
callable.call(); | |||
} catch (Throwable t) { | |||
LOG.error("Error when call RSProcedureCallable: ", t); | |||
LOG.error("pid=" + this.procId, t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log is too little?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it so that it logs the pid same way we do it every where else. The thread/class/level already provides what we add here. I just removed redundancy.
} | ||
LOG.info("Failed getting {} table descriptor from master; trying local", tableName); | ||
try { | ||
return walSplitter.tableDescriptors.get(tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tableDescriptors may be removed from WalSplitter, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.
Looking at it, it usually gets them from the hosting Server, not by RPC to Master so there it should not suffer the issue we see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I get what you are saying now. You added this. Let me undo it.
@@ -191,50 +186,22 @@ public boolean keepRegionEvent(Entry entry) { | |||
return false; | |||
} | |||
|
|||
/** | |||
* @return Returns a base HFile without compressions or encodings; good enough for recovery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to get the TableDescriptor first? If it is not possible, then we fallback to write generic HFiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Jira, I added a comment about making sure we will compact all these tiny HFiles created as part of WAL split. If we can make sure that part, I would say it ok to create these tiny files with out any table specific things like compression/DBE etc. Anyways we know all these files are going to get compacted and rewritten once we open the region.
As of now we are not sure whether or when these tiny files will get compacted. In such case I would +1 ur ask. Do this HFile create with defaults as a fall back only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking the Master for the table descriptor first and if it fails, read the fs is what we had before this patch. Problem was that default configuration had it that the RPC query wasn't timing out before the below happened:
2020-06-18 19:53:54,175 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms
I could dick around w/ timings inside here so RPC timed out sooner. It uses the server's Connection. I suppose I could make a new Connection everytime we want to query a Table schema and configure it to try once only but that is a lot of work to do at recovery time.
On the compactions, don't we usually pick up the small files first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the timeout can be set per rpc request on the same connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bq. Actually the timeout can be set per rpc request on the same connection.
You are right. Could try this later.
Do we think this a good way forward? |
…ableDescriptor Purge query Master for table descriptors; make do w/ generic options. Logging cleanup. hbase-server/src/main/java/org/apache/hadoop/hbase/wal/BoundedRecoveredHFilesOutputSink.java Undo fetching Table Descriptor. Not reliably available at recovery time.
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
The fallback logic on TableDescriptor can be a follow on issue.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Failed in here: ... which is flakey. Let me rerun just in case. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
…ableDescriptor (#1955) Purge query Master for table descriptors; make do w/ generic options. Logging cleanup. hbase-server/src/main/java/org/apache/hadoop/hbase/wal/BoundedRecoveredHFilesOutputSink.java Undo fetching Table Descriptor. Not reliably available at recovery time. Signed-off-by: Duo Zhang <zhangduo@apache.org>
Merged. The failures vary and are flakies. |
…ableDescriptor (#1955) Purge query Master for table descriptors; make do w/ generic options. Logging cleanup. hbase-server/src/main/java/org/apache/hadoop/hbase/wal/BoundedRecoveredHFilesOutputSink.java Undo fetching Table Descriptor. Not reliably available at recovery time. Signed-off-by: Duo Zhang <zhangduo@apache.org>
…ableDescriptor (apache#1955) Purge query Master for table descriptors; make do w/ generic options. Logging cleanup. hbase-server/src/main/java/org/apache/hadoop/hbase/wal/BoundedRecoveredHFilesOutputSink.java Undo fetching Table Descriptor. Not reliably available at recovery time. Signed-off-by: Duo Zhang <zhangduo@apache.org>
…ableDescriptor
Logging cleanup.
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/BoundedRecoveredHFilesOutputSink.java
Undo fetching Table Descriptor. Not reliably available at recovery time.