-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16498. Fix NPE for checkBlockReportLease #4057
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hi @jojochuang @tasanuma @ayushtkn @Hexiaoqiao @ferhui , could you please review this PR. Thanks. |
LOG.debug("Datanode {} is attempting to report but not register yet.", | ||
LOG.warn("Datanode {} is attempting to report but not register yet.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this some normal stuff, do we need to change this to warn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ayushtkn for your review. This log output is generated when the NN is restarted, and it does seem normal. I'll change it back later.
🎊 +1 overall
This message was automatically generated. |
Hi @ayushtkn @Hexiaoqiao @ferhui , please take a look at this. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a quick look, prod change makes sense to me, the datanodeManager.getDatanode(nodeID)
methods shows it can register null, if the node isn't found.
The test is little complex and isn't showing some actual scenario. If this issue is caused by some delay or race condition, will mocking something to create some delays and so help?
Try to get a test which shows the actual scenario, I found it really hard to follow this test.
If nothing helps, the worst would be add some comments explaining things in details in the test
...p-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportLease.java
Outdated
Show resolved
Hide resolved
...p-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportLease.java
Outdated
Show resolved
Hide resolved
final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); | ||
NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + e.getLocalizedMessage()); | ||
throw e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share me the log message and the exception trace post this. We have passed null here in the exception. I feel like it can lead to something like:
Node null is expected to serve this storage
Which doesn't make sense to me. May be some more appropriate message should be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share me the log message and the exception trace post this. We have passed null here in the exception. I feel like it can lead to something like:
Node null is expected to serve this storage
Which doesn't make sense to me. May be some more appropriate message should be there.
@ayushtkn Yes, you are right. The log will be: Node null is expected to serve this storage
. I passed node
in last commit, but there is a spotbug: link. So I passed null
. This is just for adaptation UnregisteredNodeException constructor.
Do you have any suggestions for this. Thank you very much.
Thank you @ayushtkn very much for your review and detailed suggestions. I will update the code. |
💔 -1 overall
This message was automatically generated. |
Hi @ayushtkn , I fixed the problem you mentioned, please have a look. Thanks. |
Hi @Hexiaoqiao @tasanuma @ferhui , could you also please review this? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2751,6 +2751,13 @@ public boolean checkBlockReportLease(BlockReportContext context, | |||
return true; | |||
} | |||
DatanodeDescriptor node = datanodeManager.getDatanode(nodeID); | |||
if (node == null) { | |||
final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); | |||
NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused if we need log here which upper method also do that when meet UnregisteredNodeException. Although it is DEBU level now, we could improve that to WARN, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused if we need log here which upper method also do that when meet UnregisteredNodeException. Although it is DEBU level now, we could improve that to WARN, right?
Thanks @Hexiaoqiao for your review. I will wait for @ayushtkn reply and decide whether to update again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomscut you can remove this exception from here, considering it is being logged in the upper method.
Although it is DEBU level now, we could improve that to WARN, right?
I think debug is also fine, this doesn't denote any problematic stuff, this can happen in normal scenario as well and the datanode will register subsequently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomscut you can remove this exception from here, considering it is being logged in the upper method.
Although it is DEBU level now, we could improve that to WARN, right?
I think debug is also fine, this doesn't denote any problematic stuff, this can happen in normal scenario as well and the datanode will register subsequently
Hi @ayushtkn, do you mean I just remove this log?
NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID +
" is attempting to report storage ID " + nodeID.getDatanodeUuid() +
". But this node is not registered.");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeps,
But if @Hexiaoqiao wants to change the log in the upper method to log. I am good with that as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeps, But if @Hexiaoqiao wants to change the log in the upper method to log. I am good with that as well
Thank you very much for your comments. I will update the code.
💔 -1 overall
This message was automatically generated. |
@tomscut Please check if the failed unit test is related with this changes. |
Hi @Hexiaoqiao , the failed unit test is unrelated to the change, and has been run locally multiple times with good results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Let's wait if @ayushtkn has any furthermore comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…omscut. Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
Thanks @Hexiaoqiao and @ayushtkn . |
…d by tomscut. Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
… Contributed by tomscut. Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org> (cherry picked from commit 6eea28c) Change-Id: I757513e09f242ff2efbb2ef81e294a3000fd4b39
JIRA: HDFS-16498.
During the restart of Namenode, a Datanode is not registered, but this Datanode triggers FBR, which causes NPE.