-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDFS-16870 HDFS client ip should also be logged when NameNode is processing reportBadBlocks #5237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| for (int j = 0; j < nodes.length; j++) { | ||
| NameNode.stateChangeLog.info("*DIR* reportBadBlocks for block: {} on" | ||
| + " datanode: {}", blk, nodes[j].getXferAddr()); | ||
| + " datanode: {}" + " client: {}", blk, nodes[j].getXferAddr(), Server.getRemoteIp()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my personal point of view, reportBadBlocks should be reported by DN. What does this have to do with the client? Server.getRemoteIp() is somewhat expensive, is there a good enough reason for us to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Server.getRemoteIp(), do a getClientMachine() instead, it handles calls via RBF.
Fetching of this IP Address should be outside lock, doing this inside lock will hit performance.
You are logging this inside the loop, so better extract a variable outside.
reportBadBlocks is there in both ClientProtocol as well as in DatanodeProtocol. Can check somewhat here for pointers
Lines 1533 to 1534 in ca3526d
| dfsClient.reportChecksumFailure(src, | |
| reportList.toArray(new LocatedBlock[reportList.size()])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of
Server.getRemoteIp(), do agetClientMachine()instead, it handles calls via RBF. Fetching of this IP Address should be outside lock, doing this inside lock will hit performance. You are logging this inside the loop, so better extract a variable outside.reportBadBlocks is there in both ClientProtocol as well as in DatanodeProtocol. Can check somewhat here for pointers
Lines 1533 to 1534 in ca3526d
dfsClient.reportChecksumFailure(src, reportList.toArray(new LocatedBlock[reportList.size()]));
@ayushtkn @slfan1989 Thanks for review, I moved getClientIp out of the for loop and write lock.
and getClientMachine is not accessible here, this function always invoked by an open file.
|
💔 -1 overall
This message was automatically generated. |
…essing reportBadBlocks
43d48f5 to
05137dd
Compare
|
💔 -1 overall
This message was automatically generated. |
|
@brahmareddybattula @ayushtkn |
ayushtkn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One change post that changes LGTM
| */ | ||
| void reportBadBlocks(LocatedBlock[] blocks) throws IOException { | ||
| checkOperation(OperationCategory.WRITE); | ||
| InetAddress remoteIp = Server.getRemoteIp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ain't the correct way to do as I previously mentioned, use getClientMachine()
Something like this on top of your present changes
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
index b624ab76cc0..ed95c912171 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -5882,9 +5882,8 @@ private INodeFile checkUCBlock(ExtendedBlock block,
/**
* Client is reporting some bad block locations.
*/
- void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
+ void reportBadBlocks(String clientMachine, LocatedBlock[] blocks) throws IOException {
checkOperation(OperationCategory.WRITE);
- InetAddress remoteIp = Server.getRemoteIp();
writeLock();
try {
checkOperation(OperationCategory.WRITE);
@@ -5894,7 +5893,7 @@ void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
String[] storageIDs = blocks[i].getStorageIDs();
for (int j = 0; j < nodes.length; j++) {
NameNode.stateChangeLog.info("*DIR* reportBadBlocks for block: {} on"
- + " datanode: {}" + " client: {}", blk, nodes[j].getXferAddr(), remoteIp);
+ + " datanode: {}" + " client: {}", blk, nodes[j].getXferAddr(), clientMachine);
blockManager.findAndMarkBlockAsCorrupt(blk, nodes[j],
storageIDs == null ? null: storageIDs[j],
"client machine reported it");
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
index b19bfc13acf..eae945c7458 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
@@ -991,7 +991,7 @@ public boolean complete(String src, String clientName,
@Override // ClientProtocol, DatanodeProtocol
public void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
checkNNStartup();
- namesystem.reportBadBlocks(blocks);
+ namesystem.reportBadBlocks(getClientMachine(), blocks);
}
@Override // ClientProtocol
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
There are two scenarios involded for reportBadBlocks
1-HDFS client will report bad block to NameNode once the block size or data is not consistent with meta;
2-DataNode will report bad block to NameNode via heartbeat if Replica stored on Datanode is corrupted or be modified.
As for now, when namenode process reportBadBlock rpc request, only DataNode address is logged.
Client Ip should also be logged to distinguish where the report comes from, which is very useful for trouble shooting.