@Daniel-009497
Contributor

@Daniel-009497 Daniel-009497 commented Dec 17, 2022

There are two scenarios involved in reportBadBlocks:
1- The HDFS client reports a bad block to the NameNode when the block's size or data is inconsistent with its metadata (checksums);
2- The DataNode reports a bad block to the NameNode via its heartbeat if a replica stored on the DataNode is corrupted or modified.
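To make scenario 1 concrete, here is a simplified sketch of a reader recomputing per-chunk checksums and comparing them with the stored ones. It assumes 512-byte chunks (the dfs.bytes-per-checksum default) and uses java.util.zip.CRC32 for portability, whereas HDFS defaults to CRC32C. ChunkVerifier is a hypothetical class, not HDFS's DataChecksum or BlockReader code.

```java
import java.util.zip.CRC32;

// Hypothetical illustration of scenario 1: recompute per-chunk checksums
// and compare them with the stored ("meta") checksums. Not HDFS code.
final class ChunkVerifier {
  static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size

  // Compute one CRC32 value per chunk; the last chunk may be shorter.
  static long[] checksums(byte[] data) {
    int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
    long[] sums = new long[chunks];
    CRC32 crc = new CRC32();
    for (int i = 0; i < chunks; i++) {
      int off = i * BYTES_PER_CHECKSUM;
      int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      sums[i] = crc.getValue();
    }
    return sums;
  }

  // Return the index of the first chunk that disagrees with the stored
  // checksums, or -1 if data and metadata are consistent.
  static int firstBadChunk(byte[] data, long[] stored) {
    long[] actual = checksums(data);
    int n = Math.min(actual.length, stored.length);
    for (int i = 0; i < n; i++) {
      if (actual[i] != stored[i]) {
        return i; // data in this chunk does not match its checksum
      }
    }
    // Matching prefix but a different chunk count: the lengths disagree.
    return actual.length == stored.length ? -1 : n;
  }
}
```

A non-negative result is the kind of mismatch that would lead a client to call reportBadBlocks for that replica.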

Currently, when the NameNode processes a reportBadBlocks RPC request, only the DataNode address is logged.
The client IP should also be logged to distinguish where the report comes from, which is very useful for troubleshooting.

for (int j = 0; j < nodes.length; j++) {
  NameNode.stateChangeLog.info("*DIR* reportBadBlocks for block: {} on"
-     + " datanode: {}", blk, nodes[j].getXferAddr());
+     + " datanode: {}" + " client: {}", blk, nodes[j].getXferAddr(), Server.getRemoteIp());
Contributor

From my point of view, reportBadBlocks should be reported by the DN, so what does this have to do with the client? Server.getRemoteIp() is somewhat expensive; is there a good enough reason for us to do this?

Member

Use getClientMachine() instead of Server.getRemoteIp(); it handles calls routed via RBF.
Fetch this IP address outside the lock; doing it inside the lock will hurt performance.
You are logging this inside the loop, so it is better to extract it into a variable outside the loop.
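A minimal sketch of this advice, under stated assumptions: the Supplier stands in for the RPC-layer address lookup (such as Server.getRemoteIp() or getClientMachine()), the ReentrantReadWriteLock stands in for the namesystem write lock, and BadBlockLogger is a hypothetical class rather than FSNamesystem code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch: resolve the caller address once, before taking the
// write lock, and reuse that variable for every log line in the loops.
final class BadBlockLogger {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final Supplier<String> remoteAddress; // stand-in for the RPC lookup
  final List<String> log = new ArrayList<>();

  BadBlockLogger(Supplier<String> remoteAddress) {
    this.remoteAddress = remoteAddress;
  }

  void reportBadBlocks(String[] blocks, String[][] nodesPerBlock) {
    // Looked up once, outside the lock and outside both loops.
    String client = remoteAddress.get();
    fsLock.writeLock().lock();
    try {
      for (int i = 0; i < blocks.length; i++) {
        for (String node : nodesPerBlock[i]) {
          log.add("reportBadBlocks for block: " + blocks[i]
              + " on datanode: " + node + " client: " + client);
        }
      }
    } finally {
      fsLock.writeLock().unlock();
    }
  }
}
```

However many replicas are reported, the lookup runs exactly once per call and never while the lock is held.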

reportBadBlocks exists in both ClientProtocol and DatanodeProtocol. You can check here for some pointers:

dfsClient.reportChecksumFailure(src,
    reportList.toArray(new LocatedBlock[reportList.size()]));
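Because the same method signature appears in both protocols, a single server method implements both, as this hypothetical sketch shows (ClientSide, DatanodeSide and RpcHandler are illustrative stand-ins for ClientProtocol, DatanodeProtocol and NameNodeRpcServer). The method body alone does not reveal which protocol invoked it, which is one reason logging the caller's address can help tell client reports apart from DataNode reports.

```java
// Hypothetical stand-ins for two RPC protocols declaring the same method.
interface ClientSide {
  String reportBadBlocks(String block);
}

interface DatanodeSide {
  String reportBadBlocks(String block);
}

// One method satisfies both interfaces, mirroring the
// "@Override // ClientProtocol, DatanodeProtocol" pattern.
final class RpcHandler implements ClientSide, DatanodeSide {
  @Override // ClientSide, DatanodeSide
  public String reportBadBlocks(String block) {
    // Nothing in this signature says which protocol the caller used.
    return "bad block reported: " + block;
  }
}
```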

Contributor Author

@Daniel-009497 Daniel-009497 Dec 19, 2022

@ayushtkn @slfan1989 Thanks for the review. I moved the client IP lookup out of the for loop and the write lock.
Also, getClientMachine is not accessible here; this function is always invoked by an open file.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 1m 0s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | _ trunk Compile Tests _ | | | |
| +1 💚 | mvninstall | 42m 42s | | trunk passed |
| +1 💚 | compile | 1m 40s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | compile | 1m 29s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | checkstyle | 1m 8s | | trunk passed |
| +1 💚 | mvnsite | 1m 32s | | trunk passed |
| +1 💚 | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | spotbugs | 3m 51s | | trunk passed |
| +1 💚 | shadedclient | 26m 42s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 1m 27s | | the patch passed |
| +1 💚 | compile | 1m 27s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | javac | 1m 27s | | the patch passed |
| +1 💚 | compile | 1m 22s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | javac | 1m 22s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 58s | | the patch passed |
| +1 💚 | mvnsite | 2m 29s | | the patch passed |
| +1 💚 | javadoc | 0m 58s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | javadoc | 1m 28s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | spotbugs | 3m 40s | | the patch passed |
| +1 💚 | shadedclient | 27m 1s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| -1 ❌ | unit | 395m 37s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 46s | | The patch does not generate ASF License warnings. |
| | | 516m 54s | | |

| Reason | Tests |
|---|---|
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5237/1/artifact/out/Dockerfile |
| GITHUB PR | #5237 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux d20bfd7407bd 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 43d48f5 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5237/1/testReport/ |
| Max. process+thread count | 2077 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5237/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 2m 0s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | _ trunk Compile Tests _ | | | |
| +1 💚 | mvninstall | 43m 43s | | trunk passed |
| +1 💚 | compile | 1m 50s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | compile | 1m 28s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | checkstyle | 1m 8s | | trunk passed |
| +1 💚 | mvnsite | 1m 42s | | trunk passed |
| +1 💚 | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | javadoc | 1m 36s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | spotbugs | 3m 45s | | trunk passed |
| +1 💚 | shadedclient | 27m 9s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 1m 28s | | the patch passed |
| +1 💚 | compile | 1m 32s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | javac | 1m 32s | | the patch passed |
| +1 💚 | compile | 1m 29s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | javac | 1m 29s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 55s | | the patch passed |
| +1 💚 | mvnsite | 1m 32s | | the patch passed |
| +1 💚 | javadoc | 1m 0s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 💚 | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 💚 | spotbugs | 3m 43s | | the patch passed |
| +1 💚 | shadedclient | 26m 29s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| -1 ❌ | unit | 503m 22s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 1m 15s | | The patch does not generate ASF License warnings. |
| | | 626m 49s | | |

| Reason | Tests |
|---|---|
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
| | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5237/2/artifact/out/Dockerfile |
| GITHUB PR | #5237 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 30f4e88eb6a3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 05137dd |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5237/2/testReport/ |
| Max. process+thread count | 2087 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5237/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@Daniel-009497
Contributor Author

@brahmareddybattula @ayushtkn
Could you please help review this?
Thanks a lot!

Member

@ayushtkn ayushtkn left a comment

One more change is needed; after that, the changes LGTM.

   */
  void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
    checkOperation(OperationCategory.WRITE);
    InetAddress remoteIp = Server.getRemoteIp();
Member

As I mentioned previously, this isn't the correct way to do it; use getClientMachine() instead.
Something like this, on top of your present changes:

diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
index b624ab76cc0..ed95c912171 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -5882,9 +5882,8 @@ private INodeFile checkUCBlock(ExtendedBlock block,
   /**
    * Client is reporting some bad block locations.
    */
-  void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
+  void reportBadBlocks(String clientMachine, LocatedBlock[] blocks) throws IOException {
     checkOperation(OperationCategory.WRITE);
-    InetAddress remoteIp = Server.getRemoteIp();
     writeLock();
     try {
       checkOperation(OperationCategory.WRITE);
@@ -5894,7 +5893,7 @@ void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
         String[] storageIDs = blocks[i].getStorageIDs();
         for (int j = 0; j < nodes.length; j++) {
           NameNode.stateChangeLog.info("*DIR* reportBadBlocks for block: {} on"
-              + " datanode: {}" + " client: {}", blk, nodes[j].getXferAddr(), remoteIp);
+              + " datanode: {}" + " client: {}", blk, nodes[j].getXferAddr(), clientMachine);
           blockManager.findAndMarkBlockAsCorrupt(blk, nodes[j],
               storageIDs == null ? null: storageIDs[j],
               "client machine reported it");
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
index b19bfc13acf..eae945c7458 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
@@ -991,7 +991,7 @@ public boolean complete(String src, String clientName,
   @Override // ClientProtocol, DatanodeProtocol
   public void reportBadBlocks(LocatedBlock[] blocks) throws IOException {
     checkNNStartup();
-    namesystem.reportBadBlocks(blocks);
+    namesystem.reportBadBlocks(getClientMachine(), blocks);
   }
 
   @Override // ClientProtocol
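For context on why getClientMachine() matters under RBF: when a Router proxies the call, the NameNode's socket peer is the Router rather than the real client. The sketch below is a hypothetical illustration of that idea, not Hadoop's implementation. It assumes the Router forwards the client address as a clientIp:<addr> entry in a comma-separated caller-context string, and that the NameNode honors it only for trusted Router callers; ClientMachineResolver and its parameters are invented for illustration.

```java
// Hypothetical resolver: prefer a client address forwarded by a trusted
// Router, otherwise fall back to the RPC peer address.
final class ClientMachineResolver {
  private static final String CLIENT_IP_PREFIX = "clientIp:"; // assumed key

  static String resolve(String socketPeer, boolean callerIsTrustedRouter,
      String callerContext) {
    if (callerIsTrustedRouter && callerContext != null) {
      // Assumed format: comma-separated "key:value" entries.
      for (String field : callerContext.split(",")) {
        if (field.startsWith(CLIENT_IP_PREFIX)) {
          return field.substring(CLIENT_IP_PREFIX.length());
        }
      }
    }
    // Direct call, untrusted caller, or nothing forwarded: use the peer
    // address, or "" when there is no RPC call at all.
    return socketPeer == null ? "" : socketPeer;
  }
}
```

Passing the resolved value into FSNamesystem as a parameter, as the diff does, keeps the lookup in the RPC layer and outside the namesystem lock.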

@github-actions
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 29, 2025
@github-actions github-actions bot closed this Oct 30, 2025

4 participants