@ZanderXu

@ZanderXu ZanderXu commented Jun 6, 2022

JIRA: HDFS-16622. addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
I suspect there is a bug in the function addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
The problematic code is the for loop:

synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
      DatanodeStorage storage) {
    // Make sure another entry for the same block is first removed.
    // There may only be one such entry.
    for (PerStorageIBR perStorage : pendingIBRs.values()) {
      if (perStorage.remove(rdbi.getBlock()) != null) {
        break;
      }
    }
    getPerStorageIBR(storage).put(rdbi);
  }

The GS of the block in the removed ReceivedDeletedBlockInfo may be greater than the GS of the block in rdbi. In that case the replica with the smaller GS is reported, and the NN will invalidate the replica with the smaller GS when it completes the block.
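
To make the intended behavior concrete, here is a minimal, self-contained sketch of a GS-aware variant of the loop above. It is not the patch itself: the class `GsAwarePendingIbr`, the simplified `Rdbi` type, the string storage IDs, and the `pending` helper are all hypothetical stand-ins for the real Hadoop classes, and the only assumption is that an already-pending entry with a larger GS should win over a later-arriving entry with a smaller GS.

```java
import java.util.HashMap;
import java.util.Map;

public class GsAwarePendingIbr {
  /** Hypothetical stand-in for ReceivedDeletedBlockInfo: block id plus generation stamp. */
  static final class Rdbi {
    final long blockId;
    final long genStamp;

    Rdbi(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // storage id -> (block id -> pending entry); simplified model of pendingIBRs
  private final Map<String, Map<Long, Rdbi>> pendingIBRs = new HashMap<>();

  synchronized void addRDBI(Rdbi rdbi, String storageId) {
    Rdbi removed = null;
    String removedFrom = null;
    // Same as the original loop: drop any existing entry for this block.
    for (Map.Entry<String, Map<Long, Rdbi>> e : pendingIBRs.entrySet()) {
      removed = e.getValue().remove(rdbi.blockId);
      if (removed != null) {
        removedFrom = e.getKey();
        break;
      }
    }
    // Assumed rule: never replace a pending entry with one that has a smaller GS.
    if (removed != null && removed.genStamp > rdbi.genStamp) {
      pendingIBRs.get(removedFrom).put(removed.blockId, removed);
      return;
    }
    pendingIBRs.computeIfAbsent(storageId, k -> new HashMap<>())
        .put(rdbi.blockId, rdbi);
  }

  /** Returns the pending entry for a block, or null if none is queued. */
  synchronized Rdbi pending(long blockId) {
    for (Map<Long, Rdbi> perStorage : pendingIBRs.values()) {
      Rdbi r = perStorage.get(blockId);
      if (r != null) {
        return r;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    GsAwarePendingIbr manager = new GsAwarePendingIbr();
    manager.addRDBI(new Rdbi(1L, 1002L), "storage-1"); // bigger GS queued first
    manager.addRDBI(new Rdbi(1L, 1001L), "storage-1"); // smaller GS arrives later
    System.out.println("pending GS = " + manager.pending(1L).genStamp); // prints "pending GS = 1002"
  }
}
```

With the original loop, the second call would leave GS 1001 pending; in this sketch the entry with GS 1002 is kept.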

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 1m 3s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 40m 18s | | trunk passed |
| +1 💚 | compile | 1m 44s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 1m 32s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 1m 21s | | trunk passed |
| +1 💚 | mvnsite | 1m 41s | | trunk passed |
| +1 💚 | javadoc | 1m 22s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 3m 48s | | trunk passed |
| +1 💚 | shadedclient | 26m 0s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 24s | | the patch passed |
| +1 💚 | compile | 1m 32s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 1m 32s | | the patch passed |
| +1 💚 | compile | 1m 21s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 1m 21s | | the patch passed |
| +1 💚 | blanks | 0m 1s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 1s | | the patch passed |
| +1 💚 | mvnsite | 1m 29s | | the patch passed |
| +1 💚 | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 1m 31s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 3m 33s | | the patch passed |
| +1 💚 | shadedclient | 25m 41s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 395m 0s | | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | 512m 44s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4407/1/artifact/out/Dockerfile |
| GITHUB PR | #4407 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux e7745f582308 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 91f7ff3 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4407/1/testReport/ |
| Max. process+thread count | 2066 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4407/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

}
}
getPerStorageIBR(storage).put(rdbi);
if (removedInfo != null &&

My first impression is that pendingIBRs should keep the freshest set of rdbis to report to the NameNode. But after this change, it will no longer hold the freshest data and will also be inconsistent with the block data on Storage, right?

@ZanderXu ZanderXu Jun 7, 2022

We encountered a case of concurrent CloseRecovery. The CloseRecovery with the smaller GS processed the block on Storage earlier but was added to pendingIBRs later, while the CloseRecovery with the bigger GS processed the block on Storage later but was added to pendingIBRs earlier. As a result, the block with the larger GS is stored on disk, but the block with the smaller GS is reported to the NameNode. Very unfortunately, this was the block's only valid replica, which led to a missing block.
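
The interleaving described above can be replayed deterministically. The sketch below is only an illustration: `CloseRecoveryOrdering`, `finalizeOnDisk`, `addPending`, and the GS values 1001 and 1002 are hypothetical, and the pending map models the current remove-then-put behavior of addRDBI for a single storage as plain last-writer-wins.

```java
import java.util.HashMap;
import java.util.Map;

public class CloseRecoveryOrdering {
  // block id -> GS of the replica currently on storage (hypothetical model)
  static final Map<Long, Long> onDisk = new HashMap<>();
  // block id -> GS queued for the next incremental block report (hypothetical model)
  static final Map<Long, Long> pendingIbr = new HashMap<>();

  static void finalizeOnDisk(long blockId, long gs) {
    onDisk.put(blockId, gs);
  }

  // Models the current addRDBI for one storage: the last entry added wins.
  static void addPending(long blockId, long gs) {
    pendingIbr.put(blockId, gs);
  }

  /** Replays the reported interleaving; returns {GS on disk, GS reported}. */
  static long[] replay() {
    onDisk.clear();
    pendingIbr.clear();
    long blockId = 1L;
    finalizeOnDisk(blockId, 1001L); // recovery with smaller GS touches storage first
    finalizeOnDisk(blockId, 1002L); // recovery with bigger GS touches storage second
    addPending(blockId, 1002L);     // but the bigger GS is queued first
    addPending(blockId, 1001L);     // and the smaller GS is queued last, overwriting it
    return new long[] {onDisk.get(blockId), pendingIbr.get(blockId)};
  }

  public static void main(String[] args) {
    long[] result = replay();
    // prints "on disk GS=1002, reported GS=1001"
    System.out.println("on disk GS=" + result[0] + ", reported GS=" + result[1]);
  }
}
```

The mismatch between the two values is the inconsistency that causes the NameNode to treat the only valid replica as stale.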


@ZanderXu Thanks for the detailed information. It is an interesting case. IMO, this improvement makes sense. Would you mind adding a unit test to cover this case?

@Hexiaoqiao Hexiaoqiao left a comment

@ZanderXu Thanks for your report. I left one nit comment inline. FYI.

@github-actions

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Nov 11, 2025
@github-actions github-actions bot closed this Nov 12, 2025