We currently find that a DataNode (DN) loses all pending DNA_INVALIDATE blocks if it restarts.
The DN deletes blocks asynchronously, so it may hold many pending-deletion blocks in memory.
When the DN restarts, these cached blocks are lost. Some entries in the NameNode's excess redundancy map are then leaked, and many blocks end up with more replicas than expected.
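The failure mode can be illustrated with a small self-contained model (purely illustrative; ExcessLeakModel and its fields are simplifications invented for this sketch, not Hadoop's actual NameNode/DataNode classes):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Toy model of the leak: the DN keeps pending invalidations only in
// memory, so a restart silently drops them, while the NN's excess map
// keeps its entry until the DN confirms the deletion.
public class ExcessLeakModel {
  // NN side: blocks believed to be excess on each DN, kept until the DN
  // reports that it actually deleted them.
  static Map<String, Set<String>> excessMap = new HashMap<>();

  // DN side: in-memory queue of blocks scheduled for async deletion.
  static Queue<String> dnPendingDeletes = new ArrayDeque<>();

  public static void main(String[] args) {
    // 1. NN picks block1 on dn1 as excess and sends an invalidate command.
    excessMap.computeIfAbsent("dn1", k -> new HashSet<>()).add("block1");
    dnPendingDeletes.add("block1");

    // 2. dn1 restarts before the async deletion runs: the queue is lost.
    dnPendingDeletes.clear();

    // 3. The DN never deletes block1, never reports the deletion, and the
    //    NN's excess entry is never cleaned up.
    System.out.println("leaked=" + excessMap.get("dn1").contains("block1"));
  }
}
```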
Root cause
1. block1 on dn1 is chosen as excess, added to excessRedundancyMap, and added to the invalidates queue.
2. dn1 receives the invalidate command via its heartbeat.
3. dn1 schedules the deletion asynchronously when it receives the command, but the service stops before the block is actually deleted, so block1 still exists on disk.
4. At this point the NN's excessRedundancyMap still holds block1 for dn1.
5. dn1 is restarted before the NN has declared it dead.
6. After the restart, dn1 sends a full block report (FBR). processFirstBlockReport is not executed here; processReport is. Since block1 is not a new block for this storage, the processExtraRedundancy logic is never reached.
In HeartbeatManager#register, the restarted DN is still considered alive (its heartbeat has not yet expired), so registration does not call d.updateHeartbeatState and storageInfo.hasReceivedBlockReport() remains true. BlockManager#processReport therefore takes the ordinary-report branch instead of processFirstBlockReport:

    if (!storageInfo.hasReceivedBlockReport()) {
      // The first block report can be processed a lot more efficiently than
      // ordinary block reports. This shortens restart times.
      blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing first "
              + "storage report for {} from datanode {}",
          strBlockReportId, fullBrLeaseId,
          storageInfo.getStorageID(), nodeID);
      processFirstBlockReport(storageInfo, newReport);
    } else {
      // Block reports for provided storage are not
      // maintained by DN heartbeats
      if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
        invalidatedBlocks = processReport(storageInfo, newReport);
      }
    }
    storageInfo.receivedBlockReport();
  } finally {
    endTime = Time.monotonicNow();
    namesystem.writeUnlock("processReport");
  }
  if (blockLog.isDebugEnabled()) {
    for (Block b : invalidatedBlocks) {
      blockLog.debug("BLOCK* processReport 0x{} with lease ID 0x{}: {} on node {} size {} "
          + "does not belong to any file.", strBlockReportId, fullBrLeaseId, b,
          node, b.getNumBytes());
    }
  }
In processReport, the reported block's BlockInfo already contains the current DatanodeStorageInfo in its triplets, so the block is not added to the toAdd list, and neither addStoredBlock nor processExtraRedundancy is executed. As a result, block1 stays in excessRedundancyMap for dn1 indefinitely (until an HA switch is performed):

  Collection<Block> processReport(
      final DatanodeStorageInfo storageInfo,
      final BlockListAsLongs report) throws IOException {
    // Normal case:
    // Modify the (block-->datanode) map, according to the difference
    // between the old and new block report.
    //
    Collection<BlockInfoToAdd> toAdd = new ArrayList<>();
    Collection<BlockInfo> toRemove = new HashSet<>();
    Collection<Block> toInvalidate = new ArrayList<>();
    Collection<BlockToMarkCorrupt> toCorrupt = new ArrayList<>();
    Collection<StatefulBlockInfo> toUC = new ArrayList<>();
    reportDiff(storageInfo, report,
        toAdd, toRemove, toInvalidate, toCorrupt, toUC);

    DatanodeDescriptor node = storageInfo.getDatanodeDescriptor();
    // Process the blocks on each queue
    for (StatefulBlockInfo b : toUC) {
      addStoredBlockUnderConstruction(b, storageInfo);
    }
    for (BlockInfo b : toRemove) {
      removeStoredBlock(b, node);
    }
    int numBlocksLogged = 0;
    for (BlockInfoToAdd b : toAdd) {
      addStoredBlock(b.stored, b.reported, storageInfo, null,
          numBlocksLogged < maxNumBlocksToLog);
      numBlocksLogged++;
    }
    if (numBlocksLogged > maxNumBlocksToLog) {
      blockLog.info("BLOCK* processReport: logged info for {} of {} "
          + "reported.", maxNumBlocksToLog, numBlocksLogged);
    }
    for (Block b : toInvalidate) {
      addToInvalidates(b, node);
    }
    for (BlockToMarkCorrupt b : toCorrupt) {
      markBlockAsCorrupt(b, storageInfo, node);
    }
    return toInvalidate;
  }
BlockManager#processChosenExcessRedundancy is where the replica was added to the excess map in the first place; as its comment notes, the entry is only removed once the NN receives confirmation that the DN deleted the block:

  private void processChosenExcessRedundancy(
      final Collection<DatanodeStorageInfo> nonExcess,
      final DatanodeStorageInfo chosen, BlockInfo storedBlock) {
    nonExcess.remove(chosen);
    excessRedundancyMap.add(chosen.getDatanodeDescriptor(), storedBlock);
    //
    // The 'excessblocks' tracks blocks until we get confirmation
    // that the datanode has deleted them; the only way we remove them
    // is when we get a "removeBlock" message.
    //
    // The 'invalidate' list is used to inform the datanode the block
    // should be deleted. Items are removed from the invalidate list
    // upon giving instructions to the datanodes.
    //
    final Block blockToInvalidate = getBlockOnStorage(storedBlock, chosen);
    addToInvalidates(blockToInvalidate, chosen.getDatanodeDescriptor());
    blockLog.debug("BLOCK* chooseExcessRedundancies: ({}, {}) is added to invalidated blocks set",
        chosen, storedBlock);
  }
But because the DN side never deleted the block, it never sends the corresponding incremental block report (processIncrementalBlockReport is not called), so the block cannot be removed from dn1's entry in excessRedundancyMap.
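For contrast, the normal cleanup path can be sketched under the same toy assumptions (class and method names here are invented for illustration; conceptually this is what the incremental block report achieves on the NN side):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy continuation: once the DN actually deletes the block and reports
// it, the NN drops the replica from that DN's excess entry.
public class ExcessCleanupModel {
  static Map<String, Set<String>> excessMap = new HashMap<>();

  // What processIncrementalBlockReport achieves conceptually: a DELETED
  // block receipt removes the replica from the DN's excess entry.
  static void onBlockDeleted(String dn, String block) {
    Set<String> blocks = excessMap.get(dn);
    if (blocks != null) {
      blocks.remove(block);
      if (blocks.isEmpty()) {
        excessMap.remove(dn);
      }
    }
  }

  public static void main(String[] args) {
    excessMap.computeIfAbsent("dn1", k -> new HashSet<>()).add("block1");
    onBlockDeleted("dn1", "block1"); // DN confirms the async deletion
    System.out.println("stillTracked=" + excessMap.containsKey("dn1"));
  }
}
```

In the bug scenario this confirmation never arrives, so the removal step above never runs.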
Solution
Add logic to the NameNode that handles timeouts for excess redundancy blocks: if the NN determines that an excess block on a DN has timed out without being deleted, it re-adds the block to the invalidates queue.
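A minimal sketch of what such timeout handling could look like (illustrative only; ExcessTimeoutMonitor and its methods are hypothetical names invented here, not the actual patch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Sketch of the fix: remember when each block was added to the excess
// map, and periodically surface blocks whose deletion was never
// confirmed within the timeout so they can be re-invalidated.
public class ExcessTimeoutMonitor {
  private final Map<String, Long> addedTimeNs = new HashMap<>();
  private final long timeoutNs;

  public ExcessTimeoutMonitor(long timeout, TimeUnit unit) {
    this.timeoutNs = unit.toNanos(timeout);
  }

  // Called when the NN adds a block to the excess map.
  public void recordExcess(String blockId, long nowNs) {
    addedTimeNs.putIfAbsent(blockId, nowNs);
  }

  // Called when the DN confirms the deletion (incremental block report).
  public void confirmDeleted(String blockId) {
    addedTimeNs.remove(blockId);
  }

  // Periodic scan: return blocks that timed out so the caller can re-add
  // them to the invalidates queue; restart their timer after rescheduling.
  public List<String> findTimedOut(long nowNs) {
    List<String> timedOut = new ArrayList<>();
    for (Map.Entry<String, Long> e : addedTimeNs.entrySet()) {
      if (nowNs - e.getValue() >= timeoutNs) {
        timedOut.add(e.getKey());
        e.setValue(nowNs);
      }
    }
    return timedOut;
  }
}
```

A periodic NameNode task would call findTimedOut and pass each returned block back to addToInvalidates for the owning DN.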
Hi @haiyang1987, would you mind submitting the PR at #6176 if it is the same improvement? That way we can link the discussions together, which will help end users and reviewers trace where the idea came from and how the improvement works. Thanks.
Thanks @Hexiaoqiao for your comment, I will update later.
Description of PR

https://issues.apache.org/jira/browse/HDFS-17218

The root-cause analysis above refers to the following code paths:

In HeartbeatManager#register(final DatanodeDescriptor d), where a still-alive restarted DN skips d.updateHeartbeatState:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java#L230-L238

In BlockManager#processReport, where the FBR of the restarted DN takes the ordinary-report branch:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2946

In BlockManager#processReport, where the report diff is computed and processExtraRedundancy is skipped:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3044-L3085

In BlockManager#processChosenExcessRedundancy, where the replica is added to the excess map:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4267-L4285