Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-17218. NameNode should process time out excess redundancy blocks #6233

Closed
wants to merge 2 commits into from

Conversation

haiyang1987
Copy link
Contributor

@haiyang1987 haiyang1987 commented Oct 28, 2023

Description of PR

Description of PR
https://issues.apache.org/jira/browse/HDFS-17218

Currently found that DN will lose all pending DNA_INVALIDATE blocks if it restarts.
DN enables asynchronously deletion, it have many pending deletion blocks in memory.
when DN restarts, these cached blocks may be lost. it causes some blocks in the excess map in the namenode to be leaked and this will result in many blocks having more replicas then expected.

Root case
1.block1 of dn1 is chosen as excess, added to excessRedundancyMap and add To Invalidates.
2.dn1 heartbeat gets Invalidates command.
3.dn1 will execute async deletion when receive commands, but before it is actually deleted, the service stop, so the block1 still exsit.
4.at this time, nn's excessRedundancyMap will still have the block of dn1
5. restart the dn, at this time nn has not determined that the dn is in a dead state.
6. dn restarts will FBR is executed (processFirstBlockReport will not be executed here, processReport will be executed). since block1 is not a new block, the processExtraRedundancy logic will not be executed.

In HeartbeatManager#register(final DatanodeDescriptor d)
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java#L230-L238

//here current dn still is alive(expired heartbeat time has not been exceeded), dn register will not call d.updateHeartbeatState, so torageInfo.hasReceivedBlockReport() still is true

synchronized void register(final DatanodeDescriptor d) {
  if (!d.isAlive()) {
    addDatanode(d);
    //update its timestamp
    d.updateHeartbeatState(StorageReport.EMPTY_ARRAY, 0L, 0L, 0, 0, null);
    stats.add(d);
  }
}

In BlockManager#processReport, the dn restart run FBR, here current dn still is alive,storageInfo.hasReceivedBlockReport() is true, so will call method processReport
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2946

if (!storageInfo.hasReceivedBlockReport()) {
        // The first block report can be processed a lot more efficiently than
        // ordinary block reports.  This shortens restart times.
        blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing first "
            + "storage report for {} from datanode {}",
            strBlockReportId, fullBrLeaseId,
            storageInfo.getStorageID(),
            nodeID);
        processFirstBlockReport(storageInfo, newReport);
      } else {
        // Block reports for provided storage are not
        // maintained by DN heartbeats
        if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
          invalidatedBlocks = processReport(storageInfo, newReport);
        }
      }
      storageInfo.receivedBlockReport();
    } finally {
      endTime = Time.monotonicNow();
      namesystem.writeUnlock("processReport");
    }

    if (blockLog.isDebugEnabled()) {
      for (Block b : invalidatedBlocks) {
        blockLog.debug("BLOCK* processReport 0x{} with lease ID 0x{}: {} on node {} size {} " +
                "does not belong to any file.", strBlockReportId, fullBrLeaseId, b,
            node, b.getNumBytes());
      }
    }

In BlockManager#processReport run FBR, since the current DatanodeStorageInfo exists in the triplets in the BlockInfo corresponding to the reported block, so will not add toAdd list, addStoredBlock and processExtraRedundancy logic will not be executed.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3044-L3085

Collection<Block> processReport(
      final DatanodeStorageInfo storageInfo,
      final BlockListAsLongs report) throws IOException {
    // Normal case:
    // Modify the (block-->datanode) map, according to the difference
    // between the old and new block report.
    //
    Collection<BlockInfoToAdd> toAdd = new ArrayList<>();
    Collection<BlockInfo> toRemove = new HashSet<>();
    Collection<Block> toInvalidate = new ArrayList<>();
    Collection<BlockToMarkCorrupt> toCorrupt = new ArrayList<>();
    Collection<StatefulBlockInfo> toUC = new ArrayList<>();
    reportDiff(storageInfo, report,
                 toAdd, toRemove, toInvalidate, toCorrupt, toUC);

    DatanodeDescriptor node = storageInfo.getDatanodeDescriptor();
    // Process the blocks on each queue
    for (StatefulBlockInfo b : toUC) {
      addStoredBlockUnderConstruction(b, storageInfo);
    }
    for (BlockInfo b : toRemove) {
      removeStoredBlock(b, node);
    }
    int numBlocksLogged = 0;
    for (BlockInfoToAdd b : toAdd) {
      addStoredBlock(b.stored, b.reported, storageInfo, null,
          numBlocksLogged < maxNumBlocksToLog);
      numBlocksLogged++;
    }
    if (numBlocksLogged > maxNumBlocksToLog) {
      blockLog.info("BLOCK* processReport: logged info for {} of {} " +
          "reported.", maxNumBlocksToLog, numBlocksLogged);
    }
    for (Block b : toInvalidate) {
      addToInvalidates(b, node);
    }
    for (BlockToMarkCorrupt b : toCorrupt) {
      markBlockAsCorrupt(b, storageInfo, node);
    }

    return toInvalidate;
  }

so the block of dn1 will always exist in excessRedundancyMap (until HA switch is performed).
In BlockManager#processChosenExcessRedundancy will add the redundancy of the given block stored in the given datanode to the excess map.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4267-L4285

private void processChosenExcessRedundancy(
      final Collection<DatanodeStorageInfo> nonExcess,
      final DatanodeStorageInfo chosen, BlockInfo storedBlock) {
    nonExcess.remove(chosen);
    excessRedundancyMap.add(chosen.getDatanodeDescriptor(), storedBlock);
    //
    // The 'excessblocks' tracks blocks until we get confirmation
    // that the datanode has deleted them; the only way we remove them
    // is when we get a "removeBlock" message.
    //
    // The 'invalidate' list is used to inform the datanode the block
    // should be deleted.  Items are removed from the invalidate list
    // upon giving instructions to the datanodes.
    //
    final Block blockToInvalidate = getBlockOnStorage(storedBlock, chosen);
    addToInvalidates(blockToInvalidate, chosen.getDatanodeDescriptor());
    blockLog.debug("BLOCK* chooseExcessRedundancies: ({}, {}) is added to invalidated blocks set",
        chosen, storedBlock);
  }

but because the dn side has not deleted the block, it will not call processIncrementalBlockReport, so the block of dn can not remove from excessRedundancyMap.

Solution

NameNode add logic to handle excess redundant block timeouts to resolve current issue.
If NN determines that the excess redundancy block in DN has timed out and re-adds it to Invalidates.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 59s trunk passed
+1 💚 compile 0m 50s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 52s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 49s trunk passed
+1 💚 mvnsite 0m 57s trunk passed
+1 💚 javadoc 0m 53s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 22s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 2m 8s trunk passed
+1 💚 shadedclient 23m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 44s the patch passed
+1 💚 compile 0m 50s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 50s the patch passed
+1 💚 compile 0m 44s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 44s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 39s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 454 unchanged - 0 fixed = 455 total (was 454)
+1 💚 mvnsite 0m 49s the patch passed
+1 💚 javadoc 0m 40s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 17s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 59s the patch passed
+1 💚 shadedclient 22m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 214m 28s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 40s The patch does not generate ASF License warnings.
313m 17s
Reason Tests
Failed junit tests hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/1/artifact/out/Dockerfile
GITHUB PR #6233
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 3f7b1841cf1b 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f79f1d2
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/1/testReport/
Max. process+thread count 3202 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 35s trunk passed
+1 💚 compile 0m 54s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 49s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 51s trunk passed
+1 💚 mvnsite 0m 59s trunk passed
+1 💚 javadoc 0m 51s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 20s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 2m 5s trunk passed
+1 💚 shadedclient 22m 47s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 47s the patch passed
+1 💚 compile 0m 51s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 51s the patch passed
+1 💚 compile 0m 41s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 41s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 39s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 454 unchanged - 0 fixed = 455 total (was 454)
+1 💚 mvnsite 0m 50s the patch passed
+1 💚 javadoc 0m 41s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 9s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 58s the patch passed
+1 💚 shadedclient 22m 30s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 225m 2s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
323m 1s
Reason Tests
Failed junit tests hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/2/artifact/out/Dockerfile
GITHUB PR #6233
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux d0c64319e2e7 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f79f1d2
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/2/testReport/
Max. process+thread count 3173 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 8m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 36s trunk passed
+1 💚 compile 0m 53s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 48s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 47s trunk passed
+1 💚 mvnsite 0m 53s trunk passed
+1 💚 javadoc 0m 52s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 11s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 56s trunk passed
+1 💚 shadedclient 21m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 43s the patch passed
+1 💚 compile 0m 46s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 46s the patch passed
+1 💚 compile 0m 40s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 40s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 37s the patch passed
+1 💚 mvnsite 0m 44s the patch passed
+1 💚 javadoc 0m 38s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 7s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 51s the patch passed
+1 💚 shadedclient 21m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 198m 45s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
301m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/3/artifact/out/Dockerfile
GITHUB PR #6233
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 718e87103af0 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7fa80cc
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/3/testReport/
Max. process+thread count 3524 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6233/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@haiyang1987
Copy link
Contributor Author

Hi @Hexiaoqiao @ayushtkn @zhangshuyan0 @ZanderXu excuse me, could you review this PR for me when you are free? Thank you very much~

@Hexiaoqiao
Copy link
Contributor

Hi @haiyang1987 would you mind to submit PR at #6176 if it is the same improvement, thus we could link the discussion together and will be helpful for end users or reviewers to trace where thought was from and how improvement works. Thanks.

@haiyang1987
Copy link
Contributor Author

Hi @haiyang1987 would you mind to submit PR at #6176 if it is the same improvement, thus we could link the discussion together and will be helpful for end users or reviewers to trace where thought was from and how improvement works. Thanks.

Thanks @Hexiaoqiao for your comment, i will update later.

@haiyang1987
Copy link
Contributor Author

Close the current PR, the code has been submitted PR at #6176

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants