We currently find that a DataNode (DN) loses all pending DNA_INVALIDATE blocks if it restarts.
The DN deletes blocks asynchronously, so it may hold many pending-deletion blocks in memory.
When the DN restarts, these cached blocks are lost. Some entries in the NameNode's excess redundancy map are then leaked, and many blocks end up with more replicas than expected.
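The failure mode can be illustrated with a small self-contained model (purely illustrative; ExcessLeakModel and its fields are simplifications invented for this sketch, not Hadoop's actual NameNode/DataNode classes):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Toy model of the leak: the DN keeps pending invalidations only in
// memory, so a restart silently drops them, while the NN's excess map
// keeps its entry until the DN confirms the deletion.
public class ExcessLeakModel {
  // NN side: blocks believed to be excess on each DN, kept until the DN
  // reports that it actually deleted them.
  static Map<String, Set<String>> excessMap = new HashMap<>();

  // DN side: in-memory queue of blocks scheduled for async deletion.
  static Queue<String> dnPendingDeletes = new ArrayDeque<>();

  public static void main(String[] args) {
    // 1. NN picks block1 on dn1 as excess and sends an invalidate command.
    excessMap.computeIfAbsent("dn1", k -> new HashSet<>()).add("block1");
    dnPendingDeletes.add("block1");

    // 2. dn1 restarts before the async deletion runs: the queue is lost.
    dnPendingDeletes.clear();

    // 3. The DN never deletes block1, never reports the deletion, and the
    //    NN's excess entry is never cleaned up.
    System.out.println("leaked=" + excessMap.get("dn1").contains("block1"));
  }
}
```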
Root cause
1. block1 on dn1 is chosen as excess, added to excessRedundancyMap, and added to the invalidates queue.
2. dn1 receives the invalidate command via its heartbeat.
3. dn1 schedules the deletion asynchronously when it receives the command, but the service stops before the block is actually deleted, so block1 still exists on disk.
4. At this point the NN's excessRedundancyMap still holds block1 for dn1.
5. dn1 is restarted before the NN has declared it dead.
6. After the restart, dn1 sends a full block report (FBR). processFirstBlockReport is not executed here; processReport is. Since block1 is not a new block for this storage, the processExtraRedundancy logic is never reached.
In HeartbeatManager#register, the restarted DN is still considered alive (its heartbeat has not yet expired), so registration does not call d.updateHeartbeatState and storageInfo.hasReceivedBlockReport() remains true. BlockManager#processReport therefore takes the ordinary-report branch instead of processFirstBlockReport:

    if (!storageInfo.hasReceivedBlockReport()) {
      // The first block report can be processed a lot more efficiently than
      // ordinary block reports. This shortens restart times.
      blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing first "
              + "storage report for {} from datanode {}",
          strBlockReportId, fullBrLeaseId,
          storageInfo.getStorageID(), nodeID);
      processFirstBlockReport(storageInfo, newReport);
    } else {
      // Block reports for provided storage are not
      // maintained by DN heartbeats
      if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
        invalidatedBlocks = processReport(storageInfo, newReport);
      }
    }
    storageInfo.receivedBlockReport();
  } finally {
    endTime = Time.monotonicNow();
    namesystem.writeUnlock("processReport");
  }
  if (blockLog.isDebugEnabled()) {
    for (Block b : invalidatedBlocks) {
      blockLog.debug("BLOCK* processReport 0x{} with lease ID 0x{}: {} on node {} size {} "
          + "does not belong to any file.", strBlockReportId, fullBrLeaseId, b,
          node, b.getNumBytes());
    }
  }
In processReport, the reported block's BlockInfo already contains the current DatanodeStorageInfo in its triplets, so the block is not added to the toAdd list, and neither addStoredBlock nor processExtraRedundancy is executed. As a result, block1 stays in excessRedundancyMap for dn1 indefinitely (until an HA switch is performed):

  Collection<Block> processReport(
      final DatanodeStorageInfo storageInfo,
      final BlockListAsLongs report) throws IOException {
    // Normal case:
    // Modify the (block-->datanode) map, according to the difference
    // between the old and new block report.
    //
    Collection<BlockInfoToAdd> toAdd = new ArrayList<>();
    Collection<BlockInfo> toRemove = new HashSet<>();
    Collection<Block> toInvalidate = new ArrayList<>();
    Collection<BlockToMarkCorrupt> toCorrupt = new ArrayList<>();
    Collection<StatefulBlockInfo> toUC = new ArrayList<>();
    reportDiff(storageInfo, report,
        toAdd, toRemove, toInvalidate, toCorrupt, toUC);

    DatanodeDescriptor node = storageInfo.getDatanodeDescriptor();
    // Process the blocks on each queue
    for (StatefulBlockInfo b : toUC) {
      addStoredBlockUnderConstruction(b, storageInfo);
    }
    for (BlockInfo b : toRemove) {
      removeStoredBlock(b, node);
    }
    int numBlocksLogged = 0;
    for (BlockInfoToAdd b : toAdd) {
      addStoredBlock(b.stored, b.reported, storageInfo, null,
          numBlocksLogged < maxNumBlocksToLog);
      numBlocksLogged++;
    }
    if (numBlocksLogged > maxNumBlocksToLog) {
      blockLog.info("BLOCK* processReport: logged info for {} of {} "
          + "reported.", maxNumBlocksToLog, numBlocksLogged);
    }
    for (Block b : toInvalidate) {
      addToInvalidates(b, node);
    }
    for (BlockToMarkCorrupt b : toCorrupt) {
      markBlockAsCorrupt(b, storageInfo, node);
    }
    return toInvalidate;
  }
BlockManager#processChosenExcessRedundancy is where the replica was added to the excess map in the first place; as its comment notes, the entry is only removed once the NN receives confirmation that the DN deleted the block:

  private void processChosenExcessRedundancy(
      final Collection<DatanodeStorageInfo> nonExcess,
      final DatanodeStorageInfo chosen, BlockInfo storedBlock) {
    nonExcess.remove(chosen);
    excessRedundancyMap.add(chosen.getDatanodeDescriptor(), storedBlock);
    //
    // The 'excessblocks' tracks blocks until we get confirmation
    // that the datanode has deleted them; the only way we remove them
    // is when we get a "removeBlock" message.
    //
    // The 'invalidate' list is used to inform the datanode the block
    // should be deleted. Items are removed from the invalidate list
    // upon giving instructions to the datanodes.
    //
    final Block blockToInvalidate = getBlockOnStorage(storedBlock, chosen);
    addToInvalidates(blockToInvalidate, chosen.getDatanodeDescriptor());
    blockLog.debug("BLOCK* chooseExcessRedundancies: ({}, {}) is added to invalidated blocks set",
        chosen, storedBlock);
  }
But because the DN side never deleted the block, it never sends the corresponding incremental block report (processIncrementalBlockReport is not called), so the block cannot be removed from dn1's entry in excessRedundancyMap.
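For contrast, the normal cleanup path can be sketched under the same toy assumptions (class and method names here are invented for illustration; conceptually this is what the incremental block report achieves on the NN side):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy continuation: once the DN actually deletes the block and reports
// it, the NN drops the replica from that DN's excess entry.
public class ExcessCleanupModel {
  static Map<String, Set<String>> excessMap = new HashMap<>();

  // What processIncrementalBlockReport achieves conceptually: a DELETED
  // block receipt removes the replica from the DN's excess entry.
  static void onBlockDeleted(String dn, String block) {
    Set<String> blocks = excessMap.get(dn);
    if (blocks != null) {
      blocks.remove(block);
      if (blocks.isEmpty()) {
        excessMap.remove(dn);
      }
    }
  }

  public static void main(String[] args) {
    excessMap.computeIfAbsent("dn1", k -> new HashSet<>()).add("block1");
    onBlockDeleted("dn1", "block1"); // DN confirms the async deletion
    System.out.println("stillTracked=" + excessMap.containsKey("dn1"));
  }
}
```

In the bug scenario this confirmation never arrives, so the removal step above never runs.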
Solution
Add logic to the NameNode that handles timeouts for excess redundancy blocks: if the NN determines that an excess block on a DN has timed out without being deleted, it re-adds the block to the invalidates queue.
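A minimal sketch of what such timeout handling could look like (illustrative only; ExcessTimeoutMonitor and its methods are hypothetical names invented here, not the actual patch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Sketch of the fix: remember when each block was added to the excess
// map, and periodically surface blocks whose deletion was never
// confirmed within the timeout so they can be re-invalidated.
public class ExcessTimeoutMonitor {
  private final Map<String, Long> addedTimeNs = new HashMap<>();
  private final long timeoutNs;

  public ExcessTimeoutMonitor(long timeout, TimeUnit unit) {
    this.timeoutNs = unit.toNanos(timeout);
  }

  // Called when the NN adds a block to the excess map.
  public void recordExcess(String blockId, long nowNs) {
    addedTimeNs.putIfAbsent(blockId, nowNs);
  }

  // Called when the DN confirms the deletion (incremental block report).
  public void confirmDeleted(String blockId) {
    addedTimeNs.remove(blockId);
  }

  // Periodic scan: return blocks that timed out so the caller can re-add
  // them to the invalidates queue; restart their timer after rescheduling.
  public List<String> findTimedOut(long nowNs) {
    List<String> timedOut = new ArrayList<>();
    for (Map.Entry<String, Long> e : addedTimeNs.entrySet()) {
      if (nowNs - e.getValue() >= timeoutNs) {
        timedOut.add(e.getKey());
        e.setValue(nowNs);
      }
    }
    return timedOut;
  }
}
```

A periodic NameNode task would call findTimedOut and pass each returned block back to addToInvalidates for the owning DN.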
Hi @haiyang1987, would you mind submitting the PR at #6176 if it is the same improvement? That way we can link the discussions together, which will help end users and reviewers trace where the idea came from and how the improvement works. Thanks.
Thanks @Hexiaoqiao for your comment, I will update later.
Description of PR

https://issues.apache.org/jira/browse/HDFS-17218

The root-cause analysis above refers to the following code paths:

In HeartbeatManager#register(final DatanodeDescriptor d), where a still-alive restarted DN skips d.updateHeartbeatState:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java#L230-L238

In BlockManager#processReport, where the FBR of the restarted DN takes the ordinary-report branch:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2946

In BlockManager#processReport, where the report diff is computed and processExtraRedundancy is skipped:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3044-L3085

In BlockManager#processChosenExcessRedundancy, where the replica is added to the excess map:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4267-L4285