Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -251,12 +251,20 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
DatanodeStorage storage) {
// Make sure another entry for the same block is first removed.
// There may only be one such entry.
ReceivedDeletedBlockInfo removedInfo = null;
for (PerStorageIBR perStorage : pendingIBRs.values()) {
if (perStorage.remove(rdbi.getBlock()) != null) {
removedInfo = perStorage.remove(rdbi.getBlock());
if (removedInfo != null) {
break;
}
}
getPerStorageIBR(storage).put(rdbi);
if (removedInfo != null &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first feeling is pendingIBRs should keep the freshest rdbis set to report NameNode. But after changes, it will be not the fresh data and also inconsistence with block data on Storage, right?

Copy link
Contributor Author

@ZanderXu ZanderXu Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We encountered the case of concurrent CloseRecovery. The CloseRecovery with small GS early process block on Storage but later being added into pendingIBRs, and CloseRecovery with bigger GS later process block on Storage but early being added into pendingIBRs. As a result, the large GS block is stored on the disk, but small GS block being reported to Namenode. And very unfortunately, the block has one this valid replica, and leads to the block missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZanderXu Thanks for the detailed information. It is an interesting case. IMO, this improvement makes sense to me. Would you mind to add unit test to cover this case?

removedInfo.getBlock().getGenerationStamp()
> rdbi.getBlock().getGenerationStamp()) {
getPerStorageIBR(storage).put(removedInfo);
} else {
getPerStorageIBR(storage).put(rdbi);
}
}

synchronized void notifyNamenodeBlock(ReceivedDeletedBlockInfo rdbi,
Expand Down