HDFS-16622. addRDBI in IncrementalBlockReportManager may remove the b… #4407
…lock with bigger GS
💔 -1 overall. This message was automatically generated.
      }
    }
    getPerStorageIBR(storage).put(rdbi);
    if (removedInfo != null &&
My first feeling is that pendingIBRs should keep the freshest set of rdbis to report to the NameNode. But after this change, it will no longer hold the freshest data and will also be inconsistent with the block data on Storage, right?
We encountered a case of concurrent CloseRecovery. The CloseRecovery with the smaller GS processed the block on Storage earlier but was added to pendingIBRs later, while the CloseRecovery with the bigger GS processed the block on Storage later but was added to pendingIBRs earlier. As a result, the block with the bigger GS is stored on disk, but the block with the smaller GS is reported to the NameNode. Very unfortunately, the block had only this one valid replica, which led to a missing block.
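The interleaving described above can be replayed in a few lines. This is only a minimal sketch, not DataNode code: the class name `IbrOrderingSketch`, the GS values 1001 and 1002, and the plain `HashMap` keyed by block ID are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class IbrOrderingSketch {
  /** Returns {GS on disk, GS queued for the NameNode} after the interleaving. */
  public static long[] simulate() {
    final long blockId = 1L;
    long onDiskGs = 0L;
    // Hypothetical pending-report map: blockId -> GS, last writer wins.
    Map<Long, Long> pendingGsByBlockId = new HashMap<>();

    // Step 1: recovery A (smaller GS, 1001) updates the replica on storage.
    onDiskGs = 1001L;
    // Step 2: recovery B (bigger GS, 1002) updates the replica on storage.
    onDiskGs = 1002L;
    // Step 3: recovery B queues its report first.
    pendingGsByBlockId.put(blockId, 1002L);
    // Step 4: recovery A queues its report last and overwrites B's entry.
    pendingGsByBlockId.put(blockId, 1001L);

    return new long[] {onDiskGs, pendingGsByBlockId.get(blockId)};
  }

  public static void main(String[] args) {
    long[] result = simulate();
    // prints "on disk GS=1002, reported GS=1001"
    System.out.println("on disk GS=" + result[0] + ", reported GS=" + result[1]);
  }
}
```

The storage update and the pending-report update happen in opposite orders for the two recoveries, so the entry that survives in the map no longer matches the replica on disk.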
@ZanderXu Thanks for the detailed information. It is an interesting case. IMO, this improvement makes sense. Would you mind adding a unit test to cover this case?
Hexiaoqiao
left a comment
@ZanderXu Thanks for your report. I left one nit comment inline. FYI.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
JIRA: HDFS-16622. addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
I suspect there is a bug in the function addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
Bug code in the for loop:
The Block in the removed ReceivedDeletedBlockInfo may have a greater GS than the Block in rdbi. The NN will then invalidate the replica with the smaller GS when it completes the block.
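One way to picture a guard against this is to compare generation stamps before letting a new entry displace an old one. The sketch below is an assumption-laden illustration, not the patch in this PR: `GsGuardSketch`, `Info`, `add`, and `pendingGs` are hypothetical names, and `Info` stands in for ReceivedDeletedBlockInfo with only a block ID and a GS.

```java
import java.util.HashMap;
import java.util.Map;

public class GsGuardSketch {
  /** Hypothetical stand-in for ReceivedDeletedBlockInfo: block ID and GS only. */
  public static final class Info {
    final long blockId;
    final long genStamp;

    public Info(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // Hypothetical pending-report map keyed by block ID.
  private final Map<Long, Info> pending = new HashMap<>();

  /** Adds an entry and returns whichever entry is retained for the block. */
  public synchronized Info add(Info incoming) {
    Info removed = pending.put(incoming.blockId, incoming);
    // If the displaced entry carries a bigger GS, restore it.
    if (removed != null && removed.genStamp > incoming.genStamp) {
      pending.put(removed.blockId, removed);
      return removed;
    }
    return incoming;
  }

  /** Returns the GS currently queued for the given block. */
  public synchronized long pendingGs(long blockId) {
    return pending.get(blockId).genStamp;
  }

  public static void main(String[] args) {
    GsGuardSketch reports = new GsGuardSketch();
    reports.add(new Info(1L, 1002L)); // bigger GS queued first
    reports.add(new Info(1L, 1001L)); // smaller GS arrives late
    System.out.println("queued GS=" + reports.pendingGs(1L)); // prints "queued GS=1002"
  }
}
```

Whether the bigger GS is always the right entry to keep depends on details of the real report types and statuses that this sketch leaves out.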