HDFS-17137. Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication. #5913
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Hi @Hexiaoqiao @ayushtkn @tomscut, could you please help review this PR when you have time? Thanks a lot.
@haiyang1987 Thanks for your report and detailed description. It makes sense to me. However I am confused we use
@Hexiaoqiao, thank you for helping me review.
// Create HA Cluster.
try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .nnTopology(MiniDFSNNTopology.simpleHATopology()).numDataNodes(10).build()) {
The number of DataNodes here is 10. Is that necessary? I think 4 is enough; what do you think?
Yes, the number of DataNodes can be set to 4 here, which still meets the test requirements.
I will update the PR.
// Create test file.
Path file = new Path("/test");
long fileLength = 512;
DFSTestUtil.createFile(fs, file, fileLength, (short) 8, 0L);
Would setting the replication to 4 be enough here?
// Process the block only when active NN is out of safe mode.
if (!isPopulatingReplQueues()) {
  return;
}
The change makes sense to me.
Updated the PR.
💔 -1 overall
This message was automatically generated.
LGTM.
LGTM. +1.
The failed unit test is not related to this change. Committed to trunk.
Thanks @Hexiaoqiao @tomscut for helping review and merge!
…a block logic when set decrease replication. (apache#5913). Contributed by Haiyang Hu. Reviewed-by: Tao Li <tomscut@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
Description of PR
https://issues.apache.org/jira/browse/HDFS-17137
The Standby/Observer NameNode should not run the redundant-replica handling logic when replication is decreased.
Currently, when setReplication is called to decrease replication,
the Active NameNode calls BlockManager#processExtraRedundancyBlock to select the DataNode (dn) holding the redundant replica, adds it to excessRedundancyMap, and adds the replica to invalidateBlocks (RedundancyMonitor is then scheduled to delete the block on that dn).
The Standby NameNode or Observer NameNode then loads the edit log and applies the SetReplicationOp. If the dn whose replica is to be deleted has not yet sent its incremental block report,
BlockManager#processExtraRedundancyBlock is also called here to select a redundant dn and add it to excessRedundancyMap (the dn selected here may differ from the one selected on the Active NameNode).
A dn that remains in excessRedundancyMap can affect decommissioning, so that the decommission operation cannot complete on the Standby/Observer NameNode.
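To make the intended behaviour concrete, here is a minimal, self-contained sketch of the guard. It is a toy model and not Hadoop code: `ToyBlockManager`, its `populatingReplQueues` flag, and the "pick the first surplus node" chooser are all hypothetical stand-ins, and it assumes the flag is true only on an active NameNode that has left safe mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes are Hadoop classes.
class ExcessGuardSketch {

  static class ToyBlockManager {
    // Assumption: true only on an active NameNode that is out of safe mode.
    private final boolean populatingReplQueues;
    private final Map<String, List<String>> excessRedundancyMap = new HashMap<>();

    ToyBlockManager(boolean populatingReplQueues) {
      this.populatingReplQueues = populatingReplQueues;
    }

    // Records surplus replicas as excess, unless this NameNode is not populating queues.
    void processExtraRedundancy(String block, List<String> replicaNodes, int targetReplication) {
      if (!populatingReplQueues) {
        return; // Standby/Observer: leave excess bookkeeping to the active NameNode.
      }
      int surplus = replicaNodes.size() - targetReplication;
      // Hypothetical chooser: take the first surplus nodes in list order.
      for (int i = 0; i < surplus; i++) {
        excessRedundancyMap
            .computeIfAbsent(block, k -> new ArrayList<>())
            .add(replicaNodes.get(i));
      }
    }

    List<String> excessNodes(String block) {
      return excessRedundancyMap.getOrDefault(block, Collections.emptyList());
    }
  }

  public static void main(String[] args) {
    List<String> replicas = Arrays.asList("d1", "d2", "d3");

    ToyBlockManager active = new ToyBlockManager(true);
    ToyBlockManager standby = new ToyBlockManager(false);
    active.processExtraRedundancy("blk_1", replicas, 2);
    standby.processExtraRedundancy("blk_1", replicas, 2);

    System.out.println("active excess:  " + active.excessNodes("blk_1"));
    System.out.println("standby excess: " + standby.excessNodes("blk_1"));
  }
}
```

With the guard in place, only the active instance records an excess replica; the standby instance's map stays empty, so it cannot end up disagreeing with the active one.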
The specific case is as follows:
For example, a file has 3 replicas (d1, d2, d3), and setReplication is called to set it to 2 replicas.
The Active NameNode selects d1 as the redundant replica and adds it to excessRedundancyMap and invalidateBlocks.
The Standby NameNode replays the SetReplicationOp (at this point d1 has not yet sent its incremental block report), so the redundant dn it selects may differ from the Active NameNode's choice; for example, it selects d2 and adds it to excessRedundancyMap.
d1 then finishes deleting the block and sends its incremental block report.
On the Active NameNode, the DN list for this block is d2 and d3 (d1 is removed from excessRedundancyMap when the incremental block report is processed).
On the Standby NameNode, the DN list for this block is d2 and d3 (d2 cannot be removed from excessRedundancyMap when the incremental block report is processed).
Next, a decommission operation is executed on d3.
The Active NameNode selects a new node, d4, to copy the replica to, and d4 sends an incremental block report.
On the Active NameNode, the DN list for this block is d2, d3 (decommissioning), and d4, so d3 can be decommissioned normally.
On the Standby NameNode, the DN list for this block is d3 (decommissioning), d2 (redundant), and d4.
Since the requirement of two live replicas is not met, d3 cannot be decommissioned at this time.
Therefore, the Standby NameNode or Observer NameNode should not run the redundant-replica logic when setReplication is called.
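The d1..d4 walkthrough above can be replayed with a small simulation. This is only a sketch under simplifying assumptions, not HDFS code: `NameNodeView` is a hypothetical class, a replica counts as "live" when its dn is neither in the excess set nor decommissioning, and decommission is treated as completable once the live count reaches the target replication.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy replay of the d1..d4 example; class and method names are hypothetical.
class DecommissionScenario {

  static class NameNodeView {
    final Set<String> replicas = new LinkedHashSet<>();
    final Set<String> excess = new LinkedHashSet<>();
    final Set<String> decommissioning = new LinkedHashSet<>();
    final boolean handlesExcessOnSetReplication;

    NameNodeView(boolean handlesExcessOnSetReplication, String... initialReplicas) {
      this.handlesExcessOnSetReplication = handlesExcessOnSetReplication;
      for (String dn : initialReplicas) {
        replicas.add(dn);
      }
    }

    // Decrease replication; the chosen dn is only recorded if this view handles excess.
    void setReplication(String chosenExcessDn) {
      if (handlesExcessOnSetReplication) {
        excess.add(chosenExcessDn);
      }
    }

    // Incremental block report: dn deleted its replica.
    void blockDeleted(String dn) {
      replicas.remove(dn);
      excess.remove(dn);
    }

    // Incremental block report: dn received a new replica.
    void blockReceived(String dn) {
      replicas.add(dn);
    }

    void startDecommission(String dn) {
      decommissioning.add(dn);
    }

    int liveReplicas() {
      int live = 0;
      for (String dn : replicas) {
        if (!excess.contains(dn) && !decommissioning.contains(dn)) {
          live++;
        }
      }
      return live;
    }

    boolean canFinishDecommission(int targetReplication) {
      return liveReplicas() >= targetReplication;
    }
  }

  // Replays the sequence of events seen by one NameNode.
  static NameNodeView run(boolean handlesExcess, String chosenExcessDn) {
    NameNodeView nn = new NameNodeView(handlesExcess, "d1", "d2", "d3");
    nn.setReplication(chosenExcessDn); // 3 -> 2 replicas
    nn.blockDeleted("d1");             // d1 reports the deletion
    nn.startDecommission("d3");
    nn.blockReceived("d4");            // d4 reports the new copy
    return nn;
  }

  public static void main(String[] args) {
    System.out.println("active:              " + run(true, "d1").canFinishDecommission(2));
    System.out.println("standby (unpatched): " + run(true, "d2").canFinishDecommission(2));
    System.out.println("standby (patched):   " + run(false, "d2").canFinishDecommission(2));
  }
}
```

In this model the unpatched standby view is left with d2 stuck in its excess set and only one live replica (d4), while the active view and the patched standby view both count d2 and d4 as live.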