HDFS-17137. Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication. #5913

Merged
merged 3 commits into from
Aug 8, 2023

Conversation

Contributor

@haiyang1987 haiyang1987 commented Aug 1, 2023

Description of PR

https://issues.apache.org/jira/browse/HDFS-17137

Standby/Observer NameNode should not handle the redundant replica block logic when the replication factor is decreased.

At present, when setReplication is called to decrease the replication factor:

  • The ActiveNameNode calls the BlockManager#processExtraRedundancyBlock method to select the DN holding the redundant replica, adds it to excessRedundancyMap, and adds it to invalidateBlocks (RedundancyMonitor is then scheduled to delete the block on that DN).

  • The StandbyNameNode or ObserverNameNode then loads the edit log and applies the SetReplicationOp. If the DN whose replica is to be deleted has not yet sent its incremental block report,
    BlockManager#processExtraRedundancyBlock is also called here to select the DN of the redundant replica and add it to excessRedundancyMap (the DN selected here may be inconsistent with the DN selected on the ActiveNameNode).

A DN present in excessRedundancyMap may affect DN decommissioning, with the result that the decommission operation cannot complete on the Standby/ObserverNameNode.

The specific case is as follows:
For example, a file has 3 replicas (d1, d2, d3) and setReplication is called to set the file to 2 replicas.

  • The ActiveNameNode selects d1 as the redundant replica and adds it to excessRedundancyMap and invalidateBlocks.

  • The StandbyNameNode replays the SetReplicationOp (at this time, d1 has not yet sent its incremental block report), so the redundant replica DN it selects may be inconsistent with the ActiveNameNode; for example, it selects d2 and adds it to excessRedundancyMap.

  • At this time, d1 finishes deleting the block and sends its incremental block report.

  • The DN list for this block on the ActiveNameNode includes d2 and d3 (d1 is removed from excessRedundancyMap when the incremental block report is processed).

  • The DN list for this block on the StandbyNameNode includes d2 and d3 (d2 cannot be removed from excessRedundancyMap when the incremental block report is processed).

  • At this time, the decommission operation is executed on d3.

  • The ActiveNameNode selects a new node d4 to copy the replica to, and d4 sends an incremental block report.

  • The DN list for this block on the ActiveNameNode includes d2, d3 (decommissioning status), and d4, so d3 can be decommissioned normally.

  • The DN list for this block on the StandbyNameNode is d3 (decommissioning status), d2 (redundant status), and d4.
    Since the requirement for two live replicas is not met, d3 cannot be decommissioned at this time.

Therefore, the StandbyNameNode or ObserverNameNode should not process the redundant replica logic when setReplication is called.
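
The d1/d2/d3 walkthrough above can be replayed with a small toy model. This is only a sketch and not HDFS code: the class `ExcessReplicaScenario`, the type `BlockView`, and the method `replay` are hypothetical names, and a "live" replica is assumed to mean one that is neither marked excess nor on a decommissioning DN.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only, not HDFS code. All names here are hypothetical. */
final class ExcessReplicaScenario {

    /** One NameNode's view of a single block. */
    static final class BlockView {
        final Set<String> replicas = new TreeSet<>();
        final Set<String> excess = new TreeSet<>();          // stands in for excessRedundancyMap
        final Set<String> decommissioning = new TreeSet<>();

        BlockView(String... dns) {
            replicas.addAll(Arrays.asList(dns));
        }

        /** Incremental block report saying the DN deleted its replica. */
        void reportDeleted(String dn) {
            replicas.remove(dn);
            excess.remove(dn);
        }

        /** Replicas that are neither excess nor on a decommissioning DN. */
        int liveReplicas() {
            int live = 0;
            for (String dn : replicas) {
                if (!excess.contains(dn) && !decommissioning.contains(dn)) {
                    live++;
                }
            }
            return live;
        }
    }

    /**
     * Replays the scenario for one NameNode. excessChoice is the DN this
     * NameNode marks as excess, or null if it skips that logic entirely.
     */
    static BlockView replay(String excessChoice) {
        BlockView view = new BlockView("d1", "d2", "d3");
        if (excessChoice != null) {
            view.excess.add(excessChoice);  // setReplication 3 -> 2
        }
        view.reportDeleted("d1");           // d1 deleted the block (the active's choice)
        view.decommissioning.add("d3");     // decommission d3
        view.replicas.add("d4");            // d4 reports the new copy
        return view;
    }

    public static void main(String[] args) {
        int expected = 2;
        String[] labels = {"active (picks d1)", "standby (picks d2)", "standby (skips logic)"};
        String[] choices = {"d1", "d2", null};
        for (int i = 0; i < labels.length; i++) {
            int live = replay(choices[i]).liveReplicas();
            System.out.println(labels[i] + ": live=" + live
                + ", d3 can finish decommission=" + (live >= expected));
        }
    }
}
```

In this model the standby that marks d2 as excess never clears that entry, so it counts one live replica fewer than the active; the standby that skips the logic ends up with the same count as the active.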

…lica block logic when set decrease replication
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 37m 18s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 29s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 1m 30s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 1m 31s trunk passed
+1 💚 mvnsite 1m 44s trunk passed
+1 💚 javadoc 1m 24s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 2m 0s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 4m 24s trunk passed
-1 ❌ shadedclient 9m 56s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 24s /patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
-1 ❌ compile 0m 24s /patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.
-1 ❌ javac 0m 24s /patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.
-1 ❌ compile 0m 24s /patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.txt hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 0m 24s /patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.txt hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.
+1 💚 blanks 0m 1s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 22s /buildtool-patch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt The patch fails to run checkstyle in hadoop-hdfs
-1 ❌ mvnsite 0m 24s /patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
-1 ❌ javadoc 0m 24s /patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.
-1 ❌ javadoc 0m 24s /patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.txt hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.
-1 ❌ spotbugs 0m 25s /patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 shadedclient 5m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 24s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
70m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/1/artifact/out/Dockerfile
GITHUB PR #5913
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 13f9c2594a48 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 17acef1
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/1/testReport/
Max. process+thread count 89 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 47s trunk passed
+1 💚 compile 1m 25s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 1m 16s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 1m 11s trunk passed
+1 💚 mvnsite 1m 26s trunk passed
+1 💚 javadoc 1m 10s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 39s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 3m 24s trunk passed
+1 💚 shadedclient 40m 59s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 13s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 1m 16s the patch passed
+1 💚 compile 1m 9s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 1m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 2s the patch passed
+1 💚 mvnsite 1m 19s the patch passed
+1 💚 javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 27s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 3m 23s the patch passed
+1 💚 shadedclient 40m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 235m 33s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
391m 2s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/2/artifact/out/Dockerfile
GITHUB PR #5913
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 46ac54cbbc64 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 193dc45
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/2/testReport/
Max. process+thread count 2516 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@haiyang1987
Contributor Author

Hi Sir @Hexiaoqiao @ayushtkn @tomscut, could you please help review this PR when you have free time? Thanks a lot~

@Hexiaoqiao
Contributor

@haiyang1987 Thanks for your report and detailed description. It makes sense to me. However, I am confused about why we use isPopulatingReplQueues to determine whether it is Standby or Observer here.

@haiyang1987
Contributor Author

haiyang1987 commented Aug 5, 2023

Sir @Hexiaoqiao, thanks for helping review.
I call the isPopulatingReplQueues method here because it checks the state of the current NameNode,
and, if the current NameNode is active, it also ensures that the NameNode has left safe mode before the subsequent processing logic is executed.
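
The behavior described in this reply can be sketched as a guard. This is a toy model under stated assumptions, not the HDFS BlockManager: `ReplQueueGuard`, `HaState`, and the `replQueuesInitialized` flag are hypothetical names, and the check is assumed to be true only on an active NameNode whose replication queues have been initialized after safe mode.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only, not the HDFS BlockManager. All names are hypothetical. */
final class ReplQueueGuard {

    enum HaState { ACTIVE, STANDBY, OBSERVER }

    private final HaState state;
    // Assumed to stay false until the NameNode has left safe mode.
    private final boolean replQueuesInitialized;

    ReplQueueGuard(HaState state, boolean replQueuesInitialized) {
        this.state = state;
        this.replQueuesInitialized = replQueuesInitialized;
    }

    /** Assumption: true only for an active NameNode with initialized queues. */
    boolean isPopulatingReplQueues() {
        return state == HaState.ACTIVE && replQueuesInitialized;
    }

    /**
     * Marks dn as excess only when the guard passes.
     * Returns true if the marking ran, false if it was skipped.
     */
    boolean processExtraRedundancy(Set<String> excess, String dn) {
        if (!isPopulatingReplQueues()) {
            return false; // Standby, Observer, or active still in safe mode
        }
        excess.add(dn);
        return true;
    }

    public static void main(String[] args) {
        Set<String> excess = new HashSet<>();
        System.out.println("standby ran: "
            + new ReplQueueGuard(HaState.STANDBY, true).processExtraRedundancy(excess, "d2"));
        System.out.println("active in safe mode ran: "
            + new ReplQueueGuard(HaState.ACTIVE, false).processExtraRedundancy(excess, "d1"));
        System.out.println("active ran: "
            + new ReplQueueGuard(HaState.ACTIVE, true).processExtraRedundancy(excess, "d1"));
        System.out.println("excess = " + excess);
    }
}
```

A single predicate covers both conditions in this sketch, which is why one early return is enough to skip the work on Standby and Observer nodes as well as on an active node that is still in safe mode.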

Contributor

@Hexiaoqiao Hexiaoqiao left a comment

LGTM once the inline nit comments are fixed.
@goiri @ayushtkn Would you like to take another look?


// Create HA Cluster.
try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .nnTopology(MiniDFSNNTopology.simpleHATopology()).numDataNodes(10).build()) {
Contributor

The number of DataNodes here is 10; is that necessary? I think setting it to 4 is enough. What do you think?

Contributor Author

Yeah, the number of DataNodes here can be set to 4, which also meets the test requirements.
I will update the PR.

// Create test file.
Path file = new Path("/test");
long fileLength = 512;
DFSTestUtil.createFile(fs, file, fileLength, (short) 8, 0L);
Contributor

Is setting the replication to 4 enough here?

// Process the block only when active NN is out of safe mode.
if (!isPopulatingReplQueues()) {
  return;
}
Contributor

The change makes sense to me.

@haiyang1987
Contributor Author

Updated the PR.
Sir @Hexiaoqiao @tomscut, please help review this PR again when you have free time. Thanks a lot~

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 49m 10s trunk passed
+1 💚 compile 1m 27s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 1m 15s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 1m 10s trunk passed
+1 💚 mvnsite 1m 24s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 37s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 3m 25s trunk passed
+1 💚 shadedclient 41m 17s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 12s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 8s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 1m 8s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 59s the patch passed
+1 💚 mvnsite 1m 15s the patch passed
+1 💚 javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 30s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 3m 24s the patch passed
+1 💚 shadedclient 41m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 234m 52s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
391m 38s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestDirectoryScanner
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/3/artifact/out/Dockerfile
GITHUB PR #5913
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux dc15be12fd96 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / fb1b5e5
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/3/testReport/
Max. process+thread count 2317 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@tomscut tomscut left a comment

LGTM.

Contributor

@Hexiaoqiao Hexiaoqiao left a comment

LGTM. +1.

@Hexiaoqiao Hexiaoqiao changed the title HDFS-17137. Standby/Observer NameNode should not handle redundant rep… HDFS-17137. Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication. Aug 8, 2023
@Hexiaoqiao Hexiaoqiao merged commit 5b81caf into apache:trunk Aug 8, 2023
1 of 2 checks passed
@Hexiaoqiao
Contributor

The failed unit test is not related to this change. Committed to trunk.
Thanks @haiyang1987 for your contribution and @tomscut for the review!

@haiyang1987
Contributor Author

Thanks sir @Hexiaoqiao @tomscut for helping review and merge!

jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…a block logic when set decrease replication. (apache#5913). Contributed by Haiyang Hu.

Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
4 participants