HDFS-17137. Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication. #5913

haiyang1987 · 2023-08-01T13:23:01Z

Description of PR

https://issues.apache.org/jira/browse/HDFS-17137

The Standby/Observer NameNode should not run the redundant-replica handling logic when setReplication decreases the replication factor.

Currently, when setReplication is called to decrease the replication factor:

The Active NameNode calls BlockManager#processExtraRedundancyBlock to select the DataNode (dn) holding the redundant replica, adds it to excessRedundancyMap, and adds the block to invalidateBlocks (RedundancyMonitor is then scheduled to delete the block on that dn).
The Standby or Observer NameNode then loads the edit log and applies the SetReplicationOp. If the dn holding the replica to be deleted has not yet sent its incremental block report,
BlockManager#processExtraRedundancyBlock is called here as well to select a redundant dn and add it to excessRedundancyMap (the dn selected here may differ from the one selected on the Active NameNode).

A dn left in excessRedundancyMap can interfere with decommissioning, so the decommission of a dn may never complete on the Standby/Observer NameNode.

A concrete case is as follows.
For example, a file has 3 replicas (d1, d2, d3) and setReplication is called to set it to 2 replicas.

The Active NameNode selects d1 as the redundant replica and adds it to excessRedundancyMap and invalidateBlocks.
The Standby NameNode replays the SetReplicationOp (at this point d1 has not yet sent its incremental block report), so the redundant dn it selects may differ from the Active NameNode's choice; for example, it selects d2 and adds it to excessRedundancyMap.
d1 then finishes deleting the block and sends its incremental block report.
The DN list for this block on the Active NameNode is d2 and d3 (d1 is removed from excessRedundancyMap when the incremental block report is processed).
The DN list for this block on the Standby NameNode is d2 and d3 (d2 cannot be removed from excessRedundancyMap when the incremental block report is processed).
Next, a decommission is started on d3.
The Active NameNode selects a new node d4 to copy the replica to, and d4 sends an incremental block report.
The DN list for this block on the Active NameNode is d2, d3 (decommissioning), and d4, so d3 can be decommissioned normally.
The DN list for this block on the Standby NameNode is d3 (decommissioning), d2 (redundant), and d4.
Since the requirement of two live replicas is not met, d3 cannot be decommissioned at this point.

Therefore, the Standby or Observer NameNode should not process the redundant-replica logic when setReplication is called.
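
The d1..d4 scenario above can be replayed with a small toy model. This is only a sketch, not Hadoop code: NameNodeView, ExcessDivergenceDemo, and all of their methods are hypothetical names, the `excess` set merely stands in for excessRedundancyMap, and the decommission check is reduced to "at least N replicas that are neither excess nor decommissioning".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of one NameNode's view of a single block (hypothetical, not Hadoop code). */
final class NameNodeView {
  final List<String> locations = new ArrayList<>();     // DNs known to hold a replica
  final Set<String> excess = new HashSet<>();           // stand-in for excessRedundancyMap
  final Set<String> decommissioning = new HashSet<>();  // DNs being decommissioned

  NameNodeView(String... dns) {
    for (String dn : dns) {
      locations.add(dn);
    }
  }

  /** Stand-in for processExtraRedundancyBlock: mark the chosen DN as excess. */
  void markExcess(String dn) {
    excess.add(dn);
  }

  /** Incremental block report: the DN deleted its replica. */
  void replicaDeleted(String dn) {
    locations.remove(dn);
    excess.remove(dn); // only clears the entry if this DN was the one marked
  }

  /** Incremental block report: the DN received a new replica. */
  void replicaAdded(String dn) {
    locations.add(dn);
  }

  /** Replicas that are neither excess nor on a decommissioning DN. */
  long liveReplicas() {
    return locations.stream()
        .filter(dn -> !excess.contains(dn) && !decommissioning.contains(dn))
        .count();
  }

  /** Simplified assumption: decommission can finish once enough live replicas exist. */
  boolean canFinishDecommission(int expectedReplication) {
    return liveReplicas() >= expectedReplication;
  }
}

final class ExcessDivergenceDemo {
  /**
   * Replays the scenario. standbyProcessesExcess = true models the behaviour
   * described as the problem; false models skipping the logic on the Standby.
   * Returns {activeCanDecommission, standbyCanDecommission}.
   */
  static boolean[] run(boolean standbyProcessesExcess) {
    NameNodeView active = new NameNodeView("d1", "d2", "d3");
    NameNodeView standby = new NameNodeView("d1", "d2", "d3");

    active.markExcess("d1");          // Active picks d1
    if (standbyProcessesExcess) {
      standby.markExcess("d2");       // Standby happens to pick d2
    }

    active.replicaDeleted("d1");      // d1 reports the deletion to both
    standby.replicaDeleted("d1");

    active.decommissioning.add("d3"); // decommission d3
    standby.decommissioning.add("d3");

    active.replicaAdded("d4");        // d4 reports the new replica to both
    standby.replicaAdded("d4");

    return new boolean[] {
        active.canFinishDecommission(2), standby.canFinishDecommission(2)};
  }

  public static void main(String[] args) {
    boolean[] problem = run(true);
    boolean[] skipped = run(false);
    System.out.println("standby processes excess: active=" + problem[0]
        + " standby=" + problem[1]);
    System.out.println("standby skips excess:     active=" + skipped[0]
        + " standby=" + skipped[1]);
  }
}
```

In this model the Standby view is stuck with d2 marked as excess when it processes the excess logic itself, so it counts only d4 as live and cannot finish decommissioning d3; when it skips that logic, both views count d2 and d4 as live.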

hadoop-yetus · 2023-08-01T14:34:45Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 41s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
-1 ❌	mvninstall	37m 18s	/branch-mvninstall-root.txt	root in trunk failed.
+1 💚	compile	1m 29s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	compile	1m 30s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
+1 💚	checkstyle	1m 31s		trunk passed
+1 💚	mvnsite	1m 44s		trunk passed
+1 💚	javadoc	1m 24s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	2m 0s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
+1 💚	spotbugs	4m 24s		trunk passed
-1 ❌	shadedclient	9m 56s		branch has errors when building and testing our client artifacts.
			_ Patch Compile Tests _
-1 ❌	mvninstall	0m 24s	/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch failed.
-1 ❌	compile	0m 24s	/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt	hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.
-1 ❌	javac	0m 24s	/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt	hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.
-1 ❌	compile	0m 24s	/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt	hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.
-1 ❌	javac	0m 24s	/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt	hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.
+1 💚	blanks	0m 1s		The patch has no blanks issues.
-0 ⚠️	checkstyle	0m 22s	/buildtool-patch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt	The patch fails to run checkstyle in hadoop-hdfs
-1 ❌	mvnsite	0m 24s	/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch failed.
-1 ❌	javadoc	0m 24s	/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt	hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.
-1 ❌	javadoc	0m 24s	/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt	hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.
-1 ❌	spotbugs	0m 25s	/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch failed.
+1 💚	shadedclient	5m 53s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
-1 ❌	unit	0m 24s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch failed.
+1 💚	asflicense	0m 50s		The patch does not generate ASF License warnings.
		70m 7s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/1/artifact/out/Dockerfile
GITHUB PR	#5913
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 13f9c2594a48 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `17acef1`
Default Java	Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/1/testReport/
Max. process+thread count	89 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/1/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2023-08-02T08:36:20Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 52s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	48m 47s		trunk passed
+1 💚	compile	1m 25s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	compile	1m 16s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
+1 💚	checkstyle	1m 11s		trunk passed
+1 💚	mvnsite	1m 26s		trunk passed
+1 💚	javadoc	1m 10s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	1m 39s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
+1 💚	spotbugs	3m 24s		trunk passed
+1 💚	shadedclient	40m 59s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	1m 13s		the patch passed
+1 💚	compile	1m 16s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javac	1m 16s		the patch passed
+1 💚	compile	1m 9s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
+1 💚	javac	1m 9s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	1m 2s		the patch passed
+1 💚	mvnsite	1m 19s		the patch passed
+1 💚	javadoc	0m 57s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	1m 27s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
+1 💚	spotbugs	3m 23s		the patch passed
+1 💚	shadedclient	40m 49s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	235m 33s		hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 46s		The patch does not generate ASF License warnings.
		391m 2s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/2/artifact/out/Dockerfile
GITHUB PR	#5913
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 46ac54cbbc64 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `193dc45`
Default Java	Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/2/testReport/
Max. process+thread count	2516 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/2/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

haiyang1987 · 2023-08-03T04:57:36Z

Hi @Hexiaoqiao @ayushtkn @tomscut, could you please help review this PR when you have free time? Thanks a lot~

Hexiaoqiao · 2023-08-04T02:48:44Z

@haiyang1987 Thanks for your report and detailed description. It makes sense to me. However, I am confused about why we use isPopulatingReplQueues to determine whether it is Standby or Observer here.

haiyang1987 · 2023-08-05T04:01:15Z

Sir @Hexiaoqiao, thank you for helping review.
The isPopulatingReplQueues method is called here because it checks the state of the current NameNode,
and if the current NameNode is active, it also ensures that the NameNode has left safe mode before the subsequent processing logic runs.
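
The reasoning in this reply can be illustrated with a small model. The sketch below is not the Hadoop implementation: HaRole, ReplQueueGuard, and shouldProcessExtraRedundancy are hypothetical names, and it assumes that only the active role populates replication queues and that the queues count as initialized once safe mode has been left.

```java
/** Hypothetical HA roles; in this model only ACTIVE populates replication queues. */
enum HaRole {
  ACTIVE(true), STANDBY(false), OBSERVER(false);

  final boolean populatesReplQueues;

  HaRole(boolean populatesReplQueues) {
    this.populatesReplQueues = populatesReplQueues;
  }
}

/** Simplified stand-in for the guard discussed in this review (not Hadoop code). */
final class ReplQueueGuard {
  private final HaRole role;
  // Assumption: set to true once the NameNode has left safe mode.
  private final boolean replQueuesInitialized;

  ReplQueueGuard(HaRole role, boolean replQueuesInitialized) {
    this.role = role;
    this.replQueuesInitialized = replQueuesInitialized;
  }

  /** One check covers both conditions: HA role and safe-mode exit. */
  boolean isPopulatingReplQueues() {
    return role.populatesReplQueues && replQueuesInitialized;
  }

  /** Mirrors the early return in the patch: skip unless queues are being populated. */
  boolean shouldProcessExtraRedundancy() {
    if (!isPopulatingReplQueues()) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    for (HaRole role : HaRole.values()) {
      for (boolean initialized : new boolean[] {false, true}) {
        ReplQueueGuard guard = new ReplQueueGuard(role, initialized);
        System.out.println(role + " initialized=" + initialized
            + " -> process=" + guard.shouldProcessExtraRedundancy());
      }
    }
  }
}
```

Under these assumptions only an active NameNode that has left safe mode processes excess replicas; a Standby or Observer never does, regardless of its safe-mode state.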

Hexiaoqiao

LGTM once the inline nit comments are fixed.
@goiri @ayushtkn Would you like to take another look?

Hexiaoqiao · 2023-08-07T05:57:55Z

...hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyBlockManagement.java

+
+    // Create HA Cluster.
+    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
+        .nnTopology(MiniDFSNNTopology.simpleHATopology()).numDataNodes(10).build()) {


The number of DataNodes here is 10; is that necessary? I think 4 is enough. What do you think?

Yes, the number of DataNodes can be set to 4 here, which still meets the test requirements.
I will update the PR.

Hexiaoqiao · 2023-08-07T05:58:59Z

...hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyBlockManagement.java

+      // Create test file.
+      Path file = new Path("/test");
+      long fileLength = 512;
+      DFSTestUtil.createFile(fs, file, fileLength, (short) 8, 0L);


Is setting the replication to 4 enough here?

tomscut · 2023-08-07T06:31:37Z

...ct/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java

+    // Process the block only when active NN is out of safe mode.
+    if (!isPopulatingReplQueues()) {
+      return;
+    }


The change makes sense to me.

haiyang1987 · 2023-08-07T11:05:30Z

Updated the PR.
Sir @Hexiaoqiao @tomscut, please help review this PR again when you have free time. Thanks a lot~

hadoop-yetus · 2023-08-07T17:36:57Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 55s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	49m 10s		trunk passed
+1 💚	compile	1m 27s		trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚	compile	1m 15s		trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚	checkstyle	1m 10s		trunk passed
+1 💚	mvnsite	1m 24s		trunk passed
+1 💚	javadoc	1m 9s		trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚	javadoc	1m 37s		trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚	spotbugs	3m 25s		trunk passed
+1 💚	shadedclient	41m 17s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	1m 12s		the patch passed
+1 💚	compile	1m 17s		the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚	javac	1m 17s		the patch passed
+1 💚	compile	1m 8s		the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚	javac	1m 8s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 59s		the patch passed
+1 💚	mvnsite	1m 15s		the patch passed
+1 💚	javadoc	0m 57s		the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚	javadoc	1m 30s		the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚	spotbugs	3m 24s		the patch passed
+1 💚	shadedclient	41m 4s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
-1 ❌	unit	234m 52s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 45s		The patch does not generate ASF License warnings.
		391m 38s

Reason	Tests
Failed junit tests	hadoop.hdfs.server.datanode.TestDirectoryScanner

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/3/artifact/out/Dockerfile
GITHUB PR	#5913
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux dc15be12fd96 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `fb1b5e5`
Default Java	Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/3/testReport/
Max. process+thread count	2317 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5913/3/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

tomscut

LGTM.

Hexiaoqiao

LGTM. +1.

Hexiaoqiao · 2023-08-08T07:43:13Z

The failed unit test is not related to this change. Committed to trunk.
Thanks @haiyang1987 for your contribution and @tomscut for the review!

haiyang1987 · 2023-08-08T11:09:20Z

Thanks, sir @Hexiaoqiao @tomscut, for helping review and merge!

…a block logic when set decrease replication. (apache#5913). Contributed by Haiyang Hu. Reviewed-by: Tao Li <tomscut@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>

HDFS-17137. Standby/Observer NameNode should not handle redundant rep…

17acef1

…lica block logic when set decrease replication

github-actions bot added HDFS trunk labels Aug 1, 2023

Trigger notification

193dc45

Hexiaoqiao reviewed Aug 7, 2023

tomscut reviewed Aug 7, 2023

HDFS-17137. Modify patch based on comments

fb1b5e5

tomscut approved these changes Aug 8, 2023

Hexiaoqiao approved these changes Aug 8, 2023

Hexiaoqiao changed the title ~~HDFS-17137. Standby/Observer NameNode should not handle redundant rep…~~ HDFS-17137. Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication. Aug 8, 2023

Hexiaoqiao merged commit 5b81caf into apache:trunk Aug 8, 2023
1 of 2 checks passed
