
Conversation

@hfutatzhanghb
Contributor

@hfutatzhanghb hfutatzhanghb commented Apr 26, 2023

@hfutatzhanghb
Contributor Author

Hi @ayushtkn @tomscut, sorry for disturbing you. I have not added a UT for this change yet. Could you please take a look at
HDFS-16989 and determine whether this change is necessary or not?
Thanks a lot!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 45s trunk passed
+1 💚 compile 1m 35s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 1m 24s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 1m 8s trunk passed
+1 💚 mvnsite 1m 33s trunk passed
+1 💚 javadoc 1m 12s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 37s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 48s trunk passed
+1 💚 shadedclient 26m 17s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 20s the patch passed
+1 💚 compile 1m 23s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 1m 23s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 1m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 51s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 28 unchanged - 0 fixed = 31 total (was 28)
+1 💚 mvnsite 1m 23s the patch passed
+1 💚 javadoc 0m 54s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 24s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
-1 ❌ spotbugs 3m 31s /new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 28m 51s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 331m 8s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
457m 9s
Reason Tests
SpotBugs module:hadoop-hdfs-project/hadoop-hdfs
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor$BlockTargetPair defines equals and uses Object.hashCode() At DatanodeDescriptor.java:Object.hashCode() At DatanodeDescriptor.java:[lines 88-91]
Failed junit tests hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage
hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
hadoop.hdfs.TestBlockStoragePolicy
hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling
hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
hadoop.hdfs.server.blockmanagement.TestBlockManager
hadoop.hdfs.server.namenode.TestHostsFiles
hadoop.hdfs.server.datanode.TestDeleteBlockPool
hadoop.hdfs.server.datanode.TestDataNodeMetrics
hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
hadoop.hdfs.server.blockmanagement.TestNodeCount
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
hadoop.hdfs.server.datanode.TestBlockRecovery2
hadoop.hdfs.TestFileAppend4
hadoop.hdfs.server.namenode.TestUpgradeDomainBlockPlacementPolicy
hadoop.hdfs.server.blockmanagement.TestPendingReconstruction
hadoop.hdfs.server.mover.TestStorageMover
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
hadoop.hdfs.server.datanode.TestDataNodeTcpNoDelay
hadoop.hdfs.server.namenode.TestFSEditLogLoader
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
hadoop.hdfs.server.namenode.TestProcessCorruptBlocks
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/artifact/out/Dockerfile
GITHUB PR #5593
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c53911d21bc4 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0af5c72
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/testReport/
Max. process+thread count 2413 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
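The SpotBugs warning above flags `DatanodeDescriptor$BlockTargetPair` for defining `equals()` while inheriting `Object.hashCode()`, which breaks the `equals`/`hashCode` contract: two equal pairs can land in different hash buckets. A minimal sketch of the pattern and its fix, using a hypothetical `TargetPair` class rather than the actual HDFS code:

```java
import java.util.Objects;

// Hypothetical stand-in for a value class like BlockTargetPair.
// A class that overrides equals() must also override hashCode(),
// otherwise objects that compare equal may hash differently and
// misbehave in HashMap/HashSet.
final class TargetPair {
    final String block;
    final String target;

    TargetPair(String block, String target) {
        this.block = block;
        this.target = target;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TargetPair)) {
            return false;
        }
        TargetPair other = (TargetPair) o;
        return block.equals(other.block) && target.equals(other.target);
    }

    // The fix SpotBugs asks for: a hashCode() consistent with equals(),
    // derived from the same fields equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(block, target);
    }
}
```

With both methods overridden, equal pairs always produce equal hash codes, which is what the warning is about.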

@hfutatzhanghb
Contributor Author

We can reproduce this case by executing `hdfs dfs -setrep 3 /largedir` on a large-capacity directory (1.0 PB) whose current replication factor is 2.

@hfutatzhanghb
Contributor Author

Hi @goiri, could you please take a look at this if you have time? Thanks a lot.

@ZanderXu
Contributor

ZanderXu commented May 5, 2023

Thanks @hfutatzhanghb for your report.

A smaller `dfs.namenode.replication.work.multiplier.per.iteration` or a bigger `dfs.namenode.reconstruction.pending.timeout-sec` may solve your problem.

@hfutatzhanghb
Contributor Author

hfutatzhanghb commented May 5, 2023

> Thanks @hfutatzhanghb for your report.
>
> A smaller `dfs.namenode.replication.work.multiplier.per.iteration` or a bigger `dfs.namenode.reconstruction.pending.timeout-sec` may solve your problem.

@ZanderXu thanks for your reply. Yes, those two configuration entries may ease this problem, but they still leave some other problems:

  1. Higher data-loss risk. When we make `dfs.namenode.reconstruction.pending.timeout-sec` big, it means that during that period we do not care whether the replication task succeeded or not. For example, suppose we set the pending timeout to 2 hours and a block has only one replica left. The NameNode generates a replication task, but if the target DataNode is dead, we run the risk of losing that data for the whole 2 hours before the task is retried.
  2. Tuning `dfs.namenode.replication.work.multiplier.per.iteration` down is not effective when the directory is quite large, because the NameNode generates liveNodes * multiplier block replication tasks every 3 seconds.

So, can we solve this problem at the code level? What's your opinion?
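For readers following the tuning discussion above, a minimal `hdfs-site.xml` sketch of the two properties involved; the values below are illustrative only, not recommendations:

```xml
<!-- hdfs-site.xml sketch; values are illustrative only -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>2</value>
  <!-- Each scheduling iteration (roughly every 3s) generates up to
       liveNodes * multiplier block replication tasks, which is why
       lowering this alone does not help much on a very large dir. -->
</property>
<property>
  <name>dfs.namenode.reconstruction.pending.timeout-sec</name>
  <value>300</value>
  <!-- How long a scheduled reconstruction may stay pending before the
       NameNode retries it; raising this delays retry of tasks whose
       target DataNode died, which is the data-loss risk noted above. -->
</property>
```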

@github-actions
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 22, 2025
@github-actions github-actions bot closed this Oct 23, 2025