HDFS-16155: Allow configurable exponential backoff in DFSInputStream refetchLocations #3271
Conversation
💔 -1 overall
This message was automatically generated.
Force-pushed 36ae933 to 6c13196.
🎊 +1 overall
This message was automatically generated.
Note: If I were adding this feature as brand new, I probably wouldn't include the `* failures` term here -- the base and exponential are good enough IMO. But I needed it to maintain 100% parity with the existing backoff strategy.
Force-pushed 9fdd67a to 400b799.
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
Force-pushed 400b799 to 5ab105f.
💔 -1 overall
This message was automatically generated.
Force-pushed 5ab105f to d51a70f.
🎊 +1 overall
This message was automatically generated.
Hexiaoqiao left a comment
Thanks @bbeaudreault for your work here. It mostly makes sense to me. I would prefer to keep the default behavior even when FetchBlockLocationsRetryer is used. From my first review, it seems the behavior is different here? Please correct me if I have missed something. Thanks again.
```xml
<property>
  <name>dfs.client.retry.window.max</name>
  <value>2147483647</value>
```
The default value for dfs.client.retry.window.max is too high here. In some corner cases, it could sleep for a very long time?
Thanks so much for the review @Hexiaoqiao. The reason I chose this value was that I wanted the changes in this PR to be totally transparent to existing users -- the backoff should work exactly as it does today for anyone who upgrades. I don't know how people have tuned their backoffs today, so adding a lower max might affect their configured backoffs. The default case will be well-bounded by the default retry count of 3. That said, I agree that there's very little utility in waiting many minutes on a backoff. What if I set this to 30s?
Was this the only concern in terms of the default action? My test case testDefaultRetryPolicy proves that the default case remains unchanged from trunk. The default case was determined based on the comment in DFSInputStream, the old implementation details, and my own testing of the backoff policy prior to this change.
I also created this spreadsheet that helped me to determine how different multiplier values might affect the backoff: https://docs.google.com/spreadsheets/d/1I9ejqDtJ6-krSh-YBt0qHTf3JwZu5zRlrOhbzY0kJAg/edit?usp=sharing
@Hexiaoqiao I've pushed a commit which lowers the window max to 30s. As mentioned above, this may cap some custom backoffs people have configured. But that may be beneficial. It should not affect the default case, given the default of 3 retries does not reach 30s. Let me know if you'd prefer a different default.
```java
}

@Test
public void testDefaultRetryPolicy() {
```
Per the comment in the original backoff policy:
```java
// Introducing a random factor to the wait time before another retry.
// The wait time is dependent on # of failures and a random factor.
// At the first time of getting a BlockMissingException, the wait time
// is a random number between 0..3000 ms. If the first retry
// still fails, we will wait 3000 ms grace period before the 2nd retry.
// Also at the second retry, the waiting window is expanded to 6000 ms
// alleviating the request rate from the server. Similarly the 3rd retry
// will wait 6000ms grace period before retry and the waiting window is
// expanded to 9000ms.
```
- The first backoff should be between 0-3000ms.
- The second should be 3000 plus a random number between 0-6000ms. So the full range is 3000-9000.
- The third retry should be 6000 plus a random number between 0-9000ms. So the full range is 6000-15000ms.
This test proves that the original retry strategy continues to work with the new code. It's hard to test with randomness, so the random factor is disabled, leaving only the worst-case scenario (as if rand() returned 1). See the assertions below to confirm that the results adhere to the original description above.
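The quoted comment boils down to a small formula. The sketch below is my reconstruction from that description (the class and method names are mine, not the HDFS source): with a 3000 ms time window, failure count f, and a random factor in [0, 1), the wait is f * window + (f + 1) * window * rand.

```java
// Sketch reconstructed from the backoff description above; not the
// actual DFSInputStream source.
public class LegacyBackoffSketch {
  // grace period grows linearly with failures; the random window
  // expands by one extra timeWindow each retry
  static double waitTime(int timeWindow, int failures, double rand) {
    return (double) timeWindow * failures
        + (double) timeWindow * (failures + 1) * rand;
  }

  public static void main(String[] args) {
    // worst case (rand = 1), matching the ranges described above
    System.out.println(waitTime(3000, 0, 1.0)); // 3000.0  (range 0..3000)
    System.out.println(waitTime(3000, 1, 1.0)); // 9000.0  (range 3000..9000)
    System.out.println(waitTime(3000, 2, 1.0)); // 15000.0 (range 6000..15000)
  }
}
```

With the random factor pinned to 1, the three worst-case waits line up with the 0..3000, 3000..9000, and 6000..15000 ms ranges listed above.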
🎊 +1 overall
This message was automatically generated.
Any other comments on this patch? As the tests demonstrate, it should have no impact on existing use cases, aside from the requested backoff ceiling. It will let operators unlock faster retries if desired, and the code is much easier to read and test.
@Hexiaoqiao can this be merged?
Thanks @bbeaudreault for your great work here, and sorry for the late response. It looks good to me in general. +1 from my side. I would like to wait and see if anyone else is interested in this improvement.
Thanks for the approval @Hexiaoqiao. Is there a downside to just merging this? It's been open for over 6 months, so I doubt anyone else will be jumping in any time soon.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
Per https://issues.apache.org/jira/browse/HDFS-16155, we would like the ability to customize the backoff strategy when a BlockMissingException occurs. This can happen when the balancer moves blocks, and in low-latency clusters the existing backoff is too conservative. Drastically reducing the existing window base config would help, but would expose the namenode to a potential DDoS if many blocks became missing, because the current backoff grows slowly.
Adding a configurable exponential component allows for aggressive early retries that back off quickly enough to mitigate stampeding herds. We make the backoff configurable by adding two new configs:
- `dfs.client.retry.window.multiplier`: defaults to 1 to preserve existing behavior. Increasing this can produce a steeper backoff curve when desired.
- `dfs.client.retry.window.max`: defaults to Int.MAX to preserve existing behavior. Decreasing this puts a ceiling on exponential backoffs that could otherwise grow effectively without limit.

As described, the default behavior is maintained and I've added a test case to verify that. Someone looking for a more aggressive initial retry that backs off quickly under continuous failure could try setting `window.base` to 10, `window.multiplier` to 5, and `window.max` to 10000. This would result in a quick initial retry of at most 50ms, but backs off to a few seconds within 3 retries.

In order to improve the testability of this feature, I pulled the existing refetchLocations retry configs out into a FetchBlockLocationsRetryer class. I also improved the readability of the comment describing the backoff strategy, and fully tested the new retryer in TestFetchBlockLocationsRetryer.
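The exact formula lives in FetchBlockLocationsRetryer; as a rough sketch of how a multiplier and ceiling could compose with the existing linear window (the method below and its math are illustrative assumptions, not the committed code):

```java
// Illustrative sketch only: one plausible way a multiplier and ceiling
// could extend the legacy retry window. The committed
// FetchBlockLocationsRetryer formula may differ in detail.
public class RetryWindowSketch {
  // window for the given failure count: the legacy linear growth
  // (base * (failures + 1)) scaled by multiplier^failures, clamped at max
  static long window(long base, long multiplier, long max, int failures) {
    long w = base * (failures + 1) * (long) Math.pow(multiplier, failures);
    return Math.min(max, w);
  }

  public static void main(String[] args) {
    // multiplier = 1 degenerates to the legacy linear windows
    System.out.println(window(3000, 1, Integer.MAX_VALUE, 0)); // 3000
    System.out.println(window(3000, 1, Integer.MAX_VALUE, 2)); // 9000
    // an aggressive config is eventually clamped by the ceiling
    System.out.println(window(10, 5, 10000, 4)); // 10000 (31250 clamped)
  }
}
```

The key property is that `multiplier = 1` and `max = Int.MAX` reproduce the legacy 3000/6000/9000 ms window sequence exactly, which is why the defaults are transparent to upgrading users.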