Skip to content

Conversation

@bbeaudreault
Copy link
Contributor

Per https://issues.apache.org/jira/browse/HDFS-16155, we would like the ability to customize the backoff strategy when BlockMissingException occurs. This can happen when the balancer moves blocks, and in low latency clusters the existing backoff is too conservative. Drastically reducing the existing window base config would help but expose the namenode to a potential DDOS if many blocks became missing, because the current backoff would grow slowly.

Adding a configurable exponential component allows for aggressive early retries that back off quickly enough to mitigate stampeding herds. We make the backoff configurable by adding two new configs:

  • dfs.client.retry.window.multiplier: defaults to 1 to preserve existing behavior. Increasing this can result in a steeper backoff curve when desired
  • dfs.client.retry.window.max: defaults to Int.MAX to preserve existing behavior. Decreasing this can help put a ceiling on exponential backoffs that could quickly grow to effectively unlimited levels.

As described, the default behavior is maintained and I've added a test case to verify that. Someone looking for a more aggressive initial retry that backs off quickly in case of continuous failure could try setting window.base to 10, window.multiplier to 5, and window.max to 10000. This would result in a quick initial retry of max 50ms, but quickly backoff to a few seconds within 3 retries.

In order to improve the testability of this feature, I pulled out the existing refetchLocations retry configs into a FetchBlockLocationsRetryer class. I also improved the readability of the comment describing the backoff strategy, and fully tested the new retryer in TestFetchBlockLocationsRetryer.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 44s Maven dependency ordering for branch
+1 💚 mvninstall 20m 30s trunk passed
+1 💚 compile 4m 58s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 4m 36s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 13s trunk passed
+1 💚 mvnsite 2m 19s trunk passed
+1 💚 javadoc 1m 39s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 10s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 30s trunk passed
+1 💚 shadedclient 16m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 2m 1s the patch passed
+1 💚 compile 4m 43s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 4m 43s the patch passed
+1 💚 compile 4m 27s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 4m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 1m 6s /results-checkstyle-hadoop-hdfs-project.txt hadoop-hdfs-project: The patch generated 9 new + 333 unchanged - 0 fixed = 342 total (was 333)
+1 💚 mvnsite 2m 5s the patch passed
+1 💚 javadoc 1m 23s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 48s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 37s the patch passed
+1 💚 shadedclient 15m 57s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 21s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 235m 47s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
349m 21s
Reason Tests
Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
hadoop.tools.TestHdfsConfigFields
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/1/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 9e6a49c0255e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 36ae933e9b812891a13427b3ef69d0c2cb0b79ce
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/1/testReport/
Max. process+thread count 3297 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bbeaudreault bbeaudreault force-pushed the configurable_exponent_trunk branch from 36ae933 to 6c13196 Compare August 6, 2021 00:34
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 53s Maven dependency ordering for branch
+1 💚 mvninstall 23m 58s trunk passed
+1 💚 compile 6m 11s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 4m 48s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 17s trunk passed
+1 💚 mvnsite 2m 24s trunk passed
+1 💚 javadoc 1m 38s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 10s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 34s trunk passed
+1 💚 shadedclient 16m 24s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 2m 3s the patch passed
+1 💚 compile 4m 48s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 4m 48s the patch passed
+1 💚 compile 4m 24s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 4m 24s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 1m 6s /results-checkstyle-hadoop-hdfs-project.txt hadoop-hdfs-project: The patch generated 9 new + 333 unchanged - 0 fixed = 342 total (was 333)
+1 💚 mvnsite 2m 8s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 24s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 56s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 36s the patch passed
+1 💚 shadedclient 15m 55s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 21s hadoop-hdfs-client in the patch passed.
+1 💚 unit 230m 56s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 52s The patch does not generate ASF License warnings.
351m 13s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/2/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 069f9296845d 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6c13196446724a371519815361fc5992a5ff3d9b
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/2/testReport/
Max. process+thread count 3469 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor Author

@bbeaudreault bbeaudreault Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: If i were adding this feature as brand new, I probably wouldn't include the * failures here -- the base and exponential are good enough IMO. But I needed this to maintain 100% parity with the existing backoff strategy

@bbeaudreault bbeaudreault force-pushed the configurable_exponent_trunk branch 2 times, most recently from 9fdd67a to 400b799 Compare August 6, 2021 19:55
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 48s Maven dependency ordering for branch
+1 💚 mvninstall 20m 22s trunk passed
+1 💚 compile 4m 54s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 4m 33s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 17s trunk passed
+1 💚 mvnsite 2m 17s trunk passed
+1 💚 javadoc 1m 38s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 12s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 35s trunk passed
+1 💚 shadedclient 16m 14s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 2m 1s the patch passed
+1 💚 compile 4m 47s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 4m 47s the patch passed
+1 💚 compile 4m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 4m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 7s the patch passed
+1 💚 mvnsite 2m 3s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 22s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 54s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 37s the patch passed
+1 💚 shadedclient 16m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 21s hadoop-hdfs-client in the patch passed.
+1 💚 unit 232m 8s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
346m 16s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/3/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 35d96c3aa1d5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 400b7995fea10fb926c825e529c6b23142c4619a
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/3/testReport/
Max. process+thread count 2825 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 17m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 40s Maven dependency ordering for branch
+1 💚 mvninstall 23m 5s trunk passed
+1 💚 compile 5m 16s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 4m 52s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 14s trunk passed
+1 💚 mvnsite 2m 16s trunk passed
+1 💚 javadoc 1m 35s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 2s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 51s trunk passed
+1 💚 shadedclient 18m 59s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 2m 1s the patch passed
+1 💚 compile 5m 10s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 5m 10s the patch passed
+1 💚 compile 4m 49s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 4m 49s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 8s the patch passed
+1 💚 mvnsite 2m 7s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 22s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 46s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 54s the patch passed
+1 💚 shadedclient 18m 28s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 14s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 337m 15s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
476m 17s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/4/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 686f6ed0e729 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 400b7995fea10fb926c825e529c6b23142c4619a
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/4/testReport/
Max. process+thread count 1900 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bbeaudreault bbeaudreault force-pushed the configurable_exponent_trunk branch from 400b799 to 5ab105f Compare August 7, 2021 12:36
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 47s Maven dependency ordering for branch
+1 💚 mvninstall 21m 32s trunk passed
+1 💚 compile 6m 6s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 5m 49s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 28s trunk passed
+1 💚 mvnsite 2m 43s trunk passed
+1 💚 javadoc 1m 57s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 19s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 7m 19s trunk passed
+1 💚 shadedclient 21m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 2m 31s the patch passed
+1 💚 compile 6m 17s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 6m 17s the patch passed
+1 💚 compile 5m 41s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 5m 41s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 20s the patch passed
+1 💚 mvnsite 2m 30s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 43s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 19s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 7m 7s the patch passed
+1 💚 shadedclient 20m 59s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 31s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 245m 46s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 48s The patch does not generate ASF License warnings.
381m 31s
Reason Tests
Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy
hadoop.hdfs.TestRollingUpgrade
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/5/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 9d9a8a2fc038 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5ab105fe27166a1ee1f6bc35576befc333b89025
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/5/testReport/
Max. process+thread count 3098 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bbeaudreault bbeaudreault force-pushed the configurable_exponent_trunk branch from 5ab105f to d51a70f Compare August 7, 2021 21:08
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 4s Maven dependency ordering for branch
+1 💚 mvninstall 26m 12s trunk passed
+1 💚 compile 6m 3s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 5m 33s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 34s trunk passed
+1 💚 mvnsite 2m 55s trunk passed
+1 💚 javadoc 1m 55s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 22s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 6m 57s trunk passed
+1 💚 shadedclient 22m 1s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 2m 31s the patch passed
+1 💚 compile 6m 22s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 6m 22s the patch passed
+1 💚 compile 5m 30s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 5m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 17s the patch passed
+1 💚 mvnsite 2m 13s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 30s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 57s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 6m 33s the patch passed
+1 💚 shadedclient 17m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 22s hadoop-hdfs-client in the patch passed.
+1 💚 unit 245m 32s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
381m 1s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/6/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 155fcc555150 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d51a70f
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/6/testReport/
Max. process+thread count 3377 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@Hexiaoqiao Hexiaoqiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bbeaudreault for your works here. It makes sense to me almost. I prefer to keep the default action even if we used FetchBlockLocationsRetryer. After the first review, it seems the action is different here? Please correct me if something I missed. Thanks again.


<property>
<name>dfs.client.retry.window.max</name>
<value>2147483647</value>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for dfs.client.retry.window.max is too high here. In some corner case, it will sleep very long time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the review @Hexiaoqiao. The reason I chose this value was that I wanted the changes in this PR to be totally transparent to existing users -- so the backoff should work exactly as it does today for anyone who upgrades. I don't know how people have tuned their backoffs today, so adding a lower max might affect their configured backoffs. The default case will be well-bounded by the default retries of 3. That said, I agree that there's very little utility in waiting many minutes on a backoff. What if I put this to 30s?

Was this the only concern in terms of the default action? My test case testDefaultRetryPolicy proves that the default case remains unchanged from trunk. The default case was determined based on the comment in DFSInputStream, the old implementation details, and my own testing of the backoff policy prior to this change.

I also created this spreadsheet that helped me to determine how different multiplier values might affect the backoff: https://docs.google.com/spreadsheets/d/1I9ejqDtJ6-krSh-YBt0qHTf3JwZu5zRlrOhbzY0kJAg/edit?usp=sharing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hexiaoqiao I've pushed a commit which lowers the window max to 30s. As mentioned above, this may cap some custom backoffs people have configured. But that may be beneficial. It should not affect the default case, given the default of 3 retries does not reach 30s. Let me know if you'd prefer a different default.

}

@Test
public void testDefaultRetryPolicy() {
Copy link
Contributor Author

@bbeaudreault bbeaudreault Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment in the original backoff policy:

  // Introducing a random factor to the wait time before another retry.
  // The wait time is dependent on # of failures and a random factor.
  // At the first time of getting a BlockMissingException, the wait time
  // is a random number between 0..3000 ms. If the first retry
  // still fails, we will wait 3000 ms grace period before the 2nd retry.
  // Also at the second retry, the waiting window is expanded to 6000 ms
  // alleviating the request rate from the server. Similarly the 3rd retry
  // will wait 6000ms grace period before retry and the waiting window is
  // expanded to 9000ms.
  • The first backoff should be between 0-3000ms.
  • The second should be 3000 plus a random number between 0-6000ms. So the full range is 3000-9000.
  • The third retry should be 6000 plus a random number between 0-9000ms. So the full range is 6000-15000ms.

This test proves that this original retry strategy continues to work with the new code. It's hard to test with randomness, so the random factor is disabled. We're left with only the worst case scenario (if rand() returned 1). See the assertions below to see that the results adhere to the original description above.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 46s Maven dependency ordering for branch
+1 💚 mvninstall 24m 30s trunk passed
+1 💚 compile 5m 52s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 5m 34s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 15s trunk passed
+1 💚 mvnsite 2m 27s trunk passed
+1 💚 javadoc 1m 48s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 13s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 6m 26s trunk passed
+1 💚 shadedclient 18m 38s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 2m 29s the patch passed
+1 💚 compile 5m 41s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 5m 41s the patch passed
+1 💚 compile 5m 7s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 5m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 10s the patch passed
+1 💚 mvnsite 2m 14s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 30s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 56s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 6m 12s the patch passed
+1 💚 shadedclient 16m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 28s hadoop-hdfs-client in the patch passed.
+1 💚 unit 245m 3s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
372m 18s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/7/artifact/out/Dockerfile
GITHUB PR #3271
JIRA Issue HDFS-16155
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 37c7e964f56c 4.15.0-151-generic #157-Ubuntu SMP Fri Jul 9 23:07:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4e1300a
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/7/testReport/
Max. process+thread count 3435 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3271/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bbeaudreault
Copy link
Contributor Author

Any other comments on this patch? As tests demonstrate, it should have no impact on existing use-cases, aside from the requested backoff ceiling. It will enable operators to unlock faster retries if desired, and is much easier to read and test code.

@bbeaudreault
Copy link
Contributor Author

@Hexiaoqiao can this be merged?

@Hexiaoqiao
Copy link
Contributor

Hexiaoqiao commented Feb 23, 2022

Thanks @bbeaudreault for your great works here and sorry for the late response. It looks good to me in general. +1 from my side. I would like to wait if any other guys are interested for this improvement.

@bbeaudreault
Copy link
Contributor Author

Thanks for the approval @Hexiaoqiao. Is there a downside to jsut merging this? It's been open for over 6 months, so I doubt anyone else will be jumping in any time soon.

@github-actions
Copy link
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Nov 26, 2025
@github-actions github-actions bot closed this Nov 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants