
HBASE-23951: Avoid high speed recursion trap in AsyncRequestFutureImpl. #1259

Closed
wants to merge 1 commit into from

Conversation

markrmiller
Member

While working on branch-2, I ran into an issue where a retryable error kept occurring and code in AsyncRequestFutureImpl would reduce the backoff wait to 0, extremely rapidly eating up a lot of thread stack space with recursive retry calls. This small patch stops zeroing the backoff wait after 3 retries. I chose that number somewhat arbitrarily; perhaps 5 is the right one. But I find large retry counts tend to hide problems, and that has made me default to fairly conservative values whenever I have to pick an arbitrary number.
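The behavior described above can be modeled as a backoff policy that still honors a "retry immediately" signal, but only for the first few attempts. This is a minimal sketch, not the patch itself: the class `RetryBackoffSketch`, the 100 ms base pause, the 10 s cap, and the linear schedule are all hypothetical. Only the cutoff of 3 comes from the PR description; HBase has its own backoff logic.

```java
/**
 * Hypothetical model of the patched behavior: an error that asks for an
 * immediate retry skips the backoff only for the first few attempts.
 */
final class RetryBackoffSketch {
  // Cutoff taken from the PR description; the other values are illustrative.
  static final int IMMEDIATE_RETRY_LIMIT = 3;
  // Hypothetical base pause and cap, not HBase defaults.
  static final long BASE_PAUSE_MS = 100L;
  static final long MAX_PAUSE_MS = 10_000L;

  private RetryBackoffSketch() {}

  /**
   * Returns how long to wait before resubmitting.
   *
   * @param numAttempt zero-based number of the attempt that just failed
   * @param retryImmediately whether the error asked for an immediate retry
   * @return backoff in milliseconds
   */
  static long backoffMillis(int numAttempt, boolean retryImmediately) {
    if (retryImmediately && numAttempt < IMMEDIATE_RETRY_LIMIT) {
      return 0L; // fast path: retry right away, but only a few times
    }
    // Past the cutoff, or for ordinary errors, use a capped linear backoff.
    long pause = BASE_PAUSE_MS * (numAttempt + 1L);
    return Math.min(pause, MAX_PAUSE_MS);
  }
}
```

With these assumed values, `backoffMillis(2, true)` returns 0 while `backoffMillis(3, true)` returns 400, so a persistent retry-immediately error stops spinning at full speed after the third attempt.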

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 35s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --findbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 5m 30s branch-2 passed
+1 💚 compile 0m 26s branch-2 passed
+1 💚 shadedjars 4m 20s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 2s the patch passed
+1 💚 compile 0m 27s the patch passed
+1 💚 javac 0m 27s the patch passed
+1 💚 shadedjars 4m 17s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s the patch passed
_ Other Tests _
+1 💚 unit 1m 58s hbase-client in the patch passed.
24m 43s
Subsystem Report/Notes
Docker Client=19.03.7 Server=19.03.7 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #1259
JIRA Issue HBASE-23951
Optional Tests javac javadoc unit shadedjars compile
uname Linux a7f729cb3dd2 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / dacceba
Default Java 1.8.0_232
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/testReport/
Max. process+thread count 623 (vs. ulimit of 10000)
modules C: hbase-client U: hbase-client
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 5m 20s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --findbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 6m 37s branch-2 passed
+1 💚 compile 0m 32s branch-2 passed
+1 💚 shadedjars 4m 42s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 30s hbase-client in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 6m 0s the patch passed
+1 💚 compile 0m 31s the patch passed
+1 💚 javac 0m 31s the patch passed
+1 💚 shadedjars 4m 35s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 28s hbase-client in the patch failed.
_ Other Tests _
+1 💚 unit 2m 11s hbase-client in the patch passed.
32m 39s
Subsystem Report/Notes
Docker Client=19.03.7 Server=19.03.7 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #1259
JIRA Issue HBASE-23951
Optional Tests javac javadoc unit shadedjars compile
uname Linux 91020c7032ca 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / dacceba
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/testReport/
Max. process+thread count 468 (vs. ulimit of 10000)
modules C: hbase-client U: hbase-client
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 5m 36s branch-2 passed
+1 💚 checkstyle 0m 34s branch-2 passed
+0 🆗 spotbugs 1m 6s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 5s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 1s the patch passed
+1 💚 checkstyle 0m 31s hbase-client: The patch generated 0 new + 0 unchanged - 21 fixed = 0 total (was 21)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 15m 44s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
+1 💚 findbugs 1m 8s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 14s The patch does not generate ASF License warnings.
37m 7s
Subsystem Report/Notes
Docker Client=19.03.7 Server=19.03.7 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #1259
JIRA Issue HBASE-23951
Optional Tests dupname asflicense spotbugs findbugs hadoopcheck hbaseanti checkstyle
uname Linux 70f04b541d95 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / dacceba
Max. process+thread count 93 (vs. ulimit of 10000)
modules C: hbase-client U: hbase-client
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1259/1/console
versions git=2.17.1 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@saintstack saintstack left a comment


Change LGTM, but how does it change our behavior?

@@ -747,9 +771,9 @@ private void resubmit(ServerName oldServer, List<Action> toReplay,
// It should be possible to have some heuristics to take the right decision. Short term,
// we go for one.
boolean retryImmediately = throwable instanceof RetryImmediatelyException;
- int nextAttemptNumber = retryImmediately ? numAttempt : numAttempt + 1;
+ int nextAttemptNumber = numAttempt + 1;
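The ternary line and the plain `numAttempt + 1` line above differ in whether a retry-immediately error advances the attempt counter. Under the ternary, `nextAttemptNumber` stays equal to `numAttempt`, so if the give-up decision compares the attempt number against a retry limit (as this sketch assumes), it never trips while the error persists. The class `AttemptCounterSketch`, the retry limit, and the step cap are hypothetical; a loop stands in for the recursive resubmit calls so the demo itself cannot overflow the stack.

```java
/**
 * Hypothetical simulation of how nextAttemptNumber evolves when every call
 * fails with a retry-immediately error.
 */
final class AttemptCounterSketch {
  private AttemptCounterSketch() {}

  /**
   * Counts resubmissions until the attempt number reaches maxAttempts.
   *
   * @param alwaysIncrement true for "numAttempt + 1", false for
   *     "retryImmediately ? numAttempt : numAttempt + 1"
   * @param maxAttempts hypothetical retry limit
   * @param stepCap safety cap so the simulation always terminates
   * @return number of resubmissions, or -1 if stepCap was hit first
   */
  static int resubmissionsUntilGiveUp(boolean alwaysIncrement, int maxAttempts, int stepCap) {
    final boolean retryImmediately = true; // the error never goes away
    int numAttempt = 0;
    for (int step = 0; step < stepCap; step++) {
      if (numAttempt >= maxAttempts) {
        return step; // retry budget exhausted: the caller would see the error
      }
      int nextAttemptNumber = alwaysIncrement
          ? numAttempt + 1
          : (retryImmediately ? numAttempt : numAttempt + 1);
      numAttempt = nextAttemptNumber;
    }
    return -1; // never gave up within stepCap: the counter did not advance
  }
}
```

With a hypothetical limit of 16 and a step cap of 1000, the always-increment rule gives up after 16 resubmissions, while the ternary rule returns -1 because the counter never leaves 0.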
Contributor


How does this change our behavior, @markrmiller?

Contributor


I read the JIRA. That helped. I think you need a comment here explaining the arbitrary '3', even if it just refers back to the issue. Thanks.

@saintstack
Contributor

Any feedback on the above, @markrmiller?

@markrmiller
Member Author

markrmiller commented May 11, 2020

The hope is that we retry very fast a few times and then start putting a small delay between retries, so that if the call keeps failing we don't make enough recursive calls in a short time to eat all of the thread's stack space.

I'm open to input on what number to use. I picked one out of thin air: try fast a couple of times, then slow down.
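A delay between retries does not shrink the stack frame each recursive retry adds; it only limits how quickly frames pile up. A minimal recursive model of that, assuming a hypothetical `RecursiveRetrySketch` whose call always fails, a hypothetical 50 ms pause, and sleeps that are recorded rather than performed so the example runs instantly. Only the cutoff of 3 comes from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical recursive model of a retry loop whose call keeps failing.
 * Sleeps are recorded rather than performed so the model runs instantly.
 */
final class RecursiveRetrySketch {
  static final int IMMEDIATE_RETRY_LIMIT = 3; // cutoff from the PR
  static final long PAUSE_MS = 50L;           // hypothetical small delay

  private final int maxAttempts;
  private final List<Long> recordedSleeps = new ArrayList<>();
  private int maxDepth;

  RecursiveRetrySketch(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  /** Starts the first attempt at stack depth 1. */
  void run() {
    resubmit(0, 1);
  }

  private void resubmit(int numAttempt, int depth) {
    maxDepth = Math.max(maxDepth, depth);
    if (numAttempt >= maxAttempts) {
      return; // give up; a real client would report the failure here
    }
    long backOff = numAttempt < IMMEDIATE_RETRY_LIMIT ? 0L : PAUSE_MS;
    recordedSleeps.add(backOff); // a real client would sleep for backOff here
    resubmit(numAttempt + 1, depth + 1); // each retry is one more stack frame
  }

  int maxDepth() {
    return maxDepth;
  }

  List<Long> recordedSleeps() {
    return Collections.unmodifiableList(recordedSleeps);
  }
}
```

With `maxAttempts` set to 10, the model reaches a depth of 11 frames (the first call plus ten nested resubmissions) and records three zero delays followed by seven 50 ms delays: the attempt limit bounds the depth, and the delay bounds how fast it is reached.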

@saintstack
Contributor

Closing abandoned PR

@saintstack saintstack closed this Oct 6, 2020
3 participants