-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-23951: Avoid high speed recursion trap in AsyncRequestFutureImpl. #1259
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change LGTM but how does it change our behavior?
@@ -747,9 +771,9 @@ private void resubmit(ServerName oldServer, List<Action> toReplay, | |||
// It should be possible to have some heuristics to take the right decision. Short term, | |||
// we go for one. | |||
boolean retryImmediately = throwable instanceof RetryImmediatelyException; | |||
int nextAttemptNumber = retryImmediately ? numAttempt : numAttempt + 1; | |||
int nextAttemptNumber = numAttempt + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this change our behavior @markrmiller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the JIRA. That helped. I think you need a comment here on why the arbitrary '3'.... even it is just referring back to the issue. Thanks.
Any feedback on above @markrmiller ? |
The hope is that we retry the loop very fast a few times and then start putting a small delay between retries so you don't make enough recursive calls to eat all your stack space in just a short time if the call keeps failing. I'm open to input on what number to use, I picked one out of thin air, let's try fast a couple times and then slow down. |
Closing abandoned PR |
While working on branch-2, I ran into an issue where a retryable error kept occurring and code in AsyncRequestFutureImpl would reduce the backoff wait to 0 and extremely rapidly eat up a of thread stack space with recursive retry calls. This little patch stops the backoff wait kill after 3 retries. Chosen kind of arbitrarily, perhaps 5 is the right number, but I find large retry counts tend to hide things and that has made me default to fairly conservative in all my arbitrary number picking.