Fix wait timeout logic for available tservers #3231
Replaces the retry object with a wait loop. Log messages are generated independent of max wait time.
I don't think the I wanted to get this change in to close out the open items for 1.10.3
server/master/src/main/java/org/apache/accumulo/master/Master.java
EdColeman
left a comment
Overall this looks okay, but removing the retry in favor of a wait loop loses some functionality, mainly that there was a small pause at the start. The retry also seems cleaner because the wait times do not need to be calculated. Also, with the retry, there would be at most 3 log messages generated for the wait.
Why did you choose not to use retry?
Implemented PR feedback to switch time checks over to using `System.nanoTime()`. Removed the old import for the Retry object. Added a static import for `TimeUnit.NANOSECONDS`.
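For readers following along, here is a minimal sketch of what a `System.nanoTime()`-based wait loop can look like. It is not the code from this PR: `WaitLoopSketch`, `waitForCount`, and `pollMillis` are hypothetical names, and the tserver count is stood in for by a plain `IntSupplier`.

```java
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.NANOSECONDS;

import java.util.function.IntSupplier;

public class WaitLoopSketch {

  /**
   * Polls currentCount until it reaches minCount or maxWaitMillis has elapsed.
   * Returns true if the count was reached, false on timeout or interrupt.
   * All names here are illustrative, not taken from Master.java.
   */
  static boolean waitForCount(IntSupplier currentCount, int minCount, long maxWaitMillis,
      long pollMillis) {
    // nanoTime() is monotonic, so wall-clock adjustments do not affect the deadline
    final long start = System.nanoTime();
    final long maxWaitNanos = MILLISECONDS.toNanos(maxWaitMillis);

    while (currentCount.getAsInt() < minCount) {
      long elapsed = System.nanoTime() - start;
      if (elapsed >= maxWaitNanos) {
        return false; // timed out
      }
      // never sleep past the deadline, and always sleep at least 1 ms
      long remainingMillis = NANOSECONDS.toMillis(maxWaitNanos - elapsed);
      try {
        Thread.sleep(Math.max(1, Math.min(pollMillis, remainingMillis)));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

Comparing elapsed time as `System.nanoTime() - start` rather than comparing absolute values keeps the check correct even if the nanosecond counter wraps.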
I attempted using a Retry with an earlier version of this fix. However, while wait times do not need to be calculated with a Retry, the number of retries in the given time window still needs to be calculated. Otherwise it never completes and just keeps retrying. In my first iteration, the retry never matched up to my time value, as the increment value would always cause the retry to overshoot the defined property value. ddanielr@f93c327#diff-56945d7261689b2323a668699ab5865a1e48ec8424b0d612091d29ae0f2fc67cR1510 Because of the complications with the Retry object, I switched to a wait loop. I can add in a small wait at the start if you'd prefer, or I'm happy to dig further into the Retry object with someone and see if I'm just completely missing something. I'm not super concerned about the logs being spammed, as this blocks the main thread, so nothing else should really be showing up in the logs until this method has completed.
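The overshoot described above can be illustrated with a simplified model. This is only arithmetic on a hypothetical retry whose wait grows by a fixed increment after each attempt; it is not Accumulo's Retry class, and `BackoffOvershootSketch` and its parameters are made up for the example.

```java
public class BackoffOvershootSketch {

  /**
   * Simplified model (an assumption, not Accumulo's Retry): each attempt sleeps
   * for the current wait, then the wait grows by incrementMs. Attempts continue
   * until the accumulated sleep time reaches or passes maxTotalMs.
   * Returns the total time actually slept.
   */
  static long totalWaitWithIncrement(long initialWaitMs, long incrementMs, long maxTotalMs) {
    if (initialWaitMs <= 0 || incrementMs < 0) {
      throw new IllegalArgumentException("initial wait must be > 0 and increment >= 0");
    }
    long total = 0;
    long wait = initialWaitMs;
    while (total < maxTotalMs) {
      total += wait; // sleep for the current wait
      wait += incrementMs; // back off before the next attempt
    }
    return total;
  }
}
```

With a 1000 ms initial wait, a 1000 ms increment, and a 5000 ms limit, the model sleeps 1000 + 2000 + 3000 = 6000 ms, which is past the limit. With an increment of 0 it sleeps 5 × 1000 = 5000 ms, which matches the limit exactly.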
The following seems close to what is wanted (I built this against 3.0, so there may be changes needed for 1.10). Ignore the time and count values: they were picked for convenience and are not what we'd want to use here. I think things to note:
This commit switches back to using a Retry object without an incremental backoff, so the specified max wait always equals the exact amount of time that the main thread waits for tservers. It also switches the logging statement to use the retry log, which takes advantage of the Retry object's logging interval.
Dug into the retry object a bit with @EdColeman and found that removing the incrementing backoff and just using maxRetries makes the Retry object function as expected. I've pushed an updated commit that switches back to the Retry object, but calculates an exact max wait vs an approximate value.
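As a rough sketch of the idea (fixed interval, bounded retry count, no incremental backoff), the retry count can be derived from the max wait so that the worst-case total wait equals it exactly. This does not use Accumulo's Retry class; `FixedRetrySketch`, `retriesFor`, and `waitWithFixedRetries` are hypothetical names, and the sketch assumes the interval divides the max wait evenly.

```java
import java.util.function.BooleanSupplier;

public class FixedRetrySketch {

  /**
   * Number of fixed-interval retries that add up to maxWaitMs.
   * Assumption: intervalMs divides maxWaitMs evenly; otherwise it is rejected.
   */
  static long retriesFor(long maxWaitMs, long intervalMs) {
    if (intervalMs <= 0 || maxWaitMs % intervalMs != 0) {
      throw new IllegalArgumentException("interval must be > 0 and divide max wait evenly");
    }
    return maxWaitMs / intervalMs;
  }

  /**
   * Checks ready, sleeping intervalMs between checks, for at most maxRetries sleeps.
   * Worst-case total sleep is maxRetries * intervalMs.
   */
  static boolean waitWithFixedRetries(BooleanSupplier ready, long maxRetries, long intervalMs) {
    for (long attempt = 0; attempt < maxRetries; attempt++) {
      if (ready.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(intervalMs); // same wait every time, no backoff
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    // one final check after the last sleep
    return ready.getAsBoolean();
  }
}
```

Because every sleep has the same length, the upper bound on the wait is simply `maxRetries * intervalMs`, so choosing `maxRetries` with `retriesFor` keeps that bound equal to the configured max wait.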
EdColeman
left a comment
You may want to use the TimeUnit static imports - but LGTM
dlmarion
left a comment
I think this looks ok.
Switches to using static imports for MILLISECONDS and removes the TimeUnit import for improved readability.
This reverts commit e984229.
GitHub's UI seems to create a new branch to revert the change if you click the revert button, even if you abort the process and don't actually follow through with reverting anything. I'll delete the unintentionally created revert branch. #3235 is a fix subsequent to this PR that should be finished before merging the 1.10 branch forward into 2.1 and on.
Replaces the retry object with a wait loop.
Log messages are generated independent of max wait time.
Closes #3159