Skip to content

Conversation

@ddanielr
Copy link
Contributor

@ddanielr ddanielr commented Mar 9, 2023

Replaces the retry object with a wait loop.
Log messages are generated independent of max wait time.

Closes #3159

Replaces the retry object with a wait loop.
Log messages are generated independent of max wait time.
@ddanielr
Copy link
Contributor Author

ddanielr commented Mar 9, 2023

I don't think the sleepInterval is calculated elegantly. Dividing the wait interval into 10 sleep periods seemed to be ok for now, but I welcome better suggestions.

I wanted to get this change in to close out the open items for 1.10.3

Copy link
Contributor

@EdColeman EdColeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks okay, but removing the retry in favor of a wait loop looses some functionality - mainly that there was a small pause at start. It also seems cleaner because the wait times do not need to be calculated. Also, with the retry, there would be a most 3 log messages generated for the wait.

Why did you choose not to use retry?

Implemented PR feedback to switch time checks over to using `System.nanoTime()`.
Removed old import for Retry object.
Added static import for timeUnit.NANOSECONDS.
@ddanielr
Copy link
Contributor Author

ddanielr commented Mar 9, 2023

Overall this looks okay, but removing the retry in favor of a wait loop looses some functionality - mainly that there was a small pause at start. It also seems cleaner because the wait times do not need to be calculated. Also, with the retry, there would be a most 3 log messages generated for the wait.

Why did you choose not to use retry?

I attempted using a Retry with an earlier version of this fix. However, while wait times do not need to be calculated with a Retry, the amount of retries in the given time duration window still need to be calculated. Otherwise it never completes and just keeps retrying.

In my first iteration, the retry never matched up to my time value as the increment value would always cause the retry to overshoot the defined property value.

ddanielr@f93c327#diff-56945d7261689b2323a668699ab5865a1e48ec8424b0d612091d29ae0f2fc67cR1510

Because of the complications with the Retry object, and the fact thatblockForTservers() is not attempting an operation as it's just blocking, I chose to simplify it for a wait loop.

I can add in a small wait at the start if you'd prefer, or I'm happy to dig further into the Retry object with someone and see if I'm just completely missing something.

I'm not super concerned about the logs being spammed as this is a block on the main thread so nothing else should really be showing up in the logs until this method has completed.

@EdColeman
Copy link
Contributor

EdColeman commented Mar 9, 2023

The following seems close to what is wanted (I built this against 3.0, so there may be changes needed for 1.10) Ignore the time and count values - they were picked for convenience and not what we'd what to use here.

     Retry.builder().maxRetries(10).retryAfter(50, MILLISECONDS).incrementBy(100, MILLISECONDS)
                        .maxWait(maxWaitSec, SECONDS).backOffFactor(2.0).logInterval(1, SECONDS).createRetry();

...
  while(retry.canRetry()){
            attempts++;
            retry.logRetry(log, "current counts " + attempts);
            retry.waitForNextAttempt(log, "pause for next attempt");
            retry.useRetry();
        }

I think things to note:

  • maxWait set the max delay and factors in with the backoff, the delay time will increase up to that value with each attempt.
  • maxRetries is a count
  • removing infiniteRetries() and then adding useRetry is required to advance the retry attempt logic

This commit switches back to using a retry object without an
incremental backoff.
This allows the specified max Wait to always equal the exact amount
of time that the main thread waits for tservers.
It also switches the logging statement to use the retry log instead.
This takes advantage of the Retry object's logging interval.
@ddanielr
Copy link
Contributor Author

Dug into the retry object a bit with @EdColeman and found that removing the incrementing backoff and just using maxRetries makes the Retry object function as expected.

I've pushed an updated commit that switches back to the Retry object, but calculates an exact max wait vs an approximate value.
It also switches the logging statement to use the Retry logging method which takes advantage of the Retry object's logging interval.

Copy link
Contributor

@EdColeman EdColeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to use the TimeUnit static imports - but LGTM

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks ok.

Switches to using static imports for MILLISECONDS and removes the
TimeUnit import for improved readability.
@ctubbsii
Copy link
Member

GitHub's UI seems to create a new branch to revert the change if you click the revert button, even if you abort the process and don't actually follow through with reverting anything. I'll delete the unintentionally created revert branch. #3235 is a fix subsequent to this PR that should be finished before merging the 1.10 branch forward into 2.1 and on.

@ddanielr ddanielr deleted the bugfix/change-wait-logic branch March 13, 2023 17:59
asfgit pushed a commit that referenced this pull request Mar 13, 2023
Fix merge of #3231 and #3235 from 1.10 to 2.1
@ctubbsii ctubbsii added this to the 1.10.3 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants