
CC-1059 Changed ES connector to use exponential backoff with jitter #114

Closed

Conversation

@rhauch rhauch commented Sep 5, 2017

With many ES connector tasks all hitting the ES backend, when the backend becomes overloaded all of the tasks experience timeouts (possibly at nearly the same time) and thus retry. Prior to this change, all tasks used the same constant backoff time and would therefore all retry at about the same point in time, possibly overwhelming the ES backend again. This is known as a thundering herd, and when many attempts fail it takes a long time and many attempts to recover.

A solution to this problem is to use exponential backoff to give the ES backend time to recover, except that this alone doesn't really reduce the thundering herd problem. To solve both problems we use exponential backoff with jitter, which randomizes the sleep time for each attempt. This PR adds exponential backoff with jitter.

We need at least two configuration parameters to control this approach. Since `retry.backoff.ms` already exists, we can co-opt it to define the minimum time to wait during retries, but we need another configuration property to define the maximum time to wait. (We could expose parameters to control the exponential part of the equation, but that's unnecessarily complicated.) The new algorithm computes the normal exponential backoff based upon `retry.backoff.ms` (the minimum value) and the `max.retry.backoff.ms` value, bounds the result to be within these two values, and then chooses a random value within that range.

Note that to maintain backward compatibility, we always wait for exactly the minimum time if it equals or exceeds the maximum time to wait. This might happen if an existing connector configuration defines `retry.backoff.ms` but doesn't set the newer `max.retry.backoff.ms`. The default value of `max.retry.backoff.ms` is 10 seconds, which is hopefully larger than most `retry.backoff.ms` values in use.
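For illustration, here is a minimal sketch (not the code in this PR) of the kind of computation described above; the helper name, the doubling loop, and the use of `ThreadLocalRandom` are assumptions made for the sketch:

```java
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {

  /**
   * Exponential backoff with jitter: the upper bound doubles with each attempt,
   * is clamped to [minRetryBackoffMs, maxRetryBackoffMs], and the actual wait is
   * a uniformly random value within that range.
   */
  static long backoffWithJitterMs(int retryAttempt, long minRetryBackoffMs, long maxRetryBackoffMs) {
    if (minRetryBackoffMs < 0) {
      minRetryBackoffMs = 0;
    }
    if (minRetryBackoffMs >= maxRetryBackoffMs) {
      // Backward compatible: behave like the old constant backoff.
      return minRetryBackoffMs;
    }
    // min * 2^attempt, grown by doubling and clamped to the configured maximum.
    long upperBound = Math.max(1, minRetryBackoffMs);
    for (int i = 0; i < retryAttempt && upperBound < maxRetryBackoffMs; i++) {
      upperBound = Math.min(maxRetryBackoffMs, upperBound * 2);
    }
    // Jitter: choose a random wait in [min, upperBound].
    return ThreadLocalRandom.current().nextLong(minRetryBackoffMs, upperBound + 1);
  }

  public static void main(String[] args) {
    // Example: retry.backoff.ms = 100, max.retry.backoff.ms = 10000.
    for (int attempt = 1; attempt <= 5; attempt++) {
      System.out.println("attempt " + attempt + " -> " + backoffWithJitterMs(attempt, 100L, 10_000L) + " ms");
    }
  }
}
```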

This PR should only be merged onto the 3.3.x branch; see PR #115 for the 3.4.x and master branches.

@kkonstantine kkonstantine left a comment


I like the jitter. A few correctness comments, and a question about whether we really need the max; I'm afraid it has more cons than pros.

protected static long computeRetryWaitTimeInMillis(int retryAttempt, long minRetryBackoffMs, long maxRetryBackoffMs) {
  if (minRetryBackoffMs >= maxRetryBackoffMs) return minRetryBackoffMs; // this was the original value and likely what they're setting
  if (minRetryBackoffMs < 0) minRetryBackoffMs = 0;
  long nextMaxTimeMs = Math.max(minRetryBackoffMs + 1, Math.min(maxRetryBackoffMs, minRetryBackoffMs * 2 ^ retryAttempt));
Member

In Java, `^` is bitwise XOR, not exponentiation (`Math.pow`). This makes me think we probably need a test to confirm that the actual waiting time corresponds to exponential backoff.

Since we are raising to a power of 2, I'd suggest bit shifting, such as: `minRetryBackoffMs << retryAttempt;` (I haven't tried it, please confirm; we'll be doing something similar in the S3 connector).
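For illustration (a standalone hypothetical snippet, not code from this PR), the difference between the XOR expression and a shift:

```java
public class XorVsShift {
  public static void main(String[] args) {
    long min = 100L;
    int attempt = 3;
    // '*' binds tighter than '^', and '^' is bitwise XOR, so this is (100 * 2) XOR 3 = 203,
    // not the intended 100 * 2^3 = 800.
    System.out.println(min * 2 ^ attempt);     // 203
    // Left-shifting multiplies by 2^attempt, as suggested above.
    System.out.println(min << attempt);        // 800
    System.out.println(min * (1L << attempt)); // 800, equivalent
  }
}
```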

Member Author

Yeah, not sure what I was thinking. Thank goodness for reviews!

@@ -35,7 +36,8 @@
   public static final String MAX_RETRIES_CONFIG = "max.retries";
   public static final String TYPE_NAME_CONFIG = "type.name";
   public static final String TOPIC_INDEX_MAP_CONFIG = "topic.index.map";
-  public static final String RETRY_BACKOFF_MS_CONFIG = "retry.backoff.ms";
+  public static final String MIN_RETRY_BACKOFF_MS_CONFIG = "retry.backoff.ms";
+  public static final String MAX_RETRY_BACKOFF_MS_CONFIG = "max.retry.backoff.ms";
Member

I feel that we might not strike the best compromise by introducing another config property while also not renaming the old one (it wouldn't be great to break compatibility, of course).

Is `max.retry.backoff.ms` all that necessary? Users will have to figure out that this is a global max (covering the backoffs across the given number of retries), and maybe that's not obvious, or it becomes a bit tedious to get right.

Why not stay with the maximum that is implied by `retry.backoff.ms`, `max.retries`, and the fact that the backoff is exponential (a power of 2)? Also, the max can't have a default that adjusts relative to whatever the min will be, so it's destined to require adjustment every time someone changes the min.

I feel we don't gain much by introducing it. Just a thought.
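To make the implied cap concrete, a hypothetical example (the values are made up for illustration):

```java
public class ImpliedMaxBackoff {
  public static void main(String[] args) {
    long retryBackoffMs = 100L; // retry.backoff.ms
    int maxRetries = 5;         // max.retries
    // With power-of-2 exponential backoff, the largest possible wait is already
    // bounded by min * 2^max.retries, without a separate max.retry.backoff.ms.
    long impliedMaxBackoffMs = retryBackoffMs << maxRetries;
    System.out.println(impliedMaxBackoffMs); // prints 3200 (milliseconds)
  }
}
```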

++retryAttempt;
long sleepTimeMs = computeRetryWaitTimeInMillis(retryAttempt, minRetryBackoffMs, maxRetryBackoffMs);
if (log.isDebugEnabled()) {
  log.debug("Failed to execute batch {} of {} records with attempt {}/{}, will attempt retry after {} ms",
Member

It feels a bit odd that, depending on the log level, we change what this message is: a WARN or a DEBUG message. Shouldn't it always be a WARN? Should we just print the stack trace in a separate debug message?
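One possible arrangement, sketched with SLF4J-style calls (the class, method, and variable names are hypothetical placeholders): log the retry itself at WARN every time, and attach the stack trace only in a separate DEBUG message.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RetryLoggingSketch {
  private static final Logger log = LoggerFactory.getLogger(RetryLoggingSketch.class);

  // Always log the retry itself at WARN; only attach the full stack trace at DEBUG.
  static void logRetry(long batchId, int recordCount, int retryAttempt, int maxRetries,
                       long sleepTimeMs, Exception cause) {
    log.warn("Failed to execute batch {} of {} records with attempt {}/{}, will retry after {} ms",
        batchId, recordCount, retryAttempt, maxRetries, sleepTimeMs);
    if (log.isDebugEnabled()) {
      // SLF4J appends the trailing Throwable's stack trace to the message.
      log.debug("Failure that triggered the retry of batch {}", batchId, cause);
    }
  }
}
```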

Member Author

Note that currently the log message outputs the stack trace every time the connector fails and then retries. This was an attempt to change the behavior, but it's more complex than it needs to be.

for (int retryAttempt=1; retryAttempt < retryAttempts; ++retryAttempt) {
  long result = BulkProcessor.computeRetryWaitTimeInMillis(retryAttempt, minRetryBackoffMs, maxRetryBackoffMs);
  if (maxRetryBackoffMs > 0) assertTrue(result <= maxRetryBackoffMs);
  assertTrue(result >= minRetryBackoffMs);
Member

I believe it's somewhere here that we should check that the wait is truly exponential.
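For example, a hypothetical sketch of such a check, reusing the variables from the surrounding test and assuming the intended upper bound doubles per attempt starting from the minimum, capped by the maximum (the exact bound the PR intends may differ):

```java
for (int retryAttempt = 1; retryAttempt < retryAttempts; ++retryAttempt) {
  long result = BulkProcessor.computeRetryWaitTimeInMillis(retryAttempt, minRetryBackoffMs, maxRetryBackoffMs);
  // The jittered wait should stay within the exponentially growing bound.
  long expectedUpperBound = Math.min(maxRetryBackoffMs, minRetryBackoffMs << retryAttempt);
  assertTrue(result >= minRetryBackoffMs);
  assertTrue(result <= expectedUpperBound);
}
```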

}

protected void assertComputeRetryInRange(int retryAttempts, long minRetryBackoffMs, long maxRetryBackoffMs) {
  for (int retryAttempt=1; retryAttempt < retryAttempts; ++retryAttempt) {
Member

That's the `=` (missing surrounding whitespace) that I would expect checkstyle to catch.

@rhauch rhauch (Member Author) commented Sep 6, 2017

Superseded by #116

@rhauch rhauch closed this Sep 6, 2017