
aws/retry: Retry token post request attempt not being cleared #1413

Closed
jasdel opened this issue Sep 10, 2021 · 9 comments · Fixed by #1560
Labels
investigating This issue is being investigated and/or work is in progress to resolve the issue.

Comments

@jasdel
Contributor

jasdel commented Sep 10, 2021

Noticed this relRetryToken value being returned, but the value is never used again, nor released. This could potentially lead to a token leak, preventing subsequent retry attempts from being made. This needs more investigation, and cleanup to correctly handle the returned retry token for the attempt.

relRetryToken, reqErr := r.retryer.GetRetryToken(ctx, err)
if reqErr != nil {
	return out, attemptResult, reqErr
}
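
For illustration only (a rough sketch, not the eventual fix in #1560; pendingRelease is a hypothetical variable), handling the token correctly would mean holding on to the returned release function and invoking it once a later attempt succeeds, for example:

// Hypothetical sketch: keep the release function returned by GetRetryToken
// and call it after a subsequent attempt succeeds, so the capacity taken for
// the retry goes back into the token bucket.
pendingRelease = relRetryToken
// ...later in the retry loop, once an attempt returns with err == nil:
// pendingRelease(nil) // nil error, so the release is not a no-op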

@KaibaLopez added the investigating label Sep 14, 2021
@Skarlso

Skarlso commented Sep 17, 2021

I'll take a look if people don't mind. :)

@Skarlso

Skarlso commented Sep 17, 2021

Yeah, there are some unused things in the tests too, especially the mockhandler and retryProvider, so there is a gap in testing as well.

In fact, the retry errors are not tested... I suspect there might be a bug in there. I'll increase the coverage and fix the unused releaser.

@Skarlso

Skarlso commented Sep 18, 2021

Okay, so this is what I found...

This is a tricky situation, and please correct me if I'm wrong, but... looking at this code:

	if err == nil {
		return out, attemptResult, err
	}

	retryable := r.retryer.IsErrorRetryable(err)
	if !retryable {
		r.logf(logger, logging.Debug, "request failed with unretryable error %v", err)
		return out, attemptResult, err
	}

	// set retryable to true
	attemptResult.Retryable = true

	if maxAttempts > 0 && attemptNum >= maxAttempts {
		r.logf(logger, logging.Debug, "max retry attempts exhausted, max %d", maxAttempts)
		err = &MaxAttemptsError{
			Attempt: attemptNum,
			Err:     err,
		}
		return out, attemptResult, err
	}

	relRetryToken, reqErr := r.retryer.GetRetryToken(ctx, err)
	if reqErr != nil {
		return out, attemptResult, reqErr
	}

It looks like this whole thing stops if err == nil. And the release function is a no-op in case of errors:

func (f releaseToken) release(err error) error {
	if err != nil {
		return nil
	}

	return f()
}

Which means, if I'm reading this function correctly, calling release after that GetRetryToken call will basically never do anything, because execution only reaches this code segment when there is an error, and release is a no-op for a non-nil error. And at the beginning, once this function starts again, the same release is called.

I'll write some more tests, however, to prove that there is some kind of problem; I already wrote a small test that exercises this code path. Still, I might be incorrect, so please double-check.

Cheers!

@Skarlso

Skarlso commented Sep 18, 2021

OK, confirmed that, nevertheless, it's possible that because the retry does take capacity and there is a new attempt, the initial token only releases 1.

I made an initial attempt at fixing this particular problem: when the tokens are all used up and a new attempt is handled, all capacity gets exhausted because the retry's release was never called.
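
To see the effect outside the middleware, here is a minimal, self-contained sketch against the SDK's token bucket (not the middleware code itself); it shows the retry quota draining when the release function returned by GetToken is never called:

package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws/ratelimit"
)

func main() {
	rl := ratelimit.NewTokenRateLimit(10) // tiny bucket so exhaustion shows quickly

	for attempt := 1; ; attempt++ {
		release, err := rl.GetToken(context.Background(), 5) // cost on the order of retry.DefaultRetryCost
		if err != nil {
			fmt.Printf("attempt %d: no retry capacity left: %v\n", attempt, err)
			return
		}
		// Buggy pattern: the release function is dropped instead of being
		// called after a successful attempt, so the 5 tokens never come back.
		_ = release
	}
}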

@Skarlso

Skarlso commented Sep 18, 2021

Attached an initial PR fixing the release of the taken capacity after a successful retry.

@jasdel
Contributor Author

jasdel commented Sep 20, 2021

Thanks for taking a look at this, and creating a PR @Skarlso. We'll review that PR and post feedback.

@TomerHeber

TomerHeber commented Dec 8, 2021

I have run into the same issue. (And was about to open an issue).
As a workaround, I've created my own RateLimiter.

// Assumed imports for this snippet: "context", "time",
// "github.com/aws/aws-sdk-go-v2/aws/ratelimit",
// "github.com/aws/aws-sdk-go-v2/aws/retry", and
// backoff "github.com/cenkalti/backoff/v4" (the backoff package used below).

// customRateLimiter wraps the SDK's token bucket and, instead of failing when
// no retry capacity is available, blocks with exponential backoff until a
// token can be acquired.
type customRateLimiter struct {
	rt *ratelimit.TokenRateLimit
}

func newCustomRateLimiter() retry.RateLimiter {
	return &customRateLimiter{ratelimit.NewTokenRateLimit(5000)}
}

func (c *customRateLimiter) GetToken(ctx context.Context, cost uint) (releaseToken func() error, err error) {
	b := backoff.NewExponentialBackOff()
	b.MaxInterval = time.Minute * 10
	b.InitialInterval = time.Second * 5
	b.RandomizationFactor = 0.5
	b.Multiplier = 2
	t := backoff.NewTicker(b)
	defer t.Stop()

	for {
		f, err := c.rt.GetToken(ctx, cost)
		if err == nil {
			return f, nil
		}
		// Put the retry cost back into the bucket and wait for the next
		// backoff tick before trying again.
		c.rt.AddTokens(retry.DefaultRetryCost)
		<-t.C
	}
}

func (c *customRateLimiter) AddTokens(v uint) error {
	return c.rt.AddTokens(v)
}

	aws_config.WithRetryer(func() aws.Retryer {
			// Retry until cannot retry or successful.
			return retry.AddWithMaxAttempts(retry.NewStandard(func(so *retry.StandardOptions) {
				// Use exponential backoff.
				so.Backoff = retry.NewExponentialJitterBackoff(3 * time.Minute)
				so.RateLimiter = newCustomRateLimiter()
			}), 0)
		}),
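
For completeness, here is a sketch of where the option above plugs in (assuming aws_config aliases github.com/aws/aws-sdk-go-v2/config; loadAWSConfig is just an illustrative wrapper):

import (
	"context"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	aws_config "github.com/aws/aws-sdk-go-v2/config"
)

func loadAWSConfig(ctx context.Context) aws.Config {
	cfg, err := aws_config.LoadDefaultConfig(ctx,
		aws_config.WithRetryer(func() aws.Retryer {
			// Retry until success or a non-retryable error (max attempts 0 = no cap).
			return retry.AddWithMaxAttempts(retry.NewStandard(func(so *retry.StandardOptions) {
				so.Backoff = retry.NewExponentialJitterBackoff(3 * time.Minute)
				so.RateLimiter = newCustomRateLimiter()
			}), 0)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	return cfg
}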

Obviously, it's not an ideal solution. But simple enough to get it working for now.

@Skarlso

Skarlso commented Dec 8, 2021

Yeah, would be nice if someone reviewed my code for this fix. :)

jasdel added a commit to jasdel/aws-sdk-go-v2 that referenced this issue Jan 14, 2022
Updates the Retry middleware to release the retry token, on subsequent
attempts. This fixes aws#1413, and is based on PR aws#1424.
jasdel added a commit that referenced this issue Jan 14, 2022
Updates the Retry middleware to release the retry token, on subsequent
attempts. This fixes #1413, and is based on PR #1424.

jrichard8 pushed a commit to jrichard8/aws-sdk-go-v2 that referenced this issue Feb 14, 2022
Updates the Retry middleware to release the retry token, on subsequent
attempts. This fixes aws#1413, and is based on PR aws#1424.