fix: Clean up usage retry logic #1950
Conversation
mnorbury
left a comment
Looks like some great clean up!
    		req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
    		return nil
    	}),
    	cqapi.WithHTTPClient(retryClient.StandardClient()),
This is what the default `cqapi.NewClientWithResponses` does anyway; we just set the retry params beforehand.
    	if err := u.updateUsageWithRetryAndBackoff(ctx, totals, tables); err != nil {
    -		log.Warn().Err(err).Msg("failed to update usage")
    +		u.logger.Warn().Err(err).Msg("failed to update usage")
Cleaning up some "untethered" logger usage.
premium/usage.go
Outdated
    	}
    -	retryDuration, err := u.calculateRetryDuration(resp.StatusCode(), resp.HTTPResponse.Header, queryStartTime, retry)
    +	retryDuration, err := u.calculateMarketplaceRetryDuration(statusCode, queryStartTime, retry)
The AWS client should have its own retry mechanism built in anyway, but that won't help us here: we want the usage timestamp to point to the next second rather than stay in the past.
marianogappa
left a comment
Can't find anything wrong with the PR, so I put some boring nits there. I think it boils down to testing it in practice and rubber-stamping it with an approval if it works. Would you mind putting some evidence syncs in the description?
premium/usage.go
Outdated
    	payload.Tables = &tables
    	var de *types.DuplicateRequestException
    	if errors.As(lastErr, &de) {
    		jitter := time.Duration(rand.Intn(1000)) * time.Millisecond
I've seen this already used in this code. Wouldn't it make more sense for the jitter max to be proportional to the backoff, say 5% of it, instead of hardcoded to 1 second?
Here the backoff is constant at 1 sec because we want the data to be registered to the next second (to prevent DuplicateRequestException, which presumably happens because we're running multiple instances and everybody reports their usage in the same second).
Because of this, the jitter max could be increased to 2 sec or more (as we only care about 1-second blocks...), but keeping it lower also means it finishes quicker, with more retries, and we have plenty of retries.
The other part where this code lives handles the "regular" retries (not DuplicateRequestException), which are implemented on top of the (presumably already existing) AWS SDK backoff mechanisms. We could remove that case and set up the AWS SDK backoff params instead, but I'm not an expert on that. This also protects against similar errors (any we've potentially missed, or that are undocumented) that are not covered by the errors.As check.
mnorbury
left a comment
Still looking good 🚀
@marianogappa I don't know how to test this on AWS Marketplace, but I've also added a marketplace "DuplicateRequest" test (as well as the addition of retry logic tests).
    	func WithMaxRetries(maxRetries int) UsageClientOptions {
    		return func(updater *BatchUpdater) {
    			updater.maxRetries = maxRetries
    			if maxRetries > 0 {
This means that retries can never be turned off?
maxRetries = 1 would be "single request, no retries", but I didn't want to be pedantic about the loop and the descriptions.
🤖 I have created a release *beep* *boop*

---

## [4.68.0](v4.67.1...v4.68.0) (2024-10-31)

### Features

* Add Time configtype ([#1905](#1905)) ([f57c3eb](f57c3eb))
* Support for quota query interval header ([#1948](#1948)) ([bfce6fe](bfce6fe))
* Test `MeterUsage` API call on initial setup of client ([#1906](#1906)) ([78df77d](78df77d))

### Bug Fixes

* Clean up usage retry logic ([#1950](#1950)) ([ca982f9](ca982f9))
* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.25.0 ([#1946](#1946)) ([b8e3e10](b8e3e10))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
I may have used the term "clean up" loosely.