
vmagent: remote write respect Retry-After in header #6124

Open · wants to merge 7 commits into master from feature/respect-retry-after

Conversation

@jiekun (Contributor) commented Apr 17, 2024

Describe Your Changes

Related issue: #6097

Changed

  • The remote write retry policy in vmagent is changed to:
    1. Calculate the next retry duration from the backoff policy (x2) and the max retry duration limit (1m). (current behavior)
    2. If Retry-After is present in the response header, use max(Retry-After duration, next retry duration), which may exceed the max retry duration limit (1m). (added in this PR)

Docs

  • CHANGELOG.md.

Note regarding backoff and Retry-After

  • If Retry-After is lower than 1s (the initial retry interval of the backoff policy), it is ignored.
  • If Retry-After is lower than 1m, the next retry still follows the current backoff policy, doubling until it reaches 1m.
  • If Retry-After is higher than 1m, the next retry always uses the duration from Retry-After.

This modification reduces the number of retries when Retry-After is greater than 1m. When Retry-After is less than 1m, the behavior matches the existing backoff policy (with a different initial retry duration).

Also see the test cases for more info.
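For illustration, here is a minimal sketch of the combined policy in Go. The signature and names are assumptions for this example, not the exact code from the PR (the real implementation also applies jitter via timeutil.AddJitterToDuration, omitted here):

```go
package remotewrite

import "time"

// calculateRetryDuration sketches the policy described above:
// double the current backoff interval, cap it at maxRetryDuration,
// then let a larger Retry-After value override the capped result.
func calculateRetryDuration(retryAfterDuration, retryDuration, maxRetryDuration time.Duration) time.Duration {
	// Step 1: exponential backoff (x2), capped at the limit (1m by default).
	backoff := retryDuration * 2
	if backoff > maxRetryDuration {
		backoff = maxRetryDuration
	}
	// Step 2: if the server asked for a longer pause via Retry-After,
	// honor it, even though it exceeds the max retry duration limit.
	if retryAfterDuration > backoff {
		return retryAfterDuration
	}
	return backoff
}
```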


Checklist

The following checks are mandatory:


@codecov (bot) commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.

Project coverage is 58.32%. Comparing base (8aaa828) to head (850ad71).
Report is 748 commits behind head on master.

Files Patch % Lines
app/vmagent/remotewrite/client.go 82.35% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6124      +/-   ##
==========================================
- Coverage   60.37%   58.32%   -2.06%     
==========================================
  Files         411      609     +198     
  Lines       76609    82852    +6243     
==========================================
+ Hits        46253    48322    +2069     
- Misses      27794    31311    +3517     
- Partials     2562     3219     +657     


var retryAfterDuration time.Duration

// Retry-After could be in "Mon, 02 Jan 2006 15:04:05 GMT" format.
parsedTime, err := time.Parse(http.TimeFormat, retryAfterString)
Collaborator commented:

I think header parsing should be a separate function. It can be tested separately, and calculateRetryDuration will be simplified significantly.

Contributor Author replied:

It's now separated from calculateRetryDuration and called in sendBlockHTTP.
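For context, a hedged sketch of what such a separated helper might look like (the name parseRetryAfterHeader and the fallback-to-zero behavior are assumptions for this example; see the PR diff for the actual code). Per RFC 7231, Retry-After carries either a number of seconds or an HTTP date:

```go
package remotewrite

import (
	"net/http"
	"strconv"
	"time"
)

// parseRetryAfterHeader converts a Retry-After header value into a
// duration. It returns 0 when the header is empty or unparsable.
func parseRetryAfterHeader(retryAfterString string) time.Duration {
	if retryAfterString == "" {
		return 0
	}
	// Delay-seconds form, e.g. "Retry-After: 120".
	if seconds, err := strconv.ParseInt(retryAfterString, 10, 64); err == nil {
		return time.Duration(seconds) * time.Second
	}
	// HTTP-date form, e.g. "Retry-After: Mon, 02 Jan 2006 15:04:05 GMT".
	if parsedTime, err := time.Parse(http.TimeFormat, retryAfterString); err == nil {
		return time.Until(parsedTime)
	}
	return 0
}
```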

// testFunc evaluates whether the result of calculateRetryDuration is
// 1. >= expectMinDuration
// 2. <= expectMinDuration + 10% (see timeutil.AddJitterToDuration)
testFunc := func(name string, retryAfterString string, retryDuration, expectMinDuration time.Duration) {
Collaborator commented:

wdyt about emulating retryDuration persistence in this test? This would make it similar to a real use case - consecutive calls to calculateRetryDuration slowly increase the retryDuration.

Contributor Author replied:

I love this idea, and it brings up some scenarios that I had already forgotten.

Please review the test cases in the `Call calculateRetryDuration for multiple times` block.
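A sketch of such a persistence-style test, feeding each result back as the next call's input (this uses the illustrative calculateRetryDuration signature from the sketch above and omits the jitter the real test accounts for):

```go
package remotewrite

import (
	"testing"
	"time"
)

// TestRetryDurationPersistence emulates consecutive failed sends:
// the returned duration is fed back in, so backoff grows until it
// hits the 1m cap when no Retry-After header is present.
func TestRetryDurationPersistence(t *testing.T) {
	const maxRetryDuration = time.Minute
	retryDuration := time.Second // initial backoff interval
	for i := 0; i < 10; i++ {
		retryDuration = calculateRetryDuration(0, retryDuration, maxRetryDuration)
		if retryDuration > maxRetryDuration {
			t.Fatalf("iteration %d: got %s, want <= %s", i, retryDuration, maxRetryDuration)
		}
	}
	if retryDuration != maxRetryDuration {
		t.Fatalf("backoff should have reached the %s cap, got %s", maxRetryDuration, retryDuration)
	}
}
```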


// calculateRetryAfterDuration calculates the retry duration.
// 1. Calculate next retry duration by backoff policy (x2) and max retry duration limit.
// 2. If Retry-After exists in response header, use max(Retry-After duration, next retry duration).
Collaborator commented:

// 2. If Retry-After exists in response header, use max(Retry-After duration, next retry duration).

This function doesn't know about this header. This info needs to be placed elsewhere.

Contributor Author replied:

I have simplified it and removed unrelated content.

@jiekun jiekun force-pushed the feature/respect-retry-after branch from cea4491 to 3a6c7bd Compare May 22, 2024 14:10
@zekker6 zekker6 requested a review from hagen1778 May 24, 2024 14:54
@jiekun jiekun force-pushed the feature/respect-retry-after branch from 19accfd to 850ad71 Compare May 29, 2024 09:50