Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.7 backports 2020-10-08 - API rate limiting fixes #13477

Merged
merged 4 commits into from
Oct 9, 2020

Conversation

tgraf
Copy link
Member

@tgraf tgraf commented Oct 8, 2020

#13450 -- Follow-up fixes for the API rate limiter

Once this PR is merged, you can update the PR labels via:

$ for pr in 13450 ; do contrib/backporting/set-labels.py $pr done 1.7; done

[ upstream commit ecb2369 ]

The following log message was duplicated and printed twice for two
different meanings.

Fixes: 3141e65 ("rate: Add API rate limiting system")
Signed-off-by: Thomas Graf <thomas@cilium.io>
[ upstream commit 208c69c ]

So far, the rate limiting has been enforced before enforcing parallel
requests. In theory, this is better because we can return error code
429 earlier to tell callers to slow down.

In practice, many callers will retry immediately anyway which means that
all the limiting will happen by the rate limiter. The rate limiter
relies on reservations that need to be canceled. If too many requests
happen in parallel, reservations can't be canceled quickly enough.

By swapping the enforcement of parallel requests with the rate limiter,
all requests will block for at least MaxWaitDuration if more than the
allowed number of parallel requests are pending which will naturally
pace the callers.

Swapping the enforcement order requires the acquired semaphore to be
released in error cases of the rate limiter. This requires to change the
structure of Wait() to have a single error handling structure. By
reusing finishRequest(), the metrics handler has to be adjusted slightly
to account for new outcomes as it now bumps the metric for canceled
requests as well. What remains unchanged is that only successful API
requests are used to calculate the mean processing duration.

Fixes: 3141e65 ("rate: Add API rate limiting system")
Signed-off-by: Thomas Graf <thomas@cilium.io>
[ upstream commit 75159c6 ]

Simulate stress by feeding many parallel requests through the API rate
limiter. The maximum wait duration will soon be hit for requests causing
them to fail.

Without the fix, this test would often fail. If it succeeded, the number
of retries is in the range of 60-80K. With the fix, the test succeeds
consistently and retries are in the range of 6-8K. That's a 10x
reduction of retries.

Signed-off-by: Thomas Graf <thomas@cilium.io>
[ upstream commit 269aa1b ]

Test that failing requests will not impact the rate limiter. This is
already working correctly so this test only adds additional coverage.

Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf tgraf requested a review from a team as a code owner October 8, 2020 22:39
@maintainer-s-little-helper maintainer-s-little-helper bot added the kind/backports This PR provides functionality previously merged into master. label Oct 8, 2020
@tgraf
Copy link
Member Author

tgraf commented Oct 8, 2020

test-backport-1.7

@joestringer
Copy link
Member

joestringer commented Oct 8, 2020

If you don't have the section in the PR description like this then the release notes will not be generated correctly:

```upstream-prs
$ for pr in ...
```

See https://docs.cilium.io/en/latest/contributing/release/backports/#via-github-web-interface for more details.

@christarazi
Copy link
Member

test-upstream-k8s

@tgraf tgraf merged commit 55f0c5a into v1.7 Oct 9, 2020
@tgraf tgraf deleted the pr/backport/1.7-ratelimit branch October 9, 2020 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants