Automatic rate limiting of endpoint API calls #13319
Conversation
Force-pushed from 4768bd3 to 15ee965 (compare)
test-me-please
Some small nits while skimming through the implementation. Great documentation - in godoc as well as the Cilium docs!
There are a few things around the estimated duration, though, that I was wondering about and didn't find an answer to:
- How were the estimated process durations determined?
- How should a potential future user of this API estimate the duration? Might make sense to add instructions to the docs?
- What happens if the estimation is off significantly?
daemon/cmd/endpoint.go (Outdated)
```
@@ -137,11 +148,19 @@ func NewGetEndpointIDHandler(d *Daemon) GetEndpointIDHandler {
 func (h *getEndpointID) Handle(params GetEndpointIDParams) middleware.Responder {
 	log.WithField(logfields.EndpointID, params.ID).Debug("GET /endpoint/{id} request")

+	r, err := h.d.apiLimiterSet.Wait(params.HTTPRequest.Context(), apiRequestEndpointGet)
+	if err != nil {
+		return api.Error(responseTooManyRequests, err)
```
@tgraf IIUC there are primarily two paths for a failed API request: 1. the request exceeds the maximum wait time, 2. the request exceeds the rate limiting settings. The latter will return immediately with a return code of responseTooManyRequests (429), whereas the former will return with another failure.
My questions, as a user are: does Cilium somehow update the status on the pod to reflect the cause of the error? Secondly, since rate limiting is intended as a means to reduce pressure on the cilium agent, what is the recommended course of action for the user upon receiving responseTooManyRequests? Again, my understanding is that the user has two options: either to retry after a short amount of time, or to adjust the rate-limiting configuration and then retry immediately. My issue with the second approach is that the end user--the one who initiated the pod creation--typically won't have the ability to make cluster wide changes. Hence, my question is, what is the mode for interrogation and the passing of these failures and settings across the multiple components (end-user, k8s operator, cilium)?
does Cilium somehow update the status on the pod to reflect the cause of the error?
The error will surface in kubectl describe pod and kubectl get events, so you will see that CNI creation failed because of too many requests. This gives you better visibility than before, where kubelet/the runtime would time out and retry silently.
Secondly, since rate limiting is intended as a means to reduce pressure on the cilium agent, what is the recommended course of action for the user upon receiving responseTooManyRequests?
In general it should resolve itself automatically as kubelet will continue to retry. It will avoid the dangerous CNI ADD / CNI DEL cycle after which some resources have already been invested and then have to be undone.
Again, my understanding is that the user has two options: either to retry after a short amount of time,
kubelet will take care of this. The difference will be that it is done with visibility and it fails upfront without an investment of CPU resources already committed.
or to adjust the rate-limiting configuration and then retry immediately.
Adjustment of rate-limiting configuration should only be needed if the automatic adjustment doesn't allow for the required number of pod creations (or other API calls) even if low processing durations are observed. If enough resources are available, the default rate limits will automatically be relaxed further. An example where things could not work out well automatically: if an API call is estimated to take 2 seconds with 4 parallel requests but it only takes 0.5 seconds and you really need 8 parallel requests at all times. The auto adjustment won't allow for this and manual interaction is required to adjust the limits.
Another example is if the endpoint creation is much slower than expected, let's say it takes 10s instead of 2s. Then only one parallel endpoint request will be allowed as ParallelRequests will be shrunk down automatically.
Based on the metrics, it should be possible to tell immediately whether automatically adjusted limits are reasonable or not. We will be happy to optimize with you.
My issue with the second approach is that the end user--the one who initiated the pod creation--typically won't have the ability to make cluster wide changes. Hence, my question is, what is the mode for interrogation and the passing of these failures and settings across the multiple components (end-user, k8s operator, cilium)?
Agreed. The user should not really see this, similar to how a k8s user typically doesn't see when the k8s apiserver returns 429 (it does; retries are done automatically). If too many retries are needed, the platform operator has to step in and resize the VM, adjust the limits, and so on. It will be the same with these limits. Observation should be done via the metrics, so you can see the mean waiting duration; if that gets out of hand, you can check why that is the case and determine whether to adjust the limits or fix resource constraints.
Force-pushed from ed1a400 to dc7beb0 (compare)
test-me-please
LGTM! A few minor comments.
The API rate limiting system is capable of enforcing both rate limiting and a maximum number of parallel requests. Instead of enforcing static limits, the system can automatically adjust rate limits and allowed parallel requests by comparing the provided estimated processing duration with the observed mean processing duration.

Usage:

```
var requestLimiter = rate.NewAPILimiter("myRequest", rate.APILimiterParameters{
	SkipInitial:      5,
	RateLimit:        1.0, // 1 request/s
	ParallelRequests: 2,
}, nil)

func myRequestHandler() error {
	req, err := requestLimiter.Wait(context.Background())
	if err != nil {
		// request timed out while waiting
		return err
	}
	defer req.Done() // Signal that request has been processed

	// process request ...

	return nil
}
```

Configuration parameters:

- EstimatedProcessingDuration time.Duration
  EstimatedProcessingDuration is the estimated duration an API call will take. This value is used if AutoAdjust is enabled to automatically adjust rate limits to stay as close as possible to the estimated processing duration.
- AutoAdjust bool
  AutoAdjust enables automatic adjustment of the values ParallelRequests, RateLimit, and RateBurst in order to keep the mean processing duration close to EstimatedProcessingDuration.
- MeanOver int
  MeanOver is the number of entries to keep in order to calculate the mean processing and wait duration.
- ParallelRequests int
  ParallelRequests is the number of parallel requests allowed. If AutoAdjust is enabled, the value will adjust automatically.
- MaxParallelRequests int
  MaxParallelRequests is the maximum number of parallel requests allowed. If AutoAdjust is enabled, ParallelRequests will never grow above MaxParallelRequests.
- MinParallelRequests int
  MinParallelRequests is the minimum number of parallel requests allowed. If AutoAdjust is enabled, ParallelRequests will never fall below MinParallelRequests.
- RateLimit rate.Limit
  RateLimit is the initial number of API requests allowed per second. If AutoAdjust is enabled, the value will adjust automatically.
- RateBurst int
  RateBurst is the initial allowed burst of API requests. If AutoAdjust is enabled, the value will adjust automatically.
- MinWaitDuration time.Duration
  MinWaitDuration is the minimum time an API request always has to wait before the Wait() function returns an error.
- MaxWaitDuration time.Duration
  MaxWaitDuration is the maximum time an API request is allowed to wait before the Wait() function returns an error.
- Log bool
  Log enables info logging of processed API requests. This should only be used for low-frequency API calls. Example:

```
level="info" msg="Processing API request with rate limiter" maxWaitDuration=10ms name=foo parallelRequests=2 subsys=rate uuid=933267c5-01db-11eb-93bb-08002720ea43
level="info" msg="API call has been processed" name=foo processingDuration=10.020573ms subsys=rate totalDuration=10.047051ms uuid=933265c7-01db-11eb-93bb-08002720ea43 waitDurationTotal="18.665µs"
level=warning msg="Not processing API request. Wait duration for maximum parallel requests exceeds maximum" maxWaitDuration=10ms maxWaitDurationParallel=10ms name=foo parallelRequests=2 subsys=rate uuid=933269d2-01db-11eb-93bb-08002720ea43
```

- DelayedAdjustmentFactor float64
  DelayedAdjustmentFactor is the percentage of the AdjustmentFactor to be applied to RateBurst and MaxWaitDuration, defined as a value between 0.0..1.0. This is used to steer a slower reaction of RateBurst and ParallelRequests compared to RateLimit.
- SkipInitial int
  SkipInitial is the number of API calls to skip before applying rate limiting. This is useful to define a learning phase in the beginning to allow for auto adjustment before imposing wait durations on API calls.
- MaxAdjustmentFactor float64
  MaxAdjustmentFactor is the maximum adjustment factor when AutoAdjust is enabled. Base values will not adjust more than by this factor.

The configuration of API rate limiters is typically provided as code to establish defaults. A string-based configuration option can then be used to adjust the defaults. This allows exposing the configuration of rate limiting via a single option flag:

```go
l, err = NewAPILimiterSet(map[string]string{
	"foo": "rate-limit:2/m,rate-burst:2",
}, map[string]APILimiterParameters{
	"foo": {
		RateLimit:  rate.Limit(1.0 / 60.0),
		AutoAdjust: true,
	},
}, nil)
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
Add default rate limiting for all endpoint related API calls with automatic adjustment based on estimated processing duration. Metrics are provided to monitor the rate limiting system:

```
cilium_api_limiter_adjustment_factor api_call="endpoint-create" 0.695787
cilium_api_limiter_processed_requests_total api_call="endpoint-create" outcome="success" 7.000000
cilium_api_limiter_processing_duration_seconds api_call="endpoint-create" value="estimated" 2.000000
cilium_api_limiter_processing_duration_seconds api_call="endpoint-create" value="mean" 2.874443
cilium_api_limiter_rate_limit api_call="endpoint-create" value="burst" 4.000000
cilium_api_limiter_rate_limit api_call="endpoint-create" value="limit" 0.347894
cilium_api_limiter_requests_in_flight api_call="endpoint-create" value="in-flight" 0.000000
cilium_api_limiter_requests_in_flight api_call="endpoint-create" value="limit" 0.000000
cilium_api_limiter_wait_duration_seconds api_call="endpoint-create" value="max" 15.000000
cilium_api_limiter_wait_duration_seconds api_call="endpoint-create" value="mean" 0.000000
cilium_api_limiter_wait_duration_seconds api_call="endpoint-create" value="min" 0.000000
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
Force-pushed from 6e384e1 to c598865 (compare)
Changes made:
Force-pushed from c598865 to 0906012 (compare)
Signed-off-by: Thomas Graf <thomas@cilium.io>
Force-pushed from 0906012 to ec0f695 (compare)
test-me-please
Summary
When Cilium is subject to CPU or memory constraints, it is possible for Cilium to receive more API calls than it can handle, each of which results in required work. Eventually, Cilium may never get out of this situation, as the backpressure is insufficient. Several parallelization efforts are controlled by the number of available cores on the system, which can lead to wrong assumptions if Cilium is constrained to 1-2 cores.
API Rate Limiting
The API rate limiting system is capable of enforcing both rate limiting
and a maximum number of parallel requests. Instead of enforcing static
limits, the system can automatically adjust rate limits and allowed
parallel requests by comparing the provided estimated processing
duration with the observed mean processing duration.
Usage:
Configuration parameters:
EstimatedProcessingDuration time.Duration
EstimatedProcessingDuration is the estimated duration an API call
will take. This value is used if AutoAdjust is enabled to
automatically adjust rate limits to stay as close as possible to the
estimated processing duration.
AutoAdjust bool
AutoAdjust enables automatic adjustment of the values
ParallelRequests, RateLimit, and RateBurst in order to keep the mean
processing duration close to EstimatedProcessingDuration
MeanOver int
MeanOver is the number of entries to keep in order to calculate the
mean processing and wait duration
ParallelRequests int
ParallelRequests is the number of parallel requests allowed. If
AutoAdjust is enabled, the value will adjust automatically.
MaxParallelRequests int
MaxParallelRequests is the maximum number of parallel requests allowed.
If AutoAdjust is enabled, ParallelRequests will never grow above
MaxParallelRequests.
MinParallelRequests int
MinParallelRequests is the minimum number of parallel requests allowed.
If AutoAdjust is enabled, ParallelRequests will never fall below
MinParallelRequests.
RateLimit rate.Limit
RateLimit is the initial number of API requests allowed per second.
If AutoAdjust is enabled, the value will adjust automatically.
RateBurst int
RateBurst is the initial allowed burst of API requests. If AutoAdjust
is enabled, the value will adjust automatically.
MinWaitDuration time.Duration
MinWaitDuration is the minimum time an API request always has to wait
before the Wait() function returns an error.
MaxWaitDuration time.Duration
MaxWaitDuration is the maximum time an API request is allowed to wait
before the Wait() function returns an error.
Log bool
Log enables info logging of processed API requests. This should only
be used for low frequency API calls.
Example:
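The commit message above includes sample log output for this option:

```
level="info" msg="Processing API request with rate limiter" maxWaitDuration=10ms name=foo parallelRequests=2 subsys=rate uuid=933267c5-01db-11eb-93bb-08002720ea43
level="info" msg="API call has been processed" name=foo processingDuration=10.020573ms subsys=rate totalDuration=10.047051ms uuid=933265c7-01db-11eb-93bb-08002720ea43 waitDurationTotal="18.665µs"
level=warning msg="Not processing API request. Wait duration for maximum parallel requests exceeds maximum" maxWaitDuration=10ms maxWaitDurationParallel=10ms name=foo parallelRequests=2 subsys=rate uuid=933269d2-01db-11eb-93bb-08002720ea43
```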
DelayedAdjustmentFactor float64
DelayedAdjustmentFactor is the percentage of the AdjustmentFactor to be
applied to RateBurst and MaxWaitDuration, defined as a value between
0.0..1.0. This is used to steer a slower reaction of RateBurst
and ParallelRequests compared to RateLimit.
SkipInitial int
SkipInitial is the number of API calls to skip before applying rate
limiting. This is useful to define a learning phase in the beginning
to allow for auto adjustment before imposing wait durations on API
calls.
MaxAdjustmentFactor float64
MaxAdjustmentFactor is the maximum adjustment factor when AutoAdjust
is enabled. Base values will not adjust more than by this factor.
The configuration of API rate limiters is typically provided as code to
establish defaults. A string-based configuration option can then be used
to adjust the defaults. This allows exposing the configuration of rate
limiting via a single option flag:
New Default API Rate Limits